Senior Engineer - GPU Performance Optimization
Advanced Micro Devices View all jobs
- Toronto, ON Waterloo, ON
- Permanent
- Full-time
You are a performance‑driven engineer who thrives during new hardware bring‑up and ambiguous early‑silicon phases. You are comfortable working across abstraction layers—from kernel code and library APIs to framework‑visible performance—and you enjoy translating architectural characteristics into concrete software optimizations.You collaborate effectively across teams, communicate performance trade‑offs clearly, and are trusted to take ownership of critical performance paths during time‑sensitive product ramps. You grow influence through technical execution, deep expertise, and mentoring.KEY RESPONSIBILITIES:
- Performance Engineering for New Products: Lead performance optimization efforts for hipDNN, MIOpen, and CK on new AMD GPU architectures. Drive early performance characterization, gap analysis, and optimization plans during pre‑silicon and post‑silicon bring‑up.
- Kernel & Library Optimization: Implement and optimize performance‑critical kernels and operators used by hipDNN and MIOpen, leveraging other libraries where appropriate. Improve kernel selection, fusion strategies, and heuristics to maximize efficiency across diverse workloads.
- Cross‑ASIC Software Engineering: Adapt library implementations and tuning strategies across multiple ASICs, balancing portability with architecture‑specific optimization. Identify when shared abstractions are sufficient versus when targeted specialization is required.
- hipDNN Enablement & Transition: Contribute to hipDNN’s role as the primary execution and fusion layer, including plugin integration and performance validation. Support the transition of functionality from MIOpen into hipDNN while maintaining performance and compatibility.
- Framework‑Facing Performance: Work closely with framework teams (e.g., PyTorch, JAX, Triton) to ensure optimized library paths are exercised in real workloads. Validate performance improvements using representative training and inference models.
- Performance Validation & Regression Control: Define and execute performance benchmarks for new products. Help detect, diagnose, and resolve performance regressions across releases and architectures.
- Collaboration & Mentorship: Partner with Principal engineers, architecture teams, and kernel specialists to align optimization efforts. Share best practices in kernel tuning, performance analysis, and cross‑ASIC optimization with the broader organization.
- Leverages AI‑assisted software development tools to accelerate design, implementation, review, and documentation of complex software libraries. Establishes best practices for responsible use of AI assistance, including validation, review, and traceability of generated code and technical artifacts.
- Strong hands‑on experience with GPU performance engineering and kernel optimization. Practical experience with deep‑learning libraries.
- Experience supporting new hardware bring‑up or optimizing software across multiple GPU architectures. Solid understanding of deep‑learning operator patterns (GEMM, convolution, attention, normalization, fusion).
- Proficiency in C/C++, with Python used for tooling, benchmarking, and analysis.
- Experience using GPU profiling, tracing, and performance analysis tools. Familiarity with framework‑level integration and validation (e.g., PyTorch or JAX).
- Applied experience using AI‑assisted coding tools in professional software engineering workflows, including code generation, refactoring, test creation, documentation, and design exploration.
- Advanced degrees, such as M.Sc., M.Eng., Ph.D. are preferred