
GPU Accelerator Returns Debug Engineer
- Markham, ON
- Permanent
- Full-time
- Support internal and external requests to troubleshoot PCBA-level failures in AMD GPU products, driving continuous yield and quality improvements and providing customer quality support within expected timelines.
- Develop and execute DOEs that run targeted tests to reproduce and isolate hard-to-find failures.
- Develop automation and tools to run tests and analyze results and logs.
- Perform triage and communicate with the contract manufacturer and/or internal AMD teams (such as Design, BIOS, firmware, memory, I/O, display, diagnostics, Test Engineering, Board operations, etc.) as needed to converge on failure reproduction efforts and root cause identification.
- Document all findings in the FA database and create a complete failure analysis report for customer consumption as needed.
- Present findings to key stakeholders, including senior management.
- Continuously improve the failure analysis process and techniques, and document the resulting procedures step by step.
- Oversee the set-up of new products and test stations for Failure Analysis operations.
- Deep expertise in GPU architecture, including debug, validation, and stress/functional test development.
- Skilled in using lab equipment (oscilloscopes, logic analyzers, custom test tools) for hardware validation.
- Strong background in PCBA diagnostics, failure analysis, and debug techniques, from NPI through production.
- Proficient in Python, shell scripting, and working across Windows and Linux environments.
- Solid understanding of firmware, drivers, and hardware interactions, with the ability to tune firmware as needed.
- Extensive experience in hardware verification and system integration.
- Familiarity with PCBA manufacturing processes and IPC-A-610 quality standards.
- Hands-on experience assembling, installing, and configuring computer systems and servers.
- Strong leadership, communication, documentation, and presentation skills.
- Able to read schematics, interpret datasheets, identify components, and perform soldering/rework for debug.
- Proficient in MS Excel for data analysis and reporting.
- Knowledge of high-speed digital design, memory interfaces (HBM, GDDR), PCIe, and display outputs (DP, HDMI).
- Experience with GPU data center infrastructure and AI/ML technologies is a plus.