
GPU Accelerator Returns Debug Engineer
- Markham, ON
- Permanent
- Full-time
- Support internal and external requests to troubleshoot PCBA-level AMD GPU product failures, driving continuous yield and quality improvements and providing customer quality support within expected timelines.
- Execute design of experiments (DOEs) that run targeted tests to reproduce and isolate hard-to-find failures.
- Develop automation and tools to run tests and analyze results and logs.
- Perform thorough incoming visual inspection and document condition of all units submitted for analysis.
- Perform initial triage and communicate with the contract manufacturer and/or internal AMD teams (such as Design, BIOS, firmware, memory, I/O, display, diagnostics, Test Engineering, Board operations, etc.) as needed to converge on failure reproduction efforts and root cause identification.
- Document all findings in the failure analysis (FA) database and, as needed, create a complete failure analysis report for customer consumption.
- Drive continuous improvement of failure analysis processes and techniques, and write step-by-step procedures for the team to follow.
- Oversee the setup of new products and test stations for failure analysis operations.
- Expertise in GPU architecture, including debug, validation, and stress/functional test development.
- Skilled in using lab equipment (oscilloscopes, logic analyzers) and custom test tools.
- Strong understanding of PCBA diagnostics, failure analysis, and debug techniques.
- Experience with BIOS/firmware configuration and knowledge of firmware-driver-hardware interactions.
- Proficient in Python, shell scripting, and working across Windows and Linux environments.
- Familiarity with PCBA manufacturing processes and IPC-A-610 quality standards.
- Hands-on experience assembling, installing, and maintaining computer systems and servers.
- Able to read schematics, interpret datasheets, identify components, and perform soldering/rework.
- Knowledge of high-speed digital design, memory interfaces (HBM, GDDR), PCIe, and display outputs (DP, HDMI).
- Strong documentation skills and proficiency in MS Excel.
- Experience with GPU data center infrastructure and AI/ML technologies is a plus.