Software Development Engineer - SGLang and Inference Stack
Advanced Micro Devices
- Vancouver, BC
- Permanent
- Full-time
- Optimize Deep Learning Frameworks: Enhance performance of frameworks like TensorFlow, PyTorch, and SGLang on AMD GPUs via upstream contributions in open-source repositories.
- Develop and Optimize Deep Learning Models: Profile, analyze, modify, and tune large-scale training and inference models for optimal performance on AMD hardware, including day-0 support for many SOTA models such as DeepSeek 3.2 and Kimi K2.5.
- GPU Kernel Development: Design, implement, and optimize high-performance GPU kernels using HIP, Triton, TileLang or other DSLs for AI operator efficiency.
- Collaborate with GPU Library and Compiler Teams: Work closely with internal compiler and GPU math library teams to integrate and align kernel-level optimizations with full-stack performance goals, and initiate and support codegen optimizations at multiple levels of the stack.
- Contribute to SGLang Development: Support optimization, feature development, and scaling of the SGLang framework across AMD GPU platforms for LLM serving, multimodal serving, and RL training.
- Distributed System Optimization: Tune and scale performance across both multi-GPU (scale-up) and multi-node (scale-out) environments, including inference parallelism, prefill-decode disaggregation, Wide-EP and collective communication strategies.
- Graph Compiler Integration: Integrate and optimize runtime execution through graph compilers such as XLA, TorchDynamo, or custom pipelines.
- Open-Source Collaboration: Partner with external maintainers to understand framework needs, propose optimizations, and upstream contributions effectively.
- Apply Engineering Best Practices: Leverage modern software engineering practices in debugging, profiling, test-driven development, and CI/CD integration.
- Strong Programming Skills: Proficient in C++ and/or Python (PyTorch, Triton, TileLang), with demonstrated ability to code, debug, profile, and optimize performance-critical code.
- SGLang and LLM Optimization: Hands-on experience with SGLang or similar LLM inference frameworks is highly preferred.
- Compiler and GPU Architecture Knowledge: Background in compiler design or familiarity with technologies like LLVM, MLIR, or ROCm is a plus.
- Heterogeneous System Workloads: Experience running and scaling workloads on large-scale, heterogeneous clusters (CPU + GPU) using distributed training or inference strategies.
- AI Framework Integration: Experience contributing to or integrating optimizations into deep learning frameworks such as PyTorch, SGLang, vLLM, Slime, or VeRL.
- GPGPU Computing: Working knowledge of HIP, CUDA, Triton, TileLang or other GPU programming models; experience with GCN/CDNA architecture preferred.
- Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or a related field.