Responsibilities:
- Design and implement core components of the ML runtime framework for inference on embedded systems.
- Collaborate with compiler, hardware, and model teams to co-design efficient execution paths for AI workloads.
- Develop and maintain C/C++ code for runtime kernels and system-level integration.
- Develop tools to assist with performance profiling and debugging of quantized model accuracy.
- Analyze and improve runtime behavior using profiling tools and hardware counters.

Preferred Qualifications:
- Strong hands-on experience in performance optimization for embedded or low-power systems.
- Proficiency in C/C++ programming, with a focus on system-level and runtime development.
- Solid understanding of embedded system design, including memory hierarchy and hardware-software interaction.
- Experience with Linux/Android development environments and toolchains.
- Familiarity with computer architecture, especially for AI accelerators or DSPs.
- Basic knowledge of machine learning concepts and model structures.
- Master's degree in Computer Science, Engineering, or related field.
- 2+ years of experience with ML frameworks (e.g., TensorFlow, PyTorch, ONNX).
- 2+ years of experience in embedded system development and optimization for ML inference.
- 2+ years of experience with C/C++ in performance-critical environments.
- Experience with low-level OS interactions (Linux, Android, QNX).
- Familiarity with quantization, graph optimization, and model deployment pipelines.
- Experience working in cross-functional teams and large matrixed organizations.

Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 1+ year of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
- PhD in Computer Science, Engineering, Information Systems, or related field.