Build and operate core infrastructure services that power Fabric Data Engineering on Spark.
Improve scalability, resiliency, and observability across Spark-based services.

* Design and develop world-class experiences for a new big data cloud offering, with an emphasis on scale, reliability, and performance.
* Build and evolve core infrastructure services that power data engineering and analytics workloads (compute, runtime services, job/session management, configuration, and platform integrations).
* Drive technical design and implementation end-to-end: translate requirements and documentation into robust production code.
* Troubleshoot and improve systems using source code analysis and production instrumentation (logs, metrics, traces), and turn operational learnings into engineering improvements.
* Improve platform scalability, resiliency, and observability, including automation to reduce operational toil.
* Partner closely with product and engineering teams to deliver end-to-end features and continuously raise the quality bar.

* Software engineering fundamentals and experience shipping production services.
* Comfort working with distributed systems and performance-critical code.
* Experience with Spark and/or big data systems is a big plus (but not required if you're eager to learn).
* Collaborative mindset and ownership mentality: someone who likes building, iterating, and improving systems over time.

* Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

* Experience designing and operating large-scale infrastructure for data platforms or compute services (e.g., job orchestration, runtime services, cluster/resource management, multi-tenant systems).
* Experience with observability and operational excellence (SLOs/SLIs, alerting, incident response, postmortems).
* Performance and reliability engineering experience (profiling, optimization, capacity planning, cost/performance tradeoffs).
* Familiarity with modern cloud-native patterns (service ownership, CI/CD, safe deployments, automation).

* Software engineering fundamentals (data structures, algorithms, testing, debugging, performance).
* Experience building and shipping production infrastructure (backend services, distributed systems, or platform components) in a cloud environment.
* Solid understanding of distributed systems concepts: fault tolerance, scaling, scheduling, and resource management.
* Proficiency in one or more backend/system languages (e.g., Java, Scala, C#, C++, or Python).
* Quick learner with a growth mindset, able to ramp up rapidly in new domains, tools, and codebases.
* Ability to thrive in an AI-powered engineering environment: comfortable adopting AI-assisted workflows (e.g., copilots/agents), iterating quickly, and continuously improving productivity and quality.