Principal Machine Learning Engineer - CoreAI

Microsoft
United States, Washington, Redmond
Jul 13, 2025
Overview

The Microsoft CoreAI Post-Training team is dedicated to advancing post-training methods for both OpenAI and open-source models. Its work encompasses continual pre-training, large-scale deep reinforcement learning running on extensive GPU resources, and significant efforts to curate and synthesize training data. In addition, the team employs various fine-tuning approaches to support both research and product development. The team also develops advanced AI technologies that integrate language and multi-modality for a range of Microsoft products. The team is particularly active in developing code-specific models, including those used in GitHub Copilot and Visual Studio Code, such as code completion models and software engineering (SWE) agent models. As by-products of this work, the team has also published research such as LoRA, DeBERTa, Oscar, Rho-1, Florence, and the open-source Phi models.

We are looking for a Principal Machine Learning Engineer - CoreAI with significant experience in large-scale model deployment, production systems, and engineering excellence, ideally from leading technology companies. You will build and optimize production systems for LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks. Key responsibilities include ensuring model reliability, inference performance, and scalability in production environments, and managing the full engineering pipeline, from model training through serving and monitoring to continuous deployment.

Our team values startup-style efficiency and practical problem-solving. We are seeking a curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact. Candidates must be self-driven, able to write production-grade code, debug complex distributed systems, and document engineering decisions, and must have a track record of shipping ML systems at scale.
The ability to quickly translate ideas into working code for rapid experimentation would be a plus.

You may include information about any individual who can serve as your referral in your application.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Responsibilities

Core Qualifications & Responsibilities
Design and implement large-scale model training - Especially for LLMs, SLMs, multimodal, or code-specific models.
System optimization and performance - Optimize latency, throughput, and resource utilization for training and inference workloads.
Hands-on coding - Ability to write efficient, production-quality code and debug complex distributed systems.
Cross-platform integration - Work with both proprietary and open-source frameworks to build robust ML pipelines.
Production model deployment - Design and implement scalable serving infrastructure for LLMs, SLMs, multimodal, and code-specific models.

Research & Innovation
Contribute to or build on existing innovations, such as the technical reports of well-known models.
Develop novel AI solutions that bridge language, vision, and code understanding.
Help develop models powering tools like GitHub Copilot, Cursor, and VS Code suggestions.

Other
Embody our Culture and Values