Overview
Are you a customer-obsessed, engineering-minded program leader who thrives in high-stakes, regulated environments? Do you want to build a new function from the ground up, one that prevents customer outages before they happen and transforms how Microsoft supports its most sensitive cloud customers?

Join Advanced Cloud Engineering & Supportability (ACES), a global Azure engineering support organization within Azure Engineering Operations (EngOps). ACES delivers engineering-led, world-class support across Azure's Government and Sovereign cloud portfolio, including US Government (Fairfax) and the National Partner Clouds in France (Bleu), Germany (Delos), and Singapore (Merlion).

We are building a new Gov Customer Resiliency function within ACES that brings proactive reliability engineering in-house for Government customers. This is not reactive support; it is about reducing the probability, blast radius, and recovery time of customer outages through engineering-led detection, readiness, and prevention.

The Role

We are hiring a Principal Customer Experience Program Manager to lead two interconnected workstreams under ACES Sovereign & Government:

1. Gov Customer Resiliency (60%)

You will stand up a new Gov Customer Resiliency function from scratch and then operate it, starting with a named high-profile Government customer and scaling to a portfolio of 3-5 top Gov/Azure Engineering Direct customers. This function brings proactive resiliency capabilities in-house for Government customers under the Sovereign & Government business. You will own the full resiliency lifecycle: proactive detection and monitoring, incident and crisis management coordination, post-incident RCA and problem management, architecture and DR guidance, and parity closure between Government and Commercial cloud environments. This is a build-and-run role.
In the first 90 days, you will shadow the current engagement lead on the named account, codify the operating model into a repeatable playbook, define success metrics, and take ownership of the customer relationship. From month 4 onward, you will run the function, scale it to additional customers, and extend the playbook to Sovereign clouds.

2. Sovereign Cloud Operations & Readiness (40%)

You will drive support readiness, operational maturity, and customer experience strategy across Microsoft's Sovereign Cloud portfolio (Bleu, Delos, Merlion). This includes readiness frameworks for new Sovereign cloud launches, escalation flow design, CRI playbooks, Sev handling standards, cross-cloud staffing models, and compliance-aligned operational processes. You will partner closely with Sovereign delivery leadership, Azure engineering, and regional National Cloud Operating Entity (NCOE) partners to ensure Sovereign clouds are support-ready, compliant, and capable of delivering exceptional customer outcomes from Day 1.

Why This Role Matters

This role sits at the intersection of two of ACES' most strategic investments:
- Gov Customer Resiliency brings proactive reliability engineering in-house for Government customers, moving the organization from reactive support to engineered prevention. Instead of depending on another team for Gov customer resiliency, ACES owns the end-to-end customer experience.
- Sovereign Cloud Readiness ensures Microsoft's most compliance-sensitive cloud environments are support-ready from Day 1, protecting customer trust in markets where trust is the product.
The person in this role will build a new function, run it in a customer-facing capacity, and scale it across the most critical cloud environments Microsoft operates. This is a rare opportunity to define how Microsoft supports its highest-trust customers and to shape a practice that will become a model for the broader organization.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of psychological safety where everyone can thrive at work.
Responsibilities
Gov Customer Resiliency (60%):
- Stand up a new proactive resiliency function for Government cloud customers: define its charter, build playbooks, establish operating cadences, and own the end-to-end engagement model.
- Own the full resiliency lifecycle: proactive detection and monitoring, incident and crisis coordination, post-incident root cause analysis, and architecture/DR guidance
- Drive Gov-vs-Commercial parity closure across monitoring, tooling, incident response, and remediation maturity
- Lead resiliency and reliability workshops and customer conversations, including with Field enablement teams, to deliver customer value.
- Scale the resiliency model from a single anchor customer to a portfolio of 3-5 top Government customers using a repeatable, metrics-driven playbook
- Develop and deliver internal enablement content, such as training materials, case studies, and learning sessions, to embed resiliency practices across Gov Support delivery teams and scale knowledge beyond the immediate function.
- Define and report on success metrics including mean time to detect, time to engage, incident recurrence, proactive detection rates, and customer confidence
- Leverage telemetry, monitoring data, and trend analysis to proactively identify and address emerging risks before they become customer-reported incidents
- Partner with reliability engineering, product teams, and delivery leadership to ensure resiliency insights feed into upstream engineering actions, product improvements, and prevention strategies.
Sovereign Cloud Operations & Readiness (40%):
- Drive end-to-end support readiness (people, process, technology) for Microsoft's Sovereign Cloud portfolio across multiple regions and future launches
- Design escalation pathways, incident handling standards, and compliance-aligned operational processes for Sovereign environments
- Own readiness frameworks for new Sovereign cloud launches, and influence design decisions upstream to prevent customer impact.
- Lead operational reporting and insights; translate data into risk assessments and executive-ready recommendations
- Represent Sovereign and Government customer needs in cross-org forums, influencing priorities and investments to strengthen long-term customer trust
Cross-Cutting:
- Leverage AI, automation, and data-driven insights to proactively identify gaps, reduce risk, and improve customer experience at scale
- Extend the Gov Resiliency playbook to Sovereign clouds as they mature, and build a unified approach across regulated environments.
- Drive alignment across geographically distributed teams and operating partners spanning multiple countries and time zones
Other:
- Embody our culture and values.
Qualifications
Required / Minimum Qualifications:
- Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 6+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
Other Requirements:
- US Citizenship & Citizenship Verification: This position requires verification of citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, and as a condition of employment, the successful candidate's US citizenship will be verified with a valid passport.
- Microsoft Cloud Background Check: Required upon hire/transfer and every two years thereafter.
Preferred / Additional Qualifications:
- Master's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 8+ years' experience in engineering, product/technical program management, data analysis, or product development OR Bachelor's Degree in Computer Science, Engineering, Data Science, Math, Business, or related field AND 12+ years' experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
- Experience in CRE, SRE, ACE, or operational reliability roles within a cloud hyperscaler environment.
- Hands-on experience with resiliency tooling, platform monitoring and similar detection and incident management systems.
- Deep knowledge of Sovereign compliance, data residency, and geo-centric architecture models (e.g., EU Data Boundary, government cloud isolation requirements).
- Track record of executive-level customer engagement, including the ability to lead confidence calls, MBRs, and executive-level progress reviews with enterprise customers.
- Demonstrated experience in customer-facing resiliency, reliability engineering, or incident management roles, including proactive detection, crisis coordination, or post-incident program management. Customer- and Field-facing experience driving deep technical and architecture conversations, including resiliency workshops.
- Experience working with government agencies, sovereign entities, or regulated industries, with strong understanding of their missions, operating models, compliance requirements, and IT environments.
- Strong understanding of Azure services and cloud technologies, including monitoring, diagnostics, incident response tooling, and infrastructure architecture.
- Proven ability to build new functions or programs from scratch: defining the charter, playbooks, metrics, and operating cadences, and scaling them across customers.
- Exceptional cross-org stakeholder management skills, including the ability to drive alignment across engineering, support delivery, product teams, and customer-facing partners without direct authority.
- Experience working effectively across multiple geographies, cultures, and organizational boundaries.
Location & Working Model
- Based in the United States. Atlanta, GA strongly preferred (proximity to Gov customers and CRE team members).
- East Coast candidates preferred for European time zone overlap (Sovereign clouds in France and Germany).
- West Coast candidates must support early-morning collaboration with Europe.
- Travel:
- In-office at least three days per week (subject to local policy).
#EngOps, #EngOpsACES

Customer Experience Program Mgmt IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.