
Director of Cloud Operations & Reliability Engineering
- Mississauga, ON
- Permanent
- Full-time
- Define and execute a forward-looking cloud operations and reliability engineering strategy aligned with enterprise goals and digital transformation initiatives.
- Experience with P&L responsibility, technology budget management, and ROI optimization for enterprise platforms.
- Manage operational budgets, ensuring alignment with financial goals and constraints.
- Develop long-term operational strategies and translate them into actionable roadmaps that align business goals with execution plans and drive operational efficiency and growth.
- Oversee daily operations, including monitoring, incident response, change management, and capacity planning.
- Establish and track SLAs, KPIs, and OKRs to ensure service availability and reliability.
- Manage relationships with managed service providers and cloud service providers.
- Direct major incident response efforts.
- Lead root cause analysis of critical incidents to identify underlying causes, implement corrective actions, and prevent recurrence. Ensure timely documentation, stakeholder communication, and follow-through on long-term remediation strategies.
- Enforce cloud governance policies and ensure compliance with regulatory standards such as HIPAA and SOC 2.
- Champion security best practices across all cloud environments.
- Promote the adoption of Infrastructure-as-Code (IaC), CI/CD pipelines, and automation frameworks to streamline operations and reduce manual overhead.
- Inspire, mentor, and manage a global team of cloud operations engineers, SREs, and DevOps professionals.
- Cultivate a culture of collaboration, accountability, and continuous improvement.
- Monitor cloud expenditures and implement cost-effective strategies without compromising performance or security.
- Stay abreast of emerging cloud technologies and trends, and proactively recommend enhancements to improve service delivery and operational agility.
- Bachelor’s or Master’s degree in Computer Science or Information Systems, or equivalent experience.
- Typically 15+ years in IT operations, including 6+ years in cloud operations leadership roles.
- Demonstrated experience leading large-scale cloud migrations and operational transitions.
- Deep knowledge of public cloud platforms (e.g., AWS, Azure, GCP), cloud-native services, and hybrid architectures.
- Strong understanding of Site Reliability Engineering (SRE) principles, including service level objectives (SLOs), service level indicators (SLIs), and error budgets.
- Strong understanding of cloud operations, security, compliance frameworks, and risk management practices.
- Proficiency in ServiceNow reporting and dashboard tools for operational insights and performance monitoring.
- Demonstrated expertise in Kubernetes and containerization technologies, with a strong track record of designing, deploying, and managing scalable containerized applications.
- Experience in business process analysis and systems design in collaboration with cloud engineering and cybersecurity teams.
- Proven ability to drive cross-functional collaboration and deliver exceptional customer outcomes.
- Excellent communication skills with the ability to engage effectively with both technical teams and senior executives.