Principal Engineer - Intelligent Ops and Observability (HYBRID)
McCormick
- Mississauga, ON
- Permanent
- Full-time
- Bachelor's degree in Information Technology, Computer Science, Engineering, Information Systems, or a related technical discipline required.
- Advanced degree in a related field preferred.
- Minimum 10-12 years of experience
- Relevant certifications in IT service management, cloud, reliability engineering, or operational disciplines (such as ITIL, SRE, or major platform certifications) are preferred.
- Deep experience administering and optimizing tools related to application performance monitoring, systems monitoring, event aggregation, alerting, dashboards, and reporting
- Significant experience leading coordinated response during incidents, service degradations, and major operational events
- Experience working across internal support teams and third-party service providers to drive accountability, process adherence, and timely service restoration
- Strong working knowledge of incident management, event management, service assurance, and service management practices
- Experience identifying operational trends and using data to drive improvements in resilience, service quality, and response effectiveness
- Experience operating within ITIL, SIAM, or similarly structured service management environments preferred
- Strong operational judgment and ability to remain effective and decisive in high-pressure situations
- Strong analytical and problem-solving skills, with the ability to interpret events, assess business impact, identify patterns, and recommend practical improvements
- Demonstrated ability to lead through influence and drive action across technical teams and service providers without direct authority
- Strong verbal and written communication skills, with the ability to provide clear direction during incidents and translate technical situations into business-relevant terms
- Strong technical aptitude across observability, monitoring, event correlation, dashboarding, and operational reporting
- Ability to improve signal quality by reducing noise, refining thresholds, and increasing the value of alerts and event data
- Strong organizational skills, attention to detail, and ability to manage multiple priorities in a fast-moving operational environment
- Ability to work collaboratively across infrastructure, cloud, application, service management, and supplier teams
- Ability to identify broader operational risks and recommend improvements that strengthen enterprise-wide service resilience
- Comprehensive health plans covering medical, vision, dental, life and disability benefits
- Family-friendly benefits such as maternity leave top-up and an Employee Assistance Program
- Retirement and investment programs including a matching DC pension plan, group RRSP, and Employee Stock Purchase Plan