Lead Platform Reliability Engineer, Global AI Platform & Solutions

Toronto, ON
Permanent
Full-time

The Lead Platform Reliability Engineer (PRE) ensures the stability, performance, and scalability of the shared platform that supports internal AI solution development. The role combines software engineering, SRE practices, and operations to keep the platform reliable and developer-friendly.

Position Responsibilities:

Reliability and performance: Define SLOs/SLIs, track operations budgets, reduce MTTR, plan capacity, and tune autoscaling.
Observability: Build and maintain logging, metrics, tracing, and alerting; instrument platform components; create runbooks and dashboards.
Incident response: On-call for platform incidents; triage, mitigate, root-cause, and drive postmortems and corrective actions.
Automation and tooling: Develop self-service capabilities, AIOps/MLOps/GitOps/CI/CD pipelines, and operational automations (provisioning, upgrades, backups).
Infrastructure as code: Manage clusters, networks, storage, and policies via Terraform/Ansible; prevent configuration drift.
Security and compliance: Enforce identity/RBAC, secrets management, supply chain security, and regulatory controls; collaborate with risk and audit.
Scalability and cost: Optimize resource usage, plan capacity, and control spend (rightsizing, autoscaling, reservations/spot).
Change management: Safe rollouts, progressive delivery, and policy-as-code guardrails.
Platform productization: Treat the platform as a product; define operations SLAs aligned with the product roadmap, service catalog, and developer experience.
Collaborate with global engineering, security, and AI governance teams to ensure compliance with cross-geo regulations and Asia’s data residency requirements.
Operate scalable backend services supporting high-traffic agent interactions, retrieval operations, and real-time execution flows.
Maintain AI services runbooks, playbooks, and enablement for GOCC.

Required Qualifications:

Bachelor’s in Computer Science/Engineering or equivalent experience (not strictly required if skills demonstrated).
5-8 years of experience in DevOps/Platform Engineering or Production Operations.
Proven track record operating large-scale distributed systems and running on-call.
Operational experience with cloud-native development: Azure, Kubernetes, containers, CI/CD, and observability stacks.
Knowledge of Python and/or Java/Scala/TypeScript for building backend services and automation.
Understanding of AI solutions, LLM systems, retrieval architectures, embeddings, vector stores, prompt/tool orchestration, and agent workflow fundamentals.
Knowledge of API design, asynchronous workflows, concurrency, reliability engineering (SLOs, error budgets), and performance tuning.
Familiarity with security, governance, and compliance for AI/data systems (authN/authZ, data protection, audit logging, model governance).
Ability to collaborate across global teams and translate business requirements into platform capabilities and operational SLAs.

Preferred Qualifications:

ITIL & ITSM certification
Azure Administrator/DevOps certificate (nice to have)
Kubernetes: CKA/CKS certificate (nice to have)
HashiCorp Terraform Associate certificate (nice to have)

When you join our team:

We’ll empower you to learn and grow the career you want.
We’ll recognize and support you in a flexible environment where well-being and inclusion are more than just words.
As part of our global team, we’ll support you in shaping the future you want to see.

The role being advertised is an existing vacancy.

About Manulife and John Hancock

Manulife Financial Corporation is a leading international financial services provider, helping people make their decisions easier and lives better. To learn more about us, visit .

Manulife is an Equal Opportunity Employer

At Manulife/John Hancock, we embrace our diversity. We strive to attract, develop and retain a workforce that is as diverse as the customers we serve and to foster an inclusive work environment that embraces the strength of cultures and individuals. We are committed to fair recruitment, retention, advancement and compensation, and we administer all of our practices and programs without discrimination on the basis of race, ancestry, place of origin, colour, ethnic origin, citizenship, religion or religious beliefs, creed, sex (including pregnancy and pregnancy-related conditions), sexual orientation, genetic characteristics, veteran status, gender identity, gender expression, age, marital status, family status, disability, or any other ground protected by applicable law.

It is our priority to remove barriers to provide equal access to employment. A Human Resources representative will work with applicants who request a reasonable accommodation during the application process. All information shared during the accommodation request process will be stored and used in a manner that is consistent with applicable laws and Manulife/John Hancock policies. To request a reasonable accommodation in the application process, contact .

Referenced Salary Location: Toronto, Ontario
Working Arrangement: Hybrid

Salary range is expected to be between $113,260.00 CAD - $210,340.00 CAD.

Employees also have the opportunity to participate in incentive programs and earn incentive compensation tied to business and individual performance. The actual salary will vary depending on local market conditions, geography and relevant job-related factors such as knowledge, skills, qualifications, experience, and education/training. If you are applying for this role outside of the primary location, please contact for the salary range for your location.

Manulife offers eligible employees a wide array of customizable benefits, including health, dental, mental health, vision, short- and long-term disability, life and AD&D insurance coverage, adoption/surrogacy and wellness benefits, and employee/family assistance plans. We also offer eligible employees various retirement savings plans (including pension and a global share ownership plan with employer matching contributions) and financial education and counseling resources. Our generous paid time off program in Canada includes holidays, vacation, personal, and sick days, and we offer the full range of statutory leaves of absence. If you are applying for this role in the U.S., please contact for more information about U.S.-specific paid time off provisions.

We use data and analytics technologies, such as artificial intelligence (AI), and automated processing tools, to analyze and process the information you provide to us or third parties in the application process. For more information, please refer to our .

Manulife
