Senior Platform Reliability Engineer - Global AI Platform

Toronto, ON
Permanent
Full-time


As a Senior Platform Reliability Engineer, you will be responsible for monitoring, analyzing, and optimizing software architecture and maintaining the software environment to best support testing and deployment in a continuous integration/continuous delivery environment.

Position Responsibilities:

Provides a reliable and scalable platform experience to Global AI Platform users
Monitors, analyzes, optimizes, and maintains the software environment to best support testing and deployment in a continuous integration/continuous delivery environment
Develop self-service capabilities, AIOps/MLOps/GitOps/CI/CD pipelines, and operational automations (provisioning, upgrades, backups).
Manage clusters, networks, storage, and policies via Terraform/Ansible; prevent configuration drift.
Enforce identity/RBAC, secrets management, supply chain security, and regulatory controls; collaborate with risk and audit.
Optimize resource usage, plan capacity, and control spending (rightsizing, autoscaling, reservations/spot).
Implement safe rollouts, progressive delivery, and policy-as-code guardrails.
Resolves persistent platform issues when surfaced by technical support teams
Provides performance enhancements through automation and pushes for enhanced reliability of the platform to support product development
Delivers resilient and scalable applications, with a focus on continuous delivery and operational insight
Collaborates with platform and software engineers, platform reliability engineers, Product Owners, and engineering leadership to uncover pain points and opportunities to accelerate the delivery of new value through software
Investigates new platform solutions to enhance service delivery experience
Delivers a good user experience to other engineers, with a focus on self-service and continuous delivery
Addresses incidents and problems, with rotational accountability for on-call support

Required Qualifications:

Familiarity with agile and DevOps principles, test-driven development, continuous integration, and other approaches to accelerate the delivery of new features
Understanding of software development lifecycle
Understanding of how technology supports Manulife business strategy
Deep understanding of DevOps principles, prioritizes platform over products
Attends advanced training sessions and is certified on multiple domains of expertise
Demonstrates all core skills, and good interpersonal skills for the role
Good working and background knowledge of area of practice
Use and combine knowledge of the discipline and the market to formulate the right approach
Participates in functional demos utilizing new tech; designs own control structures
Sees actions partly in terms of longer-term goals
Understands the corporate climate & culture
Strong knowledge of the business
Experience with virtual infrastructure and CI/CD tools such as Jenkins, GitHub, and TeamCity
Experience in languages such as Python, Java, JavaScript, .NET, HTML5, CSS3, Swift and/or similar technologies
Understanding of systems monitoring tools and analytics (New Relic, Moogsoft, xMatters, etc.)
Experience with Cloud Foundry and other components supporting a highly-automated global engineering platform
Collaborative attitude, willingness to work with team members; able to coach, participate in code reviews, share skills and methods
Constantly learns from both success and failure
Experience with open-source technologies preferable
Good organizational and problem-solving abilities that enable you to manage through creative abrasion
Good verbal and written communication; able to effectively articulate technical vision, possibilities, and outcomes
Experiments with emerging technologies and understands how they will impact what comes next

Required Qualifications:

Bachelor’s in Computer Science/Engineering or equivalent experience (not strictly required if skills demonstrated).
5–8+ years in DevOps/Platform Engineering or Production Operations (8+ preferred for senior level).
Proficiency in Python and/or Java/Scala/TypeScript for backend services and automation.
Hands-on experience with Azure, Kubernetes, containers, CI/CD, and observability stacks.
Strong understanding of LLM systems, retrieval architectures, embeddings, vector stores, prompt/tool orchestration, and agent workflow fundamentals.
Expertise in API design, asynchronous workflows, concurrency, reliability engineering concepts (SLOs, error budgets), and performance tuning.
Familiarity with security, governance, and compliance for AI/data systems (authN/authZ, data protection, audit logging, model governance).
Proven track record operating large-scale distributed systems and participating in on-call rotations.
Ability to collaborate across global teams and translate business needs into platform capabilities and operational SLAs.

When you join our team:

We’ll empower you to learn and grow the career you want.
We’ll recognize and support you in a flexible environment where well-being and inclusion are more than just words.
As part of our global team, we’ll support you in shaping the future you want to see.

The role being advertised is an existing vacancy.

About Manulife and John Hancock

Manulife Financial Corporation is a leading international financial services provider, helping people make their decisions easier and lives better. To learn more about us, visit .

Manulife is an Equal Opportunity Employer

At Manulife/John Hancock, we embrace our diversity. We strive to attract, develop and retain a workforce that is as diverse as the customers we serve and to foster an inclusive work environment that embraces the strength of cultures and individuals. We are committed to fair recruitment, retention, advancement and compensation, and we administer all of our practices and programs without discrimination on the basis of race, ancestry, place of origin, colour, ethnic origin, citizenship, religion or religious beliefs, creed, sex (including pregnancy and pregnancy-related conditions), sexual orientation, genetic characteristics, veteran status, gender identity, gender expression, age, marital status, family status, disability, or any other ground protected by applicable law.

It is our priority to remove barriers to provide equal access to employment. A Human Resources representative will work with applicants who request a reasonable accommodation during the application process. All information shared during the accommodation request process will be stored and used in a manner that is consistent with applicable laws and Manulife/John Hancock policies. To request a reasonable accommodation in the application process, contact .

Referenced Salary Location: Toronto, Ontario
Working Arrangement: Hybrid

Salary range is expected to be between $113,000.00 CAD - $163,000.00 CAD. Employees also have the opportunity to participate in incentive programs and earn incentive compensation tied to business and individual performance. The actual salary will vary depending on local market conditions, geography and relevant job-related factors such as knowledge, skills, qualifications, experience, and education/training. If you are applying for this role outside of the primary location, please contact for the salary range for your location.

Manulife offers eligible employees a wide array of customizable benefits, including health, dental, mental health, vision, short- and long-term disability, life and AD&D insurance coverage, adoption/surrogacy and wellness benefits, and employee/family assistance plans. We also offer eligible employees various retirement savings plans (including pension and a global share ownership plan with employer matching contributions) and financial education and counseling resources. Our generous paid time off program in Canada includes holidays, vacation, personal, and sick days, and we offer the full range of statutory leaves of absence. If you are applying for this role in the U.S., please contact for more information about U.S.-specific paid time off provisions.

We use data and analytics technologies, such as artificial intelligence (AI), and automated processing tools, to analyze and process the information you provide to us or third parties in the application process. For more information, please refer to our .
