Senior Site Reliability Engineering
Royal Bank of Canada
- Toronto, ON
- Permanent
- Full-time
- Technical Leadership: Lead code and non-functional (performance, security, maintainability, compliance, change management) reviews of all production-bound SRE solutions
- Drive transformation by continuously looking for ways to automate existing processes
- Run engineering-mindset meetups that accelerate the breadth and depth of knowledge in the community
- Manage SRE application assets (virtual machines, cloud instances, mainframe, source code repositories, etc.)
- Publish technical designs for SRE solutions
- Publish and/or review implementation plans for SRE solutions bound to production
- Explore new capabilities and technologies to drive innovation (including coding and publishing how-to documentation)
- Track, audit, monitor, and implement technical work streams
- Act as portfolio SME (Subject Matter Expert): understand and document the common components, core functionalities, and infrastructure of supported applications
- Production Support: Serve as an escalation point in the on-call rotation, and support maintenance, scheduled work, and release deployment requirements
- Lead incident management and problem management for applications in scope, including ownership and fulfillment of RCA action items
- Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring, tooling, and best practices
- Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye on automation opportunities
- Ensure availability and uptime of applications in scope, as per service level objectives
- Manage PagerDuty rules/tuning/tagging, Moogsoft Situation management, and Dynatrace tuning (RUM, Problem Card reduction)
- Provide expertise, direction, coaching, and development to build the SRE team's capability
- Provide assistance with selecting and building a high-performing, diverse team that leverages individual capabilities and strengths
- Advanced knowledge of industry practices, with focus on SRE
- Advanced experience in a variety of environments (Cloud, distributed and mainframe, business workflows and services/APIs, databases)
- Excellent communication skills and a direct style (e.g. "I did or did not do something" or "it does or does not work", as opposed to "I believe" or "I understand it to be")
- Effective negotiation skills and stakeholder management
- Ability to influence at the Director level (unit and other partner units)
- Mainframe knowledge and work experience
- Hands-on experience in a variety of SRE languages and tools (Ansible, Dynatrace Managed, Moogsoft, PagerDuty, ServiceNow, GitHub, Slack, Elastic, Logstash, Kibana, Grafana, Catchpoint, Red Hat OCP)
- Degree/diploma in Computer Engineering, Computer Science, or a related technical field, or related breadth of experience
- Exposure to Azure, Docker, and OCP
- Exposure to UCD, GitHub
- Experience in agile ways of working
- Middleware experience
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact in technology transformation
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services and technology
- Flexible work/life balance options
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Opportunities to build close relationships with clients