Senior Site Reliability Engineer - GFT
Royal Bank of Canada
- Vancouver, BC
- Permanent
- Full-time
- Perform an application production support role, including off-hours support. Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing, and reliability testing)
- Learn about multiple applications and technologies and how they interact, in order to provide cross-functional support. An understanding of how legacy and emerging technologies integrate is required.
- Assist in incident management and problem management for applications in scope. Maintain technology currency (manage server patching, certificate renewal, etc.) with a keen eye for automation opportunities
- Ensure the availability and uptime of applications in scope, in line with service level objectives (SLOs). Ensure that all systems and applications in scope remain compliant, including by maintaining segregation of duties
- Implement monitoring and alerting, anomaly detection, self-healing and reliability testing for applications in scope
- Support the unit’s goal of adopting automation solutions for applications in scope. Apply design thinking and an agile mindset when working with SREs, Scrum Masters, and partner team leads
- Proactively identify and act on emerging issues, working with development teams to resolve them in both the short and long term
- Identify opportunities to reduce manual effort and human error. Make recommendations on process improvements and enhance system and support documentation
- 3+ years of hands-on experience in a variety of SRE languages and tools (Ansible, Dynatrace Managed, Moog, PagerDuty, ServiceNow, GitHub)
- Ability to automate repeatable manual support tasks and introduce 360-degree monitoring solutions
- Intermediate knowledge of industry practices, with a focus on SRE
- Intermediate experience in a variety of environments (Cloud, Linux/Unix/Windows, services/APIs, databases)
- Excellent communication and stakeholder management skills
- Experience with job schedulers such as Zeke, Stonebranch, AutoSys, or Airflow
- Experience troubleshooting with logs, Grafana, Moog, and Dynatrace
- Understanding of SRE principles
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and a lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Flexible work/life balance options
- Opportunities to do challenging work