
Site Reliability Engineer - Tim Hortons
Restaurant Brands International
- Toronto, ON
- Permanent
- Full-time
- Design, implement, and manage infrastructure automation solutions using Infrastructure as Code (Terraform) to ensure consistency and reliability across environments.
- Support and enhance CI/CD pipelines (GitHub Actions, CircleCI) to facilitate efficient, reliable software deployments.
- Monitor application and infrastructure performance using industry-standard observability tools such as Datadog, proactively addressing issues and optimizing system performance.
- Collaborate closely with development and operations teams to enhance deployment efficiency and reliability.
- Implement and maintain system security best practices, ensuring compliance with security standards and protocols.
- Participate actively in incident response, contributing to root cause analysis and developing preventative measures.
- Support operational excellence initiatives to improve overall system health, reduce downtime, and enhance user experiences.
- Contribute to the creation and maintenance of technical documentation, operational runbooks, and infrastructure standards.
- Actively participate in architectural and production readiness reviews to ensure optimal operational effectiveness.
- Assist leadership in cost engineering and proactive capacity planning.
- Contribute to debugging production issues across the full stack, promoting automation for common operational challenges.
- Participate in scheduled on-call rotations, alongside engineering team members, to uphold system reliability.
- Minimum of 5 years' experience in DevOps or SRE roles.
- Proficiency with cloud infrastructure, especially AWS services (Lambda, API Gateway, DynamoDB, EC2, ECS).
- Hands-on experience with Infrastructure as Code (Terraform).
- Experience managing and optimizing CI/CD pipelines (GitHub Actions, CircleCI).
- Knowledge of monitoring, logging, and observability tools (Datadog preferred).
- Familiarity with containerization technologies (Docker, Kubernetes).
- Strong programming skills (TypeScript, Node.js, Python), with a proven ability to debug complex issues.
- Basic knowledge of network security, VPN, firewall, and related technologies.
- Deep understanding of systems thinking, particularly edge cases, failure modes, and resilience strategies.
- Experience with, or willingness to work with, AI-driven development tools.
- Strong communication skills, capable of explaining complex concepts clearly to diverse technical audiences.
- Detail-oriented with strong analytical and problem-solving abilities.
- Collaborative mindset, effective in cross-functional team environments.
- Self-motivated, with a proactive approach to learning and staying current with emerging DevOps trends and technologies.
- Strong sense of urgency and curiosity, with a proactive approach to problem-solving and improvement.
- Experience with mobile application development frameworks (React Native).
- Previous experience in quick-service restaurant (QSR), retail, or high-traffic consumer-facing systems.
- AWS Certified DevOps Engineer or Solutions Architect certifications.
- Experience implementing security best practices within DevOps frameworks.
- Exposure to agile software development methodologies (Scrum, Kanban).