Site Reliability Engineer
Qlik
- Ottawa, ON
- $100,000-$133,000 per year
- Permanent
- Full-time
- Solve real scale challenges - Work on reliability and performance across a global cloud platform handling millions of transactions.
- Engineer, not just operate - Build tooling, automation, alerts, and scalable infrastructure patterns that prevent problems before they happen.
- Collaborate with highly skilled teams - Partner with Global SRE, Architecture, Platform, and Domain Engineering teams to influence how infrastructure is designed from the ground up.
- Work with modern cloud-native technologies - Kubernetes, IaC, observability tooling, autoscaling, secret management, CI/CD - you'll be hands-on with today's most relevant technologies.
- Shape best practices - Help define and champion cloud optimization and reliability standards across the organization.
- Grow your technical influence - Act as a go-to resource for reliability, incident management, cloud engineering, and production operations.
- Continuously evolve - Stay close to emerging tools and practices, contributing to ongoing improvements in our cloud environment.
- Increase reliability and availability by implementing resilient infrastructure patterns and performance optimizations.
- Reduce incidents and recovery time through better observability, automation, and proactive engineering.
- Strengthen scalability by designing infrastructure that adapts seamlessly to growth.
- Improve cloud efficiency by driving optimization best practices across AWS and Azure environments.
- Resolve complex system challenges across infrastructure, networking, applications, and distributed systems.
- On-call support: Participate in the on-call rotation, including first-line incident response, to maintain the availability and performance of our cloud infrastructure, and provide regular updates on project status and activities.
- Elevate engineering standards by mentoring peers and embedding reliability-first thinking into development workflows.
- Cloud engineering skills across AWS and/or Azure, including hands-on experience supporting production systems running on Kubernetes at scale.
- Infrastructure as Code and microservices experience, using tools such as Terraform, Crossplane or Ansible, with a strong understanding of operating distributed systems in live environments.
- Automation and engineering mindset, with proficiency in Python, Go or Bash, plus experience building and improving CI/CD pipelines and autoscaling strategies.
- Observability and incident management depth, including Prometheus, Grafana, OpenTelemetry, distributed tracing, and SIEM tooling - with the ability to turn insights into reliability improvements.
- Security and networking knowledge, including secret management (e.g., Vault, AWS SSM) and familiarity with infrastructure security and compliance best practices.
- Cloud-native tooling experience, including Helm (managing and creating charts) and exposure to modern database and ecosystem technologies such as MongoDB.
- Strong analytical thinking, with the ability to troubleshoot complex issues across infrastructure, networking, and application layers.
- Curiosity and collaboration at your core: a passion for learning and for sharing ideas and insights, plus comfort with the on-call support rotation - prior on-call experience is also welcome.
- National Capital Region's 2025 Top Employers in Canada: https://reviews.canadastop100.com/top-employer-qliktech
- Genuine career progression pathways and mentoring programs.
- Culture of innovation, technology, collaboration, and openness.
- Flexible, diverse, and international work environment.