
SRE Specialist 3
- Burnaby, BC
- Permanent
- Full-time
- Build and maintain automation workflows for service and infrastructure operations using Ansible, Bash, or Python.
- Create and optimize CI/CD pipelines with GitLab, enabling safe, reliable, and fast deployments.
- Contribute to our evolving DevOps architecture, identifying gaps and continuously improving efficiency and resilience.
- Deploy, manage, and support services running on OpenStack and Kubernetes platforms, along with some running on VMware and physical hardware.
- Troubleshoot service issues across application layers, OS, network, and infrastructure.
- Handle service lifecycle tasks including provisioning, monitoring, patching, and scaling.
- Participate in an on-call rotation to ensure 24/7 uptime of critical systems.
- Monitor service and system health using tools like Zabbix, Grafana, and the ELK stack.
- Investigate and resolve performance bottlenecks and production incidents.
- Write and maintain documentation for operational procedures, troubleshooting guides, and system workflows.
- Administer and troubleshoot Linux servers (Red Hat/CentOS/Ubuntu) and assist with MySQL database support in production environments.
- Manage network-level configurations and problems (iptables, routing, LDAP, SMTP, DNS, firewall rules, etc.).
- Handle infrastructure maintenance on OpenStack, Kubernetes, VMware, and physical servers as needed.
- Work with security and compliance teams to prepare for audits, implement required controls, and ensure visibility into operational activities.
- Maintain secure configurations and enforce access control, logging, and change management processes.
- Assist in integrating security practices into CI/CD pipelines (e.g., image/OS hardening, patching, and compliance).
- Ensure system changes are documented and traceable to meet compliance needs (e.g., SOC 2, ISO 27001).
- 5+ years of experience in Linux system administration and production environment support.
- Proven ability to manage services in virtualized and containerized environments (especially OpenStack and Kubernetes).
- Strong experience with infrastructure automation tools like Ansible and scripting in Bash or Python.
- Familiarity with building and operating GitLab CI/CD pipelines or similar.
- Solid knowledge of networking fundamentals (TCP/IP, firewalls, DNS, etc.).
- Experience working with monitoring/logging tools (Zabbix, Grafana, ELK, etc.).
- Familiarity with information security principles and experience supporting compliance-driven environments (e.g., SOC 2, ISO 27001).
- Excellent debugging and root cause analysis skills across complex systems.
- A proactive attitude, strong sense of ownership, and ability to work both independently and within a team.
- Experience designing or evolving DevOps systems for service lifecycle management.
- Knowledge of Docker, Git, and software-defined infrastructure tools.
- Experience integrating security tooling (e.g., vulnerability scanners, secret managers, audit logs) into DevOps pipelines.
- Prior experience operating in a 24/7 production support environment.
- Certifications such as:
  - RHCE (Red Hat Certified Engineer)
  - CKA/CKAD (Kubernetes certifications)
  - OpenStack Administrator Certification
- Degree or diploma in Computer Science, Computer Technology, or a related field.
- Join a stable, technically strong team with real impact on customer-facing services.
- Work with modern infrastructure and automation technologies.
- Solve meaningful operational challenges at scale.
- Grow with a team that values action, clarity, and continual improvement.