
Sr. Site Reliability Engineer
- Richmond Hill, ON
- Permanent
- Full-time
As a Site Reliability Engineer (SRE) at OpenText, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure. You will work closely with development and operations teams to design, implement, and maintain systems that are resilient and efficient.
WHAT THE ROLE OFFERS
- Design, implement, and manage Kubernetes clusters to ensure high availability and scalability.
- Utilize AWS services to build and maintain cloud infrastructure.
- Develop and maintain Helm charts for application deployment and management.
- Use Terraform to automate infrastructure provisioning and management.
- Monitor and tune Windows- and Linux-based systems for performance and reliability.
- Collaborate with development teams to ensure smooth deployment and operation of applications.
- Implement and maintain CI/CD pipelines to streamline development and deployment processes.
- Troubleshoot and resolve issues related to infrastructure and application performance.
- Participate in on-call rotations to provide 24/7 support for critical systems.
WHAT YOU NEED TO SUCCEED
- 5+ years of experience as a Site Reliability Engineer or in a similar role.
- Strong experience with Kubernetes and container orchestration.
- Proficiency in AWS services and cloud architecture.
- Expertise in Helm charts for Kubernetes application management.
- Solid understanding of Terraform for infrastructure as code.
- In-depth knowledge of Linux operating systems and system administration.
- Experience with CI/CD tools and practices.
- Excellent problem-solving skills and ability to work under pressure.
- Strong communication and collaboration skills.
- Bachelor's degree in Computer Science, Engineering, or related field.
- AWS certifications.
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, or the ELK stack).
- Knowledge of scripting languages (e.g., Python, Bash).