
EverPro - Senior Platform Engineer (Remote, Canada)
- Toronto, ON
- $130,000 per year
- Permanent
- Full-time
- Cloud Infrastructure Design: Design, implement, and maintain scalable, secure, and cost-efficient infrastructure using AWS services such as EKS, ECS, Lambda, API Gateway, S3, and CloudFront. Balance managed and serverless offerings to deliver high-performance, resilient systems tailored to product needs.
- Infrastructure as Code: Champion declarative, reproducible infrastructure using tools like Terraform and CloudFormation. Contribute to a culture of GitOps and configuration management that promotes visibility, automation, and reviewability across environments.
- System Architecture & Design: Translate business problems into modular, distributed system architectures. Apply sound engineering judgment to determine what to build and where to simplify. Design for observability, failure recovery, and long-term operability from the start.
- Platform Reliability: Build and maintain highly available systems with automated failover, scaling, and monitoring using Prometheus, Grafana, and cloud-native tooling. Work closely with SRE and application teams to minimize mean time to recovery (MTTR) and ensure zero-downtime deployments.
- DevOps Enablement: Establish and evolve CI/CD pipelines (e.g., GitHub Actions, Jenkins, or GitLab CI) that empower teams to deploy frequently and safely. Drive continuous delivery practices, automated testing, and secure-by-default deployment standards.
- Technical Leadership: Lead by example through hands-on contributions and deep technical guidance. Provide mentorship and elevate the team’s engineering maturity through design reviews, pair programming, and shared postmortem learnings.
- Cross-functional Collaboration: Work closely with developers, product managers, and security stakeholders to ensure infrastructure decisions align with product strategy, compliance, and user experience goals. Contribute to planning and execution cycles by making platform costs, risks, and capabilities visible.
- Operational Excellence: Establish robust monitoring, alerting, logging, and tracing strategies for platform components. Identify and remove toil through self-service tools, documentation, and automated recovery patterns.