
GCP Observability Engineer
- Montreal, QC
- Permanent
- Full-time
Experience Level: Level 3 (senior): 5-7 years
Location: Montreal (Day 1 onboarding onsite / in-office presence 3x per week)
Role Description:
We are seeking an experienced and motivated engineer to join the Observability fleet, which delivers tools in private and public cloud environments. The role centers on developing and modernizing Observability platforms for cloud-native and hybrid applications, with a primary focus on Google Cloud Platform (GCP).
This role involves designing, integrating, and maintaining solutions for collecting, transporting, and visualizing telemetry (tracing, metrics, and logging) to improve the reliability and uptime of our applications. You will closely collaborate with software developers, SRE, infrastructure, and security teams to drive automation and implement best-in-class observability solutions supporting both development and operations in a hybrid cloud environment.
Responsibilities:
- Build and support the modernization and integration of observability tools in private and public cloud offerings (GCP, AKS, EKS).
- Design, implement, and automate telemetry, logging, and monitoring solutions, including dashboards, alerts, and CI/CD integration.
- Enable teams to leverage observability data for reliability, performance, and security use cases; provide actionable recommendations.
- Collaborate with DevOps, SRE, and security teams to share best practices and support adoption of observability standards.
- Mentor and upskill client teams through knowledge transfer, and participate in on-call activities as required.
Qualifications:
- At least 5 years of relevant experience in Observability, Logging, and Monitoring in enterprise environments.
- Hands-on experience with observability tools such as Grafana, Prometheus, Loki, Cortex, Tempo, Elasticsearch, Datadog, Splunk, or equivalents.
- Experience working with container technologies (Docker, Kubernetes) and orchestration platforms (GKE or similar).
- Proficiency in setting up and configuring dashboards, alerts, and alarms on Grafana and/or GCP Monitoring.
- Experience integrating observability tools with CI/CD pipelines and automating through scripting and configuration as code (Python, Bash, JSON, YAML, Terraform, or similar).
- Excellent communication, presentation, and problem-solving skills.
- Proficiency with Linux operating systems and databases (MySQL, DB2, MSSQL, or similar).
- Solid understanding of how enterprise service delivery components interact (web servers, application servers, databases, web services, storage, security).
- Experience with application instrumentation for distributed tracing, metrics, and log collection.
- Experience with programming languages (preferably Python, Java, or Go).
- Familiarity with log parsing and regular expressions for data extraction.
- Experience with DevOps tooling and automation.
- Prior experience with Application Performance Management (APM) solutions.
- Experience integrating end-user applications with monitoring and APM tools.
- Experience with other public cloud providers (AWS, Azure) is a plus.
- Understanding of enterprise-architecture concepts: 3-tier architecture, high-availability/disaster recovery, active-active data centers, etc.
- Familiarity with networking concepts and protocols (OSI model, TCP/IP, HTTP, firewalls, load balancers).