Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring | Canada | Remote
Grafana Labs
- Canada
- Permanent
- Full-time
- Design and implement high-quality, scalable integrations for various infrastructure components, applications, and data ingestion pipelines
- Create middleware components and libraries that simplify development and maintenance of observability solutions
- When necessary, represent Grafana Labs in open source forums, working groups, and events
- Work with product teams, in addition to design and docs, to develop features that align with wider product strategy and customer needs
- Lead the technical direction and vision of the team, contributing to strategic discussions and future development of observability solutions
- Work with other departments including Sales, Product, and Support teams to deliver a holistic product experience
- Take ownership of the services you're running by deploying well-tested, clean code
- Embrace our open-source culture and contribute to other projects that may not directly fall within your team's scope
- You have a passion for observability and like to share your knowledge by writing documentation and blog posts.
- You love to engage with customers and help them out.
- You have excellent communication skills.
- You have relevant open source experience, ideally in the observability domain.
- You are willing to become an active member of the OpenTelemetry and Prometheus communities.
- You're curious and you enjoy learning new programming languages and frameworks, setting up examples, and figuring out how things work.
- You have a good understanding of typical production environments. Ideally you have been responsible for operating production services and organizing on-call.
- You actively mentor other team members, identifying areas for focus and improvement.
- 8+ years of strong experience with at least one programming language - any major language (Python, .NET, Java, Go, Rust, etc.) is acceptable
- Demonstrated experience operating and monitoring high-scale production systems running on Kubernetes, including on-call participation, incident response, and postmortem practices
- Familiarity with observability tooling (e.g. Grafana)
- Strong understanding of time-series data, metrics cardinality challenges, and cost/performance tradeoffs and optimizations in observability systems
- Experience in a hands-on technical leadership role - setting technical direction, leading project teams, and influencing architectural decisions beyond your immediate team
- Deep understanding of distributed systems concepts including scalability, consistency, high availability, and failure modes in large-scale systems
- Experience writing clean, maintainable, robust, and performant software
- Experience with delivering projects from start to finish in a self-driven manner
- Excellent problem-solving and debugging skills
- Strong mentoring and leadership skills
- Experience operating or scaling Prometheus in high-cardinality, multi-tenant environments
- Experience working with OpenTelemetry Collector pipelines or similar telemetry ingestion systems
- Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or any other Kubernetes-related certification from the CNCF
- Experience developing Kubernetes operators, controllers, or custom resources
- Strong understanding of metrics collection, visualization, and alerting concepts
- Experience contributing to or maintaining open source projects, with evidence of successful pull requests and community collaboration
- Experience designing and building observability backends for various systems and applications
- 100% Remote, Global Culture - As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
- Scaling Organization - Tackle meaningful work in a high-growth, ever-evolving environment.
- Transparent Communication - Expect open decision-making and regular company-wide updates.
- Innovation-Driven - Autonomy and support to ship great work and try new things.
- Open Source Roots - Built on community-driven values that shape how we work.
- Empowered Teams - High trust, low ego culture that values outcomes over optics.
- Career Growth Pathways - Defined opportunities to grow and develop your career.
- Approachable Leadership - Transparent execs who are involved, visible, and human.
- Passionate People - Join a team of smart, supportive folks who care deeply about what they do.
- In-Person Onboarding - We want you to thrive from day 1, learning alongside your fellow new 'Grafanistas' all about what we do and how we do it.
- Balance is Key - We operate a global annual leave policy of 30 days per annum. Three days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.