Senior Site Reliability Engineer (SRE) | SRE Senior
Remote
Montreal, Quebec, Canada
Full-time, January 30, 2026
Description
Own production reliability at scale
Clinia builds the search, data, and cloud infrastructure that digital health enterprises across North America rely on to deliver trusted, connected care experiences. As a ~40-person post-Series A scale-up, we operate in a regulated healthcare environment where system reliability, security, and correctness are critical.
We are hiring a Senior Site Reliability Engineer (SRE) to strengthen the reliability, observability, and scalability of our production systems as the company grows. This is a senior, hands-on role with real ownership. You will operate production cloud infrastructure, participate in an on-call rotation, and drive systemic improvements that reduce incidents, operational risk, and long-term toil.
What you will do
- Own production reliability through participation in an on-call rotation, incident response, and post-incident reviews that result in durable system improvements
- Design, build, and evolve cloud infrastructure using Terraform and infrastructure-as-code practices, primarily on AWS, with exposure to GCP and Azure
- Operate, scale, and improve Kubernetes platforms, including Amazon EKS, Bottlerocket, and Cilium / eBPF-based networking
- Deploy and manage services using Helm and FluxCD, with a strong emphasis on GitOps workflows and automation
- Establish and maintain end-to-end observability across distributed systems using OpenTelemetry, Prometheus, and Grafana LGTM (Loki, Tempo, Mimir)
- Partner closely with software engineering and product teams to embed reliability, operability, and failure-mode thinking into system design
- Identify recurring operational issues and replace them with clear automation, platform improvements, or architectural changes
What we are looking for
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer supporting production systems at scale
- Hands-on experience with on-call rotations, incident management, and operating systems under real uptime and SLA expectations
- Strong experience managing AWS cloud environments using Terraform. Experience with GCP or Azure is a plus
- Deep understanding of Kubernetes internals and cluster operations, including Helm, GitOps tools such as Flux, and community operators (for example, CNPG)
- Solid foundations in Linux systems and TCP/IP networking, including security, compliance, and modern networking technologies such as eBPF and Cilium
- Working knowledge of modern monitoring and observability practices, including OpenTelemetry and Prometheus
- Clear, direct communication during incidents and disciplined follow-through on remediation work
If you bring additional experience in platform security, performance optimisation, cost optimisation (FinOps), or internal tooling, we'd be glad to hear about it.
Why You Will Love Working Here
- Equity via our global ESOP: you share in what you build
- 4 weeks vacation plus summer hours
- Group insurance from day one
- Remote-friendly culture means you can work from anywhere
- 24/7 online doctor access for you and your family
- Human first: whether it's flexible schedules to fit life's curveballs, a listening ear when challenges come up, or celebrating wins big and small, you're more than just your role here
- Movement matters: we believe in a balanced, active lifestyle. That's why we offer a bonus ($) for every hour of physical activity you do. Hiking, yoga, climbing, or whatever sport you do, we encourage you to keep moving at your own pace
- High-performance equipment including MacBook Pro with Apple Silicon
- Office dog therapy sessions
- Team events, 5@7s, and celebrations when we ship big
- We are proudly B Corp certified and committed to building tools that actually make healthcare better
Let's Build Something That Matters
This is an opportunity to build something from the ground up, with a team that moves fast, supports one another deeply, and cares about making a lasting impact in health. Ready to make a difference? Apply now.
We care about motivation as much as qualifications. Please answer the pre-screening questions thoughtfully; incomplete applications will not be considered.
*By submitting your application, you consent to share your personal information with Clinia, which will use it to process your application for this job position. Clinia will not use this information for any other purposes than stated above. See our Privacy Policy for more information.
Compensation: $130,000 - $150,000 CAD
Senior SRE
Own the reliability of production systems at scale
Clinia develops the search, data, and cloud infrastructure that digital health organizations in North America rely on to deliver connected, reliable care experiences. As a post-Series A company of about 40 people, we operate in a regulated healthcare environment where the reliability, security, and correctness of systems are essential.
We are hiring a Senior Site Reliability Engineer (SRE) to strengthen the reliability, observability, and scalability of our production systems as the company grows. This is a senior, very hands-on role with real ownership. You will operate production cloud infrastructure, participate in an on-call rotation, and lead systemic improvements aimed at reducing incidents, operational risk, and long-term operational burden.
Your responsibilities
- Own the reliability of production systems through your participation in the on-call rotation, incident management, and post-incident reviews that lead to durable improvements
- Design, deploy, and evolve cloud infrastructure using Terraform and infrastructure-as-code practices, primarily on AWS, with exposure to GCP and Azure
- Operate, scale, and improve Kubernetes platforms, including Amazon EKS, Bottlerocket, and Cilium / eBPF-based networking
- Deploy and manage services using Helm and FluxCD, with a strong emphasis on GitOps workflows and automation
- Establish and maintain end-to-end observability of distributed systems using OpenTelemetry, Prometheus, and Grafana LGTM (Loki, Tempo, Mimir)
- Work closely with software development and product teams to build reliability, operability, and failure-mode analysis into systems from the design stage
- Identify recurring operational problems and replace them with clear automation, platform improvements, or architectural changes
Profile we are looking for
- Demonstrated experience as an SRE, DevOps, or infrastructure engineer working with production systems at scale
- Hands-on experience with on-call rotations, incident management, and operating systems under real availability and SLA requirements
- Strong experience managing AWS environments using Terraform. Experience with GCP or Azure is an asset
- Excellent understanding of Kubernetes internals and cluster operations, including Helm, GitOps tools such as Flux, and community operators (for example, CNPG)
- Solid foundations in Linux systems and TCP/IP networking, including security and compliance considerations and modern technologies such as eBPF and Cilium
- Working knowledge of modern observability and monitoring tools and practices, including OpenTelemetry and Prometheus
- Clear, direct communication during incidents, with rigorous discipline in following up on corrective actions
If you bring additional experience in platform security, performance optimisation, cost optimisation (FinOps), or internal tooling, we will be glad to discuss it.
Why you will love working here
- Equity through our global stock option plan; you benefit directly from what you help build
- 4 weeks of vacation from your start date, plus summer hours
- Group insurance from day one
- Remote culture: work from anywhere
- 24/7 online doctor for you and your family
- "People-First": flexible schedules, "care days", and celebration of wins big and small
- Movement matters: we offer a bonus ($) for every hour of physical activity, whether hiking, yoga, climbing, or any other activity, at your own pace
- High-end equipment, including MacBook Pro with Apple Silicon
- Dog therapy at the office
- Team events, 5@7s, and celebrations for major releases
- Proudly B Corp certified and committed to creating tools that genuinely improve healthcare
A word before you apply
Ready to shape the future of health technology, translate complex products into clear value, and be the voice that propels our platform forward? Apply today.
We value motivation as much as skills. Please answer the pre-screening questions carefully: incomplete applications will not be considered.
By submitting your application, you consent to share your personal information with Clinia, which will use it to process your application for this position. Clinia will not use this information for purposes other than those stated above. See our Privacy Policy to learn more.
Compensation
$130,000.00 - $150,000.00 per year