Senior SRE

Full-time | Toronto, ON


Be Part of Our Story

Dedicated to telling stories and creating experiences, Indigo is always looking for bright, energetic and customer-focused people who can help bring our exciting mission to life.

Talent Developer
Transparent and Open to Feedback
Customer Inspired
Process Driven
Continuous Improvement

Job Details

Company Description

Current Indigo employees should apply through the Internal Mobility page using their email address.

Dedicated to telling stories and creating experiences, Indigo is always looking for bright, energetic and customer-focused people who can help bring our exciting mission to life in one of our more than 200 Indigo, Indigospirit, Chapters and Coles stores across Canada. We offer a variety of exciting opportunities at our retail stores, distribution centres, and home office for people who share our passions and want to be part of a dynamic and enriching culture.


  • We love books and all things beautiful
  • We are Canada’s Cultural Department Store
  • Books are our heart and our soul and Great Books are JUST the Beginning…

We play by the following rules:

  • We exist to add joy to our customers’ lives each and every time they interact with us and our products
  • Our job is to create joyful moments for our customers
  • We treat each other the way we’d treat a valued friend
  • We inspire each other to do our best work
  • We seek to ignite creativity and innovation every day
  • We give back to the communities in which we operate

Job Description

The Senior Engineer, Service Reliability works cross-functionally with Indigo Digital teams to automate their needs, ensuring Indigo’s production environments remain scalable, highly reliable, secure and monitored at all times. This position is responsible for code deployments and ensuring these deployments remain healthy throughout the development lifecycle. Additionally, this role participates in the design, configuration, and successful operation of Indigo’s cloud platforms.



  • Measure and maintain error budgets and application uptime
  • Measure and maintain high availability capabilities of the Indigo cloud platform
  • Measure and ensure that capacity bottlenecks are identified and remediated proactively
  • Measure and track platform security compliance
  • Establish and track platform high-availability and disaster-recovery targets
  • Monitor and measure integrations to other cloud providers and/or on-prem systems
  • Monitor, assign, categorize and track cloud costs; manage variance to budgets
  • Maintain and increase provisioning automation


  • Design, implement, and deploy complex cloud-based workloads from initial architecture and design through development, testing, and deployment
  • Provide primary operational support and engineering for cloud-based workloads
  • Participate in architectural discussions and requirements gathering to ensure customer success on cloud platforms
  • Implement and improve Indigo’s continuous integration and continuous delivery pipeline
  • Container and Kubernetes lifecycle management
  • Partner with development teams to improve service reliability through testing, release processes, automated testing and failback capabilities
  • Establish and deploy best practices for the use and provisioning of security controls (MFA, secrets management, code signing, automated pen-testing, DDoS prevention and threat management)
  • Drive monitoring coverage and improvements to alerting/communication practices for system and environment issues
  • Keep the team current on SRE tools and practices
  • Drive automation and scripting to reduce repeated manual tasks and human error
  • Highlight issues/risks to project leads and management team
  • Analyze functional, technical and business requirements for projects
  • Maintain the entire CI suite of tools (Jira, Jira Service Desk, Confluence, Bitbucket, TeamCity, Octopus Deploy, Package Hosts)
  • Provide leadership and design tools and processes to leverage containers in the platform (Kubernetes, etc.)
  • Act as an advocate for the customer by placing them at the forefront of all decision-making and design processes
  • Proactively identify and anticipate customer expectations and needs
  • Embrace and seek out technology that creates high tech and high touch solutions for Indigo’s customers
  • Challenge the status quo by consistently identifying areas for improvement, diagnosing issues and working to resolve them
  • Create sustainable systems and services through automation and incremental improvements
  • Balance feature development with reliability and service level objectives (SLOs); establish SLOs where they do not yet exist
  • Participate in on-call rotation


  • Collaborate with others to drive flexible and iterative solutions, quickly and easily 
  • Share technical knowledge with others and actively seek to learn from those more knowledgeable than yourself 
  • Help others see the impacts of their efforts and proactively engage other functions to get input 
  • Encourage others to freely share their point of view and be open to feedback 


  • Model Indigo’s beliefs and convey a positive image in everything you do 
  • Celebrate diversity of thought and have an open mindset 
  • Take an active role in fostering a culture of continual learning, taking risks without the fear of making mistakes 
  • Embrace, champion and influence change through your team and/or the organization 




  • Computer Science Degree, other related diploma, or equivalent experience
  • Minimum of 3 years of work experience in an IT development operations-related field
  • Experience in 24/7 production operations, preferably supporting a highly available environment for ecommerce, SaaS or cloud service providers
  • Experience deploying and configuring Kubernetes (AKS)
  • Preference will be given to candidates with experience with Microsoft Azure
  • Experience with Container lifecycle and Helm Charts
  • Systems administration and configuration management automation expertise is a must (for example, Ansible, Desired State Configuration, Terraform and the HashiCorp stack)
  • Working experience with CI/CD and the development lifecycle
  • Experience with Continuous Integration Tools (TeamCity, Bitbucket Pipelines, Azure DevOps, Travis CI)
  • Experience with scripting languages (PowerShell, bash, Python, Ruby, JS…)
  • Strong understanding of a large array of cloud-enabled services such as networks, load balancers, firewalls, security appliances, backups and disaster-recovery techniques
  • Experience with deployment automation (Octopus Deploy, …)
  • Experience with monitoring, dashboarding and alerting systems. New Relic and PagerDuty are nice to have.
  • Experience with managing and configuring highly available server environments in Linux or Windows or other platforms as necessary
  • Advanced analytical and problem solving skills
  • Excels at navigating large and complex system implementations that mix public cloud, SaaS providers and on-prem technologies

Additional Information

Indigo Books & Music is committed to treating all people in a way that allows them to maintain their dignity and independence. We believe in integration and equal opportunity. Accommodations are available upon request for all applicants with a disability throughout the recruitment process. Please contact Human Resources if you require accommodation. We will work with all applicants to accommodate their individual accessibility needs. 



Home Office
Mid-Senior Level
Information Technology