University of Cambridge

Senior DevOps / Site Reliability Engineer (SRE)


Are you passionate about secure, scalable, and repeatable infrastructure, observability, optimisation, and continuous deployment? Are you an expert in Google Cloud, AWS, or Azure; in Terraform, Ansible, or other configuration management tools; and in continuous integration and continuous deployment tools? Do you have experience operating cloud native services with container management tools such as Kubernetes, Docker, and Helm?

The University of Cambridge's Information Services is looking for a Senior DevOps Engineer to join a growing team of 18 engineers building new cloud native services and modernising legacy applications. The services that the team maintains, some of which are public facing, are mainly used by university staff and students (~60,000 people). Your work will have a significant impact on the reputation of one of the world's leading universities. These services use modern web architecture standards with APIs and are continuously built and tested using GitLab. They run in Docker containers and are deployed to GKE or Cloud Run (Knative).

In your day-to-day job you will:

  • Help developers build Dockerfiles.
  • Advise and work together with developers on infrastructure, security, maintenance, scalability, continuous integration and continuous deployment.
  • Create GitLab CI configuration that: builds an efficient and secure production container; integrates with GitLab's built-in security tools; runs automated unit, integration, and functional tests using a production container and reports on code coverage and code quality; auto-deploys a protected branch to staging and allows deployers to deploy to production on demand using GitLab's UI.
  • Make sure that all components are always secure and that GitLab repos, GCP and AWS project configuration, and GitLab CI/CD jobs follow the principle of least privilege.
  • Be responsible for a monitoring and alerting setup that can predict problems requiring manual intervention and alert before they occur, and that can report errors and bugs to GitLab.
  • Use Terraform to create repeatable, secure, and scalable infrastructure. We use Google Cloud and Amazon Web Services. A typical infrastructure deployment of one of our products in Google Cloud will have a managed SQL instance, Cloud Run (Knative) or GKE with Helm, Object Storage, GCP Secret Manager, and monitoring and alerting.
  • Work with senior engineers from other areas of expertise (Backend, Frontend, API, Security, etc.) to continuously improve services, processes, and technology.

The team has a strong learning mindset and has produced boilerplates for our technology stack (Terraform, Ansible, Python, Django, React, and TypeScript) that help us be more efficient, work better at scale, and keep ourselves DRY. The team has a DevOps culture and uses Scrum for its day-to-day work. We have adopted an "open by default" approach to new work, so you can find much of our work to date at: https://gitlab.developers.cam.ac.uk/uis/devops

The team also produces a Guidebook that collects the team's practices and other useful information at https://guidebook.devops.uis.cam.ac.uk/

We are looking for someone who:

  • Has experience working alongside developers in a multidisciplinary team.
  • Loves DevOps culture.
  • Enjoys training and mentoring other engineers.
  • Has designed and implemented cloud architecture for systems using Terraform or similar, and enjoys automating processes.
  • Can explain in detail all the advantages of using containers.
  • Takes security very seriously and has experience designing and implementing deployment solutions that follow the principle of least privilege.
  • Knows when it's better to use Knative, when it's not, and where the potential dragons lie in K8s multi-tenancy.
  • Likes to have green CI pipelines with multiple checks, tests, and validations.
  • Is capable of cutting cloud bills by optimising resource utilisation.

We welcome applications from individuals who wish to be considered for part-time working or other flexible working arrangements.

Click the 'Apply' button below to register an account with our recruitment system (if you have not already) and apply online.

For queries regarding this post please contact Abraham Martin: amc203@cam.ac.uk. The closing date for applications is Friday 13 November 2020.

Please note that this is a rolling campaign and applicants will be reviewed periodically.

Please quote reference VC23727 on your application and in any correspondence about this vacancy.

The University actively supports equality, diversity and inclusion and encourages applications from all sections of society.

The University has a responsibility to ensure that all employees are eligible to live and work in the UK.

Apply online