Genomics Data Scientist (Fixed Term)

The research group of Al-Kindi Professor Richard Durbin has moved to the Department of Genetics at the University of Cambridge. We are seeking a talented genomics data scientist or bioinformatician to contribute to data analysis and the development of methods and software. The group has an outstanding track record in the analysis of genome sequencing data at scale. Current research focuses on the development of new data structures and algorithms for large-scale sequencing data, and their application to areas such as vertebrate genome assembly, human demography, and the genome and species evolution of Lake Malawi cichlid fish. We are also involved in the Vertebrate Genomes Project, which has the initial goal of producing reference-quality genome assemblies for one species from every vertebrate order using cutting-edge long-read DNA sequencing technologies such as PacBio, BioNano and 10X Genomics.

The primary responsibilities of the successful applicant will be to:

  • Develop, maintain, and run pipelines and processes for the QC and analysis of high-throughput sequencing data.
  • Evaluate and compare new tools and technologies, such as new assembly programs, for inclusion in the pipeline.
  • Develop and maintain a system for tracking data sets and their analysis progress against team projects.
  • Participate in the development of novel bioinformatics software tools and techniques for high-throughput sequencing and assembly.
  • Contribute to scientific projects and publications.
  • Help to make our data and resources available to a wide community of biologists and geneticists.

Sequencing technologies are constantly evolving in terms of the type and volume of the sequence data they produce. Recent progress in long-read sequencing technologies means that we can now begin to consistently deliver high-quality genome assemblies for species that previously lacked such a resource. One of the most challenging aspects of this role will be to produce high-quality scientific results on a large scale while adapting to rapid developments in sequencing technology and software.

This role would suit somebody with some previous experience in bioinformatics or other large-scale scientific data analysis, or a newly qualified graduate student with data science skills and an interest in DNA sequence data. While desirable, previous experience with DNA sequencing data is not strictly necessary for the position. We have a strong publication record and a culture of producing open data resources and open-source software. The group also retains an affiliation with the Wellcome Trust Sanger Institute, so we will be working closely with many of the production and research groups there.


The successful applicant will have:

  • Advanced degree in a scientific discipline, or equivalent experience
  • Record of multiple years of computational scientific data analysis
  • Familiarity with the Unix computing environment
  • Proficiency in one or more scripting languages, preferably Python and Perl
  • Excellent critical-thinking and problem-solving skills
  • Attention to detail and the ability to work to agreed timelines
  • Strong communication skills, with the ability to elicit complex requirements from, and convey complex information to, groups with different levels of technical knowledge
  • Ability to quickly adapt to new problems and ideas
  • Experience with database management in MySQL or similar


The following are desirable:

  • Knowledge of DNA sequencing data and technologies
  • Experience with the git version control system
  • Experience with running software on a compute farm or cluster
  • Previous experience with managing large volumes of data
  • Web development experience


Fixed-term: The funds for this post are available for 2 years in the first instance.

For further information or questions about this post, please contact Shane McCarthy or Richard Durbin.

