Senior Research Computing Solutions Specialist
Research Computing Services is a well-established and leading UK National Supercomputing Center, providing facilities and services to world-renowned scientists, clinicians and engineers across the UK and Europe. We operate some of the world's most powerful academic supercomputers, infrastructure that is managed by an innovative OpenStack Software Environment using a software-defined/DevOps methodology.
This role sits within the Research Computing Platforms and Infrastructure group. You will be part of a diverse team, working flexibly across the UK, to explore and exploit transformative technologies and support researchers at the forefront of computational science. You will play a critical role in designing, implementing, and maintaining the high-performance computing storage infrastructure and services for our research community. You will collaborate with researchers, software engineers, and infrastructure specialists to ensure the efficient and reliable storage of large-scale data sets and facilitate the smooth operation of our HPC systems. You will have access to state-of-the-art research facilities and cutting-edge technologies, as well as opportunities for professional development and career advancement.
Responsibilities:
Design, deploy, and manage HPC systems, including compute clusters, interconnects, storage, and job schedulers.
Optimize storage performance, reliability, and scalability for diverse research workloads.
Troubleshoot and resolve complex storage-related issues, ensuring minimal downtime.
Monitor and analyze storage performance metrics, proactively identifying areas for improvement.
Stay up to date with emerging storage technologies and trends, evaluating their potential impact on research computing.
Requirements:
Bachelor's or higher degree in computer science, engineering, or equivalent experience.
Experience in designing and implementing HPC storage systems (e.g. Lustre or GPFS).
Proficiency in scripting languages commonly used in the HPC field (e.g. Python, Bash).
Solid understanding of network protocols and technologies used in HPC.
Experience with configuration management and automation tools (e.g. Ansible, Puppet, Terraform) and containerization technologies (e.g. Singularity).
Strong problem-solving and troubleshooting skills with a keen attention to detail.
Experience of working in a scientific environment and/or providing support to researchers will be advantageous.
More information about the role is attached in the Further Information document.
Once an offer of employment has been accepted, the successful candidate will be required to undergo a basic disclosure (criminal records) check and a security check.
We offer flexible working arrangements and hybrid working.
We welcome applications from individuals who wish to be considered for part-time working or other flexible working arrangements.
We particularly welcome applications from women and/or candidates from a BME background for this vacancy as they are currently under-represented at this level in our institution.
Click the 'Apply' button below to register an account with our recruitment system (if you have not already) and apply online.
Informal enquiries are welcome and should be directed to wjt27@cam.ac.uk.
Please quote reference VC36788 on your application and in any correspondence about this vacancy.
The University actively supports equality, diversity and inclusion and encourages applications from all sections of society.
The University has a responsibility to ensure that all employees are eligible to live and work in the UK.