European Molecular Biology Laboratory (EMBL)

Hinxton, Cambridge, United Kingdom

The Archival Infrastructure and Technology team is looking for an enthusiastic, highly motivated bioinformatician to join the FAIRplus project in the BioSamples database team.
The BioSamples database at EMBL-EBI is a resource that integrates biological samples from a wide variety of sources, providing a single location in which to apply standards and ontologies to sample data. It has grown from 14,000 samples in 2010 to over 6 million samples in 2019.
The FAIRplus project aims to develop tools and guidelines for making life science data FAIR (Findable, Accessible, Interoperable, Reusable). It is a European collaboration between industrial and academic partners. It aims to increase the discovery, accessibility and reusability of data from selected projects funded by the EU’s Innovative Medicines Initiative, and of internal data from pharmaceutical industry partners. The project started at the end of January 2019, so this is a timely opportunity to join and help shape it.
The BioSamples team is a multidisciplinary team of bioinformaticians and programmers with experience in ontologies, databases, backend Java technologies and user interfaces.

The position is a fantastic opportunity to gain hands-on experience working on a project with major scientific impact in a world-leading bioinformatics institute.

You will be responsible for establishing and assessing processes for storing the FAIRplus data, leveraging and improving on existing resources. These include other EMBL-EBI archives as well as our semantic-as-a-service suite.

This might involve developing the necessary access tools to establish standardised data workflows to FAIRify incoming datasets, liaising with users to gather requirements and translating these into technical specifications and test plans, and developing new applications to complement and replace the existing BioSamples infrastructure. It might also involve diving deeper into the industrial partners’ data, or building systems to broker it through public databases. Your duties will include teleconference calls and on-site events such as Bring Your Own Data workshops. You will present progress and demo pipelines to all project members during those events, soliciting feedback and using it to design the next processes to be applied to datasets.

We use a variety of frameworks and technologies, including Solr, MongoDB, Docker and Spring – we value matching the right solution to the right problem, and you will have the opportunity to improve and contribute to further development of the architecture. We follow agile techniques in our approach to development, so if you’re the sort of person who likes to work in sprints, has worked with tools like Jira to prioritise user requirements, or has ever tried pair programming or code reviews, then you’ll be a good fit for our team. All of our software is built and published using continuous integration and version control, so you should at least be familiar with GitHub, and you should be confident in making your code public for others to install and run.
We are looking for a bioinformatician, or a software developer with experience in handling biological data and biological requirements. You should have a biological background but you should also be able to find your way around a terminal. Programming experience in at least one programming language (such as Python, Perl or Java) and experience with managing data or data processing pipelines would be desirable. Previous experience of working in a bioinformatics-focused environment would be beneficial. Data FAIRification involves ongoing work with semantic annotation, so if you’ve used ontologies in the past, or are just keen to know more, this would be ideal.

You’ll be working within the AIT team at EMBL-EBI alongside developers, bioinformaticians and ontologists. As part of your day-to-day job, you can expect to interact with other groups at EMBL-EBI as well as our external collaborators in order to improve submission of and access to data in the BioSamples database.

  • Experience handling biological data and requirements

  • Experience of at least one programming language

  • Practical experience of data management

  • Knowledge of common data formats such as XML, JSON and JSON Schema

  • Good communication skills, ability to work as part of a team of people with a range of skills and a diversity of backgrounds.

  • Several years’ experience with other programming languages, such as Python or Java
  • Familiarity with semantic technologies such as ontologies
  • Experience of working in an agile development environment
  • Familiarity with Spring Boot, Solr, MongoDB
  • Interest in Cloud technologies

We value people who demonstrate that they’re eager to learn about specific aspects to support our users’ requirements.

In addition to a highly competitive salary and a fantastic benefits package, this position offers exciting opportunities, both to work as part of a team on a large, established bioinformatics database and to take the initiative in the development of new data processing and submission tools. It would be perfect for a dynamic and motivated individual, especially one with previous software development experience and an interest in the life sciences or bioinformatics.

Opportunities will be provided for self-growth. These include working in a highly skilled team on the Wellcome Genome Campus, a hub of scientific and technical expertise with many training courses and seminars available.

We recognise the importance of a healthy work-life balance. The campus provides an on-site nursery and, since 2017, a holiday club for children aged 4 to 14. EMBL-EBI makes progress every year towards building a more diverse workforce, making it an ideal place for smart, curious people across cultures, genders, ethnicities, and lifestyles. Cambridge is a vibrant, culturally diverse city with many excellent schools and family-friendly amenities.

