About the team/job
Metagenomics is a rapidly expanding field in which the depth and breadth of data are constantly increasing. Consequently, the number of published research articles associated with the field is growing. However, much of the information in these articles is not linked to the underlying sequence data, and key biological traits can be missed because of the complex nature of the data.
The overall goal of the EMERALD project is to use machine learning to make metagenomics datasets more reusable, and then to analyse these data to discover novel secondary metabolite biosynthetic gene clusters (SMGCs). The project will develop methods to identify potentially missing sample metadata by comparison with other datasets and/or with information obtained through text mining (provided by the project partner).
These enriched metadata fields will be used to improve our biome classifications and to identify potentially mislabelled datasets. The results will be integrated into the MGnify database, making them more accessible to the wider user community. In the second element of this project, we will work collaboratively with the Literature Services Team (who will search research papers for evidence of novel SMGCs) and use machine learning approaches to aid the identification of both known and novel SMGCs in metagenomics datasets.
We are seeking to recruit a machine learning data scientist to develop and apply cutting-edge machine learning algorithms for the EMERALD (Enriching MEtagenomics Results using Artificial intelligence and Literature Data) project. The position will be located within the Sequence Families Team, which is responsible for the MGnify metagenomics resource at the European Bioinformatics Institute (EMBL-EBI). The post holder will work in close collaboration with the Literature Services Team, also based at EMBL-EBI, who are partners in the project.
Your role
- Gathering scientific requirements
- Evaluating different machine learning frameworks/technologies for different analysis problems
- Developing and benchmarking prototype mining algorithms for the classification of samples and the discovery of SMGCs
- Working with project partners across EMBL-EBI, including other EMBL-EBI teams, and with other grant collaborators
- Identifying potentially mislabelled datasets, which may involve verification against the literature and/or with data producers
- Integrating data from project partners to enhance machine learning classifiers
- Developing tools and running periodic updates of the MGnify production databases
- Liaising with potential users of the tools developed
- Writing reports and publications and giving presentations, as required
You have
A higher degree, preferably a PhD, in the life sciences and/or text mining and machine learning.
The ideal candidate will have at least 3 years of experience working in a bioinformatics environment as a postdoctoral fellow (or equivalent experience).
- Strong scripting skills in Python, although other languages would be considered for the right candidate
- Experience using different machine learning frameworks, with an understanding of the data to which each is best suited and of its limitations
- Experience developing benchmark datasets and benchmarking approaches
- Knowledge of metagenomics approaches, both taxonomic and functional analyses
- Ability to work with the different data types produced within MGnify (e.g. FASTA, BIOM, XML, JSON)
- Agile development methodologies and version control with Git/GitHub
- Unix/Linux proficiency
The candidate will need to be able to work independently as well as with the rest of the development team, and should have a flexible approach and the ability to acquire new skills. Good communication skills and attention to detail are essential, as is the ability to work to deadlines.
Excellent English, written and oral.
You might also have
- Relational database querying and basic schema design (e.g. MySQL, PostgreSQL)
- Knowledge of secondary metabolite gene clusters
- Experience with, or knowledge of, other software for gene cluster identification (e.g. antiSMASH, ClusterFinder)
- Experience writing scientific papers
Why join us
At EMBL-EBI, we help scientists realise the potential of ‘big data’ in biology by enabling them to exploit complex information to make discoveries that benefit mankind. Working for EMBL-EBI gives you an opportunity to apply your skills and energy for the greater good. As part of the European Molecular Biology Laboratory (EMBL), we are a non-profit, intergovernmental organisation funded by 22 member states and two associate member states. We are located on the Wellcome Genome Campus near Cambridge in the UK, and our 600 staff are engineers, technicians, scientists and other professionals from all over the world.
EMBL is an inclusive, equal opportunity employer offering attractive conditions and benefits appropriate to an international research organisation. The remuneration package comprises a competitive salary, a comprehensive pension scheme and health insurance, educational and other family-related benefits where applicable, as well as financial support for relocation and installation.
We have an informal culture, an international working environment and excellent professional development opportunities, but one of the really amazing things about us is the concentration of technical and scientific expertise, something you probably won’t find anywhere else.
If you’ve ever visited the campus, you’ll have experienced first-hand our friendly, collegial and supportive atmosphere, set in the beautiful Cambridgeshire countryside. Our staff also enjoy excellent sports facilities, including a gym, as well as a free shuttle bus, an on-site nursery, cafés, a restaurant and a library.
What else do I need to know
To apply please submit a covering letter and CV through our online system.
Applications are welcome from all nationalities, and this will continue after Brexit. For more information, please see our website. Visa information will be discussed in more depth with applicants selected for interview.
EMBL-EBI is committed to achieving gender balance and strongly encourages applications from women, who are currently under-represented at all levels. Appointment will be based on merit alone.
This position is limited to the grant duration specified.
Applications will close at 23:00 GMT on the date listed above.