About the team/job
We are seeking to recruit a machine learning scientist with experience in text mining, NLP and machine learning to work on the EMERALD (Enriching MEtagenomics Results using Artificial intelligence and Literature Data) project. This position will be located within the Literature Services Team at the European Bioinformatics Institute (EMBL-EBI), and will work in close collaboration with the metagenomics team (MGnify), also based at the EMBL-EBI.
Metagenomics is a rapidly expanding field in which the depth and breadth of data are constantly increasing. Consequently, the number of published research articles associated with the field is growing. The overall goal of the EMERALD project is to use machine learning to make metagenomics datasets more reusable, and then perform analyses on these data to discover and identify novel secondary metabolite biosynthetic gene clusters (SMGCs). The project will develop methods to identify full text publications on metagenomics and extract useful experimental concepts (such as biome descriptions and experimental techniques). These key concepts can then be integrated into the public datasets in MGnify, making them more usefully cross-comparable. In the second element of this project, we will work collaboratively with the MGnify team to search research papers for evidence on novel SMGCs identified through re-analysis across multiple metagenomics datasets (for example, co-occurrence of sets of gene names and secondary metabolites, and inferred relationships between these concepts).
The successful candidate will be responsible for researching and developing machine learning and related methods to extract concepts from research publications pertaining to the above goals, and for sharing the results with the MGnify team. The role will require you to work within the Literature Services multidisciplinary team, which includes full stack developers, ontology experts, data scientists and biologists, as well as text mining and machine learning scientists. You will not be starting from scratch: the Europe PMC team runs basic text mining workflows on all incoming Europe PMC content, which will provide a foundation for this project. This is a great opportunity for someone who wants to make an impact with their text and data mining skills in an open research data infrastructure.
Specific job responsibilities include:
- Scientific requirements gathering
- Acting as point of contact for the project with the MGnify team
- Understanding the scientific drivers and context of the project
- Developing and benchmarking prototype core text and data mining algorithms
- Application of methods to large datasets (millions of full text articles)
- Writing reports and publications, giving presentations, as required
The successful candidate must be able to demonstrate as much of the following as possible:
- Higher degree, preferably PhD, in the area of life sciences and/or text mining and machine learning;
- Proven experience of a range of techniques in the areas of NLP, deep learning and machine learning, such as text classification and information extraction;
- Application of these skills within the biology/biomedical domain, in an academic, industrial or publishing setting;
- Evidence of applications you have developed, with clarity on your specific contributions;
- Previous experience working with full text XML documents, preferably in the life sciences;
- Flexible approach and ability to take on new skills;
- Self-starter;
- Team player and good communicator - written and verbal.
Why join us
At EMBL-EBI, we help scientists realise the potential of ‘big data’ in biology by enabling them to exploit complex information to make discoveries that benefit mankind. Working for EMBL-EBI gives you an opportunity to apply your skills and energy for the greater good. As part of the European Molecular Biology Laboratory (EMBL), we are a non-profit, intergovernmental organisation funded by 22 member states and two associate member states. We are located on the Wellcome Genome Campus near Cambridge in the UK, and our 600 staff are engineers, technicians, scientists and other professionals from all over the world.
EMBL is an inclusive, equal opportunity employer offering attractive conditions and benefits appropriate to an international research organisation. The remuneration package comprises a competitive salary, a comprehensive pension scheme and health insurance, educational and other family related benefits where applicable, as well as financial support for relocation and installation.
We have an informal culture, an international working environment and excellent professional development opportunities, but one of the really amazing things about us is the concentration of technical and scientific expertise – something you probably won’t find anywhere else.
If you’ve ever visited the campus you’ll have experienced first-hand our friendly, collegial and supportive atmosphere, set in the beautiful Cambridgeshire countryside. Our staff also enjoy excellent sports facilities including a gym, as well as a free shuttle bus, an on-site nursery, cafés, a restaurant and a library.
What else do I need to know
To apply, please submit a covering letter and CV through our online system. In your application, please include a short statement (fewer than 500 words) explaining why you are suited to this role. Applicants invited for interview will be expected to give a short presentation (under 15 minutes) on their work in this area to date, and on what their initial approach to this project would be.
Applications are welcome from all nationalities and this will continue after Brexit. For more information please see our website. Visa information will be discussed in more depth with applicants selected for interview.
EMBL-EBI is committed to achieving gender balance and strongly encourages applications from women, who are currently under-represented at all levels. Appointment will be based on merit alone.
This position is limited to the grant duration specified.
Applications will close at 23:00 GMT on the date listed above.