Harmonisation of in-silico next-generation sequencing based methods for diagnostics and surveillance

Nunez-Garcia, J.; AbuOun, M.; Storey, N.; Brouwer, M. S.; Delgado-Blas, J. F.; Mo, S. S.; Ellaby, N.; Veldman, K. T.; Haenni, M.; Châtre, P.; Madec, J. Y.; Hammerl, J. A.; Serna, C.; Getino, M.; La Ragione, R.; Naas, T.; Telke, A. A.; Glaser, P.; Sunde, M.; Gonzalez-Zorn, B.; Ellington, M. J.; Anjum, M. F.

doi:10.1038/s41598-022-16760-9

Published: 23 August 2022


Scientific Reports volume 12, Article number: 14372 (2022)


Abstract

Improvements in cost and speed of next generation sequencing (NGS) have provided a new pathway for delivering disease diagnosis, molecular typing, and detection of antimicrobial resistance (AMR). Numerous published methods and protocols exist, but a lack of harmonisation has hampered meaningful comparisons between results produced by different methods/protocols, which are vital for global genomic diagnostics and surveillance. As an exemplar, this study evaluated the sensitivity and specificity of five well-established in-silico AMR detection software packages: the genotype results produced from a panel of 436 Escherichia coli isolates were compared to their AMR phenotypes, with the latter used as the gold standard. The pipelines exploited previously known genotype–phenotype associations. No significant differences in software performance were observed. As a consequence, efforts to harmonise AMR predictions from sequence data should focus on: (1) establishing universal minimum standards for assessing performance (e.g. a control isolate panel, minimum sensitivity/specificity thresholds); (2) standardising AMR gene identifiers in reference databases and gene nomenclature; (3) producing consistent genotype/phenotype correlations. The study also revealed limitations of in-silico technology in detecting resistance to certain antimicrobials, due to a lack of specific fine-tuning options in bioinformatics tools or a lack of representation of resistance mechanisms in reference databases. Lastly, we noted that user friendliness of tools was also an important consideration. Therefore, our recommendations are timely for widespread standardisation of bioinformatics for genomic diagnostics and surveillance globally.

Introduction

Next Generation Sequencing (NGS), a DNA sequencing technology, has become an established technique with hundreds of publications each year detailing the use and advancement of this technology, often replacing other gene-based typing tools such as PCRs and microarrays^1,2,3. Furthermore, NGS high-throughput platforms, which in recent years have seen radical improvements in quality, running times and cost, have revolutionised the diagnosis of health-related issues in animals and humans. This includes infectious disease diagnosis, where in-silico (or computer based) genetic data analysis is aiding and, in some cases, substituting more complex and costly laboratory techniques⁴. The COVID-19 pandemic is a testimony to the usefulness of this technology for both research and surveillance^5,6. Similarly, by using NGS technologies such as sequencing the whole genome of bacterial isolates, transmission chains of pathogens and a global overview of their population structure are being identified, helping to inform surveillance and to trace outbreaks⁷. For antimicrobial resistance (AMR) detection in bacteria, which is another global threat leading to decreasing therapeutic options and increasing treatment failures, antimicrobial susceptibility testing, which produces a phenotype, is progressively being substituted by detection of the underlying genetic mechanisms using whole genome sequencing (WGS) of bacterial isolates. Nevertheless, performing correlations between pheno- and geno-types remains essential, as phenotypes are still accepted as the gold standard: genotypes are based only on already known AMR genes, so new variants may be missed. WGS analysis also facilitates the identification of bacteria such as Escherichia coli, their lineage, and plasmids, in addition to genetic features such as resistance to critically important antimicrobials for therapeutics, and chromosomal mutations, deletions and insertions that may be associated with AMR phenotypes^8,9,10,11,12,13. Characterisation of bacterial plasmids, which often transfer AMR genes due to their mobility, is particularly important and has become more achievable by combining short and long read WGS so complete AMR plasmid genomes can be determined^14,15,16.

However, a major barrier to the application of bioinformatics software for AMR detection beyond individual research applications, to diagnosis and national and/or international surveillance, is the lack of standardisation of both DNA-based laboratory techniques and in-silico analysis. The availability of a plethora of bioinformatics tools and pipelines, with continual rapid advancement in this area, has left the field without a standardised methodology or nomenclature, making comparisons across compartments (e.g. humans, animals, and environment) or institutes difficult. In 2019, Hendriksen et al.¹⁷ reported that at least 47 bioinformatics tools were freely available, and no doubt this number has increased further. However, to understand the epidemiology of AMR in a One-Health context, it is vital to harmonise in-silico AMR detection methods, as has been established for bacteria such as Methicillin-resistant Staphylococcus aureus (MRSA), where ideal requirements for molecular typing techniques have been clearly defined^18,19. There have been similar discussions for in-silico AMR detection²⁰, and although the recommendations have not been properly evaluated, a small multi-centre study with nine institutes that performed predictions of AMR genotypes from 10 samples harbouring carbapenem-resistant organisms showed that differences in the database selected and gene coverage thresholds were some of the factors contributing to variation in AMR results²¹. Such evaluations are required at a much larger scale because supranational organisations such as the World Health Organisation (WHO), the European Centre for Disease Prevention and Control (ECDC) and the European Food Safety Authority (EFSA) have recommended the use of genomics within international surveillance programmes that compare AMR trends across countries in Europe and worldwide, to help tackle the spread of multi- and extensively drug resistant bacteria, which are the cause of great concern^22,23,24.

In-silico AMR detection methods are based on a three-step process: i) sample preparation such as bacterial culture and DNA extraction; ii) whole genome sequencing; and iii) in-silico analysis of data produced from isolate WGS. This process offers attractive possibilities for diagnostic test automation, including parallelising tests for multiple characteristics and even retrospective exploration for novel AMR genotypes without having to repeat steps i) or ii). AMR detection pipelines are based on existing knowledge of AMR genotype–phenotype associations²⁵.

To detect the genotype or underlying genetic mechanism for resistance by screening the WGS data obtained from isolated bacteria, the bacterial DNA is compared against a reference set of DNA sequences, also known as a database (normally in FASTA or text-based format) containing the genotypes (i.e. AMR genes or point mutations) responsible for known AMR phenotypes.

Bacterial DNA can be compared against the database using two different techniques: either by mapping the WGS short-reads onto the reference DNA sequences in the database, or by basic local alignment search²⁶ using the assembled genome contigs as a query against the database. While the first approach may be faster (< 10 min on a single core computer) and straightforward, it involves dealing with large raw data files (e.g., paired short-read raw data files of up to 300 Mbytes for a 5 Mbase genome such as Escherichia coli). The second method may require a longer running time (~ 25 min) as the raw data files must be de novo assembled prior to comparison against the database. Many laboratories perform the assembly step as a routine for other purposes, so the extra running time might not be a burden. Once the assembly of the bacterial genome has been stored, re-running the pipeline to screen for a novel AMR gene should be a matter of a few seconds.

As part of the One-Health European Joint Programme Project ARDIG (Antibiotic Resistance Dynamics: the influence of geographic origin and management systems on resistance gene flows within humans, animals and the environment)²⁷, nine partners compared the performance of five pipelines for AMR detection based on WGS of E. coli isolates, and this paper describes that work. Each pipeline detected the presence or absence of genes and point mutations associated with resistance to 14 antimicrobials; the susceptibility phenotypes established for all isolates were taken as the “gold standard”. This study restricted its scope to comparing the performance of pipelines in terms of their sensitivity and specificity to detect AMR, under the default settings defined by the pipelines’ authors. The software installation process was not evaluated in this study, since most of the users agreed that IT support is provided institutionally. Isolate DNA extraction and sequencing protocols used by participating institutes were also not evaluated in this study.

Results

Antimicrobial sensitivity of E. coli

A total of 436 E. coli isolates, collected by nine different collaborating institutes working in the veterinary and human health sectors in Europe, were included in the study. Table 1 provides an overview of the isolates, including year of isolation and percentage from each reservoir and country, with full details provided in Methods and Supplementary Table S1. The susceptibilities of all isolates to a panel of 14 antimicrobials were established; these were used as the gold standard for the study and are given for each isolate in Supplementary Table S1, with the numbers of resistant and sensitive isolates provided in Table 2.

Table 1 Distribution of the 436 E. coli isolates by year, source, and country.


Table 2 Number of resistant and susceptible isolates per antimicrobial used in this study.


Our panel of E. coli did not show an even distribution of resistance for all 14 antimicrobials, although for most antimicrobials the number of resistant isolates was more than 10% of the total, which enabled accurate sensitivity estimates to be made (Table 2). However, smaller numbers of isolates were determined as resistant to colistin (n = 27 or 6.2%) and tigecycline (n = 1 or 0.23%), consistent with the scarcity of E. coli resistant to these high priority critically important antimicrobials (HP-CIAs) in Europe. The unbalanced ratio between resistant and susceptible isolates for these antimicrobials results in a loss of precision for the sensitivity estimates. Nevertheless, sensitivity estimates for colistin were still calculated; for tigecycline, the sensitivity was not calculated due to the very low number of resistant isolates.

Pipeline comparisons

Five pipelines were selected for testing: GeneFinder²⁸; APHA SeqFinder/Abricate¹⁴; WBVR BLAST (in-house pipeline); ResFinder/PointFinder^29,30; ARIBA³¹. Some differences existed between the pipelines in their data input method or the algorithm used for detecting gene presence, which were part of the criteria for selecting these pipelines (see Methods and Supplementary Table S2).

Table 3 and Fig. 1 show estimates of sensitivity and specificity for each antimicrobial and each pipeline, calculated by comparing the AMR gene output from each pipeline with the phenotypic data per isolate (Supplementary Table S3). The cells in Table 3 have been coloured according to the level of performance of each pipeline. This colouring should be interpreted with care as there is significant overlap between the confidence intervals from different pipelines, as demonstrated by Fig. 1, which plots the sensitivity, specificity and 95% confidence intervals for different antimicrobials on the receiver operating characteristic (ROC) coordinate system. All pipelines had an overall sensitivity value between 0.9 and 0.95 when comparing the genotype with phenotype for each antimicrobial class (Table 3A) except for ARIBA, because this pipeline in its default setting reports only the presence or absence of acquired resistance genes and does not include resistance associated with chromosomal point mutations that reduce susceptibility to antimicrobials such as fluoroquinolones²⁵. The average specificity value, when comparing the genotype with phenotype for each antimicrobial class for all pipelines (Table 3B), was around 0.89, except for the APHA SeqFinder/ABRicate pipeline, which showed a slightly higher value at 0.93 due to the combination of a discovery stage (APHA SeqFinder) and a validation stage (ABRicate). From the sensitivity and specificity estimates, all the pipelines, in general terms, showed similar levels of performance.

Table 3 Estimated sensitivity and specificity for each pipeline for each antimicrobial.


Resistance to several antimicrobials was easier to detect, such as to ampicillin and tetracycline, with average sensitivity/specificity values equal to 0.95/0.96 and 0.97/0.93, respectively (Table 3 and Fig. 1). Colistin resistance was the most difficult to detect, with average sensitivity/specificity values of 0.70/0.99 (Table 3 and Fig. 1). This difficulty can be attributed to two possible factors: the genotype associated with the colistin AMR phenotype might not be fully represented in the database, i.e., there may be an unknown gene or chromosomal mutation that has not yet been associated with this phenotype; and/or the thresholds used in the pipelines for AMR gene detection might be suboptimal for this antimicrobial. Commonly used thresholds for AMR detection in some pipelines include the percentage of an AMR gene present within the isolate and the percentage of similarity between the reference AMR gene and the test isolate. By decreasing or increasing the thresholds, it is possible to trade off sensitivity against specificity, and hence to adjust the detection capabilities. Relaxing the pipeline thresholds will increase the sensitivity and decrease the false negative rate (type II error), at the cost of a decrease in specificity and a consequent increase in the false positive rate (type I error). Thus, being able to input the pipeline thresholds may be an important feature for adjusting the balance between sensitivity and specificity. Further, this adjustment should be individually defined for each of the antimicrobial classes represented in the database, since a set of thresholds that may be optimal for the detection of resistance to one antimicrobial might produce poor results for another. The GeneFinder pipeline was the only pipeline in this study that allowed the user to set an individual similarity threshold for each database entry.

Optimum threshold values might also depend on the specific purpose for detecting resistance. In some cases, a very high sensitivity may be preferred as a trade-off to lowering the specificity. For example, more relaxed thresholds for colistin, which belongs to the HP-CIA list³², may be used to minimise the occurrence of undetected resistance.
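The trade-off described above can be illustrated with a minimal Python sketch. It is not taken from any of the pipelines evaluated here: the hit records, gene name, isolate names and threshold values are all invented for illustration, and the function names `call_resistant` and `sens_spec` are hypothetical. It assumes each pipeline hit carries a percent identity and a percent coverage, and that per-gene thresholds override a default pair.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hit record: (isolate, gene, percent_identity, percent_coverage).
Hit = Tuple[str, str, float, float]


def call_resistant(hits: List[Hit],
                   default: Tuple[float, float],
                   per_gene: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return isolates with at least one hit passing its (identity, coverage) thresholds."""
    resistant = set()
    for isolate, gene, identity, coverage in hits:
        # A per-gene threshold pair, when given, overrides the default pair.
        min_identity, min_coverage = per_gene.get(gene, default)
        if identity >= min_identity and coverage >= min_coverage:
            resistant.add(isolate)
    return resistant


def sens_spec(predicted: Set[str], phenotype: Dict[str, bool]) -> Tuple[float, float]:
    """Sensitivity and specificity against the phenotype (True = resistant)."""
    tp = sum(1 for i, r in phenotype.items() if r and i in predicted)
    fn = sum(1 for i, r in phenotype.items() if r and i not in predicted)
    tn = sum(1 for i, r in phenotype.items() if not r and i not in predicted)
    fp = sum(1 for i, r in phenotype.items() if not r and i in predicted)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
hits = [
    ("iso1", "geneA", 99.5, 100.0),
    ("iso2", "geneA", 93.0, 98.0),   # divergent variant in a resistant isolate
    ("iso3", "geneA", 91.0, 60.0),   # partial match in a susceptible isolate
]
phenotype = {"iso1": True, "iso2": True, "iso3": False, "iso4": False}

strict = call_resistant(hits, default=(98.0, 97.0), per_gene={})
relaxed = call_resistant(hits, default=(98.0, 97.0), per_gene={"geneA": (90.0, 50.0)})

print("strict ", sens_spec(strict, phenotype))
print("relaxed", sens_spec(relaxed, phenotype))
```

In this toy case, relaxing the thresholds for one gene recovers the missed resistant isolate but also flags a susceptible one, which is the pattern of higher sensitivity and lower specificity discussed in the text.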

Interpretation of pipeline results

While performance of the pipeline is the major factor when deciding which one is most suitable to user needs, the ease of interpreting the results files is also an important point to consider. To this end, a questionnaire was completed by the person from each institute responsible for extracting the pipeline output information for their corresponding isolates. The questionnaire contained seven subjective questions to measure the degree of user friendliness related to the interpretation of the pipeline output, and the scores for each question and pipeline, given in response by the operator in each institute, are provided in Supplementary Table S4. The average and standard deviation values for responses from all nine institutes, as well as those from the six institutes with no link to any of the software, are shown in Table 4. There were no major differences found between the pipelines, with APHA SeqFinder/ABRicate having the highest average mark and ARIBA the lowest. The preferred pipeline (as per question 7) was GeneFinder by a small margin over APHA SeqFinder/ABRicate when all responses were considered, although the differential was greater when responses from only the six “independent” institutes were considered, with ResFinder performing equally well in the latter group with APHA SeqFinder. Therefore, differences in levels of user-friendliness, i.e. the ease of finding and linking geno- and pheno-types to understand results, or the availability of QC metrics, should also be considered as part of any pipeline harmonisation process, as they may influence who uses the tools and how often.

Table 4 Questionnaire, average scores and standard deviation (between brackets) for responses from the 9 collaborators for each of the pipelines evaluated (top line), and responses from the six institutes independent from any software used in this study (bottom line).


Discussion

Standardisation of any methodology is essential to enable comparison, as well as reproducibility across different sectors and countries, but can often be undervalued or overlooked. Harmonised monitoring systems already exist in areas such as AMR, and have been invaluable in determining AMR trends over time across Europe, although they only use phenotypic testing results³³. The wealth of data WGS provides and the increased cost effectiveness of NGS technology have presented genomic epidemiology as a feasible alternative. However, the availability and continual development of new bioinformatics tools has resulted in a call for harmonisation of in-silico genomic methods to track AMR globally¹⁷. Recommendations made in a workshop to implement WGS for surveillance recognised the challenges facing its implementation, including some of the bioinformatics processes²⁰, which were the focus of this study. Although some of the participating institutes for this study extensively use their preferred in-silico methodologies for AMR detection, in addition to classical wet-lab techniques, the global harmonisation of phenotype-genotype AMR susceptibilities is still in its early stages, due to a lack of pre-set standards. Here we compared several AMR bioinformatics pipelines using the same isolate data set, with default (generalised) pipeline settings and under the same interpretation conditions. Using these conditions, we concluded that no pipeline clearly stands out from the rest in terms of performance and ease of output interpretation, although some user preferences were noted from our questionnaire. Further, we observed that the performance of the pipelines depended in some instances on the antimicrobial for which the resistance determinant was being detected; the ability to set individual thresholds for each database entry is therefore an important feature, but one that is not widely available.

We believe that the results of our study can be applied to inform future initiatives for harmonisation of results from WGS pipelines, whether for AMR or any other area of diagnostics and surveillance. Just as multi-locus sequence typing of E. coli using underlying genetics⁷ is increasingly being used in place of serotyping³⁴ to identify pathogens due to its ability to provide more detailed/accurate subtyping of populations, we believe a harmonised WGS method will do the same for bacterial characterisation, including for AMR. Therefore, our recommendations for harmonisation are as follows. Firstly, it may not be relevant which pipeline is used as long as it meets a certain level of performance agreed by the relevant scientific experts, depending on the application and the establishment of common inclusion/exclusion criteria of targeted matches. We propose that a control set of isolates is used to test and evaluate any pipeline, with an appropriate representative sample and pre-set validation thresholds. For example, the collection of isolates used in this study may be appropriate for testing in-silico AMR pipelines, although any well validated set may be included, provided there is a balanced ratio between resistance and susceptibility of isolates to antimicrobials included in the test panel, e.g., to the EFSA panel of antimicrobials. The isolate panel will, however, need to be regularly updated to incorporate isolates with new/novel AMR genes, and the pipelines re-evaluated. For AMR pipelines applied to different bacterial species such as MRSA or Brachyspira, which may have different AMR mechanisms, a control set of isolates representative of AMR in those species will need to be included. The pipeline database will need to be updated to include species-specific AMR genes/mutations, and the thresholds for these resistance determinants evaluated. For ease of evaluation and interpretation of pipeline results we recommend different species be tested separately using the same principles as performed in this study.

Also, a certain level of pre-agreed performance in terms of minimum sensitivity and specificity thresholds, when comparisons are made between phenotypes and genotypes, should be used as a validation test for any AMR detection software. From the results of this study, a sensitivity and specificity value of ~ 0.9 would be reasonable to use, although for detecting resistance to HP-CIAs, e.g. colistin or carbapenems, a more relaxed threshold may be used to maximise resistance detection, including of new gene variants. Secondly, we recommend unifying the databases used by different pipeline software for positive identification; unless genes present in databases, including their nomenclature, are harmonised, there will be differences in the output even from the same isolate test set. In the AMR context, AMR gene identifiers or sequences, including any chromosomal point mutations leading to reduced susceptibility, and the translation rules from genotype to phenotype, should be consistent and transparent. This will also help the naïve user in the interpretation of genotypic data, in addition to promoting harmonisation. Thirdly, to allow greater access to and usability of this technology for routine surveillance, the final output information should be standardised into user-friendly documents. This will enable individuals with minimal background in genetics to benefit from this software.

As there are countless bioinformatics tools available, and many of them pursue similar aims but use different approaches with numerous fine-tuning adjustments, continual comparison of their performance is a difficult task. Our recommendation to achieve harmonisation does not require focusing on the best performing software, but on setting a common evaluation process based on universal minimal performance thresholds, e.g. sensitivity and specificity measures applied to a representative testing sample set. In other words, we have made recommendations which will help towards the creation of an appropriate structure for global standardisation of the bioinformatics component, to enable genomic surveillance and diagnostics to become routine and standardised worldwide.

Methods

The isolates

A total of 436 E. coli isolates were provided by nine European institutes: the French Agency for Food, Environmental and Occupational Health and Safety, Lyon France (49 isolates), the Universidad Complutense de Madrid, Spain (50 isolates), the Institute Pasteur, France (50 isolates), the German Federal Institute for Risk Assessment, Germany (50 isolates), the Norwegian Veterinary Institute, Norway (50 isolates), the Wageningen Bioveterinary Research, The Netherlands (50 isolates), the University of Surrey, United Kingdom (50 isolates), the Animal and Plant Health Agency, United Kingdom (37 isolates) and Public Health England, United Kingdom (50 isolates).

The raw WGS reads of isolates, generated from Illumina sequencing described elsewhere¹⁶, are available in the NCBI nucleotide archive under project number PRJNA805266.

The antimicrobials

The sensitivity of isolates to the 14 antimicrobials used for AMR monitoring by the European Food Safety Authority was assessed using a standard MIC protocol³⁵. The antimicrobials were: Ampicillin, Azithromycin, Cefotaxime, Ceftazidime, Chloramphenicol, Ciprofloxacin, Colistin, Gentamicin, Meropenem, Nalidixic Acid, Sulfamethoxazole, Tetracycline, Tigecycline and Trimethoprim. The susceptibility of each isolate to the panel was categorised as sensitive (S) or resistant (R), relative to wild type E. coli, as follows: Sensitive, when the isolate was inhibited at an antimicrobial concentration equal to or lower than the established ECOFF value for the MIC, as described by the European Committee on Antimicrobial Susceptibility Testing (EUCAST)³⁶; and Resistant, when inhibition of the isolate required an antimicrobial concentration higher than the established ECOFF value³⁶. The full S and R profiles for each isolate to the panel of antimicrobials, interpreted using ECOFFs, are provided in Table S1 and the total values in Table 2.

For some institutions, MIC values were not available for some or all isolates for an antimicrobial; this has been marked with an asterisk in Table 2. In most cases these were human samples and antimicrobials which are not routinely screened by PHE (e.g. azithromycin, chloramphenicol, sulfamethoxazole, tetracycline and trimethoprim).
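The ECOFF-based categorisation described in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `categorise` is hypothetical, and the cut-off used in the example is a placeholder rather than a real EUCAST value. Missing MIC values, as noted above for some institutes, are represented here by `None`.

```python
from typing import Optional


def categorise(mic: Optional[float], ecoff: float) -> Optional[str]:
    """Return 'S' if MIC <= ECOFF, 'R' if MIC > ECOFF, None if no MIC was recorded."""
    if mic is None:
        return None
    return "S" if mic <= ecoff else "R"


# Placeholder cut-off, NOT a real EUCAST value; the current ECOFF must be
# looked up for each antimicrobial and species.
example_ecoff = 2.0

for mic in (0.5, 2.0, 4.0, None):
    print(mic, categorise(mic, example_ecoff))
```

An isolate whose MIC equals the cut-off is categorised as sensitive, matching the "equal to or lower than" rule in the text.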

Detection software

Descriptions of the five AMR detection software packages used in this study are provided below:

GeneFinder. Public Health England (PHE), UK²⁸. URL: https://github.com/phe-bioinformatics/gene_finder. Version: 2.7. Operator: PHE. Language: python 2.7.5. Input format: FASTQ. Algorithm: mapping (bowtie 2.1.0). Reference database: provides three in house reference sets in FASTA format for E. coli, Salmonella and Campylobacter. Users can incorporate their own reference set. Reference database used in this study: in house (based on institute knowledge, ResFinder database (updated 10.02.2020) and CARD (The Comprehensive Antibiotic Resistance Database, https://card.mcmaster.ca)). The database is provided with the tool. Detection: presence or absence of sequences and mutations. It also reports insertions, deletions, mixed positions and large indels. Possibility to set the similarity thresholds (between sample DNA and a reference DNA) individually for each gene. Quality metrics: coverage, similarity, depth and coverage distribution.

APHA SeqFinder/ABRicate. Animal and Plant Health Agency (APHA), UK¹⁴. URL: https://github.com/APHA-AMR-VIR/APHASeqFinder. Version: 3.0. Operator: APHA. Language: python 3. Input format: FASTQ. Algorithm: mapping (smalt 0.7.6). APHA SeqFinder Reference database: provides three in house reference sets in FASTA format for AMR genes, mutations, plasmids, virulence factors and heavy metal resistances. APHA SeqFinder Reference database used in this study: in house (based on institute knowledge, ResFinder database [updated 10.02.2020] and CARD). Detection: presence or absence of sequences and mutations. Quality metrics: coverage, similarity, depth and normalised depth by MLST genes. ABRicate¹⁹ is used in conjunction with SeqFinder as an additional filter. URL: https://github.com/tseemann/abricate. Version: 0.7. Language: perl. Input format: FASTA assembled contigs (SPAdes 3.13.1). Algorithm: BLAST 2.7.0 or higher. ABRicate Reference database: same reference database as used for APHA SeqFinder (see above); it also provides additional databases which were not used in this study. Detection: presence or absence of genes. Quality metrics: coverage and similarity.

BLAST, Wageningen Bioveterinary Research (WBVR), The Netherlands. Pipeline not published at the time of this study. Operator: WBVR. Input format: FASTA assembled contigs. Algorithm: raw reads are error corrected with Tadpole from the BBduk suite v38.71. Quality trimming to Q20 with BBduk. Genomes are assembled using SPAdes 3.13.1. Assemblies are compared to the reference database using BLAST version 2.9.0 (with filters: 98% sequence identity and 97% gene coverage). Reference database: ResFinder database (updated 10.02.2020). Reference database used in this study: ResFinder database (updated 10.02.2020). Detection: presence or absence of sequences and mutations. Quality metrics: sequence identity and gene coverage provided by BLAST.
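As an illustration of how identity and coverage filters of this kind might be applied, the Python sketch below filters tabular BLAST output using the 98% identity and 97% gene coverage values stated above. The WBVR pipeline itself was unpublished, so this is not its implementation. The sketch assumes BLAST was run with the custom tabular format `-outfmt "6 qseqid sseqid pident length slen"`, approximates gene coverage as alignment length divided by reference gene length, and uses invented contig and gene names.

```python
from typing import Iterable, List, Tuple

MIN_IDENTITY = 98.0   # percent sequence identity, as stated for the WBVR pipeline
MIN_COVERAGE = 97.0   # percent of the reference gene covered, as stated above


def filter_hits(lines: Iterable[str]) -> List[Tuple[str, str, float, float]]:
    """Keep (contig, gene, identity, coverage) rows that pass both filters.

    Assumes tab-separated columns: qseqid sseqid pident length slen.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        contig, gene, pident, length, slen = line.rstrip("\n").split("\t")[:5]
        identity = float(pident)
        # Approximate coverage: alignment length over reference gene length,
        # capped at 100 because gapped alignments can exceed the gene length.
        coverage = min(100.0, 100.0 * int(length) / int(slen))
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            kept.append((contig, gene, identity, coverage))
    return kept


# Invented rows for illustration; contig and gene names are placeholders.
example = [
    "contig_1\tgeneA\t99.80\t861\t861",
    "contig_7\tgeneB\t97.50\t1200\t1200",   # fails the identity filter
    "contig_9\tgeneC\t100.00\t400\t900",    # fails the coverage filter
]
print(filter_hits(example))
```

Because the assembly is stored, a filter like this can be re-run against an updated reference database without repeating sequencing or assembly.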

ResFinder v.3.2 + PointFinder v.3.1.0^29,30, Technical University of Denmark. URL: https://bitbucket.org/genomicepidemiology/resfinder/src/master/. Operator: The Norwegian Veterinary Institute (NVI). Language: python 3. Input format: FASTQ or FASTA assembled contigs. Algorithm: BLAST is used to analyse assemblies (FASTA files). Mapper KMA is used to analyse read data (FASTQ files). Reference database: ResFinder database (updated 10.02.2020) and PointFinder_database. Reference database used in this study: ResFinder database (updated 10.02.2020). Detection: presence or absence of sequences and mutations. Quality metrics:

ARIBA v2.12³¹, Sanger Institute, UK. URL: https://github.com/sanger-pathogens/ariba. Operator: Universidad Complutense de Madrid (UCM). Language: python 3. Input format: FASTQ. Algorithm: mapping (Bowtie 2.1.0). Reference database: does not provide its own reference database but has an integrated method to download and standardise one from different sources such as CARD, ResFinder, ARG-ANNOT, MEGARes, NCBI, PlasmidFinder, VFDB, SRST2 and VirulenceFinder. Users can incorporate their own reference set. Reference database used in this study: ResFinder database (updated 10.02.2020). Detection: presence or absence of AMR sequences only (this is the default setting and was used in this study; it is possible to incorporate an external reference database for detecting mutations, but currently there is not an integrated and standardised database for mutations conferring AMR). It also reports genetic fragmentations, interruptions, and duplications. Quality metrics: gene coverage, sequence identity.

Data analysis

All 436 isolates were run through each pipeline by five independent operators (one per pipeline). Result tables from the five pipeline runs were sent to each of the nine institutes, who extracted the results corresponding to their isolates. For the following pipelines, the antimicrobial class associated with each resistance gene was provided in the output to enable matching with the phenotype by operators: ResFinder; APHA SeqFinder; GeneFinder; WBVR BLAST. For ARIBA, prior knowledge from operators was required. The AMR genotype information (genes or chromosomal mutations) for each isolate was collated with the gold standard phenotype on a standardised form for each antimicrobial (Supplementary Table S3).

A bespoke R script was used to calculate the sensitivity and specificity and their 95% confidence intervals for each pipeline, for each antimicrobial, by using the information provided in Supplementary Table S3. For a specific pipeline-antimicrobial-isolate combination, if an AMR element was detected, the isolate was considered resistant to that antimicrobial from that pipeline (test positive). If no AMR element was detected the isolate was considered sensitive (test negative). Test results were then compared to the gold standard resistant/sensitive phenotypic profiles.
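As a rough illustration of this calculation, the following Python sketch computes sensitivity and specificity with 95% confidence intervals for one pipeline-antimicrobial pair. It is not the bespoke R script: the function names are hypothetical, and the Wilson score interval is an assumption, since the text does not state which interval method was used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method; z=1.96 gives ~95%)."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def sensitivity_specificity(results):
    """results: iterable of (element_detected, phenotype_resistant) booleans.

    An isolate is test positive if any AMR element was detected and
    test negative otherwise; the phenotype is the gold standard.
    """
    tp = fn = tn = fp = 0
    for detected, resistant in results:
        if resistant and detected:
            tp += 1
        elif resistant:
            fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": wilson_interval(tn, tn + fp),
    }
```

Sensitivity is computed over phenotypically resistant isolates only and specificity over phenotypically sensitive isolates only, so an antimicrobial with very few resistant isolates will yield a wide sensitivity interval.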

Pipelines output evaluation questionnaire

A questionnaire to evaluate the user friendliness and quality control metrics of the pipeline output documents was sent to nine people, one at each institute, who were responsible for extracting the information from each of the five pipelines for their corresponding isolates. Three of the responses were from APHA, PHE and WBVR, which were also running their own pipelines (APHA SeqFinder, GeneFinder and WBVR Blast, respectively), although the two tasks were not carried out by the same person. The other six responses were from institutes with no link to any of the software used in the study, and an additional evaluation was performed on this subset.

Data availability

All WGS data is available through NCBI BioProject ID: PRJNA805266.

References

Wragg, P. et al. Characterisation of Escherichia fergusonii isolates from farm animals using an Escherichia coli virulence gene array and tissue culture adherence assays. Res. Vet. Sci. 86, 27–35. https://doi.org/10.1016/j.rvsc.2008.05.014 (2009).
Figueiredo, R. et al. Virulence characterization of Salmonella enterica by a new microarray: Detection and evaluation of the cytolethal distending toxin gene activity in the unusual host S. typhimurium. PLoS ONE 10, e0135010. https://doi.org/10.1371/journal.pone.0135010 (2015).
Pan, Z. et al. Identification of genetic and phenotypic differences associated with prevalent and non-prevalent Salmonella Enteritidis phage types: Analysis of variation in amino acid transport. Microbiology (Reading, England) 155, 3200–3213. https://doi.org/10.1099/mic.0.029405-0 (2009).
Lefterova, M. I., Suarez, C. J., Banaei, N. & Pinsky, B. A. Next-generation sequencing for infectious disease diagnosis and management: A report of the association for molecular pathology. J. Mol. Diagn. 17, 623–634. https://doi.org/10.1016/j.jmoldx.2015.07.004 (2015).
Meredith, L. W. et al. Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: A prospective genomic surveillance study. Lancet Infect. Dis. 20, 1263–1271. https://doi.org/10.1016/S1473-3099(20)30562-4 (2020).
Furuse, Y. Genomic sequencing effort for SARS-CoV-2 by country during the pandemic. Int. J. Infect. Dis. 103, 305–307. https://doi.org/10.1016/j.ijid.2020.12.034 (2021).
Zhou, Z. et al. The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Res. 30, 138–152. https://doi.org/10.1101/gr.251678.119 (2020).
Liu, X., Thungrat, K. & Boothe, D. M. Occurrence of OXA-48 carbapenemase and other β-lactamase genes in ESBL-producing multidrug resistant Escherichia coli from dogs and cats in the United States, 2009–2013. Front. Microbiol. 7, 1057. https://doi.org/10.3389/fmicb.2016.01057 (2016).
Ewers, C. et al. Genomic diversity and virulence potential of ESBL- and AmpC-β-lactamase-producing Escherichia coli strains from healthy food animals across Europe. Front. Microbiol. 12, 626774. https://doi.org/10.3389/fmicb.2021.626774 (2021).
Schaufler, K. et al. Genomic and functional analysis of emerging virulent and multidrug-resistant Escherichia coli lineage sequence type 648. Antimicrob. Agents Chemother. 63, e00243-19. https://doi.org/10.1128/aac.00243-19 (2019).
AbuOun, M. et al. Characterizing antimicrobial resistant Escherichia coli and associated risk factors in a cross-sectional study of pig farms in Great Britain. Front. Microbiol. 11, 861. https://doi.org/10.3389/fmicb.2020.00861 (2020).
Duggett, N. A. et al. Occurrence and characterization of mcr-1-harbouring Escherichia coli isolated from pigs in Great Britain from 2013 to 2015. J. Antimicrob. Chemother. 72, 691–695. https://doi.org/10.1093/jac/dkw477 (2017).
Duggett, N. et al. The importance of using whole genome sequencing and extended spectrum beta-lactamase selective media when monitoring antimicrobial resistance. Sci. Rep. 10, 19880. https://doi.org/10.1038/s41598-020-76877-7 (2020).
AbuOun, M. et al. A genomic epidemiological study shows that prevalence of antimicrobial resistance in Enterobacterales is associated with the livestock host, as well as antimicrobial usage. Microb. Genom. 7, 000630. https://doi.org/10.1099/mgen.0.000630 (2021).
Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239. https://doi.org/10.1186/s13059-016-1103-0 (2016).
Storey, N. et al. Use of genomics to explore AMR persistence in an outdoor pig farm with low antimicrobial usage. Microb. Genom. 8, 000782. https://doi.org/10.1099/mgen.0.000782 (2022).
Hendriksen, R. S. et al. Using genomics to track global antimicrobial resistance. Front. Public Health 7, 242. https://doi.org/10.3389/fpubh.2019.00242 (2019).
Stefani, S. et al. Meticillin-resistant Staphylococcus aureus (MRSA): Global epidemiology and harmonisation of typing methods. Int. J. Antimicrob. Agents 39, 273–282. https://doi.org/10.1016/j.ijantimicag.2011.09.030 (2012).
Friedrich, A. W. et al. A European laboratory network for sequence-based typing of methicillin-resistant Staphylococcus aureus (MRSA) as a communication platform between human and veterinary medicine—An update on SeqNet.org. Euro Surveill. 13, 18862 (2008).
Angers-Loustau, A. et al. The challenges of designing a benchmark strategy for bioinformatics pipelines in the identification of antimicrobial resistance determinants using next generation sequencing technologies. F1000Res https://doi.org/10.12688/f1000research.14509.2 (2018).
Doyle, R. M. et al. Discordant bioinformatic predictions of antimicrobial resistance from whole-genome sequencing data of bacterial isolates: An inter-laboratory study. Microb. Genom. 6, e000335. https://doi.org/10.1099/mgen.0.000335 (2020).
European Food Safety Authority, EFSA. Antimicrobial Resistance Shows No Signs of Slowing Down (2019).
European Food Safety Authority, EFSA. Antimicrobial Resistance in the EU: Infections with foodborne bacteria becoming harder to Treat. https://www.efsa.europa.eu/en/news/antimicrobial-resistance-eu-infections-foodborne-bacteria-becoming-harder-treat (2020).
World Health Organization. Global Antimicrobial Resistance and Use Surveillance System (GLASS) Whole-Genome Sequencing for Surveillance of Antimicrobial Resistance (2020).
Stubberfield, E. et al. Use of whole genome sequencing of commensal Escherichia coli in pigs for antimicrobial resistance surveillance United Kingdom, 2018. Euro Surveill 24, 1900136. https://doi.org/10.2807/1560-7917.ES.2019.24.50.1900136 (2019).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2 (1990).
Anonymous. One Health European Joint Programme. Antibiotic Resistance Dynamics: The Influence of Geographic Origin and Management Systems on Resistance Gene Flows Within Humans, Animals and the Environment (ARDIG). https://onehealthejp.eu/jrp-ardig/ (2020).
Neuert, S. et al. Prediction of phenotypic antimicrobial resistance profiles from whole genome sequences of non-typhoidal Salmonella enterica. Front. Microbiol. 9, 592. https://doi.org/10.3389/fmicb.2018.00592 (2018).
Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75, 3491–3500. https://doi.org/10.1093/jac/dkaa345 (2020).
Zankari, E. et al. PointFinder: A novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens. J. Antimicrob. Chemother. 72, 2764–2768. https://doi.org/10.1093/jac/dkx217 (2017).
Hunt, M. et al. ARIBA: Rapid antimicrobial resistance genotyping directly from sequencing reads. Microb. Genom. 3, e000131. https://doi.org/10.1099/mgen.0.000131 (2017).
World Health Organization. Critically Important Antimicrobials for Human Medicine: 6th Revision (2019).
European Food Safety Authority (EFSA) et al. Technical specifications on harmonised monitoring of antimicrobial resistance in zoonotic and indicator bacteria from food-producing animals and food. EFSA J. 17, e05709. https://doi.org/10.2903/j.efsa.2019.5709 (2019).
Anjum, M. F., Tucker, J. D., Sprigings, K. A., Woodward, M. J. & Ehricht, R. Use of miniaturized protein arrays for Escherichia coli O serotyping. Clin. Vaccine Immunol. 13, 561–567. https://doi.org/10.1128/CVI.13.5.561-567.2006 (2006).
European Food Safety Authority. Guidance on the assessment of bacterial susceptibility to antimicrobials of human and veterinary importance. EFSA Panel on Additives and Products or Substances used in Animal Feed (FEEDAP). EFSA J. 10, 2740 (2012).
EUCAST, TECoAST. Antimicrobial wild type distributions of microorganisms (2021).

Funding

Funding was from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No 773830, in the ARDIG project within the One Health European Joint Programme; MFA also received funding from the Veterinary Medicines Directorate, UK, project VM0533.

Author information

Authors and Affiliations

Animal and Plant Health Agency (APHA), Weybridge, UK
J. Nunez-Garcia, M. AbuOun, N. Storey & M. F. Anjum
Wageningen Bioveterinary Research (WBVR), Lelystad, The Netherlands
M. S. Brouwer & K. T. Veldman
Universidad Complutense de Madrid (UCM), Madrid, Spain
J. F. Delgado-Blas, C. Serna & B. Gonzalez-Zorn
Norwegian Veterinary Institute (NVI), Oslo, Norway
S. S. Mo, A. A. Telke & M. Sunde
Public Health England (PHE), London, UK
N. Ellaby & M. J. Ellington
Agence nationale de sécurité sanitaire de l’alimentation, de l’environnement et du travail (ANSES), Unité Antibiorésistance et Virulence Bactériennes, Maisons-Alfort, France
M. Haenni, P. Châtre & J. Y. Madec
German Federal Institute for Risk Assessment (BfR), Berlin, Germany
J. A. Hammerl
University of Surrey (UoS), Guildford, UK
M. Getino, R. La Ragione & M. F. Anjum
Institute Pasteur, EERA Unit, Paris, France
P. Glaser
Assistance Publique Hopitaux de Paris, Paris, France
T. Naas

Contributions

Conceptualisation: M.F.A., J.N-G., M.A., P.G., M.J.E., B.G-Z., M.S., R.L.R., M.S.B. Methodology/Initial Analysis: M.A., N.S., M.S.B., J.F.D-B., M.S.S., N.E., M.H., C.S., A.A.T., J.A.H., P.C., M.G. Statistical Analysis of WGS data: J.N-G. Writing (Original Draft Preparation): J.N-G., M.A. and M.F.A. Writing (Review and Editing): M.F.A., J.N-G., M.A., P.G., M.J.E., B.G-Z., M.S., R.L.R., M.S.B., N.S., J.F.D-B., M.S.S., N.E., M.H., C.S., A.A.T., J.A.H., P.C., K.T.V., J.Y.M., M.G., T.N. Funding: M.F.A., P.G., M.J.E., B.G-Z., M.S., R.L.R., M.S.B. and J.A.H.

Corresponding author

Correspondence to M. F. Anjum.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nunez-Garcia, J., AbuOun, M., Storey, N. et al. Harmonisation of in-silico next-generation sequencing based methods for diagnostics and surveillance. Sci Rep 12, 14372 (2022). https://doi.org/10.1038/s41598-022-16760-9

Received: 11 February 2022
Accepted: 14 July 2022
Published: 23 August 2022
DOI: https://doi.org/10.1038/s41598-022-16760-9
