Realising the promise of genomics to revolutionise identification and surveillance of antimicrobial resistance (AMR) has been a long-standing challenge in clinical and public health microbiology. Here, we report the creation and validation of abritAMR, an ISO-certified bioinformatics platform for genomics-based bacterial AMR gene detection. The abritAMR platform utilises NCBI’s AMRFinderPlus, as well as additional features that classify AMR determinants into antibiotic classes and provide customised reports. We validate abritAMR by comparing with PCR or reference genomes, representing 1500 different bacteria and 415 resistance alleles. In these analyses, abritAMR displays 99.9% accuracy, 97.9% sensitivity and 100% specificity. We also compared genomic predictions of phenotype for 864 Salmonella spp. against agar dilution results, showing 98.9% accuracy. The implementation of abritAMR in our institution has resulted in streamlined bioinformatics and reporting pathways, and has been readily updated and re-verified. The abritAMR tool and validation datasets are publicly available to assist laboratories everywhere harness the power of AMR genomics in professional practice.
Antimicrobial resistance (AMR) is an increasingly well-recognised threat to global health1,2,3. A clear understanding of the genomic and mechanistic basis for AMR is required to inform clinicians and public health teams, from the level of individual patients through to population-level surveillance4,5. By providing additional, timely data on acquired AMR genes or gene mutations that confer resistance, genomic sequencing has the potential to significantly enhance AMR surveillance and inform patient treatment beyond conventional phenotypic susceptibility testing methods6,7.
The use of genomics in the detection and surveillance of bacterial AMR is lagging behind other applications of genomics, such as strain typing and phylogenetic analysis. Contributors to the lack of uptake include the fact that phenotypic testing can be performed more rapidly than genotypic testing for many common pathogens, and the correlation between genotype and phenotype can be variable due to incomplete knowledge of AMR mechanisms that impact function4,8. However, technological advances in whole genome sequencing (WGS) mean the process is becoming more cost-effective and the turnaround time for sequencing a microbial genome is decreasing significantly9.
A lack of international standards for genomic detection of AMR mechanisms means it is difficult to compare results between laboratories10,11. To facilitate implementation, the development of standardised and extensive open-access AMR databases and the validation of bioinformatic analytical tools for the detection of AMR are crucial4,12. Another hurdle to the acceptance and implementation of AMR genomics is how the data can be meaningfully reported outside of research or reference laboratory settings4. If WGS is going to be accepted for the detection of AMR, it is important to consider the way in which complex genomic data are presented to clinicians, nurses, public health surveillance teams, and other stakeholders with varying understanding of genomics, and thus how findings are interpreted13. This is a gap in the bioinformatic tools currently available for AMR, as outputs are not usually tailored for clinical reports, or easily modifiable to suit local reporting requirements.
In many countries, clinical and public health microbiology laboratories are required to meet International Organization for Standardization (ISO), or ISO-equivalent, standards to be accredited to operate14. These standards require the implementation of standardised operating procedures, quality management systems, staff training and rigorous validation of all processes used to generate results and reports in each laboratory15. Currently, the relevant standards for medical laboratories (last released in 2012) are not designed to assess the performance of bioinformatic tools; this, together with the paucity of publicly-available validation datasets, makes validation to meet these standards difficult for laboratories4.
Here we design and validate a bioinformatic tool, abritAMR, a wrapper for the NCBI AMRFinderPlus tool16 for the detection of AMR determinants from whole genome sequencing data, with outputs adapted for clinical and public health microbiology reporting. We envisage that this pipeline, extensive validation methods and validation dataset could be adopted for use in public health and clinical sequencing laboratories to assist those involved in AMR surveillance and clinical applications.
The abritAMR bioinformatics pipeline
abritAMR is a pipeline for characterisation and reporting of AMR determinants from bacterial sequences, adapting the AMRFinderPlus tool and database for use in clinical and public health microbiology. Using the outputs from AMRFinderPlus, AMR mechanisms are further classified by antimicrobial class and/or mechanism to suit clinical and public health microbiology (CPHM) needs, and subsequently filtered according to local reporting requirements, with results ready for incorporation into sample reports (overview Fig. 1, outputs Fig. 2; further details available in Methods and Supplementary Figure 1). An additional module generates inferred susceptibility results (currently validated for Salmonella spp.; output Fig. 2, reporting logic detailed in Supplementary Figure 2).
Validation of the abritAMR pipeline: overview
To validate the abritAMR pipeline, we compared performance to PCR results for key AMR genes, to AMRFinderPlus results on synthetic read data from reference genomes, and to phenotypic data for Salmonella spp. (Fig. 3).
abritAMR performed very well against the four validation panels, with an overall accuracy of 99.9% (95% CI 99.9–99.9%), sensitivity 97.9% (97.5–98.4%), and specificity 100% (100–100%) (Table 1). Importantly, the abritAMR pipeline was reliable for the high-risk AMR gene classes that are notifiable as part of our national critical antimicrobial resistance surveillance system (CARAlert) in Australia17, with 99.9% accuracy (95% CI 99.9–100%), 98.9% sensitivity (98.3–99.3%) and 100% specificity (100–100%) across these classes (carbapenemases, 16S ribosomal methyltransferases, mobile colistin resistance genes, ESBLs (including AmpCs), vancomycin resistance genes, and oxazolidinone and phenicol resistance (optrA, cfr and poxtA genes)).
Validation results compared to PCR and Sanger sequencing
The abritAMR pipeline was highly accurate compared to PCR (carbapenemase, ESBL, van and mec gene PCRs) with 1179/1184 (99.6%) resistance genes correctly detected, and compared well to Sanger sequencing (carbapenemase allele calling) with 355/356 (99.7%) alleles correctly identified by WGS. After discrepancy resolution (including repeat PCR and/or WGS, or examination of partial genes detected by abritAMR), five discrepancies between PCR and WGS results remained. Three were potential false negatives (PCR positive, WGS negative), consisting of one CTX-M and two CMY genes not detected by abritAMR; at least one of these was due to the presence of a contig break in the gene leading to smaller fragments not detected by AMRFinderPlus. The other two were potential false positives (PCR negative, WGS positive), one CMY-42 and one IMP-62, confirmed by repeat PCR and sequencing; both genes were reported to be within the inclusivity range of the assay as per the manufacturer’s instructions, although this was validated by in silico PCR by the manufacturer (and observed in our dataset). Alternatively, these discrepancies may be due to plasmid dropout in culture (which is commonly observed with suspected CPE isolates such as these), as different colonies with and without the ESBL/AmpC gene may have been picked for PCR and WGS. No discrepant results were detected for mecA and van gene detection, and only one allele was incorrectly assigned compared to Sanger sequencing (99.7% accuracy) (Table 1 and Fig. 4). Overall performance of abritAMR against PCR yielded 99.6% accuracy (95% CI 99.0–99.9%), 99.6% sensitivity (99.0–99.9%) and 99.4% specificity (97.9–99.9%).
Identification of AMR genes from synthetic reads
The presence or absence of 415 AMR genes across 321 genomes (133215 alleles) was tested by running abritAMR on synthetic reads from complete reference genomes, and comparing to the (native) AMRFinderPlus results on the complete genome, considered the ‘gold standard’ (Fig. 3). Overall accuracy of AMR gene detection by abritAMR was excellent, with 133127/133215 alleles called correctly, resulting in 99.9% accuracy (95% CI 99.9–99.9%), 97.5% sensitivity (96.9–98.0%), and 100% specificity (100–100%) (Fig. 5). Note that any discrepancies here include differences in abritAMR performance compared to AMRFinderPlus, as well as differences between complete genomes and (synthetic) short-read data, which likely accounts for at least a proportion of the discrepant results.
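As a worked check of the headline figure, the accuracy can be recomputed from the reported counts (133127 correct calls out of 133215). The sketch below also attaches a 95% Wilson score interval. The choice of interval method is an assumption made for illustration, since the method used for the reported confidence intervals is not stated here, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumed method for illustration; the reported intervals may have been
    computed differently (for example with an exact binomial method).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the synthetic-read comparison described above.
correct, total = 133127, 133215
accuracy = correct / total
low, high = wilson_interval(correct, total)
print(f"accuracy = {accuracy:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With this many calls the interval is very narrow, which is why the lower and upper bounds both round to the same value as the point estimate.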
The majority of discrepancies were false negatives, with the aminoglycoside AMR genes being most common (32/88, 36.4%), especially the aac(6’)-Ib family, implicated in 18 false negatives, and specifically the aac(6’)-Ib-cr5 allele (11/18). Some of these were detected as partial genes at the site of contig breaks, possibly related to slightly higher GC content (leading to lower sequence coverage) in these genes. The other major theme was difficulty resolving sequences with multiple alleles of the same gene family, which were often collapsed into a single gene detection by abritAMR or miscalled as a different allele. For example, this included a sequence with CTX-M-3, CTX-M-14 and CTX-M-65 identified by AMRFinderPlus, and called as CTX-M-3 and CTX-M-24 by abritAMR. Four out of the five ‘false positive’ detections were actually allele miscalls within the same gene family. Collapse of repeated regions or duplicate alleles is often a feature of short-read sequencing, hence the discrepancies here may be a feature of comparing (synthetic) short-read data to complete genomes, rather than a feature of abritAMR. Notably, use of alternative genome assembly tools (SKESA18 and SPAdes19) did not resolve the discrepancies, with similar performance to Shovill20 (based on SPAdes; Supplementary Table 1).
Limit of detection and precision
The limit of detection of the abritAMR pipeline was assessed to determine the minimum average sequencing depth for acceptable accuracy of AMR gene detection (as required for clinical microbiology validation and accreditation). Accuracy was found to be consistent (99.9%) across the 40X to 150X range, with 40X being the minimum coverage accepted by our accredited quality control (QC) pipeline. Repeatability and reproducibility (precision) were assessed (replicates within and across sequencing runs) and found to be 100% concordant.
Validation of inferred antibiogram (Salmonella spp.)
Validation of inferred phenotype against phenotypic AST data demonstrated 98.9% accuracy (95% CI 98.7–99.1%), 98.9% sensitivity (98.4–99.3%) and 98.9% specificity (98.7–99.1%) overall (Table 1, Supplementary Table 2 and Fig. 6). Accuracy of phenotypic inference was ≥98% for 11/13 antimicrobials (85%), with lower accuracy identified for streptomycin (95.5%, 95% CI 93.7–96.9%) and ciprofloxacin (96.8%, 95% CI 95.4–97.8%), similar to previous findings using different bioinformatic methods21.
A number of ‘false positive’ results were identified for streptomycin (resistant genotype [AMR genes or mutations detected], susceptible phenotype; n = 30/716 (4.2%) isolates). The AMR genes detected in phenotypically susceptible isolates were also detected in non-susceptible isolates, although the non-susceptible isolates more often had >1 AMR gene (1 AMR gene, 22% phenotypically resistant; 2 or more AMR genes, 81.4% phenotypically resistant), suggesting that these AMR mechanisms had small but additive effects on phenotype. Evaluation of phenotype-genotype concordance for azithromycin identified five ‘false negatives’ (resistant phenotype, no AMR mechanism detected) and two ‘false positives’ (AMR mechanism detected but phenotypically susceptible). Neither false-positive isolate carried the dominant resistance mechanism for azithromycin in Salmonella spp. (mph(A)); one carried mef(B), an efflux pump with variable activity, and one carried ere(A), an esterase with lower affinity for azithromycin22.
Similar to streptomycin, ciprofloxacin also had a number of phenotype-genotype mismatches, likely due to low-level resistance conferred by AMR mechanisms. Isolates with one AMR gene most often had an intermediate phenotype (81.3% intermediate, 12.4% resistant, 6.2% susceptible), whilst isolates with ≥2 AMR genes were all phenotypically resistant. Despite these discordances, the best correlation between genotype and phenotype (S/I/R) was determined according to the number of AMR mechanisms detected (any type). This was coded into the reporting logic: absence of AMR mechanisms, ‘susceptible’, one AMR mechanism, ‘intermediate’, two or more AMR mechanisms, ‘resistant’. In this application, the use of an ‘intermediate’ category implies that MICs are likely to be borderline for these samples, i.e. may test susceptible or resistant on AST. Note that these results are only used for epidemiologic purposes (not for patient treatment), and hence over-calling resistance is more suitable for this purpose than non-detection of AMR mechanisms.
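The ciprofloxacin reporting rule described above (no mechanism, one mechanism, two or more mechanisms) can be written as a small function. This is a minimal sketch of that rule only; the function name is hypothetical and it is not taken from the abritAMR code base.

```python
def infer_ciprofloxacin_phenotype(n_mechanisms: int) -> str:
    """Map the number of detected AMR mechanisms (any type) to an inferred category.

    Follows the rule stated in the text: 0 -> susceptible, 1 -> intermediate,
    2 or more -> resistant. Intended for epidemiological reporting only.
    """
    if n_mechanisms < 0:
        raise ValueError("number of mechanisms cannot be negative")
    if n_mechanisms == 0:
        return "susceptible"
    if n_mechanisms == 1:
        return "intermediate"
    return "resistant"


for count in (0, 1, 2, 3):
    print(count, infer_ciprofloxacin_phenotype(count))
```

Counting mechanisms rather than weighting individual genes keeps the rule simple and deliberately errs towards over-calling resistance, in line with the surveillance purpose stated above.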
Sample outputs and incorporation into microbiology report
Two different outputs are used in the abritAMR pipeline: (i) Detailed Report output, where AMR genes or mutations are shown by enhanced subclass, as classified by the abritAMR database, and (ii) Final AMR Gene Report output (binned into reportable and non-reportable genes) after reporting logic is applied (Fig. 2 and Supplementary Data 1). An example of the output of the additional module incorporating mutational resistance and Inferred Antibiogram Report phenotype (currently validated for Salmonella spp.) is also shown in Supplementary Data 1.
After the validation process, implementation processes included modifying report outputs, integration with the existing LIMS, and documentation of the standard operating procedure (SOP). Multiple groups were consulted on the proposed report outputs, including reporting scientists with domain expertise (to ensure results were easily interpretable and met reporting obligations across different pathogens), quality management staff (to ensure results met legal requirements), and end-users (public health teams and clinicians). Subsequently, all staff involved in detection, reporting or interpretation of AMR results were trained in the use and interpretation of abritAMR, and clients were educated about the change (although only minimal differences were noticeable to clients, such as change in report formats) before full implementation into routine workflows. Implementation led to streamlined workflows, including rapid bioinformatic processing of large sequencing runs (AMR gene detection for a 96-sample run completed in <3 min with 256 CPUs), and less manual re-classification of AMR gene results by laboratory scientists (e.g. moving genes between reportable and non-reportable fields, removing intrinsic AMR genes from reportable fields).
The use of genomics in clinical and public health microbiology (CPHM) has increased substantially in the last decade, particularly in the fields of pathogen typing and outbreak investigations5,7,23. Detection of AMR from WGS data has somewhat lagged behind other applications of WGS, likely due to its inherent complexity in comparison to simple and effective phenotypic AST4. This complexity is multi-faceted but includes the vast array of resistance mechanisms for testing (a single phenotype may be encoded by many different AMR mechanisms), and the limitations of phenotype-genotype correlation, particularly for less-common organisms and drug classes24,25. If not addressed systematically, these issues may render genomic AMR difficult to identify comprehensively across all pathogens seen in a CPHM laboratory, and difficult to communicate to clinicians and public health units13,26.
Globally, the paucity of highly accurate, reproducible bioinformatic tools for detection of AMR mechanisms has been recognised as one of the main limiting factors to wider application of genomics in the CPHM setting4,27. Here, we have designed and validated a bioinformatic platform for genomic detection of AMR determinants across bacterial species focusing on reporting requirements for clinical and public health microbiology, performed a rigorous validation, and implemented it to achieve an ISO-certified genomic workflow for AMR. This was achieved by adapting an existing software tool and database (AMRFinderPlus), and adding a modified classification step plus reporting logic to produce tailored reports for a CPHM audience.
This platform relies heavily on the comprehensive, well-curated and frequently updated AMR database behind AMRFinderPlus, as well as the excellent software tool, which uses multiple search methods to best identify AMR genes and mutations (with results annotated by the type of ‘match’, to allow scientists and clinicians to understand the degree of confidence behind each call)16. Notably, outputs for other large AMR databases, such as CARD28 and ResFinder29, could be modified to achieve similar tailored reports to abritAMR; our choice was based on the ease of integration into our existing workflows and reporting. abritAMR’s speed allows for rapid detection of AMR genes in routine high-throughput workflows, with AMR gene detection completed on a 96-sample sequencing run within 3 min. The addition of mutations to AMRFinderPlus for an increasing number of species has been very useful in our early applications, enabling our public health laboratory to move to a fully-genomic workflow for Salmonella surveillance, as all samples were sequenced for typing and phylogenetic analysis (a sub-sample of isolates still undergo AST to ensure new AMR mechanisms are detected). The capacity to include AMR genes or mutations of local significance would be a welcome addition to AMRFinderPlus, further extending its utility.
When this work commenced, there were no classifications of AMR mechanisms into drug classes in the AMRFinderPlus database, hence we created our own classification database, which has also evolved in parallel with the great advances made by the AMRFinderPlus team. There are (now) a small number of essential differences in the drug class classifications that we feel are important to enhance its utility for CPHM. Key examples include separating carbapenem resistance into different groups based on their mechanisms; separating into ‘carbapenemase’, ‘carbapenemase (MBL)’ and ‘carbapenemase (OXA-51 family)’ enables reporting each group separately, as antibiotic choices differ with metallo-beta-lactamases (MBLs) compared to non-MBL carbapenemases, and OXA-51 family carbapenemases are weak and intrinsic to Acinetobacter spp. and routine reporting is not required (coded as part of reporting logic). This combination of tailored classification and reporting logic allows the vast and complex array of AMR mechanisms to be distilled into results and reports that can be understood by scientists, clinicians and public health teams alike, without a great deal of prior knowledge. Future development will focus on restructuring the database to include different levels (classes, subclasses) to take advantage of the higher resolution of classifications now included in AMRFinderPlus.
Notably, the key limitations of AMRFinderPlus are also limitations of both ResFinder30 and CARD-RGI31 tools in different ways; ResFinder classifies AMR determinants into a small number of antimicrobial classes, but lacks the resolution needed for the more difficult classes described above (particularly beta-lactams), whilst CARD-RGI maintains an ontologic focus, where antimicrobial targets and mechanisms are identified at varying levels, but not grouped in a way that facilitates CPHM reporting and clinician understanding. However, both tools offer the accessibility of a graphical user interface (GUI) and the option of using raw reads as inputs for analysis, which are particularly important considerations for laboratories without dedicated bioinformatic expertise. Ideally, all large AMR databases and tools should facilitate clinically-relevant reporting of AMR determinants through ‘interpretation’ of outputs for clinical needs, and modifiable reporting logic to tailor outputs to reporting requirements, whilst maintaining a balance between accessibility and accuracy to enable validation and accreditation.
In a CPHM setting, it is critical to validate any new test or analytical process to ensure the veracity of results, and that the results (outputs or reports in this case) are fit-for-purpose12. However, formal test validation and accreditation procedures are based on wet-lab assays, and not always easily transferable to new methods such as WGS32. This may require some creative thinking about different ways to validate a new genomic test33, as demonstrated here with the use of synthetic sequencing reads generated from complete reference genomes. Ideally, a broad range of publicly-available reference datasets with genotypic and phenotypic data would be made freely available to assist with validation and bench-marking for databases, tools and new pipelines such as this, greatly advancing the development of AMR genomics4,34. Initiatives such as the NCBI National Database of Antibiotic Resistant Organisms (NDARO)35 and PATRIC36 are promising, but currently limited in scale, and further global data sharing is required here to advance phenotype-genotype correlations. Here, we have contributed a dataset that may be used for validation of genomic detection of AMR determinants against PCR results, and a method for validating against synthetic genomic data, to assist other laboratories to validate their own AMR workflows.
As with all attempts to validate new WGS pipelines and workflows, our study has limitations. The absence of a ‘gold-standard’ dataset against which to compare results from our pipeline means that we must compare to imperfect standards, such as existing testing methods with lower resolution (PCR), and use synthetic sequencing data to compare targets not covered by PCR in our laboratory. Until these issues are addressed globally, laboratories will have to persist with these challenging comparisons, and rely on new initiatives such as proficiency testing programs (PTPs) for WGS (with participation being a requirement for test accreditation in our setting) to start to standardise results across laboratories and countries. Whilst abritAMR was highly accurate overall, a small proportion of discrepant results were identified, of which the majority were false negative results. Most of these discrepancies are likely due to the comparison of synthetic short-read data to complete genomes, where contig breaks within a gene result in non-detection, or plasmid dropout in culture, leading to non-identification of plasmid-borne AMR determinants in WGS data. In our validation, use of a different genome assembly tool had minimal impact, although it is critical to include this as a consideration in the validation process, particularly where discrepancies are present.
We envisage that the abritAMR pipeline will most likely be applied in CPHM settings, and hope that it may assist sequencing laboratories address the difficult question of how to best report these data to clinicians and public health teams with limited AMR knowledge. However, it may also have utility in other settings including research, particularly where complex AMR data need to be binned into functional classes to facilitate understanding when the user is less familiar with AMR. In our view, it is critical for medical microbiologists, scientists and bioinformaticians to continue to work together to navigate the challenges of communicating complex AMR data to clients, to advance the reach of genomic AMR and maximise the benefits of this potentially transformative technology.
Setting and existing genomics workflow
The Microbiological Diagnostic Unit Public Health Laboratory (MDU PHL) is a state reference laboratory for bacterial pathogens, including carbapenemase-producing Enterobacterales (CPE), Acinetobacter spp., Pseudomonas spp., vancomycin-resistant enterococci (VRE) and enteric pathogens37,38,39. The laboratory has a strong emphasis on genomics, primarily for epidemiologic surveillance, with increasing applications for clinical purposes. In conjunction with the Department of Health Victoria, we have embarked upon a broad program to increase implementation of pathogen genomics for public health purposes, either enhancing or superseding current laboratory methods.
Our existing genomics workflow (incorporating sample receipt, nucleic acid extraction, library preparation, short-read sequencing (Illumina NextSeq or MiSeq), and quality control (QC) of reads including de novo genome assembly) has already been validated and accredited by the National Association of Testing Authorities Australia (NATA, analogous to Clinical Laboratory Improvement Amendments [CLIA] in USA)40,41,42. Details of this workflow can be found in Supplementary Methods. Briefly, single colonies from overnight pure bacterial sub-cultures were selected and placed in lysis buffer. DNA extraction was performed on the QIAsymphony using the DSP Virus/Pathogen Mini Kit, and library preparation performed using Nextera XT (Illumina Inc.) according to manufacturer’s instructions. WGS was performed on NextSeq 500/550 or MiSeq platforms (Illumina Inc.), generating 150 bp or 300 bp paired-end reads respectively. Reads were assembled de novo using Shovill20. QC requirements for fastq reads to be included in subsequent analysis were (i) Q-score ≥30, (ii) data with a minimum estimated average genome coverage of >40X, and (iii) estimated genome size within range for observed species (see Supplementary Methods for detailed descriptions).
The abritAMR bioinformatics pipeline
The aims for development of this bioinformatic pipeline were to detect AMR genes and mutations accurately and reliably from bacterial whole genome sequencing (WGS) data, which could be validated against PCR and other data sources, implemented in a public health or clinical microbiology laboratory, and successfully accredited by governing bodies. The abritAMR bioinformatic platform takes a genome assembly from short-read data, long-read data or hybrid assemblies (fasta file) as input (once it has met defined QC parameters), and includes five main components (Fig. 1):
NCBI’s AMRFinderPlus tool (https://github.com/ncbi/amr) – abritAMR implements this tool to identify AMR genes in genome sequences, using a combination of BLASTx (matching the translated nucleotide sequence of the query isolate against the protein sequences of AMR genes) and Hidden Markov Models (HMMs)16.
NCBI’s AMRFinderPlus database – abritAMR uses this frequently updated database (https://github.com/ncbi/amr/wiki/AMRFinderPlus-database), which is a comprehensive and extensively curated database of AMR gene sequences. Current functionality includes mainly AMR genes (‘core’ database), with point mutations (species-specific) and virulence genes increasingly being included in the ‘plus’ database. In more recent iterations, AMR genes and point mutations include information about the antimicrobial class and subclass (or specific antimicrobials) that they confer resistance to.
Classification database – While the AMRFinderPlus database includes some information about the antibiotic class and subclass affected for each AMR gene, these classifications are not always easily translatable for clinical and public health practice. For example, the beta-lactam subclass ‘cephalosporin’ includes AMR genes conferring resistance to first-generation cephalosporins (narrow-spectrum cephalosporinases, such as blaOXA-1), or third-generation cephalosporins (such as blaCTX-M ESBLs), which have very different implications for AMR surveillance and patient management. The local abritAMR classification database is based on the current version of the AMRFinderPlus database, with an added field (‘Enhanced subclass’) to translate the NCBI subclasses into more functional versions for our purposes (logic detailed in Supplementary Table 3, examples in Fig. 2). This field is updated following each new database release (logic detailed in Supplementary Table 4).
Species-specific reporting logic (AMR genes, all species) – Currently, most AMR genes detected by this pipeline are not required to be reported for surveillance or clinical purposes; reporting data on all AMR genes found in an isolate runs the risk of overwhelming clients with unnecessary data and missing the most pertinent AMR genes detected. As such, we developed a reporting logic process to filter the AMR genes detected in each isolate into ‘reportable’ or ‘non-reportable’ categories, to mirror the usual reporting requirements of diagnostic laboratories (Supplementary Figure 1). This logic takes into account the species when determining what is reportable, limiting the reporting of intrinsic AMR genes (such as blaOXA-51 subtypes in Acinetobacter baumannii), and differentiating between AMR genes that are only reportable in certain species (e.g. ESBL genes reportable for national surveillance of Salmonella spp.), while always reporting significant AMR genes that are not limited by species (e.g. carbapenemase and mcr genes). Non-reportable genes are also made available to the reporting pathologists and senior scientists and recorded in the laboratory information management system (LIMS), enabling detailed review of all detected AMR genes, correlation with phenotype, and movement between reportable and non-reportable categories when required as part of any routine results review process before reporting.
Inferred phenotype (AMR genes and mutations, validated species only) – The pattern of AMR genes and mutations detected can be used to infer phenotype for a given isolate. In abritAMR, this is currently validated for Salmonella spp., and reported for epidemiologic purposes in our laboratory, replacing routine antimicrobial susceptibility testing (AST) of Salmonella spp. for public health surveillance (reporting logic detailed in Supplementary Figure 2, Inferred Antibiogram Report example shown in Supplementary Data 1).
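The species-specific reporting logic described in the components above can be illustrated with a toy filter. This is a sketch under stated assumptions, not the abritAMR implementation: the rule tables, the subclass labels (including 'ESBL' and 'Aminoglycoside') and the function name are hypothetical, and they encode only the three examples given in the text (intrinsic blaOXA-51 in Acinetobacter baumannii, ESBL genes in Salmonella spp., and carbapenemase and mcr genes in any species).

```python
# Hypothetical rule tables, keyed by enhanced subclass (illustrative only).
ALWAYS_REPORTABLE = {"Carbapenemase", "Carbapenemase (MBL)", "Colistin"}
REPORTABLE_BY_GENUS = {"Salmonella": {"ESBL"}}
INTRINSIC_BY_SPECIES = {"Acinetobacter baumannii": {"Carbapenemase (OXA-51 family)"}}


def split_reportable(species: str, genes: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Bin (gene, enhanced_subclass) pairs into reportable and non-reportable lists."""
    genus = species.split()[0]
    intrinsic = INTRINSIC_BY_SPECIES.get(species, set())
    species_specific = REPORTABLE_BY_GENUS.get(genus, set())
    reportable, non_reportable = [], []
    for gene, subclass in genes:
        if subclass in intrinsic:
            # Intrinsic genes are kept for review but not reported.
            non_reportable.append(gene)
        elif subclass in ALWAYS_REPORTABLE or subclass in species_specific:
            reportable.append(gene)
        else:
            non_reportable.append(gene)
    return reportable, non_reportable


acinetobacter = split_reportable(
    "Acinetobacter baumannii",
    [("blaOXA-51", "Carbapenemase (OXA-51 family)"), ("blaIMP-62", "Carbapenemase (MBL)")],
)
salmonella = split_reportable(
    "Salmonella enterica",
    [("blaCTX-M-3", "ESBL"), ("aac(6')-Ib-cr5", "Aminoglycoside")],
)
print(acinetobacter)
print(salmonella)
```

Keeping the rules in data tables rather than in branching code is one way to let a laboratory adjust reportable categories to local requirements without touching the filtering function.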
abritAMR outputs include a Detailed Report output, consisting of a table (comma-separated values file) of AMR genes or mutations detected for each sample, listed by enhanced subclass (e.g. “Carbapenemase (MBL)”, “Colistin”), and a Final AMR Gene Report, a table of AMR genes detected for each sample binned into ‘reportable’ or ‘not reportable’ fields once the species-specific reporting logic is applied (Fig. 2). Additionally, when run on validated species (currently Salmonella spp.), abritAMR also produces an Inferred Antibiogram Report. All alleles listed in these outputs are either ‘exact matches’ (100% identity and 100% sequence coverage compared to the reference protein sequence) or ‘close matches’ (90-<100% identity and 90-<100% sequence coverage compared to the reference protein sequence, marked by an asterisk [*] to distinguish from exact matches), as defined by AMRFinderPlus. Partial matches (>90% identity, 50-<90% coverage compared to reference protein sequence) are listed separately, and must be examined further to determine whether they are suitable for reporting. Where an internal stop codon (i.e. truncated gene) or HMM match is recorded by AMRFinderPlus, no result is reported by abritAMR. Examples of abritAMR pipeline outputs are shown in Fig. 2, demonstrating how the AMRFinderPlus output is modified by abritAMR (binned into enhanced subclass according to abritAMR’s classification database, and separated into reportable and non-reportable categories by the reporting logic).
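The match categories above can be expressed as a short classifier over percent identity and percent coverage. This is an illustrative sketch using the thresholds quoted in the text; the function names are hypothetical, and the handling of boundary cases the text does not spell out (for example 100% identity with 95% coverage, treated here as a close match) is an assumption. In practice the match type reported by AMRFinderPlus itself would be used.

```python
from typing import Optional


def classify_match(identity: float, coverage: float) -> str:
    """Classify a hit by percent identity and percent coverage of the reference protein."""
    if identity == 100 and coverage == 100:
        return "exact"
    if identity >= 90 and coverage >= 90:
        # Assumption: anything at or above 90/90 that is not exact counts as close.
        return "close"
    if identity > 90 and 50 <= coverage < 90:
        return "partial"
    return "none"


def format_allele(name: str, identity: float, coverage: float) -> Optional[str]:
    """Return the allele as it would appear in a report, or None if not listed there.

    Close matches carry an asterisk; partial matches are handled separately.
    """
    kind = classify_match(identity, coverage)
    if kind == "exact":
        return name
    if kind == "close":
        return name + "*"
    return None


print(format_allele("blaCTX-M-3", 100, 100))
print(format_allele("blaCTX-M-3", 99.5, 100))
print(classify_match(95, 60))
```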
Validation of the abritAMR pipeline
To validate abritAMR, results from the pipeline were compared to results from PCR testing, Sanger sequencing and synthetic read sets as detailed below. For the purposes of validation, both ‘exact’ and ‘close’ matches were considered as ‘detected’. Pre-specified sensitivity and specificity thresholds were defined for successful validation prior to analysis.
All isolates used in validation were obtained as part of routine AMR surveillance under public health laboratory functions, and hence were exempt from requiring ethics approval. Data were de-identified for the validation study (no patient or clinical data were used).
The PCR validation dataset included 1184 bacterial isolates (42 species) that had previously been tested by PCR, including a carbapenemase and ESBL real-time multiplex PCR (n = 1020 isolates, AusDiagnostics 16-well CRE panel, catalogue no. 21098, version 03; Sydney, Australia), van gene PCR (n = 121, in-house assay for vanA, vanB, vanC1 and vanC2/3 genes43) and mecA PCR (n = 43, in-house assay for mecA44) (Supplementary Figures 3–6).
PCR and Sanger sequencing for allelic variants
This dataset included 347 isolates (20 species) with carbapenemase resistance genes detected by a range of carbapenemase and ESBL PCR assays across six different carbapenemase resistance gene families (targets and primers detailed in Supplementary Table 5), with Sanger sequencing subsequently performed to identify the carbapenemase allelic variant (Supplementary Figures 7 & 8).
For the remaining AMR gene targets where PCR was not readily available to compare with abritAMR, we created synthetic short-read sequence data from complete, publicly available genomes from RefSeq or GenBank, and compared abritAMR results on the synthetic short reads to (native) AMRFinderPlus results from the complete genomes. To do this, we generated synthetic 150 bp paired-end reads using the art-illumina tool45 to fragment the complete genome sequences, incorporating error profile data from a NextSeq500 sequencer, at 40X to 150X average genome coverage (40X is the minimum coverage accepted for QC) (Fig. 3). This dataset comprised 321 isolates (49 species) covering 415 unique AMR alleles from 43 resistance subclasses (Supplementary Table 3 and Figures 9 & 10). Because abritAMR is built on AMRFinderPlus, this design allowed direct comparison of the presence or absence of AMR genes, avoiding the discrepancies in AMR gene nomenclature that may lead to false discordance when two different AMR gene databases are compared.
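As a rough guide to what these coverage levels mean in practice, the number of 150 bp read pairs needed for a given average coverage follows from coverage = (read pairs × 2 × read length) / genome length. The sketch below is illustrative only: the function name and the 5 Mb genome length are hypothetical, and it does not reproduce how art-illumina itself samples reads.

```python
import math


def read_pairs_for_coverage(genome_length_bp: int, coverage: float,
                            read_length_bp: int = 150) -> int:
    """Number of read pairs needed to reach a target average genome coverage."""
    if genome_length_bp <= 0 or coverage <= 0 or read_length_bp <= 0:
        raise ValueError("all inputs must be positive")
    bases_needed = genome_length_bp * coverage
    bases_per_pair = 2 * read_length_bp  # two reads per paired-end fragment
    return math.ceil(bases_needed / bases_per_pair)


# Hypothetical 5 Mb genome at the lower and upper coverage bounds.
pairs_40x = read_pairs_for_coverage(5_000_000, 40)
pairs_150x = read_pairs_for_coverage(5_000_000, 150)
```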
To assess analytical precision (repeatability and reproducibility), we compared abritAMR results from a test panel of 13 organisms (12 genera, Supplementary Table 6) that were sequenced multiple times (both within and across sequencing runs) using different sequencing platforms in our laboratory (NextSeq and MiSeq), a range of sequencing modes (low, mid and high throughput) and a range of read lengths (75–300 bp).
Determination of limit of detection
The limit of detection for molecular assays is normally the lowest amount of nucleic acid target that can be detected by the assay. This definition is not strictly applicable to whole genome sequencing, as the WGS assay is qualitative and a standardised DNA concentration is used in the sequencing reaction. Instead, the limit of detection in this context was calculated as the minimum average coverage across the genome required for accurate detection of gene targets or allele variants. Synthetic paired-end reads (150 bp) were generated at a range of sequencing coverages, from the minimum average coverage accepted for our routine QC (40X) up to 150X coverage.
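One way to operationalise this coverage-based limit of detection is to take the lowest tested coverage at which every expected allele is detected, and remains detected at all higher coverages tested. The sketch below assumes that reading of the definition; the function name, the input structure and the coverage values used in the usage example are hypothetical and are not taken from the abritAMR code.

```python
from typing import Dict, Optional, Set


def limit_of_detection(expected: Set[str],
                       detected_by_coverage: Dict[int, Set[str]]) -> Optional[int]:
    """Lowest coverage from which all expected alleles are detected at that depth
    and at every higher depth tested; None if the highest depth already fails."""
    lod = None
    # Walk from the highest depth downwards and stop at the first failure.
    for cov in sorted(detected_by_coverage, reverse=True):
        if expected <= detected_by_coverage[cov]:
            lod = cov
        else:
            break
    return lod


# Hypothetical results for one synthetic read set.
results = {
    40: {"vanA"},
    60: {"vanA", "mecA"},
    100: {"vanA", "mecA"},
    150: {"vanA", "mecA"},
}
lod = limit_of_detection({"vanA", "mecA"}, results)
```

In this made-up example the limit of detection would be 60X, because mecA is missed at 40X.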
Determination of inferred phenotype (Salmonella spp.)
We validated phenotypic inference (Susceptible/Intermediate/Resistant, S/I/R) against an existing dataset of 864 sequenced Salmonella spp. with antimicrobial susceptibility testing (AST) data generated by agar dilution from 2018 to 2019. For the fluoroquinolone drug class, the S/I/R phenotypes associated with combinations of AMR genes and mutations were analysed to determine the relative weighting of each AMR mechanism, so that a phenotype could be inferred most reliably from in silico analysis.
Discordant result resolution
Discordant results were divided into two categories. Firstly, PCR negative, WGS positive (false positive): this may be due to the AMR gene detected by WGS not being included in the range of the PCR panel. If the gene was known to be included in the range of the PCR panel (as stated by the manufacturer), the isolate was retested by PCR and WGS to resolve the discrepancy. Secondly, PCR positive, WGS negative (false negative): this may be due to an AMR gene being fragmented across two or more contigs, hence partial matches were assessed; if no partial matches were found, the sequence was interrogated using alternative tools; if this failed to resolve the discrepancy, the isolate was retested by PCR and WGS. Where possible, discrepancies between phenotypic and genotypic results were investigated through repeat phenotypic testing and/or repeat sequencing of the isolate.
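The resolution steps for PCR versus WGS discordance can be summarised as a simple decision function. This is a sketch of the workflow as described in the text, not code from abritAMR: the function name, arguments and returned labels are hypothetical, and `None` is used here to mean that a step has not yet been performed.

```python
from typing import Optional


def next_step(pcr_positive: bool, wgs_positive: bool,
              gene_in_panel_range: bool = True,
              partial_match_found: Optional[bool] = None,
              alt_tools_resolved: Optional[bool] = None) -> str:
    """Return the next action for one isolate/gene comparison."""
    if pcr_positive == wgs_positive:
        return "concordant"
    if wgs_positive:
        # PCR negative, WGS positive (false positive).
        if not gene_in_panel_range:
            return "explained: gene outside PCR panel range"
        return "retest by PCR and WGS"
    # PCR positive, WGS negative (false negative).
    if partial_match_found is None:
        return "assess partial matches"
    if partial_match_found:
        return "explained: partial match (gene may be split across contigs)"
    if alt_tools_resolved is None:
        return "interrogate sequence with alternative tools"
    if alt_tools_resolved:
        return "explained: resolved with alternative tools"
    return "retest by PCR and WGS"
```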
In accordance with ISO standards, the abritAMR pipeline must be re-verified after each database or tool update. Database updates are re-verified by confirming that the updated database performs to the same criteria as were defined in the original validation, using the synthetic dataset described above (‘abritAMR test suite’). Updates to the abritAMR software may take the form of minor patches or major updates. Minor patches are changes that do not impact the underlying structure or core logic of the pipeline, such as fixes for typographical errors or the addition of functionality that does not impact the core logic of the tool, e.g. changes to log outputs. In these cases, a full re-verification is deemed unnecessary and running the abritAMR test suite is sufficient. However, other changes that may impact the core logic or structure of the outputs require a complete re-verification as described for database updates. Any change in performance is assessed, the cause identified, and modifications made before the changes are implemented for reporting. All changes to abritAMR are tracked in GitHub and the versions managed using conda.
Test performance characteristics (accuracy, sensitivity, specificity, positive and negative predictive values, including confidence intervals) were calculated using the epiR package for R (version 4.1.1), used in RStudio (version 1.4.1717).
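For readers unfamiliar with these measures, they can all be derived from the four cells of a 2×2 table (true positives, false positives, false negatives, true negatives). The sketch below shows the standard formulas in Python; it is not the epiR implementation used in the study. The Wilson score interval is swapped in here as one common choice of confidence interval and may differ from the method epiR applies, and all function names and counts are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def performance(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Point estimates of test performance from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # detected among truly positive
        "specificity": tn / (tn + fp),   # not detected among truly negative
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
stats = performance(tp=90, fp=5, fn=10, tn=95)
sens_ci = wilson_interval(90, 100)
```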
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Sequence data used in this study are available on NCBI Sequence Read Archive (BioProjects PRJNA529744, PRJNA565795, PRJNA856406, PRJNA856415, PRJNA857525, PRJNA857526, PRJNA857528, PRJNA857531, PRJNA857533, PRJNA857534, PRJNA870170 and PRJNA319593) with accession numbers provided in Supplementary Data 2. Accession numbers for the complete genomes used to generate the synthetic validation dataset are provided in Supplementary Data 2. PCR results for the PCR validation dataset are available in Supplementary Data 2 and on GitHub (https://github.com/MDU-PHL/abritAMR)46. Source data are provided with this paper.
O’Neill J. Review on antimicrobial resistance: Tackling a crisis for the health and wealth of nations. London, UK: UK Government (2014).
Centers for Disease Control and Prevention (CDC). Antibiotic resistance threats in the United States, 2019. Atlanta, GA: U.S. Department of Health & Human Services (2019).
World Health Organization. Global action plan on antimicrobial resistance. Geneva: WHO (2015).
Ellington, M. J. et al. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee. Clin. Microbiol Infect. 23, 2–22 (2017).
Motro, Y. & Moran-Gilad, J. Next-generation sequencing applications in clinical bacteriology. Biomol. Detect Quantif. 14, 1–6 (2017).
Maugeri, G., Lychko, I., Sobral, R. & Roque, A. C. A. Identification and antibiotic-susceptibility profiling of infectious bacterial agents: a review of current and future trends. Biotech. J. 14, 1700750 (2019).
Besser, J., Carleton, H. A., Gerner-Smidt, P., Lindsey, R. L. & Trees, E. Next-generation sequencing technologies and their application to the study and control of bacterial infections. Clin. Microbiol Infect. 24, 335–341 (2018).
Boolchandani, M., D’Souza, A. W. & Dantas, G. Sequencing-based methods and resources to study antimicrobial resistance. Nat. Rev. Genet 20, 356–370 (2019).
Vincent, A. T., Derome, N., Boyle, B., Culley, A. I. & Charette, S. J. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. J. Microbiol Methods 138, 60–71 (2017).
Coolen, J. P. M. et al. Centre-specific bacterial pathogen typing affects infection-control decision making. Microb. Genomics 7 (2021).
Doyle, R. M. et al. Discordant bioinformatic predictions of antimicrobial resistance from whole-genome sequencing data of bacterial isolates: an inter-laboratory study. Microb. Genomics 6 (2020).
Gargis, A. S., Kalman, L. & Lubin, I. M. Assuring the quality of next-generation sequencing in clinical microbiology and public health laboratories. J. Clin. Microbiol 54, 2857–2865 (2016).
Crisan, A., McKee, G., Munzner, T. & Gardy, J. L. Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory. PeerJ 6, e4218 (2018).
International Organization for Standardization (ISO). ISO15189:2012: Medical laboratories - Requirements for quality and competence. 2012. https://www.iso.org/standard/56115.html (accessed 18/09/2022).
International Organization for Standardization (ISO). Medical laboratory testing: how can we trust the results? 2021. https://www.iso.org/news/ref2617.html (accessed 18/09/2022).
Feldgarden, M. et al. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci. Rep. 11 (2021).
Australian Commission on Safety and Quality in Health Care. National Alert System for Critical Antimicrobial Resistances (CARAlert). https://www.safetyandquality.gov.au/our-work/antimicrobial-resistance/antimicrobial-use-and-resistance-australia-surveillance-system/national-alert-system-critical-antimicrobial-resistances-caralert (2021).
Souvorov A., Agarwala R., Lipman D. J. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol. 19 (2018).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput Biol. 19, 455–477 (2012).
Seemann T. Shovill: assemble bacterial isolate genomes from Illumina paired-end reads. GitHub; (2017).
Sia, C. M. et al. Genomic diversity of antimicrobial resistance in non-typhoidal Salmonella in Victoria, Australia. Microb. Genomics 7, 000725 (2021).
Gomes, C. et al. Macrolide resistance mechanisms in Enterobacteriaceae: focus on azithromycin. Crit. Rev. Microbiol 43, 1–30 (2017).
Armstrong, G. L. et al. Pathogen genomics in public health. N. Engl. J. Med 381, 2569–2580 (2019).
Ruppe E., Cherkaoui A., Lazarevic V., Emonet S., Schrenzel J. Establishing genotype-to-phenotype relationships in bacteria causing hospital-acquired pneumonia: a prelude to the application of clinical metagenomics. Antibiotics (Basel) 6 (2017).
Mahfouz, N., Ferreira, I., Beisken, S., Von Haeseler, A. & Posch, A. E. Large-scale assessment of antimicrobial resistance marker databases for genetic phenotype prediction: a systematic review. J. Antimicrob. Chemother. 75, 3099–3108 (2020).
Rossen, J. W. A., Friedrich, A. W. & Moran-Gilad, J. Practical issues in implementing whole-genome-sequencing in routine diagnostic microbiology. Clin. Microbiol Infect. 24, 355–360 (2018).
World Health Organization. GLASS whole-genome sequencing for surveillance of antimicrobial resistance. Geneva: WHO (2020).
Mcarthur, A. G. et al. The Comprehensive Antibiotic Resistance Database. Antimicrob. Agents Chemother. 57, 3348–3357 (2013).
Zankari, E. et al. Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–2644 (2012).
Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75, 3491–3500 (2020).
Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
Kozyreva, V. K. et al. Validation and implementation of clinical laboratory improvements act-compliant whole-genome sequencing in the public health microbiology laboratory. J. Clin. Microbiol 55, 2502–2520 (2017).
Angers-Loustau A. et al. The challenges of designing a benchmark strategy for bioinformatics pipelines in the identification of antimicrobial resistance determinants using next generation sequencing technologies. F1000Research 7 (2018).
Bogaerts, B. et al. Validation of a bioinformatics workflow for routine analysis of whole-genome sequencing data and related challenges for pathogen typing in a European national reference center: Neisseria meningitidis as a proof-of-concept. Front Microbiol 10, 362 (2019).
National Center for Biotechnology Information. National Database of Antibiotic Resistant Organisms (NDARO). (2022). https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/ (accessed 2022-04-21).
Davis, J. J. et al. The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities. Nucleic Acids Res. 48, D606–D612 (2020).
Lane, C. R. et al. Search and Contain: Impact of an integrated genomic and epidemiological surveillance and response program for control of carbapenemase-producing Enterobacterales. Clin. Infect. Dis. 73, e3912–e3920 (2021).
Ingle, D. J. et al. Genomic epidemiology and antimicrobial resistance mechanisms of imported typhoid in Australia. Antimicrob. Agents Chemother. 65, e0120021–e0120021 (2021).
Ingle D. J. et al. Prolonged outbreak of multidrug-resistant Shigella sonnei harboring blaCTX-M-27 in Victoria, Australia. Antimicrob Agents Chemother 64 (2020).
National Association of Testing Authorities Australia (NATA). https://nata.com.au/ (2021).
Centers for Disease Control and Prevention (CDC). Clinical Laboratory Improvement Amendments (2022).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dutka-Malen, S., Evers, S. & Courvalin, P. Detection of glycopeptide resistance genotypes and identification to the species level of clinically relevant enterococci by PCR. J. Clin. Microbiol 33, 24–27 (1995).
Louie, L. et al. Rapid detection of methicillin-resistant staphylococci from blood culture bottles by using a multiplex PCR assay. J. Clin. Microbiol 40, 2786–2790 (2002).
Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2011).
Horan K., Goncalves da Silva A., Seemann T. Establishing ISO-certified genomics workflows for identification and surveillance of antimicrobial resistance (code for abritAMR software). https://github.com/MDU-PHL/abritamr; https://doi.org/10.5281/zenodo.7370627 (2022).
MDU PHL is funded by the Victorian Government Department of Health. BPH receives an investigator grant from National Health and Medical Research Council Australia (GNT1196103). NLS received an Australian Government Research Training Program (RTP) scholarship. We sincerely thank the NCBI’s AMRFinderPlus team for their dedication to producing and maintaining high-quality tools and database for detection of AMR mechanisms from WGS data. We also thank Cheryll Sia for sharing her insights on genotype-phenotype correlations for fluoroquinolone resistance in Salmonella.
The authors declare no competing interests.
Peer review information
Nature Communications thanks Frank Aarestrup, Kara Tsang and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Sherry, N.L., Horan, K.A., Ballard, S.A. et al. An ISO-certified genomics workflow for identification and surveillance of antimicrobial resistance. Nat Commun 14, 60 (2023). https://doi.org/10.1038/s41467-022-35713-4