Benchmarking second and third-generation sequencing platforms for microbial metagenomics

Meslier, Victoria; Quinquis, Benoit; Da Silva, Kévin; Plaza Oñate, Florian; Pons, Nicolas; Roume, Hugo; Podar, Mircea; Almeida, Mathieu

doi:10.1038/s41597-022-01762-z

Download PDF

Data Descriptor
Open access
Published: 11 November 2022

Benchmarking second and third-generation sequencing platforms for microbial metagenomics

Scientific Data volume 9, Article number: 694 (2022) Cite this article

22k Accesses
43 Citations
88 Altmetric
Metrics details

Subjects

Abstract

Shotgun metagenomic sequencing is a common approach for studying the taxonomic diversity and metabolic potential of complex microbial communities. Current methods primarily use second generation short read sequencing, yet advances in third generation long read technologies provide opportunities to overcome some of the limitations of short read sequencing. Here, we compared seven platforms, encompassing second generation sequencers (Illumina HiSeq 300, MGI DNBSEQ-G400 and DNBSEQ-T7, ThermoFisher Ion GeneStudio S5 and Ion Proton P1) and third generation sequencers (Oxford Nanopore Technologies MinION R9 and Pacific Biosciences Sequel II). We constructed three uneven synthetic microbial communities composed of up to 87 genomic microbial strains DNAs per mock, spanning 29 bacterial and archaeal phyla, and representing the most complex and diverse synthetic communities used for sequencing technology comparisons. Our results demonstrate that third generation sequencing have advantages over second generation platforms in analyzing complex microbial communities, but require careful sequencing library preparation for optimal quantitative metagenomic analysis. Our sequencing data also provides a valuable resource for testing and benchmarking bioinformatics software for metagenomics.

Measurement(s)	mock community • Metagenomic profiling
Technology Type(s)	Illumina HiSeq 3000 • ThermoFisher Ion Proton and GeneStudio S5 • MGI DNBseq G400 and T7 • Oxford Nanopore Sequencing • PacBio Sequel System • Genome mapping and assembly
Sample Characteristic - Organism	Synthetic microbial communities • Bacteria • Archaea

Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies

Article Open access 26 November 2019

Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4

Article Open access 23 February 2023

HiFi metagenomic sequencing enables assembly of accurate and complete genomes from human gut microbiota

Article Open access 26 October 2022

Background & Summary

High throughput metagenomic sequencing has drastically changed our understanding of microbial ecosystems. One of the most popular approach is to use metagenomic sequencing, assembly and binning procedures^1,2,3,4 to investigate the structure, functionalities and ecological interactions of microbial communities with their environment or host^5,6,7,8,9. Most metagenomic studies rely on second generation sequencing providing billions of short sequences in a single run, with the Illumina sequencing platforms being the most widely used¹⁰. Improvement of third generation sequencing yields millions of long reads per run but are mostly used for genomic assembly procedures^11,12,13, and further benchmarking is required to evaluate their performance for quantitative metagenomic analysis.

For this purpose, we produced three synthetic uneven DNA mocks, varying in their microbial richness (64 to 87 strains, full composition in Supplementary Table S1) and belonging to 29 prokaryotic phyla (Fig. 1). We particularly focused on combining a large spectrum of genome sizes, GC content and mixing closely related species. These mocks represent to date the most complex synthetic communities for evaluating sequencers performances compared to previous studies^{14,15,16,17,18,19}, and not obtained from in silico simulated microbial communities^20,21,22,23. We performed five short read sequencing (Ion Proton P1, Ion S5, Illumina HiSeq 3000, DNBSEQ G400, DNBSEQ T7) and two long read sequencing technologies (ONT MinION and PacBio Sequel II), making this study the one covering the widest diversity of sequencing technologies (Table 1).

Table 1 Overview of the main sequencing platform characteristics used in this study.

Full size table

The mock1 (71 strains) was sequenced using all technologies and mocks 2 and 3 (64 and 87 strains, respectively) were additionally sequenced to estimate the impact of various microbial richness (Table 2). After sequencing and quality control, we were able to align more than 99% of all reads for each technology back to their reference genomes, with almost 100% of uniquely mapped reads for long read technologies, down to 87% for Ion Proton and S5 technologies^15,24. All technologies provided up to 99% identity, except for the MinION R9 with about 89% identity due to a high in/del errors and substitution errors²⁵. The PacBio Sequel II provided the lowest substitution error rate and the DNBSeq G400 and T7 the lowest in/dels rate^26,27.

Table 2 Mapping summary per mock and sequencing technology.

Full size table

To evaluate the impact of sequencing depth, we performed a subsampling analysis and compared observed versus theoretical genome abundances (Fig. 2). In general, Spearman correlations were high for all technologies, reaching values above 0.9 when mapping at least 100,000 reads. Notably, correlations were slightly lower for mock communities with higher microbial richness, partially due to cross-matching events during mapping procedures. Whilst second generation sequencers were equivalent for taxonomic profiling²⁸, we found more pronounced decreases for MinION and PacBio correlations, even if reads were almost entirely uniquely mapped. Although the PacBio sequencer presented the lowest error rate, these results could be explained by the DNA size filtering step performed during library preparation, which was calibrated to maximize the read length. We hypothesize that the filtering step could remove highly fragmented DNA, thus impacting strains relative abundances^29,30,31.

By focusing on mock1 individual genome abundances, we found that most genomes were accurately estimated for all technologies (Fig. 3). Over or under abundance estimation for most genomes was not particularly related to sequencing technology, read length, taxonomy, nor by GC-content, genome size and genome completeness, even at a low depth of 500,000 reads. These results suggest promising opportunities for affordable alternatives to high depth metagenomic sequencing, by using a limited number of reads- the so-called shallow shotgun sequencing- to explore the composition of complex microbiota³², even with third generation sequencers¹⁹.

Finally, we performed de novo metagenomic assembly and confronted assemblies with their reference genomes (Table 3). PacBio Sequel II generated the most contiguous assemblies with 36 full genomes out of 71 in mock1, followed by MinION (22 genomes), making third generation sequencers more adapted for genome reconstruction. When considering the mismatches per 100kbps, PacBio Sequel II was also providing the most accurate assemblies, followed by Illumina HiSeq 3000 and DNBSeq G400 (Table 3). However, the lower indels rates obtained with DNBSeq G400 and Illumina HiSeq suggests that hybrid procedures may provide more accurate assemblies than those obtained using long reads alone. We tested this hypothesis by generating hybrid assemblies (Supplementary Information S2) for each technology. For MinION, the hybrid assemblies improved notably the genome fraction recovery compared to MinION assembly only, while reducing the number of fully unaligned contigs, confirming our initial hypothesis. For PacBio, the hybrid assembly did not improve assembly metrics, except for Illumina and DNBSeq with a lower indels rate per 100 kpbs and an improvement in genome fraction recovery with Illumina.

Table 3 Mock1 metaquast assembly report.

Full size table

By this work, we provide a new resource with highly complex synthetic mock samples and extensive metagenomic sequencing data, using the most popular second and third generation sequencing platforms. These data could be used to benchmark or improve metagenomic assemblers, binning software and taxonomic profilers^33,34.

Methods

Synthetic microbial communities’ construction

A total of 91 different strains were used in this study. For 58 strains of Archaea and Bacteria, we used archived gDNA from the Shakya et al. study¹⁴. To further increase the complexity of the constructed community, we cultured nine additional microbes for which the genomic sequence was available. High molecular weight DNA was isolated and quantified as described previously¹⁴. Purified DNA from 4 other bacteria and archaea was obtained from the laboratory of Dr. Cynthia Gilmour (Smithsonian Research Institute, Edgewater, Maryland, USA). We also used the 20 Strain Even Mix Genomic Material from ATCC (MSA-1002). Three different genomic microbial synthetic communities were assembled by mixing individual, purified DNAs. The composition of each community aimed to provide variation in the number and relative abundance of individual microbe and their represented taxonomic category. The communities achieved a diversity ranging from 64 to 87 strains, representing 29 phyla of Archaea and Bacteria, with a relative abundance distribution spanning over three orders of magnitude. The genome size distribution ranged from 0.49 to 9.7 Mbp and the G + C content was between 27 and 69%. Within the 91 strains, 21 have extrachromosomal DNA such as plasmids or additional chromosomes (Supplementary Table S1). A phylogenetic tree for all strains was constructed using 40 universal protein markers as previously described¹ and taxonomic ranks were updated using gtdbtk Release 07-RS207³⁵. The tree was visualized and annotated using iTOL³⁶.

Library preparation and Sequencing

ThermoFisher Ion Proton P1 and Ion GeneStudio S5 library preparation and sequencing

Ion Proton P1 and Ion GeneStudio S5 libraries were built using Ion Plus Fragment Library kit (Thermo Fisher Scientific, Waltham, MD, USA). 500 ng of High Molecular Weight (HMW) DNA was sheared using Covaris E220 sonicator and AFA microtubes (Covaris, Brighton, UK) in 100 µL to achieve maximum distribution at 150pb. After shearing, Ampure XP purification and Qubit quantification were performed. Sheared DNA (100 ng) was submitted to enzymatic treatment steps (End repair, barecode ligation with IonXpress Barcode Adaptors kit and final 9 cycles of PCR amplification). Ampure XP beads were used for size selection to 150 pb after End repair reaction and for purification after the other enzymatic treatment steps. Libraries were quantified and size controlled by using High Sensitivity Small Fragment kit and Fragment Analyzer (Agilent Technologies Inc., Santa Clara, CA, USA). Libraries’ molarity was between 8.000 and 10.000 pM, before normalization at 95 pM and multiplexing for sequencing. Pipetting was performed with Biomek Fxp or Biomek 3000 Liquid handling (Beckman Coulter Inc., Brea, CA, USA). Pre-sequencing step was performed by Ion Chef (Thermo Fisher) for each sequencing device. A first sequencing was performed by Ion Proton with Ion PI HiQ Chef kit (Thermo Fisher) and Ion PI chip kit v3 (Thermo Fisher). Several run were performed including first path, any path with rebalance libraries and final path to get up to 20 million raw reads for each multiplexed sample. A second sequencing was performed by Ion GeneStudio S5 Prime with Ion 550 chef kit and Ion 550 chip kit. A single run was sufficient to obtain up to 20 million raw reads per multiplexed sample.

MGI DNBSEQ-G400 and DNBSEQ-T7 library preparation and sequencing

DNBSEQ-G400 and DNBSEQ-T7 libraries were constructed from 500 ng of HMW DNA and fragmented using Covaris sonicator E220 (Covaris, Brighton, UK). Sheared DNA underwent End repair and A-tailing steps as described in the MGI Easy Universal DNA Library Prep Set User Manual v1 (MGI Tech Co., Shenzen, China). Adapters ligation was performed following the instructions of the MGIEasy DNA Adapters kit, and cleaned up with the provided DNA Clean Beads. PCR amplification was carried out on purified adapter-ligated DNA and cleaned-up again using magnetic beads. After quality control using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MD, USA), purified PCR products were denaturated and ligated to generate single-strand circular DNA libraries. Barcode libraries were pooled in equal amounts to make DNA Nanoballs (DNB), and sequenced using DNBSEQ-G400 and DNBSEQ-T7 sequencer technologies following the manufacturer’s recommendations.

Illumina library preparation and HiSeq 3000 sequencing

DNA libraries have been prepared using the Illumina TruSeq PCR-free HT (Illumina, San Diego, CA, USA), following the manufacturer protocol. Briefly, 2 µg of HMW genomic DNA was fragmented by sonication using Covaris sonicator (Covaris, Brighton, UK). Sheared fragments were cleaned using the Sample Purification Beads provided in the kit, before Ends repair and size selection procedures. Adapters were ligated and libraries underwent an additional cleaning step with magnetic beads. Library quality was assessed using an Advanced Analytical Fragment Analyser (Agilent Technologies Inc., Santa Clara, CA, USA) and libraries were quantified by q-PCR using the Kapa Library Quantification Kit (Illumina, San Diego, CA, US) following the manufacturer’s recommendations. Prior to multiplexing, libraries were normalized to 4 nM and equal volumes were pooled together. Final libraries were sequenced on Illumina HiSeq 3000 using a paired-end read length of 2 × 150 pb with the Illumina HiSeq 3000 Reagent Kits.

Oxford Nanopore MinION R9 library preparation and sequencing

Libraries were built with 1D Native barcoding genomic DNA kit (SQK-LSK109 rev E) from Oxford Nanopore Technologies. To increase sequencing yield, 1.5 µg DNA samples were sheared using G-Tube (Covaris) for 2 times 30 seconds at 7,200 rpm. Sheared Fragments (1 µg), of length comprised between 8 and 9 kb, underwent end repair and A-Tailing (New England Biolabs M6630L and E7546L kits). Next, 500 ng of repaired DNA was ligated with adapter barcode using Native Barcoding Expansion 1–12 kit (EXP-NBD104) and Blunt/TA Ligase (New England Biolabs, M0367L). Native barcode ligated DNA was quantified with Qubit and Fragment Analyzer. Equimolar libraries were pooled for a total quantity of 700 ng to ligate to the sequencing adapters. At this step, we choose the Long Fragment Buffer (LFB) from SQK-LSK109 kit to increase the recovery of 3 kb or longer fragments. Pooled libraries were loaded onto R9 FLO-MIN106 flowcell and sequencing was performed during 48 hours.

Pacific Biosciences Sequel II library preparation and sequencing

For Pacific Biosciences Sequel II sequencing, 500 ng of HMW genomic DNA was used to make unamplified libraries using the SMRTbell® Express Template Prep Kit 2.0. First, gDNA was sheared to a targeted fragment size of 12 kb using Megaruptor and Long Hydropores (Diagenode, Denville, NJ, USA). Sheared gDNA were concentrated using AMPure PB Beads according to the manufacturer recommendations (Pacific Biosciences, Menlo Park, CA, USA) and underwent two treatment procedures for DNA damage repair and end-repair. Barcoded overhang Hairpins adapters from the manufacturer were ligated to the fragment ends to create SMRTbell templates used for sequencing. SMRTbell templates were purified using an exonuclease procedure to remove any free ends molecules or no adapter templates. Then, size-selection was conducted using Ampure PB beads at a concentration of 0.45X to ensure the removing of short fragments. Our mock SMRTbell template was multiplexed with tree additional samples (not included in this study) to equal molarity. On the resulting template, fragments < 3 kb were removed using an additional diluted Ampure PB beads procedure. PacBio primer v2 annealing to the SMRTbell template and polymerase binding to the annealed template were achieved before being sequenced with Sequel II sequencer using Chemistry 2.0 and 30-hour movie.

Sequence QC

The raw reads were quality trimmed using software tools with similar trimming parameters to improve technical comparisons. Illumina HiSeq 3000 and DNBSeq G400 and T7 paired-end reads were trimmed with FASTP v.0.20.0³⁷, using Illumina TruSeq adapaters for the Illumina HiSeq 300 sequencer and DNBSeq adapters for the DNBSeq G400 and T7. The minimum read length after trimming was 45nt and all reads with a single N nucleotide or unpaired reads after trimming were discarded. The Ion S5 and Ion Proton reads were trimmed using AlienTrimmer v2.0³⁸ by providing the Ion S5 and Proton contaminants and using the following parameters for trimming: “-k 10 -l 45 -m 5 -p 40 -q 20”. The minION R9 reads were base called and quality trimmed using Guppy v2.3.1 + 9514fbc³⁹ with the kit SQK-LSK109 and the barcoding kit EXP-NBD103. The PacBio CSS reads were processed through PacBio custom pipeline. Finally, all PacBio and MinION reads shorter than 500nt were discarded.

Read mapping procedures

All read mapping procedures were performed on reference genomes corresponding to the expected mock composition. For Illumina, DNBSEQ G400 and T7 platforms, mapping was done with bowtie2 v2.3.5.1⁴⁰ using paired-end best hit end-to-end match and sensitive presets parameters. For Ion Proton and S5, bowtie2 with single-end best-hit end-to-end match and sensitive presets parameters. For MinION and PacBio, mapping was performed with minimap2 version 2.15-r915-dirty⁴¹ using default parameters, soft clipping activated and by keeping only the best hit.

Read subsampling

Subsampling was performed by a python script using the random library and differential analysis in Figs. 2 and 3 between the observed and expected mock composition at different depth, from 10k to 1 M reads were performed under R version 3.6.0 using stats, ggplot2, data.table and reshape2 packages.

Metagenomic assembly

The Illumina HiSeq 3000, DNBSeq G400 and T7 paired-end reads were assembled with SPAdes v3.14.1⁴² with “--meta” presets and kmer iteration “--k 21,33,55” for DNBSeq, and “--k 21,33,55,77” for Illumina to account for their respective maximal read length. The Ion Proton and Ion S5 single reads were also assembled with SPADES, using the “--iontorrent” and “--careful” flag, as the “--meta” flag is not available for single reads, and a kmer iteration “--k 21,33,55,77”. MinION and Pacbio were assembled with metaFlye v2.8.1-b1688⁴³ using the “--meta” preset, “--plasmids” to recover short unassembled plasmids, a minimum overlap of 2000nt, “--pacbio-hifi--hifi-error 0.003” for Pacbio and “--nano-raw” for MinION reads. Finally, the assemblies’ quality was assessed using metaquast v4.6.3⁴⁴.

Hybrid metagenomic assembly

Hybrid assemblies were generated using SPAdes v3.14.1⁴² with the same parameters previously described and by adding –pacbio and –nanopore parameters when combining with PacBio reads or MinION reads respectively.

Data Records

Shotgun metagenomes are publicly available without restriction in the EMBL-EBI European Nucleotide Archive (ENA) under accession number PRJEB52977⁴⁵. All binning and taxonomy assignment results and parameters are available as a publicly shared KBase narrative (https://narrative.kbase.us/narrative/125743) and can also be seen at Figshare⁴⁶.

Technical Validation

Library QC checks

Before library preparation by the different sequencing platforms, gDNA mock samples were required to pass quality and quantity controls. Initial DNA quality control included DNA quantification using Quant-iT™ dsDNA Assay Kit broad range (Q33130) reading by FiltermaxF3 (Molecular Devices, Sunnyvale, CA, USA), Qubit dsDNA Assay Kit (Q32853) and fragment analysis using HS Genomic DNA kit (DNG-488-500) on Fragment Analyzer (Agilent Technologies Inc., Santa Clara, CA, USA). The size of initial DNA peak was between 14 and 17 kb without major degradation smear. Additional specific technical validations for DNA integrity were required during each sequencing library preparation to ensure high quality of the final libraries on each platform. Depending on the sequencing technology, these validation steps typically included QC checks after DNA shearing, size selections, purifications on magnetic stands and on the pooled final libraries.

Usage Notes

Run accession numbers for all metagenomic samples, accessible in the ENA website (PRJEB52977), are fully described in Table 4.

Table 4 Shotgun metagenomic datasets description.

Full size table

The protocols and datasets we are presenting in this work can be reused for different applications, in particular to benchmark and improve metagenomic assemblers, taxonomic profilers and binning software. As an example for binning applications, we used three binning software (CONCOCT⁴⁷ v.1.1, MetaBAT2⁴⁸ v1.7 and MaxBin2⁴⁹ v2.2.4) by importing the mock1 Illumina HiSeq 3000 reads, assemblies and hybrid assemblies into KBase⁵⁰ (https://narrative.kbase.us/narrative/125743), with minimum contig size of 1500 nt (Supplementary Table S3). The bins were then optimized using DAS Tool⁵¹ v1.1.2. We observed comparable number of binned MAGs corresponding to reference genomes using hybrid assembly and higher than using the Illumina short reads dataset alone. With all assemblies and datasets, the recovery of high quality MAGs was not successful for very low abundance genomes, present at less than 0.1% of the mock1 community (Supplementary Table S3).

Code availability

All reference genomes and scripts for mapping, assembly, genome coverage estimation, subsampling and correlation calculations associated with tables and figures are available at https://forgemia.inra.fr/metagenopolis/benchmark_mock.

References

Almeida, M. et al. Construction of a dairy microbial genome catalog opens new perspectives for the metagenomic analysis of dairy fermented products. BMC Genomics 15 (2014).
Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5 (2017).
Pasolli, E. et al. Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle. Cell 176, 649–662.e20 (2019).
Article CAS PubMed PubMed Central Google Scholar
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 199–504 (2019).
Article Google Scholar
Venter, J. C. et al. Environmental Genome Shotgun Sequencing of the Sargasso Sea J. Craig Venter. Science (80-.). 304, 66–74 (2004).
Article ADS CAS Google Scholar
Tringe, S. G. et al. Comparative metagenomics of microbial communities. Science 308, 554–7 (2005).
Article ADS CAS PubMed Google Scholar
Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64 (2014).
Article ADS CAS PubMed Google Scholar
Uritskiy, G. & DiRuggiero, J. Applying Genome-Resolved Metagenomics to Deconvolute the Halophilic Microbiome. Genes (Basel) 10, 220 (2019).
Article CAS Google Scholar
Fromentin, S. et al. Microbiome and metabolome features of the cardiometabolic disease spectrum. Nat. Med. 28, 303–314 (2022).
Article CAS PubMed PubMed Central Google Scholar
Segerman, B. The Most Frequently Used Sequencing Technologies and Assembly Methods in Different Time Segments of the Bacterial Surveillance and RefSeq Genome Databases. Front. Cell. Infect. Microbiol 10, 1–7 (2020).
Article Google Scholar
Van Dijk, E. L., Jaszczyszyn, Y., Naquin, D. & Thermes, C. The Third Revolution in Sequencing Technology. Trends Genet. 34, 666–681 (2018).
Article PubMed Google Scholar
Athanasopoulou, K. et al. Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics. Life 12, 30 (2021).
Article PubMed PubMed Central Google Scholar
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).
Article CAS PubMed PubMed Central Google Scholar
Shakya, M. et al. Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. Env. Microbiol 15, 1882–1899 (2014).
Article Google Scholar
Sevim, V. et al. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Sci. Data 6, 285 (2019).
Article CAS PubMed PubMed Central Google Scholar
Singer, E. et al. Next generation sequencing data of a defined microbial mock community. Sci. Data 3, 160081 (2016).
Article PubMed PubMed Central Google Scholar
Nicholls, S. M., Quick, J. C., Tang, S. & Loman, N. J. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. Gigascience 8, giz043 (2019).
Article PubMed PubMed Central Google Scholar
Tourlousse, D. M. et al. Characterization and Demonstration of Mock Communities as Control Reagents for Accurate Human Microbiome Community. Microbiol Spectr 10, e0191521 (2022).
Article PubMed Google Scholar
Hu, Y., Fang, L., Nicholson, C. & Wang, K. Implications of Error-Prone Long-Read Whole- Genome Shotgun Sequencing on Characterizing Reference Microbiomes. iScience 23 (2020).
Yang, C., Chu, J., Warren, L. & Inanc, B. NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 6, 1–6 (2017).
Article PubMed PubMed Central Google Scholar
Moss, E. L., Maghini, D. G. & Bhatt, A. S. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat. Biotechnol. 38, 701–707 (2020).
Article CAS PubMed PubMed Central Google Scholar
Peabody, M. A., Van Rossum, T., Lo, R. & Brinkman, F. S. L. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinformatics 16, 1–19 (2015).
Article Google Scholar
Alili, R. et al. Exploring Semi-Quantitative Metagenomic Studies Using Oxford Nanopore Sequencing: A Computational and Experimental Protocol. Genes (Basel) 12, 1496 (2021).
Article CAS PubMed Central Google Scholar
Lahens, N. F. et al. A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression. BMC Genomics 18, 602 (2017).
Article PubMed PubMed Central Google Scholar
Tyler, A. D. et al. Evaluation of Oxford Nanopore’s MinION Sequencing Device for Microbial Whole Genome Sequencing Applications. Sci. Rep. 8, 10931 (2018).
Article ADS PubMed PubMed Central Google Scholar
Hon, T. et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci. Data 7, 399 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kim, H. et al. Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing. Gigascience 10, 1–9 (2021).
Article CAS Google Scholar
Anslan, S. et al. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ 9, e12254 (2021).
Article PubMed PubMed Central Google Scholar
Bowers, R. M. et al. Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community. BMC Genomics 16 (2015).
Jones, M. B. et al. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci USA 112, 14024–14029 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Costea, P. I. et al. Towards standards for human fecal sample processing in metagenomic studies. Nat. Biotechnol. 35, 1069–1076 (2017).
Article CAS PubMed Google Scholar
Hillmann, B. et al. Evaluating the information content of shallow shotgun metagenomics. mSystems 3, e00069–18 (2018).
Article CAS PubMed PubMed Central Google Scholar
Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation — a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
Article CAS PubMed PubMed Central Google Scholar
Meyer, F. et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat. Methods 19, 429–440 (2022).
Article CAS PubMed PubMed Central Google Scholar
Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, 785–794 (2022).
Article Google Scholar
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, 256–259 (2019).
Article Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890 (2018).
Article Google Scholar
Criscuolo, A. & Brisse, S. ALIENTRIMMER: A tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads. Genomics 102, 500–506 (2013).
Article CAS PubMed Google Scholar
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 1–10 (2019).
Article CAS Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–360 (2012).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Article MathSciNet CAS PubMed PubMed Central Google Scholar
Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).
Article CAS PubMed Google Scholar
Mikheenko, A., Saveliev, V. & Gurevich, A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32, 1088–1090 (2016).
Article CAS PubMed Google Scholar
ENA European Nucleotide Archive https://identifiers.org/ena.embl:PRJEB52977 (2022).
Meslier, V. et al. Benchmarking second and third-generation sequencing platforms for microbial metagenomics, Figshare, https://doi.org/10.6084/m9.figshare.21261396.v1 (2022).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Article CAS PubMed Google Scholar
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and ef fi cient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Article PubMed PubMed Central Google Scholar
Wu, Y., Simmons, B. A. & Singer, S. W. MaxBin 2. 0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
Article CAS PubMed Google Scholar
Arkin, A. P. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol. 36, 566–569 (2018).
Article CAS PubMed PubMed Central Google Scholar
Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843 (2018).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by the European FP7 Marie Skłodowska-Curie actions AgreenSkillsPlus PCOFUND-GA-2013-609398 grant to MA. Additional funding was from the Metagenopolis grant ANR-11-DPBS-0001. MP was supported by the U.S. Department of Energy, Office of Science, Biological, and Environmental Research as part of the Genomic Science Program (the Plant Microbe Interfaces SFA), by the Subsurface Biogeochemical Research Program (Biogeochemical Transformations at Critical Interfaces SFA) and by grant R01DE024463 from the National Institute of Dental and Craniofacial Research of the US National Institutes of Health. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. We thank Dr. Cynthia Gilmour (Smithsonian Research Institute, Edgewater, Maryland, USA) for providing some of the purified gDNA used in this study. We also thank David Stucki, Deborah Moine and Jules Bourgon (Pacific Biosciences, Menlo Park, California, USA) for PacBio sequencing, Jihua Sun and Yong Hou (BGI, Shenzhen, Guangdong, China) for DNBseq sequencing and Clemence Genthon (Plateforme Génomique GeT INRAE Transfert, Toulouse, France) for Illumina sequencing.

Author information

Authors and Affiliations

Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
Victoria Meslier, Benoit Quinquis, Kévin Da Silva, Florian Plaza Oñate, Nicolas Pons, Hugo Roume & Mathieu Almeida
Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
Mircea Podar

Authors

Victoria Meslier
View author publications
You can also search for this author in PubMed Google Scholar
Benoit Quinquis
View author publications
You can also search for this author in PubMed Google Scholar
Kévin Da Silva
View author publications
You can also search for this author in PubMed Google Scholar
Florian Plaza Oñate
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Pons
View author publications
You can also search for this author in PubMed Google Scholar
Hugo Roume
View author publications
You can also search for this author in PubMed Google Scholar
Mircea Podar
View author publications
You can also search for this author in PubMed Google Scholar
Mathieu Almeida
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M.A., M.P. and H.R. conceived the project. M.P. generated mock communities. M.A. and V.M. analysed the data and generated the figures. B.Q. performed library preparation and sequencing for Ion proton, DNBseq and Minion technologies. N.P. deposited the data. F.P.O., M.P. and K.D.S. contributed to data analysis. V.M., M.A., B.Q. and M.P. wrote the paper. All authors revised the manuscript.

Corresponding authors

Correspondence to Mircea Podar or Mathieu Almeida.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table S1

Supplementary Table S2

Supplementary Table S3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Meslier, V., Quinquis, B., Da Silva, K. et al. Benchmarking second and third-generation sequencing platforms for microbial metagenomics. Sci Data 9, 694 (2022). https://doi.org/10.1038/s41597-022-01762-z

Download citation

Received: 02 June 2022
Accepted: 04 October 2022
Published: 11 November 2022
DOI: https://doi.org/10.1038/s41597-022-01762-z

This article is cited by

Evaluating metagenomics and targeted approaches for diagnosis and surveillance of viruses
- Sarah Buddle
- Leysa Forrest
- Oscar Enrique Torres Montaguth
Genome Medicine (2024)
Phylogenetic incongruence in an Asiatic species complex of the genus Caryodaphnopsis (Lauraceae)
- Shiting Yang
- Jiepeng Huang
- Yu Song
BMC Plant Biology (2024)
Unveiling errors in soil microbial community sequencing: a case for reference soils and improved diagnostics for nanopore sequencing
- Daniel K. Manter
- Catherine L. Reardon
- David B. Knaebel
Communications Biology (2024)
Crossing epigenetic frontiers: the intersection of novel histone modifications and diseases
- Weiyi Yao
- Xinting Hu
- Xin Wang
Signal Transduction and Targeted Therapy (2024)
Benchmarking bacterial taxonomic classification using nanopore metagenomics data of several mock communities
- Alexander Van Uffelen
- Andrés Posadas
- Kevin Vanneste
Scientific Data (2024)

Subjects

Abstract

Similar content being viewed by others

Background & Summary

Methods

Synthetic microbial communities’ construction

Library preparation and Sequencing

ThermoFisher Ion Proton P1 and Ion GeneStudio S5 library preparation and sequencing

MGI DNBSEQ-G400 and DNBSEQ-T7 library preparation and sequencing

Illumina library preparation and HiSeq 3000 sequencing

Oxford Nanopore MinION R9 library preparation and sequencing

Pacific Biosciences Sequel II library preparation and sequencing

Sequence QC

Read mapping procedures

Read subsampling

Metagenomic assembly

Hybrid metagenomic assembly

Data Records

Technical Validation

Library QC checks

Usage Notes

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links