  • Brief Communication

Metagenome assembly of high-fidelity long reads with hifiasm-meta

Abstract

De novo assembly of metagenome samples is a common approach to studying microbial communities. Current metagenome assemblers, developed for short reads or noisy long reads, are not optimized for accurate long reads. We therefore developed hifiasm-meta, a metagenome assembler that exploits the high accuracy of recent high-fidelity (HiFi) long reads. Evaluated on seven empirical datasets, hifiasm-meta reconstructed tens to hundreds of complete circular bacterial genomes per dataset, consistently outperforming other metagenome assemblers.
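
Counting complete circular genomes first requires recognizing circular contigs in an assembly graph. The Python sketch that follows is a minimal illustration, assuming a GFA-1 graph in which a circular contig appears as a segment with a link from its own end back to its own start. That encoding is an assumption here, not a documented property of hifiasm-meta's output; the helper name `circular_contigs` and its `min_len` cutoff are hypothetical and are not thresholds taken from the paper. A circular contig is not automatically a complete genome, so such a list would still need quality checks (for example with CheckM).

```python
def circular_contigs(gfa_lines, min_len=0):
    """Map each self-linked segment name to its length (hypothetical helper).

    Assumes GFA-1 input and treats a segment as circular when an L line
    joins it to itself in the same orientation ("s + s +" or "s - s -").
    Segments shorter than min_len (a hypothetical cutoff) are dropped.
    """
    lengths = {}
    circular = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            if seq != "*":
                lengths[name] = len(seq)
            else:
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                lengths[name] = next(
                    (int(tag[5:]) for tag in fields[3:] if tag.startswith("LN:i:")), 0
                )
        elif fields[0] == "L" and len(fields) >= 5:
            src, src_orient, dst, dst_orient = fields[1:5]
            # Same segment, same orientation: the contig's end wraps to its start.
            if src == dst and src_orient == dst_orient:
                circular.add(src)
    return {
        name: lengths.get(name, 0)
        for name in sorted(circular)
        if lengths.get(name, 0) >= min_len
    }


example = [
    "S\tctg1\t*\tLN:i:2000000",
    "S\tctg2\tACGT",
    "L\tctg1\t+\tctg1\t+\t0M",
    "L\tctg2\t+\tctg1\t-\t0M",
]
print(circular_contigs(example))  # {'ctg1': 2000000}
```

A link such as "a + a -" joins a segment to its own reverse complement (a hairpin), so the sketch deliberately does not count it as circular.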

Fig. 1: Metagenome assemblies of empirical datasets.

Data availability

HiFi data were obtained from the NCBI Sequence Read Archive (SRA) under the accession numbers shown in Table 1. All generated assemblies and underlying data for the figures are available at https://zenodo.org/record/6330282. ZymoBIOMICS mock reference genomes were downloaded from https://s3.amazonaws.com/zymo-files/BioPool/D6331.refseq.zip. The list of reference genomes in the ATCC mock community is available at https://www.atcc.org/products/msa-1003. CheckM database: https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz. GTDB-Tk database: https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/auxillary_files/. Source data are provided with this paper.

Code availability

Hifiasm-meta is available at https://github.com/xfengnefx/hifiasm-meta.

Acknowledgements

We thank W. Fan from Agricultural Genomics Institute, Shenzhen, China, for sharing the chicken gut dataset. This study was supported by the US National Institutes of Health (grants R01HG010040 and U01HG010971 to H.L.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

X.F. and H.L. conceived the project, designed the algorithm and wrote the manuscript. X.F. implemented the algorithm and evaluated the metagenome assemblies. H.C. helped with the algorithm implementation. D.P. helped with assembly evaluations. All authors helped with the data analysis and revised the manuscript.

Corresponding author

Correspondence to Heng Li.

Ethics declarations

Competing interests

D.P. is an employee of Pacific Biosciences. H.L. is a consultant for Integrated DNA Technologies and serves on the Scientific Advisory Boards of Sentieon and Innozeen. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Mads Albertsen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Extended data

Extended Data Fig. 1 The hifiasm-meta assembly graph of the sheepA dataset.

Short disconnected contigs are not shown.
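
Omitting short disconnected contigs from a graph plot amounts to a simple filter over the graph. The Python sketch that follows is a minimal illustration, assuming a GFA-1 graph and interpreting "disconnected" as "having no link to any other segment" (a self-link alone does not count). The helper name `segments_to_draw` and the `min_isolated_len` cutoff are hypothetical; the actual length threshold used for this figure is not stated here.

```python
def segments_to_draw(gfa_lines, min_isolated_len):
    """Return segment names kept for plotting (hypothetical helper).

    A segment is kept if it links to at least one other segment, or if it
    is at least min_isolated_len bp long. Self-links alone do not count as
    connections; that is an interpretive assumption.
    """
    lengths = {}
    connected = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            seq = fields[2]
            # Use the sequence length, or the optional LN:i: tag if seq is "*".
            lengths[fields[1]] = len(seq) if seq != "*" else next(
                (int(tag[5:]) for tag in fields[3:] if tag.startswith("LN:i:")), 0
            )
        elif fields[0] == "L" and len(fields) >= 5 and fields[1] != fields[3]:
            # A link between two different segments connects both of them.
            connected.update((fields[1], fields[3]))
    return sorted(
        name for name, length in lengths.items()
        if name in connected or length >= min_isolated_len
    )


graph = [
    "S\ta\t*\tLN:i:500",
    "S\tb\t*\tLN:i:800",
    "S\tc\t*\tLN:i:50000",
    "S\td\t*\tLN:i:300",
    "L\ta\t+\tb\t-\t0M",
    "L\td\t+\td\t+\t0M",
]
print(segments_to_draw(graph, min_isolated_len=10_000))  # ['a', 'b', 'c']
```

Here the short segment "d" is dropped because its only link is to itself, while the equally short "a" and "b" are kept because they are joined to each other.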

Extended Data Fig. 2 Yak QV score correlated with contig coverage.

Plots show contigs longer than 1 Mb in the sheepB assemblies. Contig coverage was estimated with jgi_summarize_bam_contig_depths from MetaBAT 2, and alignment was performed with minimap2 -a -k 19 -w 10 -I 10G -g 5000 -r 2000 --lj-min-ratio 0.5 -A 2 -B 5 -O 5,56 -E 4,1 -z 400,50. Hifiasm-meta assembled 37 contigs without k-mer errors; HiCanu and metaFlye each assembled 4.
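
A quality value (QV) is a Phred-scaled per-base error rate, and a contig "without k-mer errors" is one whose k-mers all occur in the reads. One common way to turn k-mer presence into a QV is the Merqury-style estimate: if a fraction f of a contig's k-mers also occur in the reads, the per-base error is approximately e = 1 - f^(1/k), and QV = -10 log10(e). The Python sketch that follows implements only this simplified estimate as an illustration; it is an assumption, and yak's own estimator may differ in its details. The function name `kmer_qv` and the example counts are hypothetical.

```python
import math


def kmer_qv(total_kmers, missing_kmers, k):
    """Phred-scaled QV from k-mer presence (simplified Merqury-style estimate).

    total_kmers:   number of k-mers in the contig
    missing_kmers: contig k-mers absent from the read set (assumed errors)
    k:             k-mer length
    """
    if total_kmers <= 0 or not 0 <= missing_kmers <= total_kmers:
        raise ValueError("need total_kmers > 0 and 0 <= missing_kmers <= total_kmers")
    if missing_kmers == 0:
        # No k-mer errors detected: the estimate is unbounded.
        return math.inf
    # A k-mer is error-free with probability (1 - e)^k, so solve for e.
    present_fraction = (total_kmers - missing_kmers) / total_kmers
    per_base_error = 1.0 - present_fraction ** (1.0 / k)
    return -10.0 * math.log10(per_base_error)


print(kmer_qv(1_000_000, 0, 31))              # inf
print(round(kmer_qv(1_000_000, 31, 31), 1))   # 60.0
```

Because the estimate diverges when no k-mer is missing, contigs without k-mer errors are naturally reported as a count rather than as a finite QV.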

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–6.

Source data

Source Data Fig. 1

The data points used to plot Fig. 1.

Source Data Extended Data Fig. 1

Instructions directing interested readers to the data release on Zenodo, which contains the assembly graph underlying Extended Data Fig. 1.

Source Data Extended Data Fig. 2

The data points used to plot Extended Data Fig. 2.

About this article

Cite this article

Feng, X., Cheng, H., Portik, D. et al. Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nat Methods 19, 671–674 (2022). https://doi.org/10.1038/s41592-022-01478-3

  • DOI: https://doi.org/10.1038/s41592-022-01478-3
