
Pangenome graph construction from genome alignments with Minigraph-Cactus


Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph’s ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.

Fig. 1: Minigraph-Cactus pangenome construction.
Fig. 2: Evaluating GRCh38-based and T2T-CHM13-based human pangenomes.
Fig. 3: Comparing pangenome SV genotyping.
Fig. 4: A D. melanogaster pangenome.

Data availability

All data, software versions and commands are available at

HPRC graphs can be downloaded from
Consult the Data Portal for explanations of the different files:
Variant calls can be downloaded from
SV genotyping results are available at
D. melanogaster graphs can be downloaded from
Consult the Data Portal for explanations of the different files:
D. melanogaster mapping and calling results can be downloaded from

Code availability

All source code for the Minigraph-Cactus pangenome pipeline, as well as release binaries, Docker images and user manuals, can be found at


We thank A. D. Long for many suggestions and insights regarding the D. melanogaster data and the whole vg team for their work to create and maintain vg, upon which much of this work depends. B.P., A.N., J.M.E. and J.M. were partly supported by National Institutes of Health (NIH) grants R01HG010485, U24HG010262, U24HG011853, OT3HL142481, U01HG010961 (with H.L.) and OT2OD033761. H.L. was partly supported by NIH grant R01HG010040 and T.M. by U01HG010973. Computational infrastructure and support for running PanGenie were provided by the Centre for Information and Media Technology at Heinrich Heine University Düsseldorf.

Author information

Authors and Affiliations




G.H., J.M., H.L. and B.P. designed the method. G.H., J.M. and J.E. contributed to the results and analysis. G.H., J.M., A.N., J.E. and B.P. wrote the manuscript. All authors contributed to the software. B.P. led the project.

Corresponding authors

Correspondence to Glenn Hickey or Benedict Paten.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20 and Supplementary Tables 1–6.

Reporting Summary
