Butler enables rapid cloud-based analysis of thousands of human genomes

Date: 2020-02-05

Nature Biotechnology (2020)

Abstract

We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

Fig. 1: Butler framework architecture.
Fig. 2: Butler performance comparison.

Data availability

PCAWG’s final callsets, somatic and germline variant calls, mutational signatures, subclonal reconstructions, transcript abundance, splice calls and other core data generated by the ICGC/TCGA Pan-cancer Analysis of Whole Genomes Consortium are described in ref. 7 and available for download at https://dcc.icgc.org/releases/PCAWG. Additional information on accessing the data, including raw read files, can be found at https://docs.icgc.org/pcawg/data/. In accordance with the data access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier that does not require access approval. To access potentially identifying information, such as germline alleles and underlying sequencing data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for access to the ICGC portion. In addition, to access somatic single nucleotide variants derived from TCGA donors, researchers will also need to obtain dbGaP authorization.

Code availability

The source code for Butler is freely available at http://github.com/llevar/butler under the GPL v3.0 license.

The project-specific deployment settings, configurations, analysis definitions, and workflows are available at the following:

PCAWG Germline Project: https://github.com/llevar/pcawg-germline

EOSC Pilot: https://github.com/llevar/eosc_pilot

Pan-Prostate Cancer Group: https://github.com/llevar/pan-prostate

The R source code for the analysis is available at https://github.com/llevar/butler_perf_analysis.

The core computational pipelines used by the PCAWG Consortium for alignment, quality control and variant calling are available to the public at https://dockstore.org/search?search=pcawg under the GNU General Public License v3.0, which allows for reuse and distribution.

References

  1. Habermann, N., Mardin, B. R., Yakneen, S. & Korbel, J. O. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer. C. R. Biol. 339, 308–313 (2016).

  2. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).

  3. Vivian, J. & Paten, B. Toil enables reproducible, open source, big biomedical data analyses. Nat. Biotechnol. 35, 314–316 (2017).

  4. Mashl, R. J. et al. GenomeVIP: a cloud platform for genomic variant discovery and interpretation. Genome Res. 27, 1450–1459 (2017).

  5. Stein, L. D., Knoppers, B. M., Campbell, P., Getz, G. & Korbel, J. O. Data analysis: create a cloud commons. Nature 523, 149–151 (2015).

  6. Molnár-Gábor, F., Lueck, R., Yakneen, S. & Korbel, J. O. Computing patient data in the cloud: practical and legal considerations for genetics and genomics research in Europe and internationally. Genome Med. 9, 58 (2017).

  7. Pan-cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature https://doi.org/10.1038/s41586-020-1969-6 (2020).

  8. Soergel, D. A. Rampant software errors may undermine scientific results. F1000 Res. 3, 303 (2014).

  9. Gormley, C. & Tong, Z. Elasticsearch: The Definitive Guide (O’Reilly Media, 2015).

  10. Leipzig, J. A review of bioinformatic pipeline frameworks. Brief. Bioinformatics 18, 530–536 (2017).

  11. Merkel, D. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014, 2 (2014).

  12. Amstutz, P. et al. Common Workflow Language, v1.0. https://w3id.org/cwl/v1.0/; https://doi.org/10.6084/m9.figshare.3115156.v2 (2016).

  13. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  14. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012).

  15. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).

  16. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 15, 15.7.11–15.7.12 (2015).

  17. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).

  18. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  19. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

Acknowledgements

We acknowledge the contributions of the many clinical networks across ICGC and TCGA who provided samples and data to the PCAWG Consortium, and the contributions of the Technical Working Group and the Germline Working Group of the PCAWG Consortium for collation, realignment and harmonized variant calling of the PCAWG cancer genomes. We thank the patients and their families for their participation in the individual ICGC and TCGA projects. We also thank the PPCG project, and J. Weischenfeldt for assistance with the PPCG data. We are grateful to C. Yung, B. O’Connor, J. Zhang and L. Stein for their assistance and invaluable advice throughout the project and to A. Cafferkey, C. Short, D. Ocaña, D. Vianello, E. van den Bergh, S. Newhouse and E. Birney for invaluable support with the EMBL-EBI Embassy Cloud used largely for the computing in this study. We also acknowledge The Cancer Genome Collaboratory, Amazon Web Services, Google Compute Platform and Microsoft Azure for providing computing or cloud infrastructure. J.O.K. acknowledges support by the EOSC Pilot study (European Commission award number 739563), the BMBF (de.NBI project 031A537B), the European Research Council (336045) and the Heidelberg Academy of Sciences and Humanities. S.W. was supported through an SNSF Early Postdoc Mobility fellowship (P2ELP3_155365) and an EMBO Long-Term Fellowship (ALTF 755-2014).

Author information

Author notes
    • Sergei Yakneen

    Present address: Sophia Genetics SA, Saint Sulpice, Switzerland

Affiliations

  1. European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
    • Sergei Yakneen
    • , Sebastian M. Waszak
    • , Joachim Weischenfeldt
    •  & Jan O. Korbel
  2. Institute of Computer Science, Heidelberg University, Heidelberg, Germany
    • Sergei Yakneen
    •  & Michael Gertz
  3. EMBL, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
    • Rich Boyce
    • , Andy Cafferkey
    • , Paul Flicek
    • , Nuno A. Fonseca
    • , Steven J. Newhouse
    • , David Ocana
    • , Charles Short
    •  & Jan O. Korbel
  4. Genome Informatics Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
    • Brice Aminou
    • , Niall J. Byrne
    • , Nodirjon Fayzullaev
    • , Vincent Ferretti
    • , Bob Gibson
    • , George L. Mihaiescu
    • , Hardeep K. Nahal-Bose
    • , Brian D. O’Connor
    • , Marc D. Perry
    • , Christina K. Yung
    •  & Junjun Zhang
  5. Barcelona Supercomputing Center (BSC), Barcelona, Spain
    • Javier Bartolome
    •  & Josep Ll. Gelpi
  6. Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
    • Keith A. Boroevich
  7. RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
    • Keith A. Boroevich
    •  & Hidewaki Nakagawa
  8. Broad Institute of MIT and Harvard, Cambridge, MA, USA
    • Angela N. Brooks
    • , Gad Getz
    • , Julian M. Hess
    • , Ignaty Leshchiner
    • , Dimitri Livitz
    • , Esther Rheinbay
    • , Mara Rosenberg
    • , Gordon Saksena
    • , Grace Tiao
    •  & Jeremiah A. Wala
  9. Dana-Farber Cancer Institute, Boston, MA, USA
    • Angela N. Brooks
    •  & Jeremiah A. Wala
  10. University of California Santa Cruz, Santa Cruz, CA, USA
    • Angela N. Brooks
    •  & Brian D. O’Connor
  11. Oregon Health and Science University, Portland, OR, USA
    • Alex Buchanan
    • , Kyle Ellrott
    •  & Adam J. Struck
  12. Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
    • Ivo Buchhalter
    • , Roland Eils
    • , Michael C. Heinold
    • , Rolf Kabbe
    • , Jules N. A. Kerssemakers
    • , Kortine Kleinheinz
    • , Nagarajan Paramasivam
    • , Manuel Prinz
    • , Matthias Schlesner
    •  & Johannes Werner
  13. Heidelberg Center for Personalized Oncology (DKFZ-HIPO), German Cancer Research Center, Heidelberg, Germany
    • Ivo Buchhalter
  14. Institute of Pharmacy and Molecular Biotechnology and BioQuant, Heidelberg University, Heidelberg, Germany
    • Ivo Buchhalter
    • , Roland Eils
    • , Michael C. Heinold
    • , Daniel Hübschmann
    •  & Kortine Kleinheinz
  15. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
    • Adam P. Butler
    • , Peter J. Campbell
    • , Peter Clapham
    • , Jonathan Nicholson
    •  & Keiran M. Raine
  16. Department of Haematology, University of Cambridge, Cambridge, UK
    • Peter J. Campbell
  17. University of California San Diego, San Diego, CA, USA
    • Zhaohong Chen
    • , Michelle T. Dow
    • , Claudiu Farcas
    • , Antonios Koures
    • , Lucila Ohno-Machado
    •  & Ashley Williams
  18. PDXen Biosystems Inc, Seoul, South Korea
    • Sunghoon Cho
  19. Electronics and Telecommunications Research Institute, Daejeon, South Korea
    • Wan Choi
    • , Seung-Hyup Jeon
    • , Hyunghwan Kim
    •  & Youngchoon Woo
  20. Seven Bridges Genomics, Charlestown, MA, USA
    • Brandi N. Davis-Dusenbery
  21. Annai Systems, Inc, Carlsbad, CA, USA
    • Francisco M. De La Vega
  22. Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
    • Francisco M. De La Vega
  23. Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
    • Francisco M. De La Vega
  24. Departments of Genetics and Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
    • Francisco M. De La Vega
  25. University of Leuven, Leuven, Belgium
    • Jonas Demeulemeester
    •  & Peter Van Loo
  26. The Francis Crick Institute, London, UK
    • Jonas Demeulemeester
    •  & Peter Van Loo
  27. Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
    • Lewis Jonathan Dursi
    • , Wei Jiao
    • , Solomon I. Shorser
    • , Lincoln D. Stein
    • , Adam J. Wright
    •  & Denis Yuen
  28. The Hospital for Sick Children, Toronto, Ontario, Canada
    • Lewis Jonathan Dursi
  29. Heidelberg University, Heidelberg, Germany
    • Juergen Eils
    • , Roland Eils
    •  & Daniel Hübschmann
  30. New BIH Digital Health Center, Berlin Institute of Health (BIH) and Charité – Universitätsmedizin Berlin, Berlin, Germany
    • Juergen Eils
    • , Roland Eils
    •  & Chris Lawerenz
  31. Rigshospitalet, Copenhagen, Denmark
    • Francesco Favero
  32. Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
    • Vincent Ferretti
  33. CIBIO/InBIO— Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Vairão, Portugal
    • Nuno A. Fonseca
  34. Department Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain
    • Josep Ll. Gelpi
  35. Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
    • Gad Getz
  36. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
    • Gad Getz
  37. Harvard Medical School, Boston, MA, USA
    • Gad Getz
    • , Esther Rheinbay
    •  & Jeremiah A. Wala
  38. University of Chicago, Chicago, IL, USA
    • Robert L. Grossman
    •  & Jonathan Spring
  39. Division of Biomedical Informatics, Department of Medicine, & Moores Cancer Center, UC San Diego School of Medicine, San Diego, CA, USA
    • Olivier Harismendy
  40. Children’s Hospital of Philadelphia, Philadelphia, PA, USA
    • Allison P. Heath
  41. Massachusetts General Hospital Center for Cancer Research, Charlestown, MA, USA
    • Julian M. Hess
  42. University of Melbourne Centre for Cancer Research, University of Melbourne, Melbourne, Victoria, Australia
    • Oliver Hofmann
  43. Syntekabio Inc, Daejeon, South Korea
    • Jongwhi H. Hong
  44. AbbVie, North Chicago, IL, USA
    • Thomas J. Hudson
  45. Genomics Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
    • Thomas J. Hudson
  46. German Cancer Consortium (DKTK), Heidelberg, Germany
    • Barbara Hutter
  47. Heidelberg Center for Personalized Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
    • Barbara Hutter
    •  & Nagarajan Paramasivam
  48. National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany
    • Barbara Hutter
  49. National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
    • Carolyn M. Hutter
    •  & Heidi J. Sofia
  50. Department of Pediatric Immunology, Hematology and Oncology, University Hospital, Heidelberg, Germany
    • Daniel Hübschmann
  51. German Cancer Research Center (DKFZ), Heidelberg, Germany
    • Daniel Hübschmann
  52. Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM), Heidelberg, Germany
    • Daniel Hübschmann
  53. Institute of Medical Science, University of Tokyo, Tokyo, Japan
    • Seiya Imoto
    • , Satoru Miyano
    • , Naoki Miyoshi
    •  & Kazuhiro Ohi
  54. Seven Bridges, Charlestown, MA, USA
    • Sinisa Ivkovic
    • , Sanja Mijalkovic
    • , Ana Mijalkovic Lazic
    • , Mia Nastic
    • , Petar Radovic
    •  & Nebojsa Tijanic
  55. Genome Integration Data Center, Syntekabio, Inc, Daejeon, South Korea
    • Jongsun Jung
    •  & Milena Kovacevic
  56. Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY, USA
    • Andre Kahles
    •  & Gunnar Rätsch
  57. ETH Zurich, Department of Biology, Zurich, Switzerland
    • Andre Kahles
  58. ETH Zurich, Department of Computer Science, Zurich, Switzerland
    • Andre Kahles
  59. SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
    • Andre Kahles
    •  & Gunnar Rätsch
  60. University Hospital Zurich, Zurich, Switzerland
    • Andre Kahles
    •  & Gunnar Rätsch
  61. Department of Biochemistry, College of Medicine, Ewha Womans University, Seoul, South Korea
    • Hyung-Lae Kim
  62. Health Sciences Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
    • Jihoon Kim
  63. Department of Health Sciences and Technology, Sungkyunkwan University School of Medicine, Seoul, South Korea
    • Youngwook Kim
  64. Samsung Genome Institute, Seoul, South Korea
    • Youngwook Kim
  65. Functional and Structural Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
    • Michael Koscher
  66. Leidos Biomedical Research, Inc, McLean, VA, USA
    • Jia Liu
  67. Sage Bionetworks, Seattle, WA, USA
    • Larsson Omberg
  68. Genome Informatics, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
    • B. F Francis Ouellette
  69. Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
    • B. F Francis Ouellette
  70. Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
    • Marc D. Perry
  71. CSRA Incorporated, Fairfax, VA, USA
    • Todd D. Pihl
  72. Barcelona Supercomputing Center, Barcelona, Spain
    • Montserrat Puiggròs
    • , Romina Royo
    • , David Torrents
    •  & David Vicente
  73. Massachusetts General Hospital, Boston, MA, USA
    • Esther Rheinbay
    • , Mara Rosenberg
    •  & Miguel Vazquez
  74. Department of Biology, ETH Zurich, Zurich, Switzerland
    • Gunnar Rätsch
  75. Department of Computer Science, ETH Zurich, Zurich, Switzerland
    • Gunnar Rätsch
  76. Weill Cornell Medical College, New York, NY, USA
    • Gunnar Rätsch
  77. Bioinformatics and Omics Data Analytics, German Cancer Research Center (DKFZ), Heidelberg, Germany
    • Matthias Schlesner
  78. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
    • Lincoln D. Stein
  79. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
    • David Torrents
    •  & Miguel Vazquez
  80. National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
    • Zhining Wang
    •  & Liming Yang
  81. Finsen Laboratory and Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
    • Joachim Weischenfeldt
  82. Department of Urology, Charité Universitätsmedizin Berlin, Berlin, Germany
    • Joachim Weischenfeldt
  83. Department of Biological Oceanography, Leibniz Institute of Baltic Sea Research, Rostock, Germany
    • Johannes Werner
  84. Ontario Institute for Cancer Research, Toronto, Ontario, Canada
    • Qian Xiang
Authors
  1. Sergei Yakneen
  2. Sebastian M. Waszak
  3. Michael Gertz
  4. Jan O. Korbel

Consortia

PCAWG Technical Working Group

PCAWG Consortium

Contributions

This manuscript was written by S.Y. and J.O.K., with input from all authors. S.Y. and J.O.K. are responsible for study conception. S.Y. designed, implemented, and executed the Butler software framework in the context of the analyses described in this manuscript. S.M.W. designed workflows and assessed the integrity of the framework. S.Y. led the data analysis, and S.M.W., M.G. and J.O.K. contributed to data analysis. The PCAWG Technical Working Group provided invaluable assistance and feedback. M.G. and J.O.K. provided supervision and project oversight.

Corresponding authors

Correspondence to Sergei Yakneen or Jan O. Korbel.

Ethics declarations

Competing interests

G.G. receives research funds from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect, ABSOLUTE, MutSig, MSMuTect, MSMutSig and POLYSOLVER.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Freebayes workflow.

The Freebayes workflow can be used for small variant discovery and genotyping. It splits into tasks by chromosome, each of which can run in parallel (not all tasks are shown in the figure, to save space). The workflow begins and ends with the standard start_analysis_run and end_analysis_run tasks, which keep track of Analysis state. The validate_sample task checks that the data are accessible.
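
The task structure in this caption can be sketched in plain Python as a dependency map. This is a minimal illustration under stated assumptions, not Butler's implementation: only start_analysis_run, validate_sample and end_analysis_run are named in the caption. The functions build_freebayes_tasks and runnable_tasks, the freebayes_<chromosome> task names, the chromosome list and the placement of validate_sample directly after start_analysis_run are all hypothetical.

```python
from typing import Dict, List, Set

# Assumed chromosome set (human autosomes plus X and Y); the caption does not list them.
CHROMOSOMES: List[str] = [str(i) for i in range(1, 23)] + ["X", "Y"]


def build_freebayes_tasks(chromosomes: List[str]) -> Dict[str, List[str]]:
    """Return a mapping from each task name to the tasks it depends on."""
    dependencies: Dict[str, List[str]] = {
        "start_analysis_run": [],
        # Assumption: sample validation runs right after the run is registered.
        "validate_sample": ["start_analysis_run"],
    }
    per_chromosome: List[str] = []
    for chrom in chromosomes:
        task = f"freebayes_{chrom}"  # hypothetical task name
        # Every per-chromosome task depends only on validation,
        # so all of them can run in parallel.
        dependencies[task] = ["validate_sample"]
        per_chromosome.append(task)
    # The closing task waits for every per-chromosome task.
    dependencies["end_analysis_run"] = per_chromosome
    return dependencies


def runnable_tasks(dependencies: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Tasks that have not run yet and whose dependencies are all complete."""
    return sorted(
        task
        for task, deps in dependencies.items()
        if task not in done and all(dep in done for dep in deps)
    )
```

Once start_analysis_run and validate_sample are complete, runnable_tasks returns every per-chromosome task at once, which is the point at which a scheduler could dispatch them in parallel.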

Supplementary Figure 2 Freebayes task durations.

Boxplot of freebayes task durations during the SNV genotyping stage across 5668 samples. Durations are highly correlated with chromosome length (Pearson’s r = 0.92). n = 5668 biologically independent samples. Boxplot center line corresponds to the median, lower and upper hinges to the 25th and 75th percentiles, and whiskers to ±1.5 × the interquartile range from the hinges. The experiment was performed once.

Supplementary Figure 3 Delly workflow durations.

(a) Distribution of Delly workflow durations for genotyping of 244,889 germline deletions across 5668 PCAWG samples. (b) Distribution of Delly workflow durations for genotyping of 217,433 germline duplications across 5668 PCAWG samples. n=5668 biologically independent samples. The experiment was performed once.

Supplementary Figure 4 Analysis Tracker UML diagram.

The Analysis Tracker consists of four entities that are necessary for keeping track of the state of scientific analyses run in Butler. The Workflow object keeps a registry of known workflows and their attributes. The Analysis object keeps track of analyses that are being performed. An Analysis Run represents an instance of running a particular workflow under a particular analysis on a particular sample. Configuration objects keep track of the parameters supplied to the workflow invocation.
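
One way to picture the four entities is as plain Python dataclasses. This is only a sketch of the relationships stated in the caption. All field names (name, version, sample_id, values, state) are hypothetical and are not taken from Butler's schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Configuration:
    """Parameters supplied to a workflow invocation (field name is hypothetical)."""
    values: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Workflow:
    """An entry in the registry of known workflows and their attributes."""
    name: str
    version: str
    config: Configuration = field(default_factory=Configuration)


@dataclass
class Analysis:
    """An analysis that is being performed."""
    name: str
    config: Configuration = field(default_factory=Configuration)


@dataclass
class AnalysisRun:
    """One workflow, run under one analysis, on one sample."""
    workflow: Workflow
    analysis: Analysis
    sample_id: str
    config: Configuration = field(default_factory=Configuration)
    state: str = "Ready"  # assumption: the state is stored on the run itself


# Usage: register a workflow and an analysis, then create a run for a sample.
workflow = Workflow(name="freebayes", version="1.0")
analysis = Analysis(name="germline-genotyping")
run = AnalysisRun(workflow=workflow, analysis=analysis, sample_id="sample-001")
```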

Supplementary Figure 5 Analysis Run state transitions.

Each Analysis Run keeps track of its state and has a set of rules governing allowable state transitions. A Run is created in the Ready state from which it may be scheduled for execution. Once the corresponding workflow task is picked up for execution it is transitioned to In-Progress. Upon successful completion it is marked Completed. At any point a failure may put this run in an Error state from which it can recover only to the Ready state to initiate a re-execution of the corresponding workflow.
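
The transition rules in this caption can be written as a small state machine. The sketch below is illustrative rather than Butler's code. It models only the four states the caption names; the caption mentions scheduling but does not name a separate state for it, so none is modelled. Treating Completed as terminal is an assumption, as are the class and exception names.

```python
from enum import Enum


class RunState(Enum):
    READY = "Ready"
    IN_PROGRESS = "In-Progress"
    COMPLETED = "Completed"
    ERROR = "Error"


# Allowed target states for each current state.
ALLOWED_TRANSITIONS = {
    RunState.READY: {RunState.IN_PROGRESS, RunState.ERROR},
    RunState.IN_PROGRESS: {RunState.COMPLETED, RunState.ERROR},
    RunState.ERROR: {RunState.READY},  # recovery is possible only to Ready
    RunState.COMPLETED: set(),         # assumption: Completed is terminal
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not allowed."""


class AnalysisRun:
    def __init__(self) -> None:
        # A run is created in the Ready state.
        self.state = RunState.READY

    def transition(self, target: RunState) -> None:
        if target not in ALLOWED_TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {target.value}")
        self.state = target
```

A failed run therefore has to pass through Ready again before it can be re-executed; a direct move from Error to In-Progress raises InvalidTransition.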

Supplementary Figure 6 Hierarchical tri-level configuration.

Configuration can be applied at three levels of granularity within Butler - Workflow, Analysis, and Analysis Run. Each higher level configuration may override and augment the configurations supplied at lower levels. At runtime all three levels of configuration are resolved into an “effective configuration”, which is then applied for execution.
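
Resolving three configuration levels into an effective configuration can be sketched as a layered dictionary merge. This example assumes that the more specific level wins (Analysis overrides Workflow, and Analysis Run overrides both) and that nested sections are merged key by key. The caption does not spell out either detail, and the function names and sample keys are hypothetical.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new config in which keys from override replace or extend base."""
    merged: Config = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a nested section: merge it key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the overriding level wins outright.
            merged[key] = value
    return merged


def effective_configuration(workflow_cfg: Config, analysis_cfg: Config, run_cfg: Config) -> Config:
    """Resolve the three levels, from least to most specific (assumed order)."""
    return deep_merge(deep_merge(workflow_cfg, analysis_cfg), run_cfg)


# Usage with made-up keys: each level overrides or augments the previous one.
workflow_cfg = {"tool": {"threads": 1, "mode": "discovery"}, "reference": "ref.fa"}
analysis_cfg = {"tool": {"mode": "genotyping"}, "output_dir": "/results"}
run_cfg = {"tool": {"threads": 4}, "sample": "sample-001"}
resolved = effective_configuration(workflow_cfg, analysis_cfg, run_cfg)
```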

Supplementary Figure 7 Butler compute cluster performance metrics during germline deletion genotyping for PCAWG.

(a) Overall load per VM that is part of the Butler cluster - shows no load prior to analysis kick-off, then steady load throughout the analysis, and drop-off in load at the end when VMs start running out of work. (b) CPU profile shows highly variable CPU utilization that is typical of Delly executions. (c) Memory profile is stable and similar between all VMs that are running the analysis. Similar measurements have been observed over the other 5 analyses performed with Butler during PCAWG, although the exact pattern of CPU and Memory utilization is dependent on the algorithms that comprise the workflow being executed.

Supplementary Figure 8 SQL Database state monitoring dashboard.

SQL Database health can be ascertained from logs harvested on the database server. (a) 75th, 99th, and 99.5th percentile of query response times. (b) Count queries by type. (c) Database READ and WRITE counts. (d) Data throughput in and out. These measurements were collected over a single 2-hour run of the software and serve as an example of visualization capabilities, not an indication of typical database performance.

Supplementary information

Supplementary Materials

Supplementary Figures 1–8, Supplementary Tables 1–3 and Supplementary Notes 1 and 2

Reporting Summary

Cite this article

Yakneen, S., Waszak, S.M. et al. Butler enables rapid cloud-based analysis of thousands of human genomes. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-019-0360-3
