Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR

Yang, Hui; Wang, Kai

doi:10.1038/nprot.2015.105

Protocol
Published: 17 September 2015

Nature Protocols volume 10, pages 1556–1566 (2015)

Abstract

Recent developments in sequencing techniques have enabled rapid and high-throughput generation of sequence data, democratizing the ability to compile information on large amounts of genetic variations in individual laboratories. However, there is a growing gap between the generation of raw sequencing data and the extraction of meaningful biological information. Here, we describe a protocol to use the ANNOVAR (ANNOtate VARiation) software to facilitate fast and easy variant annotations, including gene-based, region-based and filter-based annotations on a variant call format (VCF) file generated from human genomes. We further describe a protocol for gene-based annotation of a newly sequenced nonhuman species. Finally, we describe how to use a user-friendly and easily accessible web server called wANNOVAR to prioritize candidate genes for a Mendelian disease. The variant annotation protocols take 5–30 min of computer time, depending on the size of the variant file, and 5–10 min of hands-on time. In summary, through the command-line tool and the web server, these protocols provide a convenient means to analyze genetic variants generated in humans and other species.
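The difference between the three annotation types named above can be sketched in a few lines of Python. This is a conceptual illustration only, not ANNOVAR's implementation: the `Variant` class, the toy tables `GENES`, `REGIONS` and `KNOWN_VARIANTS`, and the `annotate` function are all hypothetical names invented for this sketch. It assumes that gene-based and region-based annotation depend on positional overlap with an interval, whereas filter-based annotation requires an exact match on position and alleles against a variant database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


# Hypothetical toy tables (chrom, start, end, name); real databases are far larger.
Interval = Tuple[str, int, int, str]
GENES: List[Interval] = [("chr1", 100, 500, "GENE_A"), ("chr1", 900, 1500, "GENE_B")]
REGIONS: List[Interval] = [("chr1", 400, 950, "conserved_element_1")]

# Hypothetical variant database keyed by exact (chrom, pos, ref, alt); value is an allele frequency.
KNOWN_VARIANTS: Dict[Tuple[str, int, str, str], float] = {("chr1", 450, "A", "G"): 0.012}


def overlapping(chrom: str, pos: int, intervals: List[Interval]) -> List[str]:
    """Return the names of all intervals that contain the given position."""
    return [name for c, start, end, name in intervals if c == chrom and start <= pos <= end]


def annotate(v: Variant) -> Dict[str, object]:
    """Annotate one variant in the three ways described in the abstract."""
    freq: Optional[float] = KNOWN_VARIANTS.get((v.chrom, v.pos, v.ref, v.alt))
    return {
        "gene": overlapping(v.chrom, v.pos, GENES),      # gene-based: which gene is hit
        "region": overlapping(v.chrom, v.pos, REGIONS),  # region-based: which region is hit
        "filter": freq,                                  # filter-based: exact allele lookup
    }


result = annotate(Variant("chr1", 450, "A", "G"))
```

Note that a variant at the same position with a different alternative allele would still receive the gene-based and region-based annotations in this sketch, but its filter-based value would be `None`, because that lookup depends on the alleles as well as the coordinates.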

**Figure 1: The three different types of annotations supported by ANNOVAR are gene-based, region-based and filter-based annotations.**

**Figure 2: Screenshot of wANNOVAR, including the general steps to upload and prioritize variants.**

**Figure 3: Screenshot of the wANNOVAR result page.**

Acknowledgements

Development of the ANNOVAR/wANNOVAR tool is supported by US National Institutes of Health (NIH) grant R01 HG006465. We thank X. Chang for the initial development of the wANNOVAR server. We thank all ANNOVAR and wANNOVAR users for their helpful suggestions, comments and bug reports to improve the software tools and web servers.

Author information

Authors and Affiliations

Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, USA
Hui Yang & Kai Wang
Neuroscience Graduate Program, University of Southern California, Los Angeles, California, USA
Hui Yang
Department of Psychiatry, University of Southern California, Los Angeles, California, USA
Kai Wang
Department of Preventive Medicine, Division of Bioinformatics, University of Southern California, Los Angeles, California, USA
Kai Wang

Contributions

H.Y. and K.W. drafted and revised this manuscript.

Corresponding author

Correspondence to Kai Wang.

Ethics declarations

Competing interests

K.W. is a shareholder and board member of Tute Genomics.

Integrated supplementary information

Supplementary Figure 1 The expected results for discovering candidate genes of the ‘hemolytic anemia’ example in the Phenolyzer website.

Each ball represents one of the top 50 ranked genes. The larger the ball, the higher the ranking. The blue balls represent disease genes reported before and the yellow ones represent predicted disease genes. For detailed explanations on each color and shape, and on how the algorithm works to find disease genes, please visit http://phenolyzer.usc.edu/FAQ.php

Supplementary Figure 2 Explanation of each column in the wANNOVAR web view.

This is a sample output with default parameters. The first 5 columns represent the original information on the input variants. The following 5 columns give gene-based annotations on each variant. The following columns give region-based and filter-based annotations on each variant.
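The column layout described for Supplementary Figure 2 can be expressed as a small Python helper. This is a minimal sketch under stated assumptions: the function name `split_columns` is hypothetical, and the header names in the example are illustrative placeholders rather than the guaranteed column names of the wANNOVAR web view. Only the 5 / 5 / remainder grouping comes from the description above.

```python
from typing import Dict, List


def split_columns(header: List[str]) -> Dict[str, List[str]]:
    """Group output columns as described: 5 input columns, 5 gene-based, then the rest."""
    return {
        "input": header[:5],
        "gene_based": header[5:10],
        "region_and_filter_based": header[10:],
    }


# Illustrative header; the actual names depend on the databases selected.
example_header = [
    "Chr", "Start", "End", "Ref", "Alt",
    "Func", "Gene", "GeneDetail", "ExonicFunc", "AAChange",
    "cytoBand", "allele_frequency", "prediction_score",
]

groups = split_columns(example_header)
```

Grouping by position rather than by name keeps the helper independent of which region-based and filter-based databases were chosen for a given run.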

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Tables 1–3 (PDF 841 kb)

Supplementary Data

ANNOVAR and VEP comparison results (XLSX 453 kb)

About this article

Cite this article

Yang, H., Wang, K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc 10, 1556–1566 (2015). https://doi.org/10.1038/nprot.2015.105

Published: 17 September 2015
Issue Date: October 2015
DOI: https://doi.org/10.1038/nprot.2015.105
