Brief Report

Catching hidden variation: systematic correction of reference minor allele annotation in clinical variant calling

Genetics in Medicine volume 20, pages 360–364 (2018)



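We comprehensively assessed the influence of reference minor alleles (RMAs), positions at which the human reference genome sequence carries the minor allele, on variant discovery and annotation. RMAs are one of the inherent problems of the reference sequence.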


Variant call format (VCF) files provided by the 1000 Genomes Project and the Exome Aggregation Consortium (ExAC) were used to identify RMA sites. All coding RMA sites were checked for concordance with UniProt and for the presence of variants in the same codon. RMA-corrected predictions of functional effect were obtained with the standalone versions of SIFT, PolyPhen-2, and PROVEAN and compared with dbNSFP v2.9 for consistency.
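The first step, finding RMA sites in a population VCF, can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that each record carries an `AF` entry in the INFO column and that a site counts as an RMA when an alternate allele has a frequency above 0.5; the threshold, the function names, and the toy records (with made-up coordinates) are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class RmaSite(NamedTuple):
    chrom: str
    pos: int
    ref: str      # allele present in the reference genome (the minor allele)
    major: str    # alternate allele that is actually the major allele
    alt_af: float


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def find_rma_sites(vcf_lines: Iterable[str], threshold: float = 0.5) -> Iterator[RmaSite]:
    """Yield sites where an alternate allele is more frequent than `threshold`.

    Assumption: the INFO column has an AF entry with one value per ALT allele.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        af_field = parse_info(info).get("AF")
        if not af_field:
            continue  # no frequency information for this record
        for alt, af in zip(alts.split(","), af_field.split(",")):
            try:
                freq = float(af)
            except ValueError:
                continue  # e.g. "." for a missing value
            if freq > threshold:
                yield RmaSite(chrom, pos, ref, alt, freq)


# Toy records with invented coordinates, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\tAC=10;AF=0.93",
    "1\t2000\t.\tC\tT\t.\tPASS\tAF=0.02",
    "2\t3000\t.\tG\tA,T\t.\tPASS\tAF=0.10,0.75",
]
sites = list(find_rma_sites(example))
for site in sites:
    print(site)
```

In this sketch a multiallelic record is handled allele by allele, so only the alternate allele that exceeds the threshold is reported as the major allele.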


We systematically characterized the problem of RMAs and identified several ways in which RMAs can interfere with accurate variant discovery and annotation. We discovered a systematic bias in automated variant effect prediction at RMA loci, as well as widespread switching of functional consequences for variants located in the same codon as an RMA. As a convenient way to address the problem of RMAs, we developed a simple bioinformatic tool that identifies variation at RMA sites and provides correct annotations for all such substitutions. The tool is available free of charge at
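The switching of functional consequences within a codon can be illustrated with a small worked example. The sketch below is not the authors' tool; the function names and the example codon are hypothetical, and only the standard genetic code is assumed. It annotates the same substitution twice: once against the reference codon, and once against the codon that carries the major allele at the RMA position.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitute(codon: str, index: int, base: str) -> str:
    """Return `codon` with the base at 0-based `index` replaced by `base`."""
    return codon[:index] + base + codon[index + 1:]


def consequence(before: str, after: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "synonymous"
    if aa_after == "*":
        return "stop_gained"
    if aa_before == "*":
        return "stop_lost"
    return "missense"


def annotate_both_ways(ref_codon, rma_index, major_base, var_index, var_base):
    """Annotate a variant against the reference codon and the major-allele codon.

    Returns (naive, corrected): the consequence computed on the reference
    codon, and the consequence computed after the RMA position has been
    replaced by the major allele.
    """
    if rma_index == var_index:
        raise ValueError("the variant must sit at a different codon position than the RMA")
    naive = consequence(ref_codon, substitute(ref_codon, var_index, var_base))
    major_codon = substitute(ref_codon, rma_index, major_base)
    corrected = consequence(major_codon, substitute(major_codon, var_index, var_base))
    return naive, corrected


# Hypothetical codon: the reference has TTA (Leu), but most people carry TTG (Leu).
# A T>G change at the second position then reads TGA (stop) on the reference
# background and TGG (Trp) on the major-allele background.
naive, corrected = annotate_both_ways("TTA", 2, "G", 1, "G")
print(naive, corrected)
```

Here the RMA itself is synonymous, yet it changes the annotation of a neighbouring substitution from a premature stop to a missense change, which is the kind of switch described above.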


Correction of RMA annotation enhances the accuracy of next-generation sequencing–based methods in clinical practice.





The study was supported by Russian Science Foundation grant 14-50-00069. Equipment from the Biobank of the Research Park of St. Petersburg State University was used for the whole-exome sequencing experiments analyzed in the present study.

Author information


  1. Bioinformatics Institute, St. Petersburg, Russia

    • Yury A Barbitoff
    • Igor V Bezdvornykh
    • Alexander V Predeus
  2. Biobank of the Research Park, St. Petersburg State University, St. Petersburg, Russia

    • Yury A Barbitoff
    • Dmitrii E Polev
    • Elena A Serebryakova
    • Andrey S Glotov
  3. Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia

    • Yury A Barbitoff
    • Oleg S Glotov



Competing interests

The authors declare no conflict of interest.

Corresponding author

Correspondence to Alexander V Predeus.

Supplementary information


Supplementary material is linked to the online version of the paper at