SIFT missense predictions for genomes


The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from
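To put the two timing figures quoted above in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate built from the numbers stated in the abstract; the function names and the 20,000-protein proteome size are hypothetical choices made for illustration, and real run times will depend on hardware and database size.

```python
# Figures quoted in the abstract above.
VARIANTS_ANNOTATED = 6_700_000   # variants retrieved from a precomputed database
ANNOTATION_MINUTES = 4           # wall-clock time quoted for that retrieval
SECONDS_PER_PROTEIN = 2.6        # quoted SIFT 4G rate when no database exists


def lookup_rate_per_second() -> float:
    """Variants annotated per second when a precomputed database is available."""
    return VARIANTS_ANNOTATED / (ANNOTATION_MINUTES * 60)


def compute_hours(n_proteins: int) -> float:
    """Estimated hours to compute predictions for n_proteins sequences."""
    return n_proteins * SECONDS_PER_PROTEIN / 3600


if __name__ == "__main__":
    # 20,000 is a hypothetical proteome size, not a figure from the article.
    print(f"{lookup_rate_per_second():,.0f} variants/s")
    print(f"{compute_hours(20_000):.1f} h for 20,000 proteins")
```

Under these assumptions the lookup rate works out to roughly 27,900 variants per second, while computing predictions from scratch for a 20,000-protein proteome would take about 14.4 hours, which illustrates why the precomputed databases are the preferred route when one exists for the organism of interest.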


Figure 1: Comparison of the SIFT and SIFT 4G algorithms.
Figure 2: Workflow for the SIFT 4G annotator.
Figure 3: The SIFT 4G annotator graphical user interface.
Figure 4: Select the database for the desired organism.
Figure 5: View of the SIFT 4G annotator after annotation has been completed.




This work is financed in part by A*STAR and the Croatian Science Foundation (project no. 7353, Algorithms for Genome Sequence Analysis). We thank P.C.N.'s significant other for donating his gaming computer 'for science' and M. Korpar for providing the SW#db library.

Author information


M.S. and P.C.N. conceived the project. R.V. implemented and tested the performance of the SIFT 4G algorithm. S.A. and S.N.L. implemented the SIFT 4G annotator. S.A. and P.C.N. wrote the manuscript.

Corresponding author

Correspondence to Pauline C Ng.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Sensitivity and specificity of SIFT and SIFT 4G.

The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (brown), and lysozyme (blue). SIFT and SIFT 4G’s performances are shown in light-colored and dark-colored bars, respectively. Reproduced under a Creative Commons license from

Supplementary Figure 2 ROC comparison of SIFT and SIFT 4G.

The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (beige), and lysozyme (blue). SIFT’s performance is depicted with dashed lines; SIFT 4G with solid lines. Reproduced under a Creative Commons license from

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Tables 1 and 2 (PDF 673 kb)

About this article

Cite this article

Vaser, R., Adusumalli, S., Leng, S. et al. SIFT missense predictions for genomes. Nat Protoc 11, 1–9 (2016).
