The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variation by predicting whether an amino acid substitution is likely to be deleterious to protein function. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use was previously published in Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), a faster version of SIFT that makes it practical to compute predictions across whole reference genomes. Users can obtain predictions for single-nucleotide variants in their organism of interest by running the SIFT 4G Annotator against SIFT 4G's precomputed databases. The scope of genomic predictions has been expanded: predictions are now available for more than 200 organisms. Alternatively, users can run the SIFT 4G algorithm themselves. Once the relevant database has been downloaded, SIFT predictions can be retrieved for 6.7 million variants in 4 min. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.
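The idea behind a SIFT prediction can be illustrated with a minimal sketch. SIFT collects homologous protein sequences and, at each aligned position, scales the probability of every amino acid by the probability of the most frequent one; substitutions scoring below a cutoff (commonly 0.05) are predicted to be deleterious. The Python below is a simplified illustration, not the SIFT 4G implementation. It assumes the homolog alignment is already given as equal-length strings, uses raw counts with an optional uniform pseudocount instead of the sequence weighting and pseudocount scheme the real tool uses, and its function names (`column_at`, `scaled_probabilities`, `predict`) are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DELETERIOUS_CUTOFF = 0.05  # commonly cited SIFT cutoff; scores below it -> deleterious


def column_at(alignment, position):
    """Collect residues at one position of a pre-built alignment (hypothetical input format)."""
    return "".join(seq[position] for seq in alignment)


def scaled_probabilities(column, pseudocount=0.0):
    """SIFT-style scores for all 20 amino acids at one alignment position.

    Simplification (assumption): raw counts plus an optional uniform pseudocount;
    no sequence weighting, unlike the real tool.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # drop gaps and 'X'
    if not residues:
        raise ValueError("column contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    # Normalize by the most probable residue so the consensus amino acid scores 1.0.
    return {aa: p / top for aa, p in probs.items()}


def predict(column, alt, pseudocount=0.0, cutoff=DELETERIOUS_CUTOFF):
    """Score a substitution to `alt` at this position and label it."""
    alt = alt.upper()
    if alt not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {alt!r}")
    score = scaled_probabilities(column, pseudocount)[alt]
    return score, ("deleterious" if score < cutoff else "tolerated")


if __name__ == "__main__":
    # Toy alignment of five hypothetical homologs; position 2 holds V, I, V, V, L.
    alignment = ["MKVL", "MKIL", "MRVL", "MKVL", "MKLL"]
    column = column_at(alignment, 2)
    for alt in ("V", "I", "D"):
        score, label = predict(column, alt)
        print(f"{alt}: {score:.2f} {label}")
```

Run as a script, this prints `V: 1.00 tolerated`, `I: 0.33 tolerated` and `D: 0.00 deleterious`. Aspartate scores 0 only because no homolog in the toy alignment carries it; pseudocounts exist to soften exactly this case when few sequences are available.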
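The two throughput figures above can be put on a common footing with a quick back-of-the-envelope calculation. The hardware and degree of parallelism behind these numbers are not stated here, and the proteome size of 20,000 proteins in the second line is a hypothetical example, not a figure from the protocol.

```latex
\begin{aligned}
\text{Lookup rate} &= \frac{6.7\times10^{6}\ \text{variants}}{4\ \text{min}\times 60\ \text{s min}^{-1}}
  = \frac{6.7\times10^{6}\ \text{variants}}{240\ \text{s}} \approx 2.8\times10^{4}\ \text{variants s}^{-1}\\[4pt]
\text{Compute time} &= 2\times10^{4}\ \text{proteins}\times 2.6\ \text{s protein}^{-1}
  = 5.2\times10^{4}\ \text{s} \approx 14.4\ \text{h}
\end{aligned}
```

If the per-protein rate applies serially, computing predictions for such a proteome from scratch takes on the order of half a day, whereas retrieving precomputed scores for millions of variants takes minutes, which is why the precomputed databases are the faster route whenever one exists for the organism.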
- Supplementary Figure 1: Sensitivity and specificity of SIFT and SIFT 4G.
The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (brown), and lysozyme (blue). SIFT and SIFT 4G’s performances are shown in light-colored and dark-colored bars, respectively. Reproduced under a Creative Commons license from http://sift-dna.org/sift4g/AboutSIFT4G.html.
- Supplementary Figure 2: ROC comparison of SIFT and SIFT 4G.
The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (beige), and lysozyme (blue). SIFT’s performance is shown with dashed lines and SIFT 4G’s with solid lines. Reproduced under a Creative Commons license from http://sift-dna.org/sift4g/AboutSIFT4G.html.
- Supplementary Text and Figures
Supplementary Figures 1 and 2, Supplementary Tables 1 and 2