Quantifying harmful mutations in human populations

Subramanian, Sankar

doi:10.1038/ejhg.2012.68

Short Report
Published: 18 April 2012

Sankar Subramanian¹

European Journal of Human Genetics volume 20, pages 1320–1322 (2012)

Abstract

A number of previous studies suggested the presence of deleterious amino-acid-altering nonsynonymous single-nucleotide polymorphisms (nSNPs) in human populations. However, the proportions of deleterious nSNPs among rare and common variants are not known. To estimate these, >77 000 SNPs from human protein-coding genes were analyzed. Based on two independent methods, this study reveals that up to 53% of rare nSNPs (minor allele frequency (MAF)<0.002) could be deleterious in nature. The fraction of deleterious nSNPs declines as allele frequency increases, and only 12% of the common nSNPs (MAF>0.4) were found to be harmful. This shows that even at high frequencies significant fractions of deleterious polymorphisms are present in human populations. These results could be useful for genome-wide association studies in understanding the relative contributions of rare and common variants in causing human genetic diseases.

Introduction

Although harmful mutations reduce the fitness of an organism, they are nevertheless present in human populations and contribute to diversity because of random genetic drift.¹ However, natural selection eliminates such deleterious mutations over time, and thus they are prevented from reaching high frequencies. Therefore, low-frequency single-nucleotide polymorphisms (SNPs) typically comprise deleterious as well as neutral polymorphisms, whereas high-frequency SNPs are largely neutral in nature. As amino-acid-changing SNPs might be detrimental to proper protein function, a significant proportion of them could be harmful. A number of previous studies have shown an enrichment of low-frequency nonsynonymous SNPs (nSNPs) compared with those with high frequencies,^{2, 3, 4, 5} which indirectly suggests that these nSNPs are deleterious and removed over time by natural selection. However, the fraction of deleterious nSNPs with respect to their allele frequencies is unclear. In other words, the proportion of deleterious nSNPs among low (or high) frequency variants has not been quantified. To estimate this, the present investigation gathered over 77 000 SNPs from human protein-coding genes and grouped them based on their minor allele frequencies. Two independent methods were used to estimate the proportion of deleterious nSNPs, and the frequency distribution of these harmful nSNPs was examined.

Materials and methods

SNP data

First, SNPs of all human protein-coding genes (dbSNP build 130) were obtained from the UCSC genome resource (http://genome.ucsc.edu/). Then, using the rsIDs of the SNPs, their corresponding minor allele frequencies were obtained from the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/). For consistency, only the SNPs and allele frequencies reported by the 1000 Genomes Project² (1000 Genomes phase 1, May 2011 data release) were used for further analysis. This final data set consisted of 37 123 nSNPs and 40 599 synonymous SNPs (sSNPs). These SNPs were grouped into 10 categories based on their minor allele frequencies, and the proportion of deleterious nSNPs was computed for each category as described below.
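The grouping step can be sketched as follows. This is a minimal illustration, not the author's script: the interior bin edges in `MAF_EDGES` are hypothetical (the text only states the lowest category, MAF<0.002, and the highest, MAF=0.4–0.5), and the function names and the `(maf, kind)` input layout are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical interior edges that yield 10 MAF categories. Only the lowest
# (<0.002) and highest (0.4-0.5) categories are stated in the text.
MAF_EDGES = [0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4]


def maf_bin(maf):
    """Return the 0-based category index (0..9) for a minor allele frequency."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return bisect_right(MAF_EDGES, maf)


def pn_ps_by_bin(snps):
    """Count nSNPs and sSNPs per MAF category and return P_n/P_s for each.

    `snps` is an iterable of (maf, kind) pairs, with kind "n" or "s".
    Categories that contain no sSNPs are skipped to avoid dividing by zero.
    """
    counts = defaultdict(lambda: {"n": 0, "s": 0})
    for maf, kind in snps:
        counts[maf_bin(maf)][kind] += 1
    return {b: c["n"] / c["s"] for b, c in sorted(counts.items()) if c["s"] > 0}
```

Each category's P_n/P_s ratio can then be passed to the δ estimate described in the next subsection.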

Estimation of the deleterious proportion of nSNPs

McDonald and Kreitman⁶ showed that under neutral evolution the ratio of nonsynonymous (P_n) to synonymous (P_s) polymorphisms (P_n/P_s) within species is expected to be equal to the ratio of nonsynonymous (D_n) to synonymous (D_s) substitutions between species, that is,

P_n/P_s = D_n/D_s

However, it is clear from Table 1 that the ratios of SNPs are always higher than that of the substitutions between human–chimp, that is,

P_n/P_s > D_n/D_s

Table 1 Human polymorphisms, substitutions (between human–chimp) and deleterious fractions of nSNPs

This is due to the presence of deleterious nSNPs in the human populations, as predicted by previous theoretical studies.^{1, 7} Hence, to subtract the fraction of deleterious nSNPs (δ), the equation could be written as

P_n(1 − δ)/P_s = D_n/D_s

This equation could be simplified to estimate the fraction of deleterious nSNPs (δ) as:

δ = 1 − (D_n P_s)/(D_s P_n)

The measure δ is the fraction of deleterious nSNPs that are segregating in the population.⁸ The numbers of nonsynonymous (D_n=47 079) and synonymous (D_s=71 956) substitutions (based on 13 454 orthologous human–chimpanzee protein-coding genes) were obtained from a previous study.⁹ To obtain the standard error, a bootstrap procedure was used in which the SNPs were resampled (1000 replications).
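The estimate and its bootstrap standard error can be sketched as below. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, the substitution counts are held fixed while only the SNPs are resampled, and resampling SNPs with replacement is implemented as independent draws of the nonsynonymous/synonymous label.

```python
import random
import statistics


def delta(p_n, p_s, d_n, d_s):
    """Fraction of deleterious nSNPs: 1 - (D_n/D_s) / (P_n/P_s)."""
    if p_n <= 0 or p_s <= 0 or d_s <= 0:
        raise ValueError("P_n, P_s and D_s must be positive")
    return 1.0 - (d_n / d_s) / (p_n / p_s)


def bootstrap_se(p_n, p_s, d_n, d_s, replicates=1000, seed=0):
    """Standard error of delta from resampling the SNPs with replacement.

    Assumption: D_n and D_s are treated as fixed; only SNPs are resampled.
    Very small categories can yield a replicate with no nSNPs or no sSNPs,
    in which case delta() raises ValueError.
    """
    rng = random.Random(seed)
    total = p_n + p_s
    frac_n = p_n / total
    estimates = []
    for _ in range(replicates):
        # Drawing `total` SNPs with replacement from the pooled set is the
        # same as `total` independent draws that are nonsynonymous with
        # probability frac_n.
        n_star = sum(rng.random() < frac_n for _ in range(total))
        estimates.append(delta(n_star, total - n_star, d_n, d_s))
    return statistics.stdev(estimates)
```

Because only ratios enter the formula, `delta(1.4, 1, 0.65, 1)` and `delta(0.74, 1, 0.65, 1)` reproduce, up to rounding of the input ratios, the two extreme values quoted in the Results (roughly 0.54 and 0.12). Calling `delta(37123, 40599, 47079, 71956)` on the data-set totals gives roughly 0.28; that pooled figure is not reported in the paper and is shown only to illustrate the call.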

Quantification of the fraction of damaging nSNPs

To determine the deleterious nature of each nSNP, the online software tool PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/bgi.shtml) was used.¹⁰ Using protein secondary structures, functional motifs, and the relative conservation of each amino acid in the protein, this program predicts the possible impact of an amino-acid replacement polymorphism on the structure and/or function of a human protein. For each nSNP, the program predicted whether the given amino acid change is benign, possibly damaging or probably damaging. The fraction of damaging nSNPs (ρ) was computed by adding the counts of possibly and probably damaging nSNPs and dividing this sum by the total nSNP count. The binomial variance was used to estimate the SE.
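The calculation of ρ and its binomial standard error can be sketched as follows. The function name is hypothetical, and the sketch assumes the SE is the usual square root of ρ(1 − ρ)/n, with n the total number of classified nSNPs in a category.

```python
import math


def damaging_fraction(benign, possibly, probably):
    """Return (rho, se): the damaging fraction and its binomial standard error.

    rho = (possibly + probably) / total, se = sqrt(rho * (1 - rho) / total).
    """
    total = benign + possibly + probably
    if total <= 0:
        raise ValueError("need at least one classified nSNP")
    rho = (possibly + probably) / total
    se = math.sqrt(rho * (1.0 - rho) / total)
    return rho, se
```

For instance, with purely hypothetical counts of 660 benign, 150 possibly damaging and 190 probably damaging nSNPs, `damaging_fraction(660, 150, 190)` returns ρ = 0.34 with an SE of about 0.015.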

Results

As some of the amino-acid polymorphisms are deleterious, selection prevents such nSNPs from spreading in a population. Therefore, nSNPs are expected to be more abundant at low frequencies than at high frequencies. In contrast, sSNPs are largely neutral and hence are likely to be present in equal proportions at low and high frequencies. Therefore, sSNPs could be used as a normalizing factor, and the ratio of nSNPs to sSNPs (P_n/P_s) will reflect the excess fraction of nSNPs. Table 1 shows that this ratio has a negative relationship with the minor allele frequencies of SNPs. The ratio is roughly twice as high (1.4 vs 0.74) for SNPs with a minor allele frequency (MAF) of <0.002 as for those with a MAF>0.4. It should be noted that the discovery of very-low-frequency variants (MAF<0.002) might be error prone, as the observed number of minor alleles was small (<4).² However, the method used to estimate the fraction of deleterious SNPs is based on the ratio of nSNPs to sSNPs. Hence, this estimate will not be significantly affected, as the error rate is expected to be roughly the same for both types of SNPs.

The ratio of nonsynonymous to synonymous substitutions (D_n/D_s) estimated for the human–chimp pair (0.65) is significantly smaller than all P_n/P_s ratios (G test, P<0.0001). This suggests an overabundance of nSNPs with respect to the nonsynonymous substitutions, and this excess fraction of nSNPs is deleterious, as these variants were prevented from becoming fixed (see Rand and Kann¹¹). This deleterious fraction was estimated as described in the Materials and Methods section. Clearly, the deleterious proportion of nSNPs (δ) shows a negative relationship with minor allele frequencies (Figure 1a). Deleterious nSNPs constitute as much as 53% of the nSNPs with a MAF<0.002, whereas for common nSNPs (MAF=0.4–0.5) the deleterious fraction is only 12%.
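A comparison of this kind can be sketched as a G test of independence on a 2×2 table of counts. The text does not state the exact form of the test, so the version below is an assumption: a plain log-likelihood-ratio statistic with one degree of freedom and no continuity correction, with a hypothetical function name.

```python
import math


def g_test_2x2(p_n, p_s, d_n, d_s):
    """G statistic (log-likelihood ratio) for a 2x2 table, 1 degree of freedom.

    Rows: polymorphisms, substitutions. Columns: nonsynonymous, synonymous.
    """
    observed = [[p_n, p_s], [d_n, d_s]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if obs > 0:  # a zero cell contributes nothing to the sum
                g += obs * math.log(obs / expected)
    return 2.0 * g
```

The statistic is compared with a chi-square distribution with one degree of freedom, for which the critical value at P=0.0001 is about 15.14. Applied to the data-set totals, `g_test_2x2(37123, 40599, 47079, 71956)` far exceeds that threshold; the paper itself reports the test for each MAF category rather than for the pooled counts.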

I also used an independent method to quantify the fraction of deleterious nSNPs, using the online software tool PolyPhen-2.¹⁰ This program determines the deleterious nature of amino-acid-changing nSNPs based on their effect on protein structure and/or function and on their location in the protein. Using this software, the fraction of damaging nSNPs (ρ) was estimated as explained in the Materials and Methods section. Interestingly, the relationship between ρ and MAF shown in Figure 1b is very similar to that observed for δ and MAF (Figure 1a). The estimate of ρ obtained for low-frequency nSNPs (MAF<0.002) was 2.6 times higher than that estimated for high-frequency nSNPs with a MAF=0.4–0.5 (0.34 vs 0.13). Here, the estimate of ρ includes the nSNPs that are predicted by PolyPhen-2 as ‘possibly damaging’ and ‘probably damaging’, with probabilities of >50% and >95%, respectively, of disrupting the structure and/or function of a protein. However, using only ‘probably damaging’ nSNPs also produced a negative relationship of similar magnitude, and the ρ of low-frequency nSNPs was three times higher than that of high-frequency nSNPs (0.19 vs 0.06).

Discussion

Based on two independent methods, this study estimated the proportion of deleterious amino acid variants in human populations. The first method showed a much higher fraction of deleterious nSNPs among the rare variants (MAF<0.002) than the second method (53% vs 34%; Table 1). As the second method (using PolyPhen-2) depends on the relevant information available for a protein (to predict the deleterious nature of an SNP), this method is rather subjective. More detailed information about proteins in the future might result in some nSNPs currently classified as harmless being reclassified as harmful. In contrast, the first method is based on a ratio, which is objective and not dependent on the availability of protein-specific information.

The high fraction of deleterious nSNPs reported for the low-frequency nSNPs suggests that rare variants are more likely to be associated with diseases than common variants.^{5, 12} On the other hand, the results also showed that a significant fraction of high-frequency nSNPs could be deleterious in nature. This suggests a likely association of some of the common variants with human genetic diseases.¹³ The deleterious fraction of nSNPs reported here could be an underestimate of deleterious mutations in humans, as it does not include lethal or strongly deleterious mutations. Conversely, these estimates might include false-positive SNPs due to sequencing errors.¹⁴

The present study has estimated the proportion of deleterious SNPs (δ) only for protein-coding regions. However, the same formula could be used to estimate δ for SNPs in constrained noncoding regions such as UTRs, promoters, enhancers, and silencers. For such a calculation, P_n and D_n are the numbers of SNPs and substitutions observed in the noncoding region (e.g., a promoter), and P_s and D_s are the numbers of SNPs and substitutions in synonymous positions or intron(s). The findings of this study might have implications for genome-wide association studies in understanding the respective contributions of rare as well as common variants to human diseases.

References

1. Kimura M: The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press, 1983.
2. 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 2010; 467: 1061–1073.
3. Cargill M, Altshuler D, Ireland J et al: Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 1999; 22: 231–238.
4. Frazer KA, Ballinger DG, Cox DR et al: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007; 449: 851–861.
5. Zhu Q, Ge D, Maia JM et al: A genome-wide comparison of the functional properties of rare and common genetic variants in humans. Am J Hum Genet 2011; 88: 458–468.
6. McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature 1991; 351: 652–654.
7. Kryazhimskiy S, Plotkin JB: The population genetics of dN/dS. PLoS Genet 2008; 4: e1000304.
8. Subramanian S: High proportions of deleterious polymorphisms in constrained human genes. Mol Biol Evol 2011; 28: 49–52.
9. Mikkelsen TS, Hillier LW, Eichler EE et al: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 2005; 437: 69–87.
10. Adzhubei IA, Schmidt S, Peshkin L et al: A method and server for predicting damaging missense mutations. Nat Methods 2010; 7: 248–249.
11. Rand DM, Kann LM: Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans. Mol Biol Evol 1996; 13: 735–748.
12. Pritchard JK: Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 2001; 69: 124–137.
13. Reich DE, Lander ES: On the allelic spectrum of human disease. Trends Genet 2001; 17: 502–510.
14. MacArthur DG, Tyler-Smith C: Loss-of-function variants in the genomes of healthy humans. Hum Mol Genet 2010; 19: R125–R130.

Acknowledgements

The author is grateful to David Lambert and thanks Leon Huynen and two anonymous reviewers for valuable comments.

Author information

Authors and Affiliations

Environmental Futures Centre and Australian Rivers Institute, School of Environment, Griffith University, Nathan, Qld, Australia
Sankar Subramanian

Corresponding author

Correspondence to Sankar Subramanian.

Ethics declarations

Competing interests

The author declares no conflict of interest.

About this article

Cite this article

Subramanian, S. Quantifying harmful mutations in human populations. Eur J Hum Genet 20, 1320–1322 (2012). https://doi.org/10.1038/ejhg.2012.68

Received: 07 December 2011
Revised: 09 March 2012
Accepted: 15 March 2012
Published: 18 April 2012
Issue Date: December 2012
DOI: https://doi.org/10.1038/ejhg.2012.68
