Using SNP Data to Examine Human Phenotypic Differences

By: Karen Norrgard, Ph.D. (Write Science Right) & Joanna Schultz, Ph.D. (Write Science Right) © 2008 Nature Education 
Citation: Norrgard, K. & Schultz, J. (2008) Using SNP data to examine human phenotypic differences. Nature Education 1(1):85
Genetic variation among human races can be observed in almost any trait, from physical and biochemical characteristics to disease resistance. What role do single nucleotide polymorphisms play in this variation?

Humans are identical over most of their genomes. Thus, only a relatively small number of genetic differences have resulted in the striking variation seen among individuals of our species. This phenotypic variation among humans was the subject of a recent study by Luis B. Barreiro and his colleagues at the Pasteur Institute in Paris (Barreiro et al., 2008). In particular, Barreiro and his colleagues were interested in how natural selection has led to phenotypic differences.

When we think of variation between people, we often think of differences in height, weight, and skin color. Each of these characteristics is only partially controlled by genes. The complex interaction between genes and the environment, as well as between multiple genes, makes trying to understand and quantify human phenotypic variation difficult. Therefore, instead of looking at complex human traits, Barreiro and his colleagues went straight to the source and looked for nucleotide sequences in the genome that could tell them about individual human variation. For this study, the identification of single base changes (single nucleotide polymorphisms, or SNPs) was considered ideal.

Barreiro and his colleagues obtained data for their research from the HapMap project, an international consortium that has built a vast and growing repository of human genetic variation. To date, the project has genotyped over 3.1 million SNPs across the human genome in 270 individuals of African, Asian, and European ancestry. A SNP is a variation of a single nucleotide between individuals. These polymorphisms can therefore be used to discern small differences both within a population and among different populations. The beauty of SNPs is that the observed variation can be followed over time and quantified. If a SNP changes either the function of a gene or its expression, and the change provides greater fitness for a population (i.e., a higher capacity to survive and/or reproduce in a given environment), the change will be favored by natural selection. Therefore, SNPs can be the basis of evolutionary change. This was the basic premise of Barreiro's study.

Measuring Genetic Change Over Time

As previously mentioned, Barreiro and his colleagues were interested in determining the role that natural selection has played in the wide range of phenotypic variation observed in humans. They questioned whether the differences we observe are in part the result of adaptations to specific environmental conditions during the course of evolution in modern humans. By comparing alleles among individuals of various ethnic backgrounds, the researchers estimated how much differentiation has occurred in each of 2.8 million SNPs since the human population began to diverge from its African origins some 50,000 to 75,000 years ago. By measuring the level of differentiation for each SNP, they hoped to demonstrate that, over the course of modern human evolution, certain genetic changes have been selected for or against. To measure this, the researchers chose to estimate a parameter known as the fixation index (FST), which describes the degree of population differentiation based on genetic polymorphisms. This parameter is especially useful for SNP data.

The Fixation Index

The fixation index is a measure of how populations differ genetically. One common formulation is FST = (HT – HS)/HT, in which HT is the expected heterozygosity of the total population and HS is the average expected heterozygosity within the subpopulations. This formulation measures the extent of genetic differentiation among subpopulations. The value of FST can theoretically range from 0.0 (no differentiation) to 1.0 (complete differentiation, in which subpopulations are fixed for different alleles).

A simple visualization of this idea is that of two squirrel subpopulations that are physically separated by a canyon and therefore cannot interbreed. Each subpopulation is homozygous for one allele of a SNP (in other words, each individual of one subpopulation might have a C at that position, while individuals from the other subpopulation have a T). If the two subpopulations are the same size, each allele has a frequency of 0.5 in the total population, so the expected heterozygosity of the total population (HT) would be 2 × 0.5 × 0.5 = 0.5. The heterozygosity of each subpopulation (HS) would be 0.0 (because every member of the subpopulation is homozygous). The calculation of FST in this oversimplified case would be (0.5 – 0.0)/0.5 = 1.0. In other words, 100% of the genetic variation of this population is between subpopulations, with zero variation within subpopulations.
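The squirrel example can be reproduced with a few lines of Python. This is a minimal sketch, not the estimator used in the study: it assumes a biallelic SNP, equally sized subpopulations, and expected heterozygosity computed as 2p(1 – p). The function names are chosen here for illustration only.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """FST = (HT - HS) / HT, assuming equally sized subpopulations.

    subpop_freqs holds the frequency of one allele in each subpopulation.
    """
    n = len(subpop_freqs)
    p_total = sum(subpop_freqs) / n                 # allele frequency in the pooled population
    h_t = expected_heterozygosity(p_total)          # heterozygosity of the total population
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / n  # mean within-subpopulation value
    if h_t == 0.0:
        raise ValueError("FST is undefined when the SNP does not vary in the total population")
    return (h_t - h_s) / h_t


# Two squirrel subpopulations, one fixed for C (frequency 1.0) and one fixed for T (frequency 0.0)
print(fst([1.0, 0.0]))
# Two subpopulations with identical allele frequencies
print(fst([0.5, 0.5]))
```

The first call corresponds to complete differentiation (HT = 0.5, HS = 0.0), and the second to no differentiation (HT = HS = 0.5).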

While a value of 1.0 for the fixation index is theoretically possible, values observed in real populations are usually much smaller. In general, high FST values reflect a low level of shared alleles between individuals in the sampled population and the total population. Conversely, low FST values indicate that members of the subpopulation share alleles with the total population. The proportion of individuals in a population that carry a certain allele varies over time and is influenced by the forces of migration, genetic drift, and natural selection.

Using FST to Show How Human Populations Have Diverged Through Evolution

FST has proven to be a very useful parameter in many respects, especially in its ability to describe the degree of population substructure. One major advantage of FST is that it can reveal much about the processes that lead to divergence between subpopulations and/or maintain that divergence. Once two subpopulations become separated, allele frequencies in each begin to change independently, and genetic differentiation between them accumulates. In humans, this divergence started with the exodus from Africa approximately 50,000 to 75,000 years ago. Mutations acquired by individuals were then subject to pressures of natural selection and were therefore either maintained or lost.

In their study, Barreiro and colleagues calculated the degree of differentiation (the FST value) for each of the 2.8 million different SNPs in individuals of African, European, Chinese, and Japanese ancestry. These SNPs were divided into different classes (see "Study Background"), based either on their location (nongenic, genic, intronic, 5' UTR, or 3' UTR) or their effect on the resulting protein (synonymous or nonsynonymous mutations). This was one of the novelties of the group's experiment. Based on the assumption that the different SNP classes are equally influenced by the demographic forces of migration and genetic drift, any difference in the mean FST value between SNP classes could be attributed to natural selection.
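The comparison described above amounts to grouping per-SNP FST values by class and comparing the groups. The following is a rough sketch of that bookkeeping only. The FST values are made-up toy numbers, not data from the study, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_fst_by_class(snps):
    """Return the mean FST for each SNP class.

    snps is an iterable of (snp_class, fst_value) pairs.
    """
    grouped = defaultdict(list)
    for snp_class, fst_value in snps:
        grouped[snp_class].append(fst_value)
    return {snp_class: sum(values) / len(values) for snp_class, values in grouped.items()}


# Toy values for illustration only (NOT results from Barreiro et al.)
toy_snps = [
    ("nongenic", 0.10),
    ("nongenic", 0.12),
    ("nonsynonymous", 0.02),
    ("nonsynonymous", 0.70),
]

for snp_class, mean_value in mean_fst_by_class(toy_snps).items():
    print(snp_class, round(mean_value, 3))
```

A real analysis would also compare the full distributions of FST values, not just the means, and would test whether differences between classes are statistically significant.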

Study Background

The researchers divided the SNP data into two major classes: genic and nongenic. Genic SNPs were further subdivided into the following classes:

  • Intronic SNPs: SNPs within introns (noncoding sections of a gene that are transcribed into the pre-mRNA but removed during formation of the mature mRNA and therefore not translated into the peptide)
  • 5' UTR SNPs: SNPs at the 5' end of the gene that are transcribed into mRNA but not translated by the ribosome, including the region of the mRNA between the transcription start site and the ATG codon for translation initiation
  • 3' UTR SNPs: SNPs at the 3' end of the gene that are transcribed into mRNA but not translated by the ribosome, including the region of the mRNA between the stop codon and the poly-A tail
  • Synonymous SNPs: Nucleotide substitutions that do not change the amino acid (because the genetic code is redundant, so several codons specify the same amino acid)
  • Nonsynonymous SNPs: Nucleotide substitutions that result in a change to the amino acid
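The distinction between synonymous and nonsynonymous SNPs can be illustrated by translating a codon before and after a single-base change. The sketch below assumes the standard genetic code and a DNA codon read on the coding strand. The function name is hypothetical, and for simplicity any change to or from a stop codon is counted as nonsynonymous.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position ("*" marks a stop codon)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base change within a codon.

    codon: three-letter DNA codon (coding strand), e.g. "GAA"
    position: 0, 1, or 2 (index of the changed base within the codon)
    new_base: the substituted base
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    # Simplification: stop-codon gains and losses are also reported as nonsynonymous
    return "nonsynonymous"


print(classify_coding_snp("GAA", 2, "G"))  # GAA and GAG both encode glutamate (E)
print(classify_coding_snp("GAG", 1, "T"))  # GAG encodes glutamate (E), GTG encodes valine (V)
```

Most synonymous changes fall at the third codon position, which is why the first call leaves the amino acid unchanged while the second does not.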

Barreiro et al. then hypothesized the following:

  1. Selection pressures would be stronger in genic regions than in nongenic regions.
  2. Nonsynonymous mutations (which result in amino acid changes) and variations within cis-regulatory regions (5' and 3' UTR) would experience increased selective pressure over synonymous mutations (which do not change the amino acid).

Study Findings

In terms of this study, a low FST value for a SNP can be interpreted to mean that individuals from the subpopulation tend to share alleles with the total population. In other words, a low level of differentiation for that SNP has occurred in the subpopulation since breaking off from the total population. On the other hand, a high FST value for a SNP means that a high level of differentiation for that SNP has occurred in members of the subpopulation compared to the total population, and therefore members of the subpopulation tend to carry unique alleles compared to the total population.

In this study, the genome-wide mean FST value across all SNP classes was found to be 0.11, interpreted as a moderate level of differentiation. The researchers interpreted FST values below 0.05 as indicating low differentiation and values above 0.65 as indicating extreme differentiation.
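The two thresholds mentioned above can be written as a simple classifier. This is only a sketch: the cutoffs of 0.05 and 0.65 come from the text, while the label "intermediate" for everything in between and the function name are assumptions made here for illustration.

```python
LOW_FST_CUTOFF = 0.05    # below this value: low differentiation
HIGH_FST_CUTOFF = 0.65   # above this value: extreme differentiation


def categorize_fst(fst_value):
    """Assign a per-SNP FST value to a differentiation category."""
    if not 0.0 <= fst_value <= 1.0:
        raise ValueError("FST is expected to lie between 0.0 and 1.0")
    if fst_value < LOW_FST_CUTOFF:
        return "low"
    if fst_value > HIGH_FST_CUTOFF:
        return "extreme"
    return "intermediate"  # label assumed here; the text names only the two tails


for value in (0.02, 0.11, 0.80):
    print(value, categorize_fst(value))
```

The genome-wide mean of 0.11 falls between the two cutoffs, consistent with its description as a moderate level of differentiation.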

According to the research results, genic SNPs, especially nonsynonymous SNPs, tend to have low FST values, indicating that they have been under negative selective pressure. However, the findings also showed that positive selection has favored some SNPs that alter amino acids. The researchers therefore concluded that the changes responsible for human phenotypic diversity have come about through both positive and negative selection pressures.

Low FST Values of Nonsynonymous SNPs

Although mean FST values across all SNP classes were similar, there were significantly more low FST values among genic SNPs than among nongenic SNPs. This observed difference in proportion of low FST values was even more marked when comparing nonsynonymous to nongenic SNPs. The fact that the researchers observed significantly more low FST values in nonsynonymous SNPs makes intuitive sense. A change in an amino acid is likely to alter gene function, which could be detrimental to the organism. Therefore, nonsynonymous SNPs have been under negative selection pressures (i.e., negative selection has not allowed them to increase in frequency), causing low differentiation between populations.

The researchers took a closer look at low-FST nonsynonymous SNPs and made predictions about the effect they might have on a protein. Using an algorithm that considers protein structure and/or sequence conservation information for each gene, each low-FST nonsynonymous SNP was categorized as benign, possibly damaging, or probably damaging. Barreiro and his colleagues found that mutations identified as possibly or probably damaging were at a significantly higher frequency among low-FST nonsynonymous SNPs. Additionally, they compared the corresponding gene for each low-FST nonsynonymous SNP with a catalog of human genes and genetic disorders (the Online Mendelian Inheritance in Man [OMIM] database), and found that low-FST nonsynonymous SNPs were significantly more frequent in genes known to modulate disease. These findings may be of special interest to medical research, as they suggest that many low-FST nonsynonymous SNPs are deleterious and may be involved in disease.

High FST Values of SNPs

Next, Barreiro et al. looked at the proportion of SNPs with FST values greater than 0.65 and found that genic SNPs were significantly overrepresented in this group relative to nongenic SNPs. This excess of genic SNPs having high FST values once again included nonsynonymous SNPs, but also 5' UTR SNPs (changes in gene-expression regulatory regions). Because nonsynonymous SNPs were found to be more heavily represented among those with both low and high FST values, the researchers concluded that nonsynonymous SNPs have been influenced by both negative and positive selection pressures.

Genes under positive selection pressure are thought to play an important role in human survival. The researchers identified 582 genes that contained a SNP with an FST value greater than 0.65 and that have therefore likely been under positive selection pressure. Several of these genes are known to control variable morphological traits in humans. Furthermore, some of these genes are responsible for complex phenotypes of medical relevance. Two examples described in this study are the CR1 gene, which modulates the severity of malarial attacks, and the ENPP1 gene, in which a known mutation protects against the development of obesity and type 2 diabetes. A derived missense mutation in the CR1 gene is present in 85% of Africans but completely absent in other populations. The mutation has thus apparently been selected for in a region of the world where malaria is more prevalent. The ENPP1 mutation, which is virtually absent in Africans but present in approximately 90% of non-Africans, has presumably been selected for under modern conditions of food availability. Unlike Mendelian disorders, in which a mutation confers increased disease risk, the ENPP1 allele that increases susceptibility to diabetes is an ancestral allele that has become disadvantageous after changes in environment and lifestyle.

Human Genetic Variation and Disease

This research demonstrates an elegant use of the growing HapMap database to characterize the molecular basis of human variation. Barreiro and his colleagues used the HapMap data to provide evidence of the role of natural selection in modern human differentiation. The results of this work suggest that both negative and positive natural selection have operated in the course of modern human evolution. Positive selection has increased differentiation within gene regions to result in local adaptation of human populations. This phenomenon was observed in several genes, including those for skin pigmentation and hair development, immune response to pathogens, DNA repair and replication, sensory functions, and metabolic pathways, among others. In contrast, negative selection has reduced population differentiation at the level of amino acid change, particularly in disease-related genes. The genetic changes determined to be under negative selection were evaluated in terms of the severity of their effect on the resultant protein, and results indicated that damaging variants were maintained at low frequencies in the study populations.

This work will serve as a stepping stone for further research in identifying candidate genes for disease, many of which may be tied to ethnicity. Hopefully, with further understanding of how the process of natural selection contributes to human genetic variation, the genetic basis of human disease will be further elucidated.

References and Recommended Reading


Barreiro, L. B., et al. Natural selection has driven population differentiation in modern humans. Nature Genetics 40, 340–345 (2008) doi:10.1038/ng.78
