For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4–6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.
Subscribe to Journal
Get full journal access for 1 year
only $17.42 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Ohno, S. An argument for the genetic simplicity of man and other mammals. J. Hum. Evol. 1, 651–662 (1972).
King, M.C. & Wilson, A.C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
Wilson, A.C., Maxson, L.R. & Sarich, V.M. Two types of molecular evolution. Evidence from studies of interspecific hybridization. Proc. Natl. Acad. Sci. USA 71, 2843–2847 (1974).
Britten, R.J. & Davidson, E.H. Gene regulation for higher cells: a theory. Science 165, 349–357 (1969).
Stern, D.L. Evolutionary developmental biology and the problem of variation. Evolution 54, 1079–1091 (2000).
Carroll, S.B. Evolution at two levels: on genes and form. PLoS Biol. 3, e245 (2005).
Wray, G.A. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216 (2007).
Hoekstra, H.E. & Coyne, J.A. The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995–1016 (2007).
Andolfatto, P. Adaptive evolution of non-coding DNA in Drosophila. Nature 437, 1149–1152 (2005).
Haygood, R., Fedrigo, O., Hanson, B., Yokoyama, K.-D. & Wray, G.A. Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution. Nat. Genet. 39, 1140–1144 (2007).
Torgerson, D.G. et al. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. 5, e1000592 (2009).
Gaffney, D.J., Blekhman, R. & Majewski, J. Selective constraints in experimentally defined primate regulatory regions. PLoS Genet. 4, e1000157 (2008).
Chen, K. & Rajewsky, N. Natural selection on human microRNA binding sites inferred from SNP data. Nat. Genet. 38, 1452–1456 (2006).
Stoletzki, N. & Eyre-Walker, A. Estimation of the neutrality index. Mol. Biol. Evol. 28, 63–70 (2011).
Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Gronau, I., Arbiza, L., Mohammed, J. & Siepel, A. Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol. Biol. Evol. 30, 1159–1171 (2013).
McDonald, J.H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).
Sawyer, S.A. & Hartl, D.L. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176 (1992).
Smith, N.G. & Eyre-Walker, A. Adaptive protein evolution in Drosophila. Nature 415, 1022–1024 (2002).
Charlesworth, J. & Eyre-Walker, A. The McDonald-Kreitman test and slightly deleterious mutations. Mol. Biol. Evol. 25, 1007–1015 (2008).
Bierne, N. & Eyre-Walker, A. The genomic rate of adaptive amino acid substitution in Drosophila. Mol. Biol. Evol. 21, 1350–1360 (2004).
Boyko, A.R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).
Wilson, D.J., Hernandez, R.D., Andolfatto, P. & Przeworski, M. A population genetics–phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet. 7, e1002395 (2011).
Fay, J.C., Wyckoff, G.J. & Wu, C.I. Positive and negative selection on the human genome. Genetics 158, 1227–1234 (2001).
Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973).
Kondrashov, A.S. Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over? J. Theor. Biol. 175, 583–594 (1995).
Williamson, S.H. et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA 102, 7882–7887 (2005).
Eyre-Walker, A., Woolfit, M. & Phelps, T. The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics 173, 891–900 (2006).
Eyre-Walker, A. & Keightley, P.D. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol. Biol. Evol. 26, 2097–2108 (2009).
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).
Locke, D.P. et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529–533 (2011).
Rhesus Macaque Genome Sequencing and Analysis Consortium. Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234 (2007).
Eory, L., Halligan, D.L. & Keightley, P.D. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol. Biol. Evol. 27, 177–192 (2010).
Moses, A.M., Chiang, D.Y., Kellis, M., Lander, E.S. & Eisen, M.B. Position specific variation in the rate of evolution in transcription factor binding sites. BMC Evol. Biol. 3, 19 (2003).
Bustamante, C.D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005).
Kosiol, C. et al. Patterns of positive selection in six mammalian genomes. PLoS Genet. 4, e1000144 (2008).
Enard, W. et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418, 869–872 (2002).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Green, R.E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
Chen, F.-C. & Li, W.-H. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68, 444–456 (2001).
Gojobori, J., Tang, H., Akey, J.M. & Wu, C.I. Adaptive evolution in humans revealed by the negative correlation between the polymorphism and fixation phases of evolution. Proc. Natl. Acad. Sci. USA 104, 3907–3912 (2007).
Sunyaev, S. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001).
Lohmueller, K.E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).
Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).
Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A. & Luscombe, N.M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263 (2009).
Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Lunter, G., Ponting, C.P. & Hein, J. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol. 2, e5 (2006).
Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120 (2012).
Muller, H.J. Our load of mutations. Am. J. Hum. Genet. 2, 111–176 (1950).
Morton, N.E., Crow, J.F. & Muller, H.J. An estimate of the mutational damage in man from data on consanguineous marriages. Proc. Natl. Acad. Sci. USA 42, 855–863 (1956).
Bittles, A.H. & Neel, J.V. The costs of human inbreeding and their implications for variations at the DNA level. Nat. Genet. 8, 117–121 (1994).
Asthana, S., Schmidt, S. & Sunyaev, S. A limited role for balancing selection. Trends Genet. 21, 30–32 (2005).
Bubb, K.L. et al. Scan of human genome reveals no new loci under ancient balancing selection. Genetics 173, 2165–2177 (2006).
Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).
Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232–235 (2010).
Mu, X.J., Lu, Z.J., Kong, Y., Lam, H.Y. & Gerstein, M.B. Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res. 39, 7058–7076 (2011).
Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979).
Jukes, T.H. & Cantor, C.R. Evolution of protein molecules. in Mammalian Protein Metabolism (ed. Munro, H.) 21–132 (Academic Press, New York, 1969).
Hubisz, M.J., Pollard, K.S. & Siepel, A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief. Bioinform. 12, 41–51 (2011).
Hernandez, R.D. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24, 2786–2787 (2008).
Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA 108, 11983–11988 (2011).
Kondrashov, A.S. & Crow, J.F. A molecular approach to estimating the human deleterious mutation rate. Hum. Mutat. 2, 229–234 (1993).
Bailey, T.L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
Schneider, T.D., Stormo, G.D., Gold, L. & Ehrenfeucht, A. Information content of binding sites on nucleotide sequences. J. Mol. Biol. 188, 415–431 (1986).
Wasserman, W.W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287 (2004).
Berg, O.G. & von Hippel, P.H. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193, 723–750 (1987).
Stormo, G.D. DNA binding sites: representation and discovery. Bioinformatics 16, 16–23 (2000).
We thank R. Blekhman, C. Danko, A. Boyko, K. Pollard, N. Goldman and A. Clark for comments on the manuscript. This research was supported by a Packard Fellowship, a Sloan Research Fellowship, US National Science Foundation grant DBI-0644111 and US National Institutes of Health (National Institute of General Medical Sciences, NIGMS) grant GM102192 (to A.S.). In addition, L.A. was supported in part by a postdoctoral fellowship award from the Cornell Center for Vertebrate Genomics, and B.A.A. was supported by US National Institutes of Health training grant T32-GM083937.
The authors declare no competing financial interests.
About this article
Cite this article
Arbiza, L., Gronau, I., Aksoy, B. et al. Genome-wide inference of natural selection on human transcription factor binding sites. Nat Genet 45, 723–729 (2013). https://doi.org/10.1038/ng.2658
Nucleic Acids Research (2020)
Molecular Biology and Evolution (2020)
PLOS Genetics (2020)
InMeRF: prediction of pathogenicity of missense variants by individual modeling for each amino acid substitution
NAR Genomics and Bioinformatics (2020)