Genome-wide inference of natural selection on human transcription factor binding sites


For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4–6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.
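INSIGHT itself is not reproduced here. The sketch below is a rough illustration of the polymorphism-and-divergence contrast that methods of this kind build on. It computes the classical McDonald–Kreitman-style quantities: the neutrality index (NI) and α. Here α is the fraction of substitutions in a test class of sites (for example, binding-site positions) inferred to be adaptive, relative to a putatively neutral reference class. The class names, the use of flanking sequence as the neutral reference, and all counts are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Hypothetical tallies for one class of sites.

    divergent:   fixed human-specific differences (divergence, D)
    polymorphic: sites segregating in the human sample (polymorphism, P)
    """
    divergent: int
    polymorphic: int


def neutrality_index(test: SiteCounts, neutral: SiteCounts) -> float:
    """NI = (P_test / D_test) / (P_neutral / D_neutral)."""
    if test.divergent == 0 or neutral.polymorphic == 0:
        raise ValueError("test divergence and neutral polymorphism must be nonzero")
    return (test.polymorphic * neutral.divergent) / (test.divergent * neutral.polymorphic)


def mk_alpha(test: SiteCounts, neutral: SiteCounts) -> float:
    """Fraction of test-class substitutions inferred to be adaptive: alpha = 1 - NI."""
    return 1.0 - neutrality_index(test, neutral)


def expected_adaptive(test: SiteCounts, neutral: SiteCounts) -> float:
    """Expected number of adaptive substitutions in the test class: alpha * D_test."""
    return mk_alpha(test, neutral) * test.divergent


# Illustrative, made-up counts (not data from this study).
binding_sites = SiteCounts(divergent=60, polymorphic=40)
flanks = SiteCounts(divergent=100, polymorphic=100)
print(f"alpha = {mk_alpha(binding_sites, flanks):.3f}")
print(f"E[adaptive] = {expected_adaptive(binding_sites, flanks):.1f}")
```

This simple estimator assumes that test-class polymorphisms are either neutral or too strongly deleterious to be observed. Weakly deleterious polymorphisms inflate P_test relative to D_test and so bias α downward. This is one reason model-based approaches that treat such effects explicitly are generally preferred over the raw ratio.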


Figure 1: Results for data sets simulated under three different mixtures of selective modes.
Figure 2: Estimates of key parameters for the binding sites of each transcription factor in our study.
Figure 3: Information content, binding affinity and selection.
Figure 4: Genome-wide analyses of adaptive and deleterious mutations in protein-coding sequences and transcription factor binding sites.
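Figure 3 relates selection to motif information content and predicted binding affinity. The exact scoring procedure is not described in this section, so the sketch below relies on two common textbook definitions, both assumptions. The first is per-position information content in bits against a uniform background. The second is a position weight matrix log-odds score, used as a rough proxy for binding affinity. The function names, the pseudocount and the toy matrix are hypothetical.

```python
import math
from typing import Sequence

BASES = "ACGT"
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def position_ic(freqs: Sequence[float]) -> float:
    """Information content (bits) of one motif column, assuming a uniform background.

    IC = log2(4) - H, with H = -sum_b p_b * log2(p_b).
    """
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return 2.0 - entropy


def log_odds_score(pfm: Sequence[Sequence[float]], site: str,
                   background: Sequence[float] = UNIFORM,
                   pseudocount: float = 0.01) -> float:
    """Sum over positions of log2(p_b / q_b); higher scores suggest stronger predicted binding."""
    if len(pfm) != len(site):
        raise ValueError("site length must match motif length")
    score = 0.0
    for column, base in zip(pfm, site.upper()):
        i = BASES.index(base)  # raises ValueError for a non-ACGT base
        # Smooth frequencies so unseen bases do not yield log2(0).
        p = (column[i] + pseudocount) / (sum(column) + len(BASES) * pseudocount)
        score += math.log2(p / background[i])
    return score


# Toy two-position motif (rows are positions, columns are A, C, G, T).
toy_pfm = [[0.9, 0.05, 0.05, 0.0], [0.25, 0.25, 0.25, 0.25]]
print([round(position_ic(col), 3) for col in toy_pfm])
print(round(log_odds_score(toy_pfm, "AC"), 3), round(log_odds_score(toy_pfm, "TC"), 3))
```

A fully degenerate column carries 0 bits, and a column that always shows the same base carries 2 bits. Substitutions at high-information positions therefore tend to change the log-odds score the most.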




Acknowledgements

We thank R. Blekhman, C. Danko, A. Boyko, K. Pollard, N. Goldman and A. Clark for comments on the manuscript. This research was supported by a Packard Fellowship, a Sloan Research Fellowship, US National Science Foundation grant DBI-0644111 and US National Institutes of Health (National Institute of General Medical Sciences, NIGMS) grant GM102192 (to A.S.). In addition, L.A. was supported in part by a postdoctoral fellowship award from the Cornell Center for Vertebrate Genomics, and B.A.A. was supported by US National Institutes of Health training grant T32-GM083937.

Author information




L.A., I.G. and A.S. conceived and designed the study. L.A., I.G., B.A.A., M.J.H., B.G. and A.S. analyzed the data. L.A., I.G., B.A.A. and M.J.H. contributed materials and analysis tools. A.S. and A.K. supervised the research. L.A., I.G. and A.S. wrote the manuscript with review and contributions from all authors.

Corresponding author

Correspondence to Adam Siepel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1–10 and Supplementary Note.


About this article

Cite this article

Arbiza, L., Gronau, I., Aksoy, B. et al. Genome-wide inference of natural selection on human transcription factor binding sites. Nat Genet 45, 723–729 (2013).

