This page has been archived and is no longer updated
Analysis of one million base pairs of Neanderthal DNA
Author: Richard E. Green
Richard E. Green1, Johannes Krause1, Susan E. Ptak1, Adrian W. Briggs1, Michael T. Ronan2, Jan F. Simons2, Lei Du2, Michael Egholm2, Jonathan M. Rothberg2, Maja Paunovic3† & Svante Pääbo1

Neanderthals are the extinct hominid group most closely related to contemporary humans, so their genome offers a unique opportunity to identify genetic changes specific to anatomically fully modern humans. We have identified a 38,000-year-old Neanderthal fossil that is exceptionally free of contamination from modern human DNA. Direct high-throughput sequencing of a DNA extract from this fossil has thus far yielded over one million base pairs of hominoid nuclear DNA sequences. Comparison with the human and chimpanzee genomes reveals that modern human and Neanderthal DNA sequences diverged on average about 500,000 years ago. Existing technology and fossil resources are now sufficient to initiate a Neanderthal genome-sequencing effort.

Neanderthals were first recognized as a distinct group of hominids from fossil remains discovered 150 years ago at Feldhofer in Neander Valley, outside Düsseldorf, Germany. Subsequent finds showed that fossils with Neanderthal traits appear in the fossil record of Europe and western Asia about 400,000 years ago and vanish about 30,000 years ago. Over this period they evolved morphological traits that made them progressively more distinct from the ancestors of modern humans that were evolving in Africa (refs 1, 2). For example, the crania of late Neanderthals have protruding mid-faces, brain cases that bulge outward at the sides, and features of the base of the skull, jaw and inner ears that set them apart from modern humans (ref. 3).

The nature of the interaction between Neanderthals and modern humans, who expanded out of Africa around 40,000–50,000 years ago and eventually replaced Neanderthals as well as other archaic hominids across the Old World, is still a matter of some debate.
Although there is no evidence of contemporaneous cohabitation at any single site, there is evidence of geographical and temporal overlap in their ranges before the disappearance of Neanderthals. Additionally, late in their history, some Neanderthal groups adopted cultural traits such as body decorations, potentially through cultural interactions with incoming modern humans (ref. 4).

In 1997, a segment of the hypervariable control region of the maternally inherited mitochondrial DNA (mtDNA) of the Neanderthal type specimen found at Feldhofer was sequenced. Phylogenetic analysis showed that it falls outside the variation of contemporary humans and shares a common ancestor with mtDNAs of present-day humans approximately half a million years ago (refs 5, 6). Subsequently, mtDNA sequences have been retrieved from eleven additional Neanderthal specimens: Feldhofer 2 in Germany (ref. 7), Mezmaiskaya in Russia (ref. 8), Vindija 75, 77 and 80 in Croatia (refs 9, 10), Engis 2 in Belgium, La Chapelle-aux-Saints and Rochers de Villeneuve in France (ref. 10), Scladina in Belgium (ref. 11), Monte Lessini in Italy (ref. 12), and El Sidrón 441 in Spain (ref. 13). Although some of these sequences are extremely short, they are all more closely related to one another than to modern human mtDNAs (refs 9, 11).

This fact, in conjunction with the absence of any related mtDNA sequences in currently living humans or in a small number of early modern human fossils (refs 5, 10), strongly suggests that Neanderthals contributed no mtDNA to present-day humans. On the basis of various population models, it has been estimated that the maximal overall genetic contribution of Neanderthals to the contemporary human gene pool is between 25% and 0.1% (refs 10, 14). Because the latter conclusions are based on mtDNA, a single maternally inherited locus, they are limited in their ability to detect a Neanderthal contribution to the current human gene pool both by the vagaries of genetic drift and by the possibility of a sex bias in reproduction.
However, both morphological evidence (refs 4, 15) and the variation in the modern human gene pool (ref. 16) support the conclusion that if any genetic contribution of Neanderthals to modern humans occurred, it was of limited magnitude.

Neanderthals are the hominid group most closely related to currently living humans, so a Neanderthal nuclear genome sequence would be an invaluable resource for annotating the human genome. Roughly 35 million nucleotide differences exist between the genomes of humans and chimpanzees, our closest living relatives (ref. 17). Soon, genome sequences from other primates such as the orang-utan and the macaque will allow such differences to be assigned to the human and chimpanzee lineages. However, temporal resolution of the genetic changes along the human lineage, where remarkable morphological, behavioural and cognitive changes occurred, is limited without a more closely related genome sequence for comparison. In particular, comparison to the Neanderthal would enable the identification of genetic changes that occurred during the last few hundred thousand years, when fully anatomically and behaviourally modern humans appeared.

Identification of a Neanderthal fossil for DNA sequencing

Although it is possible to recover mtDNA (ref. 18) and occasionally even nuclear DNA sequences (refs 19–22) from well-preserved remains of organisms that are less than a few hundred thousand years old, determination of ancient hominid sequences is fraught with special difficulties and pitfalls (ref. 18). In addition to degradation and chemical damage to the DNA that can cause any ancient DNA to be irretrievable or misread, contamination of specimens, laboratory reagents and instruments with traces of DNA from modern humans must be avoided. In fact, when sensitive polymerase chain reaction (PCR) is used, human mtDNA sequences can be retrieved from almost every ancient specimen (refs 23, 24). This problem is especially severe when Neanderthal remains are studied because Neanderthal and human are so closely related that one expects to find few or no differences between Neanderthals and modern humans within many regions (ref. 25), making it impossible to rely on the sequence information itself to distinguish endogenous from contaminating DNA sequences. A necessary first step for sequencing nuclear DNA from Neanderthals is therefore to identify a Neanderthal specimen that is free or almost free of modern human DNA.

†Deceased.
1 Max-Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany. 2 454 Life Sciences, 20 Commercial Street, Branford, Connecticut 06405, USA. 3 Institute of Quaternary Paleontology and Geology, Croatian Academy of Sciences and Arts, A. Kovacica 5/II, HR-10 000 Zagreb, Croatia.
Nature, Vol 444, 16 November 2006, doi:10.1038/nature05336. ©2006 Nature Publishing Group.

We tested more than 70 Neanderthal bone and tooth samples from different sites in Europe and western Asia for bio-molecular preservation by removing samples of a few milligrams for amino acid analysis. The vast majority of these samples had low overall contents of amino acids and/or high levels of amino acid racemization, a stereoisomeric structural change that affects amino acids in fossils, indicating that they are unlikely to contain retrievable endogenous DNA (ref. 26). However, some of the samples are better preserved in that they contain high levels of amino acids (more than 20,000 p.p.m.), low levels of racemization of amino acids such as aspartate that racemize rapidly, as well as amino acid compositions that suggest that the majority of the preserved protein stems from collagen.

From 100–200 mg of bone from six of these specimens we extracted DNA and analysed the relative abundance of Neanderthal-like mtDNA sequences and modern human-like mtDNA sequences by performing PCR with primer pairs that amplify both human and Neanderthal mtDNA with equal efficiency.
The amplification products span segments of the hypervariable region of the mtDNA in which all Neanderthals sequenced to date differ from all contemporary humans. From subsequent cloning into a plasmid vector and sequencing of more than a hundred clones from each product, we determined the ratio of Neanderthal-like to modern human-like mtDNA in each extract. We used two different primer pairs that amplify fragments of 63 base pairs and 119 base pairs to gauge the contamination levels for different lengths of DNA molecules.

Figure 1 shows that the level of contamination differs drastically among the samples. Whereas only around 1% of the mtDNA present in three samples from France, Russia and Uzbekistan was Neanderthal-like, one sample from Croatia and one from Spain contained around 5% and 75% Neanderthal-like mtDNA, respectively. One bone (Vi-80) from Vindija Cave, Croatia, stood out in that ~99% of the 63-base-pair mtDNA segments and ~94% of the 119-base-pair segments are of Neanderthal origin. Assuming that the ratio of Neanderthal to contaminating modern human DNA is the same for mtDNA as it is for nuclear DNA, the Vi-80 bone therefore yields DNA fragments that are predominantly of Neanderthal origin and, provided that the contamination rate was not increased during the downstream sequencing process, the extent of contamination in the final analyses is below ~6%.

The Vi-80 bone was discovered by M. Malez and co-workers in layer G3 of Vindija Cave in 1980. It has been dated by carbon-14 accelerator mass spectrometry to 38,310 ± 2,130 years before present and its entire mtDNA hypervariable region I has been sequenced (ref. 10). Out of 14 Neanderthal remains from layer G3 that we have analysed, this bone is one of six samples that show good bio-molecular preservation, while the other eight bones show intermediate to bad states of preservation that do not suggest the presence of amplifiable DNA.
Preservation conditions in Vindija Cave thus vary drastically from bone to bone, a situation that may be due to different extents of water percolation in different parts of the cave.

Direct large-scale DNA sequencing from the Vindija Neanderthal

Because the Vi-80 Neanderthal bone extract is largely free of contaminating modern human mtDNA, we chose this extract to perform large-scale parallel 454 sequencing (ref. 27). In this technology, single-stranded libraries, flanked by common adapters, are created from the DNA sample and individual library molecules are amplified through bead-based emulsion PCR, resulting in beads carrying millions of clonal copies of the DNA fragments from the samples. These are subsequently sequenced by pyrosequencing on the GS20 454 sequencing system.

For several reasons, the 454 sequencing platform is extremely well suited for analyses of bulk DNA extracted from ancient remains (ref. 28). First, it circumvents bacterial cloning, in which the vast majority of initial template molecules are lost during transformation and establishment of clones. Second, because each molecule is amplified in isolation from other molecules it also precludes template competition, which frequently occurs when large numbers of different DNA fragments are amplified together. Third, its current read length of 100–200 nucleotides covers the average length of the DNA preserved in most fossils (ref. 29). Fourth, it generates hundreds of thousands of reads per run, which is crucial because the majority of the DNA recovered from fossils is generally not derived from the fossil species, but rather from organisms that have colonized the organism after its death (refs 20, 30). Fifth, because each sequenced product stems from just one original single-stranded template molecule of known orientation, the DNA strand from which the sequence is derived is known (ref. 28).
This provides an advantage over traditional PCR from double-stranded templates, in which the template strand is not known, because the frequency of different nucleotide misincorporations can be deduced. For example, using 454 sequencing, the rate at which cytosine is converted to uracil and read as thymine can be distinguished from the rate at which guanine is converted to xanthine and read as adenine, whereas this is impossible using traditional PCR or bacterial cloning. This is important since nucleotide conversions and misincorporations in ancient DNA are caused by damage that affects different bases differently (refs 28, 31) and this pattern of false substitutions can be used to estimate the relative probability that a particular substitution (that is, the observation of a nucleotide difference between DNA sequences) represents the authentic DNA sequence of the organism versus an artefact from DNA degradation.

We recovered a total of 254,933 unique sequences from the Vi-80 bone (see Supplementary Methods). These were aligned to the human (build 36.1; ref. 32), chimpanzee (build 1; ref. 17) and mouse (build 34.1; ref. 33) complete genome sequences, to environmental sample sequences in the GenBank env database (version 3, September 2005), and to the complete set of redundant nucleotide sequences in GenBank nt (version 3, September 2005, excluding EST, STS, GSS, environmental and HTGS sequences; ref. 34) using the program BLASTN (NCBI version 2.2.12; ref. 35). The most similar database sequence for each query was identified and classified by its taxonomic order (Fig. 2) (see Supplementary Methods). No significant nucleotide sequence similarity in the databases was found for 79% of the fossil extract sequence reads. This is typical of large-scale sequencing both from other ancient bones (refs 20, 22, 28) and from environmental samples (refs 36, 37), although some permafrost-preserved specimens can yield high amounts of endogenous DNA (ref. 22). Sequences with similarity to a database sequence were classified by the taxonomic order of their most significant alignment. Actinomycetales, a bacterial order with many soil-living species, was the most populous order and accounted for 6.8% of the sequences. The second most populous order, to which 15,701 unique sequences or 6.2% of the sequence reads were most similar, was that of primates. All other individual orders were substantially less frequent. Notably, the average percentage identity for the primate sequence alignments was 98.8%, whereas it was 92–98% for the other frequently occurring orders. Thus, the primate reads, unlike many of the prokaryotic reads, are aligned to a very closely related species.

[Figure 1: bar chart for the samples Vi-80, Vi-77, St Césaire, Okladnikov, El Sidrón and Teshik Tash; axes: Neanderthal mtDNA (%) and modern human mtDNA (%).]
Figure 1 | Ratio of Neanderthal to modern human mtDNA in six hominid fossils. For each fossil, primer pairs that amplify a long (119 base pairs; upper lighter bars) and short (63 base pairs; lower darker bars) product were used to amplify segments of the mtDNA hypervariable region. The products were sequenced and determined to be either of Neanderthal (yellow) or modern human (blue) type.

Neanderthal mtDNA sequences

Among the 15,701 sequences of primate origin, we first identified all mtDNA in order to investigate whether their evolutionary relationship to the current human mtDNA pool is similar to what is known from previous analyses of Neanderthal mtDNA. A total of 41 unique DNA sequences from the Vi-80 fossil had their closest hits to different parts of the human mtDNA, and comprised, in total, 2,705 base pairs of unique mtDNA sequence. None of the putative Neanderthal mtDNA sequences map to the two hypervariable regions that have been previously sequenced in Neanderthals.
We aligned these mtDNA sequences to the complete mtDNA sequences of 311 modern humans from different populations (ref. 38) as well as to the complete mtDNA sequences of three chimpanzees and two bonobos (Supplementary Information). A schematic neighbour-joining tree estimated from this alignment is shown in Fig. 3. In agreement with previous results, the Neanderthal mtDNA falls outside the variation among modern humans. However, the length of the branch leading to the Neanderthal mtDNA is 2.5 times as long as the branch leading to modern human mtDNAs. This is likely to be due to errors in our Neanderthal sequences derived from substitution artefacts from damaged, ancient DNA and from sequencing errors (ref. 28).

To analyse the extent to which errors occur in the Neanderthal mtDNA reads, we designed 29 primer pairs (Supplementary Methods) flanking all 39 positions at which the Vi-80 Neanderthal mtDNA sequences differed by substitutions from the consensus bases seen among the 311 human mtDNA sequences. These primer pairs, which are designed to yield amplification products that vary in length between 50 and 98 base pairs (including primers), were used in a multiplex two-step PCR (ref. 39) from the same Neanderthal extract that had been used for large-scale 454 sequencing. Twenty-five of the PCR products, containing 34 of the positions where the Neanderthal differs from humans, were successfully amplified and cloned, and then six or more clones of each product were sequenced. The consensus sequence seen among these clones revealed the same nucleotides seen by the 454 sequencing at 20 of the 34 positions and no additional differences. Of the 14 positions found to represent errors in the sequence reads, seven were C to T transitions, four were G to A, two were G to T and one was T to C. This pattern of change is typical for ancient DNA, where deamination of cytosine residues (ref. 31) and, to a lesser extent, modifications of guanosine residues (ref. 28) have been found to account for the majority of nucleotide misincorporations during PCR.
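
The tally of PCR-detected errors reported above can be restated as a short calculation. This is only an illustrative sketch: the counts are taken from the text, while the variable names and the reading of "C to T" as (true base, base seen in the 454 read) are assumptions of the example.

```python
from collections import Counter

# Error types among the 14 positions where the PCR consensus disagreed with
# the 454 reads, keyed as (true base, base seen in the read). Counts are
# those quoted in the text; the key orientation is an assumption.
errors = Counter({("C", "T"): 7, ("G", "A"): 4, ("G", "T"): 2, ("T", "C"): 1})

checked_positions = 34                      # positions covered by the cloned PCR products
error_positions = sum(errors.values())      # 14
confirmed_positions = checked_positions - error_positions

# Share of errors of the two types the text attributes to ancient DNA damage.
damage_like_share = (errors[("C", "T")] + errors[("G", "A")]) / error_positions

print(f"confirmed by PCR: {confirmed_positions} of {checked_positions}")
print(f"C-to-T plus G-to-A share of errors: {damage_like_share:.1%}")
```

On these numbers, 11 of the 14 errors fall into the two damage-associated classes, which is the pattern the authors describe as typical of ancient DNA.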
These results also show that the likelihood of observing errors in the sequencing reads is drastically different depending on whether one considers nucleotide positions where a base in the Neanderthal mtDNA sequence differs from both the human and chimpanzee sequences, or positions where the Neanderthal differs from the humans but is identical to the chimpanzee mtDNA sequences. Among the mtDNA sequences analysed, there are 14 positions where the Neanderthal carries a base identical to the chimpanzee, and 13 of those were confirmed by PCR. In contrast, among the remaining 20 positions, where the Neanderthal sequences differed from both humans and chimpanzees, only seven were confirmed. When only PCR-confirmed sequence data are used to estimate the mtDNA tree (Fig. 3), the Neanderthal branch has a length comparable to that of contemporary humans. This suggests that no large source of errors other than what is detected by the PCR analysis affects the sequences.

Using these PCR-confirmed substitutions and a divergence time between humans and chimpanzees of 4.7–8.4 million years (refs 40–42), we estimate the divergence time for the mtDNA fragments determined here to be 461,000–825,000 years. This is in general agreement with previous estimates of Neanderthal–human mtDNA divergence of 317,000–741,000 years (ref. 6) based on mtDNA hypervariable region sequences and is compatible with our presumption that the mtDNA sequences determined from the Vi-80 extract are of Neanderthal origin.

Figure 2 | Taxonomic distribution of DNA sequences from the Vi-80 extract. The taxonomic order of the database sequence giving the best alignment for each unique sequence read was determined. The most populous taxonomic orders are shown.
env: 8,408 (3.3%)
No hit: 200,829 (79.0%)
Primates: 15,701 (6.2%)
Actinomycetales: 17,213 (6.8%)
Rhizobiales: 1,230 (0.5%)
Burkholderiales: 1,912 (0.8%)
Pseudomonadales: 1,470 (0.6%)
Enterobacteriales: 788 (0.3%)
Poales: 429 (0.2%)
Rhodocyclales: 394 (0.2%)
All other orders: 6,559 (2.6%)

[Figure 3: tree with the labelled groups "311 Modern humans" and "Vindija-80 Neanderthal".]
Figure 3 | Schematic tree relating the Vi-80 Neanderthal mtDNA sequences to 311 human mtDNA sequences. The Neanderthal branch length is given with uncorrected sequences (red triangle) and after correction of sequences via independent PCRs (black triangle). Chimpanzee and bonobo sequences (not shown) were used to root the neighbour-joining tree. Several substitution models (Kimura 2-parameter, Tajima-Nei, and Tamura 3-parameter with uniform or gamma-distributed (γ = 0.5–1.1) rates) yielded bootstrap support values for the human branch from 72–83%.

Nuclear DNA sequences

We next analysed the sequence reads whose closest matches are to the human or chimpanzee nuclear genomes and that are at least 30 base pairs long. Figure 4 shows where they map to the human karyotype (see Supplementary Methods). Overall, 0.04% of the autosomal genome sequence is covered by the Neanderthal reads, on average 3.61 bases per 10,000 bases. Both X and Y chromosomes are represented, with a lower coverage of 2.18 and 1.62 bases per 10,000, respectively, showing that the Vi-80 bone is derived from a male individual.

The data presented in Fig. 4 show that when the hit density for sequences that have a single best hit in the human genome is plotted along the chromosomes, several suggestive local deviations from the average hit density are seen, which may represent copy-number differences in the Neanderthal relative to the human reference genome. For comparison, we generated 454 sequence data from a DNA sample from a modern human. Interestingly, some of the deviations seen in the Neanderthal are present also in the modern human, whereas others are not. The latter group of sequences may indicate copy-number differences that are unique to the Neanderthal relative to the modern human genome sequence.
Thus, when more Neanderthal sequence is generated in the future, it may be possible to determine copy-number differences between the Neanderthal, the chimpanzee and the human genomes.

Patterns of nucleotide change on lineages

We generated three-way alignments between all Neanderthal sequences that map uniquely within the human genome and the corresponding human and chimpanzee genome sequences (see Supplementary Methods). An important artefact of local sequence alignments, such as those produced here, is that they necessarily begin and end with regions of exact sequence identity. The size of these regions is a function of the scoring parameters for the alignment. In this case, five bases at both ends of the alignments, amounting to ~14% of all data, needed to be removed (Supplementary Fig. 1) to eliminate biases in estimates of sequence divergence.

Each autosomal nucleotide position in the alignment that did not contain a deletion in the Neanderthal, the human or the chimpanzee sequences and was associated with a chimpanzee genome position with quality score ≥30 was classified according to which species share the same bases (Fig. 5). A total of 736,941 positions contained the same base in all three groups. The next largest category comprises 10,167 positions in which the human and Neanderthal base are identical, but the chimpanzee base is different. These positions are likely to have changed either on the hominid lineage before the divergence between human and Neanderthal sequences or on the chimpanzee lineage. At 3,447 positions, the Neanderthal base differs from both the human and chimpanzee bases, which are identical to each other. As suggested by the analysis of the mtDNA sequences, this category contains positions that have changed on the Neanderthal lineage, as well as a large proportion of errors that derive both from base damage that has accumulated in the ancient DNA and from sequencing errors.
At 434 positions, the human base differs from both the Neanderthal and chimpanzee bases, which are identical to each other. These positions are likely to have changed on the human lineage after the divergence from Neanderthal. Finally, a total of 51 positions contain different bases in all three groups.

[Figure 4: human karyotype with Neanderthal and human hit-density tracks; scale bar 25 megabases; an example of a region with apparent 2X hit density is marked on chromosome 5. Average Neanderthal bases per 10,000 by chromosome: Chr 1, 3.79; Chr 2, 3.87; Chr 3, 3.66; Chr 4, 3.48; Chr 5, 3.27; Chr 6, 3.44; Chr 7, 3.44; Chr 8, 3.67; Chr 9, 3.17; Chr 10, 3.80; Chr 11, 3.72; Chr 12, 3.48; Chr 13, 3.59; Chr 14, 3.52; Chr 15, 3.54; Chr 16, 4.06; Chr 17, 3.69; Chr 18, 3.39; Chr 19, 4.19; Chr 20, 3.89; Chr 21, 2.94; Chr 22, 3.81; Chr X, 2.18; Chr Y, 1.62.]
Figure 4 | Location on the human karyotype of Neanderthal DNA sequences. All sequences longer than 30 nucleotides whose best alignments were to the human genome are shown. The blue lines above each chromosome mark the position of all alignments that are unique in terms of bit-score within the human genome. Orange lines are alignments that have more than one alignment of equal bit-score. To the left of each chromosome, the average number of Neanderthal bases per 10,000 is given. Lines (Neanderthal, blue; human, red) within each chromosome show the hit density, on a log-base 2 scale, within sliding windows of 3 megabases along each chromosome. The centre black lines indicate the average hit-density for the chromosomes. The purple lines above and below indicate hit densities of 2X and 1/2X the chromosome average, respectively. On chromosome 5, an example of a region of increased sequence density is highlighted. Sequence gaps in the human reference sequence are indicated by dark grey regions. Chromosomal banding pattern is indicated by light grey regions.
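
The five categories of aligned positions described in this section (all identical; human and Neanderthal identical; Neanderthal different; human different; all different) can be illustrated with a minimal sketch. The function names, category labels and toy sequences below are hypothetical, and the handling of gaps and of the chimpanzee quality threshold is a simplified reading of the filtering the text describes, not the authors' pipeline.

```python
from collections import Counter

def classify_column(human, neanderthal, chimp):
    """Assign one gap-free alignment column to one of five categories."""
    if human == neanderthal == chimp:
        return "all_identical"
    if human == neanderthal:
        return "human_neanderthal_identical"   # chimp differs (p + h in Fig. 5)
    if human == chimp:
        return "neanderthal_specific"          # n in Fig. 5
    if neanderthal == chimp:
        return "human_specific"                # s in Fig. 5
    return "all_different"

def tally_columns(human, neanderthal, chimp, chimp_quality=None, min_quality=30):
    """Count categories over three equal-length aligned sequences.

    Columns with a gap ('-') in any sequence are skipped, as are columns
    whose chimpanzee quality score is below min_quality (when scores are given).
    """
    counts = Counter()
    for i, column in enumerate(zip(human.upper(), neanderthal.upper(), chimp.upper())):
        if "-" in column:
            continue
        if chimp_quality is not None and chimp_quality[i] < min_quality:
            continue
        counts[classify_column(*column)] += 1
    return counts

# Toy alignment, invented for illustration only.
toy_human = "ACGTACGTAC-T"
toy_neand = "ACGTTCGAACGG"
toy_chimp = "ACCTACGAACGA"
toy_counts = tally_columns(toy_human, toy_neand, toy_chimp)
print(dict(toy_counts))
```

Applied to the real three-way alignments, a tally of this kind would correspond to the 736,941 / 10,167 / 3,447 / 434 / 51 split reported in the text.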
Because the 454 sequencing technology allows the base in a base pair from which a sequence is derived to be determined, the relative frequencies of each of the 12 possible categories of base changes can be estimated for each evolutionary lineage. As seen in Fig. 5, the patterns of the chimpanzee-specific and human-specific changes are similar to each other in that the eight transversional changes are of approximately equal frequency and about fourfold less frequent than each of the four transitional changes, yielding a transition to transversion ratio of 2.04, typical of closely related mammalian genomes (ref. 43). For the Neanderthal-specific changes the pattern is very different in that mismatches are dominated by C to T and G to A differences. Thus, the pattern of change seen among the Neanderthal-specific alignment mismatches is typical of the nucleotide substitution pattern observed in PCR of ancient DNA.

Consistent with this, modern human sequences determined by 454 sequencing show no excess amount of C to T or G to A differences (Supplementary Fig. 2), indicating that lesions in the ancient DNA rather than sequencing errors account for the majority of the errors in the Neanderthal sequences. Assuming that the evolutionary rate of DNA change was the same on the Neanderthal and human lineages, the majority of observed differences specific to the Neanderthal lineage are artefacts. All Neanderthal-specific changes were therefore disregarded in the subsequent analyses and the Neanderthal sequences were used solely to assign changes to the human or chimpanzee lineage where the human and chimpanzee genome sequences differ and the Neanderthal sequence carries either the human or the chimpanzee base.
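
The link between "about fourfold less frequent" and a transition to transversion ratio near 2 can be checked with a small sketch. The idealized frequencies (weight 4 for each of the four transitions, weight 1 for each of the eight transversions) are an assumption made for illustration; the function names are hypothetical, and the observed value in the text is 2.04.

```python
from itertools import permutations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def is_transition(ancestral, derived):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ancestral, derived}
    return pair <= PURINES or pair <= PYRIMIDINES

def ts_tv_ratio(change_counts):
    """Ratio of transition to transversion counts for (ancestral, derived) keys."""
    transitions = sum(n for (a, d), n in change_counts.items() if is_transition(a, d))
    transversions = sum(n for (a, d), n in change_counts.items() if not is_transition(a, d))
    return transitions / transversions

# The 12 possible directed base changes.
all_changes = list(permutations("ACGT", 2))

# Idealized pattern: each transition four times as frequent as each transversion.
idealized = {change: (4 if is_transition(*change) else 1) for change in all_changes}
idealized_ratio = ts_tv_ratio(idealized)
print(len(all_changes), idealized_ratio)
```

Four transitions at weight 4 against eight transversions at weight 1 give 16/8 = 2, close to the reported 2.04.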
Genomic divergence between Neanderthals and humans

Assuming that the rates of DNA sequence change along the chimpanzee lineage and the human lineage were similar, it can be estimated that 8.2% of the DNA sequence changes that have occurred on the human lineage since the divergence from the chimpanzee lineage occurred after the divergence of the Neanderthal lineage. However, although the Neanderthal-specific changes that are heavily influenced by errors are not used for this analysis, some errors in the single-pass sequencing reads from the Neanderthal extract will create positions where the Neanderthal is identical either to human or chimpanzee sequences, and thus affect the estimates of sequence change on the human and chimpanzee lineages. When the effects of such errors in the Neanderthal sequences are quantified and removed (see Supplementary Methods), ~7.9% of the sequence changes along the human lineage are estimated to have occurred after divergence from the Neanderthal. If the human–chimpanzee divergence time is set to 6,500,000 years (refs 40, 41, 44), this implies an average human–Neanderthal DNA sequence divergence time of ~516,000 years. A 95% confidence interval generated by bootstrap re-sampling of the alignment data gives a range of 465,000 to 569,000 years. Obviously, these divergence estimates are dependent on the human–chimpanzee divergence time, which is a much larger source of uncertainty.

We analysed the DNA sequences generated from a contemporary human using the same sequencing protocol as was used for the Neanderthal. Although ancient DNA is degraded and damaged, this comparison controls for many of the aspects of the analysis including sequencing and alignment methodology. In this case, ~7.1% of the divergence along the human lineage is assigned to the time subsequent to the divergence of the two human sequences. The average divergence time between alleles within humans is thus ~459,000 years with a 95% confidence interval between 419,000 and 498,000 years.
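
The percentages and the ~516,000-year figure quoted above can be reproduced from the position counts under the equal-rate assumption. The algebra below is one reading that matches the reported numbers; it is not taken from the Supplementary Methods, and the function name is hypothetical. With s the human-specific changes, h the changes on the hominid lineage before the split and p the chimpanzee-lineage changes, equal rates imply p = h + s, so the observed total p + h equals 2h + s.

```python
def fraction_after_split(human_specific, human_neanderthal_identical):
    """Fraction of human-lineage changes that postdate the Neanderthal split.

    human_specific              : s, positions where only the human base differs
    human_neanderthal_identical : p + h, positions where only the chimpanzee differs
    Assumes equal rates on the human and chimpanzee lineages (p = h + s).
    """
    s = human_specific
    h = (human_neanderthal_identical - s) / 2   # from p + h = 2h + s
    return s / (h + s)

HUMAN_CHIMP_DIVERGENCE_YEARS = 6_500_000  # value used in the text

# Counts from the text (uncorrected) and from Fig. 5 (corrected for base-calling errors).
uncorrected = fraction_after_split(434, 10_167)
corrected = fraction_after_split(422, 10_208)
divergence_years = corrected * HUMAN_CHIMP_DIVERGENCE_YEARS

print(f"uncorrected: {uncorrected:.1%}")
print(f"corrected:   {corrected:.1%}")
print(f"divergence:  {divergence_years:,.0f} years")
```

Because the result scales linearly with the assumed human–chimpanzee divergence time, any uncertainty in that value carries over proportionally, as the authors note.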
As expected, the estimate of average divergence between human alleles (~459,000 years) is less than the divergence seen between the human and the Neanderthal sequences (~516,000 years), but constitutes a large fraction of it because much of the human sequence diversity is expected to predate the human–Neanderthal split (ref. 25). Neanderthal genetic differences to humans must therefore be interpreted within the context of human diversity.

[Figure 5: tree with branches for Neanderthal, human and chimpanzee and bar charts of the 12 types of base change per category. Counts shown: human = Neanderthal = chimpanzee, 736,941 total (739,966 corrected); human = Neanderthal, chimpanzee different, p + h = 10,167 (10,208 corrected); Neanderthal = chimpanzee, human different, s = 434 (422 corrected); human = chimpanzee, Neanderthal different, n = 3,447 (422 corrected).]
Figure 5 | Schematic tree illustrating the number of nucleotide changes inferred to have occurred on hominoid lineages. In blue is the distribution of all aligned positions that did not change on any lineage. In brown are the changes that occurred either on the chimpanzee lineage (p) or on the hominid lineage (h) before the human and Neanderthal lineages diverged. In red are the changes that are unique to the Neanderthal lineage (n), including all changes due to base-damage and base-calling errors. In yellow are changes unique to the human lineage. The distributions of types of changes in each category are also given. The numbers of changes in each category, corrected for base-calling errors in the Neanderthal sequence (see Supplementary Methods), are shown within parentheses.

Ancestral population size

Humans differ from apes in that their effective population size is of the order of 10,000 while those of chimpanzees, gorillas and orang-utans are two to four times larger (refs 45–47). Furthermore, the population size of the ancestor of humans and chimpanzees was found to be similar to those of apes, rather than to humans (refs 42, 48). The Neanderthal sequence data now allow us to ask if the effective size of the population ancestral to humans and Neanderthals was large, as is the case for apes and the human–chimpanzee ancestor, or small, as for present-day humans.

We applied a method (ref. 42) that co-estimates the ancestral effective population size and the split time between Neanderthal and human populations (Fig. 6a; see Supplementary Methods). As seen in Fig. 6b, we recover a line describing combinations of population sizes and split times compatible with the data and lack power to be more precise (see Supplementary Methods and Results). Using this line we can estimate the ancestral population size, given estimates about the population split time from independent sources. If we use a split time of 400,000 years inferred from the fossil record (J. J. Hublin, personal communication), then our point estimate of the ancestral population size is ~3,000. Given uncertainty in both the sequence divergence time and the population split time, our estimate of the ancestral population size varies from 0 to 12,000.

These results suggest that the population ancestral to present-day humans and Neanderthals was similar to present-day humans in having a small effective size and thus that the effective population size on the hominid lineage had already decreased before the split between humans and Neanderthals. Therefore, the small effective population size seen in present-day human samples may not be unique to modern humans, but was present also in the common ancestor of Neanderthals and modern humans. We speculate that a small effective size, perhaps associated with numerous expansions from small groups, was typical not only of modern humans but of many groups of the genus Homo.
In fact, the origin of Homo erectus may have been associated with genetic or cultural adaptations that resulted in drastic population expansions as indicated by their appearance outside Africa around two million years ago.

Neanderthal sequences and human polymorphisms

Another question that can be addressed with these data is how often the Neanderthal has the ancestral allele (that is, the same allele seen in the chimpanzee) versus the derived (or novel) allele at sites where humans carry a single nucleotide polymorphism (SNP). The latter case identifies SNPs that were present in the common ancestor of Neanderthals and present-day humans. Using the SNPs that overlap with our data from two large genome-wide data sets (HapMap [49], 786 SNPs, and Perlegen [50], 318 SNPs), we find that the Neanderthal sample has the derived allele in ~30% of all SNPs. This number is presumably an overestimate since the SNPs analysed were ascertained to be of high frequency in present-day humans and hence are more likely to be old. Nevertheless, this high level of derived alleles in the Neanderthal is incompatible with the simple population split model estimated in the previous section, given split times inferred from the fossil record. This may suggest gene flow between modern humans and Neanderthals. Given that the Neanderthal X chromosome shows a higher level of divergence than the autosomes (R.E.G., unpublished observation), gene flow may have occurred predominantly from modern human males into Neanderthals. More extensive sequencing of the Neanderthal genome is necessary to address this possibility.

Rationale and prospects for a Neanderthal genome sequence

We demonstrate here that DNA sequences can be generated from the Neanderthal nuclear genome by massive parallel sequencing on the 454 sequencing platform. It is thus feasible to determine large amounts of sequences from this extinct hominid. As a corollary, it is possible to envision the determination of a Neanderthal genome sequence.
For several reasons, we believe that this would represent a valuable genomic resource.

First, a Neanderthal genome sequence would allow all nucleotide sequence differences as well as many copy-number differences between the human and chimpanzee genomes to be temporally resolved with respect to whether they occurred before the separation of humans from Neanderthals, or whether they occurred after or at the time of separation. The latter class of changes is of interest, because some of them will be associated with the emergence of modern humans. A Neanderthal genome sequence would therefore allow the research community to determine whether DNA sequence differences between humans and chimpanzees that are found to be functionally important represent recent changes on the human lineage. No data other than a Neanderthal genome sequence can provide this information.

Second, the fact that Neanderthals carry the derived allele for a substantial fraction of human SNPs suggests a method of identifying genomic regions that have experienced a selective sweep subsequent to the separation of human and Neanderthal populations. Such selective sweeps in the human genome will make the variation in these regions younger than the separation of humans and Neanderthals. As we show above, in regions not affected by sweeps a substantial proportion of polymorphic sites in humans will carry derived alleles in the Neanderthal genome sequence, whereas no sites will do so in regions affected by sweeps. This represents an approach to identifying selective sweeps in humans that is not possible from other data.
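The sweep-detection idea in the second point can be sketched in a few lines. The following is a minimal illustration on toy data, not a procedure taken from the article: each human SNP is polarised with the chimpanzee base as the ancestral state, the Neanderthal base is labelled ancestral or derived, and fixed windows in which no Neanderthal base is derived are reported as candidates. The record layout, the function names, the window size and the minimum SNP count are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snp:
    position: int                   # coordinate on one chromosome
    human_alleles: Tuple[str, str]  # the two alleles segregating in humans
    chimp: str                      # chimpanzee base, taken as the ancestral state
    neanderthal: str                # base seen in the Neanderthal read


def classify(snp: Snp) -> str:
    """Label the Neanderthal base as ancestral, derived or unresolved."""
    if snp.chimp not in snp.human_alleles:
        return "unresolved"         # ancestral state cannot be assigned
    if snp.neanderthal == snp.chimp:
        return "ancestral"
    if snp.neanderthal in snp.human_alleles:
        return "derived"
    return "unresolved"             # third base, e.g. damage or sequencing error


def candidate_sweep_windows(snps: List[Snp], window: int, min_snps: int) -> List[int]:
    """Return start coordinates of windows holding at least min_snps
    resolved SNPs of which none is derived in the Neanderthal."""
    tallies: Dict[int, List[int]] = {}   # window index -> [resolved, derived]
    for snp in snps:
        state = classify(snp)
        if state == "unresolved":
            continue
        tally = tallies.setdefault(snp.position // window, [0, 0])
        tally[0] += 1
        tally[1] += int(state == "derived")
    return sorted(
        index * window
        for index, (resolved, derived) in tallies.items()
        if resolved >= min_snps and derived == 0
    )


# Toy data (invented): the first window has one derived site, the second none.
TOY_SNPS = [
    Snp(10_000, ("A", "G"), "A", "G"),
    Snp(40_000, ("C", "T"), "C", "C"),
    Snp(90_000, ("G", "T"), "T", "T"),
    Snp(110_000, ("A", "C"), "A", "A"),
    Snp(150_000, ("C", "G"), "G", "G"),
    Snp(180_000, ("A", "T"), "T", "T"),
]

flagged = candidate_sweep_windows(TOY_SNPS, window=100_000, min_snps=3)
print(flagged)
```

The minimum SNP count matters in practice. If roughly 30% of SNPs are derived in the Neanderthal genome-wide, a window of k independent SNPs shows no derived allele by chance with probability of about 0.7^k, which is still near 3% for k = 10, so short windows would produce many false candidates.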
Third, once large amounts of Neanderthal genome sequence are generated, it will become possible to estimate the misincorporation probabilities for each class of nucleotide differences between the Neanderthal and chimpanzee genomes with high accuracy by analysing regions covered by many reads such as mtDNA, repeated genome regions of high sequence identity, as well as single-copy regions covered by multiple reads. Once this is done, the confidence that any particular nucleotide position where the Neanderthal differs from human as well as chimpanzee is correct can be reliably estimated. In combination with future knowledge about the function of genes and biological systems, comprehensive information from the Neanderthal genome will then allow aspects of Neanderthal biology to be deciphered that are unavailable by any other means.

Are fossil and technical resources today sufficient to imagine the determination of a Neanderthal genome sequence? The results presented here are derived from approximately one fifteenth of an extract prepared from ~100 mg of bone. To achieve one-fold coverage of the Neanderthal genome (3 gigabases) without any further improvement in technology, about twenty grams of bone and 6,000 runs on the current version of the 454 sequencing platform would be necessary. Although this is at present a daunting task, technical improvements in the procedures described here that would make the retrieval of DNA sequences of the order of ten times more efficient can easily be envisioned (our unpublished results). In view of that prospect, we have recently initiated a project that aims at achieving an initial draft version of the Neanderthal genome within two years.

Figure 6 | Estimate of the effective population size of the ancestor of humans and Neanderthals. a, Schematic illustration of the model used to estimate ancestral effective population size. By split time, we mean the time, in the past, after which there was no more interbreeding between two groups. By divergence, we mean the time, in the past, at which two genetic regions separated and began to accumulate substitutions independently. Effective population size is the number of individuals needed under ideal conditions to produce the amount of observed genetic diversity within a population. b, The likelihood estimates of population split times and ancestral population sizes. The likelihoods are grouped by colour. The red–yellow points are statistically equivalent based on the likelihood ratio test approximation. The black line is the line of best fit to red–yellow points (see Supplementary Methods). This graph is scaled assuming a human–chimpanzee average sequence divergence time of 6,500,000 years. [Figure not reproduced. Panel a labels: present; time before present; ancestral effective population size; split time into two groups; divergence time for sequence one; divergence time for sequence two; Neanderthal, human, chimpanzee. Panel b axes: split time (years), 0 to 600,000; effective population size of ancestor, 0 to 15,000, with ~3,000 marked; colour scale from most likely to least likely.]

Received 14 July; accepted 11 October 2006.

1. Bischoff, J. L. et al. The Sima de los Huesos hominids date to beyond U/Th equilibrium (>350 kyr) and perhaps to 400–500 kyr: New radiometric dates. J. Archaeol. Sci. 30, 275–280 (2003).
2. Hublin, J-J. (ed.) Climatic Changes, Paleogeography, and the Evolution of the Neandertals (Plenum Press, New York, 1998).
3. Franciscus, R. G. (ed.) Neanderthals (Oxford Univ. Press, Oxford, 2002).
4. Hublin, J. J., Spoor, F., Braun, M., Zonneveld, F. & Condemi, S. A late Neanderthal associated with Upper Palaeolithic artefacts. Nature 381, 224–226 (1996).
5. Krings, M. et al. Neandertal DNA sequences and the origin of modern humans. Cell 90, 19–30 (1997).
6. Krings, M., Geisert, H., Schmitz, R. W., Krainitzki, H. & Pääbo, S. DNA sequence of the mitochondrial hypervariable region II from the Neandertal type specimen. Proc. Natl Acad. Sci. USA 96, 5581–5585 (1999).
7. Schmitz, R. W. et al. The Neandertal type site revisited: Interdisciplinary investigations of skeletal remains from the Neander Valley, Germany. Proc. Natl Acad. Sci. USA 99, 13342–13347 (2002).
8. Ovchinnikov, I. V. et al. Molecular analysis of Neanderthal DNA from the northern Caucasus. Nature 404, 490–493 (2000).
9. Krings, M. et al. A view of Neandertal genetic diversity. Nature Genet. 26, 144–146 (2000).
10. Serre, D. et al. No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol. 2, 313–317 (2004).
11. Orlando, L. et al. Revisiting Neandertal diversity with a 100,000 year old mtDNA sequence. Curr. Biol. 16, R400–R402 (2006).
12. Caramelli, D. et al. A highly divergent mtDNA sequence in a Neandertal individual from Italy. Curr. Biol. 16, R630–R632 (2006).
13. Lalueza-Fox, C. et al. Neandertal evolutionary genetics: mitochondrial DNA data from the Iberian peninsula. Mol. Biol. Evol. 22, 1077–1081 (2005).
14. Currat, M. & Excoffier, L. Modern humans did not admix with Neanderthals during their range expansion into Europe. PLoS Biol. 2, e421 (2004).
15. Stringer, C. Modern human origins: progress and prospects. Phil. Trans. R. Soc. Lond. B 357, 563–579 (2002).
16. Takahata, N., Lee, S. H. & Satta, Y. Testing multiregionality of modern human origins. Mol. Biol. Evol. 18, 172–183 (2001).
17. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).
18. Pääbo, S. et al. Genetic analyses from ancient DNA. Annu. Rev. Genet. 38, 645–679 (2004).
19. Greenwood, A. D., Capelli, C., Possnert, G. & Pääbo, S. Nuclear DNA sequences from late Pleistocene megafauna. Mol. Biol. Evol. 16, 1466–1473 (1999).
20. Noonan, J. P. et al. Genomic sequencing of Pleistocene cave bears. Science 309, 597–599 (2005).
21. Römpler, H. et al. Nuclear gene indicates coat-color polymorphism in mammoths. Science 313, 62 (2006).
22. Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 392–394 (2006).
23. Hofreiter, M., Serre, D., Poinar, H. N., Kuch, M. & Pääbo, S. Ancient DNA. Nature Rev. Genet. 2, 353–359 (2001).
24. Malmström, H., Storå, J., Dalén, L., Holmlund, G. & Götherström, A. Extensive human DNA contamination in extracts from ancient dog bones and teeth. Mol. Biol. Evol. 22, 2040–2047 (2005).
25. Pääbo, S. Human evolution. Trends Cell Biol. 9, M13–M16 (1999).
26. Poinar, H. N., Höss, M., Bada, J. L. & Pääbo, S. Amino acid racemization and the preservation of ancient DNA. Science 272, 864–866 (1996).
27. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
28. Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).
29. Pääbo, S. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl Acad. Sci. USA 86, 1939–1943 (1989).
30. Höss, M., Dilling, A., Currant, A. & Pääbo, S. Molecular phylogeny of the extinct ground sloth Mylodon darwinii. Proc. Natl Acad. Sci. USA 93, 181–185 (1996).
31. Hofreiter, M., Jaenicke, V., Serre, D., von Haeseler, A. & Pääbo, S. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799 (2001).
32. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
33. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
34. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic Acids Res. 34, D16–D20 (2006).
35. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
36. Béjà, O. et al. Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ. Microbiol. 2, 516–529 (2000).
37. Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
38. Ingman, M. & Gyllensten, U. mtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences. Nucleic Acids Res. 34, D749–D751 (2006).
39. Krause, J. et al. Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439, 724–727 (2006).
40. Kumar, S., Filipski, A., Swarna, V., Walker, A. & Blair Hedges, S. Placing confidence limits on the molecular age of the human-chimpanzee divergence. Proc. Natl Acad. Sci. USA 102, 18842–18847 (2005).
41. Patterson, N., Richter, D. J., Gnerre, S., Lander, E. S. & Reich, D. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006).
42. Wall, J. D. Estimating ancestral population sizes and divergence times. Genetics 163, 395–404 (2003).
43. Yang, Z. & Yoder, A. D. Estimation of the transition/transversion rate bias and species sampling. J. Mol. Evol. 48, 274–283 (1999).
44. Innan, H. & Watanabe, H. The effect of gene flow on the coalescent time in the human-chimpanzee ancestral population. Mol. Biol. Evol. 23, 1040–1047 (2006).
45. Kaessmann, H., Wiebe, V., Weiss, G. & Pääbo, S. Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nature Genet. 27, 155–156 (2001).
46. Yu, N., Jensen-Seaman, M. I., Chemnick, L., Ryder, O. & Li, W-H. Nucleotide diversity in gorillas. Genetics 166, 1375–1383 (2004).
47. Fischer, A., Pollack, J., Thalmann, O., Nickel, B. & Pääbo, S. Demographic history and genetic differentiation in apes. Curr. Biol. 16, 1133–1138 (2006).
48. Rannala, B. & Yang, Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656 (2003).
49. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
50. Hinds, D. A. et al. Whole genome patterns of common DNA variation in diverse human populations. Science 307, 1072–1079 (2005).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We are indebted to G. Coop, W. Enard, I. Hellmann, A. Fischer, P. Johnson, S. Kudaravalli, M. Lachmann, T. Maricic, J. Pritchard, J. Noonan, D. Reich, E. Rubin, M. Slatkin, L. Vigilant and T. Weaver for discussions. We thank A. P. Derevianko, C. Lalueza-Fox, A. Rosas and B. Vandermeersch for fossil samples. We also thank the Croatian Academy of Sciences and Arts for support and the Innovation Fund of Max Planck Society for financial support. 454 Life Sciences thanks NHGRI for continued support for the development of this platform, as well as all of its employees who developed the sequencing system. R.E.G. is supported by an NSF postdoctoral fellowship in Biological Informatics.

Author Contributions M.P. provided Neanderthal samples and palaeontological information; J.M.R. and S.P. conceived of and initiated the 454 Neanderthal sequencing approach; M.T.R. developed the library preparation method, and generated and processed the sequencing data; J.F.S. planned and coordinated library preparation and sequencing activities; L.D. processed and transferred data between 454 Life Sciences and the MPI; M.E. supervised, planned and coordinated research between MPI and 454 Life Sciences; J.K. and A.W.B. extracted ancient DNA and performed analyses in the ‘Identification of a Neanderthal fossil for DNA sequencing’ section; J.K. and R.E.G. performed analyses in the ‘Neanderthal mtDNA sequences’ section; R.E.G. performed the analyses in the sections ‘Direct large-scale DNA sequencing’ to ‘Genomic divergence between Neanderthals and humans’; S.E.P. performed analyses in the sections ‘Ancestral population size’ and ‘Neanderthal sequences and human polymorphisms’; S.P. conceived of the ideas presented in the section ‘Rationale and prospects for a Neanderthal genome sequence’, and initiated, planned and coordinated the study; R.E.G., S.E.P., J.K. and S.P. wrote the paper.

Author Information Neanderthal fossil extract sequences were deposited at EBI with accession numbers CAAN01000001–CAAN01369630. Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the paper on www.nature.com/nature. Correspondence and requests for materials should be addressed to R.E.G. (green@eva.mpg.de).

NATURE | Vol 444 | 16 November 2006 | ©2006 Nature Publishing Group"
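The scale-up estimate in the closing section of the article (about twenty grams of bone for one-fold coverage) can be checked with simple proportional arithmetic. The sketch below assumes that yield scales linearly with the amount of bone extract, and takes the yield to date as the one million base pairs named in the article title. The per-run figure is only what the quoted 6,000 runs would imply, not a number reported in the text, and all variable names are illustrative.

```python
GENOME_BP = 3_000_000_000    # "3 gigabases" for one-fold coverage
YIELD_BP = 1_000_000         # about one million base pairs obtained so far
BONE_MG = 100                # extract prepared from ~100 mg of bone
EXTRACT_FRACTION = 1 / 15    # roughly one fifteenth of the extract was used
RUNS_QUOTED = 6_000          # number of 454 runs quoted in the text

# ASSUMPTION: sequence yield scales linearly with the bone consumed.
bone_mg_used = BONE_MG * EXTRACT_FRACTION
bone_g_needed = GENOME_BP / YIELD_BP * bone_mg_used / 1000

# Implied by the quoted run count, not reported in the article.
implied_bp_per_run = GENOME_BP / RUNS_QUOTED

print(f"bone needed ~ {bone_g_needed:.1f} g")
print(f"implied Neanderthal yield per run ~ {implied_bp_per_run:,.0f} bp")
```

The result reproduces the twenty grams quoted in the text. If the tenfold gain in retrieval efficiency mentioned by the authors applied to both bone consumption and run count, the same arithmetic would give about two grams of bone and 600 runs.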