1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America

This article has been updated

Abstract

The emergence of HIV-1 group M subtype B in North American men who have sex with men was a key turning point in the HIV/AIDS pandemic. Phylogenetic studies have suggested cryptic subtype B circulation in the United States (US) throughout the 1970s (refs 1,2) and an even older presence in the Caribbean (ref. 2). However, these temporal and geographical inferences, based upon partial HIV-1 genomes that postdate the recognition of AIDS in 1981, remain contentious (refs 3,4) and the earliest movements of the virus within the US are unknown. We serologically screened >2,000 1970s serum samples and developed a highly sensitive approach for recovering viral RNA from degraded archival samples. Here, we report eight coding-complete genomes from US serum samples from 1978–1979, which are eight of the nine oldest HIV-1 group M genomes to date. This early, full-genome ‘snapshot’ reveals that the US HIV-1 epidemic exhibited extensive genetic diversity in the 1970s but also provides strong evidence for its emergence from a pre-existing Caribbean epidemic. Bayesian phylogenetic analyses estimate the jump to the US at around 1970 and place the ancestral US virus in New York City with 0.99 posterior probability support, strongly suggesting this was the crucial hub of early US HIV/AIDS diversification. Logistic growth coalescent models reveal epidemic doubling times of 0.86 and 1.12 years for the US and Caribbean, respectively, suggesting rapid early expansion in each location (ref. 3). Comparisons with more recent data reveal many of these insights to be unattainable without archival, full-genome sequences. We also recovered the HIV-1 genome from the individual known as ‘Patient 0’ (ref. 5) and found neither biological nor historical evidence that he was the primary case in the US or for subtype B as a whole. We discuss the genesis and persistence of this belief in the light of these evolutionary insights.
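
The doubling times quoted above can be translated into an approximate exponential growth rate. The sketch below assumes that the early phase of the logistic growth model is close to exponential, so that the doubling time T_d and growth rate r are related by T_d = ln 2 / r; the function name is illustrative and not part of the paper's analysis.

```python
import math


def growth_rate_from_doubling_time(doubling_time_years: float) -> float:
    """Per-year exponential growth rate r implied by a doubling time T_d.

    Uses T_d = ln(2) / r, which holds only while growth is close to
    exponential, i.e. the early phase of a logistic curve (an assumption).
    """
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_years


# Doubling times reported in the abstract, in years
reported = {"US": 0.86, "Caribbean": 1.12}
for location, t_d in reported.items():
    r = growth_rate_from_doubling_time(t_d)
    # exp(r * T_d) should recover a twofold increase
    print(f"{location}: r = {r:.2f} per year, exp(r*T_d) = {math.exp(r * t_d):.2f}")
```

Under this assumption, the shorter US doubling time corresponds to a faster growth rate, roughly 0.81 versus 0.62 per year.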

Figure 1: Maximum clade credibility (MCC) tree summary of the Bayesian spatio-temporal reconstruction based on complete HIV-1 genome data.
Figure 2: Demographic reconstruction based on the nested coalescent model.
Figure 3: The early patterns of HIV-1 subtype B spread in the Americas.

Change history

  • 02 October 2016

    In the online version of this paper the images in Figures 2 and 3 were switched; this has been corrected.

References

  1. Korber, B. et al. Timing the ancestor of the HIV-1 pandemic strains. Science 288, 1789–1796 (2000)

  2. Gilbert, M. T. et al. The emergence of HIV/AIDS in the Americas and beyond. Proc. Natl Acad. Sci. USA 104, 18566–18570 (2007)

  3. Holmes, E. C. When HIV spread afar. Proc. Natl Acad. Sci. USA 104, 18351–18352 (2007)

  4. Pape, J. W. et al. The epidemiology of AIDS in Haiti refutes the claims of Gilbert et al. Proc. Natl Acad. Sci. USA 105, E13 (2008)

  5. Auerbach, D. M., Darrow, W. W., Jaffe, H. W. & Curran, J. W. Cluster of cases of the acquired immune deficiency syndrome. Patients linked by sexual contact. Am. J. Med. 76, 487–492 (1984)

  6. Stevens, C. E. et al. Human T-cell lymphotropic virus type III infection in a cohort of homosexual men in New York City. J. Am. Med. Assoc. 255, 2167–2172 (1986)

  7. Szmuness, W., Stevens, C. E., Zang, E. A., Harley, E. J. & Kellner, A. A controlled clinical trial of the efficacy of the hepatitis B vaccine (Heptavax B): a final report. Hepatology 1, 377–385 (1981)

  8. Koblin, B. A., Morrison, J. M., Taylor, P. E., Stoneburner, R. L. & Stevens, C. E. Mortality trends in a cohort of homosexual men in New York City, 1978–1988. Am. J. Epidemiol. 136, 646–656 (1992)

  9. Jaffe, H. W. et al. The acquired immunodeficiency syndrome in a cohort of homosexual men. A six-year follow-up study. Ann. Intern. Med. 103, 210–214 (1985)

  10. Foley, B., Pan, H., Buchbinder, S. & Delwart, E. L. Apparent founder effect during the early years of the San Francisco HIV type 1 epidemic (1978–1979). AIDS Res. Hum. Retroviruses 16, 1463–1469 (2000)

  11. Centers for Disease Control (CDC). A cluster of Kaposi’s sarcoma and Pneumocystis carinii pneumonia among homosexual male residents of Los Angeles and Orange Counties, California. MMWR Morb. Mortal. Wkly. Rep. 31, 305–307 (1982)

  12. McKay, R. A. Imagining ‘Patient Zero’: Sexuality, Blame, and the Origins of the North American AIDS Epidemic. Doctoral thesis, Univ. of Oxford (2011)

  13. Harden, V. A. AIDS at 30: A History (Potomac Books, 2012)

  14. Darrow, W. W. Trip report to New York City, July 12–16 and August 3–6, 1982. CDC Task Force on AIDS, internal communication (3 September 1982)

  15. Darrow, W. W. Time–space clustering of KS cases in the City of New York: evidence for horizontal transmission of some mysterious microbe. CDC Task Force on Kaposi’s Sarcoma and Opportunistic Infections, internal communication (3 March 1982)

  16. Darrow, W. W. & Auerbach, D. M. Los Angeles cluster: background. CDC Task Force on Kaposi’s Sarcoma and Opportunistic Infections, internal communication (12 May 1982)

  17. Shilts, R. And the Band Played On: Politics, People, and the AIDS Epidemic (St. Martin’s Press, 1987)

  18. McKay, R. A. “Patient Zero”: the absence of a patient’s view of the early North American AIDS epidemic. Bull. Hist. Med. 88, 161–194 (2014)

  19. Moss, A. R. In response to: AIDS without end. New York Rev. Books 35, 60 (1988)

  20. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)

  21. Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681 (2006)

  22. Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015)

  23. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)

  24. Rambaut, A. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16, 395–399 (2000)

  25. Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian phylogeography finds its roots. PLOS Comput. Biol. 5, e1000520 (2009)

  26. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)

  27. Rambaut, A., Lam, T. T., de Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst. Virus Evol. 2, vew007 (2016)

  28. Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013)

  29. Faria, N. R. et al. HIV epidemiology. The early spread and epidemic ignition of HIV-1 in human populations. Science 346, 56–61 (2014)

  30. Edwards, C. J. et al. Ancient hybridization and an Irish origin for the modern polar bear matriline. Curr. Biol. 21, 1251–1258 (2011)

  31. Minin, V. N. & Suchard, M. A. Counting labeled transitions in continuous-time Markov models of evolution. J. Math. Biol. 56, 391–412 (2008)

  32. Lemey, P. et al. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 10, e1003932 (2014)

  33. Suchard, M. A. & Rambaut, A. Many-core algorithms for statistical phylogenetics. Bioinformatics 25, 1370–1376 (2009)

  34. Gräf, T. et al. Contribution of epidemiological predictors in unraveling the phylogeographic history of HIV-1 subtype C in Brazil. J. Virol. 89, 12341–12348 (2015)

Acknowledgements

We thank C. Stevens and D. Hemmerlein for facilitating access to archival sera; G.-Z. Han, A. Bjork, W. Switzer, V. Sullivan, R. Ruboyianes and P. Sprinkle for technical assistance; T. Spira and M. Owen for geographical data on some published sequences; and the NIH AIDS Reagent program for providing reference virus samples US657 and HT599. W. W. Darrow led the initial 1982 cluster investigation and provided R.A.M. with access to his copies of archival CDC documents. This work was supported by NIH/NIAID R01AI084691 and the David and Lucile Packard Foundation (M.W.); the Wellcome Trust (080651), the University of Oxford’s Clarendon Fund, the Economic and Social Research Council (PTA-026-27-2838), and a J. Armand Bombardier Internationalist Fellowship (R.A.M.); the Research Fund KU Leuven (Onderzoeksfonds KU Leuven, Program Financing no. PF/10/018) and the ‘Fonds voor Wetenschappelijk Onderzoek Vlaanderen’ (FWO) (G066215N) (P.L.); and NSF DMS 1264153, NIH R01 HG006139 and NIH R01 AI107034 (M.A.S.).

Author information

Contributions

M.W., H.W.J., P.L. and R.A.M. conceived the study. T.D.W. and M.W. designed the RNA jackhammering method. T.D.W. generated the sequences. B.A.K. provided serum samples from New York City. W.H. and T.G. acquired specimens and provided serological data. D.E.T. provided conceptual input. M.W., M.A.S. and P.L. prepared the data sets and performed the phylogenetic analyses. R.A.M. performed the historical analyses. M.W., H.W.J., P.L. and R.A.M. wrote the paper. All authors discussed the results and commented on the manuscript. The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Corresponding authors

Correspondence to Michael Worobey or Richard A. McKay.

Ethics declarations

Competing interests

A patent, ‘Methods and systems for RNA or DNA detection and sequencing’ (US patent application 62/325,320), has been filed with the United States Patent and Trademark Office. It will be used to facilitate the licensing of this methodology.

Additional information

Reviewer Information Nature thanks K. Andersen and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Jackhammering schematic and primer panels and pools.

a–d, Detection and amplification of target RNA molecules in old, degraded, low-titre samples. For the purposes of illustration, consider a tube with 10¹³ RNA molecules but (because of the low RNA quality) only one molecule that is (i) capable of being primed by the given reverse primer(s) and (ii) long enough to form a 200-bp product. a, Conventional RT–PCR with a long amplification product, oversized for a sample with RNA less than ~200 bases in length. b, RT–PCR with a shorter amplification product. c, Use of multiple primer pairs to increase the chance of at least one PCR-positive result. d, The jackhammering approach, which overcomes the problems encountered in a–c by (i) targeting an extensive panel of short amplicons appropriately sized to the level of RNA survival in the sample, (ii) conducting reverse transcription with pools of primer pairs that amplify discrete, non-overlapping genomic regions, and (iii) employing a multiplex pre-amplification step, in the tube with the reverse transcription product, to generate sufficient DNA to ensure that each aliquot from it contains numerous template molecules for final PCR amplification. In this schematic, we show just two primer pairs per pool, but we used pools of ten pairs with our largest primer panels (shown in e, with HXB2 numbering along the HIV-1 genome). With a 10-primer-pair pool and 10 final reactions, one can reliably recover 10 bands for sequencing. Five such pools (one entire panel of 50 pairs) allow complete HIV-1 genome recovery even in heavily degraded samples.
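
The pooling idea in panel d (primer pairs in the same pool must amplify discrete, non-overlapping regions) can be illustrated with a short sketch. It assumes a hypothetical panel of 50 equal-length ~200-bp amplicons tiling roughly 9.5 kb with 10-bp overlaps, and splits them round-robin into five pools so that neighbouring, overlapping amplicons never share a pool. The coordinates and the round-robin rule are illustrative assumptions, not the primer panels shown in panel e.

```python
from typing import List, Tuple

Amplicon = Tuple[int, int]  # (start, end) genome coordinates, end exclusive


def interleaved_pools(amplicons: List[Amplicon], n_pools: int) -> List[List[Amplicon]]:
    """Assign genome-ordered amplicons round-robin: amplicon i goes to pool i % n_pools."""
    ordered = sorted(amplicons)
    pools: List[List[Amplicon]] = [[] for _ in range(n_pools)]
    for i, amp in enumerate(ordered):
        pools[i % n_pools].append(amp)
    return pools


def has_overlap(pool: List[Amplicon]) -> bool:
    """True if any two amplicons in the pool overlap (checked on sorted neighbours)."""
    ordered = sorted(pool)
    return any(a_end > b_start for (_, a_end), (b_start, _) in zip(ordered, ordered[1:]))


# Hypothetical panel: 50 amplicons of 200 bp starting every 190 bp (10-bp overlaps)
step = 190
panel = [(i * step, i * step + 200) for i in range(50)]
pools = interleaved_pools(panel, n_pools=5)
for k, pool in enumerate(pools, start=1):
    print(f"pool {k}: {len(pool)} amplicons, overlapping={has_overlap(pool)}")
```

Because amplicons in one pool start 950 bp apart here, no two of them overlap, even though adjacent amplicons in the full panel do.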

Extended Data Figure 2 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on complete HIV-1 genome data.

a, ‘full genome 46’; b, ‘full genome 38’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. Grey bars indicate the 95% credibility intervals for the internal node ages. The tree in b represents the fully annotated version of the tree in Fig. 1 in the main text.
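
In MCMC-based Bayesian phylogeography, a posterior location probability such as those drawn as node-circle diameters is typically estimated as the fraction of posterior samples in which the node carries that location. A minimal sketch, assuming the location state of one internal node has already been extracted from each sampled tree; the sample counts are hypothetical, and the sketch ignores the conditioning on the node's clade being present in every sample.

```python
from collections import Counter
from typing import Dict, List


def posterior_frequencies(sampled_states: List[str]) -> Dict[str, float]:
    """Estimate each discrete state's posterior probability as its
    frequency among posterior (MCMC) samples, most frequent first."""
    if not sampled_states:
        raise ValueError("need at least one posterior sample")
    counts = Counter(sampled_states)
    total = len(sampled_states)
    return {state: n / total for state, n in counts.most_common()}


# Hypothetical location states for one internal node across 1,000 posterior trees
samples = ["NY"] * 970 + ["CA"] * 20 + ["GA"] * 10
post = posterior_frequencies(samples)
for location, prob in post.items():
    print(f"{location}: {prob:.3f}")
```

The same frequency logic underlies clade (node) support, which is what the >0.95 outer-circle threshold refers to.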

Extended Data Figure 3 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different genome region data sets.

MCC trees for the same strains are shown for a, gag; b, pol; c, env; and d, the complete genome. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States. Tip labels are provided for the newly obtained archival HIV-1 genomes. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability densities for the time of the introduction event from the Caribbean into the US on the time scale of the trees.

Extended Data Figure 4 Maximum likelihood phylogenies for the different genome region data sets.

a, gag; b, pol; c, env; d, the complete genome. We analysed the same data sets as in Extended Data Fig. 3. The diameters of the internal node circles reflect bootstrap support values. We manually coloured the branches in a similar way as for the Bayesian phylogeographic reconstructions.

Extended Data Figure 5 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different env data sets.

a, ‘env 105’; b, ‘env 74’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NJ, New Jersey; NY, New York; PA, Pennsylvania. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability density for the time of the introduction event from the Caribbean into the US on the time scales of the trees. The three partial env sequences from San Francisco (SF) in 1978 (ref. 10) are highlighted with bullets.

Extended Data Figure 6 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstruction comparing early and late strains.

a, ‘env 133’; b, only ‘late’ sequences from ‘env 133’. In a, we classified US sequences as ‘early’ or ‘late’ depending on whether they were sampled before 1985 or in or after 1985. In b, the analysis was conducted on an empirical tree distribution of ‘env 133’ from which we pruned early US sequences (in grey), but we still annotate the reconstruction on the complete phylogenies for reference. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US early, the United States sampled before 1985; US late, the United States sampled in or after 1985; CA, California; GA, Georgia; NC, North Carolina; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95.

Extended Data Figure 7 A cluster of 40 early AIDS patients linked through sexual contact.

Reprinted from figure 1 of ref. 5 with permission from Elsevier.

Extended Data Figure 8 Jackhammering validation with reference viruses.

a, The consensus sequences for primer panels HIVM and HIVR (‘RMcon’ suffix) were included, with previously published sequences for a US virus (US657) and a Haitian virus (HT599), in a maximum likelihood tree. The two clusters of paired sequences are highlighted by coloured boxes. b, Plot of the root-to-tip genetic distance against sampling time for the tree in a. The colours for the data points are consistent with those used for sampling locations in the phylogenies (the two African outgroup tips are not shown for clarity). The data points with black circles represent the published sequences while the data points with a target symbol represent the newly obtained sequences.

Extended Data Figure 9 Plots of the root-to-tip genetic distance against sampling time for different genome region data sets (gag, pol, env and the complete genome).

We used TempEst (ref. 27) to obtain exploratory regressions based on the maximum likelihood trees (Extended Data Fig. 4). Each data point represents a tip; colours are consistent with those used for sampling locations in the phylogenies. The US data points with black circles represent the new genomes dating back to 1978–1979. The data point with the target symbol represents the Patient 0 genome. In each plot, we provide the R² for the regression and the slope, which reflects the evolutionary rate (in substitutions per site per year).
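
The exploratory root-to-tip regression described above can be sketched as an ordinary least-squares fit of genetic distance on sampling year. This assumes root-to-tip distances have already been read off a rooted maximum likelihood tree; the slope then approximates the evolutionary rate, R² measures clock-likeness, and the x-intercept gives a crude root-age estimate. The tip values are hypothetical, and TempEst can additionally search for a best-fitting root position, which this sketch does not attempt.

```python
from typing import List, Tuple


def root_to_tip_regression(years: List[float], distances: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of distance = slope * year + intercept.

    Returns (slope, r_squared, x_intercept). The slope approximates the rate in
    substitutions/site/year; the x-intercept is the year where the fitted
    distance reaches zero (a rough estimate of the root date).
    """
    n = len(years)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    syy = sum((y - mean_y) ** 2 for y in distances)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared, -intercept / slope


# Hypothetical tips: sampling year and root-to-tip distance (substitutions/site)
years = [1978.0, 1979.0, 1983.0, 1990.0, 2000.0]
dists = [0.041, 0.042, 0.053, 0.078, 0.105]
rate, r2, root_year = root_to_tip_regression(years, dists)
print(f"rate = {rate:.4f} subs/site/year, R^2 = {r2:.3f}, root ~ {root_year:.0f}")
```

A strongly positive slope with high R² suggests enough temporal signal to justify molecular-clock dating; it is a diagnostic, not a substitute for the Bayesian estimates.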

Extended Data Table 1 Molecular clock, phylogeographic and recombination estimates for the different data sets

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary References and Supplementary Tables 1–2. (PDF 1016 kb)

About this article

Cite this article

Worobey, M., Watts, T., McKay, R. et al. 1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America. Nature 539, 98–101 (2016). https://doi.org/10.1038/nature19827

  • DOI: https://doi.org/10.1038/nature19827