Retroviruses integrate into a shared, non-palindromic DNA motif

  • Nature Microbiology volume 2, Article number: 16212 (2016)
  • doi:10.1038/nmicrobiol.2016.212

Many DNA-binding factors, such as transcription factors, form oligomeric complexes with structural symmetry that bind to palindromic DNA sequences1. Palindromic consensus nucleotide sequences are also found at the genomic integration sites of retroviruses2,3,4,5,6 and other transposable elements7,8,9, and it has been suggested that this palindromic consensus arises as a consequence of the structural symmetry in the integrase complex2,3. However, we show here that the palindromic consensus sequence is not present in individual integration sites of human T-cell lymphotropic virus type 1 (HTLV-1) and human immunodeficiency virus type 1 (HIV-1), but arises in the population average as a consequence of the existence of a non-palindromic nucleotide motif that occurs in approximately equal proportions on the plus strand and the minus strand of the host genome. We develop a generally applicable algorithm to sort the individual integration site sequences into plus-strand and minus-strand subpopulations, and use this to identify the integration site nucleotide motifs of five retroviruses of different genera: HTLV-1, HIV-1, murine leukaemia virus (MLV), avian sarcoma leucosis virus (ASLV) and prototype foamy virus (PFV). The results reveal a non-palindromic motif that is shared between these retroviruses.
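
The strand-mixture argument above can be illustrated on toy data with a short Python sketch. This is not the authors' algorithm, whose details are not given in this abstract. It assumes a hypothetical 3-base motif (ACG) followed by three uninformative bases, equal strand proportions, and a simple two-orientation mixture fitted by expectation-maximization. All function names (simulate, em_sort and so on) and parameter values are illustrative.

```python
import math
import random

BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMP[b] for b in reversed(seq))

def frequencies(seqs, weights):
    """Weighted per-position base frequencies, with a 0.5 pseudocount."""
    counts = [{b: 0.5 for b in BASES} for _ in seqs[0]]
    for s, w in zip(seqs, weights):
        for col, b in zip(counts, s):
            col[b] += w
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]

def consensus(freqs):
    """Most frequent base at each position."""
    return "".join(max(BASES, key=col.get) for col in freqs)

def log_lik(seq, freqs):
    """Log-probability of a sequence under independent per-position frequencies."""
    return sum(math.log(col[b]) for col, b in zip(freqs, seq))

def simulate(n, rng, motif="ACG", flank=3, fidelity=0.85):
    """Toy data (assumed): motif + uniform flank, flipped to the minus strand half the time."""
    seqs, on_plus = [], []
    for _ in range(n):
        core = "".join(m if rng.random() < fidelity else rng.choice(BASES) for m in motif)
        s = core + "".join(rng.choice(BASES) for _ in range(flank))
        plus = rng.random() < 0.5
        seqs.append(s if plus else revcomp(s))
        on_plus.append(plus)
    return seqs, on_plus

def em_sort(seqs, rng, iters=50):
    """Fit a two-orientation mixture by EM; returns (motif frequencies, P(motif orientation))."""
    # Random start: a perfectly strand-symmetric start is a fixed point of EM.
    freqs = []
    for _ in seqs[0]:
        w = {b: rng.uniform(1, 2) for b in BASES}
        freqs.append({b: w[b] / sum(w.values()) for b in BASES})
    flipped = [revcomp(s) for s in seqs]
    resp = [0.5] * len(seqs)
    for _ in range(iters):
        # E-step: posterior that each sequence, as read, is in the motif orientation
        # (equal prior weight on the two strands is assumed).
        resp = []
        for s, f in zip(seqs, flipped):
            d = log_lik(f, freqs) - log_lik(s, freqs)
            resp.append(1.0 / (1.0 + math.exp(d)))
        # M-step: re-estimate the motif from both orientations, weighted by the posterior.
        freqs = frequencies(seqs + flipped, resp + [1 - r for r in resp])
    return freqs, resp

rng = random.Random(0)
seqs, on_plus = simulate(1000, rng)

# Population average over unsorted sequences.
pooled_consensus = consensus(frequencies(seqs, [1.0] * len(seqs)))
full_palindromes = sum(s == pooled_consensus for s in seqs) / len(seqs)

# Sort sequences by strand; the recovered orientation label is arbitrary.
freqs, resp = em_sort(seqs, rng)
agree = sum((r > 0.5) == p for r, p in zip(resp, on_plus)) / len(seqs)
accuracy = max(agree, 1 - agree)

print("pooled consensus:", pooled_consensus,
      "palindromic:", revcomp(pooled_consensus) == pooled_consensus)
print("fraction of sequences equal to the pooled consensus:", full_palindromes)
print("strand-sorting accuracy:", round(accuracy, 3))
```

On this toy data the pooled consensus is palindromic even though only a small fraction of individual sequences match it. The orientation recovered by EM is defined only up to a global flip, which is why accuracy is taken as the larger of the two possible agreements.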

Change history

  • Corrected online 14 July 2017

    In the PDF version of this article previously published, the year of publication provided in the footer of each page and in the 'How to cite' section was erroneously given as 2017; it should have been 2016. This error has now been corrected. The HTML version of the article was not affected.


References

  1. Protein–DNA recognition. Annu. Rev. Biochem. 53, 293–321 (1984).
  2. Weak palindromic consensus sequences are a common feature found at the integration target sites of many retroviruses. J. Virol. 79, 5211–5214 (2005).
  3. Symmetrical base preferences surrounding HIV-1, avian sarcoma/leukosis virus, and murine leukemia virus integration sites. Proc. Natl Acad. Sci. USA 102, 6103–6107 (2005).
  4. Symmetrical recognition of cellular DNA target sequences during retroviral integration. Proc. Natl Acad. Sci. USA 102, 5903–5904 (2005).
  5. Genome-wide mapping of foamy virus vector integrations into a human cell line. J. Gen. Virol. 87, 1339–1347 (2006).
  6. HTLV-1 integration into transcriptionally active genomic regions is associated with proviral expression and with HAM/TSP. PLoS Pathog. 4, e1000027 (2008).
  7. Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc. Natl Acad. Sci. USA 97, 3347–3351 (2000).
  8. DNA transposon Hermes inserts into DNA in nucleosome-free regions in vivo. Proc. Natl Acad. Sci. USA 107, 21966–21972 (2010).
  9. Serial number tagging reveals a prominent sequence preference of retrotransposon integration. Nucleic Acids Res. 42, 8449–8460 (2014).
  10. Retroviral DNA integration. Chem. Rev. 116, 12730–12757 (2016).
  11. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529 (2002).
  12. Transcription start regions in the human genome are favored targets for MLV integration. Science 300, 1749–1751 (2003).
  13. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2, e234 (2004).
  14. Genome-wide analyses of avian sarcoma virus integration sites. J. Virol. 78, 11656–11663 (2004).
  15. Genome-wide determinants of proviral targeting, clonal abundance and expression in natural HTLV-1 infection. PLoS Pathog. 9, e1003271 (2013).
  16. HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in human cells. J. Biol. Chem. 278, 372–381 (2003).
  17. LEDGF/p75 is essential for nuclear and chromosomal targeting of HIV-1 integrase in human cells. J. Biol. Chem. 278, 33528–33539 (2003).
  18. LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev. 21, 1767–1778 (2007).
  19. Human T-cell leukemia virus type 1 integration target sites in the human genome: comparison with those of other retroviruses. J. Virol. 81, 6731–6741 (2007).
  20. Selection of target sites for mobile DNA integration in the human genome. PLoS Comput. Biol. 2, e157 (2006).
  21. Chromosome structure and human immunodeficiency virus type 1 cDNA integration: centromeric alphoid repeats are a disfavored target. J. Virol. 72, 4005–4014 (1998).
  22. Sequence analysis of the human DNA flanking sites of human immunodeficiency virus type 1 integration. J. Virol. 70, 6459–6462 (1996).
  23. HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications. Genome Res. 17, 1186–1194 (2007).
  24. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
  25. Structural basis for retroviral integration into nucleosomes. Nature 523, 366–369 (2015).
  26. Chromatin landscapes of retroviral and transposon integration profiles. PLoS Genet. 10, e1004250 (2014).
  27. Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 69, 769–780 (1992).
  28. DNA bending creates favored sites for retroviral integration: an explanation for preferred insertion sites in nucleosomes. EMBO J. 13, 4704–4714 (1994).
  29. Key determinants of target DNA recognition by retroviral intasomes. Retrovirology 12, 39 (2015).
  30. The mechanism of retroviral integration from X-ray structures of its key intermediates. Nature 468, 326–329 (2010).
  31. Structural basis of instability of the nucleosome containing a testis-specific histone variant, human H3T. Proc. Natl Acad. Sci. USA 107, 10454–10459 (2010).
  32. Intasome architecture and chromatin density modulate retroviral integration into nucleosome. Retrovirology 12, 13 (2015).
  33. Integrase residues that determine nucleotide preferences at sites of HIV-1 integration: implications for the mechanism of target DNA binding. Nucleic Acids Res. 42, 5164–5176 (2014).
  34. Crystal structure of the Rous sarcoma virus intasome. Nature 530, 362–366 (2016).
  35. A novel T-cell line derived from adult T-cell leukemia. Gan 71, 155–156 (1980).
  36. The host genomic environment of the provirus determines the abundance of HTLV-1-infected T-cell clones. Blood 117, 3113–3122 (2011).
  37. pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory (Stanford Univ., 2015).
  38. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
  39. A stability index for feature selection. In Proceedings of the 25th International Multi-Conference on Artificial Intelligence and Applications 390–395 (2007).
  40. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B Met. 39, 1–38 (1977).
  41. Estimation and hypothesis testing in finite mixture models. J. Roy. Stat. Soc. B Met. 47, 67–75 (1985).
  42. On bootstrapping the likelihood ratio test stastistic for the number of components in a normal mixture. J. Roy. Stat. Soc. C Appl. Stat. 36, 318–324 (1987).

Acknowledgements

This work was supported by the Wellcome Trust UK (Senior Investigator Award 100291 to C.R.M.B.; Investigator Award 107005 to G.N.M.) and the MRC (project reference MC_UP_0801/1). The authors thank the following individuals for providing materials: A. Zhyvoloup and A. Fassati (Division of Infection and Immunity, University College London) and H. Niederer (Division of Infectious Diseases, Imperial College London). The authors also thank L. Game and M. Dore at the Medical Research Council Clinical Sciences Centre Genomics Laboratory at Hammersmith Hospital, London, UK.

Author information


  1. MRC Biostatistics Unit, Cambridge Institute for Public Health, Cambridge CB2 0SR, UK

    • Paul D. W. Kirk
  2. Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK

    • Maxime Huvet
  3. Section of Virology, Division of Infectious Diseases, Imperial College London, London SW7 2AZ, UK

    • Anat Melamed
    • Goedele N. Maertens
    • Charles R. M. Bangham


P.K. and C.B. conceived the project. A.M. and G.M. performed the experiments. P.K. and M.H. performed the statistical analysis and modelling. P.K. and C.B. co-wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Charles R. M. Bangham.

Supplementary information

PDF files

  1. Supplementary information

    Supplementary Figures 1–5, Supplementary References