From sequence to proteins

In 1985, three groups reported the full nucleotide sequence of HIV-1, a development that was instrumental to further understanding of the genetics and molecular biology of the virus. Ratner et al., Sanchez-Pescador et al. and Wain-Hobson et al. were the first to describe the full DNA sequence and genome organization of viral isolates. The complete sequences (>9,000 bp in length) were derived from proviral DNA and circular unintegrated viral DNA, and they encompassed the long terminal repeats (LTRs), which have crucial roles in regulating the transcription of viral genes and in integration. It is now established that the viral genome encodes the capsid proteins (Gag), viral enzymes (Pol) and the envelope glycoprotein (Env), as well as six additional open reading frames. Determining the locations and sizes of the viral open reading frames revealed that the fundamental genetic structure is similar to that of other retroviruses, but that HIV-1 has distinctive genetic complexity and encodes genes with features not previously recognized in biology.

Understanding the genetic structure of the virus led to new insights into the regulation of viral gene expression and RNA export.

An important discovery that followed soon after the full sequence information was the finding that HIV-1 encodes a trans-acting factor, termed Tat, which was shown to be vital for the transactivation of viral gene expression from the 5ʹ LTR. Over the following years, several studies revealed the mechanism of Tat-mediated transactivation. Early during infection, low levels of viral transcripts are generated, which are subsequently spliced and translated to make Tat. Tat binds to an RNA stem-loop structure, the trans-activation response element (TAR), a regulatory element located downstream of the transcriptional initiation site at the 5ʹ end of nascent viral transcripts. Following binding to TAR, Tat recruits the positive transcription elongation factor b (P-TEFb), a host factor that comprises cyclin-dependent kinase 9 (CDK9) and cyclin T1, as well as other elongation factors. Cyclin T1 binds directly to Tat, and CDK9 phosphorylates the C-terminal domain of RNA polymerase II, thus promoting efficient transcriptional elongation. Smaller, fully spliced messages, such as those encoding Tat, are exported readily from the nucleus to the cytoplasm and are translated, whereas unspliced and incompletely spliced mRNAs require the action of Rev (regulator of expression of virion proteins), a regulatory HIV-1 protein that is also expressed during the early phase of infection.

The mechanism of Rev-dependent export of HIV-1 mRNA species became apparent in the late 1980s. Rev induces the sequence-specific nuclear export of late-phase HIV-1 mRNA species and promotes the cytoplasmic expression of HIV-1 mRNAs that encode viral accessory and structural proteins, including Gag and Env. The initial step in this pathway involves binding of Rev to the Rev-response element (RRE; a stem-loop structure that is present in intron-retaining viral mRNAs) in a highly cooperative manner. An important finding was that RRE-bound Rev forms a complex with cellular nuclear export factor CRM1 through its nuclear export signal. This interaction enables CRM1 to transport the mRNA–Rev complex into the cytoplasm for ensuing translation.

The description of the full nucleotide sequence enabled remarkable discoveries that revealed how gene expression in HIV-1 is controlled by the HIV-1 RNA-binding proteins Tat and Rev and how the virus hijacks the core molecular machinery of the host during viral replication, using mechanisms that were unprecedented at that time. The sequencing work also set the stage for further discoveries regarding the origins and diversity of the virus (MILESTONE 7).


  1. Ratner, L. et al. Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature 313, 277–284 (1985).

  2. Sanchez-Pescador, R. et al. Nucleotide sequence and expression of an AIDS-associated retrovirus (ARV-2). Science 227, 484–492 (1985).

  3. Wain-Hobson, S., Sonigo, P., Danos, O., Cole, S. & Alizon, M. Nucleotide sequence of the AIDS virus, LAV. Cell 40, 9–17 (1985).
