Vertebrate genomes exhibit marked CG suppression—that is, lower than expected numbers of 5′-CG-3′ dinucleotides1. This feature is likely to be due to C-to-T mutations that have accumulated over hundreds of millions of years, driven by CG-specific DNA methyl transferases and spontaneous methyl-cytosine deamination. Many RNA viruses of vertebrates that are not substrates for DNA methyl transferases mimic the CG suppression of their hosts2,3,4. This property of viral genomes is unexplained4,5,6. Here we show, using synonymous mutagenesis, that CG suppression is essential for HIV-1 replication. The deleterious effect of CG dinucleotides on HIV-1 replication was cumulative, associated with cytoplasmic RNA depletion, and was exerted by CG dinucleotides in both translated and non-translated exonic RNA sequences. A focused screen using small inhibitory RNAs revealed that zinc-finger antiviral protein (ZAP)7 inhibited virion production by cells infected with CG-enriched HIV-1. Crucially, HIV-1 mutants containing segments whose CG content mimicked random nucleotide sequence were defective in unmanipulated cells, but replicated normally in ZAP-deficient cells. Crosslinking–immunoprecipitation–sequencing assays demonstrated that ZAP binds directly and selectively to RNA sequences containing CG dinucleotides. These findings suggest that ZAP exploits host CG suppression to identify non-self RNA. The dinucleotide composition of HIV-1, and perhaps other RNA viruses, appears to have adapted to evade this host defence.
Karlin, S. & Mrázek, J. Compositional differences within and between eukaryotic genomes. Proc. Natl Acad. Sci. USA 94, 10227–10232 (1997)
Karlin, S., Doerfler, W. & Cardon, L. R. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J. Virol. 68, 2889–2897 (1994)
Rima, B. K. & McFerran, N. V. Dinucleotide and stop codon frequencies in single-stranded RNA viruses. J. Gen. Virol. 78, 2859–2870 (1997)
Greenbaum, B. D., Levine, A. J., Bhanot, G. & Rabadan, R. Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog. 4, e1000079 (2008)
Cheng, X. et al. CpG usage in RNA viruses: data and hypotheses. PLoS One 8, e74109 (2013)
Futcher, B. et al. Reply to Simmonds et al.: Codon pair and dinucleotide bias have not been functionally distinguished. Proc. Natl Acad. Sci. USA 112, E3635–E3636 (2015)
Gao, G., Guo, X. & Goff, S. P. Inhibition of retroviral RNA production by ZAP, a CCCH-type zinc finger protein. Science 297, 1703–1706 (2002)
van Hemert, F., van der Kuyl, A. C. & Berkhout, B. On the nucleotide composition and structure of retroviral RNA genomes. Virus Res. 193, 16–23 (2014)
Karn, J. & Stoltzfus, C. M. Transcriptional and posttranscriptional regulation of HIV-1 gene expression. Cold Spring Harb. Perspect. Med. 2, a006916 (2012)
Li, M. M. et al. TRIM25 enhances the antiviral action of zinc-finger antiviral protein (ZAP). PLoS Pathog. 13, e1006145 (2017)
Zheng, X. et al. TRIM25 is required for the antiviral activity of zinc finger antiviral protein. J. Virol. 91, e00088–17 (2017)
Zhu, Y. et al. Zinc-finger antiviral protein inhibits HIV-1 infection by selectively targeting multiply spliced viral mRNAs for degradation. Proc. Natl Acad. Sci. USA 108, 15834–15839 (2011)
Guo, X., Carroll, J. W., Macdonald, M. R., Goff, S. P. & Gao, G. The zinc finger antiviral protein directly binds to specific viral mRNAs through the CCCH zinc finger motifs. J. Virol. 78, 12781–12787 (2004)
Zhu, Y. & Gao, G. ZAP-mediated mRNA degradation. RNA Biol. 5, 65–67 (2008)
Chen, S. et al. Structure of N-terminal domain of ZAP indicates how a zinc-finger protein recognizes complex RNA. Nat. Struct. Mol. Biol. 19, 430–435 (2012)
Huang, Z., Wang, X. & Gao, G. Analyses of SELEX-derived ZAP-binding RNA aptamers suggest that the binding specificity is determined by both structure and sequence of the RNA. Protein Cell 1, 752–759 (2010)
Bick, M. J. et al. Expression of the zinc-finger antiviral protein inhibits alphavirus replication. J. Virol. 77, 11555–11562 (2003)
Müller, S. et al. Inhibition of filovirus replication by the zinc finger antiviral protein. J. Virol. 81, 2391–2400 (2007)
Mao, R. et al. Inhibition of hepatitis B virus replication by the host zinc finger antiviral protein. PLoS Pathog. 9, e1003494 (2013)
Lin, Y. et al. Identification and characterization of alphavirus M1 as a selective oncolytic virus targeting ZAP-defective human cancers. Proc. Natl Acad. Sci. USA 111, E4504–E4512 (2014)
Goodier, J. L., Pereira, G. C., Cheung, L. E., Rose, R. J. & Kazazian, H. H., Jr. The broad-spectrum antiviral protein ZAP restricts human retrotransposition. PLoS Genet. 11, e1005252 (2015)
Moldovan, J. B. & Moran, J. V. The zinc-finger antiviral protein ZAP inhibits LINE and Alu retrotransposition. PLoS Genet. 11, e1005121 (2015)
Liu, C. H., Zhou, L., Chen, G. & Krug, R. M. Battle between influenza A virus and a newly identified antiviral activity of the PARP-containing ZAPL protein. Proc. Natl Acad. Sci. USA 112, 14048–14053 (2015)
Tang, Q., Wang, X. & Gao, G. The short form of the zinc finger antiviral protein inhibits influenza A virus protein expression and is antagonized by the virus-encoded NS1. J. Virol. 91, e01909–16 (2017)
Coleman, J. R. et al. Virus attenuation by genome-scale changes in codon pair bias. Science 320, 1784–1787 (2008)
Tulloch, F., Atkinson, N. J., Evans, D. J., Ryan, M. D. & Simmonds, P. RNA virus attenuation by codon pair deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies. eLife 3, e04531 (2014)
Kunec, D. & Osterrieder, N. Codon pair bias is a direct consequence of dinucleotide bias. Cell Reports 14, 55–67 (2016)
Todorova, T., Bock, F. J. & Chang, P. PARP13 regulates cellular mRNA post-transcriptionally and functions as a pro-apoptotic factor by destabilizing TRAILR4 transcript. Nat. Commun. 5, 5362 (2014)
Kutluay, S. B. et al. Global changes in the RNA binding specificity of HIV-1 gag regulate virion genesis. Cell 159, 1096–1109 (2014)
Corcoran, D. L. et al. PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol. 12, R79 (2011)
We thank T. Kueck for primary lymphocytes and S. Giese for assistance with smFISH. This work was supported by NIH grants R01AI50111 and P50GM103297 (to P.D.B.).
The authors declare no competing financial interests.
Reviewer Information Nature thanks G. Towers and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Figure 1 CG-enriched HIV-1 clones yield near wild-type levels of virus from transfected 293T cells but are attenuated in replication in primary lymphocytes.
a, Yield of infectious virus from proviral plasmid transfected 293T cells, as measured by infection of MT4 cells (mean ± s.e.m., n = 3, 4 or 5 independent experiments). b, c, Spreading replication of HIV-1 mutants in primary lymphocytes from two additional donors as measured by reverse transcriptase activity in the supernatant of infected cells over time.
Extended Data Figure 2 Effects of CG dinucleotides on HIV-1 infectious virion yield, RNA and protein levels in single-cycle replication assays.
a, Yield of infectious virus in a single cycle of replication following infection of MT4 cells with equal titres of HIV-1WT and pol mutants (mean ± s.e.m., n = 3 independent experiments). b, Expression of gfp in MT4 cells, as measured by flow cytometry, 48 h after infection with equal titres of the indicated viruses. Numerical values are mean fluorescence intensity (MFI) of infected cells (indicated by the dotted box). c, Western blot analysis (anti-Gag, anti-Env, anti-GFP and anti-HSP90) of viral, reporter and cellular protein expression, 48 h after a single cycle of infection of MT4 cells with wild-type and synonymous pol mutant HIV-1. Representative of three experiments. d, RT–qPCR quantification of unspliced RNA in MT4 cells in a single-cycle infection assay with wild-type and synonymous pol mutant HIV-1 (mean ± s.e.m., n = 2 or 3 independent experiments). e, Quantification of RNA molecules (fluorescent spots) by smFISH in the cytoplasm using a probe targeting all spliced and unspliced HIV-1 RNA species after infection of HOS/CXCR4-CD4 cells. Each symbol represents an individual cell. Horizontal lines represent mean values; P values were determined using the Mann–Whitney test (n = 10).
Examples of smFISH analysis of wild-type and synonymous mutant HIV-1-infected cells (red, smFISH gag probe (see Fig. 2c); green, GFP; blue, Hoechst dye). The boxed areas indicate regions selected for expanded views in Fig. 2f. Clusters of RNA molecules in the nuclei of some infected cells may represent sites of proviral integration. Representative of three independent experiments. Scale bar, 5 μm.
Examples of smFISH analysis of wild-type and synonymous mutant HIV-1-infected cells (red, smFISH probe targeting all viral mRNA species (see Fig. 2c); green, GFP; blue, Hoechst dye). Clusters of RNA molecules in the nuclei of some infected cells may represent sites of proviral integration. Representative of three independent experiments. Scale bar, 5 μm.
a, Western blot analyses, using the indicated antibodies, following transfection of HeLa cells with the corresponding siRNAs, or control siRNAs, in the single-cycle replication assays described in Fig. 3a. Representative of two experiments. b, Western blot analysis of ZAP expression in control, CRISPR-knockout MT4 cells and doxycycline-inducible ZAP-S-reconstituted MT4 cells. Asterisks indicate protein species that appeared in some CRISPR knockout clones, reacted with an anti-ZAP antibody and arose after extended passage. These are likely to represent truncated forms of ZAP-L whose translation initiated at methionine codons 3′ to the CRISPR target site (near the ZAP N terminus). Representative of three experiments. c, Western blot analysis (anti-Gag, anti-Env, anti-GFP and anti-tubulin) of viral and cellular protein levels in cells and virions, 48 h after single-cycle wild-type or mutant HIV-1 infection of ZAP−/− MT4 cells that had been reconstituted with a doxycycline-inducible ZAP-S expression construct (ZAPDI) and left untreated or treated with doxycycline. Representative of three experiments.
a, Schematic representation of a reporter construct encoding a CG dinucleotide-depleted fluc cDNA into which were inserted the indicated sequences as 3′ UTRs. b, Western blot analysis of ZAP expression following CRISPR mutation of ZAP exon 1 in HeLa cells. Representative of three experiments. c, Number of CG dinucleotides present in a 200-nucleotide sliding window in the indicated viral cDNA sequences that were left unmanipulated (WT) or recoded with synonymous mutations to contain the maximum number of CG dinucleotides (CG+). d, Luciferase expression following transfection of 293T ZAP−/− cells with CG dinucleotide-depleted fluc reporter plasmids incorporating the indicated VSV or influenza A virus (IAV) RNA sequences as 3′ UTRs, in the presence or absence of a cotransfected ZAP-L expression plasmid (mean ± s.e.m., n = 4 independent experiments).
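The sliding-window count described in panel c can be illustrated with a minimal Python sketch. The function name, the `step` parameter and the toy sequence are assumptions made for illustration; the caption specifies only the 200-nucleotide window, not the step size or the software used.

```python
def cg_counts_in_windows(seq: str, window: int = 200, step: int = 1) -> list[int]:
    """Count CG dinucleotides fully contained in each sliding window.

    `window` defaults to 200 nucleotides as in the caption; `step` is an
    assumption (the source does not state how far the window advances).
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    # is_cg[i] is 1 when positions i and i+1 form a CG dinucleotide.
    is_cg = [1 if seq[i:i + 2] == "CG" else 0 for i in range(len(seq) - 1)]
    # Prefix sums let each window be counted in constant time.
    prefix = [0]
    for flag in is_cg:
        prefix.append(prefix[-1] + flag)
    counts = []
    for start in range(0, len(seq) - window + 1, step):
        # Dinucleotides inside the window start at start .. start+window-2.
        counts.append(prefix[start + window - 1] - prefix[start])
    return counts


# Toy example with a short window so the result is easy to check by eye.
print(cg_counts_in_windows("ACGCGT", window=4))
```

Applying the same function to a wild-type sequence and to its CG-enriched recoding would give the two profiles that a plot of this kind compares.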
Extended Data Figure 7 Dinucleotide composition of ORFs, 3′ UTRs, and preferred ZAP binding sites in cellular mRNAs.
a, Expanded views of the portion of the CLIP graphs in Fig. 4a corresponding to unmutated portions of the viral genome. b, Sources of RNA reads bound to ZAP in a typical CLIP–seq experiment, done using HIV-1-infected cells. c–e, Ratio of the observed frequency to the expected frequency (obs/exp, based on mononucleotide composition) for each of the 16 possible dinucleotides, in ORFs (c), 3′ UTR sequences (d) and the 100 sites in cellular mRNAs that were most frequently bound by ZAP, based on CLIP read numbers (e). Plotted values are mean ± s.d. of all ORFs (n = 35,170) and 3′ UTRs (n = 135,557) in the respective libraries or the most preferred ZAP binding sites (n = 100). f, Frequency distributions of CG dinucleotide observed/expected frequencies in human ORFs, 3′ UTRs and top 100, top 1,000 and top 10,000 ZAP-binding sites in CLIP experiments. The top 100, top 1,000 and top 10,000 ZAP-binding sites account for 6.7%, 18.9% and 46.7% of total reads, respectively. g, Frequency distributions of CG, GC, UA and UG dinucleotide observed/expected frequencies in human ORFs, 3′ UTRs and the top 100 APOBEC3G-binding sites in CLIP assays.
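The observed/expected ratio used in panels c to g can be sketched as follows. This is a minimal illustration under one common convention, in which the expected count of a dinucleotide XY is the product of the mononucleotide frequencies of X and Y multiplied by the number of dinucleotide positions; the caption does not spell out the exact normalization, so the function name and that convention are assumptions.

```python
from collections import Counter
from itertools import product


def dinucleotide_obs_exp(seq: str) -> dict[str, float]:
    """Return observed/expected ratios for all 16 dinucleotides.

    Expected counts are derived from mononucleotide composition only
    (an assumed convention; see the lead-in). DNA input is converted
    to the RNA alphabet so that T and U are treated alike.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for a, b in product("ACGU", repeat=2):
        # Expected count = f(a) * f(b) * number of dinucleotide positions.
        expected = (mono[a] / n) * (mono[b] / n) * (n - 1)
        observed = di[a + b]
        # A base absent from the sequence makes the ratio undefined.
        ratios[a + b] = observed / expected if expected > 0 else float("nan")
    return ratios


ratios = dinucleotide_obs_exp("CGCG")
print(ratios["CG"], ratios["GC"])
```

A ratio below 1 indicates suppression of that dinucleotide relative to what base composition alone would predict, which is the sense in which the CG values are reported here.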
Extended Data Figure 8 Analysis of CG suppression in previously reported ZAP-sensitive and ZAP-resistant viruses and ZAP-sensitizing elements.
a, CG suppression in RNA and reverse transcribing viruses previously reported to be ZAP sensitive (n = 9, open symbols) and ZAP resistant (n = 4, filled symbols)7,17,18,19,20. The viruses included in the analysis and their degrees of CG suppression (CG observed/expected) are: ZAP-sensitive: Sindbis virus (0.90), Semliki forest virus (0.89), Venezuelan equine encephalitis virus (0.76), ebolavirus (0.60), hepatitis B virus (0.52), Moloney murine leukaemia virus (0.51), Marburg virus (0.53), alphavirus M1 (0.89), Ross river virus (0.82); ZAP-insensitive: HIV-1 (0.21), yellow fever virus (0.38), vesicular stomatitis virus (0.48), poliovirus (0.54). The P value was calculated using Student’s t-test (two-sided, n = 9 ZAP-sensitive viruses and n = 4 ZAP-resistant viruses). Influenza virus (CG obs/exp = 0.44), which has been reported to be ZAP-resistant owing to the presence of an antagonist24 and ZAP-L-sensitive via an entirely distinct protein interaction-based mechanism23, was excluded from this analysis. b, Analysis of previously published data on ZAP inhibition of reporter gene expression. Each RNA element derived from the indicated RNA viruses was placed in a 3′ UTR of a luciferase reporter plasmid and fold inhibition by coexpressed ZAP is plotted against the product of CG suppression (CG observed/expected) and length for each RNA element. A data point that is a quantitative outlier from the general trend (indicated in red) is from the Sindbis virus (SINV) genome, but is nevertheless included in the linear regression analysis. P value was calculated using the F-test (two-sided, n = 32 data points). Data are from refs 13 and 18.
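The comparison in panel a can be retraced from the CG observed/expected values listed in the caption. The sketch below computes the group means and a pooled-variance Student's t statistic using only the Python standard library. Whether the original analysis assumed equal variances is not stated, so the pooled form is an assumption, and the function and variable names are illustrative. No P value is computed here because that requires the t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance

# CG observed/expected values as listed in the caption.
ZAP_SENSITIVE = {
    "Sindbis virus": 0.90, "Semliki forest virus": 0.89,
    "Venezuelan equine encephalitis virus": 0.76, "ebolavirus": 0.60,
    "hepatitis B virus": 0.52, "Moloney murine leukaemia virus": 0.51,
    "Marburg virus": 0.53, "alphavirus M1": 0.89, "Ross river virus": 0.82,
}
ZAP_RESISTANT = {
    "HIV-1": 0.21, "yellow fever virus": 0.38,
    "vesicular stomatitis virus": 0.48, "poliovirus": 0.54,
}


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance (equal-variance) two-sample t statistic and d.f."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


sensitive = list(ZAP_SENSITIVE.values())
resistant = list(ZAP_RESISTANT.values())
t_stat, dof = student_t(sensitive, resistant)
print(f"mean sensitive = {mean(sensitive):.3f}, mean resistant = {mean(resistant):.3f}")
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

The two-sided P value would then be read from a t distribution with the reported degrees of freedom, for example with a statistics package.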
This file contains uncropped Western blots used in Figures 2 and 3, and in Extended Data Figures 2, 5 and 6. (PDF 4071 kb)
A codon by codon list of the mutations made in segment L (XLSX 34 kb)
Alignment of WT and mutant EH segments (Fasta format). (ZIP 2 kb)
Alignment of WT and mutant L segments (Fasta format). (ZIP 1 kb)
Cite this article
Takata, M., Gonçalves-Carneiro, D., Zang, T. et al. CG dinucleotide suppression enables antiviral defence targeting non-self RNA. Nature 550, 124–127 (2017). https://doi.org/10.1038/nature24039