Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections, and comprise nearly 8% of the human genome1. The most recently acquired human ERV is HERVK(HML-2), which repeatedly infected the primate lineage both before and after the divergence of the human and chimpanzee common ancestor2, 3. Unlike most other human ERVs, HERVK retained multiple copies of intact open reading frames encoding retroviral proteins4. However, HERVK is transcriptionally silenced by the host, except in certain pathological contexts such as germ-cell tumours, melanoma or human immunodeficiency virus (HIV) infection5, 6, 7. Here we demonstrate that DNA hypomethylation at long terminal repeat elements representing the most recent genomic integrations, together with transactivation by OCT4 (also known as POU5F1), synergistically facilitate HERVK expression. Consequently, HERVK is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the eight-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. Remarkably, we detected HERVK viral-like particles and Gag proteins in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. We further show that overexpression of one such product, the HERVK accessory protein Rec, in a pluripotent cell line is sufficient to increase IFITM1 levels on the cell surface and inhibit viral infection, suggesting at least one mechanism through which HERVK can induce viral restriction pathways in early embryonic cells. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.
- Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nature Rev. Microbiol. 10, 395–406 (2012)
- Long-term reinfection of the human genome by endogenous retroviruses. Proc. Natl Acad. Sci. USA 101, 4894–4899 (2004)
- Many human endogenous retrovirus K (HERVK) proviruses are unique to humans. Curr. Biol. 9, 861–868 (1999)
- Identification, characterization, and comparative genomic distribution of the HERVK (HML-2) group of human endogenous retroviruses. Retrovirology 8, 90 (2011)
- Expression of human endogenous retrovirus K elements in germ cell and trophoblastic tumors. Am. J. Pathol. 149, 1727–1735 (1996)
- An endogenous retrovirus derived from human melanoma cells. Cancer Res. 63, 8735–8741 (2003)
- Human endogenous retrovirus K (HML-2) elements in the plasma of people with lymphoma and breast cancer. J. Virol. 82, 9329–9336 (2008)
- The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res. 17, 422–432 (2007)
- Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nature Genet. 42, 631–634 (2010)
- Single-cell RNA-seq profiling of human preimplantation embryos and embryonic stem cells. Nature Struct. Mol. Biol. 20, 1131–1139 (2013)
- DNA methylation dynamics of the human preimplantation embryo. Nature 511, 611–615 (2014)
- Induction of a human pluripotent state with distinct regulatory circuitry that resembles preimplantation epiblast. Cell Stem Cell 13, 663–675 (2013)
- Derivation of novel human ground state naive pluripotent stem cells. Nature 504, 282–286 (2013)
- Derivation of naive human embryonic stem cells. Proc. Natl Acad. Sci. USA 111, 4484–4489 (2014)
- Resetting transcription factor control circuitry toward ground-state pluripotency in human. Cell 158, 1254–1269 (2014)
- Systematic identification of culture conditions for induction and maintenance of naive human pluripotency. Cell Stem Cell 15, 471–487 (2014)
- HERVK(HML-2), the best preserved family of HERVs: endogenization, expression, and implications in health and disease. Front. Oncol. 3, 246 (2013)
- Human-specific HERVK insertion causes genomic variations in the human genome. PLoS ONE 8, e60605 (2013)
- Evidence that HERVK is the endogenous retrovirus sequence that codes for the human teratocarcinoma-derived retrovirus HTDV. Virology 196, 349–353 (1993)
- Phenotypic heterogeneity of human endogenous retrovirus particles produced by teratocarcinoma cell lines. J. Gen. Virol. 82, 591–596 (2001)
- Identification of an infectious progenitor for the multiple-copy HERVK human endogenous retroelements. Genome Res. 16, 1548–1556 (2006)
- Reconstitution of an infectious human endogenous retrovirus. PLoS Pathog. 3, e10 (2007)
- Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012)
- Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nature Genet. 45, 325–329 (2013)
- Identification of a Rev-related protein by analysis of spliced transcripts of the human endogenous retroviruses HTDV/HERVK. J. Virol. 69, 141–149 (1995)
- The IFITM proteins mediate cellular resistance to influenza A H1N1 virus, West Nile virus, and dengue virus. Cell 139, 1243–1254 (2009)
- Staufen-1 interacts with the human endogenous retrovirus family HERVK(HML-2) Rec and Gag proteins and increases virion production. J. Virol. 87, 11019–11030 (2013)
- Rec (formerly Corf) function requires interaction with a complex, folded RNA structure within its responsive element rather than binding to a discrete specific binding site. J. Virol. 75, 10359–10371 (2001)
- The ontogeny of cKIT+ human primordial germ cells proves to be a resource for human germ line reprogramming, imprint erasure and in vitro differentiation. Nature Cell Biol. 15, 113–122 (2013)
- Normal germ line establishment in mice carrying a deletion of the Ifitm/Fragilis gene family cluster. Mol. Cell. Biol. 28, 4688–4696 (2008)
- Characterization of six new human embryonic stem cell lines (HSF7, -8, -9, -10, -12, and -13) derived under minimal-animal component conditions. Stem Cells Dev. 17, 535–546 (2008)
- Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005)
- Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290–1302 (2009)
- In RNA Silencing (ed. Carmichael, G. G.) 93–196 (Humana Press, 2005)
- Dynamic blastomere behaviour reflects human embryo ploidy by the four-cell stage. Nature Commun. 3, 1251 (2012)
- Time-lapse microscopy and image analysis in basic and clinical embryo development research. Reprod. Biomed. Online 26, 120–129 (2013)
- INTERFEROME v2.0: an updated database of annotated interferon-regulated genes. Nucleic Acids Res. 41, D1040–D1046 (2013)
- iCLIP: protein-RNA interactions at nucleotide resolution. Methods 65, 274–287 (2014)
- Dissecting noncoding and pathogen RNA–protein interactomes. RNA 21, 135–143 (2015)
- Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011)
- Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Additional single-cell RNA-seq data analyses from preimplantation human embryos (supporting Fig. 1). (1,343 KB)
a, Heat map and hierarchical K-means clustering of highly expressed (average RPKM > 6 across 89 embryo libraries) repetitive elements in single cells of human preimplantation embryos at indicated developmental stages (top) and HERVK expression (bottom) using indicated data sets. b, HERVH expression (RPKM) in single cells of human embryos at indicated preimplantation stages. Solid line indicates mean. RNA-seq data are taken from ref. 10. c, HERVH expression (RPKM) in single cells of human blastocysts, grouped by lineage. Solid line indicates mean. Oocyte (n = 3), zygote (n = 3), 2-cell (n = 6), 4-cell (n = 11), 8-cell (n = 19), morula (n = 16), TE (n = 18), PE (n = 7), EPI (n = 5), p0 (n = 8), p10 (n = 26). RNA-seq data set was from ref. 10. d, Genome browser snapshot showing 100 bp PE-RNA-seq reads from ELF1 naive human ES cells aligning at the HERVK 108 provirus on chromosome 7.
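As a rough illustration of the expression filter mentioned in panel a, the sketch below computes RPKM (reads per kilobase of feature per million mapped reads) and applies an "average RPKM > 6 across libraries" cut-off. The function names and the example numbers are hypothetical and are not taken from the study's pipeline; only the RPKM definition and the threshold of 6 come from the legend.

```python
from typing import Sequence


def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def is_highly_expressed(rpkm_by_library: Sequence[float], threshold: float = 6.0) -> bool:
    """True if the mean RPKM across libraries exceeds the threshold."""
    if not rpkm_by_library:
        raise ValueError("at least one library is required")
    return sum(rpkm_by_library) / len(rpkm_by_library) > threshold


# Hypothetical element: 500 reads on a 1 kb feature in a 10-million-read library.
example_value = rpkm(500, 1000, 10_000_000)
```

In practice the average would be taken over all libraries in the data set (89 in the legend), with one RPKM value per repetitive element per library.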
- Extended Data Figure 2: LTR5 alignments, HERVK expression data in cell lines, and control ChIP-qPCR analyses in primed human ES cells (supporting Fig. 2). (867 KB)
a, Top, presence of HERVK(HML-2) sequences in Old World primates, but absence in New World primates. Middle, schematic of HERVK proviral genome; all human-specific insertions contain LTR5HS. Bottom, phylogenetic relationship of HERVK LTR subclasses showing high degree of sequence similarity. Pro, protease; Pol, polymerase; Gag, group-specific antigen; Env, envelope. Bottom, ClustalW multiple sequence alignment of indicated HERVK LTR sequences (top), region around OCT4 motif is boxed, phylogenetic tree (bottom) indicating presence/absence of OCT4 motif. b, HERVK protein expression in human EC cells and human ES cells. Protein extracts from human EC cells (NCCIT) and human ES cells (H9) were analysed by immunoblotting with an antibody detecting HERVK Gag precursor and the processed Capsid (top), or the glycosylated, unprocessed form of the HERVK envelope protein Env (bottom). TATA-binding protein (TBP) was used as a loading control. Shown is a representative result of three independent experiments. c, RT–qPCR analysis of HERVK RNA expression in human EC cell line NCCIT, human ES cell line H9, and HEK293 cells. Three distinct qPCR amplicons, corresponding to env, gag and pro are shown. Samples were normalized to 18S ribosomal RNA levels. *P value < 0.05, one-sided t-test. Error bars are ±1 s.d., n = 3 biological replicates. d, HERVK gag or env expression in male human ES cell lines HSF-1, HSF-8, female human ES cell line H9 and human EC cell line NCCIT. *P value < 0.05, one-sided t-test compared to control siRNA, n = 3 biological replicates. Error bars are ±1 s.d. e, RT–qPCR analysis of HERVK transcripts after siRNA knockdown of NANOG, OCT4 or SOX2 in human EC cells (NCCIT). Signals were normalized to 18S rRNA. *P value < 0.05, one-sided t-test compared to control siRNA, n = 3 biological replicates. Error bars are ±1 s.d. f, ChIP-qPCR analyses of human ES cells (H9) with indicated antibodies. Signals were interrogated with primer sets for positive control regions (active human ES cell OCT4 and SOX2 enhancers), LTR5HS, or non-repetitive, intergenic negative regions, as indicated at the bottom. Shown is a representative result of two biological replicates.
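The RT–qPCR panels above report signals normalized to 18S rRNA. The legend does not say how the normalization was computed, so the sketch below assumes the widely used 2^(-ΔΔCt) relative quantification with 18S rRNA as the reference and roughly 100% amplification efficiency. The function name and the Ct values are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target amplicon in a sample relative to a control.

    Assumes the 2^(-ddCt) method (an assumption, not stated in the legend):
    each target Ct is first normalized to the reference Ct (e.g. 18S rRNA),
    then the sample is compared with the control condition.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with identical 18S rRNA Ct values.
fold_change = relative_expression(24.0, 10.0, 26.0, 10.0)
```

A two-cycle difference corresponds to a fourfold change under the efficiency assumption; real analyses often correct for measured primer efficiency instead.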
- Extended Data Figure 3: HERVK regulation by OCT4 and DNA methylation (supporting Fig. 2). (1,414 KB)
a, Transcription factor knockdown in human EC cells (NCCIT). Cells were transfected with siRNA pools targeting indicated transcription factors and protein depletion was measured by immunofluorescence with respective antibodies in comparison with control, mock-transfected cells. DAPI (blue), OCT4 (green, left), NANOG (green, middle), SOX2 (green, right). Shown is one of three representative fields of view at ×20 magnification. b, Dual luciferase assays with indicated reporter constructs in human EC cells (NCCIT) showing that mutation of OCT4 site decreases reporter activity. n = 3 biological replicates, error bars ±1 s.d. *P value < 0.05, one-sided t-test. SV40 enhancer/promoter construct was used as a positive control. c, Bisulfite sequencing for indicated cell types (WT33 human iPSC) analysing consensus LTR5HS-specific amplicon as in Fig. 2e. d, Bisulfite sequencing analysis of HERVK proviral consensus amplicon containing 3′ end of LTR, primer binding site, and 5′ region of Gag ORF (see Extended Data Fig. 2a) in indicated cell types: ELF1 naive human ES cell, WT33 human iPSC, NCCIT human EC cell, or H9 human ES cell. e, RT–qPCR analysis of HERVK RNA levels in HEK293 cells treated with indicated concentrations of 5-aza-2′-deoxycytidine for 3 days, followed by transfection with OCT4/SOX2 expression constructs and RNA collection 48 h after transfection. qPCR primer sets were designed to three independent amplicons of HERVK. *P value < 0.05, one-sided t-test. n = 4 biological replicates, error bars ±1 s.d.
- Extended Data Figure 4: HERVK Gag/Capsid antibody validation and staining (supporting Fig. 3). (1,489 KB)
a, Immunofluorescence analysis of human EC cells (NCCIT) and human ES cells (H9) stained with DAPI (blue), OCT4 (green), Gag/Capsid (red), or IgG control (bottom). White boxes indicate regions shown in higher magnification/merge (right). Shown are representative fields of three independent experiments. b, Sensitivity of HERVK Gag/Capsid antibody immunoblot signal to HERVK knockdown. Human EC cells were transfected with one of three independent siRNA pools targeting HERVK Gag or with a control, non-targeting pool (synthesized against RFP) and total protein was analysed by immunoblotting with anti-Env and anti-Gag/Capsid antibodies. 1:2 serial dilution of total protein was loaded, as indicated. Blots were stripped and re-probed with TBP as a loading control. Shown is a representative result of two independent experiments. c, Sensitivity of HERVK Gag/Capsid antibody immunofluorescence signal to siRNA knockdown of Gag/Capsid (top) or control siRNA targeting RFP (bottom). Shown is a representative result of three fields of view. Magnification, ×20. d, Immunofluorescence of naive ELF1 human ES cells with antibodies against OCT4 (green), HERVK Gag/Capsid (pink) and DAPI (blue). Region marked with white box on left is shown with larger magnification (bottom). Magnification, ×20 and ×40, respectively. e, Another representative example of immunofluorescence of human blastocysts with DAPI (blue), OCT4 (green) and Gag/Capsid (red) shown (n = 19 blastocysts; DPF 5–6). Original magnification, ×40.
- Extended Data Figure 5: TEM analyses of human EC cells and control embryo staining (supporting Fig. 3). (1,459 KB)
a, TEM analysis of human EC cells (NCCIT) with heavy metal staining; arrow indicates VLPs. Boxed region is shown with higher magnification in an inset. Scale bar = 500 nm. Shown is a representative example of two independent experiments. b, TEM immuno-gold labelling of human EC cells (NCCIT) with Gag/Capsid antibodies. Shown is a representative example from two independent experiments. c, Secondary antibody only control for immuno-gold labelling of human blastocysts. Shown is a representative example from eight fields of view. d, Model figure summarizing HERVK transcriptional regulation in human embryos and in vitro cultured pluripotent cells. Dashed lines indicate inference of OCT4, DNA methylation and HERVK level changes at implantation from those observed between naive and primed human ES cells, in the absence of data from actual postimplantation human embryos.
- Extended Data Figure 6: Correlation of HERVK LTR5HS elements with gene expression (supporting Fig. 4). (940 KB)
a, Number of splice junctions identified linking indicated HERV class to annotated RefSeq genes. Analysis was done using RNA-seq data set from ELF1 naive human ES cells, n = 3 biological replicates. b, Number of reads supporting chimaeric transcripts from indicated HERV class in ELF1 naive human ES cells, n = 3 biological replicates. c, Expression of LTR5HS linked genes plotted as a function of distance to the gene’s transcription start site (TSS). x-axis: distance of TSS to the nearest LTR5HS in kb; y-axis: fold change in expression of the linked gene in ELF1 naive versus primed human ES cells (this study, left) or expression of the linked gene in 3iL versus primed H1 human ES cells (right, ref. 12). d, Top, histograms showing expression of all genes that significantly change in expression between naive and primed ELF1 human ES cells (top histogram, white) or significantly changed genes that are LTR5HS associated (bottom histogram, blue); expression values from naive versus primed ELF1 human ES cell RNA-seq data sets (FDR < 0.05 DESeq). Fisher’s exact test gives stated P value, indicating enrichment of LTR5HS-linked genes in naive upregulated category. Bottom, quantification of average expression of LTR5HS-linked (blue) or unlinked (white) genes. Non-paired Wilcoxon test with stated P value indicating that genes linked to 1 or more LTR5HS have significantly higher mean expression. e, Top, histograms showing expression of all genes that significantly change in expression between 3iL and primed H1 human ES cells (top histogram, white) or significantly changed genes that are LTR5HS associated (bottom histogram, blue); expression values from RNA-seq data sets reported previously12, FDR < 0.05 DESeq. Fisher’s exact test gives stated P value indicating enrichment of LTR5HS-linked genes in naive upregulated category. Bottom, quantification of average expression of LTR5HS-linked (blue) or unlinked (white) genes. Non-paired Wilcoxon test with stated P value indicating that genes linked to 1 or more LTR5HS have significantly higher mean expression.
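The enrichment tests in panels d and e can be pictured as a 2×2 table: genes that are LTR5HS-linked or unlinked, crossed with genes that are upregulated in naive cells or not. Below is a minimal sketch of a one-sided Fisher's exact test on such a table, written with the standard library only. The legend does not state whether the test was one- or two-sided, so the one-sided (enrichment) form is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` counts in the top-left cell.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # e.g. LTR5HS-linked genes
    col1 = a + c          # e.g. genes upregulated in naive cells
    total = a + b + c + d
    denominator = comb(total, row1)
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 3 linked and upregulated, 1 linked and not upregulated,
# 1 unlinked and upregulated, 3 unlinked and not upregulated.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

For genome-scale counts, a statistics library would normally be used; the explicit sum here is only meant to show what the test computes.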
- Extended Data Figure 7: rec and IFITM1 expression in naive human ES cells, and effect of Rec expression on H1N1(PR8) infection (supporting Fig. 4). (929 KB)
a, Left, RT–qPCR analysis of HERVK rec expression levels in ELF1 naive human ES cells (n = 3 biological replicates) or H9 primed human ES cells (one biological replicate). Normalized to 18S rRNA. Right, Rec RNA levels in indicated blastocyst lineages. Solid line indicates mean; data are from ref. 10. b, RNA-seq quantification of IFITM1 RNA levels in naive or primed ELF1 human ES cells (left, this study) or 3iL human ES cells versus primed H1 human ES cells from ref. 12 (right). n = 3 biological replicates for each condition, error bars are ±1 s.d. Asterisk indicates significance at FDR < 0.05, DESeq. c, Flow cytometry for surface-localized IFITM1 staining in the indicated H9 human ES cells or naive ELF1 human ES cells (top) or, as a control for IFITM1 antibody specificity, knockdown of IFITM1 with two independent IFITM1 siRNA pools compared to control siRNA-treated cells in Flag–eGFP–Rec-hECCs (bottom). d, Left, IFITM1 expression in control human EC cell versus Rec-hECC (NCCIT) RNA-seq data sets. n = 2 biological replicates. Significance = FDR < 0.05, DESeq. Right, IFITM1 expression in control siRNA versus Rec siRNA-treated human EC cells (NCCIT) RNA-seq. n = 3 biological replicates, error bars are ±1 s.d. Significance = FDR < 0.05, DESeq. e, Flow-cytometry profiles for indicated cell types in H1N1(PR8) infected (top) or non-infected (bottom) wild-type (WT) control human EC cells or Flag–GFP–Rec-hECCs, clone #1. Shown is one representative example of four independent experiments showing a co-plating experiment in which GFP-Rec cells and wild-type control (GFP negative) cells are infected in the same well, stained in the same tube and identified by GFP fluorescence after gating for FSC and SSC. f, Scatterplot of ELF1 naive versus primed human ES cell RNA-seq showing all interferon-induced genes, with differentially regulated genes (FDR < 0.05 DESeq, n = 3 biological replicates each) highlighted in red. There is a significant overlap between differentially regulated genes and interferon-induced genes as measured by a hypergeometric test (P value < 0.05).
- Extended Data Figure 8: iCLIP analysis of Rec-associated RNAs (supporting Fig. 4). (963 KB)
a, Diagram of iCLIP-seq procedure (see Methods for details). Briefly, cells are crosslinked using ultraviolet, lysed and digested with RNase to trim RNAs. Sequential immunopurification is performed using Flag M2, peptide elution, and GFP immunoprecipitation (IP). After stringent washing, RNAs are recovered and either radiolabelled (shown in b) or reverse transcribed and prepared for Illumina HTPS libraries. b, Autoradiogram of labelled RNAs (top) recovered from ultraviolet-crosslinked cells using sequential Flag–eGFP immunoprecipitation from: wild-type human EC cells (lanes 1, 2), Flag–eGFP control human EC cells (lanes 3, 4), or two independent Rec-hECC transgenic lines (lanes 5–8), separated on an SDS–polyacrylamide gel electrophoresis (SDS–PAGE) gel. Free Rec protein runs as a ~35 kDa band, while Rec protein crosslinked to RNA molecules shows lower electrophoretic mobility. Please note that: (1) Rec-bound RNAs are resistant to even high concentrations of RNaseI, probably indicating extensive secondary RNA structures, and (2) there is low/no background of contaminating RNAs in control immunoprecipitation from wild-type human EC cells or Flag–eGFP control human EC cells. Western blots with anti-GFP antibody were also performed to confirm the presence of tagged protein in Flag–eGFP control and Flag–eGFP–Rec cells, both in input and immunoprecipitation fractions (middle). HSP90 was used as a loading control (bottom). c, Computationally predicted (using mFold) secondary structure of LTR5HS sequence around the Rec response element (identified experimentally in vitro previously25). Single nucleotide resolution Rec ultraviolet-crosslinking sites determined by iCLIP are shaded in red; n = 2 biological replicates.
- Extended Data Figure 9: Rec target mRNA analysis (supporting Fig. 4). (580 KB)
a, Genome browser representations of the Rec iCLIP read (n = 2 biological replicates) distribution at indicated mRNA targets. b, Computationally predicted (using mFold) secondary structures of indicated Rec iCLIP-seq targets. Single-nucleotide resolution Rec ultraviolet-crosslinking sites determined by iCLIP are shaded in red; to orient the reader, browser representation of the folded fragment is shown above each respective cartoon.
- Supplementary Data (271 KB)
This file contains Supplementary Table 1.
- Supplementary Data (13.7 MB)
This file contains Supplementary Table 2.
- Supplementary Data (227 KB)
This file contains Supplementary Table 3.
- Supplementary Data (15 KB)
This file contains Supplementary Table 4.
- Supplementary Data (5.1 MB)
This file contains Supplementary Table 5.
- Supplementary Data (8 MB)
This file contains Supplementary Table 6.
- Supplementary Data (11 KB)
This file contains Supplementary Table 7.
- Supplementary Data (9 KB)
This file contains Supplementary Table 8.
- Supplementary Data (9 KB)
This file contains Supplementary Table 9.
- Supplementary Data (16 KB)
This file contains Supplementary Table 10.