HIPSTR and thousands of lncRNAs are heterogeneously expressed in human embryos, primordial germ cells and stable cell lines

Eukaryotic genomes are transcribed into numerous regulatory long non-coding RNAs (lncRNAs). Compared to mRNAs, lncRNAs display higher developmental stage-, tissue-, and cell-subtype-specificity of expression, and are generally less abundant in a population of cells. Despite the progress in single-cell-focused research, the origins of low population-level expression of lncRNAs in homogeneous populations of cells are poorly understood. Here, we identify HIPSTR (Heterogeneously expressed from the Intronic Plus Strand of the TFAP2A-locus RNA), a novel lncRNA gene in the developmentally regulated TFAP2A locus. HIPSTR has evolutionarily conserved expression patterns, its promoter is most active in undifferentiated cells, and depletion of HIPSTR in HEK293 and in pluripotent H1BP cells predominantly affects the genes involved in early organismal development and cell differentiation. Most importantly, we find that HIPSTR is specifically induced and heterogeneously expressed in the 8-cell-stage human embryos during the major wave of embryonic genome activation. We systematically explore the phenomenon of cell-to-cell variation of gene expression and link it to low population-level expression of lncRNAs, showing that, similar to HIPSTR, the expression of thousands of lncRNAs is more highly heterogeneous than the expression of mRNAs in the individual, otherwise indistinguishable cells of totipotent human embryos, primordial germ cells, and stable cell lines.


LNCaP RNA-seq
LNCaP RNA-seq libraries were prepared as described in ref. 3. In brief, LNCaP poly(A)+ RNA was extracted with FastTrack MAG Maxi mRNA Isolation Kit (Invitrogen), as per the manufacturer's protocol, treated with 25 U of DNase I, Amplification Grade (Invitrogen) for 1 h at room temperature, quantified with Quant-iT RiboGreen RNA Reagent (Invitrogen), and assessed for integrity on a 2100 Bioanalyzer (Agilent). The resulting RNA samples were used for strand-specific paired-end RNA-seq library preparation in accordance with the standard Illumina protocol, and two biological replicates were sequenced on a HiSeq 2000. Data were processed as described in the main text, and 298.37 million read pairs were successfully mapped to the human genome assembly hg19.

5ʹ and 3ʹ rapid amplification of cDNA ends (RACE)
Human Prostate Marathon-Ready cDNA (Clontech) was used to validate the strand-specific RNA-seq identification of HIPSTR in the LNCaP prostate carcinoma cell line. The first round of the 5ʹ and 3ʹ RACE PCRs was performed according to the Marathon-Ready cDNA library user manual (Clontech). The second round of RACE PCR was performed with nested strand-specific primers to increase the specificity of target product detection (Table S11). The resulting PCR products were gel-purified (Wizard SV Gel and PCR Clean-Up System; Promega), cloned into the pGEM T-Easy vector (Promega), and sequenced.

HIPSTR coding potential analysis and polyadenylation signal prediction
To assess HIPSTR coding potential, we first searched for potential open reading frames (ORFs) within the HIPSTR gene sequence using the ORF Finder on-line tool (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). To screen for similarities with any known proteins, all ORFs found were then subjected to a blastp search against the non-redundant (nr) protein sequences database (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins).
ORF shuffling was done essentially as described in ref. 4. Briefly, the HIPSTR sequence was split into groups of 3 nucleotides, which were subsequently shuffled 1000 times. Considering only ORFs that begin with a canonical ATG start codon, the maximum ORF size was retrieved after each shuffling, and the distribution of these sizes was plotted. ORF sizes are expressed as fractions of HIPSTR length.
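The shuffling procedure can be illustrated with a minimal Python sketch. It is not the script used in this study or in ref. 4. The function names, the fixed random seed, the restriction to the three forward reading frames, and the requirement that an ORF end at an in-frame stop codon are all assumptions made for illustration.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the length in nt (stop codon included) of the longest
    ATG-initiated ORF found in the three forward reading frames.

    Assumption: an ORF must terminate at an in-frame stop codon;
    open-ended ORFs running to the sequence end are ignored.
    """
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i          # first ATG opens the ORF
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None           # look for the next ORF in this frame
    return best


def shuffled_max_orf_fractions(seq, n_shuffles=1000, seed=0):
    """Shuffle the sequence as groups of 3 nt and, after each shuffle,
    record the longest ORF as a fraction of the total sequence length."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    fractions = []
    for _ in range(n_shuffles):
        rng.shuffle(triplets)
        shuffled = "".join(triplets)
        fractions.append(longest_orf(shuffled) / len(seq))
    return fractions
```

The returned list can be plotted as a histogram and compared with the value of `longest_orf` for the unshuffled sequence divided by its length. If the sequence length is not a multiple of three, the trailing shorter group is shuffled together with the full triplets in this sketch.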

RNA extraction, cDNA synthesis, and quantitative PCR (qPCR)
Total RNA was extracted with TRIzol (Invitrogen) and purified with RNeasy Micro Kit (QIAGEN) according to the manufacturer's protocol, with the on-column DNase I treatment time extended to 1 h. Total RNA was quantified on an ND-1000 (NanoDrop), and its integrity was checked with a 2100 Bioanalyzer (Agilent). Total RNA was reverse transcribed with SuperScript III First-Strand Synthesis System (Invitrogen) and oligo(dT)₂₀ primer for detection of any transcript mentioned in this study, except for HIPSTR. To detect human HIPSTR, 100 to 500 ng total RNA and 20 pmol of strand-specific Primer #1 (Table S11) were annealed at 60 °C for 5 min, and cDNA was then synthesized at 55 °C for 1 h with ImProm-II Reverse Transcription System (Promega) and a Mg²⁺ concentration of 6 mM. To detect mouse Hipstr, 1 μg total RNA and 20 pmol of strand-specific Primer #2 (Table S11) were annealed at 62.5 °C for 5 min, and cDNA was then synthesized at 50 °C for 1 h with ImProm-II Reverse Transcription System (Promega) and a Mg²⁺ concentration of 6 mM. Strand-specific primers #1 and #2 contained a tag sequence (ATGGCGAGAATCAATGCG) at the 5ʹ-end that has no complementarity to the human or mouse genome. This tag sequence served as a target for annealing of the reverse qPCR primer, ensuring strand specificity and eliminating non-specific background amplification (ref. 7) in the human or mouse HIPSTR detection assays.

Medium was changed every other day. After six to seven days of differentiation, the resultant neuroepithelial spheres attached and gave rise to migratory hNCCs, as previously described (ref. 10). Four to five days after the appearance of the first hNCCs, cells were collected for subsequent analyses.

H1 BP cells culture and derivation of human trophoblast-like cells (hTBCs) in vitro
H1 BP cells were derived from H1 hESCs (WiCell), cultured, and differentiated into hTBCs as described previously (ref. 2). Briefly, H1 BP cells were maintained in the hESC basal medium (refs. 11, 12), which had been conditioned by a monolayer of γ-irradiated mouse embryonic fibroblast (MEF) feeder cells for 24 h, and then supplemented with 10 ng/ml FGF2. Medium was changed every day. For passaging, H1 BP cells were detached with Gentle Cell Dissociation Reagent (STEMCELL Technologies) for 6-7 min at 37 °C, dispersed into clusters of 5-10 cells, and plated on 0.1% gelatin-coated culture dishes of the desired size.
For hTBC derivation, 4 × 10⁴ H1 BP cells were passaged onto 5 cm² culture dishes and cultured for the next 48 h as described above, after which the medium was changed to one lacking FGF2 but containing 0.1 μM PD173074 (Sigma-Aldrich) in hESC basal medium not conditioned with MEF feeder cells. The medium of both untreated and PD173074-treated cells was changed every day. Cells were collected for subsequent analyses after 1, 2, 4, 6, and 8 d of PD173074 treatment.

All-trans retinoic acid (ATRA) treatment of NT2/D1 cells
For ATRA treatment, 1 × 10⁶ NT2/D1 cells were plated per 75 cm² tissue culture flask. Four hours after plating, all-trans RA in DMSO was added to complete growth medium to a final concentration of 10 µM, essentially as described in ref. 13. Medium containing ATRA was replaced every 7 days of treatment. An increase in HOXB5 mRNA expression levels was used to control for successful ATRA treatment, as in ref. 14.

and cloned into pCEP4 vector (Invitrogen) between KpnI and HindIII sites.
Transfections were carried out using FuGENE HD Reagent (Promega) at a 3:1 transfection reagent:DNA ratio in the corresponding complete growth media.