To elucidate the transcriptional 'landscape' that regulates human lymphoid commitment during postnatal life, we used RNA sequencing to assemble the long non-coding transcriptome across human bone marrow and thymic progenitor cells spanning the earliest stages of B lymphoid and T lymphoid specification. Analysis of these rare populations revealed over 3,000 genes encoding previously unknown long non-coding RNAs (lncRNAs). Lymphoid commitment was characterized by lncRNA expression patterns that were highly stage specific and more lineage specific than those of protein-coding genes. Protein-coding genes co-expressed with neighboring lncRNA genes were enriched for ontologies related to lymphoid differentiation. The exquisite cell-type specificity of global lncRNA expression patterns independently revealed new developmental relationships among the earliest progenitor cells in the human bone marrow and thymus.
Fatica, A. & Bozzoni, I. Long non-coding RNAs: new players in cell differentiation and development. Nat. Rev. Genet. 15, 7–21 (2014).
Galy, A., Travis, M., Cen, D. & Chen, B. Human T, B, natural killer, and dendritic cells arise from a common bone marrow progenitor cell subset. Immunity 3, 459–473 (1995).
Plum, J. et al. Human intrathymic development: a selective approach. Semin. Immunopathol. 30, 411–423 (2008).
Doulatov, S. et al. Revised map of the human progenitor hierarchy shows the origin of macrophages and dendritic cells in early lymphoid development. Nat. Immunol. 11, 585–593 (2010).
Six, E.M. et al. A human postnatal lymphoid progenitor capable of circulating and seeding the thymus. J. Exp. Med. 204, 3085–3093 (2007).
Kohn, L.A. et al. Lymphoid priming in human bone marrow begins before expression of CD10 with upregulation of L-selectin. Nat. Immunol. 13, 963–971 (2012).
Blom, B. & Spits, H. Development of human lymphoid cells. Annu. Rev. Immunol. 24, 287–320 (2006).
Hao, Q.L. et al. Human intrathymic lineage commitment is marked by differential CD7 expression: identification of CD7-lympho-myeloid thymic progenitors. Blood 111, 1318–1326 (2008).
Weerkamp, F. et al. Human thymus contains multipotent progenitors with T/B lymphoid, myeloid, and erythroid lineage potential. Blood 107, 3131–3137 (2006).
Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
Dik, W.A. et al. New insights on human T cell development by quantitative T cell receptor gene rearrangement studies and gene expression profiling. J. Exp. Med. 201, 1715–1723 (2005).
Tydell, C.C. et al. Molecular dissection of prethymic progenitor entry into the T lymphocyte developmental pathway. J. Immunol. 179, 421–438 (2007).
Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013).
Lin, M.F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275–i282 (2011).
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
Volders, P.J. et al. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 41, D246–D251 (2013).
Trimarchi, T. et al. Genome-wide mapping and characterization of Notch-regulated long noncoding RNAs in acute leukemia. Cell 158, 593–606 (2014).
Marques, A.C. et al. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol. 14, R131 (2013).
Ilott, N.E. et al. Long non-coding RNAs and enhancer RNAs regulate the lipopolysaccharide-induced inflammatory response in human monocytes. Nat. Commun. 5, 3979 (2014).
Saeed, S. et al. Epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity. Science 345, 1251086 (2014).
Chadwick, L.H. The NIH Roadmap Epigenomics Program data resource. Epigenomics 4, 317–324 (2012).
Sati, S., Ghosh, S., Jain, V., Scaria, V. & Sengupta, S. Genome-wide analysis reveals distinct patterns of epigenetic features in long non-coding RNA loci. Nucleic Acids Res. 40, 10018–10031 (2012).
Si, Y., Liu, P., Li, P. & Brutnell, T.P. Model-based clustering for RNA-Seq data. Bioinformatics 30, 197–205 (2014).
McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Chen, L. et al. Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033 (2014).
Rinn, J.L. & Chang, H.Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145–166 (2012).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Horvath, S. et al. Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proc. Natl. Acad. Sci. USA 103, 17402–17407 (2006).
Taghon, T. et al. Notch signaling is required for proliferation but not for differentiation at a well-defined beta-selection checkpoint during human T-cell development. Blood 113, 3254–3263 (2009).
Necsulea, A. et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640 (2014).
Ranzani, V. et al. The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Nat. Immunol. 16, 318–325 (2015).
Durinck, K. et al. The Notch driven long non-coding RNA repertoire in T-cell acute lymphoblastic leukemia. Haematologica 99, 1808–1826 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Roberts, A., Pimentel, H., Trapnell, C. & Pachter, L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics 27, 2325–2329 (2011).
Anders, S., Pyl, P.T. & Huber, W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
We thank F. Codrea, A. George and S. Feng for technical assistance; M. Pellegrini and R. D'Auria for computational support; the Broad Stem Cell Research Center high-throughput sequencing facility and the Broad Stem Cell Research Center and Children's Hospital Los Angeles flow cytometry facilities; and the Center for AIDS Research Virology Core Lab (supported by the US National Institutes of Health (AI-28697), the UCLA AIDS Institute and the UCLA Council of Bioscience Resources) for reagents. Supported by St. Baldrick's Foundation (C.P.), the US National Institutes of Health (K12 HD052954 and K12 HD034610 to C.P.; P01 HL073104 to G.M.C.; T32HL066992 to C.S.S.; and T32 HL086345 to S.S.), the Nautica Foundation (C.P.), the Tower Cancer Research Foundation (C.P.), the Couples Against Leukemia Foundation (C.P.), the Joseph Drown Foundation (C.P.), UCLA Broad Stem Cell Research Center (G.M.C. and D.C.) and the California Institute for Regenerative Medicine (TG2-01169 for S.S.).
The authors declare no competing financial interests.
Integrated supplementary information
Fluorescence-activated cell sorting was used to isolate populations from (a) CD34+-enriched bone marrow cells (events pre-gated on DAPI−Lin− cells are depicted; *the lineage cocktail included an antibody to CD19 except for the BCP population) and (b) thymic cells, either CD34+-enriched (events pre-gated on CD34+CD4−CD8− cells are depicted) or CD34− (events pre-gated on DAPI− cells are depicted). (c) RNA-Seq expression data for protein-coding genes previously associated with HSC or lymphoid lineages (n = 2 biological replicates shown for each population). (d) Coding potential of transcripts from the entire dataset, estimated with the CPAT and PhyloCSF algorithms. Novel lncRNAs: the set of candidate transcripts immediately before the selection of predictions with low coding potential (Fig. 1b). Intergenic lncRNAs: transcripts with no genomic overlap with protein-coding genes. Divergent lncRNAs: transcripts from lncRNA genes that have antisense overlap with a protein-coding gene. Data for transcripts from protein-coding and annotated lncRNA databases are shown for comparison.
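A minimal Python sketch of the transcript classes defined in panel (d) is given below, assuming gene coordinates are available as simple intervals. The thresholds `CPAT_MAX` and `PHYLOCSF_MAX`, the `Interval` type and the helper names are hypothetical illustrations: the legend does not state the cutoffs used, and requiring both algorithms to call a transcript non-coding is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the cutoffs used in the study are not stated here.
CPAT_MAX = 0.364     # CPAT coding probability (illustrative human cutoff)
PHYLOCSF_MAX = 0.0   # PhyloCSF score in decibans (illustrative value only)


@dataclass
class Interval:
    chrom: str
    start: int   # 0-based, half-open coordinates
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_low_coding(cpat_prob: float, phylocsf: float) -> bool:
    """Assumption: keep a candidate only if both algorithms call it non-coding."""
    return cpat_prob < CPAT_MAX and phylocsf < PHYLOCSF_MAX


def classify(gene: Interval, coding_genes: List[Interval]) -> Optional[str]:
    """Assign a lncRNA gene to the legend's classes, or None if neither applies."""
    hits = [g for g in coding_genes if overlaps(gene, g)]
    if not hits:
        return "intergenic"  # no genomic overlap with protein-coding genes
    if all(g.strand != gene.strand for g in hits):
        return "divergent"   # only antisense overlap with protein-coding genes
    return None              # any same-strand overlap: outside both classes
```

Genes with a same-strand overlap fall into neither class defined in the legend, so the sketch returns `None` for them; how mixed sense and antisense overlaps were handled in the study is not stated.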
Supplementary Figure 2 Inter-replicate concordance, quantitative PCR validation data, and transcript-size distribution and expression analyses.
(a) Concordance between RNA-Seq expression data from biological replicates (#1 and #2). Pearson correlation coefficients (r) are shown above each graph. (b) qPCR validation for a subset of B lineage-specific lncRNA genes identified by RNA-Seq: gene expression measured by RNA-Seq (mean expression from two biological replicates per cell type) and by qPCR (from an additional biological replicate separate from those used for RNA-Seq). Pearson correlation coefficients (r) between the measurements from the two methods are shown at left. Additional qPCR reactions for the same genes using one of the replicates used for RNA-Seq showed similar results (not shown). (c) Size distribution of novel lncRNA transcripts. Data for transcripts from protein-coding and annotated lncRNA databases are shown for comparison. (d) Violin plot of transcript expression levels (mean, standard deviation, range). For each transcript, the maximum expression value among the 20 samples was used. Class 1: transcripts from annotated lncRNA genes that lack novel isoforms; class 2: annotated transcripts from lncRNA genes that have both novel and annotated isoforms; class 3: novel transcripts from annotated lncRNA genes; class 4: transcripts from novel lncRNA genes. Expression levels were higher for novel lncRNA transcripts than for annotated lncRNA transcripts.
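The RNA-Seq versus qPCR comparison in panel (b) can be sketched as follows, assuming qPCR values are expressed relative to a reference gene with the 2^-ΔCt method and that both measurements are compared on a log2 scale. The reference-gene normalization, the log transform, the pseudocount of 1 and the function names are assumptions; the legend specifies only that Pearson's r was computed against the mean of two RNA-Seq replicates.

```python
import math
from typing import Dict, List


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-dCt relative to a reference gene; assumes ~100% PCR efficiency."""
    return 2.0 ** -(ct_target - ct_reference)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validate_gene(rep1: Dict[str, float], rep2: Dict[str, float],
                  ct_target: Dict[str, float],
                  ct_reference: Dict[str, float]) -> float:
    """Correlate mean RNA-Seq expression (two replicates) with qPCR across cell types."""
    cells = sorted(rep1)
    # log2(mean + 1) for RNA-Seq; log2(2^-dCt) = -dCt for qPCR
    seq = [math.log2((rep1[c] + rep2[c]) / 2 + 1) for c in cells]
    qpcr = [math.log2(relative_expression(ct_target[c], ct_reference[c]))
            for c in cells]
    return pearson_r(seq, qpcr)
```

The same `pearson_r` helper also covers panel (a), where the two vectors are the expression values of replicates #1 and #2.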
Supplementary Figure 3 Mobilized peripheral blood CD34+ cells, CD19+ primary B lymphocytes, and whole thymic tissue show close transcriptional proximity to HSCs, BCPs, and Thy4 cells, respectively.
Sample clustering analyses based on expression of (a) HSC-specific, (b) BCP-specific and (c) Thy4-specific lncRNA genes. Cell type-specific transcripts were defined as transcripts that showed peak expression in that cell type, with a peak value exceeding twice the mean expression across the other nine cell types in our dataset. Publicly available RNA-Seq data from the NIH Roadmap Epigenomics Mapping Consortium (Supplementary Table 3) were included for the following hematopoietic populations: primary CD19+ B cells, CD3+ T cells, CD4+ T cells and CD8+ T cells, mobilized peripheral blood (MPB) CD34+ cells, and thymus.
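The specificity rule stated above translates directly into code. In this sketch, `expr` maps each transcript to one expression value per cell type (for example, the mean of the two replicates, which is an assumption), and the function name `specific_transcripts` is hypothetical.

```python
from typing import Dict, List


def specific_transcripts(expr: Dict[str, Dict[str, float]],
                         fold: float = 2.0) -> Dict[str, List[str]]:
    """Map each cell type to transcripts whose peak there exceeds
    `fold` times the mean expression across all other cell types."""
    out: Dict[str, List[str]] = {}
    for tx, levels in expr.items():
        peak_cell = max(levels, key=levels.get)  # cell type with peak expression
        others = [v for c, v in levels.items() if c != peak_cell]
        if levels[peak_cell] > fold * (sum(others) / len(others)):
            out.setdefault(peak_cell, []).append(tx)
    return out
```

With ten cell types, `others` holds the nine remaining values, matching the legend's "other 9 cell types".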
Supplementary Figure 4 lncRNA genes expressed differentially during lymphoid differentiation show genomic locational associations with protein-coding genes whose products have hematopoietic and immunological functions.
Functional categories of protein-coding gene sets whose lncRNA gene neighbors are significantly enriched among (a) HSPC lncRNA genes, (b) B lineage lncRNA genes and (c) T lineage lncRNA genes (groups I, II and III in Fig. 3c). Genomic Regions Enrichment of Annotations Tool (GREAT) significance scores (–log10 hypergeometric p value) for enrichment relative to a background set consisting of all lncRNAs (listed in Supplementary Table 1) are shown. Full names and database details for functional categories are listed in Supplementary Table 8.
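For orientation, the one-sided hypergeometric test underlying such scores can be written with the standard library alone. This is a plain gene-count sketch, not GREAT's implementation (GREAT also reports a region-based binomial test). Here `N` is the number of background lncRNAs, `n` the number in the tested group, `K` the background lncRNAs whose neighboring protein-coding gene carries a given annotation, and `k` the group members that do; these names are illustrative.

```python
import math


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    if k > upper:
        raise ValueError("observed overlap k exceeds min(K, n)")
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic until this final division


def enrichment_score(k: int, N: int, K: int, n: int) -> float:
    """-log10 of the upper-tail hypergeometric p value."""
    return -math.log10(hypergeom_sf(k, N, K, n))
```

Working with exact integer binomial coefficients avoids the underflow that naive floating-point products would hit for large lncRNA backgrounds.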
Supplementary Figure 5 Sample clustering tree is not affected by the inclusion of other tissue samples or changes in gene-selection criteria.
(a) Sample clustering tree based on expression of differentially expressed genes (protein-coding and lncRNA genes differentially expressed in at least one pairwise comparison among the 10 cell types in the dataset [each with 2 replicates; fold change > 2, false discovery rate < 5%]). The tree was obtained from randomly sub-sampled data to control for differences in sequencing depth. Publicly available RNA-Seq data (Supplementary Table 3) included the Human Body Map project (adrenal, lymph node, lung, colon, prostate, ovary, adipose, breast, kidney, thyroid, heart, testis, brain, liver, muscle, peripheral blood leucocytes, and a mix of these tissues), primary CD19+ B cells, CD3+ T cells, CD4+ T cells and CD8+ T cells, four replicates of mobilized peripheral blood CD34+ cells (MPB CD34+ (1), (2), (3) and (4)), and bone marrow CD34+ cells (BM CD34+) (all labeled in grey). (b) Sample clustering trees based on expression of genes differentially expressed in at least one pairwise comparison among the 10 cell types, using a range of fold-change and false discovery rate (FDR) criteria to select differentially expressed genes. The number of differentially expressed genes (N) identified for each combination of criteria is shown above the corresponding tree. Two biological replicates are shown for each of the 10 cell types. H: HSC; L: LMPP; C: CLP; B: BCP; T1: Thy1; T2: Thy2; T3: Thy3; T4: Thy4; T5: Thy5; T6: Thy6 (all labeled in color).
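Two steps in panel (a) lend themselves to a short sketch: sub-sampling each library to a common depth and selecting genes by fold change. The code below draws reads without replacement from per-gene counts and keeps genes with a >2-fold difference in at least one pairwise comparison, which is equivalent to a max/min ratio above 2. The function names, the pseudocount and the omission of the FDR filter (which needs a per-gene statistical test) are simplifications of this sketch; the study's exact sub-sampling procedure is not described in this legend.

```python
import bisect
import itertools
import random
from typing import Dict, List


def downsample(counts: List[int], depth: int, rng: random.Random) -> List[int]:
    """Draw `depth` reads without replacement from a per-gene count vector."""
    total = sum(counts)
    if depth > total:
        raise ValueError("target depth exceeds library size")
    cum = list(itertools.accumulate(counts))  # read index -> gene boundaries
    out = [0] * len(counts)
    for read in rng.sample(range(total), depth):
        out[bisect.bisect_right(cum, read)] += 1  # gene that owns this read
    return out


def fold_change_genes(means: Dict[str, Dict[str, float]],
                      min_fold: float = 2.0, pseudo: float = 1.0) -> List[str]:
    """Genes whose mean expression differs > min_fold in at least one
    pairwise cell-type comparison (FDR filtering omitted)."""
    selected = []
    for gene, by_cell in means.items():
        vals = [v + pseudo for v in by_cell.values()]
        # The largest pairwise ratio over all cell-type pairs is max/min.
        if max(vals) / min(vals) > min_fold:
            selected.append(gene)
    return selected
```

Sub-sampling every library to the depth of the shallowest one, with a fixed seed, makes the resulting tree reproducible and removes depth as a driver of sample distances.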
Supplementary Figure 6 GO pathway analysis of protein-coding genes assigned to Bayesian polytomous models (Fig. 5a).
Models 1, 2, 3 and 4 are shown at top for the depicted combinations of cell types. Heatmaps show enrichment significance scores (–log10 adjusted p value) for GO pathways relative to a background set consisting of the whole genome. Pathway analysis was performed with WebGestalt.
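The adjusted p values in these heatmaps can be illustrated with the Benjamini-Hochberg procedure, assuming that is the correction applied (the legend does not name it). The sketch converts raw p values into adjusted values and then into –log10 scores; the function names are hypothetical.

```python
import math
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonicity across ranks
    return adjusted


def significance_scores(pvals: List[float]) -> List[float]:
    """-log10(adjusted p); an adjusted p of 0 maps to infinity."""
    return [float("inf") if q == 0 else -math.log10(q) for q in bh_adjust(pvals)]
```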
Supplementary Figures 1–6 and Supplementary Methods (PDF 1897 kb)
GTF file for the merged human genome lncRNA annotation used in "LncRNA profiling of human lymphoid progenitors reveals transcriptional divergence of B and T lineages" (TXT 16775 kb)
Gene expression statistics for individual cell types (XLSX 40 kb)
Public sequencing datasets used in this study (XLSX 50 kb)
Histone modification signal in 1-kb windows around transcription start sites (XLSX 2157 kb)
Pairwise gene expression correlation analysis. (XLSX 55 kb)
Expression estimates for differentially expressed genes and results of model-based clustering analysis (XLSX 785 kb)
Results for GREAT analysis (XLSX 864 kb)
Protein-coding gene sets whose lncRNA neighbors were enriched among one of the following groups of lncRNA genes: differentially expressed lncRNA genes, HSPC lncRNA genes, B lncRNA genes or thymic lncRNA genes. (XLSX 49 kb)
Results for Bayesian polytomous model selection analysis (XLSX 96 kb)
Associations between coding and non-coding genes based on weighted gene co-expression network analysis (WGCNA). (XLSX 1515 kb)
Custom primers used for qPCR (XLSX 36 kb)
Cite this article
Casero, D., Sandoval, S., Seet, C. et al. Long non-coding RNA profiling of human lymphoid progenitor cells reveals transcriptional divergence of B cell and T cell lineages. Nat Immunol 16, 1282–1291 (2015) doi:10.1038/ni.3299