
Long non-coding RNA profiling of human lymphoid progenitor cells reveals transcriptional divergence of B cell and T cell lineages


To elucidate the transcriptional 'landscape' that regulates human lymphoid commitment during postnatal life, we used RNA sequencing to assemble the long non-coding transcriptome across human bone marrow and thymic progenitor cells spanning the earliest stages of B lymphoid and T lymphoid specification. Over 3,000 genes encoding previously unknown long non-coding RNAs (lncRNAs) were revealed through the analysis of these rare populations. Lymphoid commitment was characterized by lncRNA expression patterns that were highly stage specific and were more lineage specific than those of protein-coding genes. Protein-coding genes co-expressed with neighboring lncRNA genes showed enrichment for ontologies related to lymphoid differentiation. The exquisite cell-type specificity of global lncRNA expression patterns independently revealed new developmental relationships among the earliest progenitor cells in the human bone marrow and thymus.

Figure 1: Human HSC and lymphoid progenitor cell transcriptomes are characterized by previously undescribed lncRNAs.
Figure 2: TSSs of lncRNAs show cell type–specific active chromatin profiles.
Figure 3: lncRNA genes are co-expressed with protein-coding genes whose products are involved in hematopoiesis and immunological function, during lymphoid differentiation.
Figure 4: Lymphoid commitment and differentiation are characterized by stage- and lineage-specific global expression patterns of lncRNAs.
Figure 5: Expression of lncRNA genes defines developmental relationships among progenitor cells in BM and thymus before complete lineage commitment, independently of the expression of protein-coding genes.
Figure 6: Identification of lineage- or differentiation stage–specific lncRNA gene–protein-coding gene co-expression modules.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus




Acknowledgements

We thank F. Codrea, A. George and S. Feng for technical assistance; M. Pellegrini and R. D'Auria for computational support; the Broad Stem Cell Research Center high-throughput sequencing facility and the Broad Stem Cell Research Center and Children's Hospital Los Angeles flow cytometry facilities; and the Center for AIDS Research Virology Core Lab (supported by the US National Institutes of Health (AI-28697), the UCLA AIDS Institute and the UCLA Council of Bioscience Resources) for reagents. Supported by St. Baldrick's Foundation (C.P.), the US National Institutes of Health (K12 HD052954 and K12 HD034610 to C.P.; P01 HL073104 to G.M.C.; T32 HL066992 to C.S.S.; and T32 HL086345 to S.S.), the Nautica Foundation (C.P.), the Tower Cancer Research Foundation (C.P.), the Couples Against Leukemia Foundation (C.P.), the Joseph Drown Foundation (C.P.), UCLA Broad Stem Cell Research Center (G.M.C. and D.C.) and the California Institute for Regenerative Medicine (TG2-01169 for S.S.).

Author information

Contributions

D.C., conception and design, development of bioinformatics analysis pipeline, bioinformatics analysis and interpretation, and manuscript writing; S.S., conception and design, collection and assembly of data (performance of experiments), data analysis and interpretation; C.S.S., data analysis and interpretation; J.S., conception and design; Y.Z., collection and assembly of data (assistance in performance of experiments); V.L.H. and A.L., collection and assembly of data (performance of experiments); C.P., conception and design, collection and assembly of data (performance of experiments), data analysis and interpretation, manuscript writing, and final approval of manuscript; G.M.C., conception and design, data analysis and interpretation, manuscript writing, and final approval of manuscript.

Corresponding authors

Correspondence to Chintan Parekh or Gay M Crooks.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 RNA-Seq analyses of HSC and lymphoid populations from human BM and thymus.

Fluorescence-activated cell sorting was used to isolate populations from (a) CD34+-enriched bone marrow cells (events pre-gated on DAPI−Lin− cells are depicted; *the lineage cocktail included an antibody to CD19 except in the case of the BCP population) and (b) thymic cells, either CD34+-enriched (events pre-gated on CD34+CD4−CD8− cells are depicted) or CD34neg (events pre-gated on DAPI− cells are depicted). (c) RNA-Seq expression data for protein-coding genes previously associated with HSC or lymphoid lineages (n = 2 biological replicates shown for each population). (d) Coding potential of transcripts from the entire dataset, estimated with the CPAT and PhyloCSF algorithms. Novel lncRNAs here comprise the set of candidate transcripts immediately before the step that selects predictions with low coding potential (Fig. 1b). Intergenic lncRNAs: transcripts with no genomic overlap with protein-coding genes. Divergent lncRNAs: transcripts from lncRNA genes that overlap a protein-coding gene in antisense orientation. Data for transcripts from protein-coding and annotated lncRNA databases are shown for comparison.
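The intergenic/divergent distinction in (d) depends only on genomic position relative to protein-coding genes. The sketch below illustrates that rule under stated assumptions: half-open 0-based coordinates, any opposite-strand overlap counted as antisense overlap, and a catch-all "other" label for loci that overlap coding genes only on the same strand. The `Interval` type and the function names are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval type (0-based, half-open)."""
    chrom: str
    start: int   # inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(locus: Interval, coding_genes: List[Interval]) -> str:
    """Label an lncRNA locus by its position relative to protein-coding genes.

    "intergenic": no genomic overlap with any protein-coding gene.
    "divergent":  overlaps at least one protein-coding gene on the opposite strand.
    "other":      overlaps coding genes only on the same strand (assumed catch-all).
    """
    hits = [g for g in coding_genes if overlaps(locus, g)]
    if not hits:
        return "intergenic"
    if any(g.strand != locus.strand for g in hits):
        return "divergent"
    return "other"
```

For a coding gene at `Interval("chr1", 1000, 5000, "+")`, a locus at chr1:6000–7000 (+) is labeled intergenic and a locus at chr1:4000–5500 (−) is labeled divergent. A real pipeline would use an interval tree rather than a linear scan over all genes.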

Supplementary Figure 2 Inter-replicate concordance, quantitative PCR–validation data, and transcript-size distribution and expression analyses.

(a) Concordance between RNA-Seq expression data from biological replicates (#1 and #2). Pearson correlation coefficients (r) are shown above each graph. (b) qPCR validation for a subset of B lineage–specific lncRNA genes identified by RNA-Seq: gene expression measured by RNA-Seq (mean of two biological replicates per cell type) and by qPCR (in an additional biological replicate separate from those used for RNA-Seq). Pearson correlation coefficients (r) between the measurements from the two methods are shown at left. Additional qPCR reactions for the same genes using one of the replicates used for RNA-Seq gave similar results (not shown). (c) Size distribution of novel lncRNA transcripts. Data for transcripts from protein-coding and annotated lncRNA databases are shown for comparison. (d) Violin plot of transcript expression levels (mean, standard deviation, range). For each transcript, the maximum expression value among the 20 samples was used. Class 1: transcripts from annotated lncRNA genes that lack novel isoforms; class 2: annotated transcripts from lncRNA genes that have both novel and annotated isoforms; class 3: novel transcripts from annotated lncRNA genes; class 4: transcripts from novel lncRNA genes. Expression levels were higher for novel lncRNA transcripts than for annotated lncRNA transcripts.
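The concordance values in (a) and (b) are Pearson correlation coefficients. A minimal sketch of that computation follows, together with the per-transcript maximum across samples used for (d). The legend does not state whether values were log-transformed before correlation, so the `log2p1` helper (log2(x + 1)) is an assumption, as are all function names.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def log2p1(values: Sequence[float]) -> List[float]:
    """Assumed transform: log2(value + 1), compressing the expression range."""
    return [math.log2(v + 1.0) for v in values]


def max_per_transcript(expr: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Per-transcript maximum across samples, as used for the violin plot in (d)."""
    return {tid: max(vals) for tid, vals in expr.items()}
```

Replicate concordance would then be `pearson_r(log2p1(rep1), log2p1(rep2))` for two aligned expression vectors; dropping the transform gives the correlation on the raw scale, which is more sensitive to a few highly expressed genes.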

Supplementary Figure 3 Mobilized peripheral blood CD34+ cells, CD19+ primary B lymphocytes, and whole thymic tissue show close transcriptional proximity to HSCs, BCPs, and Thy4 cells, respectively.

Sample clustering analyses based on expression of (a) HSC-specific, (b) BCP-specific and (c) Thy4-specific lncRNA genes. A transcript was defined as specific to a cell type if it showed peak expression in that cell type and its peak value exceeded twice the mean expression across the other nine cell types in the dataset. Publicly available RNA-Seq data for the following hematopoietic populations from the NIH Roadmap Epigenomics Mapping Consortium (Supplementary Table 3) were included: primary CD19+ B cells, CD3+ T cells, CD4+ T cells and CD8+ T cells, mobilized peripheral blood (MPB) CD34+ cells, and thymus.
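The specificity rule in this legend can be written as a small predicate. This sketch assumes one expression value per cell type per transcript (for example, the mean of the two replicates), which the legend does not specify; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def specific_cell_type(expr: Dict[str, float], fold: float = 2.0) -> Optional[str]:
    """Return the cell type a transcript is specific to, or None.

    The transcript must peak in cell type c, and the peak must exceed `fold`
    times the mean expression across all other cell types.
    Ties for the peak are resolved by dictionary order (an arbitrary choice).
    """
    if len(expr) < 2:
        raise ValueError("need expression values for at least two cell types")
    peak_type = max(expr, key=expr.get)
    others = [v for cell, v in expr.items() if cell != peak_type]
    mean_others = sum(others) / len(others)
    return peak_type if expr[peak_type] > fold * mean_others else None


def specific_sets(table: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Group transcript IDs by the cell type they are specific to."""
    groups: Dict[str, List[str]] = {}
    for tid, expr in table.items():
        cell = specific_cell_type(expr)
        if cell is not None:
            groups.setdefault(cell, []).append(tid)
    return groups
```

With values `{"HSC": 10, "BCP": 2, "Thy4": 1}` the peak (10) exceeds twice the mean of the others (2 × 1.5 = 3), so the transcript is HSC-specific; with `{"HSC": 3, "BCP": 2, "Thy4": 2}` it is not (3 is not greater than 4).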

Supplementary Figure 4 lncRNA genes expressed differentially during lymphoid differentiation show genomic locational associations with protein-coding genes whose products have hematopoietic and immunological functions.

Functional categories of protein-coding gene sets whose lncRNA gene neighbors are significantly enriched among (a) HSPC lncRNA genes, (b) B lineage lncRNA genes and (c) T lineage lncRNA genes (Groups I, II and III in Fig. 3c). Genomic Regions Enrichment of Annotations Tool (GREAT) significance scores (−log10 hypergeometric P value) for enrichment relative to a background set consisting of all lncRNAs (listed in Supplementary Table 1) are shown. Full names and database details for the functional categories are listed in Supplementary Table 8.
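The scores plotted here are −log10 P values from a one-sided hypergeometric test. A generic version of that test is sketched below using only the standard library. The mapping of counts (background size, category size, group size, overlap) is a simplified assumption: GREAT's own hypergeometric test is defined over genes associated with the input regions, and the function names are hypothetical.

```python
import math


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K in category, n drawn).

    N: background size (e.g. all lncRNA genes in the annotation)
    K: background items associated with the functional category
    n: items in the tested group (e.g. B lineage lncRNA genes)
    k: group items associated with the category
    """
    lo = max(k, 0)
    hi = min(K, n)
    # math.comb(a, b) returns 0 when b > a, so impossible terms vanish.
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(lo, hi + 1))
    return hits / math.comb(N, n)


def enrichment_score(k: int, N: int, K: int, n: int) -> float:
    """-log10 of the one-sided hypergeometric P value (inf if P is exactly 0)."""
    p = hypergeom_sf(k, N, K, n)
    return math.inf if p == 0 else -math.log10(p)
```

For example, drawing 3 items from a background of 10 that contains 4 category members and finding all 3 in the category gives P = 4/120 = 1/30, a score of about 1.48. Exact integer arithmetic avoids underflow for moderate counts; `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same tail.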

Supplementary Figure 5 Sample clustering tree is not affected by the inclusion of other tissue samples or changes in gene-selection criteria.

(a) Sample clustering tree based on expression of differentially expressed genes (protein-coding and lncRNA genes differentially expressed in at least one pairwise comparison between the 10 cell types in the dataset, each with 2 replicates; fold change > 2, false discovery rate < 5%). The tree was computed from randomly subsampled data to control for differences in sequencing depth. Publicly available RNA-Seq data (Supplementary Table 3) included: the Human Body Map project (adrenal, lymph node, lung, colon, prostate, ovary, adipose, breast, kidney, thyroid, heart, testis, brain, liver, muscle, peripheral blood leukocytes, and a mix of these tissues), primary CD19+ B cells, CD3+ T cells, CD4+ T cells and CD8+ T cells, four replicates of mobilized peripheral blood CD34+ cells (MPB CD34+ (1), (2), (3) and (4)), and bone marrow CD34+ cells (BM CD34+) (all labeled in gray). (b) Sample clustering trees based on expression of genes differentially expressed in at least one pairwise comparison between the 10 cell types, using a range of fold change and false discovery rate (FDR) criteria to select differentially expressed genes. The number of differentially expressed genes (N) identified for each combination of criteria is shown above the corresponding tree. Two biological replicates for each of the 10 cell types are shown. H: HSC; L: LMPP; C: CLP; B: BCP; T1: Thy1; T2: Thy2; T3: Thy3; T4: Thy4; T5: Thy5; T6: Thy6 (all labeled in color).
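Panel (b) varies two thresholds used to select genes: a fold-change cutoff and an FDR cutoff, applied in every pairwise comparison, with a gene kept if it passes in at least one comparison. The sketch below assumes linear fold changes and raw P values per comparison, and applies Benjamini–Hochberg adjustment separately within each comparison. The legend does not state how P values or FDRs were computed, so these choices and the function names are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Per-comparison results: {(cell_a, cell_b): {gene: (fold_change, p_value)}}
# Fold changes are assumed to be linear ratios (a / b), not log2 values.
Results = Dict[Tuple[str, str], Dict[str, Tuple[float, float]]]


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results: Results, min_fold: float = 2.0, max_fdr: float = 0.05) -> Set[str]:
    """Genes passing both cutoffs in at least one pairwise comparison."""
    selected: Set[str] = set()
    for table in results.values():
        genes = list(table)
        padj = bh_adjust([table[g][1] for g in genes])
        for gene, q in zip(genes, padj):
            fc = table[gene][0]
            # Count changes in either direction: up by > min_fold or down by > min_fold.
            if (fc > min_fold or fc < 1.0 / min_fold) and q < max_fdr:
                selected.add(gene)
    return selected
```

Re-running `select_de_genes` over a grid of `min_fold` and `max_fdr` values and counting the selected genes mirrors the N reported above each tree in (b); raising the fold-change cutoff can only shrink the selected set.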

Supplementary Figure 6 GO pathway analysis of protein-coding genes assigned to Bayesian polytomous models (Fig. 5a).

Models 1, 2, 3 and 4 are shown at top for the depicted combinations of cell types. Heatmaps show enrichment significance scores (−log10 adjusted P value) for GO pathways relative to a background set consisting of the whole genome. Pathway analysis was done with WebGestalt.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Methods (PDF 1897 kb)

Supplementary Table 1

GTF file for the merged human genome lncRNA annotation used in "LncRNA profiling of human lymphoid progenitors reveals transcriptional divergence of B and T lineages" (TXT 16775 kb)

Supplementary Table 2

Gene expression statistics for individual cell types (XLSX 40 kb)

Supplementary Table 3

Public sequencing datasets used in this study (XLSX 50 kb)

Supplementary Table 4

Histone modification signal in 1-kb windows around transcription start sites (XLSX 2157 kb)

Supplementary Table 5

Pairwise gene expression correlation analysis. (XLSX 55 kb)

Supplementary Table 6

Expression estimates for differentially expressed genes. Model-based clustering analysis (XLSX 785 kb)

Supplementary Table 7

Results for GREAT analysis (XLSX 864 kb)

Supplementary Table 8

Protein-coding gene sets whose lncRNA neighbors were enriched among one of the following groups of lncRNA genes: differentially expressed lncRNA genes, HSPC lncRNA genes, B lncRNA genes or thymic lncRNA genes. (XLSX 49 kb)

Supplementary Table 9

Results for Bayesian polytomous model selection analysis (XLSX 96 kb)

Supplementary Table 10

Coding–noncoding associations based on weighted gene co-expression network analysis (WGCNA). (XLSX 1515 kb)

Supplementary Table 11

Custom primers used for qPCR (XLSX 36 kb)

About this article

Cite this article

Casero, D., Sandoval, S., Seet, C. et al. Long non-coding RNA profiling of human lymphoid progenitor cells reveals transcriptional divergence of B cell and T cell lineages. Nat Immunol 16, 1282–1291 (2015).
