Cell-type-aware analysis of RNA-seq data

A preprint version of the article is available at bioRxiv.

Abstract

Most tissue samples are composed of different cell types. Differential expression analysis that does not account for cell-type composition cannot separate changes due to cell-type composition from changes in cell-type-specific expression. We propose a computational framework to address this limitation: CARseq (cell-type-aware analysis of RNA-seq). CARseq employs a negative binomial distribution that appropriately models the count data from RNA-seq experiments. Simulation studies show that CARseq has substantially higher power than a linear model-based approach and that it provides more accurate estimates of the rankings of differentially expressed genes. We applied CARseq to compare gene expression of schizophrenia and autism subjects versus controls, and identified the cell types underlying the differences and similarities of these two neurodevelopmental diseases. Our results are consistent with the results from differential expression analysis using single-cell RNA-seq data.
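
As a rough illustration of the confounding described in the abstract, the sketch below treats the expected bulk expression of one gene as a cell-fraction-weighted sum of cell-type-specific expression and draws counts from a negative binomial distribution. It is not the CARseq implementation: the function names bulk_mean and simulate_counts, the two hypothetical cell types and every numeric value are assumptions chosen only for illustration.

```python
import numpy as np


def bulk_mean(cell_fractions, ct_expression, depth=1.0):
    """Expected bulk expression of one gene (illustrative mixture model).

    cell_fractions: fractions of each cell type (1-D, or samples x cell types).
    ct_expression:  expression of the gene in each cell type.
    depth:          hypothetical read-depth scaling factor.
    """
    cell_fractions = np.asarray(cell_fractions, dtype=float)
    ct_expression = np.asarray(ct_expression, dtype=float)
    return depth * (cell_fractions @ ct_expression)


def simulate_counts(mean, dispersion, rng):
    """Draw negative binomial counts with variance mean + dispersion * mean**2."""
    mean = np.asarray(mean, dtype=float)
    n = 1.0 / dispersion   # NumPy's "number of successes" parameter
    p = n / (n + mean)     # NumPy's success probability, giving the requested mean
    return rng.negative_binomial(n, p)


# Hypothetical gene: expression 100 in cell type A and 20 in cell type B.
control = bulk_mean([0.5, 0.5], [100.0, 20.0])

# Scenario 1: composition shifts towards cell type A, expression unchanged.
composition_shift = bulk_mean([0.7, 0.3], [100.0, 20.0])

# Scenario 2: composition unchanged, expression in cell type A increases.
expression_shift = bulk_mean([0.5, 0.5], [132.0, 20.0])

rng = np.random.default_rng(0)
counts = simulate_counts(np.full(5, composition_shift), dispersion=0.1, rng=rng)

print(f"control bulk mean:           {control:.1f}")
print(f"composition-shift bulk mean: {composition_shift:.1f}")
print(f"expression-shift bulk mean:  {expression_shift:.1f}")
print("simulated bulk counts:", counts)
```

Both scenarios raise the expected bulk expression from 60 to 76, so a comparison of bulk counts alone cannot tell a change in composition from a change in cell-type-specific expression. Separating the two requires cell-fraction estimates as an additional input.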

Fig. 1: Illustration of CT-specific expression.
Fig. 2: Simulation results.
Fig. 3: Reproducibility of effect size estimation.
Fig. 4: CT-specific DE results for SCZ versus controls.
Fig. 5: Microglia-specific DE signals.

Data availability

The data used in this study are available in the following public repositories.

snRNA-seq data for the CT-specific expression reference were generated by the Allen Institute for Brain Science. The file ‘human_mtg_gene_expression_matrices_2018-06-14.zip’ was downloaded from http://celltypes.brain-map.org/api/v2/well_known_file_download/694416044.

The gene expression and clinical data of schizophrenia patients and healthy controls were generated by the CommonMind Consortium (CMC), and the relevant data were obtained from the following links. Data access is governed by the NIMH Repository and Genomics Resources. CMC gene expression data, https://www.synapse.org/#!Synapse:syn3346749; CMC gene expression metadata, https://www.synapse.org/#!Synapse:syn18103174; CMC clinical data, https://www.synapse.org/#!Synapse:syn3275213.

The gene expression and clinical data of autism patients and healthy controls were part of the PsychENCODE (PEC) Capstone Collection (https://www.synapse.org/#!Synapse:syn12080241), and the relevant data were obtained from the following links. Data access is governed by the NIMH Repository and Genomics Resources. UCLA-ASD gene expression data, https://www.synapse.org/#!Synapse:syn8365527; UCLA-ASD gene expression metadata, https://www.synapse.org/#!Synapse:syn5602933; UCLA-ASD clinical data, https://www.synapse.org/#!Synapse:syn5602932. The list of SFARI ASD risk genes was downloaded from https://gene.sfari.org/database/human-gene/.

Source Data for Figs. 2–5 are available with this manuscript.

Code availability

The code for generating the CT-specific gene expression reference panel is included in the GitHub repository scRNAseq_pipelines (https://github.com/Sun-lab/scRNAseq_pipelines). We analyzed three scRNA-seq datasets (MTG, dronc and psychENCODE), and the code for each is saved in the corresponding folder. The code to compare different references and generate the final references is saved in the ‘_brain_cell_type’ folder. The code for the CARseq analyses (including simulations and analyses of the SCZ and ASD datasets) is included in the GitHub repository CARseq_pipelines (https://github.com/Sun-lab/CARseq_pipelines). The file ‘reproducible_figures.html’ contains the code to generate most figures in this paper. The R package CARseq is available in the GitHub repository CARseq (https://github.com/Sun-lab/CARseq). All code was also deposited in a Zenodo repository (ref. 36).

References

  1. Nowakowski, T. J. et al. Expression analysis highlights AXL as a candidate Zika virus entry receptor in neural stem cells. Cell Stem Cell 18, 591–596 (2016).

  2. Zhang, T. et al. Cell-type specific eQTL of primary melanocytes facilitates identification of melanoma susceptibility genes. Genome Res. 28, 1621–1635 (2018).

  3. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  4. Leng, N. et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035–1043 (2013).

  5. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  6. Shen-Orr, S. S. et al. Cell type-specific gene expression differences in complex tissues. Nat. Methods 7, 287–289 (2010).

  7. Li, Z., Wu, Z., Jin, P. & Wu, H. Dissecting differential signals in high-throughput data from complex tissues. Bioinformatics 35, 3898–3905 (2019).

  8. Zheng, S. C., Breeze, C. E., Beck, S. & Teschendorff, A. E. Identification of differentially methylated cell types in epigenome-wide association studies. Nat. Methods 15, 1059–1066 (2018).

  9. Luo, X., Yang, C. & Wei, Y. Detection of cell-type-specific risk-CpG sites in epigenome-wide association studies. Nat. Commun. 10, 3113 (2019).

  10. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

  11. Wilson, D. R., Jin, C., Ibrahim, J. G. & Sun, W. ICeD-T provides accurate estimates of immune cell abundance in tumor samples by allowing for aberrant gene expression patterns. J. Am. Stat. Assoc. 115, 1055–1065 (2019).

  12. Zhong, Y. & Liu, Z. Gene expression deconvolution in linear space. Nat. Methods 9, 8–9 (2012).

  13. Cattane, N., Richetto, J. & Cattaneo, A. Prenatal exposure to environmental insults and enhanced risk of developing schizophrenia and autism spectrum disorder: focus on biological pathways and epigenetic mechanisms. Neurosci. Biobehav. Rev. 117, 253–278 (2018).

  14. Anttila, V. et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).

  15. Prata, J., Santos, S. G., Almeida, M. I., Coelho, R. & Barbosa, M. A. Bridging autism spectrum disorders and schizophrenia through inflammation and biomarkers-pre-clinical and clinical investigations. J. Neuroinflammation 14, 179 (2017).

  16. Jardri, R. et al. Are hallucinations due to an imbalance between excitatory and inhibitory influences on the brain? Schizophrenia Bull. 42, 1124–1134 (2016).

  17. Aitchison, J. & Egozcue, J. J. Compositional data analysis: where are we and where should we be heading? Math. Geol. 37, 829–850 (2005).

  18. Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).

  19. Fromer, M. et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19, 1442–1453 (2016).

  20. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

  21. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

  22. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

  23. Lin, M. et al. Heat shock alters the expression of schizophrenia and autism candidate genes in an induced pluripotent stem cell model of the human telencephalon. PLoS ONE 9, e94968 (2014).

  24. Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016).

  25. Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).

  26. Petrelli, F., Pucci, L. & Bezzi, P. Astrocytes and microglia and their potential link with autism spectrum disorders. Front. Cell. Neurosci. 10, 21 (2016).

  27. Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).

  28. Velmeshev, D. et al. Single-cell genomics identifies cell type-specific molecular changes in autism. Science 364, 685–689 (2019).

  29. Raymond, L. J., Deth, R. C. & Ralston, N. V. Potential role of selenoenzymes and antioxidant metabolism in relation to autism etiology and pathology. Autism Res. Treat. 2014, 164938 (2014).

  30. Greenhalgh, A. D., David, S. & Bennett, F. C. Immune cell regulation of glia during CNS injury and disease. Nat. Rev. Neurosci. 21, 139–152 (2020).

  31. Regev, A. et al. Science forum: The Human Cell Atlas. eLife 6, e27041 (2017).

  32. Kehrer, C., Maziashvili, N., Dugladze, T. & Gloveli, T. Altered excitatory-inhibitory balance in the NMDA-hypofunction model of schizophrenia. Front. Mol. Neurosci. 1, 6 (2008).

  33. Ajram, L. et al. Shifting brain inhibitory balance and connectivity of the prefrontal cortex of adults with autism spectrum disorder. Transl. Psychiatry 7, e1137 (2017).

  34. Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).

  35. Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 11, 1–12 (2010).

  36. Jin, C., Chen, M., Lin, D. Y. & Sun, W. CARseq (2021); https://doi.org/10.5281/zenodo.4592636

Acknowledgements

We acknowledge the following grants: NIH R01GM105785 to W.S. and C.J., NIH R21CA224026 to W.S., NIH R01GM126550 to W.S., NIH R01HG009974 to D.-Y.L., NIH P01CA142538 to D.-Y.L., NIH R01GM126553 to M.C., NSF 2016307 to M.C. and a Sloan Foundation Fellowship to M.C. We also appreciate helpful discussions with P. Little.

Author information

Contributions

W.S. and C.J. conceived the approach. C.J. implemented the methods and performed analysis, with input from W.S., M.C. and D.-Y.L.; W.S. and C.J. wrote the paper, with input from M.C. and D.-Y.L.

Corresponding author

Correspondence to Wei Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks Ruibin Xi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Ananya Rastogi was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary notes, results, Tables 1–6 and Figs. 1–60.

Source data

Source Data Fig. 2

Source Data for Fig. 2.

Source Data Fig. 3

Source Data for Fig. 3.

Source Data Fig. 4

Source Data for Fig. 4.

Source Data Fig. 5

Source Data for Fig. 5.

About this article

Cite this article

Jin, C., Chen, M., Lin, DY. et al. Cell-type-aware analysis of RNA-seq data. Nat Comput Sci 1, 253–261 (2021). https://doi.org/10.1038/s43588-021-00055-6

  • DOI: https://doi.org/10.1038/s43588-021-00055-6
