Prediction of the cell-type-specific transcription of non-coding RNAs from genome sequences via machine learning

Abstract

Gene transcription is regulated through complex mechanisms involving non-coding RNAs (ncRNAs). As the transcription of ncRNAs, especially of enhancer RNAs, is often low and cell type specific, how the levels of RNA transcription depend on genotype remains largely unexplored. Here we report the development and utility of a machine-learning model (MENTR) that reliably links genome sequence and ncRNA expression at the cell type level. Effects on ncRNA transcription predicted by the model were concordant with estimates from published studies in a cell-type-dependent manner, regardless of allele frequency and genetic linkage. Among 41,223 variants from genome-wide association studies, the model identified 7,775 enhancer RNAs and 3,548 long ncRNAs causally associated with complex traits across 348 major human primary cells and tissues, such as rare variants plausibly altering the transcription of enhancer RNAs to influence the risks of Crohn’s disease and asthma. The model may aid the discovery of causal variants and the generation of testable hypotheses for biological mechanisms driving complex traits.

Fig. 1: Strategy to find the effects of variants on ncRNA transcription.
Fig. 2: Cell-type-specific prediction of promoter-level and enhancer-level expression.
Fig. 3: Comparison between MENTR and Basenji.
Fig. 4: Accurate predictions of mutation effects on ncRNA expression in a cell-type-dependent manner.
Fig. 5: Predictions to explain and prioritize GWAS findings.
Fig. 6: Finding a candidate of a causal variant in Crohn’s disease by linking variants with transcribed enhancers in relevant cell types.
Fig. 7: Diagram of the interpretation of rare variants associated with complex traits.

Data availability

The GWAS trait-associated ncRNA database is available via a user-friendly graphical user interface application (https://doi.org/10.5281/zenodo.5638259); basic-usage information is provided in the figure legend of Supplementary Fig. 11. Supplementary files for training MENTR ML models and the pre-trained MENTR ML models are available at https://github.com/koido/MENTR (https://github.com/koido/MENTR/wiki provides information on how to use them; large files are available at https://doi.org/10.5281/zenodo.5348471). Publicly available datasets that we used in the study are as follows: LCL CAGE transcriptomes (ref. 22), https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5835; GWAS catalogue (ref. 54; r2019-07-12), https://www.ebi.ac.uk/gwas/downloads; 95% credible sets in 139 independent associated regions in IBD, supplementary data from ref. 31; representative TSS of CAGE peaks (promoters) in FANTOM5, https://fantom.gsc.riken.jp/5/datafiles/phase2.5/extra/CAGE_peaks/hg19.cage_peak_phase1and2combined_coord.bed.gz; inferred mid position of enhancers in FANTOM5, https://fantom.gsc.riken.jp/5/datafiles/phase2.5/extra/Enhancers/human_permissive_enhancers_phase_1_and_2.bed.gz; CAGE peak annotations in FANTOM5, https://fantom.gsc.riken.jp/5/datafiles/phase2.5/extra/CAGE_peaks_annotation; caQTL from LCL ATAC-seq, supplementary data from ref. 23; 5,376 puQTL and 110 eaQTL from LCL CAGE, datasets from ref. 22 via personal communication; eQTL GTEx v7 (ref. 55), https://www.gtexportal.org/home/; CAGE and NET-CAGE transcriptome of the five ENCODE cell lines (GM12878, HeLa-S3, HepG2, K562, and MCF-7) (ref. 14), https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118075; DeepSEA (Beluga) (refs. 15,17), https://github.com/FunctionLab/ExPecto; Basenji (ref. 18), https://github.com/calico/basenji/tree/0.4; IBD multi-ancestry meta GWAS results (refs. 56,57), https://www.ibdgenetics.org; asthma multi-ancestry GWAS results (ref. 32), http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST005001-GCST006000/GCST005212; results of MPRA: supplementary data in refs. 24,25,26; non-coding credible sets in UK Biobank, supplementary data from ref. 39 and https://www.finucanelab.org/data (release 1.1); dbSNP 151, https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/All_20180423.vcf.gz.

Code availability

The pre-trained MENTR ML models (347 sample ontologies and LCL), and the source code for training MENTR ML models and for running in silico mutagenesis, are available at https://github.com/koido/MENTR. We also released a packaged Docker image on Docker Hub (https://hub.docker.com/repository/docker/mkoido/mentr). Custom code for comparing MENTR with the ExPecto and Basenji methods is available at https://doi.org/10.5281/zenodo.7008214.

References

  1. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  2. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  3. Hon, C. C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).

  4. Kristjánsdóttir, K. et al. Population-scale study of eRNA transcription reveals bipartite functional enhancer architecture. Nat. Commun. 11, 5963 (2020).

  5. Boyd, M. et al. Characterization of the enhancer and promoter landscape of inflammatory bowel disease from human colon biopsies. Nat. Commun. 9, 1661 (2018).

  6. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  7. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  8. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

  9. Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput. Biol. 12, 1–20 (2016).

  10. Iotchkova, V. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet. 51, 343–353 (2019).

  11. Terao, C. et al. Ustekinumab as a therapeutic option for Takayasu arteritis: from genetic findings to clinical application. Scand. J. Rheumatol. 45, 80–82 (2016).

  12. Ardlie, K. G. et al. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  13. Ishigaki, K. et al. Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat. Genet. 49, 1120–1125 (2017).

  14. Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat. Genet. 51, 1369–1379 (2019).

  15. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  16. Hoffman, G. E., Bendl, J., Girdhar, K., Schadt, E. E. & Roussos, P. Functional interpretation of genetic variants using deep learning predicts impact on chromatin accessibility and histone modification. Nucleic Acids Res. 47, 10597–10611 (2019).

  17. Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. https://doi.org/10.1038/s41588-018-0160-6 (2018).

  18. Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).

  19. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (General Chairs: Krishnapuram, B. & Shah, M.; Program Chairs: Smola, A., Aggarwal, C., Shen, D., & Rastogi, R.) vols 13-17-August-2016 785–794 (Association for Computing Machinery, 2016).

  20. Bühlmann, P. Boosting for high-dimensional linear models. Ann. Stat. 34, 559–583 (2006).

  21. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  22. Garieri, M. et al. The effect of genetic variation on promoter usage and enhancer activity. Nat. Commun. 8, 1358 (2017).

  23. Kumasaka, N., Knights, A. J. & Gaffney, D. J. High-resolution genetic mapping of putative causal interactions between regions of open chromatin. Nat. Genet. 51, 128–137 (2019).

  24. Mattioli, K. et al. High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue specificity. Genome Res. 29, 344–355 (2019).

  25. van Arensbergen, J. et al. High-throughput identification of human SNPs affecting regulatory element activity. Nat. Genet. 51, 1160–1169 (2019).

  26. Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).

  27. Akiyama, M. et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 49, 1458–1467 (2017).

  28. Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).

  29. Ishigaki, K. et al. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nat. Genet. https://doi.org/10.1038/s41588-020-0640-3 (2020).

  30. Taft, R. J., Pang, K. C., Mercer, T. R., Dinger, M. & Mattick, J. S. Non-coding RNAs: regulators of disease. J. Pathol. 220, 126–139 (2010).

  31. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

  32. Demenais, F. et al. Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks. Nat. Genet. 50, 42–50 (2018).

  33. Turner, A. W. et al. Functional analysis of a novel genome-wide association study signal in SMAD3 that confers protection from coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 36, 972–983 (2016).

  34. Wéra, O., Lancellotti, P. & Oury, C. The dual role of neutrophils in inflammatory bowel diseases. J. Clin. Med. 5, 118 (2016).

  35. Fahy, J. V. Eosinophilic and neutrophilic inflammation in asthma insights from clinical studies. Proc. Am. Thorac. Soc. 6, 256–259 (2009).

  36. Yadav, P. K., Chen, C. & Liu, Z. Potential role of NK cells in the pathogenesis of inflammatory bowel disease. J. Biomed. Biotechnol. 2011, 348530 (2011).

  37. Farh, K. K. H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).

  38. Miller, C. L. et al. Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci. Nat. Commun. 7, 12092 (2016).

  39. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 17, 1–6 (2021).

  40. Pomerantz, M. M. et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884 (2009).

  41. Tuupanen, S. et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat. Genet. 41, 885–890 (2009).

  42. Morris, J. A. et al. An atlas of genetic influences on osteoporosis in humans and mice. Nat. Genet. 51, 258–266 (2019).

  43. Hait, T. A., Amar, D., Shamir, R. & Elkon, R. FOCS: a novel method for analyzing enhancer and gene activity patterns infers an extensive enhancer–promoter map. Genome Biol. 19, 56 (2018).

  44. Chen, J. et al. WNT7B promotes bone formation in part through mTORC1. PLoS Genet. 10, e1004145 (2014).

  45. Song, D. et al. Inducible expression of Wnt7b promotes bone formation in aged mice and enhances fracture healing. Bone Res. 8, 4 (2020).

  46. Busse, W. W. et al. Daclizumab improves asthma control in patients with moderate to severe persistent asthma: a randomized, controlled trial. Am. J. Respir. Crit. Care Med. 178, 1002–1008 (2008).

  47. Tanaka, N. et al. Eight novel susceptibility loci and putative causal variants in atopic dermatitis. J. Allergy Clin. Immunol. 148, 1293–1306 (2021).

  48. Hikino, K. et al. Susceptibility loci and polygenic architecture highlight population specific and common genetic features in inguinal hernias: genetics in inguinal hernias. eBioMedicine 70, 103532 (2021).

  49. Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).

  50. Murakawa, Y. et al. Enhanced identification of transcriptional enhancers provides mechanistic insights into diseases. Trends Genet. 32, 76–88 (2016).

  51. Saunders, G. et al. Leveraging European infrastructures to access 1 million human genomes by 2022. Nat. Rev. Genet. 20, 693–701 (2019).

  52. Niculescu-Mizil, A. & Caruana, R. A. Obtaining calibrated probabilities from boosting. Preprint at arXiv:1207.1403 (2012).

  53. Niculescu-Mizil, A. & Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning–ICML (General Chair: Dzeroski, S.; Program Chairs: Raedt, L. D. & Wrobel, S.) 625–632 (ACM Press, 2005).

  54. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

  55. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  56. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).

  57. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125 (2010).

Acknowledgements

We thank FANTOM consortium members for providing datasets and valuable discussions. Computational resources of AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST) were used for in silico mutagenesis. This work was supported in part by JSPS KAKENHI (grant number 20K15773, to M.K.), JP20H00462, the JCR Grant for Promoting Basic Rheumatology, and AMED (under grant numbers JP21kk0305013, JP21tm0424220 and JP21ck0106642, to C.T.).

Author information

Authors and Affiliations

Authors

Contributions

M.K., C.-C.H., Y.K. and C.T. conceived the study. M.K. conducted analysis with the help of C.-C.H., S.K., K. Ishigaki, K. Ito and P.C. C.-C.H. analysed CAGE transcriptome data. H.K. and Y.M. analysed NET-CAGE transcriptome data. M.K. and C.T. wrote the manuscript, and N.F.P. provided critical comments and valuable edits. J.S. contributed to providing graphics processing unit computational resources necessary for the study. P.C. and C.T. supervised the study.

Corresponding author

Correspondence to Chikashi Terao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Colin Campbell, William Ritchie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Details about MENTR ML.

(a) Comparison of the datasets required for MENTR ML models and for a conventional eQTL study. MENTR ML models require only CAGE transcriptome data, and existing large-scale CAGE transcriptomes, such as the FANTOM5 datasets, can be used. An eQTL study requires transcriptome data for the tissue and genotypes from the same individuals to estimate mutation effects for each transcript (βeQTL) in a tissue. (b) Workflow of MENTR ML training and evaluation. See the Methods section for details.

Extended Data Fig. 2 Accurate prediction of ncRNA expression by combining MENTR ML models with CAGE transcriptome.

(a) Schematic of the comparison of accuracies among the MENTR, MENTRlinear, and ExPecto methods. (b) Prediction accuracies of the ExPecto method (ref. 17) on GTEx RNA sequence datasets (re-analysis of predictive accuracies among 218 types of tissues) and FANTOM5 CAGE transcriptome datasets (347 sample ontologies). The box plots (N = 218 for GTEx and N = 347 for FANTOM5) show the first and third quartiles; the center line represents the median, the upper whisker extends from the hinge to the highest value within 1.5 × IQR (inter-quartile range) of the hinge, the lower whisker extends from the hinge to the lowest value within 1.5 × IQR of the hinge, and data beyond the ends of the whiskers are plotted as points. (c) Prediction accuracies of the indicated methods (x-axis) on lncRNAs and mRNAs in the FANTOM5 CAGE transcriptome datasets (n = 347). Spearman’s ρ and AUROC values are compared by violin plot, with mean values shown as dots. P-values were calculated by two-sided Wilcoxon signed rank test. (d) Comparison of Spearman’s ρ for each model (n = 347) trained by the MENTR or ExPecto method.
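The box-plot convention in this legend (whiskers reaching the most extreme values within 1.5 × IQR of the hinges, everything beyond drawn as points) can be illustrated with a minimal sketch. The function name `tukey_box_stats` and the choice of the "inclusive" quantile method are assumptions made for illustration; the plotting software used for the figures may compute quartiles slightly differently.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Summarize values under the 1.5 x IQR box-plot convention."""
    data = sorted(values)
    # Quartile convention is an assumption; libraries differ in the details.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

In this toy input the value 100 lies beyond the upper fence, so it would be plotted as a point while the upper whisker stops at 9.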

Extended Data Fig. 3 Prediction of lowly expressed, cell-type specific enhancers by MENTR.

(a) Mean expression levels for each FANTOM5 sample ontology (n = 347) in the testing dataset. (b) Distribution of the Shannon entropy-based cell-type specificity score (ref. 14) in the testing dataset of the sample ontologies. A score of 0 indicates ubiquitous expression and a score of 1 indicates cell-type-specific expression. (c) Prediction accuracies of MENTR models for enhancer RNAs, lncRNAs, and mRNAs, stratified by cell-type specificity scores. (d) Prediction accuracies of MENTR models for enhancer RNAs, lncRNAs, and mRNAs, stratified by quartile of expression levels. The quartile bins were determined from all transcript types for each sample ontology. Mean values are shown as dots in the violin plots in (a), (c), and (d).
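As a rough illustration of an entropy-based specificity score with the endpoints described in (b), the sketch below normalizes Shannon entropy by its maximum and subtracts it from 1. This normalized form, and the function name `specificity_score`, are assumptions; the exact definition used in ref. 14 is not given in this text and may differ.

```python
import math


def specificity_score(expression):
    """Return 1 - H/log2(N) for non-negative expression across N samples.

    0 means equal expression everywhere; 1 means expression in one sample.
    This formula is an illustrative assumption, not the published definition.
    """
    n = len(expression)
    total = sum(expression)
    if n < 2 or total <= 0:
        raise ValueError("need at least two samples and non-zero expression")
    entropy = 0.0
    for x in expression:
        if x < 0:
            raise ValueError("expression values must be non-negative")
        if x > 0:
            p = x / total
            entropy -= p * math.log2(p)
    return 1.0 - entropy / math.log2(n)


print(specificity_score([5, 5, 5, 5]))   # ubiquitous expression
print(specificity_score([0, 0, 7, 0]))   # expression in a single cell type
print(specificity_score([1, 1, 10, 1]))  # intermediate case
```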

Extended Data Fig. 4 Differences of prediction accuracies between MENTR and MENTRreg.

Comparison of Spearman’s ρ values between MENTR and MENTRreg, shown as scatter plots (upper) and histograms of the differences (lower) for each transcript type. P-values were calculated by two-sided Wilcoxon signed rank test.

Extended Data Fig. 5 Evaluation of prediction accuracies for CAGE and NET-CAGE transcript.

X-axis, transcript type; y-axis, AUROC. “Train” indicates which datasets were used for training and is shown by the plot color. Each “Evaluation” and “Train” pair indicates which type of CAGE dataset was evaluated by the indicated model. (a) Evaluation of accuracy for CAGE transcripts predicted by MENTR ML models trained on CAGE transcripts, and that for NET-CAGE transcripts predicted by MENTR ML models trained on NET-CAGE transcripts. (b) Evaluation of accuracy for CAGE transcripts predicted by MENTR ML models trained on CAGE transcripts and NET-CAGE transcripts. (c) Evaluation of accuracy for NET-CAGE transcripts predicted by MENTR ML models trained on CAGE transcripts and NET-CAGE transcripts.

Extended Data Fig. 6 Feature importance of MENTR.

(a) Comparison of quantile-normalized SHAP values of input features from lncRNA or enhancer RNA (y-axis) and mRNA (x-axis). We compared them for assay type (upper boxes; DNase, Histone, and TF) and aggregation weight in MENTR (right boxes; 0.01, 0.02, 0.05, 0.1 and 0.2; see Methods). (b) Top 5 distinct features which were more important in enhancer predictions (vs. mRNA; weight = 0.01) from TF features.

Extended Data Fig. 7 Schematic comparison of accuracies between MENTR and Basenji.

Basenji pre-trained models were obtained from ref. 18.

Extended Data Fig. 8 Cell-type dependencies for Spearman’s ρ between MENTR mutation effects and βQTL.

Y-axis, Spearman’s ρ between MENTR mutation effects and effect sizes from the QTL studies (βQTL; see Fig. 4) for each model (each dot represents one of the 347 FANTOM5 sample ontologies or LCL, and the dashed line indicates the ρ value for LCL); x-axis and color key, transcriptome correlation between the LCL CAGE transcriptome and each of the 347 FANTOM5 CAGE transcriptomes. We excluded variants with a mutation effect of 0 from this analysis. P-values were calculated by two-sided Spearman’s rank correlation test. p < 2.2 × 10^−16 indicates that the p-value was lower than the default machine epsilon value (2.2 × 10^−16).
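The statistic on the y-axis can be sketched as a rank correlation computed after dropping variants whose predicted effect is exactly 0. This is a minimal standard-library illustration, not the authors' code; the function names and the toy numbers are hypothetical, and ties are handled with average ranks as an assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_nonzero(predicted, observed):
    """Spearman's rho after excluding variants with a predicted effect of 0."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if p != 0]
    if len(pairs) < 3:
        raise ValueError("too few variants with non-zero predicted effect")
    pred, obs = zip(*pairs)
    return pearson(average_ranks(pred), average_ranks(obs))


# Hypothetical predicted effects and QTL effect sizes for six variants.
predicted = [0.0, 0.2, -0.1, 0.5, 0.0, -0.4]
observed = [9.0, 0.3, -0.2, 0.9, -9.0, -0.6]
print(spearman_nonzero(predicted, observed))
```

The two variants with a predicted effect of 0 are ignored here, so their extreme observed values do not influence the correlation.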

Extended Data Fig. 9 Gene-level verification of MENTR in silico mutation effects for various types of tissues.

(a) Workflow for calculating gene-level mutation effects (Δy). Δy values were calculated from promoter-level mutation effects (Δyp) after filtering by the baseline, permissive, and robust thresholds. (b, c) Concordance rate (y-axis) between the directions of Δy and of the eQTL effect sizes from GTEx v7 at the indicated threshold of absolute Δy. Variants within ±1 kb on autosomes and chromosome X were tested in (b), and variants within ±100 kb on chromosome 8 were tested in (c). The concordance rates of 26 tissues (Supplementary Table 8) are shown by violin plot, with mean values shown as dots. P-values were calculated by two-sided Wilcoxon signed rank test.
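A direction-concordance rate of the kind plotted in (b, c) can be sketched as the fraction of variants whose predicted Δy and eQTL effect size share a sign, among variants passing a threshold on |Δy|. The function name, the use of ≥ for the threshold, the exclusion of zero effects, and the example values are all assumptions made for illustration.

```python
def concordance_rate(delta_y, beta_eqtl, min_abs_delta):
    """Fraction of variants where predicted and eQTL effects share a sign.

    Only variants with |delta_y| >= min_abs_delta are counted; variants with
    a zero effect on either side have no direction and are skipped.
    Returns None when no variant passes the filter.
    """
    kept = [
        (d, b)
        for d, b in zip(delta_y, beta_eqtl)
        if abs(d) >= min_abs_delta and d != 0 and b != 0
    ]
    if not kept:
        return None
    agree = sum(1 for d, b in kept if (d > 0) == (b > 0))
    return agree / len(kept)


# Hypothetical gene-level predictions and eQTL effect sizes for five variants.
delta_y = [0.5, -0.3, 0.05, 0.8, -0.6]
beta_eqtl = [1.2, -0.4, -0.9, -0.1, -0.2]

for threshold in (0.0, 0.1):
    print(threshold, concordance_rate(delta_y, beta_eqtl, threshold))
```

Raising the threshold drops the weakly predicted variant (|Δy| = 0.05), which is how a rate can change with the threshold of absolute Δy.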

Extended Data Fig. 10 PIP distribution of ten UK Biobank complex diseases, stratified by MENTR predictions.

Distribution of PIP (posterior inclusion probability) for 10 complex diseases, stratified by MENTR robust prediction, permissive prediction (but not robust prediction; shown as “only permissive”) or others. The definitions of robust and permissive are given in the Results section. Fibroblastic_Disorders, Fibroblastic disorders; CRC, Colorectal cancer; Glaucoma_Combined, Glaucoma (Phecode + Self-reported); PrC, Prostate cancer; BrC, Breast cancer; AID_Combined, Autoimmune disease (Phecode + Self-reported). These abbreviations and the non-coding credible sets (see Methods) were obtained from Supplementary Table 7 in Nasser et al. (ref. 39), and their PIP values were obtained from https://www.finucanelab.org/data (release 1.1). The box plots show the first and third quartiles; the center line represents the median, the upper whisker extends from the hinge to the highest value within 1.5 × IQR (inter-quartile range) of the hinge, the lower whisker extends from the hinge to the lowest value within 1.5 × IQR of the hinge, and data beyond the ends of the whiskers are plotted as points. See Supplementary Table 24 for the N of each group.

Supplementary information

Main Supplementary Information

Supplementary figures, discussion and references.

Reporting Summary

Peer Review File

Supplementary Datasets

Supplementary Tables 1–24.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Koido, M., Hon, CC., Koyama, S. et al. Prediction of the cell-type-specific transcription of non-coding RNAs from genome sequences via machine learning. Nat. Biomed. Eng 7, 830–844 (2023). https://doi.org/10.1038/s41551-022-00961-8

  • DOI: https://doi.org/10.1038/s41551-022-00961-8
