Extrapolating heterogeneous time-series gene expression data using Sagittarius

A preprint version of the article is available at bioRxiv.

Abstract

Understanding the temporal dynamics of gene expression is crucial for developmental biology, tumour biology and biogerontology. However, some timepoints remain challenging to measure in the laboratory, particularly during very early or very late stages of a biological process. Here we propose Sagittarius, a transformer-based model that can accurately simulate gene expression profiles at timepoints outside the range of times measured in the laboratory. The key idea behind Sagittarius is to learn a shared reference space for time-series measurements, thereby explicitly modelling unaligned timepoints and conditional batch effects between time series, and making the model widely applicable to diverse biological settings. We show Sagittarius’s promising performance when extrapolating mammalian developmental gene expression, simulating drug-induced expression at unmeasured dose and treatment times, and augmenting datasets to accurately predict drug sensitivity. We also used Sagittarius to extrapolate mutation profiles for early-stage cancer patients, which enabled us to discover a gene set connected to the Hedgehog signalling pathway that may be related to tumorigenesis in sarcoma patients, including PTCH1, ARID2 and MYCBP2. By augmenting experimental temporal datasets with crucial but difficult-to-measure extrapolated datapoints, Sagittarius enables deeper insights into the temporal dynamics of heterogeneous transcriptomic processes and can be broadly applied to biological time-series extrapolation.
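To make the idea of a shared reference space concrete, the following is a toy sketch and not the Sagittarius implementation. It maps two series measured at different, unaligned timepoints onto one common set of reference times using softmax attention-style weights. The function name `attend_to_reference`, the fixed Gaussian-shaped scoring, the `bandwidth` parameter and all example values are assumptions made for illustration; a trained model would learn its attention weights rather than use a hand-picked kernel.

```python
import math

def attend_to_reference(times, values, reference_times, bandwidth=1.0):
    """Map one irregularly sampled series onto shared reference times.

    Each reference time attends to the measured timepoints with softmax
    weights that decay with squared time distance (a fixed, hand-picked
    kernel used here only for illustration).
    """
    aligned = []
    for r in reference_times:
        scores = [-((r - t) ** 2) / (2.0 * bandwidth ** 2) for t in times]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        aligned.append(sum(w * v for w, v in zip(weights, values)) / total)
    return aligned

# Two hypothetical series measured at different, unaligned timepoints.
series_a = ([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
series_b = ([0.5, 1.5, 4.0], [1.4, 2.6, 5.1])
reference = [0.0, 1.0, 2.0, 3.0]

# Both series now live on the same reference grid and can be compared.
aligned_a = attend_to_reference(*series_a, reference)
aligned_b = attend_to_reference(*series_b, reference)
```

Because every output is a weighted average of measured values, this sketch only aligns series within their observed range; it does not extrapolate beyond it, which is the harder problem the paper addresses.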

Fig. 1: Sagittarius model overview.
Fig. 2: Gene expression prediction for extrapolated timepoints in later-stage development.
Fig. 3: Mouse transcriptomic velocity across organs.
Fig. 4: Drug-induced gene expression extrapolation at unmeasured experimental combinations, doses and times.
Fig. 5: Drug and cell-line treatment efficacy extrapolation analysis.
Fig. 6: Early cancer patient mutation profile extrapolation.

Data availability

The data necessary for reproducing the paper are available in the figshare repository at https://figshare.com/projects/Sagittarius/144771. We provide a more detailed note describing the datasets provided in the figshare repository. Specifically, the main experiments can be run with the preprocessed data files provided, and we also include data for the cross-dataset analyses, such as the drug repurposing application. Finally, we provide large-scale simulated data from Sagittarius for the Evo-devo and TCGA datasets.

Code availability

A Python repository including the Sagittarius implementation and code to reproduce the results in this paper is available at https://github.com/addiewc/Sagittarius (ref. 113), with additional details for reproducing results provided in the repository (https://doi.org/10.5281/zenodo.7879454; ref. 114). We ran experiments on Linux 8.7 with an RTX A4000 GPU. We used Python 3.9.7, PyTorch 1.9.1 (ref. 115), anndata 0.8.0 (ref. 116), cudatoolkit 11.1.74, matplotlib 3.4.3 (ref. 117), NumPy 1.21.2 (ref. 118), pandas 1.3.3 (ref. 119), pip 21.3.1, pybiomart 0.2.0 (ref. 90), python-louvain 0.15 (ref. 95), Scanpy 1.8.2 (ref. 120), SciPy 1.7.1 (ref. 121), seaborn 0.11.2 (ref. 122), sklearn 0.0 (ref. 93), statsmodels 0.13.0 (ref. 123), tqdm 4.62.3 (ref. 124), umap-learn 0.5.1 (ref. 37), yaml 0.2.5, lifelines 0.26.5 (ref. 125), BioRender Student Plan and Adobe Illustrator 25.2.3.
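As a convenience for readers trying to match this environment, the sketch below compares locally installed package versions with the versions listed above. It is not part of the authors' repository. The PyPI distribution names (for example `torch` for PyTorch) are assumptions, and the entries that are not plain PyPI packages or are ambiguous (cudatoolkit, pip, sklearn 0.0, yaml) are left out. The helper names `installed_version` and `compare_environment` are hypothetical.

```python
from importlib import metadata

# Versions taken from the Code availability statement; PyPI names are assumed.
PINNED = {
    "torch": "1.9.1",
    "anndata": "0.8.0",
    "matplotlib": "3.4.3",
    "numpy": "1.21.2",
    "pandas": "1.3.3",
    "pybiomart": "0.2.0",
    "python-louvain": "0.15",
    "scanpy": "1.8.2",
    "scipy": "1.7.1",
    "seaborn": "0.11.2",
    "statsmodels": "0.13.0",
    "tqdm": "4.62.3",
    "umap-learn": "0.5.1",
    "lifelines": "0.26.5",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def compare_environment(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every mismatch or missing package."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches

if __name__ == "__main__":
    for package, (expected, found) in compare_environment(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `compare_environment(PINNED)` means every listed package matches; the `lookup` argument exists only so the comparison can be exercised without installing anything.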

References

  1. Gulati, G. S. et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367, 405–411 (2020).

  2. Arbeitman, M. N. et al. Gene expression during the life cycle of Drosophila melanogaster. Science 297, 2270–2275 (2002).

  3. Zheng, L. et al. Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science 374, abe6474 (2021).

  4. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).

  5. Lee, J. S. et al. Single-cell transcriptome of bronchoalveolar lavage fluid reveals sequential change of macrophages during SARS-CoV-2 infection in ferrets. Nat. Commun. 12, 4567 (2021).

  6. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347–353 (2018).

  7. Douglass, E. F. Jr et al. A community challenge for a pancancer drug mechanism of action inference from perturbational profile data. Cell Rep. Med. 3, 100492 (2022).

  8. Kohonen, P. et al. A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury. Nat. Commun. 8, 15932 (2017).

  9. Almogy, G. et al. Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform. Preprint at bioRxiv https://doi.org/10.1101/2022.05.29.493900 (2022).

  10. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

  11. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).

  12. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505–509 (2019).

  13. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  14. Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).

  15. Subramanian, A. et al. A next generation Connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).

  16. Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).

  17. Schaum, N. et al. Ageing hallmarks exhibit organ-specific temporal signatures. Nature 583, 596–602 (2020).

  18. Wang, W. et al. Single-cell transcriptomic atlas of the human endometrium during the menstrual cycle. Nat. Med. 26, 1644–1653 (2020).

  19. Sunkin, S. M. et al. Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. 41, D996–D1008 (2013).

  20. Radovic, A., He, J., Ramanan, J., Brubaker, M. A. & Lehrmann, A. M. Agent forecasting at flexible horizons using ODE flows. In ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (2021).

  21. Peng, G., Cui, G., Ke, J. & Jing, N. Using single-cell and spatial transcriptomes to understand stem cell lineage specification during early embryo development. Annu. Rev. Genomics Hum. Genet. 21, 163–181 (2020).

  22. Haniffa, M. et al. A roadmap for the Human Developmental Cell Atlas. Nature 597, 196–205 (2021).

  23. Sohn, K., Lee, H. & Yan, X. Learning Structured Output Representation using Deep Conditional Generative Models. in Advances in Neural Information Processing Systems (eds. Cortes, C., Lawrence, N., Lee, D., Sugiyama, M. & Garnett, R.) vol. 28 3483–3491 (Curran Associates, Inc., 2015).

  24. Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. e11517 (2023).

  25. Cho, K. et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds. Moschitti, A., Pang, B. & Daelemans, W.) 1724–1734 (Association for Computational Linguistics, 2014).

  26. Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural Ordinary Differential Equations. in Advances in Neural Information Processing Systems (eds. Bengio, S. et al.) vol. 31 6571–6583 (Curran Associates, Inc., 2018).

  27. Shukla, S. N. & Marlin, B. Multi-time attention networks for irregularly sampled time series. In International Conference on Learning Representations (ICLR, 2021).

  28. Chen, R. T. Q., Amos, B. & Nickel, M. Learning neural event functions for ordinary differential equations. International Conference on Learning Representations (ICLR, 2021).

  29. Vaswani, A. et al. Attention is All you Need. in Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) vol. 30 5998–6008 (Curran Associates, Inc., 2017).

  30. Rahaman, N. et al. On the spectral bias of neural networks. Proc. Mach. Learning Res. 97, 5301–5310 (2019).

  31. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  32. Yeo, G. H. T., Saksena, S. D. & Gifford, D. K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 12, 3222 (2021).

  33. Tam, P. P. & Behringer, R. R. Mouse gastrulation: the formation of a mammalian body plan. Mech. Dev. 68, 3–25 (1997).

  34. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).

  35. Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat. Genet. 54, 328–341 (2022).

  36. Briggs, J. A. et al. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science 360, eaar5780 (2018).

  37. McInnes, L., Healy, J., Saul, N. & Grossberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).

  38. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933).

  39. Viegas, J. O. et al. RNA degradation eliminates developmental transcripts during murine embryonic stem cell differentiation via CAPRIN1-XRN2. Dev. Cell 57, 2731–2744.e5 (2022).

  40. Tomecki, R., Sikorski, P. J. & Zakrzewska-Placzek, M. Comparison of preribosomal RNA processing pathways in yeast, plant and human cells—focus on coordinated action of endo- and exoribonucleases. FEBS Lett. 591, 1801–1850 (2017).

  41. Watada, E. et al. Age-dependent ribosomal DNA variations in mice. Mol. Cell. Biol. 40, e00368-20 (2020).

  42. Nimura, K. et al. Regulation of alternative polyadenylation by Nkx2-5 and Xrn2 during mouse heart development. eLife 5, e16030 (2016).

  43. Chatterjee, S. & Grosshans, H. Active turnover modulates mature microRNA activity in Caenorhabditis elegans. Nature 461, 546–549 (2009).

  44. Chatterjee, S., Fasler, M., Büssing, I. & Grosshans, H. Target-mediated protection of endogenous microRNAs in C. elegans. Dev. Cell 20, 388–396 (2011).

  45. Chowdhury, T., Samajdar, A., Sardar, M. & Chatterjee, S. Dauer quiescence as well as continuity of the life cycle after dauer-exit in Caenorhabditis elegans are dependent on the endoribonuclease activity of XRN-2. Preprint at bioRxiv https://doi.org/10.1101/2022.05.02.489690 (2022).

  46. Kato, M., de Lencastre, A., Pincus, Z. & Slack, F. J. Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during Caenorhabditis elegans development. Genome Biol. 10, R54 (2009).

  47. Qiao, G.-J., Chen, L., Wu, J.-C. & Li, Z.-R. Identification of an eight-gene signature for survival prediction for patients with hepatocellular carcinoma based on integrated bioinformatics analysis. PeerJ 7, e6548 (2019).

  48. Takada, H. & Kurisaki, A. Emerging roles of nucleolar and ribosomal proteins in cancer, development, and aging. Cell. Mol. Life Sci. 72, 4015–4025 (2015).

  49. Loganathan, T., Ramachandran, S., Shankaran, P., Nagarajan, D. & Mohan S, S. Host transcriptome-guided drug repurposing for COVID-19 treatment: a meta-analysis based approach. PeerJ 8, e9357 (2020).

  50. Belyaeva, A. et al. Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing. Nat. Commun. 12, 1024 (2021).

  51. Minamiyama, M. et al. Naratriptan mitigates CGRP1-associated motor neuron degeneration caused by an expanded polyglutamine repeat tract. Nat. Med. 18, 1531–1538 (2012).

  52. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).

  53. Yang, C. et al. A survey of optimal strategy for signature-based drug repositioning and an application to liver cancer. eLife 11, e71880 (2022).

  54. Cheng, X. et al. Drug repurposing for cancer treatment through global propagation with a greedy algorithm in a multilayer network. Cancer Biol. Med. 19, 74–89 (2022).

  55. Folkes, A. J. et al. The identification of 2-(1H-indazol-4-yl)-6-(4-methanesulfonyl-piperazin-1-ylmethyl)-4-morpholin-4-yl-thieno[3,2-d]pyrimidine (GDC-0941) as a potent, selective, orally bioavailable inhibitor of class I PI3 kinase for the treatment of cancer. J. Med. Chem. 51, 5522–5532 (2008).

  56. Roth, G. J. et al. Nintedanib: from discovery to the clinic. J. Med. Chem. 58, 1053–1063 (2015).

  57. Suzuki, N., Nakagawa, F., Matsuoka, K. & Takechi, T. Effect of a novel oral chemotherapeutic agent containing a combination of trifluridine, tipiracil and the novel triple angiokinase inhibitor nintedanib, on human colorectal cancer xenografts. Oncol. Rep. 36, 3123–3130 (2016).

  58. Seashore-Ludlow, B. et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov. 5, 1210–1223 (2015).

  59. Menden, M. P. et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nat. Commun. 10, 2674 (2019).

  60. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).

  61. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).

  62. Chen, X. et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nat. Commun. 11, 3475 (2020).

  63. Bozic, I. et al. Accumulation of driver and passenger mutations during tumor progression. Proc. Natl Acad. Sci. USA 107, 18545–18550 (2010).

  64. Arazo, E., Ortego, D., Paul, A., O’Connor, N. E. & McGuinness, K. Unsupervised label noise modeling and loss correction. Proc. Mach. Learning Res. 97, 312–321 (2019).

  65. Li, J., Socher, R. & Hoi, S. C. H. DivideMix: Learning with noisy labels as semi-supervised learning. In International Conference on Learning Representations (2020).

  66. Brown, L. C. et al. LRP1B mutations are associated with favorable outcomes to immune checkpoint inhibitors across multiple cancer types. J. Immunother. Cancer 9, e001792 (2021).

  67. Arang, N. & Gutkind, J. S. G protein-coupled receptors and heterotrimeric G proteins as cancer drivers. FEBS Lett. 594, 4201–4232 (2020).

  68. Ichikawa, D. et al. Integrated diagnosis based on transcriptome analysis in suspected pediatric sarcomas. NPJ Genom. Med. 6, 49 (2021).

  69. Pietrobono, S., Gagliardi, S. & Stecca, B. Non-canonical Hedgehog signaling pathway in cancer: activation of GLI transcription factors beyond Smoothened. Front. Genet. 10, 556 (2019).

  70. Lo, W. W., Pinnaduwage, D., Gokgoz, N., Wunder, J. S. & Andrulis, I. L. Aberrant hedgehog signaling and clinical outcome in osteosarcoma. Sarcoma 2014, 261804 (2014).

  71. Banerjee, S. et al. Loss of the PTCH1 tumor suppressor defines a new subset of plexiform fibromyxoma. J. Transl. Med. 17, 246 (2019).

  72. Martinez, M. F. et al. Nevoid basal cell carcinoma syndrome: PTCH1 mutation profile and expression of genes involved in the Hedgehog pathway in Argentinian patients. Cells 8, 144 (2019).

  73. Ge, Z. et al. Clinical significance of high c-MYC and low MYCBP2 expression and their association with Ikaros dysfunction in adult acute lymphoblastic leukemia. Oncotarget 6, 42300–42311 (2015).

  74. Vatapalli, R. et al. Histone methyltransferase DOT1L coordinates AR and MYC stability in prostate cancer. Nat. Commun. 11, 4153 (2020).

  75. Yoon, J. W. et al. Noncanonical regulation of the Hedgehog mediator GLI1 by c-MYC in Burkitt lymphoma. Mol. Cancer Res. 11, 604–615 (2013).

  76. Tazzari, M. et al. Molecular determinants of soft tissue sarcoma immunity: targets for immune intervention. Int. J. Mol. Sci. 22, 7518 (2021).

  77. Wang, X., Haswell, J. R. & Roberts, C. W. M. Molecular pathways: SWI/SNF (BAF) complexes are frequently mutated in cancer—mechanisms and potential therapeutic insights. Clin. Cancer Res. 20, 21–27 (2014).

  78. Fan, X. et al. The association between methylation patterns of DNAH17 and clinicopathological factors in hepatocellular carcinoma. Cancer Med. 8, 337–350 (2019).

  79. Hassounah, N. B., Bunch, T. A. & McDermott, K. M. Molecular pathways: the role of primary cilia in cancer progression and therapeutics with a focus on Hedgehog signaling. Clin. Cancer Res. 18, 2429–2435 (2012).

  80. Stecca, B. & Ruiz i Altaba, A. Context-dependent regulation of the GLI code in cancer by HEDGEHOG and non-HEDGEHOG signals. J. Mol. Cell. Biol. 2, 84–95 (2010).

  81. Brechbiel, J., Miller-Moslin, K. & Adjei, A. A. Crosstalk between hedgehog and other signaling pathways as a basis for combination therapies in cancer. Cancer Treat. Rev. 40, 750–759 (2014).

  82. Chen, J., Zhang, J., Hong, L. & Zhou, Y. EGFLAM correlates with cell proliferation, migration, invasion and poor prognosis in glioblastoma. Cancer Biomark. 24, 343–350 (2019).

  83. Yu, Q. et al. Upregulated NLGN1 predicts poor survival in colorectal cancer. BMC Cancer 21, 884 (2021).

  84. Ren, Y.-M. et al. Exploring the key genes and pathways of side population cells in human osteosarcoma using gene expression array analysis. J. Orthop. Surg. Res. 13, 153 (2018).

  85. Cutcliffe, C. et al. Clear cell sarcoma of the kidney: up-regulation of neural markers with activation of the Sonic hedgehog and Akt pathways. Clin. Cancer Res. 11, 7986–7994 (2005).

  86. Wald, Y., Feder, A., Greenfeld, D. & Shalit, U. On Calibration and Out-of-Domain Generalization. in Advances in Neural Information Processing Systems (eds. Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S. & Vaughan, J. W.) vol. 34 2215–2227 (Curran Associates, Inc., 2021).

  87. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

  88. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 477 (2018).

  89. Fischer, D. S. et al. Inferring population dynamics from single-cell RNA-sequencing time series data. Nat. Biotechnol. 37, 461–468 (2019).

  90. de Ruiter, J. pybiomart: a simple Pythonic interface to BioMart. GitHub https://github.com/jrderuiter/pybiomart (2018).

  91. Joshi, C. J., Ke, W., Drangowska-Way, A., O’Rourke, E. J. & Lewis, N. E. What are housekeeping genes? PLoS Comput. Biol. 18, e1010295 (2022).

  92. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  93. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  94. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).

  95. Aynaud, T. python-louvain 0.15: Louvain algorithm for community detection. GitHub https://github.com/taynaud/python-louvain (2020).

  96. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  97. Berry, L. M. & Zhao, Z. An examination of IC50 and IC50-shift experiments in assessing time-dependent inhibition of CYP3A4, CYP2D6 and CYP2C9 in human liver microsomes. Drug Metab. Lett. 2, 51–59 (2008).

  98. Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).

  99. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (eds. Bengio, Y. & LeCun, Y.) (2015).

  100. Mariani, O. et al. JUN oncogene amplification and overexpression block adipocytic differentiation in highly aggressive sarcomas. Cancer Cell 11, 361–374 (2007).

  101. Bae, J. Y. et al. Evaluation of immune-biomarker expression in high-grade soft-tissue sarcoma: HLA-DQA1 expression as a prognostic marker. Exp. Ther. Med. 20, 107 (2020).

  102. Wang, H. et al. HER4 promotes cell survival and chemoresistance in osteosarcoma via interaction with NDRG1. Biochim. Biophys. Acta Mol. Basis Dis. 1864, 1839–1849 (2018).

  103. Yan, X., Chua, M.-S., Sun, H. & So, S. N-Myc down-regulated gene 1 mediates proliferation, invasion, and apoptosis of hepatocellular carcinoma cells. Cancer Lett. 262, 133–142 (2008).

  104. Cheng, J. et al. NDRG1 as a biomarker for metastasis, recurrence and of poor prognosis in hepatocellular carcinoma. Cancer Lett. 310, 35–45 (2011).

  105. Hua, Y. et al. Plasma membrane proteomic analysis of human osteosarcoma and osteoblastic cells: revealing NDRG1 as a marker for osteosarcoma. Tumour Biol. 32, 1013–1021 (2011).

  106. Graf, S. A. et al. The myelin protein PMP2 is regulated by SOX10 and drives melanoma cell invasion. Pigment Cell Melanoma Res. 32, 424–434 (2019).

  107. Cheng, L. et al. Integration of genomic copy number variations and chemotherapy-response biomarkers in pediatric sarcoma. BMC Med. Genom. 12, 23 (2019).

  108. Guo, Q., Sun, H., Zheng, K., Yin, S. & Niu, J. Long non-coding RNA DLX6-AS1/miR-141-3p axis regulates osteosarcoma proliferation, migration and invasion through regulating Rab10. RSC Adv. 9, 33823–33833 (2019).

  109. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010).

  110. Mito, J. K. et al. Cross species genomic analysis identifies a mouse model as undifferentiated pleomorphic sarcoma/malignant fibrous histiocytoma. PLoS ONE 4, e8075 (2009).

  111. Capra, M. et al. Frequent alterations in the expression of serine/threonine kinases in human cancers. Cancer Res. 66, 8147–8154 (2006).

  112. Pandey, P. et al. Amyloid precursor protein and amyloid precursor-like protein 2 in cancer. Oncotarget 7, 19430–19444 (2016).

  113. Woicik, A. addiewc/Sagittarius: Sagittarius. Zenodo https://doi.org/10.5281/zenodo.7879454 (2023).

  114. Woicik, A. Simulated EvoDevo dataset. figshare https://doi.org/10.6084/m9.figshare.20425572 (2022).

  115. Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. in Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) vol. 32 8026–8037 (Curran Associates, Inc., 2019).

  116. Virshup, I., Rybakov, S., Theis, F. J., Angerer, P. & Wolf, F. A. anndata: annotated data. Preprint at bioRxiv https://doi.org/10.1101/2021.12.16.473007 (2021).

  117. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).

  118. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  119. The Pandas Development Team. pandas-dev/pandas: pandas. Zenodo https://doi.org/10.5281/zenodo.7857418 (2023).

  120. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  121. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  122. Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).

  123. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 92–96 https://doi.org/10.25080/majora-92bf1922-011 (SciPy, 2010).

  124. da Costa-Luis, C. et al. tqdm: a fast, extensible progress bar for Python and CLI. Zenodo https://doi.org/10.5281/zenodo.7697295 (2023).

  125. Davidson-Pilon, C. lifelines: survival analysis in Python. J. Open Source Softw. 4, 1317 (2019).

Acknowledgements

S.W. is supported by the Sony Research Award. Figures 1, 4a,b and 6a,f were created with BioRender.com.

Author information

Contributions

A.W. and S.W. conceptualized the work and designed the method. A.W., S.W. and J.M. designed the experiments. A.W. and M.Z. ran the experiments, and A.W., M.Z. and J.C. developed computational tools for Sagittarius. A.W. and S.W. wrote the manuscript and designed the figures.

Corresponding author

Correspondence to Sheng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Chenling Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Note 1, Tables 1 and 2 and Figs. 1–28.

Reporting Summary

Supplementary Data 1

Enriched processes for gene sets that Sagittarius extrapolates accurately. Gene Ontology terms enriched by the genes that Sagittarius predicts with at least 0.4 test Pearson correlation comparing timepoints, along with the number of species for which the term is enriched and the enrichment analysis’s two-sided Fisher exact test P value (with Bonferroni correction for multiple hypothesis testing) for each time series.
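The gene selection and multiple-testing steps described for Supplementary Data 1 can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes the Pearson correlation is computed per gene between measured and predicted expression across held-out timepoints, and it takes enrichment P values as given rather than running the Fisher exact test itself. The function names and the example values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def accurately_predicted_genes(measured, predicted, threshold=0.4):
    """Keep genes whose predicted-vs-measured correlation meets the threshold."""
    # NaN compares False against the threshold, so constant genes are dropped.
    return [
        gene for gene in measured
        if pearson(measured[gene], predicted[gene]) >= threshold
    ]

def bonferroni(p_values):
    """Bonferroni correction: multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical expression values at four held-out timepoints.
measured = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.0, 2.0, 3.0, 4.0],
    "geneC": [2.0, 2.0, 2.0, 2.0],
}
predicted = {
    "geneA": [1.1, 1.9, 3.2, 3.8],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 2.0, 3.0, 4.0],
}

kept = accurately_predicted_genes(measured, predicted)
adjusted = bonferroni([0.01, 0.02, 0.5])
```

Here only `geneA` passes the 0.4 cut-off: `geneB` is anticorrelated with its prediction, and `geneC` is constant, so its correlation is undefined. The retained gene set would then be passed to a Gene Ontology enrichment test, with the resulting P values corrected as in `bonferroni`.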

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Woicik, A., Zhang, M., Chan, J. et al. Extrapolating heterogeneous time-series gene expression data using Sagittarius. Nat Mach Intell 5, 699–713 (2023). https://doi.org/10.1038/s42256-023-00679-5

  • DOI: https://doi.org/10.1038/s42256-023-00679-5
