Article | Published:

Found In Translation: a machine learning model for mouse-to-human inference

Nature Methods volume 15, pages 1067–1073 (2018)


Cross-species differences form barriers to translational research that ultimately hinder the success of clinical trials, yet knowledge of species differences has yet to be systematically incorporated in the interpretation of animal models. Here we present Found In Translation (FIT), a statistical methodology that leverages public gene expression data to extrapolate the results of a new mouse experiment to expression changes in the equivalent human condition. We applied FIT to data from mouse models of 28 different human diseases and identified experimental conditions in which FIT predictions outperformed direct cross-species extrapolation from mouse results, increasing the overlap of differentially expressed genes by 20–50%. FIT predicted novel disease-associated genes, an example of which we validated experimentally. FIT highlights signals that may otherwise be missed and reduces false leads, with no experimental cost.
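As a minimal illustration of the baseline that FIT is compared against, the sketch below performs a direct cross-species extrapolation: mouse differentially expressed genes are mapped to human orthologs and overlapped with the human differentially expressed genes. The gene identifiers, the ortholog table and the function names are hypothetical, and the overlap measure is an assumption rather than the exact metric used in the paper.

```python
def map_to_human(mouse_degs, orthologs):
    """Translate mouse gene IDs to human IDs, dropping genes with no ortholog."""
    return {orthologs[g] for g in mouse_degs if g in orthologs}


def overlap_fraction(predicted, human_degs):
    """Fraction of predicted genes that are also differentially expressed in human."""
    if not predicted:
        return 0.0
    return len(predicted & human_degs) / len(predicted)


# Toy data (hypothetical identifiers, not taken from the paper).
orthologs = {"m_a": "H_A", "m_b": "H_B", "m_c": "H_C", "m_d": "H_D"}
mouse_degs = {"m_a", "m_b", "m_d", "m_x"}  # m_x has no ortholog in the table
human_degs = {"H_A", "H_B", "H_C"}

direct = map_to_human(mouse_degs, orthologs)    # mouse result read as human
baseline = overlap_fraction(direct, human_degs)  # share confirmed in human
print(sorted(direct), round(baseline, 3))
```

Any predictor of human expression changes could be scored with the same `overlap_fraction` call, which makes the comparison against the direct mouse readout straightforward.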

Data availability

All gene expression data were acquired from the public domain and are listed in Supplementary Table 1. The training data used by FIT can be downloaded from the site and package mentioned under the “Code availability” section.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Acknowledgements

We thank M. Davis, A. Butte and L. Steinman for discussions that inspired this work; Y. Shaked, T. Shlomi, Z. Yakhini, Y. Reiter and E. Borenstein for fruitful discussions; Y. Chowers for his assistance with the IBD-related experiment and analysis; and D. Cohen for computing help. This research was supported by Israel Science Foundation (ISF) grant 1365/12 and the Rappaport Institute of Biomedical Research (both to S.S.S.-O.). R.J.T. was supported by NSF grant no. DMS1208164 and NIH grant no. 5R01 EB001988-16.

Author information


  1. Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel

    Rachelly Normand, Mayan Briller, Renaud Gaujoux, Elina Starosvetsky, Amit Ziv-Kenet, Gali Shalev-Malul & Shai S. Shen-Orr

  2. Department of Statistics, Stanford University, Stanford, CA, USA

    Wenfei Du & Robert J. Tibshirani

  3. Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA

    Robert J. Tibshirani



Contributions

R.N., W.D., R.J.T. and S.S.S.-O. designed the algorithm. R.N. and W.D. implemented the algorithm. R.N. analyzed all the results and implemented the pre-processing of the gene expression data. R.N. and S.S.S.-O. wrote the manuscript. R.N., M.B. and G.S.M. manually collected and annotated the training data. R.G. and A.Z.K. implemented a tool that allowed systematic annotations of GEO datasets. R.G. assisted with the pre-processing of the gene expression datasets. M.B. and E.S. performed the experimental validation. E.S. assisted with the interpretation of GSEA results.

Competing interests

S.S.S.-O. and E.S. are scientific advisers and hold equity in CytoReason. R.G. is an employee of CytoReason and holds equity. All other authors declare no competing interests.

Corresponding author

Correspondence to Shai S. Shen-Orr.

Integrated supplementary information

  1. Supplementary Figure 1 The structure of the human–mouse gene expression training compendium.

    (a) Expression datasets were assembled from the Gene Expression Omnibus (GEO) and were manually curated. Each dataset contains at least 3 control samples and at least 3 disease samples. GEO datasets within each species were divided into datasets of matching disease and control samples from parallel conditions. (b) The compendium contains 170 Cross-Species Pairings (CSPs) from 28 different diseases, spanning 3,033 human samples and 1,181 mouse samples. We divide the datasets into two types of CSPs: Standard (ST), which include human and mouse datasets that were conducted in separate experiments; and Reference (RF), in which the human and mouse datasets were directly contrasted in a publication authored by researchers who had generated at least one of the datasets. (c) Number of CSPs per disease. (d) The composition of the training compendium by technology and dataset type. (e) Summary of platform types included in the compendium in human (top) and mouse (bottom). (f) Summary of tissue types included in the training compendium in human (top) and mouse (bottom).

  2. Supplementary Figure 2 True positive fractions in different diseases and various fold-change thresholds.

    True positive fractions (mean ± s.e.) for every disease for cross-species pairings (blue), intra-human pairings (green) and intra-mouse pairings (pink) are plotted across fold-change thresholds. q-value threshold = 0.05. Number of CSPs per dataset is indicated in parentheses. Number of CSPs and intra-species pairs is the same as in Fig. 1c.

  3. Supplementary Figure 3 Cross-species comparisons exhibit significantly lower fraction of true positive genes compared to intra-species.

    TP fractions (mean ± s.e.) of all 170 cross-species pairings are significantly lower than those of intra-species pairings (blue and black, respectively) across 117 fold-change and q-value threshold combinations (P < 10⁻¹⁶ by one-tailed Mann-Whitney test on mean values; n = 117, the number of threshold combinations). On average, cross-species TP fractions are a quarter of the size of intra-species TP fractions.

  4. Supplementary Figure 4 FIT’s performance varies across CSPs but can be predicted based on the input mouse gene expression.

    (a) Classification of CSPs in each disease into the five performance classes shown in Fig. 2d. The FIT-improved disease set (marked in bold) comprises diseases in which most of the CSPs were improved by FIT (major or minor signal improvement). The diseases are sorted by the number of CSPs that were improved. (b) A distinction between good and poor FIT performance is apparent in the first two principal components based on the mouse expression of 4,957 genes. Good FIT performance was defined as a true positive ratio > 1.1, meaning that FIT identified at least 10% more human-relevant genes than the mouse data.

  5. Supplementary Figure 5 FIT improves TP ratios for a subset of diseases and tissues.

    Box plots of TP ratios for 170 CSPs at a q-value threshold = 0.1 and fold-change threshold = 0.25, ordered by (a) disease, (b) mouse source tissue and (c) the human benchmark tissue to which FIT results are compared. Significant differences are observed in all categories (P < 10⁻⁶, 10⁻⁴ and 10⁻⁶ by ANOVA for disease, mouse tissue and human tissue, respectively). The number of CSPs in each group is stated in parentheses in the y-axis labels. With the present training set, FIT increased true positive fractions (log TP ratios > 0) predominantly in infectious diseases and inflammatory conditions yet performed poorly in cancer. For mouse tissues, FIT performed well on spleen, blood, lung and gut, whereas for human tissues FIT’s performance was best when the samples benchmarked against were blood or gut. Boxes represent 25th and 75th percentiles around the median (line). Whiskers, 1.5× the IQR.

  6. Supplementary Figure 6 A control set of genes with 30% smallest CIs does not show higher TP fractions.

    Using genes with the 30% smallest CIs significantly increases the TP fractions (red vs. pink, P ≤ 10⁻¹⁶, one-tailed Mann-Whitney test on mean values, also shown in Fig. 3a). A control set of 30% random genes does not show a significant difference from FIT’s predictions (pink vs. bold gray).

  7. Supplementary Figure 7 FIT rescues human DEGs undetected by the mouse with increased performance by relying on confidence intervals.

    (a) The number of putative genes (genes predicted by FIT to be human-relevant, but not by the mouse) is shown for each disease in the FIT-improved disease set. About one-third of the putative genes are validated by the human data (i.e. rescued). The percentage of rescued genes out of the putative genes per disease is shown next to each bar. (b) Confidence intervals (CIs) can be used as reliability scores for FIT predictions. Putative genes with the 30% smallest CIs are validated as human DEGs at a higher fraction than all putative genes, across all FIT-improved diseases.
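
The quantities named in Supplementary Figures 4–7 can be sketched as follows. This is an illustration under stated assumptions, not the code of the FIT package: the TP fraction is taken to be the share of predicted genes confirmed in the human data, the TP ratio to be the FIT TP fraction divided by the mouse TP fraction, and the CI width to be the upper bound minus the lower bound. All function names, gene identifiers and values are hypothetical.

```python
def tp_fraction(predicted, human_degs):
    """Assumed definition: share of predicted genes that are human DEGs."""
    if not predicted:
        return 0.0
    return len(predicted & human_degs) / len(predicted)


def tp_ratio(fit_pred, mouse_pred, human_degs):
    """Assumed definition: FIT TP fraction relative to the mouse TP fraction."""
    mouse_tp = tp_fraction(mouse_pred, human_degs)
    if mouse_tp == 0:
        raise ValueError("mouse TP fraction is zero; ratio undefined")
    return tp_fraction(fit_pred, human_degs) / mouse_tp


def putative_and_rescued(fit_pred, mouse_pred, human_degs):
    """Putative: predicted by FIT but not by the mouse. Rescued: confirmed in human."""
    putative = fit_pred - mouse_pred
    return putative, putative & human_degs


def smallest_ci_subset(ci_bounds, genes, percent=30):
    """Keep the given percentage of genes with the narrowest confidence intervals."""
    if not genes:
        return set()
    ranked = sorted(genes, key=lambda g: ci_bounds[g][1] - ci_bounds[g][0])
    keep = max(1, len(ranked) * percent // 100)  # integer arithmetic, at least one gene
    return set(ranked[:keep])


# Toy data (hypothetical; not results from the paper).
human_degs = {"G1", "G2", "G3", "G4"}
mouse_pred = {"G1", "G2", "G9"}
fit_pred = {"G1", "G2", "G3", "G4", "G7"}
ci_bounds = {"G3": (0.4, 0.6), "G4": (0.1, 0.9), "G7": (-0.5, 1.0)}

ratio = tp_ratio(fit_pred, mouse_pred, human_degs)
good_performance = ratio > 1.1  # threshold quoted in Supplementary Figure 4
putative, rescued = putative_and_rescued(fit_pred, mouse_pred, human_degs)
confident = smallest_ci_subset(ci_bounds, putative)
print(round(ratio, 2), good_performance, sorted(rescued), sorted(confident))
```

Ranking by CI width keeps the reliability filter independent of the human data, so it can be applied to a new mouse experiment before any human benchmark is available.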

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–7

  2. Reporting Summary

  3. Supplementary Table 1

    Training data details

  4. Supplementary Table 2

    FIT putative genes per disease

  5. Supplementary Table 3

    Gene DEG summary

  6. Supplementary Software

    FIT package at the paper acceptance time

About this article

Publication history