Found In Translation: a machine learning model for mouse-to-human inference


Cross-species differences form barriers to translational research that ultimately hinder the success of clinical trials, yet knowledge of species differences has not been systematically incorporated in the interpretation of animal models. Here we present Found In Translation (FIT), a statistical methodology that leverages public gene expression data to extrapolate the results of a new mouse experiment to expression changes in the equivalent human condition. We applied FIT to data from mouse models of 28 different human diseases and identified experimental conditions in which FIT predictions outperformed direct cross-species extrapolation from mouse results, increasing the overlap of differentially expressed genes by 20–50%. FIT predicted novel disease-associated genes, an example of which we validated experimentally. FIT highlights signals that may otherwise be missed and reduces false leads, with no experimental cost.


Fig. 1: Disease versus control gene expression datasets are markedly dissimilar across species.
Fig. 2: FIT increases TP fractions compared to the mouse model.
Fig. 3: FIT rescues human-disease-relevant genes that could not have been found on the basis of mouse data alone.

Data availability

All gene expression data were acquired from the public domain and are listed in Supplementary Table 1. The training data used by FIT can be downloaded from the site and package mentioned under the “Code availability” section.




We thank M. Davis, A. Butte and L. Steinman for discussions that inspired this work; Y. Shaked, T. Shlomi, Z. Yakhini, Y. Reiter and E. Borenstein for fruitful discussions; Y. Chowers for his assistance with the IBD-related experiment and analysis; and D. Cohen for computing help. This research was supported by Israel Science Foundation (ISF) grant 1365/12 and the Rappaport Institute of Biomedical Research (both to S.S.S.-O.). R.J.T. was supported by NSF grant no. DMS1208164 and NIH grant no. 5R01 EB001988-16.

Author information




R.N., W.D., R.J.T. and S.S.S.-O. designed the algorithm. R.N. and W.D. implemented the algorithm. R.N. analyzed all the results and implemented the pre-processing of the gene expression data. R.N. and S.S.S.-O. wrote the manuscript. R.N., M.B. and G.S.M. manually collected and annotated the training data. R.G. and A.Z.K. implemented a tool that allowed systematic annotations of GEO datasets. R.G. assisted with the pre-processing of the gene expression datasets. M.B. and E.S. performed the experimental validation. E.S. assisted with the interpretation of GSEA results.

Corresponding author

Correspondence to Shai S. Shen-Orr.

Ethics declarations

Competing interests

S.S.S.-O. and E.S. are scientific advisers and hold equity in CytoReason. R.G. is an employee of CytoReason and holds equity. All other authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 The structure of the human–mouse gene expression training compendium.

(a) Expression datasets were assembled from the Gene Expression Omnibus (GEO) and were manually curated. Each dataset contains at least 3 control samples and at least 3 disease samples. GEO datasets within each species were divided into datasets of matching disease and control samples from parallel conditions. (b) The compendium contains 170 Cross-Species Pairings (CSPs) from 28 different diseases, spanning 3,033 human samples and 1,181 mouse samples. We divide the datasets into two types of CSPs: Standard (ST), which include human and mouse datasets that were conducted in separate experiments; and Reference (RF), in which the human and mouse datasets were directly contrasted in a publication authored by researchers who had generated at least one of the datasets. (c) Number of CSPs per disease. (d) The composition of the training compendium by technology and dataset type. (e) Summary of platform types included in the compendium in human (top) and mouse (bottom). (f) Summary of tissue types included in the training compendium in human (top) and mouse (bottom).

Supplementary Figure 2 True positive fractions in different diseases and various fold-change thresholds.

True positive fractions (mean ± s.e.) for every disease for cross-species pairings (blue), intra-human pairings (green) and intra-mouse pairings (pink) are plotted across fold-change thresholds. q-value threshold = 0.05. Number of CSPs per dataset is indicated in parentheses. Number of CSPs and intra-species pairs is the same as in Fig. 1c.

Supplementary Figure 3 Cross-species comparisons exhibit significantly lower fraction of true positive genes compared to intra-species.

TP fractions (mean ± s.e.) of all 170 cross-species pairings are significantly lower than those of intra-species pairings (blue and black, respectively) across 117 fold-change and q-value threshold combinations (P-value < 10^-16 by one-tailed Mann-Whitney on mean values, n = 117 as the number of threshold combinations). On average, cross-species TP fractions are a quarter of the size of intra-species TP fractions.

Supplementary Figure 4 FIT’s performance varies across CSPs but can be predicted based on the input mouse gene expression.

(a) Classification of CSPs in each disease into the five performance classes shown in Fig. 2d. The FIT-improved disease set (marked in bold) comprises diseases in which most of the CSPs were improved by FIT (major or minor signal improvement). The diseases are sorted by the number of CSPs that were improved. (b) A distinction between good and poor FIT performance is apparent in the first two principal components based on the mouse expression of 4,957 genes. A good FIT performance was defined as a true positive ratio > 1.1, meaning that FIT was able to identify at least 10% more human-relevant genes compared to the mouse.
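The performance criterion in (b) can be illustrated with a minimal Python sketch. The caption does not define its terms, so the sketch assumes that a TP fraction is the share of genes called differentially expressed by a predictor (the mouse data or FIT) that are also differentially expressed in the matching human dataset, and that the TP ratio is the FIT TP fraction divided by the mouse TP fraction. The function names and the toy gene sets are hypothetical and are not taken from the FIT package.

```python
def tp_fraction(predicted_degs, human_degs):
    """Share of predicted DEGs that are also human DEGs (assumed definition)."""
    predicted = set(predicted_degs)
    if not predicted:
        return 0.0
    return len(predicted & set(human_degs)) / len(predicted)


def tp_ratio(fit_degs, mouse_degs, human_degs):
    """FIT TP fraction divided by mouse TP fraction (assumed definition)."""
    mouse_tp = tp_fraction(mouse_degs, human_degs)
    if mouse_tp == 0:
        # The ratio is undefined when the mouse data recover no human DEGs.
        return float("nan")
    return tp_fraction(fit_degs, human_degs) / mouse_tp


def is_good_fit_performance(ratio, threshold=1.1):
    """Apply the caption's cutoff: TP ratio > 1.1."""
    return ratio > threshold


# Toy gene sets, invented purely for illustration.
human_degs = {"G1", "G2", "G3", "G4"}
mouse_degs = {"G1", "G5", "G6", "G7"}  # 1 of 4 confirmed in human
fit_degs = {"G1", "G2", "G5", "G8"}    # 2 of 4 confirmed in human

ratio = tp_ratio(fit_degs, mouse_degs, human_degs)
print(ratio, is_good_fit_performance(ratio))
```

Under these assumptions, a ratio just above 1 (for example 1.05) would not count as good performance, because the cutoff requires at least a 10% gain over the mouse data.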

Supplementary Figure 5 FIT improves TP ratios for a subset of diseases and tissues.

Box plots of TP ratios for 170 CSPs at a q-value threshold = 0.1 and fold-change threshold = 0.25, ordered by (a) disease, (b) mouse source tissue and (c) the human benchmark tissue that FIT results are compared to. Significant differences are observed in all categories (P-value < 10^-6, 10^-4 and 10^-6 by ANOVA for disease, mouse tissue and human tissue, respectively, at the same thresholds). The number of CSPs in each group is stated in parentheses in the y-axis labels. With the present training set, FIT increased true positive fractions (log TP ratios > 0) predominantly in infectious diseases and inflammatory conditions yet performed poorly in cancer. For mouse tissues, FIT performed well on spleen, blood, lung and gut, whereas for human tissues FIT’s performance was best when the samples benchmarked against were blood or gut. Boxes represent 25th and 75th percentiles around the median (line). Whiskers, 1.5× the IQR.

Supplementary Figure 6 A control set of genes with 30% smallest CIs does not show higher TP fractions.

Using genes with the 30% smallest CIs significantly increases the TP fractions (red vs. pink, P ≤ 10^-16, one-tailed Mann-Whitney on mean values, also shown in Fig. 3a). A control set of 30% random genes does not show a significant difference from FIT’s predictions (pink vs. bold gray).

Supplementary Figure 7 FIT rescues human DEGs undetected by the mouse with increased performance by relying on confidence intervals.

(a) The number of putative genes (genes predicted by FIT to be human-relevant, but not by the mouse) is shown for each disease in the FIT-improved disease set. About one-third of the putative genes are validated by the human data (i.e. rescued). The percentage of rescued genes out of the putative genes per disease is shown next to each bar. (b) Confidence intervals (CIs) can be used as reliability scores for FIT predictions. Putative genes having the 30% smallest CIs are validated as human DEGs at a higher fraction compared to all putative genes, across all FIT-improved diseases.
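The CI-based filtering in (b) can be sketched in Python as follows. The sketch assumes that each putative gene comes with a lower and an upper confidence bound on its predicted effect and that "smallest CIs" means smallest interval width. The data layout, gene names and numbers are invented for illustration; they are not FIT's output format or results from the paper.

```python
def smallest_ci_genes(intervals, keep_percent=30):
    """Return the genes whose CI width is among the smallest `keep_percent` percent."""
    widths = {gene: hi - lo for gene, (lo, hi) in intervals.items()}
    # Integer arithmetic avoids floating-point surprises when taking 30% of n.
    n_keep = max(1, len(widths) * keep_percent // 100)
    ranked = sorted(widths, key=lambda g: (widths[g], g))  # ties broken by name
    return ranked[:n_keep]


def rescued_fraction(genes, human_degs):
    """Share of the given genes that are confirmed as DEGs in the human data."""
    genes = list(genes)
    if not genes:
        return 0.0
    return sum(g in human_degs for g in genes) / len(genes)


# Hypothetical (lower, upper) confidence bounds for ten putative genes.
putative_cis = {
    "GENE_A": (0.8, 1.0), "GENE_B": (0.5, 1.5), "GENE_C": (1.0, 1.3),
    "GENE_D": (-0.2, 1.2), "GENE_E": (0.6, 1.0), "GENE_F": (0.1, 1.9),
    "GENE_G": (0.3, 1.1), "GENE_H": (0.0, 2.0), "GENE_I": (0.4, 1.6),
    "GENE_J": (0.2, 0.8),
}
human_degs = {"GENE_A", "GENE_C", "GENE_G"}  # toy "validated" set

confident = smallest_ci_genes(putative_cis)
print(confident)
print(rescued_fraction(putative_cis, human_degs))
print(rescued_fraction(confident, human_degs))
```

In this toy example the narrow-interval subset has a higher rescued fraction than the full putative set, which mirrors the pattern the caption describes without reproducing any of its values.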

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7

Reporting Summary

Supplementary Table 1

Training data details

Supplementary Table 2

FIT putative genes per disease

Supplementary Table 3

Gene DEG summary

Supplementary Software

FIT package at the paper acceptance time

About this article

Cite this article

Normand, R., Du, W., Briller, M. et al. Found In Translation: a machine learning model for mouse-to-human inference. Nat Methods 15, 1067–1073 (2018).
