Cytokines are signaling molecules secreted and sensed by immune and other cell types, enabling dynamic intercellular communication. Although a vast amount of data on these interactions exists, this information is not compiled, integrated or easily searchable. Here we report immuneXpresso, a text-mining engine that structures and standardizes knowledge of immune intercellular communication. We applied immuneXpresso to PubMed to identify relationships between 340 cell types and 140 cytokines across thousands of diseases. The method is able to distinguish between incoming and outgoing interactions, and it includes the effect of the interaction and the cellular function involved. These factors are assigned a confidence score and linked to the disease. By leveraging the breadth of this network, we predicted and experimentally verified previously unappreciated cell–cytokine interactions. We also built a global immune-centric view of diseases and used it to predict cytokine–disease associations. This standardized knowledgebase (http://www.immunexpresso.org) opens up new directions for interpretation of immune data and model-driven systems immunology.
We thank A. Butte and M. Davis for fruitful discussions and advice, N. Geifman for assistance with cytokine ontology development, D. Dougall for contribution to the cell lexicon, members of the Shen-Orr lab for reference book curation, D. Cohen for high-performance computing cluster support, R. Reichart for text-mining insights, and P. Dunn and S. Bhattacharya for user interface development support. This work was supported by the US National Institutes of Health (NIH)-National Institute of Allergy and Infectious Diseases (U19 AI057229, and BISC contract HHSN272201200028C) and an award from the Rappaport Family Institute for Research in the Medical Sciences (S.S.S.-O.).
K.K. and Y.K. are employees and co-founders of CytoReason. S.S.S.-O. and E.S. are co-founders of, and serve as scientific advisors and/or consultants to, CytoReason.
Integrated supplementary information
Supplementary Figure 1 System level characteristics of inter-cellular information flow in the literature-derived network.
(a) Sankey plots showing bipartite information flow between CD4+ T-cell subsets and a variety of cytokines. (b) A sorted histogram illustrating the number of unique cell types secreting each of the 114 cytokines (outgoing interactions). A second y-axis displays the information as a cumulative sum (blue line). 50% of outgoing edges are attributed to only 17 (15%) cytokines (grey area). Cytokine family classification appears as coloring of individual members along the x-axis. (c) Scatter plot highlighting the strong correlation in cytokine degrees between incoming and outgoing directions of the manually curated reference book interaction network (n=104 cytokines, r=0.69, pval<0.001, Pearson's).
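The cumulative-sum reading of panel (b) can be illustrated with a minimal sketch: rank entities by degree and count how many of the top ones are needed to reach a given share of all edges. The function name `hubs_covering` and the toy degree values are hypothetical and are not taken from the knowledgebase.

```python
from itertools import accumulate


def hubs_covering(degrees, fraction=0.5):
    """Return the highest-degree entities that together reach `fraction` of all edges."""
    # Rank entities from highest to lowest degree, as in a sorted histogram.
    ranked = sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(degrees.values())
    covered = []
    # Walk down the ranking while tracking the running (cumulative) edge count.
    for (name, _), running in zip(ranked, accumulate(d for _, d in ranked)):
        covered.append(name)
        if running >= fraction * total:
            break
    return covered


# Invented toy degrees (number of distinct secreting cell types per cytokine).
toy_degrees = {
    "cytokine_A": 40,
    "cytokine_B": 30,
    "cytokine_C": 15,
    "cytokine_D": 10,
    "cytokine_E": 5,
}
print(hubs_covering(toy_degrees))  # ['cytokine_A', 'cytokine_B']
```

In this toy case two of five entities account for half of the edges; the caption reports the analogous figure for the real network as 17 of 114 cytokines.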
Cytokine degree distribution analysis of (a) incoming and (b) outgoing interactions (n=144 and n=114 cytokines; bootstrap goodness-of-fit test, non-power law pval=0.73 and pval=0.47 respectively).
Heatmaps showing connectivity knowledge accumulation of cytokines and chemokines as a function of time and family membership. Families tend to have one dominant family member, suggesting a "rich get richer" phenomenon.
Supplementary Figure 4 Weak relation between cytokine connectivity degrees and historical knowledge accumulation.
The correlation of the cytokine degree and the date of first publication was repeatedly computed across cytokines, thresholded by minimal degree. A scatter plot showing this correlation when scaled back from the entire network to only a few highly dominant hubs (cytokine degree cutoff is shown along the x-axis). The entire network (n=145 cytokines) demonstrates low correlation, r=-0.27 and -0.26, Pearson's, for (a) incoming and (b) outgoing interactions respectively, driven by a few highly dominant hubs.
Copy number profile based on the proteomics data from ImmProt (n=4 or n=3 independent donors). A protein was considered expressed in a cell type if more than 50% of replicates showed detectable non-zero values. Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1–1.5 × IQR to Q3+1.5 × IQR; data points.
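The expression rule stated in this caption (more than 50% of replicates with detectable non-zero values) can be written as a short check. This is a sketch under assumptions: the function name `is_expressed` is hypothetical, and undetected measurements are assumed to be encoded as `0` or `None`, which the caption does not specify.

```python
def is_expressed(copy_numbers, min_fraction=0.5):
    """True if strictly more than `min_fraction` of replicates are detected (non-zero)."""
    if not copy_numbers:
        return False
    # A replicate counts as detected when it has a value greater than zero.
    detected = sum(1 for value in copy_numbers if value is not None and value > 0)
    return detected / len(copy_numbers) > min_fraction


# Invented replicate values for illustration only.
print(is_expressed([0, 0, 12.0, 8.5]))    # 2 of 4 detected: not more than 50%
print(is_expressed([0, 3.0, 12.0, 8.5]))  # 3 of 4 detected
print(is_expressed([None, 5.0, 2.0]))     # 2 of 3 detected
```

With n=4 donors the strict inequality requires at least three detections, and with n=3 donors at least two.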
Sorted boxplots showing results of repeated paper sampling (200 iterations, each randomly selecting 200 abstracts), and calculating the proportion of hits for each cell (left) and cytokine (right). The entire corpus of 521,625 disease-HPC and 438,012 disease-cytokine co-occurrence papers is used, without limiting to any condition, to define pan-disease control immune profile. Cell subset and cytokine family classification appears as coloring of individual members across y-axis. Top 50% of the results are shown, with highest cited entities emphasized (grey area, median>=0.05). Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1–1.5 × IQR to Q3+1.5 × IQR; points, outliers.
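The repeated-sampling procedure in this caption (200 iterations of 200 randomly selected abstracts, with the proportion of hits computed per entity) can be sketched as follows. The function name, the representation of each abstract as a set of mentioned entity names, sampling without replacement, and the toy corpus are all assumptions made for illustration.

```python
import random
from statistics import median


def sampled_hit_proportions(abstracts, entity, iterations=200, sample_size=200, seed=0):
    """Repeatedly sample abstracts and return, per iteration, the share mentioning `entity`.

    `abstracts` is a list in which each item is the set of entity names
    recognized in one abstract (an assumed, simplified representation).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    proportions = []
    for _ in range(iterations):
        # Assumption: sampling without replacement within each iteration.
        sample = rng.sample(abstracts, sample_size)
        hits = sum(1 for mentioned in sample if entity in mentioned)
        proportions.append(hits / sample_size)
    return proportions


# Invented toy corpus: every tenth abstract mentions "entity_x".
toy_corpus = [{"entity_x"} if i % 10 == 0 else set() for i in range(1000)]
proportions = sampled_hit_proportions(toy_corpus, "entity_x")
print(len(proportions), median(proportions))
```

Each entity then contributes one box to the plot, summarizing its 200 sampled proportions; the caption emphasizes entities whose median is at least 0.05.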
Supplementary Figure 7 Literature-based evaluation of novel cytokine-disease association predictions.
Evaluation of the predicted cytokine-disease associations within the entire, non-sampled iX knowledgebase. The amounts of total predicted associations (red), those verified as already reported in the literature (green) and the other, potentially novel candidates (blue) are shown separately for associations with different prediction strength (x-axis, scaled to 1).
Illustration of cell phrase expansion from the "cell" seed match by (a) inclusion of the seed and the dependents it governs or (b) by inclusion of the seed word's governor and all the other dependents of that governor.
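The two expansion variants described in this caption can be sketched on a toy dependency parse stored as a list of head indices. This is not the immuneXpresso implementation: the function names are hypothetical, the parses are written by hand rather than produced by a parser, and only direct dependents are included, which is an assumption.

```python
def dependents(heads, index):
    """Indices of tokens whose head is `index` (direct dependents only)."""
    return [i for i, head in enumerate(heads) if head == index]


def expand_down(tokens, heads, seed):
    """Variant (a): the seed plus the dependents it governs."""
    keep = sorted([seed] + dependents(heads, seed))
    return " ".join(tokens[i] for i in keep)


def expand_up(tokens, heads, seed):
    """Variant (b): the seed's governor plus all dependents of that governor."""
    governor = heads[seed]
    if governor is None:  # the seed is the root, so there is nothing to climb to
        return tokens[seed]
    keep = sorted({governor, *dependents(heads, governor)})
    return " ".join(tokens[i] for i in keep)


# heads[i] is the index of token i's governor; None marks the root.
# (a) "cells" governs its two modifiers.
print(expand_down(["activated", "T", "cells"], [2, 2, None], seed=2))  # activated T cells
# (b) "cell" is itself a modifier of "subsets", so expansion climbs to the governor.
print(expand_up(["T", "cell", "subsets"], [2, 2, None], seed=1))  # T cell subsets
```

Keeping the selected indices sorted preserves the original word order of the phrase.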
Supplementary Figures 1–8 (PDF 1399 kb)
Cell seed recognition statistics. (XLSX 19 kb)
Blacklist of Cell Ontology nodes. (XLSX 10 kb)
Cytokine recognition statistics. (XLSX 13 kb)
Cytokine lexicon fragment for CXC chemokine family. (XLSX 65 kb)
Verb classification lexicon (XLSX 25 kb)
Cell entity fields and manual precision evaluation results. (XLSX 40 kb)
Cytokine entity fields and manual precision evaluation results. (XLSX 39 kb)
Disease entity fields and manual precision evaluation results. (XLSX 39 kb)
Manual precision evaluation for noun phrase-internal relation evidence records. (XLSX 38 kb)
Manual precision evaluation for non-noun phrase-internal relation evidence records. (XLSX 146 kb)
ImmuneXpresso counts (XLSX 12 kb)
PubMed ids and statistics for relation evidence records (XLSX 2127 kb)
Disease term recognition statistics. (XLSX 364 kb)
Novel incoming cell-cytokine interaction candidates. (XLSX 64 kb)
Novel outgoing cell-cytokine interaction candidates. (XLSX 57 kb)
Detailed profiles of 188 top-cited diseases. (XLSX 1090 kb)
Cytokine sampling for 188 top-cited diseases (XLSX 78 kb)
Novel cytokine-disease association candidates (XLSX 26 kb)
Supplementary Notes 1–9 (PDF 967 kb)
About this article
Cite this article
Kveler, K., Starosvetsky, E., Ziv-Kenet, A. et al. Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed. Nat Biotechnol 36, 651–659 (2018) doi:10.1038/nbt.4152