

Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed

This article has been updated


Cytokines are signaling molecules secreted and sensed by immune and other cell types, enabling dynamic intercellular communication. Although a vast amount of data on these interactions exists, this information is not compiled, integrated or easily searchable. Here we report immuneXpresso, a text-mining engine that structures and standardizes knowledge of immune intercellular communication. We applied immuneXpresso to PubMed to identify relationships between 340 cell types and 140 cytokines across thousands of diseases. The method distinguishes between incoming and outgoing interactions and captures both the effect of each interaction and the cellular function involved. These annotated interactions are assigned a confidence score and linked to disease. By leveraging the breadth of this network, we predicted and experimentally verified previously unappreciated cell–cytokine interactions. We also built a global immune-centric view of diseases and used it to predict cytokine–disease associations. This standardized knowledgebase opens up new directions for the interpretation of immune data and for model-driven systems immunology.
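To make the structure of such a knowledgebase concrete, the following is a minimal Python sketch of how sentence-level evidence could be collapsed into directed cell–cytokine edges. The record schema (`Evidence`, its field names and the effect labels) and the function `assemble_network` are hypothetical illustrations, not immuneXpresso's actual data model. The confidence score and cellular-function annotation are omitted because the text does not specify how they are computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INCOMING = "incoming"  # the cell senses or responds to the cytokine
    OUTGOING = "outgoing"  # the cell secretes or produces the cytokine


@dataclass(frozen=True)
class Evidence:
    """One sentence-level relation extracted from an abstract (hypothetical schema)."""
    pmid: str
    cell: str
    cytokine: str
    direction: Direction
    effect: str   # hypothetical labels, e.g. "positive" or "negative"
    disease: str


def assemble_network(records):
    """Collapse evidence records into directed edges keyed by
    (cell, cytokine, direction), counting distinct supporting papers
    and collecting the diseases in which each edge was reported."""
    papers = defaultdict(set)
    diseases = defaultdict(set)
    for rec in records:
        key = (rec.cell, rec.cytokine, rec.direction)
        papers[key].add(rec.pmid)
        diseases[key].add(rec.disease)
    return {
        key: {"n_papers": len(papers[key]), "diseases": sorted(diseases[key])}
        for key in papers
    }
```

Keeping the direction in the edge key means that a cell secreting a cytokine and the same cell responding to it are stored as two distinct edges, mirroring the incoming/outgoing distinction.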


Figure 1: immuneXpresso assembles a system-level directional intercellular interaction network.
Figure 2: System-level characteristics of intercellular information flow in the literature-derived network.
Figure 3: Immune intercellular network knowledge is biased and far from saturated.
Figure 4: Prediction and validation of cell–cytokine interactions.
Figure 5: Global interaction profile analysis across diseases.
Figure 6: iX informs less well-studied diseases.

Change history

  • 16 August 2018

    In the HTML version of this paper initially published, it was labeled as an Article; this has been changed to Resource.




We thank A. Butte and M. Davis for fruitful discussions and advice, N. Geifman for assistance with cytokine ontology development, D. Dougall for contributions to the cell lexicon, members of the Shen-Orr lab for reference book curation, D. Cohen for high-performance computing cluster support, R. Reichart for text-mining insights, and P. Dunn and S. Bhattacharya for user interface development support. This work was supported by the US National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (U19 AI057229 and BISC contract HHSN272201200028C) and an award from the Rappaport Family Institute for Research in the Medical Sciences (S.S.S.-O.).

Author information


K.K. designed, performed and interpreted the analyses, led the design and development of the software, implemented relation extraction, filtering and network assembly, and wrote the manuscript; E.S. interpreted the analyses, performed quality control, designed and performed the experimental validations, and wrote the manuscript; A.Z.-K. implemented indexing, ontology support and reference book annotation software, and broadly contributed to the development of the entire computational pipeline; Y.K. conceived the pipeline architecture and broadly contributed to software development; Y.G. assisted with cell entity recognition; G.S.-M. performed quality control on text-mining output and assisted with cytokine ontology development; T.D. assisted with quality control on text-mining output; M.B. contributed to quality control and prediction evaluation; N.A.-R. wrote the software and implemented the website back end; J.C. and J.W. designed and developed the user interface; J.C.R. and F.M. provided and interpreted the proteomic data; N.A. assisted with machine learning for quality control; D.R. contributed to quality control and interpretation of disease profiles; and S.S.S.-O. conceived the idea, oversaw, designed and interpreted the analyses, and wrote the manuscript.

Corresponding author

Correspondence to Shai S Shen-Orr.

Ethics declarations

Competing interests

K.K. and Y.K. are employees and co-founders of CytoReason. S.S.S.-O. and E.S. are co-founders of, and serve as scientific advisors and/or consultants to, CytoReason.

Integrated supplementary information

Supplementary Figure 1 System-level characteristics of intercellular information flow in the literature-derived network.

(a) Sankey plots showing bipartite information flow between CD4+ T cell subsets and a variety of cytokines. (b) A sorted histogram of the number of unique cell types secreting each of the 114 cytokines (outgoing interactions). A second y axis displays the same information as a cumulative sum (blue line). Half of all outgoing edges are attributed to only 17 of the 114 cytokines (15%; grey area). Cytokine family membership is indicated by the coloring of individual members along the x axis. (c) Scatter plot highlighting the strong correlation between incoming and outgoing cytokine degrees in the manually curated reference book interaction network (n = 104 cytokines; Pearson's r = 0.69, P < 0.001).
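The "half of all outgoing edges" figure in panel (b) is a cumulative-share calculation over a degree-ranked list. A minimal sketch, assuming degrees are supplied as a hypothetical mapping from cytokine name to its number of distinct partner cell types:

```python
from itertools import accumulate


def cumulative_share(degrees):
    """Fraction of all edges covered by the top-1, top-2, ... nodes when
    nodes are ranked by degree (the cumulative curve of a sorted histogram)."""
    ranked = sorted(degrees.values(), reverse=True)
    total = sum(ranked)
    return [c / total for c in accumulate(ranked)]


def min_nodes_for_share(degrees, share=0.5):
    """Smallest number of top-ranked nodes whose edges reach `share` of the total."""
    for k, frac in enumerate(cumulative_share(degrees), start=1):
        if frac >= share:
            return k
    return len(degrees)
```

Dividing `min_nodes_for_share(degrees)` by `len(degrees)` gives the fraction of nodes involved, the quantity reported in the legend as 17 of 114 cytokines (15%).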

Supplementary Figure 2 Power-law distribution.

Cytokine degree distribution analysis of (a) incoming and (b) outgoing interactions (n = 144 and 114 cytokines; bootstrap goodness-of-fit test, non-power-law P = 0.73 and 0.47, respectively).
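A bootstrap goodness-of-fit test of this kind generally works by fitting a power law, measuring the Kolmogorov–Smirnov (KS) distance between the fit and the data, and comparing that distance with the KS distances of synthetic samples drawn from the fitted model. The sketch below is a simplified illustration, not the authors' analysis: it treats degrees as continuous, fixes the lower cutoff `xmin` instead of estimating it, and regenerates only the tail. In this construction, a large P value means the power-law hypothesis cannot be rejected.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent of a continuous power law for values >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(data, xmin, alpha):
    """KS distance between the empirical tail and the fitted CDF
    F(x) = 1 - (x / xmin) ** (1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail, start=1):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # compare with the empirical CDF just after and just before x
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from the continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def bootstrap_gof(data, xmin, n_boot=500, seed=0):
    """Return (alpha, observed KS distance, P value). P is the fraction of
    synthetic power-law samples whose refitted KS distance is at least the
    observed one; a small P argues against a power law."""
    rng = random.Random(seed)
    alpha = fit_alpha(data, xmin)
    d_obs = ks_distance(data, xmin, alpha)
    n_tail = sum(1 for x in data if x >= xmin)
    exceed = 0
    for _ in range(n_boot):
        synthetic = sample_power_law(n_tail, xmin, alpha, rng)
        d_syn = ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin))
        if d_syn >= d_obs:
            exceed += 1
    return alpha, d_obs, exceed / n_boot
```

Refitting the exponent on every synthetic sample matters: it makes the synthetic KS distances comparable to the observed one, which was also computed against a fitted rather than a known exponent.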

Supplementary Figure 3 Cytokine family leaders.

Heatmaps showing the accumulation of connectivity knowledge for cytokines and chemokines as a function of time and family membership. Families tend to have one dominant member, suggesting a "rich get richer" phenomenon.

Supplementary Figure 4 Weak relation between cytokine connectivity degrees and historical knowledge accumulation.

The correlation between cytokine degree and date of first publication was computed repeatedly across cytokines, each time restricting the set to cytokines above a minimum degree. The scatter plot shows how this correlation changes as the network is narrowed from all cytokines down to a few highly dominant hubs (the cytokine degree cutoff is shown along the x axis). The entire network (n = 145 cytokines) shows only a weak correlation (Pearson's r = −0.27 and −0.26 for (a) incoming and (b) outgoing interactions, respectively), which is driven by a few highly dominant hubs.
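The cutoff analysis can be sketched as follows: for each minimum degree, keep only the cytokines at or above it and recompute Pearson's r between degree and first-publication year. The input dictionaries, the function names and the `min_n` guard are hypothetical choices for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns nan if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlation_by_cutoff(degree, first_year, cutoffs, min_n=3):
    """For each degree cutoff, correlate degree with first-publication year
    over the cytokines whose degree is at least that cutoff. Cutoffs that
    leave fewer than `min_n` cytokines are skipped."""
    result = {}
    for cutoff in cutoffs:
        kept = [c for c in degree if degree[c] >= cutoff]
        if len(kept) < min_n:
            continue
        result[cutoff] = pearson([degree[c] for c in kept],
                                 [first_year[c] for c in kept])
    return result
```

Plotting the returned values against their cutoffs gives a curve of the kind described in the legend: if a handful of old, highly connected hubs drive the overall correlation, it should change markedly once the cutoff isolates them.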

Supplementary Figure 5 IL-34 receptor expression on terminally differentiated CD8+ T cells.

Copy number profile based on the proteomics data from ImmProt (n = 4 or n = 3 independent donors). A protein was considered expressed in a cell type if more than 50% of replicates showed detectable, non-zero values. Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR; points, individual data points.
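The expression call and the whisker limits described in this legend can be written down directly. In this sketch, missing measurements are represented as `None` (an assumption about the input format), and quartiles use Python's `statistics.quantiles` with `method="inclusive"`; plotting libraries differ in their quartile conventions, so the limits may not match the figure exactly.

```python
import statistics


def is_expressed(copy_numbers):
    """Call a protein expressed in a cell type if more than half of the
    replicate measurements are detectable (not missing and non-zero)."""
    detected = sum(1 for v in copy_numbers if v is not None and v > 0)
    return detected > len(copy_numbers) / 2


def whisker_limits(values):
    """Whisker limits as defined in the legend: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

With four donors, the "more than 50%" rule requires detection in at least three; with three donors, in at least two.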

Supplementary Figure 6 Pan-disease control profile sampling.

Sorted box plots showing the results of repeated paper sampling (200 iterations, each randomly selecting 200 abstracts) and the resulting proportion of hits for each cell type (left) and cytokine (right). The entire corpus of 521,625 disease-HPC and 438,012 disease-cytokine co-occurrence papers is used, without restriction to any condition, to define a pan-disease control immune profile. Cell subset and cytokine family classifications are indicated by the coloring of individual members along the y axis. The top 50% of results are shown, with the most highly cited entities emphasized (grey area, median ≥ 0.05). Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR; points, outliers.
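The repeated-sampling procedure can be sketched as below, assuming each abstract has already been reduced to the set of entity names recognized in it (a hypothetical input format). The defaults follow the legend: 200 iterations of 200 randomly drawn abstracts, with entities whose median hit proportion is at least 0.05 flagged as highly cited. Function names are illustrative.

```python
import random
import statistics


def sample_hit_proportions(papers, entities, n_iter=200, sample_size=200, seed=0):
    """papers: list of sets, each holding the entity names recognized in one
    abstract. In each iteration, draw `sample_size` abstracts without
    replacement and record, per entity, the fraction that mention it."""
    rng = random.Random(seed)
    proportions = {e: [] for e in entities}
    for _ in range(n_iter):
        batch = rng.sample(papers, sample_size)
        for e in entities:
            hits = sum(1 for mentioned in batch if e in mentioned)
            proportions[e].append(hits / sample_size)
    return proportions


def highly_cited(proportions, threshold=0.05):
    """Entities whose median hit proportion across iterations is at least `threshold`."""
    return sorted(e for e, vals in proportions.items()
                  if statistics.median(vals) >= threshold)
```

Each entity's list of per-iteration proportions is what a single box in the plot summarizes.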

Supplementary Figure 7 Literature-based evaluation of novel cytokine-disease association predictions.

Evaluation of the predicted cytokine-disease associations within the entire, non-sampled iX knowledgebase. The numbers of total predicted associations (red), of those verified as already reported in the literature (green) and of the remaining, potentially novel candidates (blue) are shown separately for associations of different prediction strengths (x axis, scaled to 1).
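One way to produce this kind of stratification is sketched below. It assumes raw prediction scores are positive and that "scaled to 1" means dividing by the maximum score; both are assumptions, as are the hypothetical `evaluate_by_strength` helper and its equal-width binning.

```python
def evaluate_by_strength(predictions, reported, n_bins=5):
    """predictions: {(cytokine, disease): positive raw score}.
    reported: set of (cytokine, disease) pairs already found in the literature.
    Scores are divided by the maximum (so the strongest prediction is 1.0) and
    placed in equal-width bins; each bin counts total, verified and novel pairs."""
    top = max(predictions.values())
    bins = [{"total": 0, "verified": 0, "novel": 0} for _ in range(n_bins)]
    for pair, score in predictions.items():
        scaled = score / top
        # a scaled score of exactly 1.0 goes in the last bin
        idx = min(int(scaled * n_bins), n_bins - 1)
        bins[idx]["total"] += 1
        bins[idx]["verified" if pair in reported else "novel"] += 1
    return bins
```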

Supplementary Figure 8 Typed dependency analysis for cell recognition.

Illustration of cell phrase expansion from the "cell" seed match by (a) inclusion of the seed and the dependents it governs or (b) by inclusion of the seed word's governor and all the other dependents of that governor.
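The two expansion rules can be illustrated on toy dependency parses. The sketch assumes each token is annotated with its governor index and dependency label, and restricts expansion to a hypothetical set of nominal-modifier labels (`amod`, `compound`, and the older Stanford `nn`). The fallback from rule (b) to rule (a) is an added assumption. The actual rules and label set used by immuneXpresso are not given here.

```python
# Dependency labels treated as phrase-internal nominal modifiers (an assumption).
NOMINAL_MODIFIERS = {"amod", "compound", "nn"}


def expand_cell_phrase(tokens, deps, seed, mode):
    """tokens: words of the sentence.
    deps[i]: (governor index, label) for token i; governor -1 marks the root.
    seed: index of the token matching "cell"/"cells".
    mode "a": the seed plus the modifiers it governs.
    mode "b": the seed's governor plus all of that governor's modifiers
    (which include the seed), used when the seed itself modifies a noun."""

    def modifiers_of(head):
        return {i for i, (gov, label) in enumerate(deps)
                if gov == head and label in NOMINAL_MODIFIERS}

    if mode == "a":
        keep = {seed} | modifiers_of(seed)
    elif mode == "b":
        gov, label = deps[seed]
        if gov < 0 or label not in NOMINAL_MODIFIERS:
            # The seed does not modify another noun; fall back to rule (a).
            keep = {seed} | modifiers_of(seed)
        else:
            keep = {gov} | modifiers_of(gov)
    else:
        raise ValueError("mode must be 'a' or 'b'")
    return " ".join(tokens[i] for i in sorted(keep))


# Rule (a): "cells" heads its own modifiers.
tokens_a = ["IL-2", "stimulates", "naive", "CD4", "T", "cells"]
deps_a = [(1, "nsubj"), (-1, "root"), (5, "amod"), (5, "compound"),
          (5, "compound"), (1, "dobj")]

# Rule (b): "cell" modifies "population", so expansion climbs to the governor.
tokens_b = ["regulatory", "T", "cell", "population", "expanded"]
deps_b = [(3, "amod"), (3, "compound"), (3, "compound"), (4, "nsubj"), (-1, "root")]
```

On these toy parses, rule (a) turns the seed "cells" into "naive CD4 T cells", while rule (b) turns the seed "cell" into "regulatory T cell population"; applying rule (a) to the second sentence would leave only "cell", which is why the governor-based expansion is needed.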

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 (PDF 1399 kb)

Life Sciences Reporting Summary (PDF 161 kb)

Supplementary Table 1

Cell seed recognition statistics. (XLSX 19 kb)

Supplementary Table 2

Blacklist of Cell Ontology nodes. (XLSX 10 kb)

Supplementary Table 3

Cytokine recognition statistics. (XLSX 13 kb)

Supplementary Table 4

Cytokine lexicon fragment for CXC chemokine family. (XLSX 65 kb)

Supplementary Table 5

Verb classification lexicon. (XLSX 25 kb)

Supplementary Table 6

Cell entity fields and manual precision evaluation results. (XLSX 40 kb)

Supplementary Table 7

Cytokine entity fields and manual precision evaluation results. (XLSX 39 kb)

Supplementary Table 8

Disease entity fields and manual precision evaluation results. (XLSX 39 kb)

Supplementary Table 9

Manual precision evaluation for noun phrase-internal relation evidence records. (XLSX 38 kb)

Supplementary Table 10

Manual precision evaluation for non-noun phrase-internal relation evidence records. (XLSX 146 kb)

Supplementary Table 11

immuneXpresso counts. (XLSX 12 kb)

Supplementary Table 12

PubMed IDs and statistics for relation evidence records. (XLSX 2127 kb)

Supplementary Table 13

Disease term recognition statistics. (XLSX 364 kb)

Supplementary Table 14

Novel incoming cell-cytokine interaction candidates. (XLSX 64 kb)

Supplementary Table 15

Novel outgoing cell-cytokine interaction candidates. (XLSX 57 kb)

Supplementary Table 16

Detailed profiles of 188 top-cited diseases. (XLSX 1090 kb)

Supplementary Table 17

Cytokine sampling for 188 top-cited diseases. (XLSX 78 kb)

Supplementary Table 18

Novel cytokine-disease association candidates. (XLSX 26 kb)

Supplementary Notes

Supplementary Notes 1–9 (PDF 967 kb)


About this article


Cite this article

Kveler, K., Starosvetsky, E., Ziv-Kenet, A. et al. Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed. Nat Biotechnol 36, 651–659 (2018).


