Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed

Abstract

Cytokines are signaling molecules secreted and sensed by immune and other cell types, enabling dynamic intercellular communication. Although a vast amount of data on these interactions exists, this information is not compiled, integrated or easily searchable. Here we report immuneXpresso, a text-mining engine that structures and standardizes knowledge of immune intercellular communication. We applied immuneXpresso to PubMed to identify relationships between 340 cell types and 140 cytokines across thousands of diseases. The method is able to distinguish between incoming and outgoing interactions, and it includes the effect of the interaction and the cellular function involved. These factors are assigned a confidence score and linked to the disease. By leveraging the breadth of this network, we predicted and experimentally verified previously unappreciated cell–cytokine interactions. We also built a global immune-centric view of diseases and used it to predict cytokine–disease associations. This standardized knowledgebase (http://www.immunexpresso.org) opens up new directions for interpretation of immune data and model-driven systems immunology.

Figure 1: immuneXpresso assembles a system-level directional intercellular interaction network.
Figure 2: System-level characteristics of intercellular information flow in the literature-derived network.
Figure 3: Immune intercellular network knowledge is biased and far from saturated.
Figure 4: Prediction and validation of cell–cytokine interactions.
Figure 5: Global interaction profile analysis across diseases.
Figure 6: iX informs less well-studied diseases.

Change history

  • 16 August 2018

    In the HTML version of this paper initially published, it was labeled as an Article; this has been changed to Resource.

Acknowledgements

We thank A. Butte and M. Davis for fruitful discussions and advice, N. Geifman for assistance with cytokine ontology development, D. Dougall for contribution to the cell lexicon, members of the Shen-Orr lab for reference book curation, D. Cohen for the high-performance computing cluster support, R. Reichart for Text Mining insights, and P. Dunn and S. Bhattacharya for the user interface development support. This work was supported by the US National Institutes of Health (NIH)-National Institute of Allergy and Infectious Diseases (U19 AI057229, and BISC contract HHSN272201200028C) and an award from the Rappaport Family Institute for Research in the Medical Sciences (S.S.S.-O.).

Author information

K.K. designed, performed and interpreted the analyses, led the design and development of the software, implemented relation extraction, filtering and network assembly, and wrote the manuscript; E.S. interpreted the analyses, performed quality control, designed and performed the experimental validations, and wrote the manuscript; A.Z.-K. implemented indexing, ontology support and reference book annotation software, and broadly contributed to the entire computational pipeline development; Y.K. conceived the pipeline architecture and broadly contributed to the software formation; Y.G. assisted with cell entity recognition; G.S.-M. performed quality control on Text Mining output and assisted with cytokine ontology development; T.D. assisted with quality control on Text Mining output; M.B. contributed to quality control and prediction evaluation; N.A.-R. wrote the software and implemented the website back end; J.C. and J.W. designed and developed the user interface; J.C.R. and F.M. provided and interpreted the proteomic data; N.A. assisted with machine-learning for quality control; D.R. contributed to quality control and interpretation of disease profiles; and S.S.S.-O. conceived the idea, oversaw, designed and interpreted the analyses, and wrote the manuscript.

Correspondence to Shai S Shen-Orr.

Ethics declarations

Competing interests

K.K. and Y.K. are employees and co-founders of CytoReason. S.S.S.-O. and E.S. are co-founders of, and serve as scientific advisors and/or consultants to, CytoReason.

Integrated supplementary information

Supplementary Figure 1 System level characteristics of inter-cellular information flow in the literature-derived network.

(a) Sankey plots showing bipartite information flow between CD4+ T-cell subsets and a variety of cytokines. (b) A sorted histogram illustrating the number of unique cell types secreting each of the 114 cytokines (outgoing interactions). A second y-axis displays the same information as a cumulative sum (blue line). 50% of outgoing edges are attributed to only 17 (15%) cytokines (grey area). Cytokine family classification is shown as coloring of individual members along the x-axis. (c) Scatter plot highlighting the strong correlation in cytokine degrees between incoming and outgoing directions of the manually curated reference book interaction network (n=104 cytokines, r=0.69, pval<0.001, Pearson's).
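The cumulative-sum reading of panel (b) can be sketched as follows. This is a minimal illustration with invented degree counts, not the authors' code; the function name `smallest_set_covering` and the toy cytokine names are hypothetical.

```python
def smallest_set_covering(degrees, fraction=0.5):
    """Return the highest-degree items whose degrees together reach
    `fraction` of the total degree (the "grey area" in a sorted histogram)."""
    total = sum(degrees.values())
    covered = 0
    chosen = []
    # Walk the items from highest to lowest degree, as in a sorted histogram.
    for name, degree in sorted(degrees.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= fraction * total:
            break
        chosen.append(name)
        covered += degree
    return chosen


# Toy data (hypothetical): number of distinct cell types secreting each cytokine.
toy_degrees = {
    "cytokine_A": 40,
    "cytokine_B": 25,
    "cytokine_C": 15,
    "cytokine_D": 10,
    "cytokine_E": 10,
}
top = smallest_set_covering(toy_degrees)
print(top, f"{len(top) / len(toy_degrees):.0%} of cytokines")
```

In this toy case two of five cytokines account for at least half of the edges, which mirrors the kind of concentration the caption reports (17 of 114 cytokines).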

Supplementary Figure 2 Power-law distribution.

Cytokine degree distribution analysis of (a) incoming and (b) outgoing interactions (n=144 and n=114 cytokines; bootstrap goodness-of-fit test, non-power law pval=0.73 and pval=0.47 respectively).

Supplementary Figure 3 Cytokine family leaders.

Heatmaps showing connectivity knowledge accumulation of cytokines and chemokines as a function of time and family membership. Families tend to have one dominant family member, suggesting a "rich get richer" phenomenon.

Supplementary Figure 4 Weak relation between cytokine connectivity degrees and historical knowledge accumulation.

The correlation between cytokine degree and date of first publication was computed repeatedly across cytokines, thresholded by minimal degree. The scatter plot shows how this correlation changes as the analysis is scaled back from the entire network to only a few highly dominant hubs (the cytokine degree cutoff is shown along the x-axis). The entire network (n=145 cytokines) shows a low correlation (Pearson's r=-0.27 and r=-0.26 for (a) incoming and (b) outgoing interactions, respectively), driven by a few highly dominant hubs.
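A minimal sketch of the thresholded correlation described in this caption, assuming each cytokine is represented by a (degree, year of first publication) pair. The records, the cutoffs, and the minimum subset size of three are invented for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def correlation_by_cutoff(records, cutoffs, min_points=3):
    """Recompute the degree/year correlation on cytokines with degree >= cutoff."""
    result = {}
    for cutoff in cutoffs:
        kept = [(degree, year) for degree, year in records if degree >= cutoff]
        if len(kept) < min_points:  # too few points for a meaningful estimate
            continue
        degrees, years = zip(*kept)
        result[cutoff] = pearson(degrees, years)
    return result


# Hypothetical (degree, first publication year) pairs.
toy_records = [(1, 2005), (2, 2001), (3, 2003), (10, 1990), (50, 1980), (80, 1975)]
by_cutoff = correlation_by_cutoff(toy_records, cutoffs=[1, 3, 60])
for cutoff, r in by_cutoff.items():
    print(f"degree >= {cutoff}: r = {r:.2f}")
```

Plotting the resulting coefficients against the cutoff would give a curve of the kind the caption describes.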

Supplementary Figure 5 IL34 receptor expression on terminally differentiated CD8+ T cells.

Copy number profile based on the proteomics data from ImmProt (n=4 or n=3 independent donors). A protein was considered expressed in a cell type if more than 50% of replicates showed detectable non-zero values. Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1–1.5 × IQR to Q3+1.5 × IQR; data points.
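The expression rule stated above (more than 50% of replicates with detectable non-zero values) can be written as a short predicate. This is a sketch; the function name `is_expressed` and the treatment of missing measurements as undetected are assumptions.

```python
def is_expressed(copy_numbers, min_fraction=0.5):
    """True if strictly more than `min_fraction` of replicates are non-zero.

    `copy_numbers` holds one value per donor replicate; None (assumed here to
    mean "not measured") and 0 both count as undetected.
    """
    detected = sum(1 for value in copy_numbers if value is not None and value > 0)
    return detected > min_fraction * len(copy_numbers)


# With n=4 donors, 2 of 4 detections is exactly 50% and does not pass.
print(is_expressed([120.0, 0.0, 85.5, 0.0]))  # False
# With n=3 donors, 2 of 3 detections exceeds 50% and passes.
print(is_expressed([120.0, 0.0, 85.5]))       # True
```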

Supplementary Figure 6 Pan-disease control profile sampling.

Sorted boxplots showing the results of repeated paper sampling (200 iterations, each randomly selecting 200 abstracts) and calculating the proportion of hits for each cell (left) and cytokine (right). The entire corpus of 521,625 disease-HPC and 438,012 disease-cytokine co-occurrence papers is used, without limiting to any condition, to define the pan-disease control immune profile. Cell subset and cytokine family classification is shown as coloring of individual members along the y-axis. The top 50% of the results are shown, with the most highly cited entities emphasized (grey area, median>=0.05). Box-plot elements: center line, median; box limits, first to third quartile (Q1 to Q3); whiskers, from Q1–1.5 × IQR to Q3+1.5 × IQR; points, outliers.
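The repeated sampling procedure can be sketched as below. The corpus is a toy stand-in, and two details are assumptions not stated in the caption: abstracts are drawn without replacement within an iteration, and a "hit" means the abstract mentions the entity.

```python
import random
from statistics import median


def sample_hit_proportions(abstract_entities, n_iter=200, sample_size=200, seed=0):
    """For each entity, collect the fraction of sampled abstracts mentioning it,
    once per iteration. `abstract_entities` is a list of sets, one per abstract."""
    rng = random.Random(seed)
    all_entities = set().union(*abstract_entities)
    proportions = {entity: [] for entity in all_entities}
    for _ in range(n_iter):
        # Assumption: sampling without replacement within one iteration.
        sample = rng.sample(abstract_entities, sample_size)
        for entity in all_entities:
            hits = sum(1 for entities in sample if entity in entities)
            proportions[entity].append(hits / sample_size)
    return proportions


# Toy corpus (hypothetical): every abstract mentions "T cell", one in ten "NK cell".
toy_corpus = [
    {"T cell"} | ({"NK cell"} if i % 10 == 0 else set())
    for i in range(1000)
]
profile = sample_hit_proportions(toy_corpus)
for entity, values in sorted(profile.items()):
    print(entity, f"median proportion = {median(values):.3f}")
```

Each entity's list of 200 proportions is what a single box in the figure would summarize; the median threshold of 0.05 mentioned in the caption would be applied to these medians.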

Supplementary Figure 7 Literature-based evaluation of novel cytokine-disease association predictions.

Evaluation of the predicted cytokine-disease associations within the entire, non-sampled iX knowledgebase. The numbers of total predicted associations (red), those verified as already reported in the literature (green) and the remaining, potentially novel candidates (blue) are shown separately for associations with different prediction strength (x-axis, scaled to 1).

Supplementary Figure 8 Typed dependency analysis for cell recognition.

Illustration of cell phrase expansion from the "cell" seed match by (a) inclusion of the seed and the dependents it governs or (b) by inclusion of the seed word's governor and all the other dependents of that governor.
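The two expansion strategies can be sketched on a hand-built dependency tree. This is an illustrative toy, not the authors' implementation: the parse is encoded as a list of governor indices (None for the root), the example parses are invented, and following dependents transitively is an assumption.

```python
def dependents_of(heads, idx):
    """Indices of all tokens governed by `idx`, directly or transitively."""
    found = []
    for token_idx, head in enumerate(heads):
        if head == idx:
            found.append(token_idx)
            found.extend(dependents_of(heads, token_idx))
    return found


def expand_down(tokens, heads, seed):
    """Strategy (a): the seed plus the dependents it governs."""
    span = sorted([seed] + dependents_of(heads, seed))
    return " ".join(tokens[i] for i in span)


def expand_up(tokens, heads, seed):
    """Strategy (b): the seed's governor plus all dependents of that governor."""
    governor = heads[seed]
    if governor is None:  # the seed is the root, so there is nothing above it
        return expand_down(tokens, heads, seed)
    span = sorted([governor] + dependents_of(heads, governor))
    return " ".join(tokens[i] for i in span)


# (a) "cells" governs its modifiers: regulatory -> cells, T -> cells.
print(expand_down(["regulatory", "T", "cells"], [2, 2, None], seed=2))
# (b) "cell" modifies "population": naive -> population, T -> cell, cell -> population.
print(expand_up(["naive", "T", "cell", "population"], [3, 2, 3, None], seed=2))
```

In practice the governor indices would come from a typed dependency parser rather than being written by hand.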

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 (PDF 1399 kb)

Life Sciences Reporting Summary (PDF 161 kb)

Supplementary Table 1

Cell seed recognition statistics. (XLSX 19 kb)

Supplementary Table 2

Blacklist of Cell Ontology nodes. (XLSX 10 kb)

Supplementary Table 3

Cytokine recognition statistics. (XLSX 13 kb)

Supplementary Table 4

Cytokine lexicon fragment for CXC chemokine family. (XLSX 65 kb)

Supplementary Table 5

Verb classification lexicon (XLSX 25 kb)

Supplementary Table 6

Cell entity fields and manual precision evaluation results. (XLSX 40 kb)

Supplementary Table 7

Cytokine entity fields and manual precision evaluation results. (XLSX 39 kb)

Supplementary Table 8

Disease entity fields and manual precision evaluation results. (XLSX 39 kb)

Supplementary Table 9

Manual precision evaluation for noun phrase-internal relation evidence records. (XLSX 38 kb)

Supplementary Table 10

Manual precision evaluation for non-noun phrase-internal relation evidence records. (XLSX 146 kb)

Supplementary Table 11

ImmuneXpresso counts (XLSX 12 kb)

Supplementary Table 12

PubMed ids and statistics for relation evidence records (XLSX 2127 kb)

Supplementary Table 13

Disease term recognition statistics. (XLSX 364 kb)

Supplementary Table 14

Novel incoming cell-cytokine interaction candidates. (XLSX 64 kb)

Supplementary Table 15

Novel outgoing cell-cytokine interaction candidates. (XLSX 57 kb)

Supplementary Table 16

Detailed profiles of 188 top-cited diseases. (XLSX 1090 kb)

Supplementary Table 17

Cytokine sampling for 188 top-cited diseases (XLSX 78 kb)

Supplementary Table 18

Novel cytokine-disease association candidates (XLSX 26 kb)

Supplementary Notes

Supplementary Notes 1–9 (PDF 967 kb)

About this article

Cite this article

Kveler, K., Starosvetsky, E., Ziv-Kenet, A. et al. Immune-centric network of cytokines and cells in disease context identified by computational mining of PubMed. Nat Biotechnol 36, 651–659 (2018) doi:10.1038/nbt.4152
