Data integration

Data integration is the process of combining data generated by different research methods in order to detect underlying themes and, in computational biology and bioinformatics, biological principles. Data integration is important in biology owing to the large and diverse 'omics' datasets now available.
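One simple integration strategy, often called early or concatenation-based integration, keeps only the entities measured in every dataset, rescales each modality so that their values are comparable, and concatenates the results into one feature vector per entity. The Python sketch below assumes two small, hypothetical per-gene tables (transcript and protein abundance). The gene names and values are illustrative only, and real pipelines typically add normalization, batch correction and missing-value handling.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale one modality to zero mean and unit variance so modalities share a scale."""
    vals = list(values.values())
    mu, sd = mean(vals), pstdev(vals)
    return {key: (v - mu) / sd if sd > 0 else 0.0 for key, v in values.items()}


def integrate(*modalities):
    """Concatenation-based integration: keep genes present in every modality,
    standardize each modality on those genes, then concatenate per gene."""
    shared = set.intersection(*(set(m) for m in modalities))
    scaled = [standardize({g: m[g] for g in shared}) for m in modalities]
    return {g: [s[g] for s in scaled] for g in sorted(shared)}


# Hypothetical per-gene measurements from two assays (illustrative values only).
rna = {"GENE_A": 120.0, "GENE_B": 30.0, "GENE_C": 75.0, "GENE_D": 10.0}
protein = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 5.0}

integrated = integrate(rna, protein)
for gene, vector in integrated.items():
    print(gene, [round(x, 3) for x in vector])
# GENE_A [1.225, 1.225]
# GENE_B [-1.225, -1.225]
# GENE_C [0.0, 0.0]
```

GENE_D is dropped because it has no protein measurement. Concatenation is only one option; other approaches integrate at the level of learned representations or final predictions rather than raw features.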

Latest Research and Reviews

News and Comment

  • News & Views

    Alignment of single-cell proteomics data across platforms is difficult when different data sets contain limited shared features, as is typical in single-cell assays with antibody readouts. Therefore, we developed matching with partial overlap (MARIO) to enable confident and accurate matching for multimodal data integration and cross-species analysis.

  • News & Views

    BIONIC (Biological Network Integration using Convolutions) is a scalable deep learning network integration approach that learns and combines diverse data representations across a range of biological network types to consolidate knowledge of gene function. BIONIC outperforms existing integration approaches by capturing biological information more comprehensively and with greater accuracy than previously possible.

    Nature Methods 19, 1185-1186
  • Comments & Opinion | Open Access

    The text-guided diffusion model GLIDE (Guided Language to Image Diffusion for Generation and Editing) is the state of the art in text-to-image generative artificial intelligence (AI). GLIDE has rich representations, but medical applications of this model have not been systematically explored. If GLIDE had useful medical knowledge, it could be used for medical image analysis tasks, a domain in which AI systems are still highly engineered towards a single use case. Here we show that the publicly available GLIDE model has reasonably strong representations of key topics in cancer research and oncology, in particular the general style of histopathology images and multiple facets of diseases, pathological processes and laboratory assays. However, GLIDE seems to lack useful representations of the style and content of radiology data. Our findings demonstrate that domain-agnostic generative AI models can learn relevant medical concepts without explicit training. Thus, GLIDE and similar models might be useful for medical image processing tasks in the future, particularly with additional domain-specific fine-tuning.

    Jakob Nikolas Kather, Narmin Ghaffari Laleh & Daniel Truhn
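The partial-overlap matching problem in the MARIO summary can be illustrated in a simplified form. Cells from two antibody panels are compared only on the markers both panels measure, and each cell in the smaller dataset is paired with a distinct cell in the other so that the total distance is as small as possible. This is a sketch under stated assumptions, not the MARIO implementation, whose actual procedure is more involved. The marker panels, values and brute-force search are hypothetical choices suited only to a handful of cells; a real implementation would use a proper assignment solver.

```python
import itertools
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column (marker) to zero mean and unit variance."""
    scaled = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def select_features(data, features, wanted):
    """Return each row restricted to the wanted features, in the order given."""
    idx = [features.index(f) for f in wanted]
    return [[row[i] for i in idx] for row in data]


def match_on_shared_features(x, x_features, y, y_features):
    """Pair every cell in x with a distinct cell in y, minimizing total Euclidean
    distance over the shared markers only (exhaustive search, tiny inputs only)."""
    shared = [f for f in x_features if f in y_features]
    if not shared:
        raise ValueError("datasets share no features")
    if len(x) > len(y):
        raise ValueError("x must not have more cells than y")
    xs = zscore_columns(select_features(x, x_features, shared))
    ys = zscore_columns(select_features(y, y_features, shared))
    cost = [[math.dist(a, b) for b in ys] for a in xs]
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(ys)), len(xs)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return shared, list(best)


# Hypothetical panels: only CD3 and CD19 are measured in both.
x_features = ["CD3", "CD19", "CD45RA"]
x = [[10, 0, 5], [0, 10, 3], [5, 5, 1]]
y_features = ["CD19", "CD3", "HLA-DR"]
y = [[9, 1, 7], [4, 6, 2], [1, 11, 0]]

shared, matching = match_on_shared_features(x, x_features, y, y_features)
print(shared, matching)  # ['CD3', 'CD19'] [2, 0, 1]
```

Here `matching[i]` is the index of the cell in `y` paired with cell `i` of `x`. The T-cell-like, B-cell-like and mixed profiles are paired with their counterparts even though the two panels share only two markers.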
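The BIONIC summary describes combining representations derived from several biological networks that cover the same genes. As a loose, hypothetical illustration of that idea only, the sketch below applies one untrained, symmetrically normalized graph-convolution step to each network and averages the per-network outputs. BIONIC, as summarized above, learns its representations with deep learning, which this sketch does not attempt. The gene labels, edges and averaging rule are invented for the example.

```python
import math


def normalized_adjacency(edges, n):
    """Build D^-1/2 (A + I) D^-1/2, the propagation matrix of a standard graph convolution."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(p, q):
    """Plain nested-list matrix product p @ q."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def integrate_networks(networks, features):
    """Propagate node features once through each network, then average the results."""
    n = len(features)
    propagated = [matmul(normalized_adjacency(edges, n), features) for edges in networks]
    return [[sum(h[i][j] for h in propagated) / len(propagated)
             for j in range(len(features[0]))] for i in range(n)]


# Hypothetical networks over genes 0, 1, 2 (e.g. GENE_A, GENE_B, GENE_C).
network_1 = [(0, 1)]  # links GENE_A and GENE_B
network_2 = [(1, 2)]  # links GENE_B and GENE_C
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

embedding = integrate_networks([network_1, network_2], identity)
print(embedding)  # [[0.75, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.25, 0.75]]
```

In the combined rows, GENE_B carries information from both of its neighbours, which neither network provides alone. This gives a miniature sense of why integrating networks can consolidate knowledge of gene function.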