By exploiting recent advances in modern artificial intelligence and large-scale functional genomic datasets, sequence-to-function models learn the relationship between genomic DNA and its multilayer gene regulatory functions. These models are poised to uncover mechanistic relationships across layers of cellular biology, which will transform our understanding of cis gene regulation and open new avenues for discovering disease mechanisms.
Spatial omics technologies have transformed biomedical research by offering detailed, spatially resolved molecular profiles that elucidate tissue structure and function at unprecedented levels. AI can potentially unlock the full power of spatial omics, facilitating the integration of complex datasets and discovery of novel biomedical insights.
Methods for predicting biomolecular interactions are seeing tremendous growth, but challenges remain in capturing the full physical complexity of these interactions.
Artificial intelligence-enabled computational tools not only help us to elucidate biological processes but also facilitate the programming of biology through molecular and cellular engineering.
Mass spectrometry-based proteomics provides broad and quantitative detection of the proteome, but its results are mostly presented as protein lists. Artificial intelligence approaches will exploit prior knowledge from literature and harmonize fragmented datasets to enable mechanistic and functional interpretation of proteomics experiments.
Multimodal large language models have been recognized as a historic milestone in artificial intelligence and have demonstrated revolutionary potential, not only in commercial applications but also in many scientific fields. Here we give a brief overview of multimodal large language models through the lens of bioimage analysis and discuss how we could build these models as a community to facilitate biology research.
The success of deep learning in analyzing bioimages comes at the expense of biologically meaningful interpretations. We review the state of the art of explainable artificial intelligence (XAI) in bioimaging and discuss its potential in hypothesis generation and data-driven discovery.
New approaches in artificial intelligence (AI), such as foundation models and synthetic data, are having a substantial impact on many areas of applied computer science. Here we discuss the potential to apply these developments to the computational challenges associated with producing synapse-resolution maps of nervous systems, an area in which major ambitions are currently bottlenecked by AI performance.
Advancements in artificial intelligence (AI) have led to unprecedented success in modeling technically challenging domains including language, audio, image and video understanding. Here we discuss the opportunities represented by recent AI methods to advance immunology research.
Breakthroughs in AI and multimodal genomics are unlocking the ability to study the tumor microenvironment. We explore promising machine learning techniques to integrate and interpret high-dimensional data, examine cellular dynamics and unravel gene regulatory mechanisms, ultimately enhancing our understanding of tumor progression and resistance.
Risks from AI in basic biology research can be addressed with a dual mitigation strategy that comprises basic education in AI ethics and community governance measures that are tailored to the needs of individual research communities.
Inspired by the success of large-scale machine learning models in natural language, several groups are adapting these models for cellular data using massive single-cell datasets.
Rapid advancements in transcriptomics have enabled the quantification of individual transcripts for thousands of genes in millions of single cells. By coupling a machine learning inference framework with biophysical models describing the RNA life cycle, we can explore the dynamics driving RNA production, processing and degradation across cell types.
We developed PINNACLE, a graph-based AI model for learning protein representations across cell-type contexts. These contextualized protein representations enable the integration of 3D protein structure with single-cell genomic-based representations to enhance protein–protein interaction prediction, analysis of drug effects across cell-type contexts, and prediction of therapeutic targets in a cell type-specific manner.
This Perspective presents a comprehensive and in-depth overview of computational models based on the deep learning architecture of transformers for single-cell omics analysis.
This Perspective discusses the issue of data leakage in machine learning-based models and presents seven questions designed to identify and avoid the problems resulting from data leakage.
This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on common pitfalls when using IML and how to avoid them.
Pretrained using over 33 million single-cell RNA-sequencing profiles, scGPT is a foundation model facilitating a broad spectrum of downstream single-cell analysis tasks by transfer learning.
scFoundation, with 100 million parameters covering about 20,000 genes, pretrained on over 50 million single-cell transcriptomics profiles, is a foundation model for diverse tasks of single-cell analysis.
SATURN performs cross-species integration and analysis using both single-cell gene expression and protein representations generated by protein language models.
Using a dependency-aware deep generative framework, spaVAE efficiently models spatially resolved transcriptomics data and advances diverse analysis tasks. Following similar strategies, spaPeakVAE and spaMultiVAE enable spatial ATAC-seq data and spatial multi-omics data modeling and analysis, respectively.
OpenFold is a trainable open-source implementation of AlphaFold2. It is fast and memory efficient, and the code and training data are available under a permissive license.
cryoDRGN-ET is a generative neural network method for heterogeneous reconstruction of cryo-ET subtomograms. Using subtomogram tilt-series images, it can capture states diverse in both composition and conformation.
PINNACLE is a context-specific geometric deep learning model for generating protein representations. Leveraging single-cell transcriptomics combined with networks of protein–protein interactions, cell type-to-cell type interactions and a tissue hierarchy, PINNACLE generates high-resolution protein representations tailored to each cell type.
A pretrained foundation model (UniFMIR) enables versatile and generalizable performance across diverse fluorescence microscopy image reconstruction tasks.