Computational biology and bioinformatics

  • Article
    | Open Access

    The 1+ million publicly-available human –omics samples currently remain acutely underused. Here the authors present an approach combining natural language processing and machine learning to infer the source tissue of public genomics samples based on their plain text descriptions, making these samples easy to discover and reuse.

    • Nathaniel T. Hawkins, Marc Maldaver & Arjun Krishnan
  • Article
    | Open Access

    Traditional bulk sequencing data lack information about cell-type-specific gene expression. Here, the authors develop a Tissue-AdaPtive autoEncoder (TAPE), a deep learning method connecting bulk RNA-seq and single-cell RNA-seq, and apply it to analyze the cell type fractions and cell-type-specific gene expression in clinical data.

    • Yanshuo Chen, Yixuan Wang & Yu Li
  • Article
    | Open Access

    Asthma is a heterogeneous, complex syndrome that arises in individuals with various genetic and exposure variations. Here, the authors show that disease comorbidity patterns can serve as a surrogate for these variations, and identify asthma endotypes distinguished by comorbidity patterns, asthma risk loci, gene expression, and health-related phenotypes.

    • Gengjie Jia, Xue Zhong & Julian Solway
  • Article
    | Open Access

    The current work reports the structure of the human organic cation transporter 3 (OCT3 / SLC22A3) and provides the structural basis of its inhibition by two specific inhibitors, decynium-22 and corticosterone.

    • Basavraj Khanppnavar, Julian Maier & Harald H. Sitte
  • Article
    | Open Access

    Studies on parent-of-origin effects have been limited in terms of sample size due to lack of parental genomes or known genealogies. Here, the authors develop a method to infer the parent-of-origin of an individual's alleles in biobank-scale datasets, without requiring parental genomes or prior knowledge of genealogy, allowing discovery of parent-of-origin effects with an unprecedented sample size.

    • Robin J. Hofmeister, Simone Rubinacci & Olivier Delaneau
  • Article
    | Open Access

    Amino acids are important components in a variety of human foods and diets. Here, the authors show trade-offs linking dietary intake of amino acids to human health and develop amino acid intake guidelines based on them.

    • Ziwei Dai, Weiyan Zheng & Jason W. Locasale
  • Article
    | Open Access

    Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants characterizing little-covered or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for performing haplotype-aware self-correction of long reads.

    • Xiao Luo, Xiongbin Kang & Alexander Schönhuth
  • Article
    | Open Access

    RNA velocity can detect the differentiation directionality by modelling sparse unspliced RNAs, but suffers from high estimation errors. Here, the authors develop a computational method called UniTVelo to reinforce the velocity estimation by introducing a unified time and a top-down model design.

    • Mingze Gao, Chen Qiao & Yuanhua Huang
  • Article
    | Open Access

    Previous efforts to study the circadian clock using scRNA-seq have relied on time course designs that treat cell collection time as a proxy for circadian time. Here, the authors introduce a statistical method to infer circadian timing directly from expression, enabling researchers to study circadian phase heterogeneity.

    • Benjamin J. Auerbach, Garret A. FitzGerald & Mingyao Li
  • Article
    | Open Access

    Safe clinical deployment of deep learning models for digital pathology requires reliable estimates of predictive uncertainty. Here the authors describe an algorithm for quantifying whole-slide image uncertainty, demonstrating their approach with models trained to distinguish lung cancer subtypes.

    • James M. Dolezal, Andrew Srisuwananukorn & Alexander T. Pearson
  • Article
    | Open Access

    The success of CRISPR experiments relies on the choice of gRNA. Here the authors report crisprVerse, which enables efficient gRNA design and annotation for methods including CRISPRko, CRISPRa, CRISPRi, CRISPRbe and CRISPRkd, enabled for RNA- and DNA-targeting nucleases, including Cas9, Cas12 and Cas13.

    • Luke Hoberecht, Pirunthan Perampalam & Jean-Philippe Fortin
  • Article
    | Open Access

    Biomarkers of age and frailty may aid in understanding the aging process, predicting lifespan or health span and in assessing the effects of anti-aging interventions. Here, the authors show that combining physics-based models and deep learning may enhance understanding of aging from big biomedical data, observe effects of anti-aging interventions in laboratory animals, and discover signatures of longevity.

    • Konstantin Avchaciov, Marina P. Antoch & Peter O. Fedichev
  • Article
    | Open Access

    Current methods to reanalyze bulk RNA-seq at spatially resolved single-cell resolution have limitations. Here, the authors develop Bulk2Space, a spatial deconvolution algorithm using single-cell and spatial transcriptomics as references, providing new insights into spatial heterogeneity within bulk tissue.

    • Jie Liao, Jingyang Qian & Xiaohui Fan
  • Article
    | Open Access

    Renal fibrosis is a progressive process with complex etiopathology, causing organ failure. Here, the authors present a mathematical model, based on an in vitro system that faithfully captures macrophage-fibroblast interaction and the metabolic-immunologic signals affecting kidney fibrosis, that is applicable to kidney transplant failure.

    • Elisa Setten, Alessandra Castagna & Massimo Locati
  • Article
    | Open Access

    Mendelian randomization uses genetic variation to study the causal effect of exposure on outcome, but results can be biased by confounders, such as horizontal pleiotropy. Here, the authors present MR-CUE, a method to determine causal effects by accounting for correlated and uncorrelated horizontal pleiotropic effects.

    • Qing Cheng, Xiao Zhang & Jin Liu
  • Article
    | Open Access

    Here, the authors characterize structural variations (SVs) in a cohort of individuals with complex genomic rearrangements, identifying breakpoints by employing short- and long-read genome sequencing and investigating their impact on gene expression and the three-dimensional chromatin architecture. They find that breakpoints are enriched in inactive regions and can result in chromatin domain fusions.

    • Robert Schöpflin, Uirá Souto Melo & Stefan Mundlos
  • Article
    | Open Access

    Cryogenic electron tomography suffers from anisotropic resolution due to the missing-wedge problem. Here, the authors present IsoNet, a neural network that learns the feature representation from similar structures in the tomogram and recovers the missing information for isotropic tomogram reconstruction.

    • Yun-Tao Liu, Heng Zhang & Z. Hong Zhou
  • Article
    | Open Access

    Statins are promising for breast cancer therapy; dipyridamole can potentiate their effects, but is contraindicated in some cases. Here, the authors develop a pharmacogenomics pipeline to predict other compounds that potentiate statins, and validate the top candidates in cell line screens and 3D cultures.

    • Jenna E. van Leeuwen, Wail Ba-Alawi & Deena M. A. Gendoo
  • Article
    | Open Access

    Monitoring of co-infections of SARS-CoV-2 variants is important to evaluate their clinical impact and the risk of emergence of recombinants. Here, the authors develop and validate a methodological pipeline to detect co-infections and apply it to samples from France in early 2022, when Delta and Omicron were co-circulating.

    • Antonin Bal, Bruno Simon & Laurence Josset
  • Article
    | Open Access

    In this study, the authors analyse contact tracing records for ~650,000 suspected or confirmed COVID-19 cases in New York City during the second epidemic wave. They reconstruct transmission networks and find that vaccination and zone-based control policies likely contributed to control of the epidemic.

    • Sen Pei, Sasikiran Kandula & Jeffrey Shaman
  • Article
    | Open Access

    Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

    • Umberto Lupo, Damiano Sgarbossa & Anne-Florence Bitbol
  • Article
    | Open Access

    Rapid antibiotic susceptibility testing (AST) is needed. Here the authors report a method for phenotypic AST at the single cell level, using a microfluidic chip that allows for subsequent genotyping with in situ FISH; they apply this to a mixed sample of 7 species and 4 antibiotics.

    • Vinodh Kandavalli, Praneeth Karempudi & Johan Elf
  • Article
    | Open Access

    Fibroblast-like synoviocytes (FLS) are used as a model of rheumatoid arthritis synoviocytes, although cell lines derived from individual patients can have heterogeneous biology. Here the authors use a Taiji computational approach to analyze gene expression, chromatin accessibility and functional differences between individual patient-derived RA FLS lines.

    • Richard I. Ainsworth, Deepa Hammaker & Wei Wang
  • Article
    | Open Access

    Proteomics can be used to refine cancer classification. Here, the authors characterise chronic lymphocytic leukaemia patients by proteogenomics and identify a subtype of patients with poor prognosis associated with aberrant B cell receptor signalling.

    • Sophie A. Herbst, Mattias Vesterlund & Sascha Dietrich
  • Article
    | Open Access

    Bioactive peptides regulate many physiological functions but progress in discovering them has been slow. Here, the authors use a machine learning framework to predict mammalian peptide candidates from the global and local structure of large-scale tissue-specific mass spectrometry data.

    • Christian T. Madsen, Jan C. Refsgaard & Ulrik de Lichtenberg
  • Article
    | Open Access

    The analysis of protein NMR spectra is time-consuming and can occupy a human expert for weeks or months. The researchers in this work present a deep learning-based method that delivers signal positions, chemical shift assignments, and structures of proteins within hours after completion of the NMR measurements.

    • Piotr Klukowski, Roland Riek & Peter Güntert
  • Article
    | Open Access

    Global alignment of complex cell state trajectories between single-cell datasets remains challenging. Here, the authors present a computational method called CAPITAL to compare branching trajectories, and demonstrate that this method achieves accurate and robust alignments.

    • Reiichi Sugihara, Yuki Kato & Yukio Kawahara
  • Article
    | Open Access

    Transplanting encapsulated insulin-producing cells may achieve a functional cure for type 1 diabetes, but efficacy is constrained by mass transfer limits. Here, the authors report a dynamic computational platform to investigate the therapeutic potency of such programmable bioartificial pancreas devices.

    • Alexander U. Ernst, Long-Hai Wang & Minglin Ma
  • Article
    | Open Access

    Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short- and long-read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species- and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.

    • Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan
  • Article
    | Open Access

    The accuracy of AlphaFold decreases with the number of protein chains, and the available GPU memory limits the size of protein complexes that can be predicted. Here, the authors show that complexes with 10–30 chains can be assembled from predicted subcomponents using Monte Carlo tree search.

    • Patrick Bryant, Gabriele Pozzati & Arne Elofsson
  • Article
    | Open Access

    Previous studies have characterized the diversity and dynamics of the T cell receptor (TCR) repertoire in patients with solid cancer. Here, by analyzing TCR repertoire data from multiple datasets, the authors report that melanoma-associated antigen-specific TCRs can be used to separate metastatic melanoma patients from healthy controls and to follow anti-tumor responses in patients treated with immunotherapy.

    • Jani Huuhtanen, Liang Chen & Satu Mustjoki
  • Article
    | Open Access

    Multi-view graph approaches could enhance the analysis of tissue heterogeneity in spatial transcriptomics. Here, the authors develop the Spatial Transcriptomics data analysis by Multiple View Collaborative-learning (stMVC) framework, and apply it to detect spatial domains and cell states in brain and tumor tissues.

    • Chunman Zuo, Yijian Zhang & Luonan Chen