Computational biology and bioinformatics

  • Article
    | Open Access

    Deep learning algorithms trained on data streamed over time from different clinical sites and from a multitude of physiological sensors generally suffer a degradation in performance. To mitigate this, the authors propose a continual learning strategy that employs a replay buffer.

    • Dani Kiyasseh, Tingting Zhu & David Clifton
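
    The replay-buffer idea in the summary above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how their buffer is filled or sampled, so the reservoir-sampling policy, the `ReplayBuffer` class, and the `mixed_batches` helper are all assumptions made for this example.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past examples (hypothetical helper).

    Uses reservoir sampling so that every example seen so far has the
    same chance of being retained. This policy is an assumption.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Never ask for more examples than the buffer currently holds.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batches(tasks, buffer, replay_per_batch):
    """Yield (task_name, batch) pairs where each new batch is padded
    with examples replayed from earlier batches."""
    for task_name, batches in tasks:
        for batch in batches:
            replayed = buffer.sample(replay_per_batch)
            yield task_name, list(batch) + replayed
            # Store the new examples only after the batch has been used.
            for example in batch:
                buffer.add(example)
```

    A training loop would iterate over `mixed_batches(...)` and update the model on each mixed batch, so that data from earlier sites or sensors keep appearing while new data stream in.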
  • Article
    | Open Access

    Mutations in 5’ untranslated regions (UTRs) have a functional role in gene expression in cancer. Here, the authors develop a sequencing-based, high-throughput functional assay named PLUMAGE and show the effects of these mutations on gene expression and their association with clinical outcomes in prostate cancer.

    • Yiting Lim, Sonali Arora & Andrew C. Hsieh
  • Article
    | Open Access

    Our ability to interpret single-cell multivariate signaling responses is still limited. Here the authors introduce fractional response analysis (FRA), involving fractional cell counting, capable of deconvoluting heterogeneous multivariate responses of cellular populations.

    • Karol Nienałtowski, Rachel E. Rigby & Michał Komorowski
  • Article
    | Open Access

    Existing genetic prediction tools typically assume that genetic variants contribute equally towards the phenotype. The authors develop eight prediction tools that allow the user to specify the heritability model, and show that these tools enable substantially improved prediction of complex traits.

    • Qianqian Zhang, Florian Privé & Doug Speed
  • Article
    | Open Access

    It remains unclear how spatial information controls endothelial cell identity and behavior in the developing heart. Here the authors perform single-cell RNA sequencing at key developmental timepoints in mice to interrogate cellular contributions to coronary vessel patterning and maturation in the epicardium.

    • Pearl Quijada, Michael A. Trembley & Eric M. Small
  • Article
    | Open Access

    Few studies have provided functional analysis of the epigenetic landscape in the regenerating liver. Here the authors define chromatin states in the quiescent vs. regenerating mouse liver through integration of genome-wide profiles of DNA methylation, histone modifications, and chromatin accessibility, identifying H3K27me3 as an epigenetic mark conferring regenerative potential.

    • Chi Zhang, Filippo Macchi & Kirsten C. Sadler
  • Article
    | Open Access

    Precision medicine needs prognostic markers to select the patients who will benefit most from targeted therapy. Here, the authors show that a high level of baseline T cell receptor diversity is an indicator of favourable prognosis in multiple cancer types, and that monoclonal expansion of T cells correlates with good response to immune checkpoint blockade therapy in metastatic melanoma patients.

    • Sara Valpione, Piyushkumar A. Mundra & Richard Marais
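
    To make "receptor diversity" and "monoclonal expansion" in the summary above concrete, here is a small sketch of two commonly used repertoire statistics. The summary does not state which metrics the authors used, so Shannon entropy and the derived clonality score, as well as the function names, are assumptions chosen for illustration.

```python
import math


def shannon_entropy(clone_counts):
    """Shannon entropy (in nats) of a clone-size distribution."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("clone_counts must contain at least one read")
    probs = [c / total for c in clone_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def clonality(clone_counts):
    """1 - normalized entropy: near 0 for an even (diverse) repertoire,
    near 1 when a single clone dominates."""
    n_clones = sum(1 for c in clone_counts if c > 0)
    if n_clones <= 1:
        return 1.0
    return 1.0 - shannon_entropy(clone_counts) / math.log(n_clones)
```

    Under these definitions, a repertoire of four equally sized clones has clonality close to 0, while one in which a single clone accounts for nearly all reads has clonality close to 1.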
  • Article
    | Open Access

    RNA modifications appear to play a role in determining RNA structure and function. Here, the authors develop a deep learning model that predicts the location of 12 RNA modifications using primary sequence, and show that several modifications are associated, which suggests dependencies between them.

    • Zitao Song, Daiyun Huang & Jia Meng
  • Article
    | Open Access

    The superior colliculus (SC) receives diverse cortical inputs to drive many behaviors. Here, based on comprehensive mapping of cortico-tectal projections, the authors refined the superior colliculus into medial, centromedial, centrolateral, and lateral zones, and characterized the input-output connectivity and morphology of neurons in each zone that serve the role of SC in goal-directed behaviors.

    • Nora L. Benavidez, Michael S. Bienkowski & Hong-Wei Dong
  • Article
    | Open Access

    A more comprehensive map of viral host ranges can help identify and mitigate zoonotic and animal-disease risks. A divide-and-conquer approach which separates viral, mammalian and network features predicts over 20,000 unknown associations between known viruses and susceptible mammalian species.

    • Maya Wardeh, Marcus S. C. Blagrove & Matthew Baylis
  • Article
    | Open Access

    To benchmark single-cell bioinformatics tools, data simulators can provide a robust ground truth. Here the authors present dyngen, a multi-modal simulator, and apply it to aligning cell developmental trajectories, cell-specific regulatory network inference and estimation of RNA velocity.

    • Robrecht Cannoodt, Wouter Saelens & Yvan Saeys
  • Article
    | Open Access

    Small-molecule bioactivity descriptors are enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Here the authors present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for it.

    • Martino Bertoni, Miquel Duran-Frigola & Patrick Aloy
  • Article
    | Open Access

    The class Frizzled of G protein-coupled receptors (GPCRs) consists of ten Frizzled (FZD1-10) subtypes and Smoothened (SMO). Here the Schulte laboratory demonstrates that FZDs differ substantially from SMO in receptor activation-associated conformational changes: while SMO manifests a preference for a straight TM6, the TM6 of FZDs is kinked upon activation.

    • Ainoleena Turku, Hannes Schihada & Gunnar Schulte
  • Review Article
    | Open Access

    Natural products are an important source of bioactive compounds and have versatile applications in different fields, but their discovery is challenging. Here, the authors review the recent developments in genome mining for discovery of natural products, focusing on compounds from unconventional microorganisms and microbiomes.

    • Kirstin Scherlach & Christian Hertweck
  • Article
    | Open Access

    A deeper knowledge of the immune cell profile within the brain cancer tumor microenvironment (TM) could identify targets to improve immunotherapy efficacy. Here, in glioblastoma, the authors find haematopoietic stem and progenitor cells in the TM, which are associated with poor prognosis and increased immunosuppression.

    • I-Na Lu, Celia Dobersalske & Igor Cima
  • Article
    | Open Access

    Bacterial microcompartments (BMCs) are organelles consisting of a protein shell in which certain metabolic reactions take place separated from the cytoplasm. Here, Sutter et al. present a comprehensive catalog of BMC loci, substantially expanding the number of known BMCs and describing distinct types and compartmentalized reactions.

    • Markus Sutter, Matthew R. Melnicki & Cheryl A. Kerfeld
  • Article
    | Open Access

    Many proteins exist in various proteoforms but detecting these variants by bottom-up proteomics remains difficult. Here, the authors present a computational approach based on peptide correlation analysis to identify and characterize proteoforms from bottom-up proteomics data.

    • Isabell Bludau, Max Frank & Ruedi Aebersold
  • Article
    | Open Access

    Keratitis is the main cause of corneal blindness worldwide, but most vision loss caused by keratitis can be avoided through early detection and treatment, which are challenging in resource-limited settings. Here, the authors develop a deep learning system for the automated classification of keratitis and other cornea abnormalities.

    • Zhongwen Li, Jiewei Jiang & Wei Chen
  • Article
    | Open Access

    α-Synuclein (αS) aggregation is a driver of several neurodegenerative disorders. Here, the authors identify a class of peptides that bind toxic αS oligomers and amyloid fibrils but not monomeric functional protein, and prevent further αS aggregation and associated cell damage.

    • Jaime Santos, Pablo Gracia & Salvador Ventura
  • Article
    | Open Access

    The authors generate the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. They use this dataset to train a decision-tree ensemble machine learning algorithm that allows them to distinguish between catalytic and non-catalytic metal sites. The computational model described here could also be useful for the identification of new enzymatic mechanisms and de novo enzyme design.

    • Ryan Feehan, Meghan W. Franklin & Joanna S. G. Slusky
  • Article
    | Open Access

    Single-cell RNA-seq loses spatial information on gene expression in multicellular systems because tissue must be dissociated. Here, the authors show that spatial gene expression profiles can be reconstructed accurately and robustly by Perler, a new computational method that uses a generative linear mapping.

    • Yasushi Okochi, Shunta Sakaguchi & Honda Naoki
  • Article
    | Open Access

    Large numbers of mass spectra have been collected from different samples, but identifying small molecules from these spectra requires database searches, which remain challenging. Here, the authors report molDiscovery, a mass spectral database search method that uses an algorithm to generate mass spectrometry fragmentations and learns a probabilistic model to match small molecules with their mass spectra.

    • Liu Cao, Mustafa Guler & Hosein Mohimani
  • Article
    | Open Access

    Despite the consensus that mass vaccination against SARS-CoV-2 will ultimately end the pandemic, it is not clear when and which control measures can be relaxed during the rollout of vaccination programmes. Here, the authors investigate relaxation scenarios using an age-structured transmission model that has been fitted to data for Portugal.

    • João Viana, Christiaan H. van Dorp & Ganna Rozhnova
  • Article
    | Open Access

    People can infer unobserved causes of perceptual data (e.g. the contents of a box from the sound made by shaking it). Here the authors show that children compare what they hear with what they would have heard given other causes, and explore longer when the heard and imagined sounds are hard to discriminate.

    • Max H. Siegel, Rachel W. Magid & Laura E. Schulz
  • Article
    | Open Access

    Directed evolution commonly relies on point mutations, but InDels frequently occur in evolution. Here the authors report a protein-engineering framework based on InDel mutagenesis and fragment transplantation, resulting in greater catalysis and longer glow-type bioluminescence of the ancestral luciferase.

    • Andrea Schenkmayerova, Gaspar P. Pinto & Jiri Damborsky
  • Article
    | Open Access

    Here, the authors use simulated quantitative gut microbial communities to benchmark the performance of 13 common data transformations in determining diversity as well as microbe-microbe and microbe-metadata associations. They find that quantitative approaches incorporating microbial load variation outperform computational strategies in downstream analyses. They urge widespread adoption of quantitative approaches and recommend specific computational transformations whenever determining the microbial load of samples is not feasible.

    • Verónica Lloréns-Rico, Sara Vieira-Silva & Jeroen Raes
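
    The distinction drawn in the summary above, between purely computational transformations and quantitative profiles that incorporate microbial load, can be illustrated with a short sketch. The summary does not list the 13 transformations that were benchmarked; relative abundance and the centred log-ratio (CLR) are shown here only as familiar examples, and the function names, pseudocount, and load values are assumptions.

```python
import math


def relative_abundance(counts):
    """Scale raw taxon counts so they sum to 1 (compositional data)."""
    total = sum(counts)
    return [c / total for c in counts]


def quantitative_profile(counts, cells_per_gram):
    """Scale proportions by a measured microbial load, giving absolute
    abundances (cells per gram) for each taxon."""
    return [p * cells_per_gram for p in relative_abundance(counts)]


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform; the pseudocount avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Two hypothetical samples with identical sequencing counts but a
# tenfold difference in microbial load.
sample_counts = [50, 50]
high_load = quantitative_profile(sample_counts, 1e11)
low_load = quantitative_profile(sample_counts, 1e10)
```

    Both samples have the same relative abundances, yet every taxon in `high_load` is ten times more abundant in absolute terms than in `low_load`. That difference is invisible to any transformation that sees only the counts.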
  • Article
    | Open Access

    Cross-linking mass spectrometry (MS) can identify protein-protein interaction (PPI) networks but assessing the reliability of these data remains challenging. To address this issue, the authors develop and validate a method to determine the false-discovery rate of PPIs identified by cross-linking MS.

    • Swantje Lenz, Ludwig R. Sinn & Juri Rappsilber
  • Article
    | Open Access

    Disentangling the impacts of non-pharmaceutical interventions on COVID-19 transmission is challenging as they have been used in different combinations across time and space. This study shows that, early in the epidemic, school/daycare closures and stopping nursing home visits were associated with the biggest reduction in transmission in the United States.

    • Bingyi Yang, Angkana T. Huang & Derek A. T. Cummings
  • Article
    | Open Access

    Population-based surveys are the gold standard for estimating seroprevalence but are expensive and often only capture a small geographic area or window of time. This study describes a new platform, SCALE-IT, for serosurveillance based on algorithmic sampling of electronic health records, and uses it to estimate the seroprevalence of SARS-CoV-2 in San Francisco.

    • Isobel Routledge, Adrienne Epstein & Isabel Rodriguez-Barraquer
  • Article
    | Open Access

    Systemic light chain amyloidosis (AL) is caused by the production of toxic light chains and can be fatal, yet effective treatment is often not possible because of delayed diagnosis. Here the authors show that a machine learning platform analyzing light chain somatic mutations can predict light chain toxicity, offering a possible tool for early diagnosis of AL.

    • Maura Garofalo, Luca Piccoli & Andrea Cavalli
  • Article
    | Open Access

    Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Here, the authors propose an algebraic graph-assisted bidirectional transformer, which incorporates massive unlabeled molecular data into molecular representations via a self-supervised learning strategy and is assisted by 3D stereochemical information from graphs.

    • Dong Chen, Kaifu Gao & Feng Pan