Computational biology and bioinformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    Genome assembly approaches are limited by factors including cost, power and incomplete resolution. Here, the authors present Aquila, a method that uses a reference sequence and linked read data to generate high quality diploid genome assemblies from which genetic variation can be detected and phased.

    • Xin Zhou, Lu Zhang & Arend Sidow
  • Article
    | Open Access

    Social distancing policies aiming to reduce COVID-19 transmission have been reflected in reductions in human mobility. Here, the authors show that reduced mobility is correlated with decreased transmission, but that this relationship weakened over time as social distancing measures were relaxed.

    • Pierre Nouvellet, Sangeeta Bhatia & Christl A. Donnelly
  • Article
    | Open Access

    Molecular heterogeneity of acute myeloid leukaemia (AML) across patients is a major challenge for prognosis and therapy. Here, the authors show that NPM1 mutated AML is a heterogeneous class, consisting of two subtypes which exhibit distinct molecular characteristics, differentiation state, patient survival and drug response.

    • Arvind Singh Mer, Emily M. Heath & Benjamin Haibe-Kains
  • Article
    | Open Access

    Boundaries of topologically associated domains in genomes are marked by CTCF and cohesin binding. Here the authors predict CTCF interaction specificity by building a simple mathematical model with features including loop competition and extrusion.

    • Wang Xi & Michael A. Beer
  • Article
    | Open Access

    Pelvic radiographs (PXRs) are essential for detecting proximal femur and pelvis injuries in trauma patients, but none of the currently available algorithms can detect all kinds of trauma-related radiographic findings. Here, the authors develop a multiscale deep learning algorithm trained with weakly supervised point annotation.

    • Chi-Tung Cheng, Yirui Wang & Le Lu
  • Article
    | Open Access

    Homologous recombination between co-infecting coronaviruses can produce novel pathogens. Here, Wardeh et al. develop a machine learning approach to predict associations between mammals and multiple coronaviruses and hence estimate the potential for generation of novel coronaviruses by recombination.

    • Maya Wardeh, Matthew Baylis & Marcus S. C. Blagrove
  • Article
    | Open Access

    The role of children in the spread of COVID-19 is not fully understood, and the circumstances under which schools should be opened are therefore debated. Here, the authors demonstrate protocols by which schools in France can be safely opened without overwhelming the healthcare system.

    • Laura Di Domenico, Giulia Pullano & Vittoria Colizza
  • Article
    | Open Access

    Here, the authors analyze 4907 Circular Metagenome Assembled Genomes from human microbiomes and identify and characterize nearly 600 diverse genomes of crAss-like phages, finding two putative families with unusual genomic features, including high density of self-splicing introns and inteins.

    • Natalya Yutin, Sean Benler & Eugene V. Koonin
  • Article
    | Open Access

    Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. Here, the authors identify robust druggable protein targets within a principled causal framework that makes use of multiple data modalities and integrates aging signatures.

    • Anastasiya Belyaeva, Louis Cammarata & Caroline Uhler
  • Article
    | Open Access

    Accurate analysis of single-cell RNA sequencing (scRNA-seq) data is affected by issues including technical noise and high dropout rate. Here, the authors develop a hierarchical autoencoder, scDHA, which outperforms existing methods in scRNA-seq analyses such as cell segregation and classification.

    • Duc Tran, Hung Nguyen & Tin Nguyen
  • Article
    | Open Access

    Clinical trials of novel therapeutics for Alzheimer’s Disease (AD) have so far provided largely negative results. Here, the authors present a machine learning framework that quantifies potential associations between the pathology of AD severity and gene-based molecular mechanisms to enable drug repurposing.

    • Steve Rodriguez, Clemens Hug & Artem Sokolov
  • Article
    | Open Access

    Metabolites are indicators of health and disease; genetic studies can reveal variants influencing their levels. Here, the authors investigate the contribution of rare, exonic variants to the levels of urine metabolites and generate predictions on metabolic consequences underlying metabolic disease.

    • Yurong Cheng, Pascal Schlosser & Anna Köttgen
  • Article
    | Open Access

    Accurately predicting the secondary structure of non-coding RNAs can help unravel their function. Here the authors propose a method integrating thermodynamic information and deep learning to improve the robustness of RNA secondary structure prediction compared to several existing algorithms.

    • Kengo Sato, Manato Akiyama & Yasubumi Sakakibara
  • Article
    | Open Access

    Patch clamp recording of neurons is slow and labor-intensive. Here the authors present a method for automated deep learning driven label-free image guided patch clamp physiology to perform measurements on hundreds of human and rodent neurons.

    • Krisztian Koos, Gáspár Oláh & Peter Horvath
  • Article
    | Open Access

    Replication forks that are stalled at obstacles on the DNA template can be restarted by homologous recombination. Here, the authors show replication dynamics during homologous recombination-dependent replication fork restart by combining polymerase usage sequencing and a Monte Carlo mathematical model.

    • Karel Naiman, Eduard Campillo-Funollet & Antony M. Carr
  • Article
    | Open Access

    Human mobility plays a central role in the spread of infectious diseases and can help in forecasting incidence. Here the authors show a comparison of multiple mobility benchmarks in forecasting influenza, and demonstrate the value of a machine-learned mobility map with global coverage at multiple spatial scales.

    • Srinivasan Venkatramanan, Adam Sadilek & Madhav Marathe
  • Article
    | Open Access

    Large BioBank studies are commonly used in GWAS, but may be biased by factors affecting participation and dropout. Here the authors show that some of the factors affecting participation may have underlying genetic components.

    • Jessica Tyrrell, Jie Zheng & Kate Tilling
  • Article
    | Open Access

    Genomic prediction of phenotype may be improved by using DNA mutations with functional, evolutionary, and pleiotropic consequences. Here the authors describe a method for genome-wide fine-mapping of QTLs and develop a genotyping array for improved prediction of genetic values for cattle traits.

    • Ruidong Xiang, Iona M. MacLeod & Michael E. Goddard
  • Article
    | Open Access

    Spread of SARS-CoV-2 in the early phase of the pandemic has been driven by high population susceptibility, but virus sensitivity to climate may play a role in future outbreaks. Here, the authors simulate SARS-CoV-2 dynamics in winter assuming climate dependence is similar to an endemic coronavirus strain.

    • Rachel E. Baker, Wenchang Yang & Bryan T. Grenfell
  • Article
    | Open Access

    Understanding patient-specific pathobiological pathways is a critical step for advancing precision medicine. Here the authors show that individualized protein-protein interaction networks provide key insight on patient-level pathobiology and clinically relevant pathophenotypic characteristics in a complex disease.

    • Bradley A. Maron, Rui-Sheng Wang & Joseph Loscalzo
  • Article
    | Open Access

    Secondary ion beam mass spectrometry (SIMS) is a method to obtain a chemical snapshot of biological tissue, but the spatial resolution is low. Here, the authors develop a computational and technology pipeline, called Ion Beam Tomography, to localise a chemical signal in SIMS in 3D with sub-25 nm accuracy.

    • Ahmet F. Coskun, Guojun Han & Garry P. Nolan
  • Article
    | Open Access

    Connecting genotypes to complex social behaviour is challenging. Taylor et al. use machine learning to show a strong response of caste-associated gene expression to queen loss, wherein individual wasps’ expression profiles become intermediate between queen and worker states, even in the absence of behavioural changes.

    • Benjamin A. Taylor, Alessandro Cini & Seirian Sumner
  • Article
    | Open Access

    Establishing the natural history of COVID-19 requires longitudinal data from population-based cohorts. Here, the authors use linked primary care, testing, and hospital data to describe the disease in ~100,000 individuals with a COVID-19 diagnosis among a population of ~5.5 million in Catalonia, Spain.

    • Edward Burn, Cristian Tebé & Talita Duarte-Salles
  • Article
    | Open Access

    Statistical colocalisation is a method to identify causal genes and shared genetic aetiology across traits. Here, the authors describe HyPrColoc, an efficient Bayesian divisive clustering algorithm which integrates summary statistics from genome-wide association studies to detect clusters of colocalised traits from large numbers of traits.

    • Christopher N. Foley, James R. Staley & Joanna M. M. Howson
  • Article
    | Open Access

    Single cell genomics uses cells from the same individual, or pseudoreplicates, that can introduce biases and inflate type I error rates. Here the authors apply generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among cells within an individual.

    • Kip D. Zimmerman, Mark A. Espeland & Carl D. Langefeld
  • Article
    | Open Access

    Identifying structural variants (SVs) from whole genome sequence data has been a significant bioinformatic challenge. Here, the authors describe PopDel, which uses a joint SV detection approach to reliably and efficiently identify 500–10,000 bp deletions across large population cohorts.

    • Sebastian Niehus, Hákon Jónsson & Birte Kehr
  • Article
    | Open Access

    Early prediction and diagnosis of sepsis, which is critical in reducing mortality, is challenging as many of its signs and symptoms are similar to other less critical conditions. Here, the authors develop an artificial intelligence algorithm which uses both structured data and unstructured clinical notes to predict sepsis.

    • Kim Huat Goh, Le Wang & Gamaliel Yu Heng Tan
  • Article
    | Open Access

    Here, the authors present Methyl Assignments Using Satisfiability (MAUS), a method for the assignment of methyl groups using raw NOE data. They use eight proteins in the 10–45 kDa size range as test cases and show that MAUS yields 100% accurate assignments at high completeness levels.

    • Santrupti Nerli, Viviane S. De Paula & Nikolaos G. Sgourakis
  • Article
    | Open Access

    Coronary artery calcium is an accurate predictor of cardiovascular events but this information is not routinely quantified. Here the authors show a robust and time-efficient deep learning system to automatically quantify coronary calcium on CT scans and predict cardiovascular events in a large, multicentre study.

    • Roman Zeleznik, Borek Foldyna & Hugo J. W. L. Aerts
  • Article
    | Open Access

    The SARS-CoV-2 pandemic has put pressure on intensive care units, so that predicting severe deterioration early is a priority. Here, the authors develop a multimodal severity score including clinical and imaging features that has significantly improved prognostic performance in two validation datasets compared to previous scores.

    • Nathalie Lassau, Samy Ammari & Michael G. B. Blum
  • Article
    | Open Access

    Aberrant splicing is a major contributor to rare disease, but detection accuracy using current methods is limited. Here, the authors develop an algorithm that detects aberrant splicing and intron retention events from RNA-seq data and apply it to diagnosis in mitochondrial disease.

    • Christian Mertes, Ines F. Scheller & Julien Gagneur
  • Article
    | Open Access

    Some cholesterol-lowering drugs can increase the risk of type 2 diabetes, but the mechanism behind this is not fully understood. Here the authors show that there is a single genetic regulatory module that influences both cholesterol levels and glucose levels, providing a link between cholesterol levels and diabetes.

    • Ariella T. Cohain, William T. Barrington & Eric E. Schadt
  • Article
    | Open Access

    Sarcomas are morphologically heterogeneous tumours, which makes their classification challenging. Here the authors develop a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.

    • Christian Koelsche, Daniel Schrimpf & Andreas von Deimling
  • Article
    | Open Access

    Accurate prediction of variant pathogenicity is essential to understanding genetic risks in disease. Here, the authors present a deep neural network method for prediction of missense variant pathogenicity, MVP, and demonstrate its utility in prioritizing de novo variants contributing to developmental disorders.

    • Hongjian Qi, Haicang Zhang & Yufeng Shen
  • Article
    | Open Access

    While cell shape is crucial for function and development of organisms, versatile frameworks for cell shape quantification, comparison, and classification remain underdeveloped. Here, the authors use a network-based framework for Arabidopsis leaf epidermal cell shape characterization and classification.

    • Jacqueline Nowak, Ryan Christopher Eng & Zoran Nikoloski
  • Article
    | Open Access

    Incidence of COVID-19 has been high in parts of South America including Brazil, and information on effective intervention strategies is needed. Here, the authors use mathematical modelling to show that reductions in social distancing should be made gradually to avoid a severe second peak of cases.

    • Osmar Pinto Neto, Deanna M. Kennedy & Renato Amaro Zângaro
  • Article
    | Open Access

    Here, the authors introduce Cell Heterogeneity–Adjusted cLonal Methylation (CHALM) as a methylation quantification method that considers the heterogeneity of sequenced bulk cells. They apply CHALM to methylation datasets to detect differentially methylated genes that exhibit distinct biological functions supporting underlying mechanisms.

    • Jianfeng Xu, Jiejun Shi & Wei Li
  • Article
    | Open Access

    Family 1 glycosidases (GH1) are present in the three domains of life and share the classical TIM-barrel fold. Structural and biochemical analyses of a resurrected ancestral GH1 enzyme reveal heme binding, not known in its modern descendants. Heme rigidifies the TIM-barrel and allosterically enhances catalysis.

    • Gloria Gamiz-Arco, Luis I. Gutierrez-Rus & Jose M. Sanchez-Ruiz
  • Article
    | Open Access

    Recent critical commentaries unfavorably compare deep learning (DL) with standard machine learning (SML) for brain imaging data analysis. Here, the authors show that if trained following prevalent DL practices, DL methods substantially improve compared to SML methods by encoding robust discriminative brain representations.

    • Anees Abrol, Zening Fu & Vince Calhoun