Data processing

  • Article
    | Open Access

    Our ability to interpret single-cell multivariate signaling responses is still limited. Here the authors introduce fractional response analysis (FRA), involving fractional cell counting, capable of deconvoluting heterogeneous multivariate responses of cellular populations.

    • Karol Nienałtowski, Rachel E. Rigby & Michał Komorowski
  • Article
    | Open Access

    A deeper knowledge of the immune cell profile within the brain cancer tumor microenvironment (TM) could identify targets to improve immunotherapy efficacy. Here, in glioblastoma, the authors find haematopoietic stem and progenitor cells in the TM, which are associated with poor prognosis and increased immunosuppression.

    • I-Na Lu, Celia Dobersalske & Igor Cima
  • Article
    | Open Access

    Here, the authors use simulated quantitative gut microbial communities to benchmark the performance of 13 common data transformations in determining diversity as well as microbe-microbe and microbe-metadata associations. They find that quantitative approaches incorporating microbial load variation outperform computational strategies in downstream analyses. They urge widespread adoption of quantitative approaches and recommend specific computational transformations for cases where determining the microbial load of samples is not feasible.

    • Verónica Lloréns-Rico, Sara Vieira-Silva & Jeroen Raes
  • Article
    | Open Access

    Biomedical measurements usually generate high-dimensional data where individual samples are classified into several categories. Vogelstein et al. propose a supervised dimensionality reduction method that estimates the low-dimensional data projection for classification and prediction in big datasets.

    • Joshua T. Vogelstein, Eric W. Bridgeford & Mauro Maggioni
  • Article
    | Open Access

    Classification methods for scRNA-seq data are limited in their ability to learn from multiple datasets simultaneously. Here the authors present scHPL, a hierarchical progressive learning method that automatically finds relationships between cell populations across multiple datasets and constructs a classification tree.

    • Lieke Michielsen, Marcel J. T. Reinders & Ahmed Mahfouz
  • Article
    | Open Access

    Identifying enriched gene sets in transcriptomic data is a routine analysis. Here, the authors show that conventional gene category enrichment analysis (GCEA) applied to brain-wide atlas data yields biased results and develop a flexible ensemble-based null model framework to enable appropriate inference in GCEA.

    • Ben D. Fulcher, Aurina Arnatkeviciute & Alex Fornito
  • Article
    | Open Access

    High-content screening prompted the development of software enabling discrete phenotypic analysis of single cells. Here, the authors show that supervised continuous machine learning can drive novel discoveries in diverse imaging experiments and present the Regression Plane module of Advanced Cell Classifier.

    • Abel Szkalisity, Filippo Piccinini & Peter Horvath
  • Article
    | Open Access

    Patch clamp recording of neurons is slow and labor-intensive. Here the authors present a method for automated, deep-learning-driven, label-free, image-guided patch clamp physiology to perform measurements on hundreds of human and rodent neurons.

    • Krisztian Koos, Gáspár Oláh & Peter Horvath
  • Article
    | Open Access

    Large BioBank studies are commonly used in GWAS, but may be biased by factors affecting participation and dropout. Here the authors show that some of the factors affecting participation may have underlying genetic components.

    • Jessica Tyrrell, Jie Zheng & Kate Tilling
  • Article
    | Open Access

    Chromatin loops bridging distant loci within chromosomes can be detected by a variety of techniques such as Hi-C. Here the authors present Chromosight, an algorithm applied to mammalian, bacterial, viral and yeast genomes, able to detect various types of patterns in chromosome contact maps, including chromosomal loops.

    • Cyril Matthey-Doret, Lyam Baudry & Axel Cournac
  • Article
    | Open Access

    Convolutional neural networks are powerful tools for clinical diagnosis, but their effectiveness decreases when the number of available samples is small. Here, the authors develop a cumulative learning method by training the same model through several classification tasks over various small mass spectrometry datasets.

    • Khawla Seddiki, Philippe Saudemont & Arnaud Droit
  • Article
    | Open Access

    In the context of diseases impairing movement, quantitative assessment of motion is critical to medical decision-making but is currently possible only with expensive motion capture systems and trained personnel. Here, the authors present a method for predicting clinically relevant motion parameters from an ordinary video of a patient.

    • Łukasz Kidziński, Bryan Yang & Michael H. Schwartz
  • Article
    | Open Access

    Clinical proteomics critically depends on the ability to acquire highly reproducible data over an extended period of time. Here, the authors assess reproducibility over four months across different mass spectrometers and develop a computational approach to mitigate variation among instruments over time.

    • Rebecca C. Poulos, Peter G. Hains & Qing Zhong
  • Perspective
    | Open Access

    Scarcity of high-quality annotated data and mismatch between the development dataset and the target environment are two of the main challenges in developing predictive tools from medical imaging. In this Perspective, the authors show how causal reasoning can shed new light on these challenges.

    • Daniel C. Castro, Ian Walker & Ben Glocker
  • Article
    | Open Access

    Deep learning is becoming a popular approach for understanding biological processes but can be hard to adapt to new questions. Here, the authors develop Janggu, a Python library that aims to ease data acquisition and model evaluation and facilitate deep learning applications in genomics.

    • Wolfgang Kopp, Remo Monti & Altuna Akalin
  • Article
    | Open Access

    Matching mass spectra to peptide sequences is the usual first step in proteomics data analysis, often followed by peptide quantification. Here, the authors show that clustering and quantifying mass spectral features prior to peptide identification can increase the sensitivity of label-free quantitative proteomics.

    • Matthew The & Lukas Käll
  • Article
    | Open Access

    The complexity of structural variation (SV) and short tandem repeats (STRs) makes it necessary to apply different calling and filtering strategies to sequencing datasets. Here, Jakubosky et al. report a comprehensive SV and STR callset from whole-genome sequencing of 477 individuals from iPSCORE and HipSci using five algorithms.

    • David Jakubosky, Erin N. Smith & Kelly A. Frazer
  • Article
    | Open Access

    Inflammatory bowel disease (IBD) has been linked to host-microbiota interactions. Here, the authors investigate mucosa-associated microbiota using endoscopically targeted biopsies from inflamed and non-inflamed colon in patients with Crohn’s disease and ulcerative colitis, finding associations with inflammation and host epigenomic alterations.

    • F. J. Ryan, A. M. Ahern & M. J. Claesson
  • Article
    | Open Access

    Antimicrobial resistance (AMR) represents a global health threat. Here, the authors analyse the oral and gut resistomes from metagenomes of diverse populations and find that the oral resistome harbours higher abundance but lower diversity of antimicrobial resistance genes than the gut resistome.

    • Victoria R. Carr, Elizabeth A. Witherden & David L. Moyes
  • Article
    | Open Access

    Integrating independent large-scale pharmacogenomic screens can enable unprecedented characterization of genetic vulnerabilities in cancers. Here, the authors show that the two largest independent CRISPR-Cas9 gene-dependency screens are concordant, paving the way for joint analysis of the data sets.

    • Joshua M. Dempster, Clare Pacini & Francesco Iorio
  • Article
    | Open Access

    Mechanistic insight into the regulation of transcriptional modules remains scarce. Here, the authors identify statistically independent gene sets by applying independent component analysis to a high-quality E. coli RNA-seq data compendium and find that most gene sets represent the effects of specific transcriptional regulators.

    • Anand V. Sastry, Ye Gao & Bernhard O. Palsson
  • Article
    | Open Access

    N1-methyladenosine (m1A) was recently reported as a new mRNA modification, but its prevalence has been controversial. Here the authors show that m1A, if present in mRNA, is at very low stoichiometry, with the notable exception of MT-ND5. Further, they show that the previously reported enrichment of m1A near the start of transcripts reflects false-positive identifications due to cross-reactivity of the commonly used m1A antibody with mRNA caps.

    • Anya V. Grozhik, Anthony O. Olarerin-George & Samie R. Jaffrey
  • Article
    | Open Access

    There has been a rapid rise in single-cell RNA-seq methods and associated pipelines. Here the authors use simulated data to systematically evaluate the performance of 3000 possible pipelines to derive recommendations for data processing and analysis of different types of scRNA-seq experiments.

    • Beate Vieth, Swati Parekh & Ines Hellmann
  • Article
    | Open Access

    Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.

    • Alexander T. Dilthey, Chirag Jain & Adam M. Phillippy
  • Article
    | Open Access

    Complete gene expression deconvolution remains a challenging problem. Here, the authors provide a solution based on the recognition that expression levels of cell type-specific genes are mutually linear across mixtures and that mutually linear gene clusters correspond to cell type-specific signatures.

    • Konstantin Zaitsev, Monika Bambouskova & Maxim N. Artyomov
  • Article
    | Open Access

    The increasing accessibility of single-cell omics technologies beyond transcriptomics demands parallel advances in analysis. Here, the authors introduce STREAM, a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data.

    • Huidong Chen, Luca Albergante & Luca Pinello
  • Article
    | Open Access

    With the increasing availability of multi-OMICs data comes the need for easy-to-use data analysis tools. Here, the authors introduce Metascape, a biologist-oriented portal that provides a gene list annotation, enrichment and interactome resource and enables integrated analysis of multi-OMICs datasets.

    • Yingyao Zhou, Bin Zhou & Sumit K. Chanda
  • Article
    | Open Access

    Inferring direct protein-protein interactions (PPIs) and modules in PPI networks remains a challenge. Here, the authors introduce an algorithm to infer potential direct PPIs from quantitative proteomic AP-MS data by identifying enriched interactions of each bait relative to the other baits.

    • Mihaela E. Sardiu, Joshua M. Gilmore & Michael P. Washburn
  • Article
    | Open Access

    Bacterial outer membrane vesicles (OMVs) are increasingly used as carriers for drug delivery. Here the authors encapsulate the biopolymer melanin into OMVs, extending their use to optoacoustic imaging both in vitro and in vivo, and demonstrate the potential of this tool for photothermal therapy applications.

    • Vipul Gujrati, Jaya Prakash & Vasilis Ntziachristos
  • Article
    | Open Access

    Analyzing the organization of molecular complexes in multi-color single-molecule localization microscopy data requires heavy computational resources that are impractical for laboratory computers. Here the authors develop a coordinate-based Triple-Correlation algorithm with improved speed and reduced computational cost.

    • Yandong Yin, Wei Ting Chelsea Lee & Eli Rothenberg
  • Article
    | Open Access

    The number of biomedical image analysis challenges has increased in the last ten years, but common practices have not yet been established. Here the authors analyze 150 recent challenges and demonstrate that outcomes vary based on the metrics used and that limited reporting of information hampers reproducibility.

    • Lena Maier-Hein, Matthias Eisenmann & Annette Kopp-Schneider
  • Article
    | Open Access

    Integrated analyses of multiple large-scale screenings can be complicated by batch effects and technical artefacts. McFarland et al. introduce DEMETER2, a hierarchical model coupled with model-based normalization, which allows the assessment of differential dependencies across genes and cell lines.

    • James M. McFarland, Zandra V. Ho & Aviad Tsherniak
  • Article
    | Open Access

    Sharing of whole genome sequencing (WGS) data improves study scale and power, but data from different groups are often incompatible. Here, US genome centers and NIH programs define WGS data processing standards and a flexible validation method, facilitating collaboration in human genetics research.

    • Allison A. Regier, Yossi Farjoun & Ira M. Hall
  • Article
    | Open Access

    Functional magnetic resonance imaging (fMRI) is a powerful technique for measuring human brain activity, but the statistical analysis of fMRI data can be difficult. Here, the authors introduce a new fMRI analysis tool, LISA, which provides increased statistical power compared to existing techniques.

    • Gabriele Lohmann, Johannes Stelzer & Klaus Scheffler
  • Article
    | Open Access

    Inference and representation of differentiation trajectories from single-cell RNA-seq data remain a challenge. Here, the authors offer a visualization approach that captures both continuous differentiation trajectories and discrete clusters representing metastable states along the trajectories.

    • Fabrizio Costa, Dominic Grün & Rolf Backofen
  • Article
    | Open Access

    DNA barcode swapping results in mislabelling of sequencing reads between multiplexed samples. Here, the authors investigate the severity and consequences of barcode swapping for single-cell RNA-seq data, and develop a computational method to exclude swapped reads.

    • Jonathan A. Griffiths, Arianne C. Richard & John C. Marioni
  • Article
    | Open Access

    Publicly available RNA-seq data is provided mostly in raw form, resulting in a barrier for integrative analyses. Here, Lachmann et al. develop a high-throughput processing infrastructure and search database (ARCHS4) that provides processed RNA-seq data for 187,946 publicly available mouse and human samples to support exploration and reuse.

    • Alexander Lachmann, Denis Torre & Avi Ma’ayan