Statistical methods

  • Article
    | Open Access

    Bulk tissue RNA-seq data reveals transcriptomic profiles but masks the contributions of different cell types. Here, the authors develop a new method for estimating cell type proportions from bulk tissue RNA-seq data, guided by a multi-subject single-cell expression reference.

    • Xuran Wang, Jihwan Park & Mingyao Li
  • Article
    | Open Access

    Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance between predictions and experimental confirmation.

    • Li Liu, Maxwell D. Sanderford & Sudhir Kumar
  • Article
    | Open Access

    Tumour heterogeneity hinders translation of large-scale genomic data into the clinic. Here the authors develop a method for the stratification of cancer patients based on the molecular gene status, including genetic interactions, rather than clinico-histological data, and apply it to TCGA data for over 8000 cases across 22 cancer types.

    • Jack Kuipers, Thomas Thurnherr & Niko Beerenwinkel
  • Article
    | Open Access

    Little is known about the contribution of germline genetic variants to cancer drug sensitivity. Here, the authors devise an approach for joint analysis of germline variants and somatic mutations, identifying substantial germline contributions to variation in drug sensitivity.

    • Michael P. Menden, Francesco Paolo Casale & Oliver Stegle
  • Article
    | Open Access

    Genome-wide association studies (GWAS) of neuroimaging data pose a significant computational burden because of the need to correct for multiple testing in both the genetic and the imaging data. Here, Ganjgahi et al. develop WLS-REML, which significantly reduces computation time in brain imaging GWAS.

    • Habib Ganjgahi, Anderson M. Winkler & Thomas E. Nichols
  • Article
    | Open Access

    This study analyzes allelic expression bias in post-mortem brains of healthy individuals and those diagnosed with schizophrenia or bipolar disorder. The study shows that the number of imprinted genes is consistent with low estimates, and that allelic bias is independent of psychiatric disease status.

    • Attila Gulyás-Kovács, Ifat Keydar & Andrew Chess
  • Article
    | Open Access

    There is increasing urgency to understand the spatiotemporal dynamics of dengue in non-endemic regions. Here, the authors reconstruct likely dengue transmission chains in the city of Porto Alegre based on geo-located cases only, and find that most transmission events occur over short distances.

    • Giorgio Guzzetta, Cecilia A. Marques-Toledo & Stefano Merler
  • Article
    | Open Access

    Time-series single-cell expression data has large variance between time points and is challenging to analyze. Here, the authors develop a new dimension reduction and data visualization tool for large-scale temporal scRNA-seq data which identifies trajectories and subpopulations.

    • Wuming Gong, Il-Youp Kwak & Daniel J. Garry
  • Article
    | Open Access

    Synthetic lethality (SL) offers a new precision oncology approach, which is based on targeting cancer-specific vulnerabilities across the whole genome, going beyond cancer drivers. The authors develop an approach termed ISLE to identify clinically relevant SL interactions and use them for patient stratification and novel target identification.

    • Joo Sang Lee, Avinash Das & Eytan Ruppin
  • Article
    | Open Access

    RNA–protein interactions often depend on the recognition of extended RNA elements but the identification of these motifs is challenging. Here, the authors present a global integrated approach to analyze RNA–protein binding landscapes, mapping extended RNA interaction motifs for four RNA-binding proteins.

    • Qin Zhou, Nikesh Kunder & Zachary T. Campbell
  • Article
    | Open Access

    Single cell ATAC-seq (scATAC-seq) data reveals cellular level epigenetic heterogeneity but its application in delineating distinct subpopulations is still challenging. Here, the authors develop scABC, a statistical method for unsupervised clustering of scATAC-seq data and identification of open chromatin regions specific to cell identity.

    • Mahdi Zamanighomi, Zhixiang Lin & Wing Hung Wong
  • Article
    | Open Access

    From infectious diseases to brain activity, complex systems can be approximated using autoregressive models. Here, the authors show that incomplete sampling can bias estimates of the stability of such systems, and introduce a novel, unbiased metric for use in such situations.

    • Jens Wilting & Viola Priesemann
  • Article
    | Open Access

    Despite being widely performed in exploring cell heterogeneity and gene expression stochasticity, single cell RNA-seq analysis is complicated by excess zero counts (dropouts). Here, Li and Li develop scImpute for statistical imputation of dropouts in scRNA-seq data.

    • Wei Vivian Li & Jingyi Jessica Li
  • Article
    | Open Access

    Genetic prediction of complex traits so far has limited accuracy because of insufficient understanding of the genetic risk. Here, Maier et al. develop an improved method for trait prediction that makes use of genetic correlations between traits and apply it to summary statistics of psychiatric diseases.

    • Robert M. Maier, Zhihong Zhu & Matthew R. Robinson
  • Article
    | Open Access

    Selective sweeps are events in which beneficial mutations spread rapidly through a population. Here, Sugden et al. develop SWIF(r), a probabilistic classification framework for detecting and localizing selective sweeps, and apply it to genomic data from the ǂKhomani San.

    • Lauren Alpert Sugden, Elizabeth G. Atkinson & Sohini Ramachandran
  • Article
    | Open Access

    Elucidating molecular organisation requires precise localisation and analysis. Here the authors develop SODA software for automatic and quantitative mapping of statistically coupled molecules, and use it to unravel spatial organisation of thousands of synaptic proteins in SIM and 3DSTORM microscopy.

    • Thibault Lagache, Alexandre Grassart & Jean-Christophe Olivo-Marin
  • Article
    | Open Access

    Different experimental and computational approaches can be used to study RNA structures. Here, the authors present a computational method for data-directed reconstruction of complex RNA structure landscapes, which predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data.

    • Hua Li & Sharon Aviran
  • Article
    | Open Access

    B and T cell receptor diversity can be studied by high-throughput immune receptor sequencing. Here, the authors develop a software tool, IGoR, that calculates the likelihoods of potential V(D)J recombination and somatic hypermutation scenarios from raw immune sequence reads.

    • Quentin Marcou, Thierry Mora & Aleksandra M. Walczak
  • Article
    | Open Access

    Single-cell RNA sequencing (scRNA-seq) data provides information on transcriptomic heterogeneity within cell populations. Here, Risso et al. develop ZINB-WaVE for low-dimensional representations of scRNA-seq data that account for zero inflation, over-dispersion, and the count nature of the data.

    • Davide Risso, Fanny Perraudeau & Jean-Philippe Vert
  • Article
    | Open Access

    Genetic methods are useful to test whether risk factors are a cause or a consequence of disease. Here, Zhu et al. develop a generalized summary-based Mendelian Randomization (GSMR) method which uses summary-level data from GWAS to test for causal associations of health risk factors with common diseases.

    • Zhihong Zhu, Zhili Zheng & Jian Yang
  • Article
    | Open Access

    Mathematical approaches can be used to assess immune cell composition from the tumour's bulk expression data. Here the authors optimise the CIBERSORT-based deconvolution algorithm by including cell type-specific reference gene expression profiles generated from tumour-derived single-cell RNA sequencing data.

    • Max Schelker, Sonia Feau & Andreas Raue
  • Article
    | Open Access

    Matching fragment spectra to reference library spectra is an important procedure for annotating small molecules in untargeted mass spectrometry-based metabolomics studies. Here, the authors develop strategies to estimate false discovery rates (FDR) by empirical Bayes and target-decoy-based methods, which enable a user to define the scoring criteria for spectral matching.

    • Kerstin Scheubert, Franziska Hufsky & Sebastian Böcker
  • Article
    | Open Access

    IgG glycosylation is an important factor in immune function, yet the molecular details of protein glycosylation remain poorly understood. The data-driven approach presented here uses large-scale plasma IgG mass spectrometry measurements to infer new biochemical reactions in the glycosylation pathway.

    • Elisa Benedetti, Maja Pučić-Baković & Jan Krumsiek
  • Article
    | Open Access

    Large-scale metabolic models of organisms from microbes to mammals can provide great insight into cellular function, but their analysis remains challenging. Here, the authors provide an approximate analytic method to estimate the feasible solution space for the flux vectors of metabolic networks, enabling more accurate analysis under a wide range of conditions of interest.

    • Alfredo Braunstein, Anna Paola Muntoni & Andrea Pagnani
  • Article
    | Open Access

    Single-cell RNA sequencing has enabled great advances in understanding developmental biology, but reconstructing cellular lineages from this data remains challenging. Here the authors develop an algorithm, dpath, which models the lineage relationships of underlying single cells based on single-cell RNA-seq data, and apply it to study lineage progression of Etv2-expressing progenitors.

    • Wuming Gong, Tara L. Rasmussen & Daniel J. Garry
  • Article
    | Open Access

    Plasticity and clonal population structure in bacterial genomes can hinder traditional SNP-based genetic association studies. Here, Corander and colleagues present a method to identify variable-length sequence elements enriched in a phenotype of interest, and demonstrate its use in human pathogens.

    • John A. Lees, Minna Vehkala & Jukka Corander
  • Article
    | Open Access

    Clinical RNA-seq datasets can predict clinical outcomes. Here, Shen et al. report a statistical method for survival analysis of mRNA isoform variation using clinical RNA-seq datasets, and show that the identified isoform-based survival predictors outperform gene expression-based survival predictors on TCGA data from six cancer types.

    • Shihao Shen, Yuanyuan Wang & Yi Xing
  • Article
    | Open Access

    Stochastic reaction-diffusion systems are used for modelling spatial dynamics in many disciplines, but parameter inference and model selection remain challenging. Here the authors offer a solution enabled by a connection between reaction-diffusion and the well-studied spatio-temporal Cox processes.

    • David Schnoerr, Ramon Grima & Guido Sanguinetti
  • Article
    | Open Access

    Use of general linear mixed models (GLMMs) in genetic variance analysis can quantify the relative contribution of additive effects from genetic variation on a given trait. Here, Jonathan Mosley and colleagues apply GLMM in a phenome-wide analysis and show that genetic variations in the HLA region are associated with 44 phenotypes, 5 of which were not previously reported in GWASes.

    • Jonathan D. Mosley, John S. Witte & Joshua C. Denny
  • Article
    | Open Access

    The influence of species conservation on food webs is less well understood than the effects of species loss. Here, the authors test several indices against optimal food web management and find no current metrics are reliably effective at identifying species conservation priorities.

    • E. McDonald-Madden, R. Sabbadin & H. P. Possingham
  • Article
    | Open Access

    Das et al. present a novel Bayesian approach called expression Quantitative Trait enhancer Loci (eQTeL), which effectively integrates genetic and epigenetic information to identify combinations of regulatory genomic variants underlying expression variance. Using various functional data, the authors show that the variants identified by eQTeL are likely to be causal.

    • Avinash Das, Michael Morley & Sridhar Hannenhalli
  • Article

    Body plan complexity is associated with the number of different cell types, yet the processes that create this diversity are unclear. Here the authors use transcriptomics to test the hypothesis that unlike cancer cells, novel normal cell types arise through sub-specialization of an ancestral cell type.

    • Cong Liang, Alistair R.R. Forrest & Günter P. Wagner