Statistical methods articles within Nature Communications

Featured

  • Article
    | Open Access

    Finding coral reefs resilient to climate warming is challenging. This study combines remote sensing of the Great Barrier Reef with breeding experiments that estimate coral survival under high-temperature exposure, and uses the results to build forecasting models that locate reefs with increased heat tolerance. These reefs represent targets for protection and potential sources of corals for reef restoration.

    K. M. Quigley & M. J. H. van Oppen
  • Article
    | Open Access

    Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding factors. Here, the authors develop a natural language processing framework that uncovers interpretable potential confounders from unstructured clinical text.

    Jiaming Zeng, Michael F. Gensheimer & Ross D. Shachter
  • Article
    | Open Access

    Schwaz, Austria, experienced SARS-CoV-2 outbreaks caused by variants of concern in early 2021 and responded with a mass vaccination campaign that reached 70% of the adult population within 5 days. Here, the authors show that this campaign reduced infections and hospitalisations.

    Jörg Paetzold, Janine Kimpel & Hannes Winner
  • Article
    | Open Access

    Many methods for microbiome differential abundance testing are available, but they have not been systematically compared. Here, the authors compare the performance of 14 differential abundance testing methods on 38 16S rRNA gene datasets with two sample groups, and show that ALDEx2 and ANCOM-II produce the most consistent results.

    Jacob T. Nearing, Gavin M. Douglas & Morgan G. I. Langille
  • Article
    | Open Access

    Missing values in scRNA-seq datasets can bias downstream analysis. Here, the authors threshold a low-rank approximation of the expression matrix so that false zeros can be imputed while true zeros are preserved.

    George C. Linderman, Jun Zhao & Yuval Kluger
  • Article
    | Open Access

    Improving inference from large-scale genetic data linked to electronic medical records requires new, computationally efficient regression methods. Here, the authors develop a Bayesian approach for association analyses that improves SNP-heritability estimation, discovery, fine-mapping and genomic prediction.

    Marion Patxot, Daniel Trejo Banos & Matthew R. Robinson
  • Article
    | Open Access

    Factors underlying the effects of gastric bypass surgery on glucose homeostasis are incompletely understood. Here, the authors develop and apply high-throughput mediation analysis to identify proteome and metabolome mediators of improved glucose homeostasis after gastric bypass surgery, and report that improved glycemia was mediated by the growth hormone receptor.

    Jonathan M. Dreyfuss, Yixing Yuchi & Mary Elizabeth Patti
  • Article
    | Open Access

    In this study, Tran Kiem et al. examine the contribution of different age groups to COVID-19 transmission. Using data from the French epidemic in summer 2020, they report that while individuals aged 80 years and older are more at risk, pandemic control in the absence of vaccines required measures targeted at all age groups.

    Cécile Tran Kiem, Paolo Bosetti & Simon Cauchemez
  • Article
    | Open Access

    Simulation is useful for developing and evaluating computational methods. Here, the authors develop SimBench, a comprehensive framework for benchmarking single-cell RNA-seq simulation methods against a diverse collection of experimental datasets.

    Yue Cao, Pengyi Yang & Jean Yee Hwa Yang
  • Article
    | Open Access

    Imbalance and loss of cell types are hallmarks of many diseases, yet quantifying compositional changes in scRNA-seq data remains challenging. Here, the authors present scCODA, a Bayesian model for assessing changes in cell type composition in scRNA-seq data.

    M. Büttner, J. Ostner & B. Schubert
  • Article
    | Open Access

    Accurate variant calling from single-cell DNA sequencing data generated by multiple displacement amplification requires dedicated models that account for amplification bias and copy errors. Here, the authors describe ProSolo, a model for calling single nucleotide variants with control over the false discovery rate.

    David Lähnemann, Johannes Köster & Alexander Schönhuth
  • Article
    | Open Access

    O’nyong nyong and Chikungunya viruses are arboviruses present in Africa, but their prevalence is unknown, partly because antibodies against the two viruses are highly cross-reactive. Here, the authors develop a statistical model that accounts for this cross-reactivity to characterise the circulation of both viruses from seroprevalence surveys.

    Nathanaël Hozé, Issa Diarra & Simon Cauchemez
  • Article
    | Open Access

    Single-cell RNA-seq (scRNA-seq) data are revolutionizing our understanding of biological systems but remain expensive to generate. Here, the authors present a statistical framework for informed multi-sample experimental design that reduces unnecessary costs and maximizes the utility of the generated data.

    Katharina T. Schmid, Barbara Höllbacher & Matthias Heinig
  • Article
    | Open Access

    Estimates of the severity of emerging infections have not considered how cases were ascertained, yet secondary cases identified by tracing the contacts of index cases may give more reliable estimates because they are less susceptible to ascertainment bias. Here, the authors perform a systematic review to quantify these differences and model their impact for COVID-19.

    Tim K. Tsang, Can Wang & Benjamin J. Cowling
  • Article
    | Open Access

    The proportion of a population that has previously been infected by a pathogen is typically estimated using antibody thresholds adjusted for sensitivity and specificity. Here, the authors present a model-based alternative to threshold methods which accounts for antibody waning and other sources of spectrum bias.

    C. Bottomley, M. Otiende & J. A. G. Scott
  • Article
    | Open Access

    Scalable trajectory inference for multi-omic single-cell datasets is challenging, particularly when capturing complex, non-tree topologies. Here, the authors present VIA, a method that scales to millions of cells across multiple omic modalities using lazy-teleporting random walks.

    Shobana V. Stassen, Gwinky G. K. Yip & Kevin K. Tsia
  • Article
    | Open Access

    Genetic plasticity drives phenotypic differences. Here, the authors develop a framework to quantify the individual and combinatorial contributions of SNPs to a phenotype of interest, and use it to identify SNP-SNP interactions associated with variation in bacterial responses to external changes.

    Dengcheng Yang, Yi Jin & Rongling Wu
  • Article
    | Open Access

    Glycomics can uncover important molecular changes, but measured glycans are highly interconnected, which makes them incompatible with common statistical methods and introduces pitfalls into the analysis. Here, the authors develop an approach that identifies glycan dependencies across samples to facilitate comparative glycomics.

    Bokan Bao, Benjamin P. Kellman & Nathan E. Lewis
  • Article
    | Open Access

    Mass spectrometry-based metabolomics is a powerful method for profiling large clinical cohorts, but batch variation can obscure biologically meaningful differences. Here, the authors develop a computational workflow that removes unwanted variation from the data while preserving biologically relevant information.

    Taiyun Kim, Owen Tang & Jean Yee Hwa Yang
  • Article
    | Open Access

    Existing genetic prediction tools typically assume that genetic variants contribute equally to the phenotype. Here, the authors develop eight prediction tools that allow the user to specify the heritability model, and show that these tools substantially improve prediction of complex traits.

    Qianqian Zhang, Florian Privé & Doug Speed
  • Article
    | Open Access

    Precision medicine needs prognostic markers to identify the patients most likely to benefit from targeted therapy. Here, the authors show that high baseline T cell receptor diversity indicates a favourable prognosis across multiple cancer types, and that monoclonal expansion of T cells correlates with good response to immune checkpoint blockade therapy in patients with metastatic melanoma.

    Sara Valpione, Piyushkumar A. Mundra & Richard Marais
  • Article
    | Open Access

    Cross-linking mass spectrometry (MS) can identify protein-protein interaction (PPI) networks but assessing the reliability of these data remains challenging. To address this issue, the authors develop and validate a method to determine the false-discovery rate of PPIs identified by cross-linking MS.

    Swantje Lenz, Ludwig R. Sinn & Juri Rappsilber
  • Article
    | Open Access

    Genome-wide investigation of chromatin organization provides insight into the global control of gene expression. Here, the authors present a computationally efficient method for the analysis of chromatin organization data and use it to recover principles of 3D organization across conditions.

    Merve Sahin, Wilfred Wong & Christina S. Leslie
  • Article
    | Open Access

    Allele-specific expression in diploid organisms can be quantified by RNA-seq, and common practice is to rely on a single library. Here, the authors show that this standard approach has a variable error rate and present Qllelic, a tool that improves the reproducibility of allele-specific RNA-seq analysis.

    Asia Mendelevich, Svetlana Vinogradova & Alexander A. Gimelbrant
  • Article
    | Open Access

    Association analyses of rare and noncoding variants in whole-genome sequencing data are limited by factors such as statistical power. Here, the authors present KnockoffScreen, a statistical method that uses the knockoff framework to detect, localise and prioritise rare and common risk variants at genome-wide scale.

    Zihuai He, Linxi Liu & Iuliana Ionita-Laza
  • Article
    | Open Access

    The vast majority of somatic mutations observed in tumors are rare. Here, the authors show that, taken together, these rare mutations are more predictive of a tumor’s tissue of origin than a few common driver mutations.

    Saptarshi Chakraborty, Axel Martin & Ronglai Shen
  • Article
    | Open Access

    Genetic heterogeneity among cells is common across biological conditions, yet high error rates limit the use of long-read sequencing to study it. Here, the authors present iGDA, a tool for detecting and phasing minor variants in long-read sequencing data that enables accurate reconstruction of haplotypes.

    Zhixing Feng, Jose C. Clemente & Eric E. Schadt
  • Article
    | Open Access

    Studying human disease remains challenging because of convoluted etiologies and complex molecular mechanisms at the genetic, genomic and proteomic levels. Here, the authors propose a computationally efficient Permutation-based Feature Importance Test to aid the interpretation and selection of individual features in complex machine learning models for disease analysis.

    Xinlei Mi, Baiming Zou & Jianhua Hu
  • Article
    | Open Access

    Biomedical measurements often generate high-dimensional data in which individual samples are classified into several categories. Vogelstein et al. propose a supervised dimensionality reduction method that estimates a low-dimensional projection of the data for classification and prediction in large datasets.

    Joshua T. Vogelstein, Eric W. Bridgeford & Mauro Maggioni
  • Article
    | Open Access

    Quantifying the effects of individual loci on the human phenome is a challenging task. Here, the authors introduce a modelling technique, TGCA, that assesses total genetic contribution per locus and apply this to UK Biobank phenotype domains, revealing top loci and links to tissue-specific gene expression.

    Ting Li, Zheng Ning & Xia Shen
  • Article
    | Open Access

    Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Here, the authors use a new statistical method to meta-analyse thousands of ChIP-seq experiments, identifying more than 500,000 allele-specific binding (ASB) events in the human genome.

    Sergey Abramov, Alexandr Boytsov & Ivan V. Kulakovskiy
  • Article
    | Open Access

    Estimates of COVID-19-related mortality are limited by incomplete testing. Here, the authors perform counterfactual analyses and estimate that there were 59,000–62,000 deaths from COVID-19 in Italy up to 9 September 2020, approximately 1.5 times the number in official statistics.

    Chirag Modi, Vanessa Böhm & Uroš Seljak
  • Article
    | Open Access

    Tissue damage and turnover release DNA into the blood, and this cell-free DNA can be used to monitor changes in tissue state. Here, the authors develop a tool to accurately estimate the proportions of cell types contributing to cell-free DNA in the blood, with applications to pregnant women and patients with ALS.

    Christa Caggiano, Barbara Celona & Noah Zaitlen
  • Article
    | Open Access

    Functional RNA secondary structure is important for pre-mRNA processing, including splicing, cleavage and polyadenylation, and RNA editing. Here, the authors present a catalog of conserved long-range RNA structures in the human transcriptome, obtained by identifying pairs of conserved complementary regions (PCCRs) within pre-aligned evolutionarily conserved regions.

    Svetlana Kalmykova, Marina Kalinina & Dmitri Pervouchine
  • Article
    | Open Access

    Methods for profiling differences between individual cells are expanding rapidly. Here, the authors present a computational framework for the analysis of single-cell chromatin accessibility data that incorporates prior knowledge and accounts for data-specific characteristics.

    Shengquan Chen, Guanao Yan & Zhixiang Lin
  • Article
    | Open Access

    Genetic correlation analyses give insight into complex disease, yet they are limited by oversimplification. Here, the authors present LOGODetect, a method that uses summary statistics from genome-wide association studies to identify genomic regions with correlation signals across multiple phenotypes.

    Hanmin Guo, James J. Li & Lin Hou