Statistics

  • Article
    | Open Access

    Characterizing an unknown, complex system, like an accelerator, in multi-dimensional space is a challenging task. Here the authors report a Bayesian active learning method - Constrained Proximal Bayesian Exploration - for the characterization of a complex, constrained measurement as a function of multiple free parameters.

    • Ryan Roussel
    • , Juan Pablo Gonzalez-Aguilera
    •  & Auralee Edelen
  • Article
    | Open Access

    Forecasting models have been used extensively to inform decision making during the COVID-19 pandemic. In this preregistered and prospective study, the authors evaluated 14 short-term models for Germany and Poland, finding considerable heterogeneity in predictions and highlighting the benefits of combined forecasts.

    • J. Bracher
    • , D. Wolffram
    •  & Frost Tianjian Xu
  • Article
    | Open Access

    Accurate seasonal forecasts of sea ice are highly valuable, particularly in the context of sea ice loss due to global warming. A new machine learning tool for sea ice forecasting offers a substantial increase in accuracy over current physics-based dynamical model predictions.

    • Tom R. Andersson
    • , J. Scott Hosking
    •  & Emily Shuckburgh
  • Article
    | Open Access

    In many machine learning applications, one uses pre-trained neural networks, having limited access to training and test data. Martin et al. show how to predict trends in the quality of such neural networks without access to this information, relevant for reproducibility, diagnostics, and validation.

    • Charles H. Martin
    • , Tongsu (Serena) Peng
    •  & Michael W. Mahoney
  • Article
    | Open Access

    Networks describe the intricate patterns of interaction occurring within ecological systems, but they are unfortunately difficult to construct from data. Here, the authors show how Bayesian statistical techniques can separate structure from noise in networks gathered in observational studies of plant-pollinator systems.

    • Jean-Gabriel Young
    • , Fernanda S. Valdovinos
    •  & M. E. J. Newman
  • Article
    | Open Access

    Generating new sensible molecular structures is a key problem in computer aided drug discovery. Here the authors propose a graph-based molecular generative model that outperforms previously proposed graph-based generative models of molecules and performs comparably to several SMILES-based models.

    • Omar Mahmood
    • , Elman Mansimov
    •  & Kyunghyun Cho
  • Article
    | Open Access

    Influenza forecasting in the United States is challenging and consequential, as accurate forecasts can improve the public health response. Here the authors show the performance of the multiscale flu forecasting model, Dante, that won the CDC’s 2018/19 national, regional and state flu forecasting challenges.

    • Dave Osthus
    •  & Kelly R. Moran
  • Article
    | Open Access

    Gene regulatory networks are a useful means of inferring functional interactions from large-scale genomic data. Here, the authors develop a Bayesian framework integrating GWAS summary statistics with gene regulatory networks to identify genetic enrichments and associations simultaneously.

    • Xiang Zhu
    • , Zhana Duren
    •  & Wing Hung Wong
  • Article
    | Open Access

    In genome-wide association meta-analysis, it is often difficult to find an independent dataset of sufficient size to replicate associations. Here, the authors have developed MAMBA to calculate the probability of replicability based on consistency between datasets within the meta-analysis.

    • Daniel McGuire
    • , Yu Jiang
    •  & Dajiang J. Liu
  • Article
    | Open Access

    The Tafel slope in electrochemical catalysis is usually determined from experimental data and remains error-prone. Here, the authors develop a Bayesian approach for Tafel slope quantification, and apply it to study the prevalence of certain "cardinal" Tafel slopes in the electrochemical CO2 reduction literature.

    • Aditya M. Limaye
    • , Joy S. Zeng
    •  & Karthish Manthiram
  • Article
    | Open Access

    Accurate prediction of solubility represents a challenge for traditional computational approaches due to the complex nature of the phenomena involved. Here the authors report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning and computational chemistry.

    • Samuel Boobier
    • , David R. J. Hose
    •  & Bao N. Nguyen
  • Article
    | Open Access

    Distributed health data networks (DHDNs) leverage data from multiple healthcare systems, but often face major analytical challenges in the presence of missing data. This paper develops distributed multiple imputation methods that do not require sharing subject-level data across health systems.

    • Changgee Chang
    • , Yi Deng
    •  & Qi Long
  • Article
    | Open Access

    Theories of human categorization have traditionally been evaluated in the context of simple, low-dimensional stimuli. In this work, the authors use a large dataset of human behavior over 10,000 natural images to re-evaluate these theories, revealing interesting differences from previous results.

    • Ruairidh M. Battleday
    • , Joshua C. Peterson
    •  & Thomas L. Griffiths
  • Article
    | Open Access

    Time-dependent errors are one of the main obstacles to fully-fledged quantum information processing. Here, the authors develop a general methodology to monitor time-dependent errors, which could be used to make other characterisation protocols time-resolved, and demonstrate it on a trapped-ion qubit.

    • Timothy Proctor
    • , Melissa Revelle
    •  & Kevin Young
  • Article
    | Open Access

    The intermittency of solar resources is one of the primary challenges for the large-scale integration of renewable energy. Here Yin et al. use satellite data and climate model outputs to evaluate the geographic patterns of future solar power reliability, highlighting the tradeoff between the maximum potential power and the power reliability.

    • Jun Yin
    • , Annalisa Molini
    •  & Amilcare Porporato
  • Article
    | Open Access

    Principal component analysis is often used in studies of ancient DNA, but does not account for the age of the samples. Here, the authors present a factor analysis (FA) which corrects for this by including the effect of allele frequency drift over time.

    • Olivier François
    •  & Flora Jay
  • Article
    | Open Access

    The pyruvate dehydrogenase complex (PDC) is a multienzyme complex connecting glycolysis to mitochondrial oxidation of pyruvate. Cryo-EM analysis of PDC from Neurospora crassa reveals localization of fungi-specific protein X (PX) and confirms that it functions like the mammalian E3BP, recruiting the E3 component of PDC.

    • B. O. Forsberg
    • , S. Aibara
    •  & E. Lindahl
  • Perspective
    | Open Access

    Photon-induced charge separation phenomena are at the heart of light-harvesting applications but are challenging to describe with quantum mechanical models. Here the authors illustrate the potential of machine-learning approaches towards understanding the fundamental processes governing electronic excitations.

    • Florian Häse
    • , Loïc M. Roch
    •  & Alán Aspuru-Guzik
  • Article
    | Open Access

    Although power laws are observed during nanoindentation and the power-law exponents are estimated to be approximately 1.5-1.6 for face-centered cubic metals, the origin of the exponent remains unclear. In this paper, we show the power-law statistics in pop-in magnitudes and unveil the nature of the exponent.

    • Yuji Sato
    • , Shuhei Shinzato
    •  & Shigenobu Ogata
  • Article
    | Open Access

    In medical diagnosis a doctor aims to explain a patient’s symptoms by determining the diseases causing them, while existing diagnostic algorithms are purely associative. Here, the authors reformulate diagnosis as a counterfactual inference task and derive new counterfactual diagnostic algorithms.

    • Jonathan G. Richens
    • , Ciarán M. Lee
    •  & Saurabh Johri
  • Article
    | Open Access

    It is not clear which designs, other than completely randomized ones, are valid for scRNA-seq experiments so that batch effects can be adjusted. Here the authors show that under flexible reference panel and chain-type designs, biological variability can also be separated from batch effects, at least by BUSseq.

    • Fangda Song
    • , Ga Ming Angus Chan
    •  & Yingying Wei
  • Article
    | Open Access

    Natural hazards can have huge impacts on individuals and societies; however, monitoring the economic recovery in the aftermath of extreme events remains a challenge. Here, the authors find that Facebook posting activity of small businesses can be used to monitor post-disaster economic recovery, and can allow local governments to better target distribution of resources.

    • Robert Eyre
    • , Flavia De Luca
    •  & Filippo Simini
  • Article
    | Open Access

    Relapse, reinfection and recrudescence can all cause recurrent infection after treatment of Plasmodium vivax malaria in endemic areas, but are difficult to distinguish. Here the authors show that they can be differentiated probabilistically and thereby demonstrate the high efficacy of primaquine treatment in preventing relapse.

    • Aimee R. Taylor
    • , James A. Watson
    •  & Nicholas J. White
  • Article
    | Open Access

    t-SNE is widely used for dimensionality reduction and visualization of high-dimensional single-cell data. Here, the authors introduce a protocol to help avoid common shortcomings of t-SNE, for example, enabling preservation of the global structure of the data.

    • Dmitry Kobak
    •  & Philipp Berens
  • Article
    | Open Access

    Viral assembly is a complex process that in tailed bacteriophages involves scaffolding proteins which coordinate assembly of the phage procapsid and are subsequently released during maturation. Here the authors reveal the conformational changes that accompany virion maturation, documenting how the dissociation of scaffold proteins and DNA packaging processes intersect.

    • Athanasios Ignatiou
    • , Sandrine Brasilès
    •  & Elena V. Orlova
  • Article
    | Open Access

    Forecasting aftershock earthquakes is a critical step in improving seismic hazard mitigation. Here, the authors combine Bayesian methods with extreme value theory to tackle this problem, and manage to estimate the maximum magnitude of an expected earthquake as well as the arrival times in a pre-defined window.

    • Robert Shcherbakov
    • , Jiancang Zhuang
    •  & Yosihiko Ogata
  • Article
    | Open Access

    Approaches to examine the urbanization impact on climate change ignore that interactions between size and density may have an important influence on urban emissions. Here the authors show that variations in the emissions associated with changes in population or density may not only depend on the magnitude of these changes but also on the initial values of these quantities.

    • Haroldo V. Ribeiro
    • , Diego Rybski
    •  & Jürgen P. Kropp
  • Article
    | Open Access

    Whole-genome sequencing data reveals a large number of variants for testing their associations with phenotypic traits and diseases. Here, the authors develop WGScan, a statistical method for detecting the existence and estimating the locations of the association signal at genome-wide scale.

    • Zihuai He
    • , Bin Xu
    •  & Iuliana Ionita-Laza
  • Article
    | Open Access

    Cellular uptake of nanoparticles is highly variable between individual cells in a population. Here, the authors show that this heterogeneity is a result of varying numbers of nanoparticle-containing endosomes while the nanoparticle dose per endosome remains constant.

    • Paul Rees
    • , John W. Wills
    •  & Huw D. Summers
  • Article
    | Open Access

    Polygenic risk scores (PRS) have the potential to predict complex diseases and traits from genetic data. Here, Ge et al. develop PRS-CS which uses a Bayesian regression framework, continuous shrinkage (CS) priors and an external LD reference panel for polygenic prediction of binary and quantitative traits from GWAS summary statistics.

    • Tian Ge
    • , Chia-Yen Chen
    •  & Jordan W. Smoller
  • Article
    | Open Access

    The resolution limitations when using the ubiquitous algorithms that process images obtained using modern techniques are not straightforward to define. Here, the authors examine the performance of localization algorithms and use spatial statistics to provide a metric for assessing the resolution limit of such algorithms.

    • Edward A. K. Cohen
    • , Anish V. Abraham
    •  & Raimund J. Ober
  • Article
    | Open Access

    AI is used increasingly in medical diagnostics. Here, the authors present a deep learning model that masters medical knowledge, demonstrated by it having passed the written test of the 2017 National Medical Licensing Examination in China, and can provide help with clinical diagnosis based on electronic health care records.

    • Ji Wu
    • , Xien Liu
    •  & Ping Lv
  • Article
    | Open Access

    Materials databases currently neglect the temperature effect on compound thermodynamics. Here the authors introduce a Gibbs energy descriptor enabling the high-throughput prediction of temperature-dependent thermodynamics across a wide range of compositions and temperatures for inorganic solids.

    • Christopher J. Bartel
    • , Samantha L. Millican
    •  & Aaron M. Holder
  • Article
    | Open Access

    Functional magnetic resonance imaging (fMRI) is a powerful technique for measuring human brain activity, but the statistical analysis of fMRI data can be difficult. Here, the authors introduce a new fMRI analysis tool, LISA, which provides increased statistical power compared to existing techniques.

    • Gabriele Lohmann
    • , Johannes Stelzer
    •  & Klaus Scheffler
  • Article
    | Open Access

    Genome-wide association studies (GWAS) of neuroimaging data pose a significant computational burden because of the need to correct for multiple testing in both the genetic and the imaging data. Here, Ganjgahi et al. develop WLS-REML which significantly reduces computation running times in brain imaging GWAS.

    • Habib Ganjgahi
    • , Anderson M. Winkler
    •  & Thomas E. Nichols
  • Article
    | Open Access

    Bottom-up fabrication via on-surface molecular self-assembly is a useful way to make nanomaterials, but finding appropriate precursor molecules for a given structure remains a challenge. Here the authors present an informatics technique linking self-assembled structures with precursor properties, helping identify molecules for target nanomaterials.

    • Daniel M. Packwood
    •  & Taro Hitosugi
  • Article
    | Open Access

    From infectious diseases to brain activity, complex systems can be approximated using autoregressive models. Here, the authors show that incomplete sampling can bias estimates of the stability of such systems, and introduce a novel, unbiased metric for use in such situations.

    • Jens Wilting
    •  & Viola Priesemann
  • Article
    | Open Access

    Dimensionality reduction and visualization methods lack a principled way of comparing multiple datasets. Here, Abid et al. introduce contrastive PCA, which identifies low-dimensional structures enriched in one dataset compared to another and enables visualization of dataset-specific patterns.

    • Abubakar Abid
    • , Martin J. Zhang
    •  & James Zou
  • Article
    | Open Access

    Systematic changes in stock market prices or in the migration behaviour of cancer cells may be hidden behind random fluctuations. Here, Mark et al. describe an empirical approach to identify when and how such real-world systems undergo systematic changes.

    • Christoph Mark
    • , Claus Metzner
    •  & Ben Fabry
  • Article
    | Open Access

    Single-cell RNA sequencing (scRNA-seq) data provides information on transcriptomic heterogeneity within cell populations. Here, Risso et al. develop ZINB-WaVE for low-dimensional representations of scRNA-seq data that account for zero inflation, over-dispersion, and the count nature of the data.

    • Davide Risso
    • , Fanny Perraudeau
    •  & Jean-Philippe Vert
  • Article
    | Open Access

    Most time series techniques tend to ignore data uncertainties, which results in inaccurate conclusions. Here, Goswami et al. represent time series as a sequence of probability density functions, and reliably detect abrupt transitions by identifying communities in probabilistic recurrence networks.

    • Bedartha Goswami
    • , Niklas Boers
    •  & Jürgen Kurths
  • Article
    | Open Access

    The description of temporal networks is usually simplified in terms of their dynamic community structures, whose identification, however, relies on a priori assumptions. Here the authors present a data-driven method that determines relevant timescales for the dynamics and uses them to identify communities.

    • Tiago P. Peixoto
    •  & Martin Rosvall
  • Article
    | Open Access

    Super-resolution localization microscopy produces biophysical information in the form of estimated positions of single molecules. Here, Lindén et al. estimate the uncertainty of single localizations, and show that this additional information can improve data analysis and localization precision.

    • Martin Lindén
    • , Vladimir Ćurić
    •  & Johan Elf