Statistics articles within Nature Communications

Featured

  • Article
    | Open Access

    It is not clear which designs, other than completely randomized ones, are valid for scRNA-seq experiments so that batch effects can be adjusted. Here the authors show that under flexible reference panel and chain-type designs, biological variability can also be separated from batch effects, at least by BUSseq.

    • Fangda Song
    • , Ga Ming Angus Chan
    •  & Yingying Wei
  • Article
    | Open Access

    Natural hazards can have huge impacts on individuals and societies; however, monitoring the economic recovery in the aftermath of extreme events remains a challenge. Here, the authors find that Facebook posting activity of small businesses can be used to monitor post-disaster economic recovery, and can allow local governments to better target distribution of resources.

    • Robert Eyre
    • , Flavia De Luca
    •  & Filippo Simini
  • Article
    | Open Access

    Relapse, reinfection and recrudescence can all cause recurrent infection after treatment of Plasmodium vivax malaria in endemic areas, but are difficult to distinguish. Here the authors show that they can be differentiated probabilistically and thereby demonstrate the high efficacy of primaquine treatment in preventing relapse.

    • Aimee R. Taylor
    • , James A. Watson
    •  & Nicholas J. White
  • Article
    | Open Access

    t-SNE is widely used for dimensionality reduction and visualization of high-dimensional single-cell data. Here, the authors introduce a protocol to help avoid common shortcomings of t-SNE, for example, enabling preservation of the global structure of the data.

    • Dmitry Kobak
    •  & Philipp Berens
  • Article
    | Open Access

    Viral assembly is a complex process that in tailed bacteriophages involves scaffolding proteins which coordinate assembly of the phage procapsid and are subsequently released during maturation. Here the authors reveal the conformational changes that accompany virion maturation, documenting how the dissociation of scaffold proteins and DNA packaging processes intersect.

    • Athanasios Ignatiou
    • , Sandrine Brasilès
    •  & Elena V. Orlova
  • Article
    | Open Access

    Forecasting aftershock earthquakes is a critical step in improving seismic hazard mitigation. Here the authors combine Bayesian methods with extreme value theory to tackle this problem, estimating the maximum magnitude of an expected earthquake as well as the arrival times in a pre-defined window.

    • Robert Shcherbakov
    • , Jiancang Zhuang
    •  & Yosihiko Ogata
  • Article
    | Open Access

    Approaches that examine the impact of urbanization on climate change ignore the fact that interactions between size and density may have an important influence on urban emissions. Here the authors show that variations in the emissions associated with changes in population or density may depend not only on the magnitude of these changes but also on the initial values of these quantities.

    • Haroldo V. Ribeiro
    • , Diego Rybski
    •  & Jürgen P. Kropp
  • Article
    | Open Access

    Whole-genome sequencing data reveal a large number of variants that can be tested for association with phenotypic traits and diseases. Here, the authors develop WGScan, a statistical method for detecting the existence and estimating the locations of association signals at genome-wide scale.

    • Zihuai He
    • , Bin Xu
    •  & Iuliana Ionita-Laza
  • Article
    | Open Access

    Cellular uptake of nanoparticles is highly variable between individual cells in a population. Here, the authors show that this heterogeneity is a result of varying numbers of nanoparticle-containing endosomes while the nanoparticle dose per endosome remains constant.

    • Paul Rees
    • , John W. Wills
    •  & Huw D. Summers
  • Comment
    | Open Access

    In research studies, the need for additional samples to obtain sufficient statistical power often has to be balanced against the experimental costs. One approach to this end is to collect data sequentially until you have sufficient measurements, e.g., until the p-value drops below 0.05. I show that this approach is common, yet unadjusted sequential sampling leads to severe statistical issues, such as an inflated rate of false positive findings. As a consequence, the results of such studies are untrustworthy. I identify the statistical methods that can be implemented in order to account for sequential sampling.

    • Casper Albers
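
    The false-positive inflation described in this Comment can be illustrated with a small simulation. The sketch below is not taken from the Comment itself: it assumes a two-sided z-test with known unit variance, a true effect of zero, and a hypothetical stopping rule that tests after every new observation from n = 10 up to n = 100. The function names and all parameter values are illustrative assumptions.

```python
import math
import random


def z_test_p_value(sample_sum, n):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = sample_sum / math.sqrt(n)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=2000, n_start=10, n_max=100, alpha=0.05, seed=0):
    """Simulate null data and compare two analysis strategies.

    Returns (rate_sequential, rate_fixed):
    - sequential: test after every observation from n_start on and
      declare significance the first time p < alpha;
    - fixed: test once, at the pre-planned sample size n_max.
    """
    rng = random.Random(seed)
    hits_sequential = 0
    hits_fixed = 0
    for _ in range(n_sims):
        total = 0.0
        stopped_early = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # the null hypothesis is true
            if n >= n_start and not stopped_early:
                if z_test_p_value(total, n) < alpha:
                    stopped_early = True
        hits_sequential += stopped_early
        if z_test_p_value(total, n_max) < alpha:
            hits_fixed += 1
    return hits_sequential / n_sims, hits_fixed / n_sims


if __name__ == "__main__":
    seq_rate, fixed_rate = false_positive_rates()
    print(f"unadjusted sequential testing: {seq_rate:.3f}")
    print(f"single test at fixed n:        {fixed_rate:.3f}")
```

    Under these assumptions the fixed-sample test should reject at roughly the nominal 5% rate, while the unadjusted sequential rule rejects noticeably more often, because every additional look at the data is another chance for a spurious p < 0.05.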
  • Article
    | Open Access

    Polygenic risk scores (PRS) have the potential to predict complex diseases and traits from genetic data. Here, Ge et al. develop PRS-CS, which uses a Bayesian regression framework, continuous shrinkage (CS) priors and an external LD reference panel for polygenic prediction of binary and quantitative traits from GWAS summary statistics.

    • Tian Ge
    • , Chia-Yen Chen
    •  & Jordan W. Smoller
  • Article
    | Open Access

    The resolution limitations when using the ubiquitous algorithms that process images obtained using modern techniques are not straightforward to define. Here, the authors examine the performance of localization algorithms and use spatial statistics to provide a metric for assessing the resolution limit of such algorithms.

    • Edward A. K. Cohen
    • , Anish V. Abraham
    •  & Raimund J. Ober
  • Article
    | Open Access

    AI is used increasingly in medical diagnostics. Here, the authors present a deep learning model that masters medical knowledge, demonstrated by it having passed the written test of the 2017 National Medical Licensing Examination in China, and can provide help with clinical diagnosis based on electronic health care records.

    • Ji Wu
    • , Xien Liu
    •  & Ping Lv
  • Article
    | Open Access

    Materials databases currently neglect the temperature effect on compound thermodynamics. Here the authors introduce a Gibbs energy descriptor enabling the high-throughput prediction of temperature-dependent thermodynamics across a wide range of compositions and temperatures for inorganic solids.

    • Christopher J. Bartel
    • , Samantha L. Millican
    •  & Aaron M. Holder
  • Article
    | Open Access

    Functional magnetic resonance imaging (fMRI) is a powerful technique for measuring human brain activity, but the statistical analysis of fMRI data can be difficult. Here, the authors introduce a new fMRI analysis tool, LISA, which provides increased statistical power compared to existing techniques.

    • Gabriele Lohmann
    • , Johannes Stelzer
    •  & Klaus Scheffler
  • Article
    | Open Access

    Genome-wide association studies (GWAS) of neuroimaging data pose a significant computational burden because of the need to correct for multiple testing in both the genetic and the imaging data. Here, Ganjgahi et al. develop WLS-REML, which significantly reduces computation running times in brain imaging GWAS.

    • Habib Ganjgahi
    • , Anderson M. Winkler
    •  & Thomas E. Nichols
  • Article
    | Open Access

    Bottom-up fabrication via on-surface molecular self-assembly is a useful way to make nanomaterials, but finding appropriate precursor molecules for a given structure remains a challenge. Here the authors present an informatics technique linking self-assembled structures with precursor properties, helping identify molecules for target nanomaterials.

    • Daniel M. Packwood
    •  & Taro Hitosugi
  • Article
    | Open Access

    From infectious diseases to brain activity, complex systems can be approximated using autoregressive models. Here, the authors show that incomplete sampling can bias estimates of the stability of such systems, and introduce a novel, unbiased metric for use in such situations.

    • Jens Wilting
    •  & Viola Priesemann
  • Article
    | Open Access

    Dimensionality reduction and visualization methods lack a principled way of comparing multiple datasets. Here, Abid et al. introduce contrastive PCA, which identifies low-dimensional structures enriched in one dataset compared to another and enables visualization of dataset-specific patterns.

    • Abubakar Abid
    • , Martin J. Zhang
    •  & James Zou
  • Article
    | Open Access

    Systematic changes in stock market prices or in the migration behaviour of cancer cells may be hidden behind random fluctuations. Here, Mark et al. describe an empirical approach to identify when and how such real-world systems undergo systematic changes.

    • Christoph Mark
    • , Claus Metzner
    •  & Ben Fabry
  • Article
    | Open Access

    Single-cell RNA sequencing (scRNA-seq) data provide information on transcriptomic heterogeneity within cell populations. Here, Risso et al. develop ZINB-WaVE for low-dimensional representations of scRNA-seq data that account for zero inflation, over-dispersion, and the count nature of the data.

    • Davide Risso
    • , Fanny Perraudeau
    •  & Jean-Philippe Vert
  • Article
    | Open Access

    Most time series techniques tend to ignore data uncertainties, which results in inaccurate conclusions. Here, Goswami et al. represent time series as a sequence of probability density functions, and reliably detect abrupt transitions by identifying communities in probabilistic recurrence networks.

    • Bedartha Goswami
    • , Niklas Boers
    •  & Jürgen Kurths
  • Article
    | Open Access

    The description of temporal networks is usually simplified in terms of their dynamic community structures, whose identification however relies on a priori assumptions. Here the authors present a data-driven method that determines relevant timescales for the dynamics and uses it to identify communities.

    • Tiago P. Peixoto
    •  & Martin Rosvall
  • Article
    | Open Access

    Super-resolution localization microscopy produces biophysical information in the form of estimated positions of single molecules. Here, Lindén et al. estimate the uncertainty of single localizations, and show that this additional information can improve data analysis and localization precision.

    • Martin Lindén
    • , Vladimir Ćurić
    •  & Johan Elf
  • Article
    | Open Access

    The stochastic nature of single-molecule charge transport measurements requires collection of large data sets to capture their full complexity. Here, the authors adopt strategies from machine learning for the unsupervised classification of single-molecule charge transport data without a priori assumptions.

    • Mario Lemmer
    • , Michael S. Inkpen
    •  & Tim Albrecht
  • Article
    | Open Access

    Graphene is known to be a remarkably strong material, but it can often contain defects. Here, the authors use large-scale simulations and continuum modelling to show that the statistical variation in toughness and strength of polycrystalline graphene can be understood with 'weakest-link' statistics.

    • Ashivni Shekhawat
    •  & Robert O. Ritchie
  • Article

    An enduring paradox of urban economics is why cities support levels of enterprise, such as patents and inventions, higher than the countryside. Here Pentland et al. suggest that the density of social ties provides a greater flow of ideas, resulting in increased productivity and innovation.

    • Wei Pan
    • , Gourab Ghoshal
    •  & Alex Pentland