Software articles within Nature Communications

Featured

  • Article
    | Open Access

    Current state-of-the-art de novo long-read genome assemblers follow the Overlap-Layout-Consensus paradigm. GoldRush departs from this paradigm, generating highly contiguous assemblies in linear time while using an order of magnitude less RAM than state-of-the-art methods.

    • Johnathan Wong, Lauren Coombe & Inanç Birol
  • Article
    | Open Access

    Spatial proteomic data provide cell-level location information for extracting biological features from tissues, but analyzing such data can be difficult. Here the authors report SPIAT for data analysis and spaSim for simulation and method validation, to help bridge the gap between the technology and its translation.

    • Yuzhou Feng, Tianpei Yang & Anna S. Trigos
  • Article
    | Open Access

    Integrating multiple single-cell sequencing datasets makes it possible to compare different studies. Here the authors propose MetaTiME, which compares the gene expression of single cells from the tumour microenvironment across different tumours and uses transportable labels and metacomponents to annotate cell types and states.

    • Yi Zhang, Guanjue Xiang & Clifford A. Meyer
  • Article
    | Open Access

    Capturing “hidden” drivers, which may not be genetically altered or differentially expressed, from omics data is challenging. Here the authors develop NetBID2, a comprehensive network-based toolbox with versatile features that integrates multi-omics data to expose such hidden drivers.

    • Xinran Dong, Liang Ding & Jiyang Yu
  • Article
    | Open Access

    Here the authors realize a programmable incoherent optical neural network that delivers light-speed, high-bandwidth and power-efficient neural network inference by processing parallel visible-light signals in free space.

    • Yuchi Huo, Hujun Bao & Sung-Eui Yoon
  • Article
    | Open Access

    Interpretation of rare genetic variants remains challenging. Here, the authors develop a supervised variant effect predictor for clinically actionable genes that incorporates evolutionary and structural relationships between residues and achieves balanced specificity and sensitivity.

    • Federica Luppino, Ivan A. Adzhubei & Agnes Toth-Petroczy
  • Article
    | Open Access

    Billions of MS/MS spectra are available in public proteomics data repositories, but their use has been limited to informatics experts. Here, the authors provide a solution that democratizes these data for rapid peptide searching and demonstrate its utility in a wide range of biological applications.

    • Bo Wen & Bing Zhang
  • Article
    | Open Access

    Here the authors report SBCG2, an update to the neural-network-based Shape-Based Coarse Graining (SBCG) approach for creating coarse-grained molecular topologies with atomistic detail. They show how SBCG2 can reduce the computational cost of simulating very large assemblies such as the HIV-1 capsid, allowing simulation on commodity hardware.

    • Alexander J. Bryer, Juan S. Rey & Juan R. Perilla
  • Article
    | Open Access

    Accurately annotating cell types is a fundamental step in single-cell omics data analysis. Here, the authors develop Cellcano, a computational method based on a two-round supervised learning algorithm that identifies cell types in scATAC-seq data, and perform benchmarking to demonstrate its accuracy, robustness and computational efficiency.

    • Wenjing Ma, Jiaying Lu & Hao Wu
  • Article
    | Open Access

    Dimension reduction (DR) is a key step of Cytometry by Time-of-Flight (CyTOF) data analysis. Here, the authors benchmark 21 DR methods on 110 real and 425 synthetic CyTOF samples, finding a high level of complementarity between the methods, and providing a comprehensive set of user guidelines.

    • Kaiwen Wang, Yuqiu Yang & Tao Wang
  • Article
    | Open Access

    The authors develop a machine learning approach to find structurally related chemicals in mass spectral libraries. Their method boosts the annotation rate and aids in assessing novelty in metabolomics datasets.

    • Niek F. de Jonge, Joris J. R. Louwen & Justin J. J. van der Hooft
  • Article
    | Open Access

    The signal-to-noise ratio in bioimages is often low, which is problematic for segmentation. Here the authors report a deep learning method, deepflash2, to facilitate the segmentation of ambiguous bioimages through multi-expert annotations and integrated quality assurance.

    • Matthias Griebel, Dennis Segebarth & Christoph M. Flath
  • Article
    | Open Access

    The accuracy and granularity of classifying cell types in the tumour microenvironment (TME) from single-cell RNA-seq data are impacted by heterogeneity among cancer cells and similarities among functionally related immune cells. Here, the authors develop scATOMIC, a tumour and TME cell type classifier based on a hierarchical approach that can be applied to pan-cancer datasets.

    • Ido Nofech-Mozes, David Soave & Sagi Abelson
  • Article
    | Open Access

    Analysing the regulatory consequences of mutations and splice variants at large scale in cancer requires efficient computational tools. Here, the authors develop RegTools, a software package that efficiently and flexibly identifies splice-associated variants from large-scale genomics and transcriptomics data.

    • Kelsy C. Cotto, Yang-Yang Feng & Malachi Griffith
  • Article
    | Open Access

    There is a lack of standardisation in slide microscopy imaging data. Here the authors report Slim, an open-source, web-based slide microscopy viewer implementing the Digital Imaging and Communications in Medicine (DICOM) standard to achieve interoperability with a range of existing medical imaging systems.

    • Chris Gorman, Davide Punzo & Markus D. Herrmann
  • Article
    | Open Access

    The authors present BARASA, an approach to assigning backbone triple resonance spectra of proteins that augments traditional approaches with a Bayesian statistical analysis of the observed chemical shifts. The algorithm uses a simulated annealing engine to establish a consensus set of resonance assignments and is tested on systems ranging in size up to more than 450 amino acids, including intrinsically disordered proteins.

    • Anthony C. Bishop, Glorisé Torres-Montalvo & A. Joshua Wand
  • Article
    | Open Access

    The precise inference of structural variants (SVs) requires suitable sequencing technologies and computational tools. Here, in order to analyse SVs with haplotype resolution, the author applies high-resolution long-read sequencing and long-range Hi-C to a melanoma cell line and develops an efficient graph-based computational framework, pstools.

    • Shilpa Garg
  • Article
    | Open Access

    The increasing scale of single-cell RNA-seq studies presents new challenges for integrating datasets from different batches. Here, the authors develop scDML, a tool that simultaneously removes batch effects, improves clustering performance, recovers true cell types and scales well to large datasets.

    • Xiaokang Yu, Xinyi Xu & Xiangjie Li
  • Article
    | Open Access

    Computational methods for studying protein structural dynamics are powerful tools in the life sciences but are computationally expensive. Here, the authors show that machine learning can efficiently generate protein conformational ensembles and test their method on intrinsically disordered peptides.

    • Giacomo Janson, Gilberto Valdes-Garcia & Michael Feig
  • Article
    | Open Access

    Existing genomic data analysis methods tend not to take full advantage of underlying biological characteristics. Here, the authors leverage the inherent interactions within scRNA-seq data and develop a cartography strategy that organizes the data into a spatially configured genomap for accurate deep pattern discovery.

    • Md Tauhidul Islam & Lei Xing
  • Article
    | Open Access

    The extensive information capacity of DNA makes it an attractive alternative to traditional data storage. DNA-Aeon is a DNA data storage solution that can correct all error types commonly observed in DNA storage, while encoding data into sequences that meet user-defined constraints such as GC content, homopolymer length and the absence of undesired motifs.

    • Marius Welzel, Peter Michael Schwarz & Dominik Heider
  • Article
    | Open Access

    Cell type deconvolution in tumor spatial transcriptomics (ST) data remains challenging. Here, the authors develop Spatial Cellular Estimator for Tumors (SpaCET) to infer cell types and intercellular interactions from ST data in cancer across different platforms, with improved performance over similar methods.

    • Beibei Ru, Jinlin Huang & Peng Jiang
  • Article
    | Open Access

    Many methods for single-cell data integration have been developed, though mosaic integration remains challenging. Here the authors present scMoMaT, a mosaic integration method for single-cell multimodal data from multiple batches that jointly learns cell representations and marker features across modalities for different cell clusters, helping to interpret the cell clusters from different modalities.

    • Ziqi Zhang, Haoran Sun & Xiuwei Zhang
  • Article
    | Open Access

    Hybrid nucleic acid origami has potential for biomedical delivery of mRNA and fabrication of artificial ribozymes. Here, the authors use chemical footprinting and cryo-electron microscopy to reveal insights into nucleic acid origami used to fold messenger and ribosomal RNA into 3D polyhedral structures.

    • Molly F. Parsons, Matthew F. Allan & Mark Bathe
  • Article
    | Open Access

    Comparing experimental mass spectra to reference spectra can enable natural product identification, but these spectral libraries are often incomplete and not universally applicable. Here, the authors present SNAP-MS, a tool that assigns compound families without requiring experimental or calculated reference spectra.

    • Nicholas J. Morehouse, Trevor N. Clark & Roger G. Linington
  • Article
    | Open Access

    Long-read sequencing is promising for detecting structural variants (SVs), but this requires algorithms with high sensitivity and precision. Here, the authors develop DeBreak, an algorithm for comprehensive and accurate SV detection in long-read sequencing data across different platforms, which outperforms other SV callers.

    • Yu Chen, Amy Y. Wang & Zechen Chong
  • Article
    | Open Access

    Developing computational tools for interpretable cell type annotation in scRNA-seq data remains challenging. Here the authors propose a Transformer-based model for interpretable annotation transfer using biologically understandable entities, and demonstrate its performance on large and atlas-scale datasets.

    • Jiawei Chen, Hao Xu & Jing-Dong J. Han
  • Article
    | Open Access

    Longitudinal proteomics holds great promise for biomarker discovery, but interpreting the data remains challenging. Here, the authors evaluate several tools for detecting longitudinal differential expression in proteomics data and introduce RolDE, a robust reproducibility optimization approach.

    • Tommi Välikangas, Tomi Suomi & Laura L. Elo
  • Article
    | Open Access

    The Mean-Shift Super Resolution (MSSR) algorithm presented here enhances the spatial resolution obtainable from a single microscopy image. It is applicable across a wide range of experimental and instrumental configurations and is compatible with other super-resolution microscopy approaches.

    • Esley Torres-García, Raúl Pinto-Cámara & Adán Guerrero
  • Article
    | Open Access

    As the throughput of single-cell RNA-seq studies increases, there is a need for tools that can make the data analysis steps more streamlined and convenient. Here, the authors develop UniverSC, a tool that unifies single-cell RNA-seq analysis workflows and also facilitates their use for non-experts.

    • Kai Battenberg, S. Thomas Kelly & Aki Minoda
  • Article
    | Open Access

    The organisation of mammalian genomes plays a role in many biological processes. Here the authors report dcHiC, a tool which uses a multivariate distance measure to identify changes in compartmentalisation among multiple genome-wide chromatin contact maps, and apply this to different human and mouse datasets.

    • Abhijit Chakraborty, Jeffrey G. Wang & Ferhat Ay
  • Article
    | Open Access

    The success of CRISPR experiments relies on the choice of gRNA. Here the authors report crisprVerse, which enables efficient gRNA design and annotation for methods including CRISPRko, CRISPRa, CRISPRi, CRISPRbe and CRISPRkd, for both RNA- and DNA-targeting nucleases, including Cas9, Cas12 and Cas13.

    • Luke Hoberecht, Pirunthan Perampalam & Jean-Philippe Fortin
  • Article
    | Open Access

    Cryogenic electron tomography suffers from anisotropic resolution due to the missing-wedge problem. Here, the authors present IsoNet, a neural network that learns feature representations from similar structures in the tomogram and recovers the missing information for isotropic tomogram reconstruction.

    • Yun-Tao Liu, Heng Zhang & Z. Hong Zhou
  • Article
    | Open Access

    Machine learning systems must be developed in a way that ensures their robustness and reliability. The authors introduce a framework that defines a principled process for developing machine learning systems, from research to production, across various domains and data scenarios.

    • Alexander Lavin, Ciarán M. Gilligan-Lee & Yarin Gal
  • Article
    | Open Access

    Global alignment of complex cell state trajectories between single-cell datasets remains challenging. Here, the authors present a computational method called CAPITAL to compare branching trajectories, and demonstrate that this method achieves accurate and robust alignments.

    • Reiichi Sugihara, Yuki Kato & Yukio Kawahara
  • Article
    | Open Access

    Early cancer detection by cell-free DNA (cfDNA) is challenged by the low amount of tumour DNA in cfDNA, tumour heterogeneity and small patient cohorts. Here, the authors develop cfMethyl-Seq, a method for cost-effective methylome profiling of cfDNA and for detecting and locating cancer.

    • Mary L. Stackpole, Weihua Zeng & Xianghong Jasmine Zhou
  • Article
    | Open Access

    Human-operated optimization of non-aqueous Li-ion battery liquid electrolytes is a time-consuming process. Here, the authors propose an automated workflow that couples robotic experiments with machine learning to optimize liquid electrolyte formulations without human intervention.

    • Adarsh Dave, Jared Mitchell & Venkatasubramanian Viswanathan
  • Article
    | Open Access

    DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability and low maintenance cost. Here the authors present DBGPS, a strand assembly algorithm that uses a de Bruijn graph and greedy path search.

    • Lifu Song, Feng Geng & Ying-Jin Yuan
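    The DBGPS entry names two standard ingredients, a de Bruijn graph and greedy path search. The sketch below illustrates only those generic techniques, not the DBGPS implementation. It assumes error-prone reads of a single strand, a k-mer size `k`, and a known start node (the first k-1 bases); `build_de_bruijn` and `greedy_path` are hypothetical names. At each node the walk follows the outgoing edge with the highest k-mer count, on the assumption that k-mers containing sequencing errors are rarer than correct ones.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Count k-mers and link each k-mer's (k-1)-prefix node to its (k-1)-suffix node."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    graph = defaultdict(list)  # node -> list of (successor node, k-mer count)
    for kmer, count in kmer_counts.items():
        graph[kmer[:-1]].append((kmer[1:], count))
    return graph


def greedy_path(graph, start, max_len):
    """Walk from `start`, always taking the heaviest edge not yet used.

    Ties in weight are broken by the lexicographically larger node string.
    The `used` set and `max_len` both guard against looping forever on cycles.
    """
    path, node, used = start, start, set()
    while len(path) < max_len:
        candidates = [(w, nxt) for nxt, w in graph.get(node, [])
                      if (node, nxt) not in used]
        if not candidates:
            break  # dead end: no unused outgoing edges
        _, nxt = max(candidates)
        used.add((node, nxt))
        path += nxt[-1]  # each step appends one new base
        node = nxt
    return path


if __name__ == "__main__":
    # Hypothetical reads of the strand ACGTACGGTT; "TACGAT" carries an error.
    reads = ["ACGTACG", "GTACGGT", "ACGGTT", "TACGGTT", "TACGAT"]
    graph = build_de_bruijn(reads, k=5)
    print(greedy_path(graph, "ACGT", max_len=20))
```

    In this toy input the erroneous read creates a branch at node `TACG` with count 1, while the correct continuation has count 2, so the walk reconstructs `ACGTACGGTT`. A purely greedy walk can still take wrong turns at repeated (k-1)-mers, which a practical assembler would have to handle.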