Classification and taxonomy articles within Nature Communications

Featured

  • Article
    | Open Access

    Metagenomic taxonomic profiling usually relies either on reads or assembled contigs/MAGs. Here, authors present RAT, a tool that integrates taxonomic signals from reads, contigs, and MAGs into one profile with high precision and sensitivity. RAT provides a comprehensive view of the microbiome.

    • Ernestina Hauptfeld, Nikolaos Pappas & F. A. Bastiaan von Meijenfeldt
  • Article
    | Open Access

    The classification of different types of neurons has been a long-standing challenge in neuroscience. Here, the authors present a strategy to quantify all statistically distinct axonal patterns from a brain region based on their anatomical targeting, with this projection-driven neuron classification informing the functional architecture of the circuit.

    • Diek W. Wheeler, Shaina Banduri & Giorgio A. Ascoli
  • Article
    | Open Access

    Cell type annotation for single-cell data is challenging. Here, authors explore active and self-supervised learning and introduce adaptive reweighting as a tailored heuristic, demonstrating competitive performance and showing that incorporating prior knowledge enhances cell type annotation accuracy.

    • Michael J. Geuenich, Dae-won Gong & Kieran R. Campbell
  • Article
    | Open Access

    Bacterial viruses (phages) are generally recognised as rapidly evolving biological entities. Here, Rozwalak et al. analyse DNA sequence datasets generated from ancient palaeofaeces and identify 298 phage genomes from the last 5300 years, including a 1300-year-old phage genome nearly identical to a present-day virus that infects human gut bacteria.

    • Piotr Rozwalak, Jakub Barylski & Andrzej Zielezinski
  • Article
    | Open Access

    The evolution of cicadas is unclear due to a lack of understanding of transitional features. Here, the authors assess adult and nymph mid-Cretaceous cicadas to elucidate their morphological evolution and identify evidence of the origins of cicada sound-generation and subterranean lifestyle.

    • Hui Jiang, Jacek Szwedo & Bo Wang
  • Article
    | Open Access

    Identifying tissue structure in large-scale spatial omics datasets from multiple slices is challenging. Here, authors present MENDER, an optimisation-free spatial clustering method that can scale to million-level spatial data, enabling efficient analysis of spatial cell atlases.

    • Zhiyuan Yuan
  • Article
    | Open Access

    Acute GVHD severity grading is based on target organ assessments. Here, the authors show that data-driven grading can identify 12 distinct grades with specific aGVHD phenotypes, which are associated with clinical outcomes, and that their method outperformed conventional gradings.

    • Evren Bayraktar, Theresa Graf & Amin T. Turki
  • Article
    | Open Access

    Integration and comparison of multiple single-cell sequencing datasets can be used to compare different studies. Here, the authors propose MetaTiME, which compares the gene expression of single cells from the tumour microenvironment across different tumours and uses transportable labels and metacomponents to annotate cell types and states.

    • Yi Zhang, Guanjue Xiang & Clifford A. Meyer
  • Article
    | Open Access

    The accuracy and granularity of classifying cell types in the tumour microenvironment (TME) from single-cell RNA-seq data is impacted by heterogeneity among cancer cells and similarities among functionally related immune cells. Here, the authors develop scATOMIC, a tumour and TME cell type classifier based on a hierarchical approach that can be applied to pan-cancer datasets.

    • Ido Nofech-Mozes, David Soave & Sagi Abelson
  • Article
    | Open Access

    Contaminant sequences in metagenomic samples can potentially impact the interpretation of findings reported in microbiome studies, especially in low biomass environments. Here the authors describe Squeegee, a computational approach designed to detect microbial contamination within low microbial biomass microbiomes and identify microbial contaminants in publicly available datasets that lack negative controls.

    • Yunxi Liu, R. A. Leo Elworth & Todd J. Treangen
  • Article
    | Open Access

    Asthma is a heterogeneous, complex syndrome that arises in individuals with various genetic and exposure variations. Here, the authors show that disease comorbidity patterns can serve as a surrogate for these variations, and identify asthma endotypes distinguished by comorbidity patterns, asthma risk loci, gene expression, and health-related phenotypes.

    • Gengjie Jia, Xue Zhong & Julian Solway
  • Article
    | Open Access

    It is unclear how lungfishes evolved durophagy, the consumption of hard prey, despite being the longest lineage of vertebrates with this feeding mechanism. Here, the authors describe exceptionally preserved fossils of Youngolepis from the Early Devonian, showing early adaptations to durophagy.

    • Xindong Cui, Matt Friedman & Min Zhu
  • Article
    | Open Access

    Although many genomic resources are available, the phylogeny of wild and cultivated beets remains unclear. Here, the authors use the k-mer-based Mash method to analyze resequenced genomes of 606 accessions of the genus Beta and reveal Greece as the domestication site of sugar beet.

    • Felix L. Sandell, Nancy Stralis-Pavese & Juliane C. Dohm
  • Article
    | Open Access

    Idiopathic pulmonary arterial hypertension is a rare and fatal disease with a heterogeneous treatment response. Here the authors show that unsupervised machine learning of whole blood transcriptomes from 359 patients with idiopathic pulmonary arterial hypertension identifies 3 subgroups (endophenotypes) that improve risk stratification and provide new molecular insights.

    • Sokratis Kariotis, Emmanuel Jammeh & Richard C. Trembath
  • Article
    | Open Access

    Here, the authors analyse the distance between the body of an antibody and a protein antigen denoted as the Antibody-Framework-to-Antigen Distance (AFAD) for about 2000 non-redundant antibody-protein antigen complexes in the Protein Data Bank. They observe that antibodies with exceptionally long AFADs were all broad HIV-1-neutralizing antibodies that targeted densely glycosylated regions on the HIV-1-envelope trimer. The connection between long AFAD and dense glycan was further validated by the cryo-EM structure of antibody 2909 recognizing a glycan hole and by glycan shielding analyses based on molecular dynamics simulations.

    • Myungjin Lee, Anita Changela & Peter D. Kwong
  • Article
    | Open Access

    Whole genome sequencing is increasingly being adopted for Shigella sonnei outbreak investigation and surveillance, but there is no global classification standard. Here, the authors develop and validate a genomic framework implemented using open-source software, and demonstrate its application using surveillance data.

    • Jane Hawkey, Kalani Paranagama & Kathryn E. Holt
  • Article
    | Open Access

    Sarcomas are morphologically heterogeneous tumours rendering their classification challenging. Here the authors developed a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.

    • Christian Koelsche, Daniel Schrimpf & Andreas von Deimling
  • Article
    | Open Access

    The human- and animal-adapted lineages of the Mycobacterium tuberculosis complex (MTBC) are thought to have evolved from a common progenitor in Africa. Here, the authors identify two MTBC strains isolated from patients with multidrug-resistant tuberculosis, representing an as-yet-unknown lineage and further supporting an East African origin for the MTBC.

    • Jean Claude Semuto Ngabonziza, Chloé Loiseau & Philip Supply
  • Article
    | Open Access

    Molecular analysis of archival formalin-fixed clinical tissues can be difficult. Here, researchers develop MethCORR, an approach that infers gene expression from DNA methylation data, and use it for molecular characterization and prognostication of colorectal cancer using archival samples.

    • Trine B. Mattesen, Mads H. Rasmussen & Jesper B. Bramsen
  • Article
    | Open Access

    The claudin-low breast cancer subtype is remarkably diverse. Here, the authors propose that claudin-low is not a classical intrinsic breast cancer subtype, but rather a complex additional phenotype that can occur across intrinsic subtypes.

    • Christian Fougner, Helga Bergholtz & Therese Sørlie
  • Article
    | Open Access

    Atopic dermatitis (AD) and psoriasis (PSO) are associated with dysbiosis. Here, by analyses of skin microbiome and host transcriptome of AD and PSO patients, the authors find distinct microbial and disease-related gene transcriptomic signatures that differentiate both diseases.

    • Nanna Fyhrquist, Gareth Muirhead & Harri Alenius
  • Article
    | Open Access

    Taxonomic classification of amplicon sequences is an important step in investigating microbial communities in microbiome analysis. Here, the authors show that incorporating environment-specific taxonomic abundance information can lead to improved species-level classification accuracy across common sample types.

    • Benjamin D. Kaehler, Nicholas A. Bokulich & Gavin A. Huttley
  • Article
    | Open Access

    Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.

    • Alexander T. Dilthey, Chirag Jain & Adam M. Phillippy
  • Article
    | Open Access

    Cation/proton antiporters (CPAs) play a major role in maintaining living cells’ homeostasis and are divided into two main groups: CPA1 and CPA2. Here, the authors use a comprehensive evolutionary analysis of 6537 representative CPAs and reveal a sequence motif that determines central phenotypic characteristics.

    • Gal Masrati, Manish Dwivedi & Nir Ben-Tal
  • Article
    | Open Access

    Informative pathways driving cancer pathogenesis and subtypes can be difficult to identify in the presence of many gene interactions irrelevant to cancer. Here, the authors describe an approach for cancer gene pathway analysis based on key molecular interactions that drive cancer in relevant tissue types, and they assemble a focused map of Evolutionarily Selected Pathways (ESP) with interactions supported by both protein–protein binding and genetic epistasis during somatic tumor evolution.

    • Sheng Wang, Jianzhu Ma & Trey Ideker
  • Article
    | Open Access

    Viral taxonomic characterization from metagenomic data suffers from high background noise and signal crosstalk. Here, the authors develop VirMAP, a novel pipeline for analyses of metagenomic data that classifies viral reconstructions independent of genome coverage or read overlap.

    • Nadim J Ajami, Matthew C. Wong & Joseph F. Petrosino
  • Article
    | Open Access

    Germline mutations in homologous recombination (HR) DNA repair genes are linked to breast and ovarian cancer. Here, the authors show that mutually exclusive bi-allelic inactivation of HR genes is present in other cancer types and is associated with genomic features of HR deficiency, expanding the potential use of HR-directed therapies.

    • Nadeem Riaz, Pedro Blecua & Jorge S. Reis-Filho
  • Article
    | Open Access

    Tumour expression profiling is currently used for prognostic and predictive purposes without taking into account intra-patient heterogeneity. Here, the authors show that cancer-cell-specific signatures overcome the tumour heterogeneity effect and result in better classification of colorectal cancer patients.

    • Philip D. Dunne, Matthew Alderdice & Mark Lawler
  • Article
    | Open Access

    Here, Anders Krogh and colleagues describe Kaiju, a metagenome taxonomic classification program that uses maximum (in-)exact matches at the protein level to account for evolutionary divergence. The authors show that Kaiju is faster and more sensitive than existing algorithms and can be used on a standard computer.

    • Peter Menzel, Kim Lee Ng & Anders Krogh