Genome informatics

  • Article
    | Open Access

    Long-read sequencing is promising for detecting structural variants (SVs), but accurate detection requires algorithms with high sensitivity and precision. Here, the authors develop DeBreak, an algorithm for comprehensive and accurate SV detection in long-read sequencing data across different platforms, and show that it outperforms other SV callers.

    • Yu Chen, Amy Y. Wang & Zechen Chong
  • Article
    | Open Access

    The power of pangenomic graphs to improve genetic mapping is still unclear. Here, the authors demonstrate their value in identifying genetic variants associated with disease resistance traits in melon using PanPipes, a pipeline for pangenome construction and low-coverage genotyping-by-sequencing.

    • Justin N. Vaughn, Sandra E. Branham & William P. Wechter
  • Article
    | Open Access

    Predicting topological structures from Hi-C data helps in understanding gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation in a given study sample.

    • Yanlin Zhang & Mathieu Blanchette
  • Article
    | Open Access

    Here the authors show that transposable element-mediated rearrangements impact more than 500 kbp of an average human genome, are a source of individual variation and a substrate for evolutionary change, and can occur through diverse mechanisms.

    • Parithi Balachandran, Isha A. Walawalkar & Christine R. Beck
  • Article
    | Open Access

    Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants in poorly covered or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for haplotype-aware self-correction of long reads.

    • Xiao Luo, Xiongbin Kang & Alexander Schönhuth
  • Article
    | Open Access

    Monitoring of co-infections of SARS-CoV-2 variants is important to evaluate their clinical impact and the risk of emergence of recombinants. Here, the authors develop and validate a methodological pipeline to detect co-infections and apply it to samples from France in early 2022, when Delta and Omicron were co-circulating.

    • Antonin Bal, Bruno Simon & Laurence Josset
  • Article
    | Open Access

    Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short- and long-read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species- and strain-level diversity and thousands of previously uncharacterized biosynthetic gene clusters.

    • Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan
  • Article
    | Open Access

    The function of many microbial genes is still unknown. Here the authors repurpose natural language processing algorithms to explore “gene semantics” and infer functions for thousands of genes, finding that defense and secretion systems have the most discovery potential.

    • Danielle Miller, Adi Stern & David Burstein
  • Article
    | Open Access

    Identifying structural variants (SVs) under positive selection in cancer is challenging. Here, the authors develop CSVDriver, a method that computes SV breakpoint proximity and the contribution of elements such as topologically associating domains, and identifies loci that show signs of positive selection and contain known and putative drivers.

    • Alexander Martinez-Fundichely, Austin Dixon & Ekta Khurana
  • Article
    | Open Access

    Many archived tumour samples are stored as formalin-fixed and paraffin-embedded (FFPE) blocks, but this treatment can impact downstream genomics analyses. Here, the authors derive the mutational signatures of formalin on the cancer genome, and present FFPEsig, an algorithm that can distinguish and correct FFPE mutational signatures in archived cancer samples.

    • Qingli Guo, Eszter Lakatos & Ville Mustonen
  • Article
    | Open Access

    As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Here the authors present Scarf, a modularly designed Python package that makes the analysis workflow highly memory efficient such that even the largest existing datasets can be analyzed on an average modern laptop.

    • Parashar Dhapola, Johan Rodhe & Göran Karlsson
  • Article
    | Open Access

    Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human reference genome to represent the genetic diversity of different human populations and its inability to maintain the same level of accuracy for non-European ancestries. Here the authors present the case for iteratively augmenting genome graphs tailored to targeted populations and demonstrate this approach on whole-genome samples of African ancestry.

    • H. Serhat Tetikol, Deniz Turgut & Brandi N. Davis-Dusenbery
  • Article
    | Open Access

    Previous work has investigated selection in the coding genome, but selection is less well characterized in the non-coding genome. By analyzing rare variants in 70k genome sequences from gnomAD, the authors detect very strong purifying selection (“ultraselection”) across the human genome, finding it in some microRNAs and coding sequences but rarely in regulatory sequences.

    • Noah Dukler, Mehreen R. Mughal & Adam Siepel
  • Article
    | Open Access

    A critical task in spatial transcriptomics analysis is to understand inherently spatial relationships among cells. Here, the authors present a deep learning framework to integrate spatial and transcriptional information, spatially extending pseudotime and revealing spatiotemporal organization of cells.

    • Honglei Ren, Benjamin L. Walker & Qing Nie
  • Article
    | Open Access

    It is unclear if the molecular profiles of pancreatic ductal adenocarcinoma (PDAC) preclinical models remain stable during propagation. Here, the authors characterise clonal evolution throughout propagation in PDAC cell lines and a patient-derived organoid using single-cell genomics, transcriptomics and epigenomics.

    • Maria E. Monberg, Heather Geiger & Anirban Maitra
  • Article
    | Open Access

    Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequencing platform, assembler, or coverage, suggesting that rigid protocols may not be required.

    • Alexander S. Leonard, Danang Crysnanto & Hubert Pausch
  • Article
    | Open Access

    Biobanks of genetic data have primarily comprised European populations, leaving an incomplete understanding of complex traits across populations. Here, the authors initiate the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 whole-genome sequences and 5,841 high-density genotypes from China, characterizing large-scale genomic variation in Chinese populations.

    • Pei-Kuan Cong, Wei-Yang Bai & Hou-Feng Zheng
  • Article
    | Open Access

    The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells on the formation and transmission of gross structural genomic changes detected by comparing whole-genome sequences of 14 rodent species.

    • Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera
  • Article
    | Open Access

    Here the authors digest chromatin with DNA fragmentation factor (DFF) prior to chromatin immunoprecipitation (DFF-ChIP) to profile transcription complex interactions with neighboring nucleosomes in cells. Applying this method to human cytomegalovirus (HCMV)-infected cells, they find that the viral genome is underchromatinized, leading to fewer transcription complex interactions with nucleosomes.

    • Benjamin M. Spector, Mrutyunjaya Parida & David H. Price
  • Article
    | Open Access

    Anopheles mosquitoes are vectors of human malaria, and better understanding of them has implications for public health. Here, the authors apply Hi-C, FISH, RNA-seq, and ChIP-seq techniques to comprehensively characterize chromatin architecture and its evolutionary dynamics in five Anopheles species.

    • Varvara Lukyanchikova, Miroslav Nuriddinov & Veniamin Fishman
  • Article
    | Open Access

    Plasmodium malariae is a cause of malaria in humans, and related species have been identified in non-human primates. Here, the authors use genomic analyses to establish that human P. malariae arose from a host switch of an ape parasite, whilst a species infecting New World monkeys can be traced to a reverse zoonosis.

    • Lindsey J. Plenderleith, Weimin Liu & Paul M. Sharp
  • Article
    | Open Access

    Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.

    • Joachim Johansen, Damian R. Plichta & Simon Rasmussen
  • Article
    | Open Access

    Lab-based surveillance of Shigella has traditionally been based on serotyping, but the increasing availability of whole-genome sequencing could enable higher-resolution typing. Here, the authors apply a core genome multilocus sequence typing scheme to Shigella sequence data and describe its population structure.

    • Iman Yassine, Sophie Lefèvre & François-Xavier Weill
  • Article
    | Open Access

    The exceptionally long-lived naked mole-rat is characterized by the lack of increased mortality with aging. Here the authors perform epigenetic studies to show that naked mole-rats epigenetically age despite their non-increasing mortality rate.

    • Csaba Kerepesi, Margarita V. Meer & Vadim N. Gladyshev
  • Article
    | Open Access

    Breast cancer heterogeneity and tumour evolutionary trajectories remain largely unknown among women of African ancestry. Here, the authors perform whole genome and transcriptome sequencing of Nigerian breast cancer patients and identify unique evolutionary phenomena.

    • Naser Ansari-Pour, Yonglan Zheng & Olufunmilayo I. Olopade
  • Article
    | Open Access

    Here, the authors develop a graph-based method for the assembly of discontinuous transcripts produced in coronaviruses and other Nidovirales, enabling the discovery of transcriptional changes missed by existing methods.

    • Palash Sashittal, Chuanyi Zhang & Mohammed El-Kebir
  • Article
    | Open Access

    Subclonal deconvolution in cancer sequencing data is a complex task, and the optimal tools to use are unclear. Here, the authors systematically benchmark subclonal deconvolution pipelines with a comprehensive set of simulated tumour genomes and identify the best-performing methods.

    • Georgette Tanner, David R. Westhead & Lucy F. Stead
  • Article
    | Open Access

    Historical interbreeding between Neanderthals and humans should leave signatures of past demography in modern human genomes. Here, analysis of the size distribution of Neanderthal fragments in non-African genomes suggests consistent differences in the generation interval across Eurasia, which could explain variation in the mutational spectrum.

    • Moisès Coll Macià, Laurits Skov & Mikkel Heide Schierup
  • Article
    | Open Access

    Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single cell RNA sequencing data from mouse embryos and identify 3’-UTR isoforms that are regulated across cell types and developmental time.

    • Vikram Agarwal, Sereno Lopez-Darwin & Jay Shendure
  • Article
    | Open Access

    Despite being a common congenital facial anomaly, the genetic etiology of craniofacial microsomia (CFM) is not well understood. Here, the authors use exome and genome sequencing of 146 individuals with CFM to identify haploinsufficient variants in SF3B2 as a prevalent underlying cause.

    • Andrew T. Timberlake, Casey Griffin & Daniela V. Luquetti
  • Article
    | Open Access

    Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.

    • Riccardo Vicedomini, Christopher Quince & Rayan Chikhi
  • Article
    | Open Access

    Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.

    • Ruhollah Shemirani, Gillian M. Belbin & José Luis Ambite
  • Article
    | Open Access

    Several existing algorithms predict the methylation of DNA using Nanopore sequencing signals, but it is unclear how they compare in performance. Here, the authors benchmark the performance of several such tools, and propose METEORE, a consensus tool that improves prediction accuracy.

    • Zaka Wing-Sze Yuen, Akanksha Srivastava & Eduardo Eyras
  • Article
    | Open Access

    Whole-genome sequencing data are increasingly routinely available, but generating actionable insights from them is challenging. Here, the authors describe Pathogenwatch, a web tool for genomic surveillance of S. Typhi, and demonstrate its use for antimicrobial resistance assignment and strain risk assessment.

    • Silvia Argimón, Corin A. Yeats & David M. Aanensen
  • Article
    | Open Access

    Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

    • Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan