Genome informatics

  • Article
    | Open Access

    Historical interbreeding between Neanderthals and humans should leave signatures of historical demographics in modern human genomes. Analysing the size distribution of Neanderthal fragments in non-African genomes suggests consistent differences in the generation interval across Eurasia, and that this could explain mutational spectrum variation.

    • Moisès Coll Macià, Laurits Skov & Mikkel Heide Schierup
  • Article
    | Open Access

    Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single-cell RNA sequencing data from mouse embryos and identify 3’-UTR isoforms that are regulated across cell types and developmental time.

    • Vikram Agarwal, Sereno Lopez-Darwin & Jay Shendure
  • Article
    | Open Access

    Despite being a common congenital facial anomaly, the genetic etiology of craniofacial microsomia (CFM) is not well understood. Here, the authors use exome and genome sequencing of 146 individuals with CFM to identify haploinsufficient variants in SF3B2 as a prevalent underlying cause.

    • Andrew T. Timberlake, Casey Griffin & Daniela V. Luquetti
  • Article
    | Open Access

    Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.

    • Riccardo Vicedomini, Christopher Quince & Rayan Chikhi
  • Article
    | Open Access

    Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.

    • Ruhollah Shemirani, Gillian M. Belbin & José Luis Ambite
  • Article
    | Open Access

    Several existing algorithms predict the methylation of DNA using Nanopore sequencing signals, but it is unclear how they compare in performance. Here, the authors benchmark the performance of several such tools, and propose METEORE, a consensus tool that improves prediction accuracy.

    • Zaka Wing-Sze Yuen, Akanksha Srivastava & Eduardo Eyras
  • Article
    | Open Access

    Whole genome sequencing data are becoming routinely available, but generating actionable insights is challenging. Here, the authors describe Pathogenwatch, a web tool for genomic surveillance of S. Typhi, and demonstrate its use for antimicrobial resistance assignment and strain risk assessment.

    • Silvia Argimón, Corin A. Yeats & David M. Aanensen
  • Article
    | Open Access

    Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

    • Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan
  • Article
    | Open Access

    Conventional single-cell RNA sequencing analyses rely on genome annotations that may be incomplete or inaccurate, especially for understudied organisms. Here the authors present a bioinformatic tool that leverages single-cell data to uncover biologically relevant transcripts beyond the best available genome annotation.

    • Michael F. Z. Wang, Madhav Mantri & Iwijn De Vlaminck
  • Article
    | Open Access

    Clustering cells based on similarities in gene expression is the first step towards identifying cell types in scRNASeq data. Here the authors incorporate biological knowledge into the clustering step to facilitate the biological interpretability of clusters, and subsequent cell type identification.

    • Tian Tian, Jie Zhang & Hakon Hakonarson
  • Article
    | Open Access

    Sonic Hedgehog medulloblastoma (Shh-MB) comprises four subtypes each with distinct clinical traits. Here the authors characterize the genome, transcriptome, and methylome of Shh-MB subtypes, revealing a complex fusion landscape and the molecular convergence of MYCN and cAMP signaling pathways.

    • Patryk Skowron, Hamza Farooq & Michael D. Taylor
  • Article
    | Open Access

    Here, the authors analyze 4907 Circular Metagenome Assembled Genomes from human microbiomes and identify and characterize nearly 600 diverse genomes of crAss-like phages, finding two putative families with unusual genomic features, including a high density of self-splicing introns and inteins.

    • Natalya Yutin, Sean Benler & Eugene V. Koonin
  • Article
    | Open Access

    Genomic prediction of phenotype may be improved by using DNA mutations with functional, evolutionary, and pleiotropic consequences. Here the authors describe a method for genome-wide fine-mapping of QTLs and develop a genotyping array for improved prediction of genetic values for cattle traits.

    • Ruidong Xiang, Iona M. MacLeod & Michael E. Goddard
  • Article
    | Open Access

    Identifying structural variants (SVs) from whole genome sequence data has been a significant bioinformatic challenge. Here, the authors describe PopDel, which uses a joint SV detection approach to reliably and efficiently identify 500-10,000 bp deletions across large population cohorts.

    • Sebastian Niehus, Hákon Jónsson & Birte Kehr
  • Article
    | Open Access

    The TGFβ signaling pathway has been shown to regulate transcription by regulating enhancer activity. Here, the authors perform a comprehensive analysis of enhancers in normal mammary epithelial gland cells to elucidate how TGFβ-dependent enhancers control gene transcription in these cells.

    • Jose A. Guerrero-Martínez, María Ceballos-Chávez & Jose C. Reyes
  • Article
    | Open Access

    Large-scale sequencing efforts have uncovered a large number of secondary metabolic pathways, but the chemicals they synthesise remain unknown. Here the authors present PRISM 4, which predicts the chemical structures encoded by microbial genome sequences, including all classes of bacterial antibiotics in clinical use.

    • Michael A. Skinnider, Chad W. Johnston & Nathan A. Magarvey
  • Article
    | Open Access

    Chromatin loops bridging distant loci within chromosomes can be detected by a variety of techniques such as Hi-C. Here the authors present Chromosight, an algorithm that detects various types of pattern in chromosome contact maps, including chromosomal loops, and apply it to mammalian, bacterial, viral and yeast genomes.

    • Cyril Matthey-Doret, Lyam Baudry & Axel Cournac
  • Article
    | Open Access

    The molecular basis for the unique taste and aroma of tea cultivars is largely unknown, but is critical for breeding new cultivars. Here the authors use transcriptomics and metabolomics to study the relationship among phylogenetic groups and specialized metabolites from 136 tea accessions in China.

    • Xiaomin Yu, Jiajing Xiao & Renyi Liu
  • Article
    | Open Access

    Acral melanoma occurs on the soles of the feet, palms of the hands and in nail beds. Here, the authors report the genomic landscape of 87 acral melanomas and find that some tumors harbor a UV signature and that the tumors are diverse at the levels of mutational signatures, structural aberrations and copy number signatures.

    • Felicity Newell, James S. Wilmott & Nicholas K. Hayward
  • Article
    | Open Access

    Genomic analysis of neuroblastoma has revealed important disease etiology. In this study, the authors assembled whole genome, exome and transcriptome data from over 700 neuroblastomas and identified molecular signatures correlated with age, and rare, potentially targetable variants overlooked in smaller cohorts.

    • Samuel W. Brady, Yanling Liu & Jinghui Zhang
  • Article
    | Open Access

    The evolutionary progression from primary to metastatic prostate cancer is largely uncharted, and the implications for liquid biopsy are unexplored. Here, the authors use deep genomic sequencing and histopathological information to trace tumor evolution both within the prostate and during metastasis in ten men.

    • D. J. Woodcock, E. Riabchenko & D. C. Wedge
  • Article
    | Open Access

    Despite the identification of genetic risk loci for late-onset Alzheimer’s disease (LOAD), the genetic architecture and prediction remain unclear. Here, the authors use genetic risk scores for prediction of LOAD across three datasets and show evidence suggesting oligogenic variant architecture for this disease.

    • Qian Zhang, Julia Sidorenko & Peter M. Visscher
  • Article
    | Open Access

    A biologically-interpretable and robust metric that provides insight into one’s health status from a gut microbiome sample is an important clinical goal in current human microbiome research. Herein, the authors introduce a species-level index that predicts the likelihood of having a disease.

    • Vinod K. Gupta, Minsuk Kim & Jaeyun Sung
  • Article
    | Open Access

    There’s an emerging body of evidence to show how biological sex impacts cancer incidence, treatment and underlying biology. Here, using a large pan-cancer dataset, the authors further highlight how sex differences shape the cancer genome.

    • Constance H. Li, Stephenie D. Prokopec & Christian von Mering
  • Article
    | Open Access

    Pseudogenes are key markers of genome remodelling processes. Here the authors present genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains, update human pseudogene annotations, and characterise the transcription and evolution of mouse pseudogenes.

    • Cristina Sisu, Paul Muir & Mark Gerstein
  • Article
    | Open Access

    CRISPR-Cas is a host adaptive immunity system and viruses harbor diverse anti-CRISPR proteins (Acrs). Here, the authors develop a random forest machine-learning approach to predict Acrs, identifying 2500 candidate Acr families, which expand the current repertoire of predicted Acrs by two orders of magnitude.

    • Ayal B. Gussow, Allyson E. Park & Eugene V. Koonin
  • Article
    | Open Access

    Predicting chromatin loops from genome-wide interaction matrices such as Hi-C data provides insight into gene regulation events. Here, the authors present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps, and apply it to systematically predict chromatin loops in 56 Hi-C datasets, with results available at the 3D Genome Browser.

    • Tarik J. Salameh, Xiaotao Wang & Feng Yue
  • Article
    | Open Access

    Joint analysis of multiple traits can increase power and provide insights into shared genetic architecture. Here, Nguyen et al. develop multi-trait TADA (mTADA), an extension of TADA (transmission and de novo association test) that jointly analyses de novo mutations of traits for improved risk-gene identification power.

    • Tan-Hoang Nguyen, Amanda Dobbyn & Eli A. Stahl
  • Article
    | Open Access

    Upstream open reading frames (uORFs), located in 5’ untranslated regions, are regulators of downstream protein translation. Here, Whiffin et al. use the genomes of 15,708 individuals in the Genome Aggregation Database (gnomAD) to systematically assess the deleteriousness of variants creating or disrupting uORFs.

    • Nicola Whiffin, Konrad J. Karczewski & James S. Ware
  • Article
    | Open Access

    Regulation of chromosome structure plays essential roles in many nuclear processes. Here, the authors present TADdyn, a tool that integrates time-course 3C data, restraint-based modelling, and molecular dynamics to simulate the structural rearrangements of genomic loci, and find that during gene activation, transcription start sites come into contact with open chromatin regions, forming active physical domains.

    • Marco Di Stefano, Ralph Stadhouders & Marc A. Marti-Renom
  • Article
    | Open Access

    Plasmids can mediate the exchange of genetic material between bacterial cells. Here, Acman et al. use network analyses to study the population structure and dynamics of over 10,000 plasmids, assigning them into cliques that correlate with gene content, host range, and existing classifications based on replicon and mobility types.

    • Mislav Acman, Lucy van Dorp & Francois Balloux
  • Article
    | Open Access

    Empirical examples documenting the pace of adaptation across the whole genome in wild populations are scarce. Here the authors study wild stickleback populations from lake and stream habitats and show that there is a genome-wide signature of adaptation to stream habitats within just one generation.

    • Telma G. Laurentino, Dario Moser & Daniel Berner
  • Article
    | Open Access

    Evolutionary steering uses therapies to control tumour evolution by exploiting trade-offs. Here, using a barcoding approach applied to large cell populations, the authors explore evolutionary steering in lung cancer cells treated with EGFR inhibitors.

    • Ahmet Acar, Daniel Nichol & Andrea Sottoriva
  • Article
    | Open Access

    A fraction of mammalian CTCF binding sites fall within transposable elements (TEs) but their contribution to the evolution of 3D chromatin structure is unknown. Here the authors investigate the effect of TE-driven CTCF binding site expansions on chromatin looping in humans and mice, and provide evidence that TEs contribute to cell-specific and species-specific chromatin looping diversity and variable gene regulation in mammalian genomes.

    • Adam G. Diehl, Ningxin Ouyang & Alan P. Boyle
  • Article
    | Open Access

    Population structure, even subtle differences within seemingly homogeneous populations, can have an impact on the accuracy of polygenic prediction. Here, Sakaue et al. use dimensionality reduction methods to reveal fine-scale structure in the Biobank Japan cohort and explore the performance of polygenic risk scores.

    • Saori Sakaue, Jun Hirata & Yukinori Okada
  • Article
    | Open Access

    Prior to genome assembly, the raw sequencing reads must be analyzed for assessment of major genome characteristics such as genome size, heterozygosity, and repetitiveness. For this purpose, the authors introduce GenomeScope 2.0, an extension of GenomeScope for polyploid genomes, and Smudgeplot, which can estimate a genome’s ploidy.

    • T. Rhyker Ranallo-Benavidez, Kamil S. Jaron & Michael C. Schatz