Article
| Open Access | Metabolite signatures of diverse Camellia sinensis tea populations
The molecular basis for the unique taste and aroma of tea cultivars is largely unknown, but is critical for breeding new cultivars. Here the authors use transcriptomics and metabolomics to study the relationship among phylogenetic groups and specialized metabolites from 136 tea accessions in China.
- Xiaomin Yu
- , Jiajing Xiao
- & Renyi Liu
Article
| Open Access | Whole-genome sequencing of acral melanoma reveals genomic complexity and diversity
Acral melanoma occurs on the soles of the feet, palms of the hands and in nail beds. Here, the authors report the genomic landscape of 87 acral melanomas and find that some tumors harbor a UV signature and that the tumors are diverse at the levels of mutational signatures, structural aberrations and copy number signatures.
- Felicity Newell
- , James S. Wilmott
- & Nicholas K. Hayward
Article
| Open Access | Pan-neuroblastoma analysis reveals age- and signature-associated driver alterations
Genomic analysis of neuroblastoma has revealed important disease etiology. In this study, the authors assembled whole genome, exome and transcriptome data from over 700 neuroblastomas and identified molecular signatures correlated with age, and rare, potentially targetable variants overlooked in smaller cohorts.
- Samuel W. Brady
- , Yanling Liu
- & Jinghui Zhang
Article
| Open Access | Prostate cancer evolution from multilineage primary to single lineage metastases with implications for liquid biopsy
The evolutionary progression from primary to metastatic prostate cancer is largely uncharted, and the implications for liquid biopsy are unexplored. Here, the authors use deep genomic sequencing and histopathological information to trace tumor evolution both within the prostate and during metastasis in ten men.
- D. J. Woodcock
- , E. Riabchenko
- & D. C. Wedge
Article
| Open Access | Risk prediction of late-onset Alzheimer’s disease implies an oligogenic architecture
Despite the identification of genetic risk loci for late-onset Alzheimer’s disease (LOAD), the genetic architecture and prediction remain unclear. Here, the authors use genetic risk scores for prediction of LOAD across three datasets and show evidence suggesting oligogenic variant architecture for this disease.
- Qian Zhang
- , Julia Sidorenko
- & Peter M. Visscher
Article
| Open Access | A diploid assembly-based benchmark for variants in the major histocompatibility complex
Accurate, phased assemblies are a key tool in understanding the human genome, particularly in highly polymorphic regions like the medically important MHC. Here the authors provide an assembly-based benchmark for this difficult-to-characterize region.
- Chen-Shan Chin
- , Justin Wagner
- & Justin M. Zook
Article
| Open Access | A predictable conserved DNA base composition signature defines human core DNA replication origins
In metazoans, the DNA sequence elements characterizing origin specification are unknown. By generating and analysing 19 SNS-seq datasets from different human cell types, the authors reveal a class and features of Core origins of replication that can be predicted by an algorithm.
- Ildem Akerman
- , Bahar Kasaai
- & Marcel Méchali
Article
| Open Access | Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets
Haplotype reconstruction of distant genetic variants is problematic in short-read sequencing. Here, the authors describe HapTree-X, a probabilistic framework that uses differential allele-specific expression to better reconstruct paternal haplotypes from diploid and polyploid genomes.
- Emily Berger
- , Deniz Yorukoglu
- & Bonnie Berger
Article
| Open Access | A predictive index for health status using species-level gut microbiome profiling
A biologically-interpretable and robust metric that provides insight into one’s health status from a gut microbiome sample is an important clinical goal in current human microbiome research. Herein, the authors introduce a species-level index that predicts the likelihood of having a disease.
- Vinod K. Gupta
- , Minsuk Kim
- & Jaeyun Sung
Article
| Open Access | Sex differences in oncogenic mutational processes
There’s an emerging body of evidence to show how biological sex impacts cancer incidence, treatment and underlying biology. Here, using a large pan-cancer dataset, the authors further highlight how sex differences shape the cancer genome.
- Constance H. Li
- , Stephenie D. Prokopec
- & Christian von Mering
Article
| Open Access | Transcriptional activity and strain-specific history of mouse pseudogenes
Pseudogenes are key markers of genome remodelling processes. Here the authors present genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains, update human pseudogene annotations, and characterise the transcription and evolution of mouse pseudogenes.
- Cristina Sisu
- , Paul Muir
- & Mark Gerstein
Article
| Open Access | Machine-learning approach expands the repertoire of anti-CRISPR protein families
CRISPR-Cas is a host adaptive immunity system and viruses harbor diverse anti-CRISPR proteins (Acrs). Here, the authors develop a random forest machine-learning approach to predict Acrs, identifying 2500 candidate Acr families, which expand the current repertoire of predicted Acrs by two orders of magnitude.
- Ayal B. Gussow
- , Allyson E. Park
- & Eugene V. Koonin
Article
| Open Access | Cross-species oncogenic signatures of breast cancer in canine mammary tumors
Comparison of spontaneous canine cancers and human cancers may illuminate future therapeutic avenues. Here, genomic analyses of these tumors highlight a convergence on PI3K-Akt oncogenic pathways.
- Tae-Min Kim
- , In Seok Yang
- & Sangwoo Kim
Article
| Open Access | A supervised learning framework for chromatin loop detection in genome-wide contact maps
Predicting chromatin loops from genome-wide interaction matrices such as Hi-C data provides insight into gene regulation events. Here, the authors present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps, and apply it to systematically predict chromatin loops in 56 Hi-C datasets, with results available at the 3D Genome Browser.
- Tarik J. Salameh
- , Xiaotao Wang
- & Feng Yue
Article
| Open Access | mTADA is a framework for identifying risk genes from de novo mutations in multiple traits
Joint analysis of multiple traits can increase power and provide insights into shared genetic architecture. Here, Nguyen et al. develop multi-trait TADA (mTADA), an extension of TADA (transmission and de novo association test) that jointly analyses de novo mutations of traits for improved risk-gene identification power.
- Tan-Hoang Nguyen
- , Amanda Dobbyn
- & Eli A. Stahl
Article
| Open Access | Characterising the loss-of-function impact of 5’ untranslated region variants in 15,708 individuals
Upstream open reading frames (uORFs), located in 5’ untranslated regions, are regulators of downstream protein translation. Here, Whiffin et al. use the genomes of 15,708 individuals in the Genome Aggregation Database (gnomAD) to systematically assess the deleteriousness of variants creating or disrupting uORFs.
- Nicola Whiffin
- , Konrad J. Karczewski
- & James S. Ware
Article
| Open Access | Transcriptional activation during cell reprogramming correlates with the formation of 3D open chromatin hubs
Regulation of chromosome structure plays essential roles in many nuclear processes. Here, the authors present TADdyn, a tool that integrates time-course 3C data, restraint-based modelling, and molecular dynamics to simulate the structural rearrangements of genomic loci, and find that during gene activation, transcription start sites contact open chromatin regions to form active physical domains.
- Marco Di Stefano
- , Ralph Stadhouders
- & Marc A. Marti-Renom
Article
| Open Access | Large-scale network analysis captures biological features of bacterial plasmids
Plasmids can mediate the exchange of genetic material between bacterial cells. Here, Acman et al. use network analyses to study the population structure and dynamics of over 10,000 plasmids, assigning them into cliques that correlate with gene content, host range, and existing classifications based on replicon and mobility types.
- Mislav Acman
- , Lucy van Dorp
- & Francois Balloux
Article
| Open Access | Whole genome landscapes of uveal melanoma show an ultraviolet radiation signature in iris tumours
Uveal melanoma has a propensity to metastasise. Here, the authors report the whole genome sequence of 103 uveal melanomas and find that the tumour mutational burden is variable and that two subsets of tumours are characterised by MBD4 mutations and a UV exposure signature.
- Peter A. Johansson
- , Kelly Brooks
- & Nicholas K. Hayward
Article
| Open Access | Genomic release-recapture experiment in the wild reveals within-generation polygenic selection in stickleback fish
Empirical examples documenting the pace of adaptation across the whole genome in wild populations are scarce. Here the authors study wild stickleback populations from lake and stream habitats and show that there is a genome-wide signature of adaptation to stream habitats within just one generation.
- Telma G. Laurentino
- , Dario Moser
- & Daniel Berner
Article
| Open Access | Exploiting evolutionary steering to induce collateral drug sensitivity in cancer
Evolutionary steering uses therapies to control tumour evolution by exploiting trade-offs. Here, using a barcoding approach applied to large cell populations, the authors explore evolutionary steering in lung cancer cells treated with EGFR inhibitors.
- Ahmet Acar
- , Daniel Nichol
- & Andrea Sottoriva
Article
| Open Access | Transposable elements contribute to cell and species-specific chromatin looping and gene regulation in mammalian genomes
A fraction of mammalian CTCF binding sites fall within transposable elements (TEs) but their contribution to the evolution of 3D chromatin structure is unknown. Here the authors investigate the effect of TE-driven CTCF binding site expansions on chromatin looping in humans and mice, and provide evidence that TEs contribute to cell-specific and species-specific chromatin looping diversity and variable gene regulation in mammalian genomes.
- Adam G. Diehl
- , Ningxin Ouyang
- & Alan P. Boyle
Article
| Open Access | Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction
Population structure, even subtle differences within seemingly homogenous populations, can have an impact on the accuracy of polygenic prediction. Here, Sakaue et al. use dimensionality reduction methods to reveal fine-scale structure in the Biobank Japan cohort and explore the performance of polygenic risk scores.
- Saori Sakaue
- , Jun Hirata
- & Yukinori Okada
Article
| Open Access | Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns
Long-read sequencing is useful in determining exon-connectivity of full-length mRNA isoforms. Here, by long-read nanopore sequencing, the authors report that intron retention is downregulated in SF3B1 mutant chronic lymphocytic leukemia cells compared with normal B cells.
- Alison D. Tang
- , Cameron M. Soulette
- & Angela N. Brooks
Article
| Open Access | GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes
Prior to genome assembly, the raw sequencing reads must be analyzed for assessment of major genome characteristics such as genome size, heterozygosity, and repetitiveness. For this purpose, the authors introduce GenomeScope 2.0, an extension of GenomeScope for polyploid genomes, and Smudgeplot, which can estimate a genome’s ploidy.
- T. Rhyker Ranallo-Benavidez
- , Kamil S. Jaron
- & Michael C. Schatz
Article
| Open Access | Interplay between DNA damage repair and apoptosis shapes cancer evolution through aneuploidy and microsatellite instability
The interplay between driver mutations and aneuploidy during tumorigenesis is largely unexplored. Here, the authors show two types of associations, leading to different therapeutic vulnerabilities and prognoses.
- Noam Auslander
- , Yuri I. Wolf
- & Eugene V. Koonin
Article
| Open Access | Determining sequencing depth in a single-cell RNA-seq experiment
For single-cell RNA-seq experiments the sequencing budget is limited, and how it should be optimally allocated to maximize information is not clear. Here the authors develop a mathematical framework to show that, for estimating many gene properties, the optimal allocation is to sequence at the depth of one read per cell per gene.
- Martin Jinye Zhang
- , Vasilis Ntranos
- & David Tse
Article
| Open Access | Community diversity and habitat structure shape the repertoire of extracellular proteins in bacteria
Microbes secrete a repertoire of extracellular proteins to serve various functions depending on the ecological context. Here the authors examine how bacterial community composition and habitat structure affect the extracellular proteins, showing that generalist species and those living in more structured environments produce more extracellular proteins, and that costs of production are lower in more diverse communities.
- Marc Garcia-Garcera
- & Eduardo P. C. Rocha
Article
| Open Access | Transcriptional effects of copy number alterations in a large set of human cancers
Copy number alterations (CNAs) can drive tumor progression in cancer by altering gene expression levels, but transcriptional adaptation can skew CNA impact. Here, the authors present transcriptional adaptation to CNA (TACNA) profiling, a tool to extract the transcriptional effect of CNAs from expression data without requiring paired CNA profiles.
- Arkajyoti Bhattacharya
- , Rico D. Bense
- & Rudolf S. N. Fehrmann
Article
| Open Access | LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data
Compared to single nucleotide variants and short indels, structural variants (SVs) are often more challenging to detect using high-throughput sequencing based methods. Here, the authors develop LinkedSV, a computational tool for SV detection using linked-read exome and genome sequencing data.
- Li Fang
- , Charlly Kao
- & Kai Wang
Article
| Open Access | Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads
Repetitive sequences in complex eukaryote genomes can cause fragmented assemblies with incomplete gene sequences and unanchored or mispositioned contigs. Here, the authors report HERA, a method to improve genome assemblies by efficiently resolving repeats using single-molecule sequencing data.
- Huilong Du
- & Chengzhi Liang
Article
| Open Access | A multi-sample approach increases the accuracy of transcript assembly
Transcript assembly is an important step in analysis of RNA-seq data whose accuracy influences downstream quantification, detection and characterization of alternative splice variants. Here, the authors develop PsiCLASS, a transcript assembler leveraging simultaneous analysis of multiple RNA-seq samples.
- Li Song
- , Sarven Sabunciyan
- & Liliana Florea
Article
| Open Access | De novo compartment deconvolution and weight estimation of tumor samples using DECODER
Separating different cell compartments from bulk gene expression data can be challenging. Here the authors present DECODER, which can perform de novo deconvolutions on non-negative matrices including microarray, RNA-seq and ATAC-seq data sets.
- Xianlu Laura Peng
- , Richard A. Moffitt
- & Jen Jen Yeh
Article
| Open Access | Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton
In-depth functional characterization of genomes relies on comprehensive transcriptome data. Here, the authors employ four complementary RNA sequencing technologies to explore the transcription landscape across 16 tissues or different organ types in diploid A genome cotton using a newly developed computational pipeline.
- Kun Wang
- , Dehe Wang
- & Yuxian Zhu
Article
| Open Access | Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing
Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads with high error rates. Here, the authors develop Longshot, a computational method that detects and phases single nucleotide variants (SNVs) in diploid genomes using SMS data.
- Peter Edge
- & Vikas Bansal
Article
| Open Access | Identification of significant chromatin contacts from HiChIP data by FitHiChIP
The HiChIP/PLAC-seq assay is popular for profiling 3D genome interactions among regulatory elements at kilobase resolution. Here the authors describe FitHiChIP, an empirical null-based, flexible computational method for statistical significance estimation and loop calling from HiChIP data.
- Sourya Bhattacharyya
- , Vivek Chandra
- & Ferhat Ay
Article
| Open Access | Translational coupling via termination-reinitiation in archaea and bacteria
Archaea and bacteria often have gene pairs with overlapping stop and start codons, suggesting translational coupling. Here, Huber et al. analyse overlapping gene pairs from 720 genomes, and validate translational coupling via termination-reinitiation for 14 gene pairs in Haloferax volcanii and Escherichia coli.
- Madeleine Huber
- , Guilhem Faure
- & Jörg Soppa
Article
| Open Access | Smu1 and RED are required for activation of spliceosomal B complexes assembled on short introns
Human spliceosome components Smu1 and RED regulate alternative splicing. Here the authors show that Smu1 and RED are also required for constitutive splicing of short introns.
- Sandra Keiper
- , Panagiotis Papasaikas
- & Reinhard Lührmann
Article
| Open Access | Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps
Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.
- Alexander T. Dilthey
- , Chirag Jain
- & Adam M. Phillippy
Article
| Open Access | PRC1 collaborates with SMCHD1 to fold the X-chromosome and spread Xist RNA between chromosome compartments
The inactive X (Xi)-specific S1/S2 chromosome compartments are merged by SMCHD1, but how the S1/S2 structure is constructed is unclear. The authors find that PRC1 drives the formation of S1/S2s and that the stepwise folding process of the Xi facilitates Xist RNA spreading between Xi compartments.
- Chen-Yu Wang
- , David Colognori
- & Jeannie T. Lee
Article
| Open Access | Simulating multiple faceted variability in single cell RNA sequencing
Simulated single cell RNA sequencing data is useful for method development and comparison. Here, the authors developed SymSim, a simulator that explicitly models the main factors of variation in single cell data.
- Xiuwei Zhang
- , Chenling Xu
- & Nir Yosef
Article
| Open Access | Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data
DNA modification generates unique electric signals in Oxford Nanopore sequencing data but the signals can be complicated to decipher. Here, the authors develop a deep learning framework, DeepMod, to detect DNA base modifications including 5mC and 6mA using Nanopore sequencing data.
- Qian Liu
- , Li Fang
- & Kai Wang
Article
| Open Access | A systems biology approach uncovers cell-specific gene regulatory effects of genetic associations in multiple sclerosis
Genome-wide association studies (GWAS) have so far uncovered more than 200 loci for multiple sclerosis (MS). Here, the authors integrate data from various sources for a cell type-specific pathway analysis of MS GWAS results that specifically highlights the involvement of the immune system in disease pathogenesis.
- Lohith Madireddy
- , Nikolaos A. Patsopoulos
- & Sergio E. Baranzini
Article
| Open Access | Bacteroidetes use thousands of enzyme combinations to break down glycans
Bacteroidetes genomes contain polysaccharide utilization loci (PULs), each of which encodes enzymes for the breakdown of one particular glycan. By analyzing the enzyme composition of 13,537 PULs, the authors suggest that the natural glycan diversity is orders of magnitude lower than previously proposed.
- Pascal Lapébie
- , Vincent Lombard
- & Bernard Henrissat
Article
| Open Access | Sequencing of human genomes with nanopore technology
Nanopore sequencing technology generates longer reads than current technologies, but with more errors. Here, the authors develop new analytical tools to improve accuracy and evaluate the potential of nanopore sequencing for clinical human genomics.
- Rory Bowden
- , Robert W. Davies
- & Peter Donnelly
Article
| Open Access | Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
Most phasing programmes for sequencing data work well for genomes with low heterozygosity but drop in performance in regions of high heterozygosity. Here, Kajitani et al. develop the assembler Platanus-allee and demonstrate its utility in de novo assemblies of various genomes and the human MHC region.
- Rei Kajitani
- , Dai Yoshimura
- & Takehiko Itoh
Article
| Open Access | Chiral DNA sequences as commutable controls for clinical genomics
Any DNA sequence can be represented by a chiral partner sequence – an exact copy arranged in reverse nucleotide order. Here, the authors show that chiral DNA sequence pairs share important properties and show the utility of synthetic chiral sequences (sequins) as controls for clinical genomics.
- Ira W. Deveson
- , Bindu Swapna Madala
- & Tim R. Mercer
Article
| Open Access | A multi-task convolutional deep neural network for variant calling in single molecule sequencing
Single Molecule Sequencing (SMS) technologies generate long but noisy read data. Here, the authors develop Clairvoyante, a deep neural network-based method for variant calling with SMS reads such as PacBio and ONT data.
- Ruibang Luo
- , Fritz J. Sedlazeck
- & Michael C. Schatz
Article
| Open Access | Selective single molecule sequencing and assembly of a human Y chromosome of African origin
Due to various structural and sequence complexities, the human Y chromosome is challenging to sequence and characterize. Here, the authors develop a strategy to sequence native, unamplified flow sorted Y chromosomes with a nanopore sequencing platform, and report the first assembly of a human Y chromosome of African origin.
- Lukas F. K. Kuderna
- , Esther Lizano
- & Tomas Marques-Bonet