-
Article
| Open Access
Identification of significant chromatin contacts from HiChIP data by FitHiChIP
The HiChIP/PLAC-seq assay is popular for profiling 3D genome interactions among regulatory elements at kilobase resolution. Here the authors describe FitHiChIP, an empirical null-based, flexible computational method for statistical significance estimation and loop calling from HiChIP data.
- Sourya Bhattacharyya
- , Vivek Chandra
- & Ferhat Ay
-
Article
| Open Access
Translational coupling via termination-reinitiation in archaea and bacteria
Archaea and bacteria often have gene pairs with overlapping stop and start codons, suggesting translational coupling. Here, Huber et al. analyse overlapping gene pairs from 720 genomes, and validate translational coupling via termination-reinitiation for 14 gene pairs in Haloferax volcanii and Escherichia coli.
- Madeleine Huber
- , Guilhem Faure
- & Jörg Soppa
-
Article
| Open Access
Smu1 and RED are required for activation of spliceosomal B complexes assembled on short introns
Human spliceosome components Smu1 and RED regulate alternative splicing. Here the authors show that Smu1 and RED are also required for constitutive splicing of short introns.
- Sandra Keiper
- , Panagiotis Papasaikas
- & Reinhard Lührmann
-
Article
| Open Access
Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps
Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.
- Alexander T. Dilthey
- , Chirag Jain
- & Adam M. Phillippy
-
Article
| Open Access
PRC1 collaborates with SMCHD1 to fold the X-chromosome and spread Xist RNA between chromosome compartments
The inactive X (Xi)-specific S1/S2 chromosome compartments are merged by SMCHD1, but how the S1/S2 structure is constructed is unclear. The authors find that PRC1 drives the formation of S1/S2s and that the stepwise folding process of the Xi facilitates Xist RNA spreading between Xi compartments.
- Chen-Yu Wang
- , David Colognori
- & Jeannie T. Lee
-
Article
| Open Access
Simulating multiple faceted variability in single cell RNA sequencing
Simulated single cell RNA sequencing data is useful for method development and comparison. Here, the authors developed SymSim, a simulator that explicitly models the main factors of variation in single cell data.
- Xiuwei Zhang
- , Chenling Xu
- & Nir Yosef
-
Article
| Open Access
Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data
DNA modification generates unique electric signals in Oxford Nanopore sequencing data, but the signals can be complicated to decipher. Here, the authors develop a deep learning framework, DeepMod, to detect DNA base modifications including 5mC and 6mA using Nanopore sequencing data.
- Qian Liu
- , Li Fang
- & Kai Wang
-
Article
| Open Access
A systems biology approach uncovers cell-specific gene regulatory effects of genetic associations in multiple sclerosis
Genome-wide association studies (GWAS) have so far uncovered more than 200 loci for multiple sclerosis (MS). Here, the authors integrate data from various sources for a cell type-specific pathway analysis of MS GWAS results that specifically highlights the involvement of the immune system in disease pathogenesis.
- Lohith Madireddy
- , Nikolaos A. Patsopoulos
- & Sergio E. Baranzini
-
Article
| Open Access
Bacteroidetes use thousands of enzyme combinations to break down glycans
Bacteroidetes genomes contain polysaccharide utilization loci (PULs), each of which encodes enzymes for the breakdown of one particular glycan. By analyzing the enzyme composition of 13,537 PULs, the authors suggest that the natural glycan diversity is orders of magnitude lower than previously proposed.
- Pascal Lapébie
- , Vincent Lombard
- & Bernard Henrissat
-
Article
| Open Access
Sequencing of human genomes with nanopore technology
Nanopore sequencing technology generates longer reads than current technologies, but with more errors. Here, the authors develop new analytical tools to improve accuracy and evaluate the potential of nanopore sequencing for clinical human genomics.
- Rory Bowden
- , Robert W. Davies
- & Peter Donnelly
-
Article
| Open Access
Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
Most phasing programmes for sequencing data work well for genomes with low heterozygosity but drop in performance in regions of high heterozygosity. Here, Kajitani et al. develop the assembler Platanus-allee and demonstrate its utility in de novo assemblies of various genomes and the human MHC region.
- Rei Kajitani
- , Dai Yoshimura
- & Takehiko Itoh
-
Article
| Open Access
Chiral DNA sequences as commutable controls for clinical genomics
Any DNA sequence can be represented by a chiral partner sequence – an exact copy arranged in reverse nucleotide order. Here, the authors show that chiral DNA sequence pairs share important properties and show the utility of synthetic chiral sequences (sequins) as controls for clinical genomics.
- Ira W. Deveson
- , Bindu Swapna Madala
- & Tim R. Mercer
-
Article
| Open Access
A multi-task convolutional deep neural network for variant calling in single molecule sequencing
Single Molecule Sequencing (SMS) technologies generate long but noisy read data. Here, the authors develop Clairvoyante, a deep neural network-based method for variant calling with SMS reads such as PacBio and ONT data.
- Ruibang Luo
- , Fritz J. Sedlazeck
- & Michael C. Schatz
-
Article
| Open Access
Selective single molecule sequencing and assembly of a human Y chromosome of African origin
Due to various structural and sequence complexities, the human Y chromosome is challenging to sequence and characterize. Here, the authors develop a strategy to sequence native, unamplified flow sorted Y chromosomes with a nanopore sequencing platform, and report the first assembly of a human Y chromosome of African origin.
- Lukas F. K. Kuderna
- , Esther Lizano
- & Tomas Marques-Bonet
-
Article
| Open Access
Comparative expression profiling reveals widespread coordinated evolution of gene expression across eukaryotes
Gene pairs that are coexpressed across various environmental conditions in multiple species suggest functional similarity. Here the authors analyze patterns of gene expression co-evolution across diverse eukaryotes, and identify hundreds of protein complexes and pathways whose gene expression levels have co-evolved since their ancient divergence.
- Trevor Martin
- & Hunter B. Fraser
-
Article
| Open Access
Whole genome sequencing puts forward hypotheses on metastasis evolution and therapy in colorectal cancer
The evolution and genetic nature of metastatic lesions are not completely characterized. Here the authors perform a comprehensive whole-genome study of colorectal metastases in comparison to matched primary tumors and define a multistage progression model and metastasis-specific changes that, in part, are therapeutically actionable.
- Naveed Ishaque
- , Mohammed L. Abba
- & Heike Allgayer
-
Article
| Open Access
Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon
Transcripts from highly-similar multigene families are challenging to decipher. Here, the authors develop IsoCon, a tool for detecting and reconstructing isoforms from multigene families by analyzing long PacBio Iso-Seq reads.
- Kristoffer Sahlin
- , Marta Tomaszkiewicz
- & Paul Medvedev
-
Article
| Open Access
Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration
Integrated analyses of multiple large-scale screenings can be complicated by batch effects and technical artefacts. McFarland et al. introduce DEMETER2, a hierarchical model coupled with model-based normalization, which allows the assessment of differential dependencies across genes and cell lines.
- James M. McFarland
- , Zandra V. Ho
- & Aviad Tsherniak
-
Article
| Open Access
Determinants of promoter and enhancer transcription directionality in metazoans
Divergent transcription from promoters and enhancers occurs in many species, but it is unclear if it is a general feature of all eukaryotic cis regulatory elements. Here the authors define cis regulatory elements in worms, flies, and humans, and identify several differences in regulatory architecture among metazoans.
- Mahmoud M. Ibrahim
- , Aslihan Karabacak
- & Uwe Ohler
-
Article
| Open Access
A reference haplotype panel for genome-wide imputation of short tandem repeats
Short-tandem repeats (STR), similar to single nucleotide polymorphisms (SNP), contribute to complex traits, but their ascertainment by next-generation sequencing is costly. Here, Saini et al. provide a SNP+STR haplotype reference panel that allows imputation of STRs from SNP array data.
- Shubham Saini
- , Ileena Mitra
- & Melissa Gymrek
-
Article
| Open Access
Defective transcription elongation in a subset of cancers confers immunotherapy resistance
Transcription elongation (TE) is a key point of inducible gene expression regulation. Here, the authors report widespread TE defects (TEdeff) in a high proportion of cancers that correlate with poor immunotherapy response, highlighting TE defects as potential routes for immune resistance.
- Vishnu Modur
- , Navneet Singh
- & Kakajan Komurov
-
Article
| Open Access
Predicting CTCF-mediated chromatin interactions by integrating genomic and epigenomic features
CTCF mediates long-range chromatin interactions which are important for genome organization and function. Here, the authors demonstrate that the CTCF-mediated interactome exhibits extensive plasticity and present Lollipop, a machine-learning framework which predicts CTCF-mediated long-range interactions using genomic and epigenomic features.
- Yan Kai
- , Jaclyn Andricovich
- & Weiqun Peng
-
Article
| Open Access
Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects
Sharing of whole genome sequencing (WGS) data improves study scale and power, but data from different groups are often incompatible. Here, US genome centers and NIH programs define WGS data processing standards and a flexible validation method, facilitating collaboration in human genetics research.
- Allison A. Regier
- , Yossi Farjoun
- & Ira M. Hall
-
Article
| Open Access
Clinical cancer genomic profiling by three-platform sequencing of whole genome, whole exome and transcriptome
Clinical oncology is rapidly adopting next-generation sequencing technology for nucleotide variant and indel detection. Here the authors present a three-platform approach (whole-genome, whole-exome, and whole-transcriptome) in pediatric patients for the detection of diverse types of germline and somatic variants.
- Michael Rusch
- , Joy Nakitandwe
- & Jinghui Zhang
-
Article
| Open Access
Decoding topologically associating domains with ultra-low resolution Hi-C data by graph structural entropy
Accurate detection of TADs requires ultra-deep sequencing and sophisticated normalisation procedures, which limits the analysis of Hi-C data. Here the authors develop a normalisation-free method to decode the domains of chromosomes (deDoc) that utilizes structural entropy to predict TADs with ultra-low sequencing data.
- Angsheng Li
- , Xianchen Yin
- & Zhihua Zhang
-
Article
| Open Access
Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis
Complex microbial communities pose a challenge to metagenomic analysis. Here the authors develop ‘sequins’, internal DNA standards that represent a synthetic community of artificial genomes.
- Simon A. Hardwick
- , Wendy Y. Chen
- & Tim R. Mercer
-
Article
| Open Access
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations
The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.
- Karen H. Y. Wong
- , Michal Levy-Sakin
- & Pui-Yan Kwok
-
Article
| Open Access
The evolution of the temporal program of genome replication
Temporal programs of genome replication show different levels of conservation between closely or distantly related species. Here, the authors generate genome-wide replication timing profiles for ten yeast species, and analyze their evolutionary dynamics.
- Nicolas Agier
- , Stéphane Delmas
- & Gilles Fischer
-
Article
| Open Access
The effects of mutational processes and selection on driver mutations across cancer types
A central question in cancer research is how specific driver mutations are acquired and maintained during cancer development. Here Temko et al. use public sequencing data to infer the effect of mutation and selection on a set of driver mutations and suggest that selection frequently dominates.
- Daniel Temko
- , Ian P. M. Tomlinson
- & Trevor A. Graham
-
Article
| Open Access
Bayesian nonparametric discovery of isoforms and individual specific quantification
Alternative splicing leads to transcript isoform diversity. Here, Aguiar et al. develop biisq, a Bayesian nonparametric approach to discover and quantify isoforms from RNA-seq data.
- Derek Aguiar
- , Li-Fang Cheng
- & Barbara E. Engelhardt
-
Article
| Open Access
Identification of rare sequence variation underlying heritable pulmonary arterial hypertension
Pulmonary arterial hypertension (PAH) is a rare lung disorder characterised by narrowing and obliteration of small pulmonary arteries ultimately leading to right heart failure. Here, the authors sequence whole genomes of over 1000 PAH patients and identify likely causal variants in GDF2, ATP13A3, AQP1 and SOX17.
- Stefan Gräf
- , Matthias Haimel
- & Nicholas W. Morrell
-
Article
| Open Access
A genomics approach reveals insights into the importance of gene losses for mammalian adaptations
Gene losses are generally considered detrimental, or at best neutral. Here, Sharma and colleagues present a new comparative genomics method to detect gene losses and highlight cases of gene losses in mammals that have potentially contributed to adaptive phenotypic innovations.
- Virag Sharma
- , Nikolai Hecker
- & Michael Hiller
-
Article
| Open Access
High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell
Long-read sequencing technologies facilitate efficient and high quality genome assembly. Here Michael et al. achieve a fast reference assembly for Arabidopsis thaliana KBS-Mac-74 accession using the handheld Oxford Nanopore MinION sequencer and consumer computing hardware, and demonstrate its usefulness in resolving complex structural variation.
- Todd P. Michael
- , Florian Jupe
- & Joseph R. Ecker
-
Article
| Open Access
Transcriptional decomposition reveals active chromatin architectures and cell specific regulatory interactions
Transcriptional regulation is coupled with chromosomal positioning and chromatin architecture. Here the authors develop a transcriptional decomposition approach to separate expression associated with genome structure from independent effects not directly associated with genomic positioning.
- Sarah Rennie
- , Maria Dalby
- & Robin Andersson
-
Article
| Open Access
Promoter-enhancer interactions identified from Hi-C data using probabilistic models and hierarchical topological domains
Proximity-ligation methods like Hi-C map DNA-DNA interactions and reveal the organization of the genome into topologically associating domains (TADs). Here the authors describe PSYCHIC, a computational approach for analysing Hi-C data that allows the identification of promoter-enhancer interactions.
- Gil Ron
- , Yuval Globerson
- & Tommy Kaplan
-
Article
| Open Access
Centromere evolution and CpG methylation during vertebrate speciation
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here Ichikawa et al. perform de novo long-read genome assembly of three inbred medaka strains, and report the long-range structure of centromeres and their methylation, as well as the correlation of structural variants with differential gene expression.
- Kazuki Ichikawa
- , Shingo Tomioka
- & Shinichi Morishita
-
Article
| Open Access
Annotating pathogenic non-coding variants in genic regions
While non-coding synonymous and intronic variants are often not under strong selective constraint, they can be pathogenic through affecting splicing or transcription. Here, the authors develop a score that uses sequence context alterations to predict pathogenicity of synonymous and non-coding genetic variants, and provide a web server of pre-computed scores.
- Sahar Gelfman
- , Quanli Wang
- & David B. Goldstein
-
Article
| Open Access
Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis
RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.
- Sayed Mohammad Ebrahim Sahraeian
- , Marghoob Mohiyuddin
- & Hugo Y. K. Lam
-
Article
| Open Access
Whole genome analysis of a schistosomiasis-transmitting freshwater snail
Biomphalaria glabrata is a freshwater snail that acts as a host for the trematode Schistosoma mansoni, which causes intestinal infection in humans. This work describes genome and transcriptome analyses from 12 different tissues of B. glabrata, and identifies genes for snail behavior and evolution.
- Coen M. Adema
- , LaDeana W. Hillier
- & Richard K. Wilson
-
Article
| Open Access
Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce
Genome assembly for many plant species can be challenging due to large size and high repeat content. Here, the authors use in vitro proximity ligation to assemble the genome of lettuce, revealing a family-specific triplication event and providing a comprehensive reference genome for a member of the Compositae.
- Sebastian Reyes-Chin-Wo
- , Zhiwen Wang
- & Richard W. Michelmore
-
Article
| Open Access
Functional cis-regulatory modules encoded by mouse-specific endogenous retrovirus
The gene-battery model posits that transposable elements (TEs) may be cis-regulatory elements that control gene expression. Here, mouse-specific TEs are shown to be binding sites for multiple collaborating transcription factors in embryonic stem cells, and to act as cis-regulatory modules in a synergistic fashion.
- Vasavi Sundaram
- , Mayank N. K. Choudhary
- & Ting Wang
-
Article
| Open Access
Scaffolding and completing genome assemblies in real-time with nanopore sequencing
Assembling genomes using currently available computational methods can be time consuming. Here, Coin and colleagues describe a bioinformatics tool named npScarf that can scaffold and complete an existing short read assembly in real-time using nanopore sequencing.
- Minh Duc Cao
- , Son Hoang Nguyen
- & Lachlan J. M. Coin
-
Article
| Open Access
Small genomic insertions form enhancers that misregulate oncogenes
Sequencing initiatives have detected multiple types of mutations in cancer. Here the authors, analysing enhancer-targeting sequence data, show that small insertions in transcriptional enhancers are frequently found near oncogenes, and demonstrate how one mutation deregulates expression of LMO2 in leukemia cells.
- Brian J. Abraham
- , Denes Hnisz
- & Richard A. Young
-
Article
| Open Access
Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast
The fission yeast Schizosaccharomyces pombe has diverse traits. Jeffares et al. characterize large copy number variations (CNVs) and rearrangements in S. pombe, and show that CNVs are transient with effects on quantitative traits and gene expression, whereas rearrangements influence intrinsic reproductive isolation.
- Daniel C. Jeffares
- , Clemency Jolly
- & Fritz J. Sedlazeck
-
Article
| Open Access
MetaSort untangles metagenome assembly by reducing microbial community complexity
Currently available metagenomic data analysis relies on reference genomes. Here, the authors describe a new de novo metagenomic assembly method, metaSort, that constructs bacterial genomes from metagenomic samples to reduce microbial community complexity while increasing genome recovery and assembly.
- Peifeng Ji
- , Yanming Zhang
- & Fangqing Zhao
-
Article
| Open Access
Integrative modelling of tumour DNA methylation quantifies the contribution of metabolism
Altered DNA methylation is a feature of cancer and between-patient variability is prevalent. Here, the authors integrate data on thousands of human tumours, and find that expression levels of methionine metabolism genes are predictive of methylation features, and that the breakdown of this relationship is a negative prognostic marker.
- Mahya Mehrmohamadi
- , Lucas K. Mentch
- & Jason W. Locasale
-
Article
| Open Access
An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes
The utility of a universal reference sequence for human genome comparisons is dependent on the ethnic origins of the individuals being sequenced. Here the authors report a Korean reference genome and consensus variome, and show that an ethnically-relevant reference can improve variant detection.
- Yun Sung Cho
- , Hyunho Kim
- & Jong Bhak
-
Article
| Open Access
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants
Structural variants (SVs) are prevalent in genomes of the general population. Here, Guryev and The Genome of the Netherlands Consortium describe the reference panel of haplotype-resolved SVs from 769 individuals from 250 Dutch families and show its utility for studying heritable traits.
- Jayne Y. Hehir-Kwa
- , Tobias Marschall
- & Victor Guryev
-
Article
| Open Access
Comparative genomics reveals adaptive evolution of Asian tapeworm in switching to a new intermediate host
Only one of the three Taenia species causing taeniasis in humans was previously sequenced. Here the authors provide draft genomes of Taenia saginata and Taenia asiatica, analyse genome evolution of all three species, and identify potential targets for developing diagnostic markers or intervention tools.
- Shuai Wang
- , Sen Wang
- & Xuepeng Cai