Data integration

  • Article | Open Access

    Identifying the molecular mechanisms of response to systemic therapy in prostate cancer remains crucial. Here, the authors apply single-cell ATAC-seq and RNA-seq to models of early treatment response and resistance to enzalutamide and identify chromatin and gene expression patterns that can predict treatment response.

    S. Taavitsainen, N. Engedal & A. Urbanucci
  • Article | Open Access

    Local gene co-expression is found throughout the genome, but systematic analysis of these co-expressed genes is needed. Here, the authors identify local co-expressed genes in 49 tissues and characterize the genetic variants which may affect their expression and contribute to disease.

    Diogo M. Ribeiro, Simone Rubinacci & Olivier Delaneau
  • Article | Open Access

    Our ability to interpret single-cell multivariate signaling responses is still limited. Here the authors introduce fractional response analysis (FRA), a method based on fractional cell counting that can deconvolute heterogeneous multivariate responses of cell populations.

    Karol Nienałtowski, Rachel E. Rigby & Michał Komorowski
  • Article | Open Access

    Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Here the authors present a meta-analysis of thousands of ChIP-Seq experiments, powered by a new statistical method, that identifies more than 500,000 allele-specific binding (ASB) events in the human genome.

    Sergey Abramov, Alexandr Boytsov & Ivan V. Kulakovskiy
  • Article | Open Access

    Epigenetic and transcriptional dynamics are critical for both tissue homeostasis and injury response in the kidney. Leveraging a single-cell multiomics atlas of the developing mouse kidney, the authors reveal key events in chromatin regulation and gene expression dynamics during postnatal development.

    Zhen Miao, Michael S. Balzer & Katalin Susztak
  • Article | Open Access

    Modern biological research is complicated by the difficulty of collecting, transforming, annotating, and integrating datasets. Here, the authors present Go Get Data, a fast, reproducible approach to installing standardized data recipes, with an application to genomics data.

    Michael J. Cormier, Jonathan R. Belyeu & Aaron R. Quinlan
  • Article | Open Access

    Methods for profiling differences between individual cells are constantly expanding. Here, the authors present a computational framework for the analysis of single-cell chromatin accessibility data that takes into account prior knowledge and data-specific characteristics.

    Shengquan Chen, Guanao Yan & Zhixiang Lin
  • Article | Open Access

    Most diseases disrupt multiple proteins, and drugs treat such diseases by restoring the functions of the disrupted proteins; how drugs restore these functions, however, is often unknown. Here, the authors develop the multiscale interactome, a powerful approach to explain disease treatment.

    Camilo Ruiz, Marinka Zitnik & Jure Leskovec
  • Article | Open Access

    Cellular conversion methods based on transcription factor (TF) overexpression often suffer from low conversion efficiency. Here the authors show how to increase conversion efficiency by combining a computational method that prioritizes more efficient TF combinations with a transposon-based genomic integration system for delivery.

    Sascha Jung, Evan Appleton & Antonio del Sol
  • Article | Open Access

    The integration of independent pan-cancer CRISPR-Cas9 datasets allows better representation of genomic heterogeneity across different cancer types. Here, the authors propose a strategy for the integration of two large CRISPR-Cas9 screens and report increased coverage of molecular diversity and genetic dependencies.

    Clare Pacini, Joshua M. Dempster & Francesco Iorio
  • Article | Open Access

    Inflammatory bowel diseases are heterogeneous, and little is known about how underlying genetic variation can affect their development. Here, the authors report that intestinal inflammation modulates the effect of host genetics on the gut mucosal expression of 190 genes in the context of inflammatory bowel diseases.

    Shixian Hu, Werna T. Uniken Venema & Rinse K. Weersma
  • Article | Open Access

    Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. Here, the authors identify robust druggable protein targets within a principled causal framework that makes use of multiple data modalities and integrates aging signatures.

    Anastasiya Belyaeva, Louis Cammarata & Caroline Uhler
  • Article | Open Access

    Human mobility plays a central role in the spread of infectious diseases and can help in forecasting incidence. Here the authors compare multiple mobility benchmarks for forecasting influenza and demonstrate the value of a machine-learned mobility map with global coverage at multiple spatial scales.

    Srinivasan Venkatramanan, Adam Sadilek & Madhav Marathe
  • Article | Open Access

    Genomic prediction of phenotype may be improved by using DNA mutations with functional, evolutionary, and pleiotropic consequences. Here the authors describe a method for genome-wide fine-mapping of QTLs and develop a genotyping array for improved prediction of genetic values for cattle traits.

    Ruidong Xiang, Iona M. MacLeod & Michael E. Goddard
  • Article | Open Access

    Establishing the natural history of COVID-19 requires longitudinal data from population-based cohorts. Here, the authors use linked primary care, testing, and hospital data to describe the disease in ~100,000 individuals with a COVID-19 diagnosis among a population of ~5.5 million in Catalonia, Spain.

    Edward Burn, Cristian Tebé & Talita Duarte-Salles
  • Article | Open Access

    Determining whether cancer cell lines recapitulate the molecular features of the corresponding patient tumours remains essential for selecting appropriate cell line models for preclinical studies. The method developed here, Celligner, integrates cancer cell line and tumour RNA-seq datasets and reveals large differences in their concordance across cell lines and cancer types.

    Allison Warren, Yejia Chen & James M. McFarland
  • Article | Open Access

    Age is one of the strongest risk factors for severe illness from COVID-19. By integrating human lung transcriptomes with experimental data on SARS-CoV-2, the authors pinpoint specific age-associated factors that could contribute to the heightened severity of COVID-19 in older populations.

    Ryan D. Chow, Medha Majety & Sidi Chen
  • Article | Open Access

    Integrating single-cell data modalities increases the richness of information about the heterogeneity of cell states, but integrating imaging and transcriptomics remains an open challenge. Here the authors use autoencoders to learn a probabilistic coupling and map these modalities to a shared latent space.

    Karren Dai Yang, Anastasiya Belyaeva & Caroline Uhler
  • Article | Open Access

    The long noncoding RNA XIST plays a central role in sex-specific gene expression in humans by silencing one of the two X chromosomes in female cells. Here the authors show that higher-order secondary structure creates the modular domain structure of the XIST ribonucleoprotein complex and spatially separates its functions.

    Zhipeng Lu, Jimmy K. Guo & Howard Y. Chang
  • Article | Open Access

    Tibetan adaptation to the high-altitude environment represents a case of natural selection during recent human evolution. Here the authors investigate the chromatin and transcriptional landscape of umbilical endothelial cells from Tibetan and Han Chinese donors and provide a genome-wide characterization of the hypoxia regulatory network associated with high-altitude adaptation.

    Jingxue Xin, Hui Zhang & Bing Su
  • Article | Open Access

    Predicting crop performance in environments with limited field testing is challenging. Here the authors combine field experimental, DNA sequence, and weather data to predict genotypes’ future performance. They demonstrate the potential of this approach on a large dataset of wheat grain yield.

    Gustavo de los Campos, Paulino Pérez-Rodríguez & José Crossa
  • Article | Open Access

    An important aspect of precision medicine is probing the stability of molecular profiles among healthy individuals over time. Here, the authors sample a longitudinal wellness cohort and analyse blood molecular profiles as well as gut microbiota composition.

    Abdellah Tebani, Anders Gummesson & Linn Fagerberg
  • Article | Open Access

    Linear mixed models can be biased because they assume independence between random effects. Here, the authors describe CORE GREML, a genome-based restricted maximum likelihood method that estimates the covariance between random effects. Application to UK Biobank data highlights this covariance as an important parameter for multi-omics analyses of phenotypic variance.

    Xuan Zhou, Hae Kyung Im & S. Hong Lee
  • Article | Open Access

    Pseudogenes are key markers of genome remodelling processes. Here the authors present genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains, update human pseudogene annotations, and characterise the transcription and evolution of mouse pseudogenes.

    Cristina Sisu, Paul Muir & Mark Gerstein
  • Article | Open Access

    ENCODE is a resource comprising thousands of functional genomic datasets. Here, the authors present cancer-specific annotations within ENCODE, highlighting a workflow that can help prioritise key elements in oncogenesis.

    Jing Zhang, Donghoon Lee & Mark Gerstein
  • Article | Open Access

    The genome-wide, high-resolution epigenetic landscape of esophageal squamous cell carcinoma (ESCC) remains incompletely characterized. Here, the authors perform an integrated multi-omics analysis of ESCC and non-tumor tissues to define the genome-wide methylome landscape and the epigenetic alterations that uncover oncogenic drivers of ESCC.

    Wei Cao, Hayan Lee & Trever G. Bivona
  • Article | Open Access

    Deep learning is becoming a popular approach for understanding biological processes but can be hard to adapt to new questions. Here, the authors develop Janggu, a Python library that aims to ease data acquisition and model evaluation and facilitate deep learning applications in genomics.

    Wolfgang Kopp, Remo Monti & Altuna Akalin
  • Article | Open Access

    It is not clear which scRNA-seq experimental designs, other than completely randomized ones, allow batch effects to be adjusted. Here the authors show that under flexible reference panel and chain-type designs, biological variability can also be separated from batch effects, at least with BUSseq.

    Fangda Song, Ga Ming Angus Chan & Yingying Wei
  • Article | Open Access

    Multi-omics studies are popular but lack rigorous criteria for experimental design. Here, the authors define Figures of Merit across omics to comparatively describe their performance, and present new algorithms for sample size calculation in multi-omics experiments aimed at either feature selection or sample classification.

    Sonia Tarazona, Leandro Balzano-Nogueira & Ana Conesa
  • Article | Open Access

    Methods to integrate association evidence across multiple traits often focus on individual common variants in GWAS. Here the authors present multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits.

    Lan Luo, Judong Shen & Zheng-Zheng Tang
  • Article | Open Access

    A previous study identified in vivo structured mRNA regions in Saccharomyces cerevisiae by dimethyl sulfate sequencing. Here the authors use quantitative proteomics to identify protein interactors of 186 RNA folds in S. cerevisiae, providing functional links between RNA-binding proteins and distinct mRNA folds.

    Nuria Casas-Vila, Sergi Sayols & Falk Butter
  • Article | Open Access

    Seasonal influenza epidemics vary in timing and size, but the causes of the variation remain unclear. Here, the authors analyse a 15-year city-level data set, and find that fluctuations in climatic factors do not predict onset timing, and that while antigenic change does not have a consistent effect on epidemic size, the timing of onset and heterosubtypic competition do.

    Edward K. S. Lam, Dylan H. Morris & Colin A. Russell
  • Article | Open Access

    Characterizing the distance over which transcription factor (TF) binding influences gene expression is important for inferring target genes. Here the authors study this relationship using thousands of genomic data sets, finding two classes of TFs with distinct ranges of regulatory influence modulated by chromatin states of topologically associated domains.

    Chen-Hao Chen, Rongbin Zheng & X. Shirley Liu
  • Article | Open Access

    Pulmonary arterial hypertension (PAH) is a heterogeneous disease, causing severe breathing problems and cardiac morbidity. Here, the authors study chromatin marks in pulmonary arterial endothelial cells from PAH patients and controls and find changes in transcription factor and enhancer activity that suggest an aberrant response to signalling in PAH.

    Armando Reyes-Palomares, Mingxia Gu & Judith B. Zaugg
  • Article | Open Access

    Multi-omics datasets pose major challenges to data interpretation and hypothesis generation owing to their high-dimensional molecular profiles. Here, the authors develop ActivePathways, a method that uses data fusion techniques for integrative pathway analysis of multi-omics data and candidate gene discovery.

    Marta Paczkowska, Jonathan Barenboim & Jüri Reimand