Databases

  • Article
    | Open Access

    Local gene co-expression is found throughout the genome, but a systematic analysis of these co-expressed genes is still needed. Here, the authors identify locally co-expressed genes across 49 tissues and characterize the genetic variants that may affect their expression and contribute to disease.

    • Diogo M. Ribeiro, Simone Rubinacci & Olivier Delaneau
  • Article
    | Open Access

    Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Here, the authors present a meta-analysis, empowered by a new statistical method, that covers thousands of ChIP-Seq experiments and identifies more than 500,000 allele-specific binding (ASB) events in the human genome.

    • Sergey Abramov, Alexandr Boytsov & Ivan V. Kulakovskiy
  • Article
    | Open Access

    Sarcomas are morphologically heterogeneous tumours, which makes their classification challenging. Here, the authors develop a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.

    • Christian Koelsche, Daniel Schrimpf & Andreas von Deimling
  • Perspective
    | Open Access

    The IMEx consortium provides one of the largest resources of curated, experimentally verified molecular interaction data. Here, the authors review how IMEx evolved into a fundamental resource for life scientists and describe how IMEx data can support biomedical research.

    • Pablo Porras, Elisabet Barrera & Sandra Orchard
  • Article
    | Open Access

    With the generation of large pan-cancer whole-exome and whole-genome sequencing datasets, a question remains about how comparable these datasets are. Here, using The Cancer Genome Atlas samples analysed as part of the Pan-Cancer Analysis of Whole Genomes project, the authors explore the concordance of mutations called by whole-exome sequencing and whole-genome sequencing.

    • Matthew H. Bailey, William U. Meyerson & Christian von Mering
  • Article
    | Open Access

    Schulz et al. systematically benchmark how performance scales with increasingly sophisticated prediction algorithms and with increasing sample size in reference machine-learning and biomedical datasets. They find that complex nonlinear relationships between variables remain largely inaccessible for predicting key phenotypes from typical brain scans.

    • Marc-Andre Schulz, B. T. Thomas Yeo & Danilo Bzdok
  • Article
    | Open Access

    Reference databases are essential for studies on host-microbiota interactions. Here, the authors present the construction of VIRGO, a human vaginal non-redundant gene catalog, which represents a comprehensive resource for taxonomic and functional profiling of vaginal microbiomes from metagenomic and metatranscriptomic datasets.

    • Bing Ma, Michael T. France & Jacques Ravel
  • Article
    | Open Access

    The authors previously developed the Protein Common Interface Database (ProtCID), which compares and clusters the interfaces of pairs of full-length protein chains with defined Pfam domain architectures in different PDB entries to identify biological assemblies. Here the authors extend ProtCID to the clustering of domain-domain interactions that also allows analyzing domain interactions with peptides, nucleic acids, and ligands.

    • Qifang Xu & Roland L. Dunbrack Jr.
  • Article
    | Open Access

    Most databases of genotype-phenotype associations are manually curated. Here, Kuleshov et al. describe a machine curation system that extracts such relationships from the GWAS literature and synthesizes them into a structured knowledge base called GWASkb that can complement manually curated databases.

    • Volodymyr Kuleshov, Jialin Ding & Michael Snyder
  • Review Article
    | Open Access

    Glycomics is gaining momentum in basic, translational and clinical research. Here, the authors review current reporting standards and analysis tools for mass-spectrometry-based glycomics, and propose an e-infrastructure for standardized reporting and online deposition of glycomics data.

    • Miguel A. Rojas-Macias, Julien Mariethoz & Niclas G. Karlsson
  • Perspective
    | Open Access

    Questions of causality are ubiquitous in Earth system sciences and beyond, yet correlation techniques still prevail. This Perspective provides an overview of causal inference methods, identifies promising applications and methodological challenges, and initiates a causality benchmark platform.

    • Jakob Runge, Sebastian Bathiany & Jakob Zscheischler
  • Article
    | Open Access

    Short tandem repeats (STRs), like single-nucleotide polymorphisms (SNPs), contribute to complex traits, but ascertaining them by next-generation sequencing is costly. Here, Saini et al. provide a SNP+STR haplotype reference panel that allows imputation of STRs from SNP array data.

    • Shubham Saini, Ileena Mitra & Melissa Gymrek
  • Article
    | Open Access

    Proteoforms arise as protein isoforms or as protein haplotypes, the latter resulting from genetic variation. Here, the authors develop Haplosaurus, a database that computes protein haplotypes genome-wide from existing genotype data, and analyse protein haplotype variability in the 1000 Genomes dataset.

    • William Spooner, William McLaren & Catherine Chaillan Huntington
  • Article
    | Open Access

    Data sharing is recognized as a way to promote scientific collaboration and reproducibility, but some researchers are concerned about whether research based on shared data can achieve high impact. Here, the authors show that neuroimaging papers using shared data are no less likely to appear in top-ranked journals.

    • Michael P. Milham, R. Cameron Craddock & Arno Klein
  • Article
    | Open Access

    Here, Libertini and colleagues devise a computational approach for analyzing whole-genome bisulfite sequencing (WGBS) data that recovers 30% of the lost differential methylation position information. They apply its tools, COMETgazer and COMETvintage, to 13 different methylome datasets to demonstrate their performance.

    • Emanuele Libertini, Simon C. Heath & Stephan Beck