-
Article
| Open Access
Transcriptomic architecture of nuclei in the marmoset CNS
Studies of cell heterogeneity in white matter in primates have been limited to date. Here the authors describe a marmoset brain cell atlas that bridges rodent and human data, revealing strong gray-white matter glial segregation.
- Jing-Ping Lin
- , Hannah M. Kelly
- & Daniel S. Reich
-
Article
| Open Access
Systematic evidence and gap map of research linking food security and nutrition to mental health
There is a broad range of research available on the relationship between food security and mental health. Here the authors carry out a systematic mapping of evidence on food security and nutrition related to mental health and identify trends in themes, setting, and study design over the 20-year period studied.
- Thalia M. Sparling
- , Megan Deeney
- & Suneetha Kadiyala
-
Article
| Open Access
Endothelial cell heterogeneity and microglia regulons revealed by a pig cell landscape at single-cell level
Pigs are important large animal models for biomedical research. Here, the authors construct a single-cell landscape of pig tissues, unravelling the phenotypic heterogeneity of blood endothelial cells in adipose tissues and the evolutionarily conserved regulons of microglia in the brain.
- Fei Wang
- , Peiwen Ding
- & Yonglun Luo
-
Article
| Open Access
ChIP-Hub provides an integrative platform for exploring plant regulome
A comprehensive data portal to explore plant regulomes is still unavailable. Here, the authors develop ChIP-Hub, a web-based platform that follows the ENCODE standards, and demonstrate its applications in identifying hierarchical regulatory networks, tissue-specific chromatin dynamics, putative enhancers and chromatin states.
- Liang-Yu Fu
- , Tao Zhu
- & Dijun Chen
-
Article
| Open Access
The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data
This paper describes the ‘4DN Data Portal’ that hosts data generated by the 4D Nucleome network, including Hi-C and other chromatin conformation capture assays, as well as various sequencing-based and imaging-based assays. Raw data have been uniformly processed to increase comparability and the portal is implemented with visualization tools to browse the data without download.
- Sarah B. Reiff
- , Andrew J. Schroeder
- & Peter J. Park
-
Article
| Open Access
Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes
Here the authors present KIDS, a knowledge graph integration and phenotypic prediction framework. When applied to antibiotic data, it identifies six novel E. coli antibiotic resistance genes, which the authors subsequently validate.
- Jason Youn
- , Navneet Rai
- & Ilias Tagkopoulos
-
Article
| Open Access
Helical structure motifs made searchable for functional peptide design
Here, we present TP-DB, a pattern-based search engine based on 1.67 million helices from the Protein Data Bank (PDB). We demonstrate the utility of TP-DB in identifying microbe-specific antigens, as well as in designing antimicrobial peptides and protein-protein interaction blockers.
- Cheng-Yu Tsai
- , Emmanuel Oluwatobi Salawu
- & Lee-Wei Yang
-
Article
| Open Access
Network medicine for disease module identification and drug repurposing with the NeDRex platform
There is an unmet need for adaptable tools allowing biomedical researchers to employ network-based drug repurposing approaches for their individual use cases. Here, the authors close this gap with NeDRex, an integrative and interactive platform.
- Sepideh Sadegh
- , James Skelton
- & Tim Kacprowski
-
Article
| Open Access
The molecular basis, genetic control and pleiotropic effects of local gene co-expression
Local gene co-expression is found throughout the genome, but systematic analysis of these co-expressed genes is needed. Here, the authors identify locally co-expressed genes in 49 tissues and characterize the genetic variants that may affect their expression and contribute to disease.
- Diogo M. Ribeiro
- , Simone Rubinacci
- & Olivier Delaneau
-
Article
| Open Access
Enhancing CRISPR-Cas9 gRNA efficiency prediction by data integration and deep learning
High-quality gRNA activity data is needed for accurate on-target efficiency predictions. Here the authors generate activity data for over 10,000 gRNAs and build a deep learning model, CRISPRon, for improved prediction performance.
- Xi Xiang
- , Giulia I. Corsi
- & Yonglun Luo
-
Article
| Open Access
Landscape of allele-specific transcription factor binding in the human genome
Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Here the authors present a meta-analysis, empowered by a new statistical method, that covers thousands of ChIP-Seq experiments and identifies more than 500 thousand allele-specific binding (ASB) events in the human genome.
- Sergey Abramov
- , Alexandr Boytsov
- & Ivan V. Kulakovskiy
-
Article
| Open Access
Sarcoma classification by DNA methylation profiling
Sarcomas are morphologically heterogeneous tumours, which makes their classification challenging. Here the authors develop a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.
- Christian Koelsche
- , Daniel Schrimpf
- & Andreas von Deimling
-
Perspective
| Open Access
Towards a unified open access dataset of molecular interactions
The IMEx consortium provides one of the largest resources of curated, experimentally verified molecular interaction data. Here, the authors review how IMEx evolved into a fundamental resource for life scientists and describe how IMEx data can support biomedical research.
- Pablo Porras
- , Elisabet Barrera
- & Sandra Orchard
-
Article
| Open Access
Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples
With the generation of large pan-cancer whole-exome and whole-genome sequencing projects, a question remains about how comparable these datasets are. Here, using The Cancer Genome Atlas samples analysed as part of the Pan-Cancer Analysis of Whole Genomes project, the authors explore the concordance of mutations called by whole exome sequencing and whole genome sequencing techniques.
- Matthew H. Bailey
- , William U. Meyerson
- & Christian von Mering
-
Article
| Open Access
Ion mobility collision cross-section atlas for known and unknown metabolite annotation in untargeted metabolomics
Collision cross section (CCS) information can aid the annotation of unknown metabolites. Here, the authors optimize the machine-learning-based prediction of metabolite CCS values and curate a 1.6 million compound CCS atlas, improving annotation accuracy and coverage for known and unknown metabolites.
- Zhiwei Zhou
- , Mingdu Luo
- & Zheng-Jiang Zhu
-
Article
| Open Access
Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets
Schulz et al. systematically benchmark performance scaling with increasingly sophisticated prediction algorithms and with increasing sample size in reference machine-learning and biomedical datasets. Complicated nonlinear intervariable relationships remain largely inaccessible for predicting key phenotypes from typical brain scans.
- Marc-Andre Schulz
- , B. T. Thomas Yeo
- & Danilo Bzdok
-
Article
| Open Access
Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST
Single-cell RNA-seq (scRNA-seq) is being widely used to resolve cellular heterogeneity. Here, the authors present a cell-querying method built on a neural network-based generative model and a customized cell-to-cell similarity metric.
- Zhi-Jie Cao
- , Lin Wei
- & Ge Gao
-
Article
| Open Access
Construction of a web-based nanomaterial database by big data curation and modeling friendly nanostructure annotations
The limited curation of existing nanomaterial databases restricts their application in modeling studies. Here the authors report a publicly available nanomaterial database that contains annotated nanostructures of diverse nanomaterials, immediately available for modeling studies.
- Xiliang Yan
- , Alexander Sedykh
- & Hao Zhu
-
Article
| Open Access
A comprehensive non-redundant gene catalog reveals extensive within-community intraspecies diversity in the human vagina
Reference databases are essential for studies on host-microbiota interactions. Here, the authors present the construction of VIRGO, a human vaginal non-redundant gene catalog, which represents a comprehensive resource for taxonomic and functional profiling of vaginal microbiomes from metagenomic and metatranscriptomic datasets.
- Bing Ma
- , Michael T. France
- & Jacques Ravel
-
Article
| Open Access
ProtCID: a data resource for structural information on protein interactions
The authors previously developed the Protein Common Interface Database (ProtCID), which compares and clusters the interfaces of pairs of full-length protein chains with defined Pfam domain architectures in different PDB entries to identify biological assemblies. Here the authors extend ProtCID to the clustering of domain-domain interactions, which also allows the analysis of domain interactions with peptides, nucleic acids, and ligands.
- Qifang Xu
- & Roland L. Dunbrack Jr.
-
Article
| Open Access
A machine-compiled database of genome-wide association studies
Most databases of genotype-phenotype associations are manually curated. Here, Kuleshov et al. describe a machine curation system that extracts such relationships from the GWAS literature and synthesizes them into a structured knowledge base called GWASkb that can complement manually curated databases.
- Volodymyr Kuleshov
- , Jialin Ding
- & Michael Snyder
-
Article
| Open Access
FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science
To use infectious disease next-generation sequencing as a diagnostic tool, appropriate reference datasets are required. Here, Sichtig et al. describe FDA-ARGOS, a database of high-quality microbial reference genomes, and demonstrate its utility with two use cases.
- Heike Sichtig
- , Timothy Minogue
- & Uwe Scherf
-
Review Article
| Open Access
Towards a standardized bioinformatics infrastructure for N- and O-glycomics
Glycomics is gaining momentum in basic, translational and clinical research. Here, the authors review current reporting standards and analysis tools for mass-spectrometry-based glycomics, and propose an e-infrastructure for standardized reporting and online deposition of glycomics data.
- Miguel A. Rojas-Macias
- , Julien Mariethoz
- & Niclas G. Karlsson
-
Perspective
| Open Access
Inferring causation from time series in Earth system sciences
Questions of causality are ubiquitous in Earth system sciences and beyond, yet correlation techniques still prevail. This Perspective provides an overview of causal inference methods, identifies promising applications and methodological challenges, and initiates a causality benchmark platform.
- Jakob Runge
- , Sebastian Bathiany
- & Jakob Zscheischler
-
Article
| Open Access
Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set
Genetic variants might exert their functional effects by influencing molecular interactions. Here, the authors present a resource featuring almost 28,000 annotations describing the effect of small sequence changes on physical protein interactions, curated by IMEx Consortium curators.
- J. Khadake
- , B. Meldal
- & P. Porras
-
Article
| Open Access
A reference haplotype panel for genome-wide imputation of short tandem repeats
Short tandem repeats (STRs), like single nucleotide polymorphisms (SNPs), contribute to complex traits, but their ascertainment by next-generation sequencing is costly. Here, Saini et al. provide a SNP+STR haplotype reference panel that allows imputation of STRs from SNP array data.
- Shubham Saini
- , Ileena Mitra
- & Melissa Gymrek
-
Article
| Open Access
Haplosaurus computes protein haplotypes for use in precision drug design
Proteoforms arise as protein isoforms or as protein haplotypes, which are the result of genetic variation. Here, the authors develop Haplosaurus, a database that computes protein haplotypes genome-wide from existing genotype data and analyse protein haplotype variability in the 1000 Genomes dataset.
- William Spooner
- , William McLaren
- & Catherine Chaillan Huntington
-
Article
| Open Access
Assessment of the impact of shared brain imaging data on the scientific literature
Data sharing is recognized as a way to promote scientific collaboration and reproducibility, but some are concerned over whether research based on shared data can achieve high impact. Here, the authors show that neuroimaging papers using shared data are no less likely to appear in top-ranked journals.
- Michael P. Milham
- , R. Cameron Craddock
- & Arno Klein
-
Article
| Open Access
Information recovery from low coverage whole-genome bisulfite sequencing
Here, Libertini and colleagues devise a computational tool that can analyze whole-genome bisulfite sequencing (WGBS) data to recover ∼30% of the lost differential methylation position information. They use COMETgazer and COMETvintage to analyze 13 different methylome datasets to demonstrate their performance.
- Emanuele Libertini
- , Simon C. Heath
- & Stephan Beck