Computational biology and bioinformatics

Article
05 May 2021 | Open Access

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

Protein structure prediction is a challenge. A new deep learning framework, CopulaNet, is a major step forward toward end-to-end prediction of inter-residue distances and protein tertiary structures with improved accuracy and efficiency.

Fusong Ju, Jianwei Zhu & Dongbo Bu

Article
05 May 2021 | Open Access

Deep generative model embedding of single-cell RNA-Seq profiles on hyperspheres and hyperbolic spaces

Single-cell RNA-seq allows the study of tissues at cellular resolution. Here, the authors demonstrate how deep learning can be used to gain biological insight from such data by accounting for biological and technical variability. Data exploration is improved by accurately visualizing cells on an interactive 3D surface.

Jiarui Ding & Aviv Regev

Article
03 May 2021 | Open Access

Learning a genome-wide score of human–mouse conservation at the functional genomics level

Understanding conserved functional genomic properties between human and mouse provides important context for mouse model studies. Here, the authors present a genome-wide conservation score integrating epigenomic, transcription factor binding, and transcriptomic data from mouse and human genomes.

Soo Bin Kwon & Jason Ernst

Article
29 April 2021 | Open Access

Integrative reconstruction of cancer genome karyotypes using InfoGenomeR

Karyotyping of cancer genomes at the base level is technically challenging. Here, the authors introduce InfoGenomeR, an algorithm that can infer cancer genome karyotypes from whole-genome sequencing data, test their model on breast, ovarian and brain cancer samples, and identify private and shared mutations between primary and metastatic cancer samples.

Yeonghun Lee & Hyunju Lee

Article
28 April 2021 | Open Access

Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C

Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan

Article
26 April 2021 | Open Access

Genomic insights into the conservation status of the world’s last remaining Sumatran rhinoceros populations

Highly endangered species like the Sumatran rhinoceros are at risk from inbreeding. Five historical and 16 modern genomes from across the species range show mutational load, but little evidence for local adaptation, suggesting that future inbreeding depression could be mitigated by assisted gene flow among populations.

Johanna von Seth, Nicolas Dussex & Love Dalén

Article
23 April 2021 | Open Access

Protein design and variant prediction using autoregressive generative models

The ability to design functional sequences is central to protein engineering and biotherapeutics. Here the authors introduce a deep generative alignment-free model for sequence design applied to highly variable regions, and design and test a diverse nanobody library with improved properties for selection experiments.

Jung-Eun Shin, Adam J. Riesselman & Debora S. Marks

Article
23 April 2021 | Open Access

The VRNetzer platform enables interactive network analysis in Virtual Reality

Data-rich networks can be difficult to interpret beyond a certain size. Here, the authors introduce a platform that uses virtual reality to allow the visual exploration of large networks, while interfacing with data repositories and other analytical methods to improve the interpretation of big data.

Sebastian Pirch, Felix Müller & Jörg Menche

Article
23 April 2021 | Open Access

CRISPR-Cas9 cytidine and adenosine base editing of splice-sites mediates highly-efficient disruption of proteins in primary and immortalized cells

Base editors can inactivate splice sites or introduce stop codons into a gene sequence. Here the authors present SpliceR to design, rank, and test sgRNAs for efficient gene disruption in T cells.

Mitchell G. Kluesner, Walker S. Lahr & Branden S. Moriarity

Article
22 April 2021 | Open Access

Leveraging community mortality indicators to infer COVID-19 mortality and transmission dynamics in Damascus, Syria

Reported COVID-19 mortality rates have been relatively low in Syria, but there has been concern about overwhelmed health systems. Here, the authors use community mortality indicators and estimate that <3% of COVID-19 deaths in Damascus were reported as of 2 September 2020.

Oliver J. Watson, Mervat Alhaffar & Patrick Walker

Article
22 April 2021 | Open Access

Machine learning guided aptamer refinement and discovery

Current aptamer discovery approaches are unable to probe the complete space of possible sequences. Here, the authors use machine learning to facilitate the development of DNA aptamers with improved binding affinities, and truncate them without significantly compromising binding affinity.

Ali Bashir, Qin Yang & B. Scott Ferguson

Article
20 April 2021 | Open Access

An integrative analysis of the age-associated multi-omic landscape across cancers

Our understanding of the age-related molecular alterations in cancer is still limited. Here, the authors perform a pan-cancer analysis of age-associated genomic, transcriptomic, and epigenetic alterations, linking age-related gene expression changes to age-related DNA methylation alterations.

Kasit Chatsirisupachai, Tom Lesluyes & João Pedro de Magalhães

Article
20 April 2021 | Open Access

Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis

Few genome-wide association studies have explored the genetic architecture of age-of-onset for traits and diseases. Here, the authors develop a Bayesian approach to improve prediction in timing-related phenotypes and perform age-of-onset analyses across complex traits in the UK Biobank.

Sven E. Ojavee, Athanasios Kousathanas & Matthew R. Robinson

Article
19 April 2021 | Open Access

Spatially interacting phosphorylation sites and mutations in cancer

Dysregulated phosphorylation is well known in cancers, but it has largely been studied in isolation from mutations. Here the authors introduce HotPho, a tool that can discover spatial interactions between phosphosites and mutations, which are associated with activating mutations and genetic dependencies in cancer.

Kuan-lin Huang, Adam D. Scott & Li Ding

Article
16 April 2021 | Open Access

Conserved long-range base pairings are associated with pre-mRNA processing of human genes

Functional RNA secondary structure is important for pre-mRNA processing, including splicing, cleavage and polyadenylation, and RNA editing. Here the authors present a catalog of conserved long-range RNA structures in the human transcriptome by defining pairs of conserved complementary regions (PCCRs) in pre-aligned evolutionarily conserved regions.

Svetlana Kalmykova, Marina Kalinina & Dmitri Pervouchine

Article
16 April 2021 | Open Access

Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning

It is challenging to extract structural information from EM density maps at intermediate or low resolutions. Here, the authors present Emap2sec+, a program for detecting nucleotides and protein secondary structures in EM density maps at 5 to 10 Å resolution.

Xiao Wang, Eman Alnabati & Daisuke Kihara

Article
16 April 2021 | Open Access

multiSLIDE is a web server for exploring connected elements of biological pathways in multi-omics data

The integration and interpretation of different omics data types is an ongoing challenge for biologists. Here, the authors present a web-based, interactive tool called multiSLIDE for the visualization of protein, phosphoprotein, and RNA data presented as interlinked heatmaps.

Soumita Ghosh, Abhik Datta & Hyungwon Choi

Article
16 April 2021 | Open Access

Design of multi-scale protein complexes by hierarchical building block fusion

De novo design of self-assembling protein nanostructures and materials is of significant interest; however, design of complex, multi-component assemblies is challenging. Here, the authors present a stepwise hierarchical approach to build such assemblies using helical repeat and helical bundle proteins as building blocks, and provide an in-depth structural characterization of the resulting assemblies.

Yang Hsia, Rubul Mout & David Baker

Article
16 April 2021 | Open Access

An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer

Multiple molecular profiling methods are required to study urothelial non-muscle-invasive bladder cancer (NMIBC) due to its heterogeneity. Here the authors integrate multi-omics data of 834 NMIBC patients, identifying a molecular subgroup associated with multiple alterations and worse outcomes.

Sia Viborg Lindskrog, Frederik Prip & Lars Dyrskjøt

Article
15 April 2021 | Open Access

Single cell regulatory landscape of the mouse kidney highlights cellular differentiation programs and disease targets

Epigenetic and transcriptional dynamics are critical for both tissue homeostasis and injury response in the kidney. Leveraging a single cell multiomics atlas of the developing mouse kidney, the authors reveal key events in chromatin regulation and gene expression dynamics during postnatal development.

Zhen Miao, Michael S. Balzer & Katalin Susztak

Article
15 April 2021 | Open Access

Democratising deep learning for microscopy with ZeroCostDL4Mic

Deep learning methods show great promise for the analysis of microscopy images, but there is currently an accessibility barrier for many users. Here the authors report a convenient entry-level deep learning platform that can be used at no cost: ZeroCostDL4Mic.

Lucas von Chamier, Romain F. Laine & Ricardo Henriques

Article
14 April 2021 | Open Access

Defining super-enhancer landscape in triple-negative breast cancer by multiomic profiling

Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype with poor prognostic outcomes. Here the authors characterize super-enhancer heterogeneity and identify genes that are specifically regulated by TNBC-specific super-enhancers, including FOXC1, MET and ANLN.

Hao Huang, Jianyang Hu & Y. Rebecca Chin

Article
14 April 2021 | Open Access

Comprehensive omic characterization of breast cancer in Mexican-Hispanic women

Cancers in different populations have been shown to be genetically distinct. Here, the authors sequence breast cancers from Mexican-Hispanic patients and find that these patients have a higher percentage of Akt1 mutations compared to Caucasian and Asian populations, suggesting these are clinically actionable.

Sandra L. Romero-Cordoba, Ivan Salido-Guadarrama & Alfredo Hidalgo-Miranda

Article
14 April 2021 | Open Access

Redundant and non-redundant cytokine-activated enhancers control Csn1s2b expression in the lactating mouse mammary gland

Enhancers and promoters work together to actively regulate gene expression affecting several biological processes. Here, the authors provide molecular insights into the regulation of enhancers and super-enhancers in the Csn1s2b locus during lactation.

Hye Kyung Lee, Michaela Willi & Lothar Hennighausen

Article
13 April 2021 | Open Access

Moss enables high sensitivity single-nucleotide variant calling from multiple bulk DNA tumor samples

The study of tumour heterogeneity can be improved by sequencing multiple samples, but currently available variant callers have not been tailored to integrate them. Here the authors present Moss, a tool that can leverage multiple samples to improve somatic variant calling in different cancers.

Chuanyi Zhang, Mohammed El-Kebir & Idoia Ochoa

Article
13 April 2021 | Open Access

Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney

Single cell transcriptomic and epigenomic sequencing of human kidney highlights diverse cell types and states. These findings help characterize a novel population of injured proximal tubule cells and illustrate the power of multi-omic approaches to characterizing human tissue.

Yoshiharu Muto, Parker C. Wilson & Benjamin D. Humphreys