Genetic variants can lead to aberrant splicing whereby exons are skipped or incorrectly included in a transcript, which may cause disease. This is depicted by the odd presence of a red segment (‘wrong exon’) in a rope. AbSplice is a model that predicts the effect of DNA variation on tissue-specific RNA splicing.
As computational analysis becomes ever more ubiquitous for researchers, the deposition of the underlying code is now an expected part of publication. Shortcomings in code sharing can lead to delays in peer review and publication, as well as reproducibility issues that authors can easily avoid with preparation.
Tumors develop mechanisms to escape immune destruction. A systematic analysis of large genome sequencing datasets shows that one in four tumors develops genetic immune escape and that its prevalence is remarkably similar between primary and metastatic tumors, suggesting that immune escape is an early event during tumor evolution.
A new method infers huge gene trees and tests the tree branches for phenotypic associations. This improves power to map the effects of rare variants that are missing from genotype arrays and imputation panels.
Aberrant RNA splicing events resulting from DNA variations are common causes of genetic disorders. Two studies published in Nature Genetics independently describe methods to decipher DNA-variant-associated aberrant splicing using high-throughput RNA sequencing data.
Reconstructing phylogenetic trees from large collections of genome sequences is a computationally challenging task. We developed MAPLE, a method for performing phylogenetic inference on large numbers of closely related genomes, which might be useful when studying the evolution and spread of SARS-CoV-2 and of infectious pathogens in future pandemics.
We developed a machine learning model to quantify cardiac fibrosis (which is associated with cardiovascular disease) using cardiac MRI data from 41,505 UK Biobank participants. In the subsequent large-scale GWAS of cardiac fibrosis, we identified 11 independent genomic loci, 9 of which were implicated in in vitro cardiac fibroblast activation.
Liability scores for chronic obstructive pulmonary disease obtained from our deep learning model improve genetic association discovery and risk prediction. We trained our model using full spirograms and noisy medical record labels obtained from self-reporting and hospital diagnostic codes, and demonstrated that the machine-learning-based phenotyping approach can be generalized to diseases that lack expert-defined annotations.
We introduce scEC&T-seq, a new single-cell sequencing method that enables parallel profiling of extrachromosomal circular DNA and mRNAs in single cells. Using scEC&T-seq, we characterized all types of circular DNA elements in single human cancer cells and profiled the intercellular heterogeneity and structural dynamics of cancer-specific extrachromosomal DNA.
Single-cell RNA-sequencing analysis combined with host genetic data for a Japanese population reveals the dysfunction of innate immune cells, particularly non-classical monocytes, in individuals with severe COVID-19, as well as enrichment of host genetic risk factors for severe COVID-19 in monocytes and dendritic cells.
Genome assembly of nine wild species and two domesticated accessions of tomato generated a super-pangenome for the tomato clade. Comparative analyses revealed the landscape of structural variations in wild and cultivated tomatoes and led to the discovery of a wild tomato gene that has the potential for yield increase in modern breeding.
‘MAximum Parsimonious Likelihood Estimation’ (MAPLE) is a maximum-likelihood-based approach for inference of phylogenetic trees from very large datasets of similar sequences. It incorporates a sparse alignment representation and parsimony-based approximations, offering higher accuracy and reduced computational requirements.
Peripheral blood mononuclear cells from 73 Japanese patients with coronavirus disease 2019 (COVID-19) and 75 healthy controls were analyzed using single-cell transcriptomics. Combining these data with genotyping data highlights the interplay between host genetics and the immune response in modulating disease severity.
ARG-Needle is a method to infer genome-wide genealogies from large-scale genotyping data that can be used in association analyses. Applied to UK Biobank data, genealogy-based testing finds more trait associations than using imputed genotypes.
Genome-wide association analyses identify 11 loci associated with native myocardial T1 time, a marker of interstitial fibrosis, providing insights into the pathways involved in myocardial fibrosis and myofibroblast cell state acquisition.
A deep convolutional neural network calculates liability scores for chronic obstructive pulmonary disease (COPD) from raw spirogram traces and noisy medical-record-based labels in the UK Biobank. Genome-wide analyses using these scores replicate known loci for lung function and identify 67 new disease loci.
Genome-wide association analyses across individuals of East Asian and European ancestries identify new risk loci for inflammatory bowel diseases. A polygenic risk score derived from the combined datasets shows improved prediction accuracy.
Genomic and transcriptomic analysis of 393 non-small cell lung cancer patients treated with checkpoint inhibitors identifies molecular features associated with response.
A pan-cancer analysis of primary and metastatic tumors highlights the diversity of genetic immune escape mechanisms established during tumor evolution. The authors also present LILAC, a tool to characterize the HLA-I locus from whole-genome sequencing data.
High-resolution Micro-C is applied to characterize the effect of RNA polymerase II (RNAPII) loss on chromosome looping, finding that the formation of enhancer–promoter, but not promoter–promoter, loops depends on RNAPII binding to their anchors.
SOX9 titration in neural crest cells identifies regulatory elements and genes with sensitive or buffered responses. Sensitive genes are enriched for craniofacial disorder genes phenocopying SOX9, suggesting differential sensitivity contributes to phenotypic specificity.
A tomato super-pangenome constructed using chromosome-scale genomes of nine wild species and two cultivated accessions highlights genomic diversity and structural variation across wild and cultivated tomatoes.
AbSplice predicts aberrant splicing for 50 human tissues by integrating sequence-based deep learning models, DNA variation and RNA-seq obtained from accessible tissues.
Concatenating Original Duplex for Error Correction (CODEC) is a method that concatenates both strands of each DNA duplex to enable highly sensitive mutation detection in a range of analytes with fewer reads and lower error rates than current methods.
scEC&T-seq profiles extrachromosomal circular DNA and full-length mRNA from single human cancer cells, and may be used to interrogate heterogeneity in both cell lines and primary tumor samples.