Article | Open Access
PLMSearch: Protein language model powers accurate and fast sequence search for remote homology
Homologous protein search is one of the most commonly used methods for protein analysis. Here, the authors propose PLMSearch, a search method that takes only sequences as input and can search millions of protein pairs in seconds while maintaining sensitivity comparable to state-of-the-art structure search methods.
- Wei Liu, Ziye Wang & Shanfeng Zhu
Article | Open Access
Defining the condensate landscape of fusion oncoproteins
Many fusion oncoproteins (FOs) form condensates; some form in the nucleus and regulate gene expression, while others form in the cytoplasm and promote cell signaling. In this work, the authors report the analysis of physicochemical features to enable prediction of FO condensation behavior.
- Swarnendu Tripathi, Hazheen K. Shirnekhi & Richard W. Kriwacki
Article | Open Access
Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.
- Umberto Lupo, Damiano Sgarbossa & Anne-Florence Bitbol
Article | Open Access
Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes
Predicting the effects of mutations in a species is a major challenge in genetics. Here, the authors investigate protein sequence landscapes using diverged E. coli sequences, to predict tolerated mutations and capture interactions between mutations.
- Lucile Vigué, Giancarlo Croce & Martin Weigt
Article | Open Access
Mapping the glycosyltransferase fold landscape using interpretable deep learning
Glycosyltransferases (GT) are proteins that display extensive sequence and functional variation on a subset of 3D folds. Here, the authors use interpretable deep learning to predict 3D folds from sequence without the need for sequence alignment, which also enables the prediction of GTs with new folds.
- Rahil Taujale, Zhongliang Zhou & Natarajan Kannan
Article | Open Access
flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions
The authors present flDPnn, a computational tool for disorder and disorder function predictions from protein sequences. flDPnn was assessed on data from the “Critical Assessment of Protein Intrinsic Disorder Prediction” experiment and on an independent, low-similarity test dataset; both assessments show that flDPnn offers accurate predictions of disorder, fully disordered proteins, and four common disorder functions.
- Gang Hu, Akila Katuwawala & Lukasz Kurgan
Article | Open Access
Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences
Our understanding of the residue-level details of protein interactions remains incomplete. Here, the authors show sequence coevolution can be used to infer interacting proteins with residue-level details, including predicting 467 interactions de novo in the Escherichia coli cell envelope proteome.
- Anna G. Green, Hadeer Elhabashy & Debora S. Marks
Article | Open Access
Molecular determinants underlying functional innovations of TBP and their impact on transcription initiation
The TATA-box binding protein (TBP) is required for transcription initiation in archaea and eukaryotes. Here, the authors delineate how TBP has evolved new functional features through context-dependent interactions with various protein partners.
- Charles N. J. Ravarani, Tilman Flock & Santhanam Balaji
Article | Open Access
DIP/Dpr interactions and the evolutionary design of specificity in protein families
Dpr (Defective proboscis extension response) and DIP (Dpr Interacting Proteins) are immunoglobulin-like cell-cell adhesion proteins that form highly specific pairwise interactions, which control synaptic connectivity during Drosophila development. Here, the authors combine a computational approach with binding affinity measurements and find that DIP/Dpr binding specificity is controlled by negative constraints that interfere with non-cognate binding.
- Alina P. Sergeeva, Phinikoula S. Katsamba & Barry Honig
Article | Open Access
Clustering huge protein sequence sets in linear time
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
- Martin Steinegger & Johannes Söding
Article | Open Access
In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design
Antimicrobial peptides are considered promising alternatives to antibiotics. Here the authors developed a computational algorithm that starts with peptides naturally occurring in plants and optimizes this starting material to yield new variants which are highly distinct from the parent peptide.
- William F. Porto, Luz Irazazabal & Octavio L. Franco