-
Article | Open Access
Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes
Predicting the effects of mutations in a species is a major challenge in genetics. Here, the authors investigate protein sequence landscapes using diverged E. coli sequences, to predict tolerated mutations and capture interactions between mutations.
- Lucile Vigué, Giancarlo Croce & Martin Weigt
-
Article | Open Access
Mapping the glycosyltransferase fold landscape using interpretable deep learning
Glycosyltransferases (GT) are proteins that display extensive sequence and functional variation on a subset of 3D folds. Here, the authors use interpretable deep learning to predict 3D folds from sequence without the need for sequence alignment, which also enables the prediction of GTs with new folds.
- Rahil Taujale, Zhongliang Zhou & Natarajan Kannan
-
Article | Open Access
flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions
The authors present flDPnn, a computational tool for disorder and disorder function predictions from protein sequences. flDPnn was assessed on data from the “Critical Assessment of Protein Intrinsic Disorder Prediction” experiment and on an independent, low-similarity test dataset; both assessments show that flDPnn offers accurate predictions of disorder, fully disordered proteins, and four common disorder functions.
- Gang Hu, Akila Katuwawala & Lukasz Kurgan
-
Article | Open Access
Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences
Our understanding of the residue-level details of protein interactions remains incomplete. Here, the authors show sequence coevolution can be used to infer interacting proteins with residue-level details, including predicting 467 interactions de novo in the Escherichia coli cell envelope proteome.
- Anna G. Green, Hadeer Elhabashy & Debora S. Marks
-
Article | Open Access
Molecular determinants underlying functional innovations of TBP and their impact on transcription initiation
The TATA-box binding protein (TBP) is required for transcription initiation in archaea and eukaryotes. Here, the authors delineate how TBP has evolved new functional features through context-dependent interactions with various protein partners.
- Charles N. J. Ravarani, Tilman Flock & Santhanam Balaji
-
Article | Open Access
DIP/Dpr interactions and the evolutionary design of specificity in protein families
Dpr (Defective proboscis extension response) and DIP (Dpr Interacting Proteins) are immunoglobulin-like cell-cell adhesion proteins that form highly specific pairwise interactions, which control synaptic connectivity during Drosophila development. Here, the authors combine a computational approach with binding affinity measurements and find that DIP/Dpr binding specificity is controlled by negative constraints that interfere with non-cognate binding.
- Alina P. Sergeeva, Phinikoula S. Katsamba & Barry Honig
-
Article | Open Access
Clustering huge protein sequence sets in linear time
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
- Martin Steinegger & Johannes Söding
-
Article | Open Access
In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design
Antimicrobial peptides are considered promising alternatives to antibiotics. Here, the authors develop a computational algorithm that starts with peptides naturally occurring in plants and optimizes this starting material to yield new variants that are highly distinct from the parent peptide.
- William F. Porto, Luz Irazazabal & Octavio L. Franco