The prowess of protein language models (PLMs) has been demonstrated across a variety of tasks, such as protein structure prediction, function analysis and engineering, and novel protein design. Transformers, a deep learning architecture that excels at learning relationships in sequence data, have commonly been employed as the backbone of PLMs, first pretrained on huge datasets of protein sequences to become versed in the ‘language’ of the protein universe and then adapted for multiple downstream tasks. However, their remarkable performance comes at the cost of a high computational burden, which limits the length of the protein sequences they can process. Curious whether transformers were the only architecture that would work for protein language models, Kevin Yang and colleagues at Microsoft Research New England explored the potential of another architecture for building PLMs.
The team experimented with convolutional neural networks (CNNs), which were developed earlier than transformers in deep learning research and have also been widely applied to biological data analysis. One of CNNs’ most appealing features is that their cost scales linearly with sequence length, compared with the quadratic scaling of transformers. Yang and colleagues built a series of CNN-based protein language models called CARP (convolutional autoencoding representations of proteins) using the same pretraining strategy and dataset as the popular existing transformer-based PLM ESM. When the team compared these models on both the pretraining task and a number of downstream tasks (for example, prediction of protein structure, mutation effect, fitness, fluorescence and stability), they found, to their surprise, that the overall performance of CARP was on par with, and in some cases even better than, that of ESM. Furthermore, “We were surprised that, for both architectures, downstream performance did not necessarily improve for bigger models with better pretrain performance,” says Yang.
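The scaling difference can be made concrete with a back-of-the-envelope count of per-layer operations. The sketch below is purely illustrative and not taken from the paper: the embedding width `d` and kernel size `k` are hypothetical values, and the formulas count only the dominant multiply–accumulate terms (the L × L attention matrix for a transformer layer, and a width-`k` convolution for a CNN layer).

```python
# Illustrative per-layer cost model: self-attention grows quadratically
# with sequence length L, a 1D convolution grows linearly.

def attention_ops(L: int, d: int) -> int:
    # Dominant terms: the Q·K^T score matrix and the attention-weighted
    # sum over values, each roughly an L x L x d matrix product.
    return 2 * L * L * d

def conv_ops(L: int, k: int, d: int) -> int:
    # A 1D convolution with kernel width k mapping d channels to d channels
    # does about k * d * d multiply-accumulates at each of L positions.
    return L * k * d * d

if __name__ == "__main__":
    d, k = 1280, 9  # hypothetical embedding width and kernel size
    for L in (512, 1024, 2048, 4096):
        print(f"L={L:5d}  attention={attention_ops(L, d):.3e}  "
              f"conv={conv_ops(L, k, d):.3e}")
```

Doubling the sequence length quadruples the attention cost but only doubles the convolution cost, which is why long proteins are far cheaper for a CNN-based model to ingest.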