To the Editor: Here we report an update to the VDJdb database (https://vdjdb.cdr3.net), prepared between 2019 and 2022, a period marked by the emergence of SARS-CoV-2, the causative agent of COVID-19.
In 2016, we started a community effort to gather and curate publicly available sequence data for T cell receptors (TCRs) with defined antigen specificities, as well as datasets communicated by our colleagues, by developing the VDJdb database. VDJdb has since been extended with a web interface that allows batch querying of adaptive immune receptor repertoire sequencing (AIRR-seq) datasets and the identification of TCR sequence motifs linked with specific epitopes (ref. 1).
In the current pandemic era, a large share of recent T cell repertoire profiling and antigen-specificity studies have focused on TCRs that recognize SARS-CoV-2 (refs. 2–4). As a consequence, millions of TCR sequences have now been obtained from donors with COVID-19. To complement these efforts, we incorporated TCR specificity data from multiple COVID-19 studies into the latest release of VDJdb, collecting data from an international network of laboratories that assay antigen-specific T cell responses in COVID-19 (Fig. 1a). Together, these data comprise more than 3,000 TCR α and β chain sequences recognizing 46 SARS-CoV-2 epitopes. Our analysis of these data revealed a set of reproducible TCR motifs that could find utility in large-scale clinical and experimental studies of COVID-19, and showed that TCR specificity data are consistent and reproducible across laboratories. The inferred TCR motifs should facilitate the tracking of SARS-CoV-2-specific T cells and the discovery of immune signatures associated with protection against COVID-19.
T cell antigen specificity is encoded by somatically rearranged TCRs. Current techniques allow comprehensive profiling of TCR repertoires by high-throughput sequencing, which is compatible with various methods for elucidating the antigen specificity of T cell populations (ref. 5).
The first set of TCR repertoires with known specificity for SARS-CoV-2 epitopes was obtained from the Efimov laboratory (ref. 4). This work prioritized the HLA-A*02-restricted YLQ and RLQ epitopes and produced 573 VDJdb records (unpaired TCR α and β chains). These sequences were subsequently detected in other studies and served as templates for the first SARS-CoV-2-specific TCR–peptide–MHC crystal structures (ref. 6). Several studies from other laboratories followed in 2021. One dataset reported multiple TCR sequences specific for SARS-CoV-2 epitopes restricted by HLA-A*24 (ref. 7), a prominent HLA class I allotype among indigenous Asian populations. A report from the Kedzierska laboratory complemented these data with TCR sequences specific for SARS-CoV-2 epitopes restricted by HLA-A*02, HLA-A*24 and HLA-B*07 (ref. 3). A large set of paired TCRαβ sequences specific for a range of SARS-CoV-2 epitopes was obtained from the Thomas laboratory (ref. 8). Smaller datasets were also imported from other published works and private communications (all listed in the Issues section of the VDJdb GitHub repository), including one notable study that reported TCR sequences specific for SARS-CoV-2 epitopes restricted by HLA class II allotypes (ref. 9). In total, the current VDJdb release contains 3,187 unique TCR specificity records spanning 46 distinct SARS-CoV-2 epitopes (Fig. 1b and Supplementary Table 1).
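The release counts above refer to unique records after merging submissions from several laboratories. The following is a minimal sketch of such merging, assuming that a record is identified by chain, CDR3, V and J segments, epitope and restricting HLA allele (a simplification, not the actual VDJdb record structure). Duplicate reports are collapsed while the sources that reported each record are kept. The `Record` type, its fields and the toy sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """Hypothetical, simplified specificity record (not the VDJdb schema)."""
    gene: str     # "TRA" or "TRB"
    cdr3: str     # CDR3 amino acid sequence
    v: str        # V segment
    j: str        # J segment
    epitope: str  # epitope label, e.g. "YLQ"
    mhc: str      # restricting HLA allele
    source: str   # submitting laboratory or study


Key = Tuple[str, str, str, str, str, str]


def unique_records(records: List[Record]) -> Dict[Key, Set[str]]:
    """Merge records that agree on every field except the source.

    Returns a mapping from each unique (gene, cdr3, v, j, epitope, mhc) key
    to the set of sources that reported it.
    """
    merged: Dict[Key, Set[str]] = defaultdict(set)
    for r in records:
        merged[(r.gene, r.cdr3, r.v, r.j, r.epitope, r.mhc)].add(r.source)
    return dict(merged)


def records_per_epitope(unique: Dict[Key, Set[str]]) -> Dict[str, int]:
    """Count unique records per epitope."""
    counts: Dict[str, int] = defaultdict(int)
    for _gene, _cdr3, _v, _j, epitope, _mhc in unique:
        counts[epitope] += 1
    return dict(counts)


# Toy input: made-up CDR3s and segment names, not real VDJdb entries.
toy = [
    Record("TRB", "CASSAAF", "TRBV1", "TRBJ1", "YLQ", "HLA-A*02", "lab1"),
    Record("TRB", "CASSAAF", "TRBV1", "TRBJ1", "YLQ", "HLA-A*02", "lab2"),
    Record("TRA", "CAVAAF", "TRAV1", "TRAJ1", "YLQ", "HLA-A*02", "lab1"),
    Record("TRB", "CASSWWQ", "TRBV2", "TRBJ2", "RLQ", "HLA-A*02", "lab2"),
]
unique = unique_records(toy)
print(len(unique), records_per_epitope(unique))  # prints: 3 {'YLQ': 2, 'RLQ': 1}
```

Keeping the set of sources per record, rather than discarding duplicates, preserves the information that a sequence was reported independently by more than one laboratory.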
Independent reproducibility is an important test of consistency for any biological dataset, and TCR repertoire sequencing in particular is prone to methodological and operator-dependent biases. To explore potential biases in the SARS-CoV-2-related VDJdb dataset, we performed a comparative analysis of TCR α and β chain specificity records for the most widely studied epitope, YLQ-HLA-A*02. These records did not cluster preferentially by laboratory (Fig. 1c, top), and the overall structure of the TCR similarity map was preserved, suggesting that the different laboratories sampled from the same space of epitope-specific TCR sequences without apparent bias.
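The cross-laboratory comparison above is based on a TCR similarity map (Fig. 1c). As a simpler, hypothetical numerical check of the same idea, one can ask how often a record's nearest neighbour by CDR3 edit distance comes from the same laboratory, and compare this with the rate expected if laboratory labels were unrelated to sequence. This sketch is not the procedure used to produce Fig. 1c; the function names, the edit-distance metric and the toy sequences are assumptions chosen for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two CDR3s."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def same_lab_neighbor_rate(records: List[Tuple[str, str]], k: int = 1) -> Tuple[float, float]:
    """Compare observed vs. expected same-laboratory nearest-neighbour rates.

    records: (cdr3, lab) pairs for one chain and one epitope.
    Returns (observed, expected): observed is the fraction of each record's
    k nearest neighbours (by edit distance) that share its lab; expected is
    that fraction if lab labels were unrelated to sequence similarity.
    """
    n = len(records)
    if n < 2 or not 1 <= k <= n - 1:
        raise ValueError("need at least two records and 1 <= k <= n - 1")
    observed_hits = 0
    expected_hits = 0.0
    for i, (cdr3_i, lab_i) in enumerate(records):
        others = [(levenshtein(cdr3_i, c), lab)
                  for j, (c, lab) in enumerate(records) if j != i]
        # Stable sort: distance ties are broken by input order.
        others.sort(key=lambda pair: pair[0])
        observed_hits += sum(lab == lab_i for _, lab in others[:k])
        same_lab = sum(lab == lab_i for _, lab in others)
        expected_hits += k * same_lab / (n - 1)
    return observed_hits / (n * k), expected_hits / (n * k)


# Toy data: each made-up motif is reported by both laboratories.
mixed = [("CASSAAF", "labA"), ("CASSAAY", "labB"),
         ("CAWWTTQ", "labA"), ("CAWWTTK", "labB")]
# Toy data: each laboratory reports its own motif.
segregated = [("CASSAAF", "labA"), ("CASSAAY", "labA"),
              ("CAWWTTQ", "labB"), ("CAWWTTK", "labB")]
print(same_lab_neighbor_rate(mixed))       # observed 0.0, expected ~0.33
print(same_lab_neighbor_rate(segregated))  # observed 1.0, expected ~0.33
```

An observed rate close to the expected rate is consistent with the absence of laboratory-specific clustering, whereas an observed rate well above it would point to laboratory-dependent sampling.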
At the same time, the independently generated data validated a set of TCR complementarity-determining region 3 (CDR3) sequences that clustered into clearly defined motifs across laboratories (Fig. 1c). Of note, the most frequently obtained CDR3 sequences were used successfully in crystallographic studies to determine ternary TCR–peptide–HLA structures (ref. 6), providing new insights into the molecular mechanisms that underpin TCR recognition of the YLQ epitope presented by HLA-A*02.
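CDR3 motifs of the kind shown in Fig. 1c are often summarized as per-position residue frequencies across the members of a cluster. A minimal sketch, assuming the cluster has already been defined and that its CDR3s have equal length (real motif inference, for example in the vdjdb-motifs repository, may use different methods), computes those frequencies and a consensus string. Function names, the `min_freq` threshold and the toy cluster are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(cdr3s: List[str]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies for a cluster of equal-length CDR3s."""
    if not cdr3s:
        raise ValueError("need at least one sequence")
    length = len(cdr3s[0])
    if any(len(s) != length for s in cdr3s):
        raise ValueError("this sketch only handles equal-length CDR3s")
    n = len(cdr3s)
    return [{aa: count / n for aa, count in Counter(s[i] for s in cdr3s).items()}
            for i in range(length)]


def consensus(freqs: List[Dict[str, float]], min_freq: float = 0.5) -> str:
    """Most frequent residue per position; 'x' where it falls below min_freq."""
    out = []
    for pos in freqs:
        # Sorting first means ties go to the alphabetically first residue.
        aa, f = max(sorted(pos.items()), key=lambda kv: kv[1])
        out.append(aa if f >= min_freq else "x")
    return "".join(out)


# Toy cluster of made-up CDR3s.
cluster = ["CASSAAF", "CASSAAY", "CASSGAF", "CASSAAF"]
freqs = position_frequencies(cluster)
print(consensus(freqs))                # CASSAAF
print(consensus(freqs, min_freq=0.8))  # CASSxAx
```

Raising `min_freq` masks variable positions, which highlights the conserved residues that define a motif.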
Imprints of common infections can be detected in TCR repertoire sequencing datasets (ref. 10), and such imprints can in turn be used to predict immune responses and stratify patients with COVID-19 (ref. 5). VDJdb has been used successfully for similar purposes in the past and currently serves as a benchmark for testing TCR specificity prediction algorithms (ref. 2). In this work, we found no evidence of inter-laboratory bias in the COVID-19 TCR specificity compendium for the most widely studied epitope, YLQ-HLA-A*02, supporting its use as a reference for TCR repertoire annotation. These precedents suggest that VDJdb could be used to build classifiers trained to identify biologically relevant T cell responses in patients with COVID-19. Overall, we anticipate that the present release will enhance the versatility of VDJdb in the pandemic era, supporting the development of more effective vaccines and helping to address future challenges associated with viral evolution and the emergence of new pathogens beyond SARS-CoV-2.
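Using VDJdb as a reference for repertoire annotation can be illustrated with a minimal sketch that matches query clonotypes against a reference of epitope-specific CDR3s. The matching rule here (same chain, equal CDR3 length, at most one amino acid substitution), the function names and the toy sequences are assumptions for illustration; they are not the batch-annotation settings of the VDJdb web interface or the VDJdb file schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def within_mismatches(a: str, b: str, max_mismatches: int) -> bool:
    """True if a and b have equal length and differ at <= max_mismatches positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def annotate(query: Iterable[Tuple[str, str]],
             reference: Iterable[Tuple[str, str, str]],
             max_mismatches: int = 1) -> Dict[Tuple[str, str], Set[str]]:
    """Assign candidate epitopes to query clonotypes.

    query: (chain, cdr3) pairs; reference: (chain, cdr3, epitope) triples.
    Returns {(chain, cdr3): epitopes} for query clonotypes with at least one hit.
    """
    # Index the reference by chain and CDR3 length; only these can match.
    index: Dict[Tuple[str, int], List[Tuple[str, str]]] = defaultdict(list)
    for chain, cdr3, epitope in reference:
        index[(chain, len(cdr3))].append((cdr3, epitope))
    hits: Dict[Tuple[str, str], Set[str]] = {}
    for chain, cdr3 in query:
        candidates = index.get((chain, len(cdr3)), [])
        found = {ep for ref, ep in candidates
                 if within_mismatches(cdr3, ref, max_mismatches)}
        if found:
            hits[(chain, cdr3)] = found
    return hits


# Toy reference and repertoire (made-up sequences, not VDJdb entries).
reference = [("TRB", "CASSAAF", "YLQ"), ("TRB", "CASSWWQ", "RLQ"),
             ("TRA", "CAVAAF", "YLQ")]
repertoire = [("TRB", "CASSAAY"), ("TRB", "CASSWWQ"),
              ("TRB", "CGGGGGG"), ("TRA", "CASSAAF")]
print(annotate(repertoire, reference))
```

Here only the two TRB query clonotypes receive annotations. Exact matching (`max_mismatches=0`) is more conservative, while allowing one substitution trades specificity for sensitivity; the appropriate threshold is a choice to tune for each dataset.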
Data availability
All code and data are available at https://github.com/antigenomics/vdjdb-db, https://github.com/antigenomics/vdjdb-motifs and https://github.com/antigenomics/vdjdb-web, released under open-source Apache 2.0 and CC BY-ND 4.0 licenses.
References
Dolton, G. et al. Front. Immunol. 9, 1378 (2018).
Nguyen, T. H. O. et al. Immunity 54, 1066–1082.e5 (2021).
Shomuradova, A. S. et al. Immunity 53, 1245–1257.e5 (2020).
Shoukat, M. S. et al. Cell Rep. Med. 2, 100192 (2021).
Bagaev, D. V. et al. Nucleic Acids Res. 48, D1057–D1062 (2020).
Chaurasia, P. et al. J. Biol. Chem. 297, 101065 (2021).
Rowntree, L. C. et al. Immunol. Cell Biol. https://doi.org/10.1111/imcb.12482 (2021).
Minervina, A. A. et al. Nat. Immunol. 23, 781–790 (2022).
Verhagen, J. et al. Clin. Exp. Immunol. 205, 363–378 (2021).
Pogorelyy, M. V. et al. Genome Med. 10, 68 (2018).
Acknowledgements
This work was supported by a grant from the Ministry of Science and Higher Education of the Russian Federation (075-15-2019-1789). Additional funds were provided by the National Health and Medical Research Council (NHMRC; Australia) via a Leadership Investigator Grant (no. 1173871 to K.K.), the Research Grants Council of the Hong Kong Special Administrative Region, China (no. T11-712/19-N to K.K.) and the Medical Research Future Fund (Australia; no. 2005544 to K.K.). T.H.O.N. was supported by an NHMRC Emerging Leadership Level 1 Investigator Grant (no. 1194036). E.B.C. was supported by an NHMRC Peter Doherty Fellowship (no. 1091516). D.A.P. was supported by a Wellcome Trust Senior Investigator Award (UK; 100326/Z/12/Z). G.A.E. was supported by a Russian Science Foundation grant (no. 20-15-00395).
Author information
Contributions
M.G., M.S., D.S. and I.Z. proofread and incorporated sequencing data into the database and performed statistical analysis. D. Bagaev and D. Bolotin implemented, hosted and supported the web interface for the database. P.G.T., A.A.M., M.V.P., K.L., J.E.M., D.A.P., T.H.O.N., L.C.R., E.B.C., K.K., G.D., C.R.R., A.S., J.S., F.L., K.V.Z., A.A.K., S.A.S. and G.A.E. gathered, formatted and submitted sequencing data to the database. M.S., I.Z. and D.C. designed and curated the study. M.S., D.C., D.A.P., P.G.T., K.K., F.L., G.A.E. and A.S. wrote and edited the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no conflicts of interest.
Peer review
Nature Methods thanks Sam Darko, Baojun Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Supplementary information
Supplementary Table 1, Supplementary Methods
About this article
Cite this article
Goncharov, M., Bagaev, D., Shcherbinin, D. et al. VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. Nat Methods 19, 1017–1019 (2022). https://doi.org/10.1038/s41592-022-01578-0
DOI: https://doi.org/10.1038/s41592-022-01578-0