  • Matters Arising

The pitfalls of negative data bias for the T-cell epitope specificity challenge

Matters Arising to this article was published on 05 October 2023

The Original Article was published on 06 March 2023

Fig. 1: Schematic overview of the two approaches commonly used for generating negative TCR–epitope data.
Fig. 2: ROC curves of PanPep tested on shuffled negative data.

Data availability

The data used to obtain the results is available on GitHub at https://github.com/PigeonMark/PanPep-Shuffled-Negatives and on Zenodo at https://doi.org/10.5281/zenodo.7798691.

Code availability

All scripts used to obtain the results are available on GitHub at https://github.com/PigeonMark/PanPep-Shuffled-Negatives and on Zenodo at https://doi.org/10.5281/zenodo.7798691.

Author information

Contributions

C.D. performed the study. C.D. and P.M. wrote the manuscript. W.B., K.L. and P.M. conceived and supervised the study. W.B., P.M. and K.L. revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pieter Meysman.

Ethics declarations

Competing interests

K.L. and P.M. hold shares in ImmuneWatch, an immunoinformatics company.

Peer review

Peer review information

Nature Machine Intelligence thanks Geir Kjetil Sandve for their contribution to the peer review of this work. Primary Handling Editor: Dr Liesbeth Venema, in collaboration with the Nature Machine Intelligence Editorial Team.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Methods and data.

About this article

Cite this article

Dens, C., Laukens, K., Bittremieux, W. et al. The pitfalls of negative data bias for the T-cell epitope specificity challenge. Nat Mach Intell 5, 1060–1062 (2023). https://doi.org/10.1038/s42256-023-00727-0

  • DOI: https://doi.org/10.1038/s42256-023-00727-0
