The development of artificial intelligence algorithms typically demands abundant high-quality data. In medicine, the datasets that are required to train the algorithms are often collected for a single task, such as image-level classification. Here, we report a workflow for the segmentation of anatomical structures and the annotation of pathological features in slit-lamp images, and the use of the workflow to improve the performance of a deep-learning algorithm for diagnosing ophthalmic disorders. We used the workflow to generate 1,772 general classification labels, 13,404 segmented anatomical structures and 8,329 pathological features from 1,772 slit-lamp images. The algorithm that was trained with the image-level classification labels and the anatomical and pathological labels showed better diagnostic performance than the algorithm that was trained with only the image-level classification labels. It also performed similarly to three ophthalmologists across four clinically relevant retrospective scenarios, and correctly diagnosed most of the consensus outcomes of 615 clinical reports in prospective datasets for the same four scenarios. The dense anatomical annotation of medical images may improve their use for automated classification and detection tasks.
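To give a sense of how dense the annotation is, the counts reported above can be turned into average labels per image. The following is a minimal sketch that uses only the figures stated in the abstract; the names `NUM_IMAGES`, `LABEL_COUNTS` and `labels_per_image` are illustrative assumptions and do not come from the study.

```python
# Counts as reported in the abstract; variable names are illustrative only.
NUM_IMAGES = 1772
LABEL_COUNTS = {
    "image_level_classification": 1772,
    "anatomical_structures": 13404,
    "pathological_features": 8329,
}


def labels_per_image(counts, num_images):
    """Return the average number of labels of each type per image."""
    return {name: count / num_images for name, count in counts.items()}


density = labels_per_image(LABEL_COUNTS, NUM_IMAGES)
total_labels = sum(LABEL_COUNTS.values())

for name, value in density.items():
    print(f"{name}: {value:.2f} per image")
print(f"all labels: {total_labels / NUM_IMAGES:.2f} per image")
```

On these figures, each image carries on average about 7.6 segmented anatomical structures and about 4.7 pathological features, in addition to its single image-level classification label.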
The main data supporting the results in this study are available within the paper and its Supplementary Information. The slit-lamp images and patient data used in this study cannot be shared publicly, yet with certain restrictions they may be available from the corresponding authors for research purposes.
We thank the following clinicians and clinical technicians for participating in data annotation: K. Chen, X. Zhao, D. Wu, T. Yu, X. Li, Y. An, Q. Wu, R. Li, X. Huang, Y. Li, C. Huang, J. Feng, J. Ye, X. Zhang, B. Lin, H. Ma, Y. Chen, X. Cui, H. Bai, T. Feng, X. Liu, J. Lu, Y. Zhou, H. Zhong, Q. Wang, Z. Wang and C. Huang. This study was funded by the National Key R&D Program of China (no. 2018YFC0116500), the Guangdong Science and Technology Innovation Leading Talents (no. 2017TX04R031), the Science and Technology Planning Projects of Guangdong Province (no. 2019B030316012), the Science Foundation of China for Distinguished Young Scholars (no. 81822010), the National Natural Science Foundation of China (nos. 81770967, 81873675 and 81800810), the Science and Technology Planning Projects of Guangdong Province (no. 2018B010109008) and the Fundamental Research Funds of the Innovation and Development Project for Outstanding Graduate Students in Sun Yat-sen University (no. 19ykyjs36). These sponsors and funding organizations had no role in the design or performance of this study.
The authors declare no competing interests.
Supplementary methods, figures and tables.
Output of the DSV, answers of the ophthalmologists and reference standards for the retrospective external-validation dataset.
Output of the DSV and reference standards for the prospective dataset.
Demonstration of the Visionome website.
About this article
Cite this article
Li, W., Yang, Y., Zhang, K. et al. Dense anatomical annotation of slit-lamp images improves the performance of deep learning for the diagnosis of ophthalmic disorders. Nat Biomed Eng 4, 767–777 (2020). https://doi.org/10.1038/s41551-020-0577-y