Glaucoma is an optic neuropathy that results in visual field defects, and its diagnosis requires assessment of multiple domains, including optic nerve assessment on fundus exam or optical coherence tomography imaging, visual field assessment, and intraocular pressure [1]. Despite the multidimensionality and complexity of glaucoma diagnosis, many researchers have tried to develop machine learning models that aid in the diagnosis or assessment of glaucoma. Machine learning models are expected to have an important impact on glaucoma patient assessment in the near future [2]. Such models have mostly depended on fundus photography of the optic disc, the most important domain in glaucoma diagnosis [3]. The fundus photographs used to train machine learning models are either derived from local hospital settings, where expert ophthalmologists involved in the project diagnose glaucoma patients and provide the ground truth for the included data, or, more commonly, obtained from openly available datasets.

Currently, there are multiple openly accessible fundus photography datasets commonly used by machine learning researchers. Table 1 provides the details of these datasets. They raise the following concerns, which prospective researchers need to consider:

  • The criteria for glaucoma diagnosis in most datasets are vaguely described and usually depend on the clinical decision of the treating physician. Studies using these datasets usually take the glaucoma diagnosis for granted.

  • Most datasets do not specify the severity of glaucoma, but usually include moderate to severe cases. Training models on moderate to advanced glaucoma will not add significant value, as these cases do not pose a diagnostic dilemma for ophthalmologists and are usually already on treatment [4].

  • Another issue is raised by “glaucoma mimickers” on fundus photography, where the optic disc may appear severely cupped without the presence of glaucoma. Such mimickers include large disc area, myopic features, or even physiological cupping [5]. A model that was not properly trained on such fundus photographs will yield false positive results when presented with them. Openly accessible datasets rarely include fundus photographs of such cases.

Table 1 Openly accessible fundus photography datasets used in machine learning models for glaucoma diagnosis.

Because glaucoma is a low-prevalence disease, a cost-effective screening model needs a sensitivity and specificity of more than 95% [6]. Training such a model will require a large dataset of mild to moderate glaucoma patients, along with fundus photographs of glaucoma mimickers, and such a dataset is not yet available for public use.
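
To illustrate why low prevalence pushes the required sensitivity and specificity so high, the following minimal sketch applies Bayes' rule to compute the positive predictive value (PPV) of a screening test. The 3% prevalence and the function name `positive_predictive_value` are assumptions made for illustration only; they are not taken from the cited studies.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive screening result."""
    true_pos = sensitivity * prevalence                    # diseased and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalence, chosen for illustration only (not from the source).
ASSUMED_PREVALENCE = 0.03

for sens, spec in [(0.80, 0.80), (0.90, 0.90), (0.95, 0.95), (0.99, 0.99)]:
    ppv = positive_predictive_value(sens, spec, ASSUMED_PREVALENCE)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} -> PPV={ppv:.2f}")
```

Under this assumed prevalence, a test with 80% sensitivity and specificity has a PPV of roughly 11%, and even at 95% the PPV is only about 37%, so most positive screens would still be false alarms requiring specialist review. The exact figures depend on the true prevalence in the screened population.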