Circular functional analysis of OCT data for precise identification of structural phenotypes in the eye

Progressive optic neuropathies such as glaucoma are major causes of blindness globally. Multiple sources of subjectivity and analytical challenges are often encountered by clinicians in the process of early diagnosis and clinical management of these diseases. In glaucoma, the structural damage is often characterized by neuroretinal rim (NRR) thinning of the optic nerve head, and other clinical parameters. Baseline structural heterogeneity in the eyes can play a key role in the progression of optic neuropathies, and present challenges to clinical decision-making. We generated a dataset of Optical Coherence Tomography (OCT) based high-resolution circular measurements on NRR phenotypes, along with other clinical covariates, of 3973 healthy eyes as part of an established clinical cohort of Asian Indian participants. We introduced CIFU, a new computational pipeline for CIrcular FUnctional data modeling and analysis. We demonstrated CIFU by unsupervised circular functional clustering of the OCT NRR data, followed by meta-clustering to characterize the clusters using clinical covariates, and presented a circular visualization of the results. Upon stratification by age, we identified a healthy NRR phenotype cluster in the age group 40–49 years with predictive potential for glaucoma. Our dataset also addresses the disparity of representation of this particular population in normative OCT databases.

Progressive optic neuropathies such as glaucoma can cause irreversible blindness, especially when left untreated or diagnosed late. Indeed, early detection and management hold the key to slowing the progressive loss of vision and preventing blindness due to many chronic and age-related degenerative eye diseases. Glaucoma, for instance, is the second-leading cause of blindness worldwide 1 . In 2020, an estimated 80 million individuals worldwide had glaucoma and this number is expected to increase to over 111 million by 2040 2 .
There are multiple sources of subjectivity and analytical challenges that are often encountered by the clinicians in the process of early diagnosis and clinical management of degenerative eye diseases. In glaucoma, the functional damage is established most commonly by the occurrence of visual field (VF) loss whereas the structural damage is often characterized by neuroretinal rim (NRR) thinning of the optic nerve head (ONH), and loss of retinal nerve fibers, which are the axons of retinal ganglion cells (RGC). Such thinning could be measured in terms of reduction in either NRR area or NRR thickness. On the functional side, while standard automated perimetry (SAP) has been the gold standard for detection of VF loss, often 30% of RGC loss may have already occurred before VF defects could be detected by SAP 3 .
On the structural side, biological heterogeneity of ONH phenotypes, with or without any neuropathy, can present challenges to clinical decision-making. For instance, the NRR area has been found to normally decline at the rate of 0.28%-0.39% per year 4 . There is no single, specific management guidance for patients with diverse morphology of ONH 5 . For instance, to assess the progression of glaucoma, one of the parameters assessed is the optic cup to optic disc ratio (CDR) which is calculated by comparing the diameter of the "cup" portion of the optic disc with the total diameter of the latter. Yet, while a large CDR may indicate glaucoma or other pathology, www.nature.com/scientificreports/ 6 months, and any retinal (including macular) or other neurologic diseases that could confound the structural measurements with SD-OCT. The performance of the OCT layer segmentation algorithms can be affected by poor image quality, leading to erroneous demarcation of the retinal layers and inaccurate measurements. We excluded any OCT scan image sample from our study with poor image quality, signal strength < 6, motion artifacts, blinking artifacts, misidentification of inner and outer retinal layers, and off-center artifacts. Several independent studies, including by some of the authors of the present study, have shown that signal strength reduction is associated with decreased accuracy of nerve fiber layer thickness measurement by OCT, which may be erroneously interpreted as presence of glaucomatous damage on a cross-sectional evaluation or when multiple scans are compared [20][21][22] . For the Cirrus SD-OCT platform used in the present study, we followed the manufacturer's definition of scans of adequate quality to be those of signal strength 6 or above (within a range from 0 to 10) 17 .
Healthy eyes were defined by the absence of anterior and posterior pathology. Each digital optic disc photograph was evaluated by three glaucoma specialists independently. The specialists were masked to the other clinical findings and the other imaging outcomes of the subjects. Eyes were excluded from the study in case of any disagreements among the specialists. All participants underwent a comprehensive ophthalmic examination which included detailed medical and systemic history. The means of clinical determination included bestcorrected visual acuity measurement, slit-lamp photographs (Topcon, Bauer Drive, Oakland, NJ), Goldmann applanation tonometry (Hagg-Streit AT 900, Hagg-Streit AG, Switzerland), gonioscopy with a Sussman four mirror gonioscope (Volk Optical Inc, Mentor, Ohio, USA), dilated fundus examination, central corneal thickness (CCT) assessment, Humphrey visual fields (HVF) with 24-2 Swedish Interactive threshold algorithm (Carl Zeiss Meditec Inc. Dublin, CA). Visual fields (VF) were considered if false positive, false negative, and fixation losses were less than 20%, and all the stereophotographs of the optic disc had good quality.
In addition, digital optic disc photography and SD-OCT imaging with Cirrus HD-OCT (software version 9.0.0.281; Carl Zeiss Meditec, Dublin, CA, USA) were used. This is a computerized instrument that acquires and analyzes cross-sectional and three-dimensional tomograms of the eye using SD-OCT technology. The instrument's algorithm automatically identifies the optic disc margin as the termination of Bruch's membrane (BM). BM opening (BMO) is used as the landmark to measure the amount of NRR tissue in the optic nerve. Optic Disc Cube 200 × 200 protocol was used to scan the ONH and peripapillary area through a 6 mm square grid, which consists of 200 horizontal linear B-scans and each composed of 200 A-scans. First, the Cirrus HD-OCT algorithm identifies the center of ONH and then automatically places a calculation circle of 3.46 mm diameter evenly around it. The circular scan starts at an extreme temporal point and moves around the ring in the superior direction, then nasal, then inferior, then back to temporal (TSNIT). The circular measurements are made clockwise for the right eye and counter-clockwise for the left eye. NRR thickness is measured by the amount of neuro-retinal tissue in the optic nerve around the entire edge of the optic disc. Zeiss Cirrus HD-OCT used Bruch's membrane opening-minimum rim width (BMO-MRW) to measure the rim area. The BMO-MRW is the shortest distance from BMO to the retinal internal limiting membrane. The advanced export functionality was used to record the NRR thickness values at 180 points in TSNIT order spaced evenly by 2 degrees (from 2° to 360°) around the circle. We refer to this as our NRR OCT high-resolution circular data. The data were stratified into 3 age groups: (1) 40-49 years, (2) 50-59 years, and (3) 60 years and older.

Methods.
We describe CIFU pipeline for circular functional modeling and clustering of OCT NRR data, followed by metaclustering-based clinical characterization of the clusters identified by CIFU. The steps of the CIFU pipeline are graphically illustrated in Fig. 1.
Circular functional modeling and clustering. First, we introduce a method for clustering OCT NRR data into K homogeneous groups of samples, i.e., eyes. As an unsupervised approach, the clustering is based only on NRR data as input, and not any clinical variables of the samples. While the actual OCT measurements are taken on a discrete grid of angles around the center of ONH, in principle, these finely indexed measurements are assumed to vary continuously around a circular scale ranging from 0 to 360 degrees, and wrapped around. Thus, for statistical modeling, we will refer to the OCT data as a collection of n circular curves X 1 (t), X 2 (t) . . . , X n (t), where X i (t) represents the NRR thickness in the i th sample ( i = 1, 2, . . . n ) measured at angle t ∈ [0, 360] and around a common center. The angular indices t l , where l = 1, . . . 180 , at which the curves are actually measured are aligned for all samples, and spaced 2 degrees apart. Specifically, the l-th measurement angle is t l = 2(l − 1).
Our modeling begins with the use of p basis functions for capturing the functional nature of data. If ψ 1 , ψ 2 ,..., ψ p are the basis functions with the associated basis expansion coefficients γ ij , where i = 1, 2, . . . n and j = 1, 2, . . . p , then the functional approximation for the i th curve at point t , X i (t) , is given by Different sources of artifacts could be present in OCT data 23 . As mentioned above, we exclude samples with such artifacts, yet OCT NRR curves with highly unusual shapes might appear rarely. While precise characterization of an outlier curve may be difficult 24 , we used an intuitive criterion for detecting outlier NRR curves using Eq. (2) below, in which the factor 3.5 is based on our exploratory analysis. A given curve X i is considered an outlier if there exists a point t l for which X i (t l ) exceeds the following threshold (in terms of mean and sd defined below) www.nature.com/scientificreports/ After the outlier curves were removed, to allow for comparison of the curves based on their shapes rather than the magnitude, we normalize each curve X i (t) by dividing it by 360 0 X i (t)dt , where the integral is approximated numerically. Note, the normalization step is optional in the CIFU pipeline.
As the NRR thickness values are measured radially around a circle (Fig. 2), they are nonnegative and have the natural periodicity of 360 degrees, so that the value corresponding to 5 degrees is the same as that at 365 degrees. After performing the normalization described above, they also contain a unit area on [ 0, 360), and thus possess the properties of a probability density around the circle. Such "circular densities" can be expressed in terms of an infinite series of Fourier coefficients 12 and then approximated by the first p terms, for a sufficiently large p , of the expression in Eq. (1). The idea is similar to approximating a continuous function by a polynomial of high enough degree, except that the Fourier basis also retains the periodicity that is critical to interpretation of the normalized OCT NRR curves.
Our objective is to cluster each of the curves described above into a pre-specified number ( K ) of clusters. Then, we used a discriminative functional mixture (DFM) model given by Bouveyron et al. 23 in which γ i = γ i1 , . . . , γ ip t of curve X i follows a finite mixture model of K Gaussian components with density function where π k ≥ 0 is the mixing proportion of the k th component (i.e., the cluster k ) such that K k=1 π k = 1 , and ∅ is the standard Gaussian density function. Here, U is a p × d orthogonal matrix mapping the basis coefficients γ into the discriminative subspace (of dimension d < p ) through a linear transformation. Similarly, µ k and k are the mean vector and covariance matrix (for cluster k ) of γ mapped into the discriminative subspace, and the noise of the above transformation is normally distributed with mean zero and covariance .
The DFM model was fit with an Expectation-Maximization (EM) algorithm implemented in the R package funFEM 24 . EM is a popular iterative method used for optimal fitting of a model and estimation of the model parameters 25 . Following the idea of constraining variance parameters in Fraley and Raftery 26 , the funFEM package allows 12 different choices of DFM models. As an initial step, using different restrictions on the noise covariance matrix , a preliminary model search was run with NRR data, with outlier curves included, over all 12 models and a flexible set of (61) basis functions. Based on the results of the initial run, and after removing the outlier curves as described below, the clustering algorithm was run using a smaller selective set of (21) basis functions.
(2) www.nature.com/scientificreports/ To avoid model overfitting, we determined the smallest number of basis functions ( p ) that recover the input curves sufficiently well, as determined by the fraction of variation explained ( FVE ) as described below. Let the sample mean and standard deviation of n given curves be respectively Then the total variation (TV) is given by and the fraction of variation explained (FVE) by Figure 2. For 3 real samples, selected from each age group (1 to 3 from left to right), the NRR thickness data measured for 180 evenly-spaced points around the circle is shown in the top panel. The corresponding circular functional approximations are shown as pre-and post-normalized NRR curves in the middle and the bottom panels respectively. The direction of the NRR curves is given by TSNIT (clockwise). www.nature.com/scientificreports/ where the integrals are again approximated numerically using the discrete observations X i (t l ). We used FVE as the criterion for selecting an optimal number of basis functions ( p ) by identifying the smallest value of p for which FVE exceeds 0.99, i.e., 99% ( Supplementary Fig. S1).
Finally, the number of clusters identified by the DFM models for each age group was determined by 3 popular model selection criteria: AIC (Akaike Information Criterion) 27 , BIC (Bayesian Information Criterion) 28 , and ICL (Integrated Complete Likelihood) 29 .
For intuitive visualization of the clustering results, we plotted the curves of every cluster in a distinct color using a circular scale. The mean curve of each cluster, as computed by (4), is included as a bold black curve, which serves as a cluster-specific template.
The R modules of CIFU for fitting the basis functions to OCT data, clustering and visualizing them as circular curves are available from the authors upon request. For each age-group, the de-identified and normalized OCT NRR data for each eye, along with the CIFU-assigned cluster Id, are given in the Supplementary Table S1. For each CIFU-identified cluster, its size, the estimated basis coefficients of its mean circular curve, and its total variation given by the trace of the sample covariance matrix are given in the Supplementary Table S2.
Comparative clustering analysis. We performed comparative analysis of our circular functional clustering method against three popular non-functional data clustering methods, namely, k-means 30 , Partitioning Around Medoids (PAM 30 ) which is similar to kmeans but uses higher dimensional medoids in place of means, and Gaussian mixture model (as implemented in Mclust 26 ). Each of these methods were run with the same OCT NRR data points, i.e., not as NRR curves, of each age group. Each method was run with a possible choice of fitting K = 2 through 10 clusters for each age group. Thus, we ran the 3 alternative clustering methods for 3 age groups for 9 possible values of K . The optimal number of clusters identified by each method was determined with the Average Silhouette Width (ASW), while Dunn Index was calculated as a measure of inter-cluster variation. We used the R packages 'optCluster' 31 and 'factoextra' 32 for clustering, validation and visualization.

Metaclustering and clinical characterization of clusters.
In the metaclustering step, the clusters identified by circular functional data were grouped based on their samples' similarity in terms of a selected set of clinical variables that are known covariates of glaucoma. A feature selection step was performed simultaneously to detect the covariates that were the most distinctive across the metaclusters. The metaclustering workflow consists of the following steps: 1. In each age group, we performed agglomerative hierarchical clustering of the clusters given by their mean covariate data with complete linkage, while simultaneously doing feature selection to select a sparse set of covariates that are the most distinctive across the metaclusters. 2. We plotted the metaclusters (identified in Step 1) with age group-specific dendrograms. A flat cut of the dendrograms at a common height threshold was used to distinguish the metaclusters in each age group. The metaclusters that correspond across the age groups are shown as subtrees of matched colors. 3. We visualized using contour plots the corresponding metaclusters of each age group to compare the distributions of the selected covariates across the metaclusters as well as the age groups.
In step 1, a set of 9 covariates were used based on their clinical relevance. The R package 'sparcl' 33 was used for agglomerative and sparse hierarchical metaclustering in step 2; the feature selection in this package is done by varying the values of its 'wbound' parameter from 2 to 5.

Results
The CIFU pipeline was run with OCT NRR thickness data and clinical assessment data of a normal cohort consisting of 3973 healthy eyes. The steps of the pipeline began with stratification of the OCT and clinical data by age into 3 age groups with (1) 1841, (2) 1351, and (3) 781 samples respectively. The list of clinical variables is summarized in Table 1. An identical sequence of steps of analysis was followed by CIFU within each age group.
The 180-point data for each sample (eye) were modeled using p = 11 Fourier basis functions. We chose p = 11 since it was the smallest value of p for which FVE , as given in Eq. (6), exceeded 99% ( Supplementary Fig. S1). The curves were normalized and aligned to a common starting angle of 0 degrees to allow for comparison of their shapes around the center of ONH. Using Eq. (2), the outlier OCT curves were removed: 6 samples from age group 1, 1 from age group 2, and 5 from age group 3. Then, within each age group, the curves were clustered by a discriminative functional mixture (DFM) model as described in Eq. (3). The optimal number of clusters ( K ) for each age group was determined by 3 different well-known criteria: AIC, BIC, and ICL (described in "Methods"). These criteria showed overall strong agreement attesting to optimal model selection as seen in Supplementary  Fig. S2. Based on the value of K beyond which no significant gain was noted in these criteria, we determined the number of clusters, for age group 1, 2, and 3 as 7, 8, and 6 respectively. The statistics of each cluster are given in Supplementary Table S2.
The results of our circular functional clustering are shown in Fig. 3a-c as a panel of K clusters for each age group. Each cluster C within a panel consists of the circular curves for the samples that belong to C (all shown in a common color specific to C ) based on the similarity of their functional representation. To gain an intuitive understanding of the 180-point OCT data on NRR phenotypes, we used a visualization of curves as represented on a common circular scale. Unsupervised clustering of the circular functions revealed various NRR patterns in the identified clusters, some of which were distinctive whereas others have subtle differences. Notably, the www.nature.com/scientificreports/ visualization reveals the unique mean shape (or NRR "template") of each cluster as shown by a bold black circular curve in each plot of Fig. 3. The circular curve visualization allows several interesting observations. We note the consistent dip at the temporal (T) region near 0 degrees, which is a characteristic feature shared by the templates of all clusters. This is supported by the well-known ISNT rule 34 according to which, from the center of ONH, the rim is the thinnest at the temporal (T) region. Interestingly, we also observed various shapes and features in the cluster templates (such as distinctive protrusions, notches, tilts, etc.) that appear as well as vary continuously in different (non-T) regions around the circle. The clustering solution allows us to record the intra-cluster variation which could be used to quantitatively compare the dynamics (say, the rates of focal change) of corresponding clusters across age groups. In this regard, we note that the popular methods of traditional clustering of the same OCT NRR data failed to capture the distinctive shapes and other spatial features of the NRR curves. These methods yielded only 2 clusters of NRR data points (not curves) each in every age group (Supplementary Fig. S3). Further, the traditional clusters ( Supplementary Fig. S4) had low inter-cluster structural variation as measured by their small values of Dunn Index for age group (1)  In order to establish a correspondence among the clusters in different age groups as well as to characterize the samples that belong to each cluster, we conducted a metaclustering analysis. In this step, we clustered the clusters based on a set of 9 clinical variables ( Table 2) of the samples in each cluster. These are known covariates of glaucoma, and no NRR data from the previous clustering step was used. The results of sparse hierarchical metaclustering are shown in Fig. 4. The dendrograms reveal the similarities among the clusters in terms of their mean sample covariates as well as the counts of metaclusters identified at different levels of each dendrogram. Based on flat cuts of all the dendrograms (at a common height threshold of 0.1), we identified 3 metaclusters { M 1 1 , M 1 2 , M 1 3 } for age group 1; 2 metaclusters { M 2 1 , M 2 2 } for age group 2; and 2 metaclusters { M 3 1 , M 3 2 } for age group 3. Notably, all the dendrograms show the metacluster M · 2 (pink subtree) to be more heterogeneous in every age group than the metacluster M · 1 (blue subtree). Among the youngest participants, i.e., in age group 1, the metacluster M 1 3 (consisting of the original cluster 4) is distinct from the metacluster M 1 2 . A feature selection step, performed along with metaclustering, identified the optic disc cup volume and the average cup-to-disk ratio (CDR) of an eye as the most significant features in terms of the contributions of the different covariates to the metaclustering. These distinctive covariates allow us to register the correspondence of the metaclusters across the different age groups in Fig. 5, which shows the contour plots of the metaclusters in their matched colors. The 3 metaclusters M · 1 shown in blue have the smallest mean values of cup volume and CDR, the 3 metaclusters M · 2 shown in pink have comparatively higher mean values of these covariates, and the single unmatched metacluster M 1 3 shown in red (Fig. 5c) has the highest mean values of all metaclusters ( Table 2). It is interesting to consider the unmatched metacluster M 1 3 which not only has the highest mean values of the covariates (cup volume and average CDR) but is, in fact, comprised of a single, distinct cluster based on the OCT NRR phenotype data (cluster 4 in Fig. 3a). Here we note that notwithstanding a large value of CDR (especially > 0.5 ), cupping by itself is not indicative of glaucoma. In fact, it is known that deep but stable cupping can occur www.nature.com/scientificreports/ due to hereditary reasons without glaucoma (see "Discussion"). Rather, it is a change in these ONH parameters with age of the participants that is a clinical indicator of glaucoma. Since the samples included in the present study contain only healthy eyes as determined clinically by agreement of multiple glaucoma specialists, the presence of this unmatched cluster only in the youngest age group serves as a signature of healthy NRR phenotype with predictive potential for glaucoma. That is, the corresponding metacluster with such high values of these covariates among the older age groups would have the likelihood of progressing to glaucoma, and thus, is unlikely to be represented in a cohort that consists of healthy eyes only, as we have in the present study.

Discussion
Unsupervised learning of the heterogeneity of normative ONH phenotypes in a given population can provide a more comprehensive understanding of the diversity of baselines that exist for degenerative neuropathies. Such knowledge is particularly useful in glaucomas for which different ONH parameters play a combined role in early detection. For example, in a non-glaucoma multiethnic cohort of Asian individuals, the inter-eye RNFL profile was found by OCT to be less symmetric in Malays and Indians than that in Chinese eyes 35 . Not only are the structural characteristics of individual eyes known to vary racially, even their rates of change over time could be different across population groups. For instance, the rate of change of BMO-MRW was recorded as -1.82 μm/ year and -2.20 μm/year in glaucoma suspect eyes of European and African descents respectively 36 . In another multi-centered normal population study, both age-related decline and between-subject variability in BMO-MRW were observed 37 . Indeed, even the manufacturers of OCT technology noted racial differences in optic disc area, CDR, cup volume, and RNFL thickness when measured using their platform 38 . The presence of phenotypic heterogeneity makes it less justified to apply common, universal thresholds for clinical determination of glaucomatous damage in different population groups using OCT measurements, particularly in the early stages of the disease when the baselines could have stronger initial effects. To account for the effects of normal variation in ONH parameters, large and racially representative normative databases of healthy eye OCT phenotypes should be created. However, often such collections of healthy samples tend to be small or moderately sized, e.g., the normative database of Cirrus HD-OCT platform included just 284 subjects 38 . In that cohort, Caucasians represented 43%, Chinese 24%, African Americans 18%, Hispanics 12%, and others 6%. The representation of Asian Indians, in contrast, was about 1% of the Cirrus HD-OCT cohort, which does not adequately reflect the 2020 projections about India to become the second in global glaucoma numbers, surpassing Europe 39 . Thus, more than 16 million Indians could be affected by glaucomas, and nearly 1.2 million could be blinded from the disease. Some resources such as the HRT3 Normative Database, while including 104 Indian individuals, did not improve the diagnostic sensitivity or specificity for glaucoma in that group for the potential reasons of limited sample size and intra-racial variation of ocular topography 40 .
In this study, we leveraged the large population-based LVPEI-GLEAMS study to generate new and a relatively large OCT dataset based on nearly 4000 samples from normal Asian Indian participants. In fact, given that the recruitment of all the study participants was from a single geographic region (namely, the state of Andhra Pradesh), the scope of intra-racial variation to affect this dataset is limited. Moreover, the relatively large sample size of the data allowed us to identify a variety of clusters of NRR phenotypes, including the signature (cluster 4) with predictive potential for glaucoma consisting of 6.6% of all samples in the youngest age group (40-49 years). The absence of its corresponding cluster in the older healthy age groups despite their considerable sample sizes (total of 2132 samples of age 50 years and above) leads to a reasonable supposition of its potential pathological progression with increase in age, thereby resulting in lack of subsequent representation in a healthy cohort such as in the present study. Such phenotypic decline is consistent with the findings from a prospective longitudinal  41 . Importantly, independent support for the identified signature relies on its clinical characterization in terms of covariates such as cup volume and CDR, which are useful parameters for diagnosis of glaucoma suspects 42 . Despite its normal mean value of intraocular pressure (IOP) as is expected of healthy eyes, the signature cluster has the highest mean values of average CDR and cup volume of all metaclusters across all 3 age groups ( Table 2). In a recently published longitudinal study that started from baseline values and was run over a 5-year period, the covariates which had statistically significant increase in glaucomatous progression included CDR and cup volume 43 . We understand that it would perhaps be ideal to follow-up healthy individuals and measure the changes in their clinical covariates as they age in order to classify the ONH phenotypes via supervised learning. The approach of CIFU, in comparison, involves unsupervised learning of different high-resolution phenotypes in agestratified data and their characterization using key covariates, which is far less time-consuming and yet has the potential to produce a clinically insightful database for diverse populations with high phenotypic heterogeneity.
In addition to its sample size and racial representation, perhaps the most noteworthy feature of the present OCT dataset is its unique high-resolution measurements of NRR thickness around a circle. These 180 samples, collected at every 2 degrees, extend the typical use of such measurements recorded at either 4 quadrants or 12 clock-hours, and even 48 angular positions 44 , to higher-dimensional analysis. As clustering with curves show, the focal variations could be more nuanced than that suggested by a general rule, e.g., ISNT, and a capability to "zoom" into finer angular divisions can reveal further patterns 45 . In low-resolution data, it may be difficult to detect focal changes within the confines of pre-determined inflexible sectors. Moreover, the templates of the clusters could also be compared using known tests in shape analysis 46 . Be it circular data from OCT, or www.nature.com/scientificreports/ optic phenotypes in general, they seem suitable as candidate applications of circular statistics, and yet, we are unaware of any major previous studies in this regard. Further, the high-resolution also allowed the data to be closely approximated by continuous curves, and thus, specified by the corresponding functional representation. While clustering of circular point data 12,47,48 as well as clustering of curves [49][50][51] and (non-circular) functional data 13,14,23,[52][53][54] have been addressed by past studies, the present clustering of curves in the form of circular functions is possibly a novel application. We understand that the present study has certain limitations. As we noted above, a prospective cohort study would be better suited to validate the predictive glaucomatous potential of the identified NRR phenotypic signature. We plan to address this in our future work. While high-resolution data could be accessed from the OCT scans, it is not commonly done by the clinical protocols. We hope that by adding user-friendly interfaces to CIFU, we may be able to promote such data acquisition and analysis, especially since the functional representation is independent of the number of observations per sample. Notably, there are several distinct advantages of our www.nature.com/scientificreports/ approach which could be built upon further in future studies. The estimated parameters of the fitted functional mixture model could be used to test the similarity of ONH phenotypes in normal versus disease conditions, thus allowing us to characterize any changes with precision and rigor. After a database of phenotypic parameters is developed, known measures of shapes and distances between curves could be used for objective clinical classification of new OCT samples. Applied to longitudinal analyses, our high-resolution modeling could identify intermediate, or previously uncharacterized, stages of disease progression, especially by focusing on variations within fine angular sections. Focused analysis of angular sections of OCT NRR and RNFL data have revealed interesting differences between healthy and glaucoma subjects, and we plan to apply CIFU for mining locally distinctive features in higher resolution 55 . Indeed, straightforward extensions are feasible for similar circular data such as RNFL phenotypes and other optic neuropathies as well as related eye imaging platforms, e.g., OCT-Angiography (OCTA). As we have demonstrated for other biomedical platforms 51,56-59 , the new pipeline CIFU could be enhanced incrementally with different functionalities, say, to increase computational efficiency or capture the perspective of the clinical experts. The circular curve visualization introduced in the present study may lead to a more user-friendly tool for clinical purposes as we plan to make it interactive, with advanced capabilities to jointly handle data and metadata, in our future work.