Automatic choroidal segmentation in OCT images using supervised deep learning methods

The analysis of the choroid in the eye is crucial for our understanding of a range of ocular diseases and physiological processes. Optical coherence tomography (OCT) imaging provides the ability to capture highly detailed cross-sectional images of the choroid, yet only a very limited number of commercial OCT instruments provide methods for automatic segmentation of choroidal tissue. Manual annotation of the choroidal boundaries is often performed, but it is impractical given the time required to analyse large volumes of images. Therefore, there is a pressing need for reliable and accurate methods to automatically segment choroidal tissue boundaries in OCT images. In this work, a variety of patch-based and fully-convolutional deep learning methods are proposed to accurately determine the location of the choroidal boundaries of interest. The effects of network architecture, patch size and contrast enhancement were tested to better understand the optimal architecture and approach to maximize performance. The results are compared with manual boundary segmentation used as a ground truth, as well as with a standard image analysis technique. Results of total retinal layer segmentation are also presented for comparison purposes. The findings presented here demonstrate the benefit of deep learning methods for the segmentation of chorio-retinal boundaries in OCT images.

www.nature.com/scientificreports www.nature.com/scientificreports/
probability map for each boundary used to construct a graph. Finally, the graph search, originally proposed by Chiu et al. 33 , outputs the predicted boundary location. Hamwood et al. 30 examined the effect of changing the patch size and network architecture, and improved performance as a result. Replacing the CNN with an RNN, Kugelman et al. 31 showed that a similar RNN-based approach (RNN-GS) performs competitively with a CNN-based one.
Similar to retinal segmentation, early methods of choroidal segmentation relied on standard image processing methods [34][35][36][37][38][39][40][41] . However, in contrast to OCT retinal layer segmentation, previous work utilising machine learning methods for choroidal segmentation has been limited. Sui et al. 42 proposed a multi-scale CNN to learn the edge weights in a graph-based approach. Here, the CNN was composed of coarse-scale, mid-scale and fine-scale networks, each learning a different set of features within the images. The output edge costs from the network were used within a graph search to delineate two choroidal boundaries (Bruch's membrane (BM) and the choroid-scleral interface (CSI)). In a similar approach, Chen et al. 43 used a fully-convolutional encoder-decoder architecture based on SegNet 44 to output edge probability maps for BM and the CSI. Seam carving was then used to delineate each boundary by finding a path of connected pixels across the width of the image. Al-Bander et al. 45 combined superpixel clustering, image enhancement and deep learning to segment the choroid. Here, superpixel-centred patches were classified by a CNN as either choroid or non-choroid, and the contours defining the edges of the choroid were then resolved from these classifications. Devalla et al. 24 presented their Dilated-Residual U-Net (DRUNET) architecture to segment various regions in OCT images, including the retina, choroid and optic nerve head, combining the benefits of skip connections, residual connections and dilated convolutions in a single network. Alonso-Caneiro et al. 32 extended the previously proposed patch-based approach for retinal segmentation to additionally segment the choroid-scleral interface in OCT images.
In this paper, a range of deep learning methods for OCT choroidal boundary segmentation is explored. Similar methods have been applied to retinal segmentation in the past; however, the application of machine learning methods to choroidal segmentation is significantly less prevalent. This study extends our previous work on patch-based approaches to choroidal segmentation 32 and expands on the use of semantic segmentation architectures. There is also limited work investigating the effect of network architecture changes and image pre-processing on the performance of a semantic segmentation approach to this problem. Here, the aim is to investigate the effect of changes in the patch size, network architecture and image pre-processing, as well as the method used (patch-based vs semantic segmentation). For each factor, the impact was evaluated primarily through the segmentation performance on the chorio-scleral interface (CSI). Given the vast range of machine learning model architectures and associated parameters, this work takes an important step towards understanding the optimal architecture and approach for choroidal boundary segmentation in OCT images. For comparison purposes, the segmentation of the total retinal thickness was also evaluated. The outcomes of the approaches presented here are likely to aid the future design and evaluation of machine learning-based OCT image analysis techniques.

Methods
OCT data. The dataset used consists of spectral domain OCT (SD-OCT) scans from a longitudinal study that has been described in detail in a number of previous publications 5, 6 . In this study, OCT scans were collected from 101 children at four different visits over an 18-month period. Approval from the Queensland University of Technology human research ethics committee was obtained before the study, and written informed consent was provided by all participating children and their parents. All participants were treated in accordance with the tenets of the Declaration of Helsinki. At each visit, two sets of six foveal-centred radial chorio-retinal scans were taken on each subject; however, only the data from the first visit are used in this paper. The scans were acquired using the Heidelberg Spectralis (Heidelberg Engineering, Heidelberg, Germany) SD-OCT instrument in enhanced depth imaging mode. To improve the signal-to-noise ratio, automatic real-time tracking was used with 30 frames averaged for each scan. The acquired images each measure 1536 × 496 pixels (width × height), with a vertical scale of 3.9 µm per pixel and a horizontal scale of 5.7 µm per pixel, which corresponds to an approximate physical area of 8.8 × 1.9 mm. These images were exported as bmp (lossless) images, with other related data stored in an accompanying xml file, and subsequently analysed using custom software in which an automated graph-based method was used to segment three layer boundaries for each image. This segmented data was then assessed by an expert human observer who manually corrected any segmentation errors. The three layer boundaries within the labelled data are the inner boundary of the inner limiting membrane (ILM), the outer boundary of the retinal pigment epithelium (RPE), and the CSI. An example of the positions of these boundaries is shown in Fig. 1.
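Since boundary errors in this work are reported in pixels, the stated pixel spacing is what links them to physical distances. The short Python sketch below applies that spacing; the constant and function names are illustrative, and it assumes boundary errors are measured vertically along each A-scan, so that the 3.9 µm axial spacing applies to them.

```python
# Pixel spacing stated for the Spectralis EDI scans in this dataset.
VERTICAL_UM_PER_PX = 3.9    # axial (height) spacing, micrometres per pixel
HORIZONTAL_UM_PER_PX = 5.7  # lateral (width) spacing, micrometres per pixel


def scan_extent_mm(width_px: int, height_px: int) -> tuple[float, float]:
    """Return the approximate physical (width, height) of a B-scan in millimetres."""
    return (width_px * HORIZONTAL_UM_PER_PX / 1000.0,
            height_px * VERTICAL_UM_PER_PX / 1000.0)


def axial_error_um(error_px: float) -> float:
    """Convert a vertical (per A-scan) boundary error in pixels to micrometres."""
    return error_px * VERTICAL_UM_PER_PX


if __name__ == "__main__":
    w_mm, h_mm = scan_extent_mm(1536, 496)
    print(f"{w_mm:.1f} x {h_mm:.1f} mm")                # 8.8 x 1.9 mm
    print(f"{axial_error_um(2.0):.1f} um for a 2-pixel boundary error")
```

For example, a 2-pixel boundary error corresponds to roughly 7.8 µm of axial distance under this spacing.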
For computational reasons, only a subset of the dataset described above is utilised here. This consists of a single set of scans (six scans) for 99 participants from their first visit only. These participants are randomly divided into two sets: set A for neural network training and validation (50 participants, 300 B-scans in total) and set B for evaluation (49 participants, 294 B-scans in total). Within set A, an 80/20 split is used for training (40 participants, 240 B-scans) and validation (10 participants, 60 B-scans), with participants selected randomly for each. There is no overlap of participants between the training and validation sets or between sets A and B. Henceforth, 'A-scan' refers to a single column of an OCT image while 'B-scan' refers to a full-size OCT image.
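The participant-level partitioning described above can be sketched as follows. Splitting by participant before expanding to B-scans is what guarantees that no participant contributes scans to more than one set. The function names and the fixed random seed are hypothetical; the text does not specify how the random selection was implemented.

```python
import random


def split_participants(ids, n_eval=49, val_fraction=0.2, seed=0):
    """Split participant IDs into training, validation and evaluation sets.

    The split is made per participant (not per B-scan), so all six scans of a
    participant always end up in the same set.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    set_b = shuffled[:n_eval]                 # evaluation (set B)
    set_a = shuffled[n_eval:]                 # training + validation (set A)
    n_val = round(len(set_a) * val_fraction)  # 20% of set A for validation
    return set_a[n_val:], set_a[:n_val], set_b


def expand_to_bscans(participants, scans_per_participant=6):
    """List (participant, scan index) pairs for every radial B-scan."""
    return [(p, s) for p in participants for s in range(scans_per_participant)]


if __name__ == "__main__":
    train, val, evaluation = split_participants(range(99))
    print(len(train), len(val), len(evaluation))         # 40 10 49
    print(len(expand_to_bscans(train)), len(expand_to_bscans(val)),
          len(expand_to_bscans(evaluation)))            # 240 60 294
```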
Overview. The deep learning automatic segmentation methods considered in this work are of two main types: patch-based and semantic segmentation. Each method involves a number of steps. Firstly, a set of OCT scans (set A) is used to train a neural network for patch classification (patch-based method) or for area segmentation of full-size B-scans (semantic segmentation method). Next, a second set of OCT scans (set B) is used to evaluate the network. For each scan in set B, per-boundary probability maps are constructed by classifying each pixel in the scan (patch-based method) or by segmenting the scan and then applying the Sobel filter (semantic segmentation method). In both cases, each probability map is then used to construct a graph, and a boundary position prediction is obtained by performing a shortest-path graph search. The following sections provide greater detail on the two methods, while Fig. 1 illustrates the various steps involved in each. Some of the patch-based methods have been presented elsewhere 32 .
Patch-based networks. Convolutional neural network (CNN) architecture. Convolutional neural networks (CNNs) have been widely used, with demonstrated success, for a range of image classification 48 and segmentation 49 tasks. CNNs consist of a number of different layers, each with an associated set of parameters. Convolutional layers convolve the input with a number of equal-sized kernels (filters) and stack the results to produce an output. Their parameters include the kernel size (height × width), the stride lengths (vertical, horizontal), the quantity of zero-padding (top, bottom, left, right) applied to the input, and the number of kernels. Pooling layers slide a single window step-by-step over the input; at each step, an operation is performed to pool the input to a smaller size.
Commonly used operations include max pooling (where the maximum value within the window is taken) and average pooling (where the average of the values is taken). The parameters of this layer include the window size (height × width), the stride (step) lengths (vertical, horizontal), the quantity of zero-padding applied to the input (top, bottom, left, right) and the pooling operation (max or average). Activation layers introduce non-linearity into neural networks; the rectified linear unit (ReLU) 50 is a common choice for CNNs and has been shown to outperform alternatives such as tanh and sigmoid 51 . Fully-connected (FC) layers are equivalent to convolutional layers whose kernel size equals the spatial size of the input and where no zero-padding is applied. Two CNNs of differing complexity, each with variants for a range of patch sizes, are used within this work, with the architectures listed in Supplementary Table S1: the Cifar CNN (CNN 1) introduced by Fang et al. 20 and the Complex CNN (CNN 2) presented by Hamwood et al. 30 . Consistent with previous approaches 20, 30 , dropout is not used for regularisation in the CNNs in this work.
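The layer parameters listed above determine each layer's output size through a standard relation: along each axis, out = floor((in + pad_before + pad_after − kernel) / stride) + 1. The sketch below applies it; the function names and example input sizes are illustrative and are not taken from Supplementary Table S1. The last case shows why an FC layer can be treated as a convolution whose kernel spans the whole input.

```python
def output_size(in_size, kernel, stride=1, pad_before=0, pad_after=0):
    """Output length along one axis for a convolution or pooling window.

    A window of length `kernel` is slid in steps of `stride` over an input of
    length `in_size` that has been zero-padded on both sides.
    """
    padded = in_size + pad_before + pad_after
    if kernel > padded:
        raise ValueError("window larger than padded input")
    return (padded - kernel) // stride + 1


def layer_output_shape(height, width, kernel_hw, stride_hw=(1, 1),
                       padding_tblr=(0, 0, 0, 0)):
    """Output (height, width) of a 2-D convolution or pooling layer."""
    top, bottom, left, right = padding_tblr
    return (output_size(height, kernel_hw[0], stride_hw[0], top, bottom),
            output_size(width, kernel_hw[1], stride_hw[1], left, right))


if __name__ == "__main__":
    # A 3x3 convolution with stride 1 and one pixel of padding preserves size.
    print(layer_output_shape(64, 32, (3, 3), (1, 1), (1, 1, 1, 1)))  # (64, 32)
    # A 2x2 max/average pooling window with stride 2 halves each dimension.
    print(layer_output_shape(64, 32, (2, 2), (2, 2)))                # (32, 16)
    # An FC layer as a convolution: kernel = input size, no padding -> 1x1.
    print(layer_output_shape(8, 4, (8, 4)))                          # (1, 1)
```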
Recurrent neural network (RNN) architecture. Recurrent neural networks (RNNs) have been widely and successfully applied to problems involving sequential data, such as speech recognition 52,53 and handwriting recognition 54 . However, there are only a handful of examples of their application to images. To classify OCT image patches using a recurrent neural network, the architecture introduced by Kugelman et al. 31 is used here. This network, partially inspired by the ReNet architecture 55 , has a number of parameters associated with each layer, including the direction of operation (vertical or horizontal), number of passes (1: unidirectional, 2: bidirectional), number of filters, dropout percentage and receptive field size (height, width). The size of the receptive field is the size of the region of the input that is processed by the RNN at each step. The direction of operation determines whether the RNN processes each row of a column (vertical) or each column of a row (horizontal) before moving to the next column or row respectively. A unidirectional layer passes over the input in a single direction only (left to right or top to bottom), whereas a bidirectional layer additionally passes over the input in the opposite direction (right to left or bottom to top), with the outputs of the two passes concatenated along the feature axis. The number of filters in each layer sets the depth of the output, with more filters enabling the network to learn a larger number of patterns from the input. The dropout percentage 56 is the proportion of units within a layer that are randomly turned off at each training step. The RNN architecture used within this work is described in Supplementary Table S2.
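The data flow of a single vertical, bidirectional layer of this kind can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the architecture of Supplementary Table S2: it uses a plain tanh recurrent cell with random weights (the original layers may use gated units such as LSTM or GRU cells), non-overlapping receptive fields and no dropout. The function names are hypothetical.

```python
import numpy as np


def rnn_scan(steps, w_in, w_h, b):
    """Run a simple tanh RNN over a sequence of input vectors (T, D) -> (T, F)."""
    h = np.zeros(w_h.shape[0])
    outputs = []
    for x in steps:
        h = np.tanh(w_in @ x + w_h @ h + b)
        outputs.append(h)
    return np.stack(outputs)


def vertical_rnn_layer(image, field_hw, n_filters, bidirectional=True, seed=0):
    """One ReNet-style vertical layer over an (H, W, C) feature map.

    The input is tiled into non-overlapping receptive fields of size field_hw.
    Each column of fields is processed top to bottom (and, if bidirectional,
    bottom to top) before moving to the next column; the outputs of the two
    passes are concatenated on the feature axis. Weights are random: this only
    illustrates the data flow and output shapes.
    """
    H, W, C = image.shape
    fh, fw = field_hw
    rows, cols = H // fh, W // fw
    # Grid of flattened receptive fields: (rows, cols, fh * fw * C).
    grid = (image[:rows * fh, :cols * fw]
            .reshape(rows, fh, cols, fw, C)
            .transpose(0, 2, 1, 3, 4)
            .reshape(rows, cols, fh * fw * C))
    rng = np.random.default_rng(seed)
    d = grid.shape[-1]
    passes = 2 if bidirectional else 1
    params = [(rng.normal(0, 0.1, (n_filters, d)),
               rng.normal(0, 0.1, (n_filters, n_filters)),
               np.zeros(n_filters)) for _ in range(passes)]
    out = np.zeros((rows, cols, passes * n_filters))
    for c in range(cols):
        column = grid[:, c]                                   # vertical sequence
        out[:, c, :n_filters] = rnn_scan(column, *params[0])  # top to bottom
        if bidirectional:                                     # bottom to top
            out[:, c, n_filters:] = rnn_scan(column[::-1], *params[1])[::-1]
    return out


if __name__ == "__main__":
    patch = np.random.default_rng(1).random((64, 32, 1))     # a 64 x 32 patch
    features = vertical_rnn_layer(patch, field_hw=(2, 2), n_filters=16)
    print(features.shape)  # (32, 16, 32): 2 passes x 16 filters
```

The sequential loop over receptive fields is also why such layers tend to be slow compared with convolutions of similar size.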
Training. The Cifar CNN, Complex CNN and RNN networks are trained to classify fixed-size (height × width pixels) patches of the OCT images. Each patch is assigned to a class based on the layer boundary that it is centred upon, with classes constructed for each of the three layer boundaries of interest (ILM, RPE and CSI) as well as an additional background class (BG) for patches that are not centred upon any of the three layer boundaries. This is similar to the procedure used in previous work 20, 30 . Fang et al. 20 utilised 33 × 33 patches, while Hamwood et al. 30 extended this work and, using 33 × 33 and 65 × 65 patch sizes, showed that a larger patch size can improve performance. Kugelman et al. 31 also experimented with the patch size, using 32 × 32 and 64 × 64 square patches as well as 64 × 32 and 32 × 64 rectangular patches. Of the sizes they tested, the vertically oriented 64 × 32 patch provided the best trade-off between accuracy and complexity for retinal segmentation using RNNs. With this in mind, to assess the effect on choroidal segmentation, patches of various sizes, including 32 × 32, 64 × 32, 64 × 64 and 128 × 32 (height × width pixels), are utilised, with layer boundaries positioned one pixel above and to the left of the central point.
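The patch-centring convention can be made concrete with a short sketch. Here the 'central point' of an even-sized patch is assumed to be local index (height // 2, width // 2), so the boundary pixel lands at (height // 2 − 1, width // 2 − 1); zero-padding at the image borders is also an assumption, and the function name is hypothetical.

```python
import numpy as np


def extract_patch(image, row, col, height, width):
    """Extract a (height, width) patch so that pixel (row, col) sits one pixel
    above and one pixel to the left of the patch's central point.

    The central point of an even-sized patch is taken as (height // 2,
    width // 2), so (row, col) lands at local index (height // 2 - 1,
    width // 2 - 1). Regions outside the image are zero-padded. (Padding the
    whole image per call is simple but wasteful; pad once in practice.)
    """
    top = row - (height // 2 - 1)
    left = col - (width // 2 - 1)
    pad = max(height, width)
    padded = np.pad(image, pad, mode="constant")   # zeros on all sides
    return padded[top + pad: top + pad + height, left + pad: left + pad + width]


if __name__ == "__main__":
    scan = np.arange(496 * 1536, dtype=np.float32).reshape(496, 1536)
    for h, w in [(32, 32), (64, 32), (64, 64), (128, 32)]:
        p = extract_patch(scan, row=200, col=700, height=h, width=w)
        assert p.shape == (h, w)
        assert p[h // 2 - 1, w // 2 - 1] == scan[200, 700]
    print("patch centring checks passed")
```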
Patches are constructed for training (~1,200,000 patches) and validation (~300,000 patches) from the data in set A. In each scan, three boundary patches and one random background patch are sampled from each column, ensuring equally balanced classes. However, patches are only created within a cropped region of each scan (from approximately 100 pixels from the left edge to 250 pixels from the right edge), because true boundary locations are absent outside this region owing to the optic nerve head and, in some scans, shadowing. The Adam algorithm 57 with default parameters (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1 × 10⁻⁸) is used for training to minimise the cross-entropy loss. No transfer learning is performed; instead, each network is trained from scratch with weights initialised using small random values. Each network is trained until convergence is observed with respect to the validation loss, with convergence determined by inspection of the validation losses; no early stopping and no learning rate schedule are employed. Afterwards, the model with the highest validation accuracy (percentage of patches correctly classified) is chosen for evaluation.
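The per-column, class-balanced sampling described above can be sketched as follows. The function name, the dictionary input format and the exact crop margins (the approximate 100 and 250 pixel values from the text) are illustrative assumptions; patch extraction around each returned centre would follow the centring convention described for the patch sizes.

```python
import numpy as np


def sample_patch_centres(boundaries, image_height, crop_left=100, crop_right=250,
                         seed=0):
    """Choose patch centres for one B-scan with balanced classes.

    For every column inside the cropped region, one centre is placed on each
    boundary (ILM, RPE, CSI) and one background centre at a random row that is
    not on any boundary.

    boundaries: dict mapping class name -> integer row position per column
    (columns without a true boundary should be excluded beforehand).
    Returns a list of (row, col, label) tuples.
    """
    rng = np.random.default_rng(seed)
    width = len(next(iter(boundaries.values())))
    samples = []
    for col in range(crop_left, width - crop_right):
        boundary_rows = {int(rows[col]) for rows in boundaries.values()}
        for name, rows in boundaries.items():
            samples.append((int(rows[col]), col, name))
        # Background: any row in this column that is not a boundary pixel.
        candidates = np.setdiff1d(np.arange(image_height), list(boundary_rows))
        samples.append((int(rng.choice(candidates)), col, "BG"))
    return samples


if __name__ == "__main__":
    width = 1536
    cols = np.arange(width)
    bounds = {"ILM": np.full(width, 120),
              "RPE": np.full(width, 240),
              "CSI": (330 + 20 * np.sin(cols / 200)).astype(int)}
    centres = sample_patch_centres(bounds, image_height=496)
    labels = [label for _, _, label in centres]
    print({c: labels.count(c) for c in ("ILM", "RPE", "CSI", "BG")})
```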
Semantic segmentation networks. Architecture. Semantic segmentation network architectures have evolved over time, with a number of modifications proposed. Supplementary Table S3 summarises some of the key features presented, which are used to inform the choice of network architectures in this study. Building upon previous work 58,59 in semantic segmentation using fully-convolutional neural networks, the U-Net 60 was proposed for biomedical image segmentation. Architectures based on the U-Net have been used previously for OCT retinal segmentation 22,23,31 ; as such, a similar standard U-Net architecture (referred to as 'Standard') is used in this work, along with a number of modified variants to assess the potential for performance improvement in choroidal segmentation. These modifications include the incorporation of residual learning 61-64 (referred to as 'Residual'), the replacement of the bottleneck with RNN layers 65 (referred to as 'RNN bottleneck'), and the addition of squeeze and excitation blocks [66][67][68] (referred to as 'Squeeze + Excitation'). The combination of all three modifications is also considered (referred to as 'Combined'). Three squeeze and excitation block variants are considered: spatial squeeze and channel excitation (cSE), channel squeeze and spatial excitation (sSE), and concurrent spatial and channel squeeze and excitation (scSE). Note that the 'Combined' network utilises the scSE variant. An illustration of each architecture is provided in Fig. 2. In each network, convolutional layers incorporate zero-padding such that the input and output of each layer are the same size and no cropping is required. Batch normalization 69 is applied at the input to each rectified linear unit to improve training. A dropout 56 of 50% is applied at the output of the network bottleneck for regularisation.
Each network consists of four pooling layers and four upsampling layers. The first layer contains eight filters; this number is doubled at each subsequent pooling layer and halved in a similar fashion at each upsampling layer.
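As an illustration of the squeeze and excitation variants named above, the NumPy sketch below implements the forward pass of cSE, sSE and scSE blocks on a single (height, width, channels) feature map, together with the filter schedule implied by starting at eight filters and doubling at each of four pooling layers. Several details are assumptions rather than the configuration used here: the channel reduction ratio r, the omission of bias terms, random weights, and combining the two scSE branches by element-wise addition. All names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, w2):
    """cSE: squeeze spatially (global average pool), excite per channel.

    x: (H, W, C); w1: (C, C // r); w2: (C // r, C).
    """
    z = x.mean(axis=(0, 1))                        # (C,) spatial squeeze
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)      # FC -> ReLU -> FC -> sigmoid
    return x * s                                   # rescale each channel


def spatial_se(x, w):
    """sSE: squeeze channels with a 1x1 convolution, excite per pixel.

    x: (H, W, C); w: (C,) weights of a 1x1 convolution to a single map.
    """
    q = sigmoid(x @ w)                             # (H, W) spatial attention map
    return x * q[..., None]                        # rescale each pixel


def concurrent_se(x, w1, w2, w):
    """scSE: both recalibrations, combined here by element-wise addition."""
    return channel_se(x, w1, w2) + spatial_se(x, w)


def unet_filters(first=8, n_pool=4):
    """Filters per encoder level: doubled after each pooling layer."""
    return [first * 2 ** i for i in range(n_pool + 1)]


if __name__ == "__main__":
    print(unet_filters())                          # [8, 16, 32, 64, 128]
    rng = np.random.default_rng(0)
    C, r = 16, 2
    feat = rng.random((31, 48, C))
    out = concurrent_se(feat, rng.normal(size=(C, C // r)),
                        rng.normal(size=(C // r, C)), rng.normal(size=C))
    print(out.shape)                               # (31, 48, 16)
```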
Training. Each of the networks illustrated in Fig. 2 and described above is trained to perform semantic segmentation on full-size OCT images. To do this, a network is tasked with classifying each pixel in an image into one of four area classes: the vitreous (top of the image to ILM), retina (ILM to RPE), choroid (RPE to CSI) and sclera (CSI to bottom of the image). Each image therefore has an associated area mask, which is the target output for the fully-convolutional networks (FCNs). As described for set A, 240 full-size OCT images are used for training, while a separate 60 images are used for validation. For each column where at least one true boundary location is missing from the data (normally associated with shadows at the edge of some images), the corresponding column of the area mask is set to the top area class (vitreous) and the same column in the image is zeroed. Because of the relatively small number of images, the data were augmented using horizontal flips (left to right/right to left): for each epoch, each image was randomly flipped horizontally with a 50% probability.
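The construction of the training targets and the combined objective can be sketched as follows: a four-class area mask built from per-column boundary rows, the vitreous-and-zero handling of columns with missing boundaries, the random horizontal flip, and a sum of cross-entropy and soft Dice loss of the kind named for these networks. The assignment of each boundary pixel to the region below it, the particular soft Dice formulation and all names are assumptions made for illustration.

```python
import numpy as np

# Area classes ordered from the top of the image to the bottom.
VITREOUS, RETINA, CHOROID, SCLERA = range(4)


def area_mask(height, ilm, rpe, csi):
    """Build the 4-class target mask from per-column boundary rows.

    Rows above the ILM are vitreous, ILM to RPE retina, RPE to CSI choroid and
    below the CSI sclera (a boundary pixel is assigned to the region below it).
    Columns with any missing boundary (NaN) are set entirely to vitreous.
    Returns the mask and a boolean array marking the missing columns.
    """
    rows = np.arange(height)[:, None]                  # (H, 1), broadcasts over columns
    mask = np.full((height, len(ilm)), SCLERA, dtype=np.int64)
    mask[rows < csi] = CHOROID
    mask[rows < rpe] = RETINA
    mask[rows < ilm] = VITREOUS
    missing = np.isnan(ilm) | np.isnan(rpe) | np.isnan(csi)
    mask[:, missing] = VITREOUS
    return mask, missing


def random_hflip(image, mask, rng):
    """Flip the image and its mask left-right together with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1], mask[:, ::-1]
    return image, mask


def ce_plus_dice_loss(probs, mask, n_classes=4, eps=1e-7):
    """Pixel-wise cross-entropy plus a soft Dice loss (1 - mean class Dice).

    probs: (H, W, K) softmax output; mask: (H, W) integer labels.
    """
    one_hot = np.eye(n_classes)[mask]                  # (H, W, K)
    ce = -np.mean(np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=-1))
    inter = np.sum(probs * one_hot, axis=(0, 1))
    denom = np.sum(probs, axis=(0, 1)) + np.sum(one_hot, axis=(0, 1))
    dice = (2 * inter + eps) / (denom + eps)
    return ce + (1.0 - dice.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    width = 8
    ilm = np.full(width, 2.0)
    rpe = np.full(width, 5.0)
    csi = np.full(width, 8.0)
    csi[0] = np.nan                                    # e.g. a shadowed edge column
    image = rng.random((12, width))
    mask, missing = area_mask(12, ilm, rpe, csi)
    image[:, missing] = 0.0                            # zero the unusable column
    image, mask = random_hflip(image, mask, rng)
    print(ce_plus_dice_loss(np.eye(4)[mask], mask))    # perfect prediction: ~0
```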
The Adam algorithm 57 with default parameters (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1 × 10⁻⁸) is used for training to minimise the sum of the cross-entropy loss and Dice overlap loss 70 . This loss combination is similar to that used in previous work 22 .
Image pre-processing. The choroid is a vascular layer of the eye. Its vascular nature, combined with the fact that it is located behind a hyper-reflective layer (the RPE), means that the contrast and visibility of its posterior boundary tend to be weak. The use of OCT image contrast enhancement techniques 71 based on attenuation coefficients 72 was therefore considered in this work, since it may improve the visibility of the boundaries, especially the CSI, and may also reduce the effect of shadows cast by the retinal blood vessels. This method has been used previously to improve the visibility of the CSI 73 . The technique works under the assumption that the local backscattering can be related to the corresponding attenuation, which can therefore be compensated. In this work, the effect of attenuation compensation was tested with two network-input options: the standard OCT intensity image and its contrast-enhanced (attenuation coefficient) equivalent.
Boundary prediction and model evaluation. Given a scan and a trained network, probability maps for each of the boundaries can be calculated. For a patch-based method, the probability maps are obtained by classifying patches centred on each pixel in the scan 20 . For a fully-convolutional method, the boundary probability maps are acquired by applying the Sobel filter to the area probability output of the FCN 37 . In both cases, the boundary positions are then delineated by performing a graph search using Dijkstra's shortest path algorithm 74 , where each pixel in the probability map corresponds to a vertex in the graph. This is inspired by the approach originally used by Chiu et al. 33 .
Directed edges connect each vertex to the neighbouring vertices immediately to its right (horizontally, diagonally above and diagonally below). To remove the need for manual start and end point initialisation, columns of maximum-probability vertices, connected top to bottom, are appended to each end of the graph, with additional left-to-right connections made to the existing graph as required. The edge weights between each pair of vertices are determined by the respective probabilities and are given by Eq. (1).
expert human observer), from which the Dice overlap percentage is calculated for the four regions of interest (the vitreous, retina, choroid and sclera), as well as the mean pixel error and mean absolute pixel error for the ILM, RPE and CSI, for each scan. Because the patch-based networks do not output area maps, Dice values cannot be calculated directly from their output. For this reason, and for consistency between the methods, all Dice overlap values are calculated post-segmentation, i.e. from the regions delimited by the predicted boundaries. Note that these values will be greater than Dice values obtained directly from the network output (in the semantic segmentation case) wherever misclassifications do not affect the boundary errors.
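The evaluation metrics described above can be sketched as follows, computing Dice overlap percentages from the regions delimited by predicted and true boundaries (the post-segmentation approach) and the mean and mean absolute pixel errors per boundary over a column crop. The region-labelling convention, the sign of the mean pixel error (prediction minus ground truth), the default crop margins and all names are assumptions.

```python
import numpy as np


def areas_from_boundaries(height, ilm, rpe, csi):
    """Per-column boundary rows -> region labels
    (0 vitreous, 1 retina, 2 choroid, 3 sclera)."""
    rows = np.arange(height)[:, None]
    return ((rows >= ilm).astype(int) + (rows >= rpe).astype(int)
            + (rows >= csi).astype(int))


def dice_percent(pred, truth, label):
    """Dice overlap (%) between predicted and true regions for one label."""
    p, t = pred == label, truth == label
    denom = p.sum() + t.sum()
    return 100.0 * 2 * np.logical_and(p, t).sum() / denom if denom else 100.0


def boundary_errors(pred_rows, true_rows):
    """Mean (signed, prediction minus truth) and mean absolute pixel error."""
    diff = np.asarray(pred_rows, float) - np.asarray(true_rows, float)
    return diff.mean(), np.abs(diff).mean()


def evaluate_scan(height, pred, truth, crop_left=100, crop_right=250):
    """Dice (%) per region and boundary errors (pixels) on the cropped columns.

    pred, truth: dicts with keys "ILM", "RPE", "CSI" -> row position per column.
    """
    cols = slice(crop_left, len(truth["ILM"]) - crop_right)
    p = {k: np.asarray(v, float)[cols] for k, v in pred.items()}
    t = {k: np.asarray(v, float)[cols] for k, v in truth.items()}
    pa = areas_from_boundaries(height, p["ILM"], p["RPE"], p["CSI"])
    ta = areas_from_boundaries(height, t["ILM"], t["RPE"], t["CSI"])
    regions = ["vitreous", "retina", "choroid", "sclera"]
    dice = {name: dice_percent(pa, ta, i) for i, name in enumerate(regions)}
    errors = {k: boundary_errors(p[k], t[k]) for k in ("ILM", "RPE", "CSI")}
    return dice, errors


if __name__ == "__main__":
    w = 1536
    truth = {"ILM": np.full(w, 100.0), "RPE": np.full(w, 250.0),
             "CSI": np.full(w, 350.0)}
    pred = dict(truth, CSI=np.full(w, 352.0))   # CSI predicted 2 px too deep
    print(evaluate_scan(496, pred, truth))
```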
To obtain a fair indication of the performance of the models, the full-width scans are used as input to the networks, with a graph search performed on the corresponding full-size probability map. However, final error calculations and comparisons are performed only on a cropped region of each scan (from approximately 100 pixels from the left edge to 250 pixels from the right edge) because of the artefacts (i.e. the optic nerve head and shadows) present outside this region.
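The boundary delineation step, in which a boundary probability map is derived from area probabilities with a Sobel filter and then traced by Dijkstra's algorithm over a pixel graph, can be sketched as below. The edge weight is assumed to follow the form used by Chiu et al. 33 , w = 2 − (p_a + p_b) + w_min, which may differ in detail from Eq. (1); applying a vertical Sobel filter to the cumulative probability of the regions below the boundary is likewise an assumption, as are all function names. The implementation favours readability over speed.

```python
import heapq

import numpy as np


def vertical_sobel(img):
    """Vertical Sobel response: weighted row below minus weighted row above."""
    p = np.pad(img, 1, mode="edge")
    below = p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
    above = p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    return below - above


def boundary_probability(area_probs, lower_class):
    """Boundary map for the transition from class lower_class - 1 to lower_class.

    area_probs: (H, W, K) class probabilities, classes ordered top to bottom.
    The probability of lying in lower_class or below rises from 0 to 1 across
    the boundary, so its positive vertical gradient peaks there.
    """
    below = area_probs[..., lower_class:].sum(axis=-1)
    grad = np.clip(vertical_sobel(below), 0.0, None)
    peak = grad.max()
    return grad / peak if peak > 0 else grad


def shortest_path_boundary(prob, w_min=1e-5):
    """Trace one boundary through prob (H, W) with Dijkstra's algorithm.

    Each pixel has directed edges to its right, upper-right and lower-right
    neighbours. A probability-1 column is appended at each end with vertical
    edges, so the path may start and finish at any row.
    Returns the row of the path in each original column.
    """
    H, W = prob.shape
    g = np.pad(prob, ((0, 0), (1, 1)), constant_values=1.0)
    Wp = W + 2
    dist = np.full((H, Wp), np.inf)
    prev = {}
    start, goal = (0, 0), (H - 1, Wp - 1)
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                                  # stale queue entry
        if (r, c) == goal:
            break
        steps = [(r + dr, c + 1) for dr in (-1, 0, 1)] if c < Wp - 1 else []
        if c in (0, Wp - 1):                          # vertical moves in padded columns
            steps += [(r - 1, c), (r + 1, c)]
        for nr, nc in steps:
            if 0 <= nr < H and 0 <= nc < Wp:
                nd = d + 2.0 - (g[r, c] + g[nr, nc]) + w_min
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    rows = np.zeros(W, dtype=int)
    node = goal
    while node != start:                              # walk back along the path
        r, c = node
        if 1 <= c <= W:
            rows[c - 1] = r
        node = prev[node]
    return rows


if __name__ == "__main__":
    H, W = 40, 30
    true_rows = (20 + 3 * np.sin(np.arange(W) / 6)).astype(int)
    lower = (np.arange(H)[:, None] >= true_rows).astype(float)
    area = np.stack([1 - lower, lower], axis=-1)      # synthetic 2-class output
    print(shortest_path_boundary(boundary_probability(area, lower_class=1)))
```

Because every interior edge points rightwards, each original column is visited exactly once, so the returned array gives one boundary row per A-scan.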

Results
Patch-based method results. The Cifar CNN (CNN 1), Complex CNN (CNN 2) and RNN networks were trained using 32 × 32, 64 × 32, 64 × 64 and 128 × 32 patch sizes. All networks were additionally trained with contrast-enhanced images for each patch size. The Dice overlap results are summarised in Supplementary Table S4 and the boundary position errors in Table 1. For reference, evaluation was also performed on the same set of data (set B) with an automatic, non-machine learning, graph-search image-processing segmentation method, referred to below as the automatic baseline 37 . Figure 3 illustrates results from a single example scan evaluated using an RNN. To assess the effects of architecture, patch size and contrast enhancement on segmentation performance, a repeated measures ANOVA was also performed to examine the statistical significance of the differences in the mean absolute boundary errors associated with these factors. The networks converged in an average of 4.31 ± 5.54 epochs, with a range of 2-20 epochs.
All patch-based methods perform comparably on the vitreous, with mean Dice overlaps of approximately 99.80% and standard deviations between 0.05 and 0.20 (Supplementary Table S4). For the retina, the Dice overlaps of all machine learning methods were again comparable, ranging between 99.19% and 99.41% with standard deviations between 0.10 and 0.20. Overall, the machine learning methods performed noticeably better than the automatic baseline on the retina. The results for the vitreous and retina translate directly into similar ILM and RPE boundary position errors, with mean absolute errors for all methods of approximately 0.50 pixels for the ILM and between 0.46 and 0.77 pixels for the RPE.
Although the differences in performance between the methods on the ILM and RPE boundaries are marginal, some of them were statistically significant. The RNN yielded significantly smaller mean absolute errors (p < 0.01) than the other two architectures for both the ILM and RPE boundaries. In addition, omitting contrast enhancement yielded significantly lower error (p < 0.01) for the RPE, while contrast enhancement had no significant effect on segmentation performance for the ILM. In terms of patch size, for the ILM boundary, 32 × 32 patches yielded significantly lower error (p < 0.01) than 128 × 32 patches but were not significantly different from the 64 × 32 or 64 × 64 variants. For the RPE boundary, 32 × 32 and 128 × 32 patches both showed significantly lower error (p < 0.01) than 64 × 32 and 64 × 64 patches; however, there was no significant difference between 32 × 32 and 128 × 32 patches (p > 0.05).
The Dice overlaps for both the choroid and sclera, as well as the boundary position error for the CSI, showed greater variability between the methods. Here, the architecture, patch size and contrast enhancement all had statistically significant effects on performance. Overall, the RNN architecture exhibited the lowest CSI mean absolute error, averaging 3.64 pixels over its eight variants (four patch sizes, with and without contrast enhancement), compared with 3.74 and 3.97 pixels for the Cifar and Complex CNNs respectively; this difference was statistically significant (p < 0.01). Using contrast-enhanced images also yielded a significantly lower CSI mean absolute error overall, averaging 3.53 pixels compared with 4.12 pixels without enhancement (a difference of 0.59 pixels, p < 0.01). Of the patch sizes, 64 × 64 showed the lowest error, with an average CSI mean absolute error of 3.55 pixels. This was significantly lower (p < 0.01) than for the 32 × 32 (4.24 pixels) and 64 × 32 (3.86 pixels) patch sizes but not significantly different from the 128 × 32 patch size (3.66 pixels).
For a complete comparison of all the patch-based methods, the per-B-scan evaluation time (speed) and number of network parameters (complexity) are reported against the CSI boundary mean absolute error for each architecture, and a visual comparison of each method's performance is provided in Fig. 4. The RNN architecture is the simplest (fewest parameters) but also the slowest (longest per-B-scan evaluation time), while the Cifar CNN was the fastest and the Complex CNN possessed the most parameters.
Semantic segmentation method results. Each of the semantic segmentation networks depicted in Fig. 2 was trained and evaluated as described in the Methods section. As for the patch-based methods, all networks were trained and evaluated using contrast-enhanced images in addition to the raw images. Dice overlap results are presented in Supplementary Table S5, while the boundary position errors are reported in Table 2. Using the mean absolute boundary errors, a repeated measures ANOVA was performed to examine the statistical significance of any differences in performance between the methods. Figure 5 presents some example segmentations using the standard U-Net architecture (without contrast enhancement). The networks converged in an average of 77.57 ± 18.46 epochs, with a range of 34-98 epochs.
The Dice overlap results are similar across all semantic segmentation methods for all regions. The difference between the best and worst performing methods was small: just 0.02% for the vitreous, 0.06% for the retina, 0.18% for the choroid and 0.09% for the sclera. A similar trend is observable for the mean absolute boundary position errors, with a difference of just 0.05 pixels between the best and worst performing methods on the ILM and RPE boundaries. There was slightly more variability in the results for the CSI, with a range of 0.33 pixels mean absolute error. Notably, all machine learning methods performed substantially better than the automatic baseline on the RPE and CSI with respect to both accuracy and consistency, with a relatively smaller improvement observed on the ILM.
Overall, there were no statistically significant effects of architecture or contrast enhancement on the mean absolute errors for the ILM and CSI boundaries. For the RPE boundary, the standard architecture yielded the lowest average mean absolute error, which was significantly lower (p < 0.01) than that of the RNN bottleneck, sSE and scSE architectures. However, the difference in errors was small in each case (<0.05 pixels). Contrast enhancement also had a significant effect on the RPE (p < 0.001), reducing the mean absolute boundary error, but the improvement was small (<0.02 pixels).

Discussion
This paper has examined a number of supervised deep learning methods for the task of retinal and choroidal segmentation in OCT images. Both patch-based methods and semantic segmentation methods were considered, with each compared to an automatic baseline method. The effects of patch size (for the patch-based methods), network architecture and contrast enhancement were analysed. The deep learning methods gave superior performance on all boundaries compared with the standard image analysis method used as a baseline. Overall, the findings suggest that all machine learning methods exhibit similar accuracy and good performance on the retinal boundaries (ILM and RPE), while performance on the CSI showed more variability between methods. This is likely because the ILM and RPE boundaries are better defined than the CSI. This relative performance between the boundaries is illustrated in Fig. 6.
For the patch-based methods, changes in architecture and patch size, as well as the use of contrast enhancement, had a significant effect on the CSI boundary error. Contrast enhancement reduced the CSI mean absolute error, likely as a result of the additional emphasis it places on the boundary. The performance benefit of increasing the patch size can be attributed to the additional context available around each pixel, allowing the networks to classify each individual patch more easily. In terms of architecture, the RNNs generally exhibited lower CSI errors than the corresponding CNNs. Despite possessing the fewest parameters, the RNNs were considerably slower than the CNNs because of the large number of operations required to pass over the image pixels sequentially.
For the semantic segmentation methods, the change in architecture and the use of contrast enhancement had less noticeable effects on the CSI, with just 0.33 pixels mean absolute error separating the best and worst performing methods. In contrast, the corresponding range for the patch-based methods was 1.82 pixels. Overall, the semantic segmentation methods performed comparably to one another in terms of accuracy, evaluation speed and complexity. However, compared with the patch-based methods, they performed noticeably better on the CSI boundary, with a mean absolute error for the best performing method of 2.53 pixels compared with 3.23 pixels. This improvement can be attributed to the additional context available to the network, as the whole image is processed at once. The semantic segmentation methods were also considerably faster, taking approximately 20 seconds per B-scan as opposed to approximately 35-240 seconds for the various patch-based methods. Figure 4 illustrates a comparison between the patch-based methods and the semantic segmentation method (using the standard U-Net architecture).
The comparison suggests that, for OCT image segmentation, patch-based methods offer no significant benefit, given their slower evaluation and higher error.
It is worth noting that the different architectural changes introduced for the semantic segmentation methods did not have a significant effect on performance. This is possibly due to the limited overall depth (number of layers) of the network architecture. In particular, residual networks were introduced to improve the performance of very deep networks and may have minimal impact otherwise. Additionally, it is possible that performance here is not limited by the architecture; for example, it may be constrained by the richness of the data, the loss function and/or the optimizer used, among other aspects. There is a vast number of possible combinations of parameters (architectural and otherwise) that could be tested, far more than is feasible to include in this work. Future work may extend the findings here and investigate other changes in the methodology. For instance, activation functions such as Leaky ReLU 75,76 , Parametric ReLU 75,77 , Randomized Leaky ReLU 75 and Flexible ReLU 78 have been proposed as improvements to the standard ReLU and may be considered. Loss functions such as the Tversky loss 79 may be used to address data imbalance, while a loss function may be designed or modified with the goal of better discriminating boundary transitions 22 . Given the promising performance of Adam, variants including Nadam 80 and Adamax 57 may be useful alternatives for training, while additional performance may be gained from optimally tuning the dropout values 56 . Other parameters such as kernel size, number of convolutional layers and number of pooling layers may also be considered. For instance, ReLayNet 22 utilised a single 7 × 3 convolutional block for each of three pooling layers, while Venhuizen et al. 23 utilised two 3 × 3 blocks for each of six pooling layers.
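For reference, the Tversky index weights false positives and false negatives separately, TI = TP / (TP + α·FP + β·FN), and reduces to the Dice coefficient when α = β = 0.5; choosing β > α penalises missed pixels of a minority class more heavily, which is how it can address data imbalance. The sketch below is a minimal soft (probabilistic) binary version in NumPy; the default weights and the smoothing constant `eps` are illustrative assumptions and were not evaluated in this work.

```python
import numpy as np


def tversky_loss(probs, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Soft binary Tversky loss: 1 - TP / (TP + alpha*FP + beta*FN).

    probs:  predicted foreground probabilities in [0, 1]
    target: binary ground-truth mask of the same shape
    alpha, beta: weights on false positives / false negatives (illustrative)."""
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = np.sum(probs * target)          # soft true positives
    fp = np.sum(probs * (1.0 - target))  # soft false positives
    fn = np.sum((1.0 - probs) * target)  # soft false negatives
    # eps avoids division by zero when both prediction and target are empty.
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


# Usage: with alpha = beta = 0.5 the loss matches the Dice loss.
print(tversky_loss([0.8, 0.2, 0.6], [1, 0, 1], alpha=0.5, beta=0.5))
```

For a multi-class segmentation output, one option is to compute this loss per class channel and average the results.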
Given the low error and high consistency on retinal boundaries such as the RPE and ILM, future work in the area should focus on the more challenging CSI boundary. In particular, methods utilising semantic segmentation seem promising and appear to provide better accuracy and speed than a patch-based approach. For volumetric data, this idea can be extended by including adjacent B-scans to introduce additional context 81 . There is also potential benefit in improving or even replacing the graph search component of these methods. Ideally, an end-to-end ML approach could be adopted that outputs per-boundary positions or, to ensure correct layer topology, the thickness of each layer 82 . Another option to consider is transfer learning 83,84 using pre-trained weights, which may help to improve performance, particularly when training data are insufficient. Additional augmentations (e.g. rotations, noise, contrast) may also be used to build a richer training set. The findings presented here may be used to inform future work in the area of chorio-retinal boundary analysis in OCT images. Future studies should explore how these methods perform with other OCT modalities, particularly swept-source OCT, which has demonstrated superior performance in visualizing the deeper choroidal layer 85 compared with the spectral-domain OCT used in this study. It is worth noting that the images used in the current study are from young healthy participants; therefore, further work is required to examine these segmentation methods in cases of ocular pathology and in older populations.
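As a hedged illustration of why predicting layer thicknesses can guarantee correct topology, the sketch below converts unconstrained network outputs into strictly ordered boundary positions: a softplus makes every thickness positive, and a cumulative sum down the layers stacks them below the first boundary. The softplus/cumulative-sum construction and the function names are assumptions chosen for illustration; they are not taken from this work or from ref. 82.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); always strictly positive.
    return np.logaddexp(0.0, x)


def thicknesses_to_boundaries(top, raw_thickness):
    """Convert per-layer outputs into ordered boundary positions.

    top:           shape (W,), position of the uppermost boundary per A-scan
    raw_thickness: shape (L, W), unconstrained outputs, one row per layer
    Returns an array of shape (L + 1, W) whose rows increase strictly down
    each column, so boundaries can never cross."""
    top = np.asarray(top, dtype=float)
    thick = softplus(np.asarray(raw_thickness, dtype=float))
    # Each lower boundary = top boundary + cumulative thickness of layers above it.
    lower = top[None, :] + np.cumsum(thick, axis=0)
    return np.vstack([top[None, :], lower])


# Usage: two layers over two A-scans; even negative raw outputs keep the order.
boundaries = thicknesses_to_boundaries([10.0, 20.0], [[-5.0, 3.0], [0.0, -2.0]])
print(boundaries.shape)
```

Because softplus and cumulative summation are differentiable, the same construction could in principle be placed at the output of a network and trained end to end, removing the need for a separate step to enforce boundary ordering.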
Since most commercially available OCT instruments do not provide methods for automatic choroidal segmentation, and the use of deep learning for choroidal segmentation is still largely unexplored, this work demonstrates the potential of these techniques and their superior performance compared with standard image analysis methods. Thus, the methods presented here are likely to have a positive impact on clinical and research tasks involving OCT choroidal segmentation.

Data Availability
The datasets analysed during the current study are currently not publicly available. However, the algorithms developed in this work are available from the corresponding author on reasonable request.
Figure 6. Accuracy comparison for the three boundaries of interest. The range of mean absolute errors for all machine learning methods is shown for each boundary (range indicated by each coloured box). The RPE and ILM boxes contain both semantic and patch-based methods, while the CSI has two separate boxes, one for each type of method. Each boundary is compared with the automatic baseline method, indicated by a solid black line along the same row.