Practicable assessment of cochlear size and shape from clinical CT images

There is considerable interpersonal variation in the size and shape of the human cochlea, with evident consequences for cochlear implantation. The ability to characterize a specific cochlea, from preoperative computed tomography (CT) images, would allow the clinician to personalize the choice of electrode, surgical approach and postoperative programming. In this study, we present a fast, practicable and freely available method for estimating cochlear size and shape from clinical CT. The approach taken is to fit a template surface to the CT data, using either a statistical shape model or a locally affine deformation (LAD). After fitting, we measure cochlear size, duct length and a novel measure of basal turn non-planarity, which we suggest might correlate with the risk of insertion trauma. Gold-standard measurements from a convenience sample of 18 micro-CT scans are compared with the same quantities estimated from low-resolution, noisy, pseudo-clinical data synthesized from the same micro-CT scans. The best results were obtained using the LAD method, with an expected error of 8–17% of the gold-standard sample range for non-planarity, cochlear size and duct length.

Evaluation metrics were based on Dice similarity coefficients and surface errors, so it is unclear how well this method can estimate surgically relevant parameters. Kjer et al. 23 developed a similar approach using a statistical deformation model, reporting measurement accuracy and precision for cochlear length, width and height in addition to surface errors, but with no consideration of vertical trajectories. van der Jagt et al. 24 describe an automatic, three-dimensional tracing method that was used to estimate inner and outer wall radii, duct diameter and vertical trajectory in low-resolution CT scans of 242 patients. Significant variation was observed in the cohort, but there was no validation against gold-standard measurements. Iyaniwura et al. 10 present a method to fit a grayscale cochlear atlas to low-resolution, clinical CT data using sequential landmark, affine and B-spline registration. Evaluation was performed using 20 specimens scanned at micro-CT and clinical CT resolutions. Gold-standard, micro-CT "A-values", which correlate to some degree with insertion depth angle 13 and cochlear duct length 25 , were compared with A-values derived from the fitted atlas and also A-values estimated by experts on the clinical CT images. There was no consideration of vertical trajectories. Heutink et al. 26 used 123 ultrahigh-resolution clinical CT scans to train, validate and test a novel deep learning approach to cochlea localization, segmentation and analysis. Errors were calculated between automatic measurements of cochlear volume, duct length and basal lumen diameter, and corresponding ground truth measurements obtained manually. Again, there was no consideration of vertical trajectories.
In this work, we describe a fast, simple and freely available method to fit surface models of the otic capsule to CT data. The fitting may be directed by a statistical model, in the spirit of Noble et al. 22 and Kjer et al. 23 , or constrained only by a smoothness criterion, in the spirit of Iyaniwura et al. 10 . We compare the performance of the two approaches, with specific reference to three-dimensional, surgically relevant measurements like those considered by van der Jagt et al. 24 . Particular emphasis is placed on an improved metric for characterizing the different vertical trajectories first described by Avci et al. 15 . Validation is by way of pseudo-clinical CT data synthesized from the original, gold-standard micro-CT images.

Materials and methods
Temporal bone specimens and micro-CT scanning. A convenience sample of 18 human temporal bones was provided by the Human Anatomy Centre at the Department of Physiology, Development and Neuroscience, University of Cambridge, who approved their use in this study. All experiments were performed in compliance with the UK Human Tissue Act 2004 (licence no. 12146). The donors had given informed consent for the use of their bodies for anatomical research. The specimens were scanned using a Nikon Metrology XT H 225 ST micro-CT scanner (Nikon Metrology NV, Leuven, Belgium) at 125 kV , 120 µA , 1080 projections, 2 frames per projection and 1 s exposure time. Reconstruction was at an isotropic voxel resolution of around 25 µm , apart from specimen #18, which was a larger bone section reconstructed at 61 µm . The bones were a mixture of left and right sides and were all unimplanted apart from specimen #17, which was implanted before scanning (as part of a different study) and whose scans therefore suffered significant beam-hardening artefacts.
Construction of the template and statistical shape models. Figure 1, steps 1 and 2, show how the 18 micro-CT scans were segmented and the otic capsules represented as triangulated surface meshes. A template surface was then constructed as follows. One of the 18 specimens was selected, by eye, as being the most "average". The chosen mesh was registered to all 18 specimens, and the mean deformation was calculated and applied at each vertex, producing a mean otic capsule surface. This surface was re-triangulated to a reasonable resolution (11,145 vertices, sufficient to capture the shape without an excessive number of vertices that would only add to the computational complexity of surface registration), the resulting mesh serving as the template for all remaining experiments in this paper. Steps 3 and 4 show how the template mesh was then registered to each specimen using the sliding semilandmark algorithm, originally developed for planar morphometry 27,28 and subsequently extended to surfaces 29 . Segmentation and mesh construction were performed using Stradview (mi.eng.cam.ac.uk/Main/StradView), while surface registration was carried out in wxRegSurf (mi.eng.cam. ac.uk/~ahg/wxRegSurf).
Following registration, the n = 18 sets of deformed template vertex coordinates were standardized for location, orientation and scale using Procrustes analysis 30 . This involves translating each specimen to a common origin, scaling to unit centroid size, and then rotating to minimize the sum of the squared distances between the vertices of each specimen and the undeformed template mesh. We then rescaled each specimen's vertex coordinates by its centroid size, and used principal component analysis to build a point-based SSM from the resulting n sets of coordinates. Let X i be the 33435-element vector formed by concatenating the coordinates of individual i, and let X = 1 n n i=1 X i . Then the principal modes of shape variation are the n − 1 eigenvectors m i of the sample covariance matrix 1 n−1 n i=1 (X i −X)(X i −X) T with corresponding non-zero eigenvalues. In SSM-based segmentation, the surfaces of new specimens are encouraged to take anatomically plausible shapes by representing the mesh as a linear combination of the shape modes where S i are referred to as shape coefficients. The shape model is available for free download as part of the Stradview package.
Synthesis of pseudo-clinical CT data. The micro-CT data, and the otic capsules segmented from them, provide the gold-standard measurements for the experiments in this paper. For the clinical measurements, we synthesized pseudo-clinical CT images from the micro-CT data. We achieved this by downsampling the micro- www.nature.com/scientificreports/ CT until the desired clinical resolution was achieved. We then projected the downsampled data into the CT detector space, in effect recovering the sinogram, added Gaussian noise to the sinogram, and then backprojected the noisy data into the world space. This processing was performed using wxDicom (mi.eng.cam.ac.uk/Main/ GMT_wxDicom). We synthesized three different classes of pseudo-clinical data, which we shall refer to as standard multidetector CT (MDCT) (isotropic voxel dimension 0.3 mm ), poor MDCT (isotropic voxel dimension 0.45 mm ) and next-generation cone beam CT (CBCT) (isotropic voxel dimension 0.15 mm ): see Fig. 2. The level of Gaussian noise was adjusted by trial and error until the results resembled reference images from the literature. For example, the standard MDCT image in Fig. 2b resembles the exemplar clinical image in Fig. 1c of Phillips et al. 31 , while the poor MDCT image in Fig. 2c is noticeably worse. The next-generation CBCT image in Fig. 2a is superior to those currently found in clinical practice, but resembles the state-of-the-art research images in Zou et al. 32 .
Fitting the model to CT data. Figure 3 shows the process of fitting the otic capsule model to new CT data.
The data in Fig. 3 is pseudo-clinical CT data, though the method is equally applicable to micro-CT data. The model-fitting process is designed to be clinically practicable, in that it requires around 1 min of expert interaction, followed by several minutes of computation.
The first step is to position the template surface at approximately the correct location, by manually identifying three point landmarks in the data: the cochlear apex, the centre of the oval window and the posterior-anterior canal bifurcation (Fig. 3, step 1). Stradview then computes the similarity transformation (rotation, translation and isotropic scaling) that best aligns these three points in the data with corresponding points predefined on the template mesh (Fig. 3, step 2). The operator can optionally reflect the template in the plane of the three points, if the left-right fit was incorrect. The final manual interaction is to select an appropriate grayscale threshold to segment the boundary of the otic capsule (Fig. 3, step 3). The thresholded contours define the point cloud (Fig. 3, step 4) to which the model is now fitted automatically.
An initial, approximate alignment is computed using the iterative closest point (ICP) approach of Besl and McKay 33 . This approximate alignment is parameterized by a second similarity transformation (Fig. 3, step 5). There follows a further iterative process to compute the additional, local displacement of each template vertex (Fig. 3, step 6). Since the thresholded data is noisy (structures other than the otic capsule are captured, and some of the boundaries of the otic capsule, especially at the inner wall and the round and oval windows, are lost), this  1) The micro-CT scans were segmented in Stradview by simple thresholding followed by manual tidying up of the contours. (2) Stradview was then used to construct triangulated surface meshes of each specimen. (3) Since each mesh has a different number of triangles and vertices, the next step is to align a common template mesh (red) with each specimen (green). This allows statistical analysis of the deformation at each of the template's vertices, and subsequent construction of an SSM. (4) The alignment involves translation, rotation, isotropic scaling and (if necessary) reflection, followed by a nonrigid thin-plate spline deformation. The deformation was computed in wxRegSurf using the sliding semilandmark algorithm. www.nature.com/scientificreports/ nonrigid registration must be regularized, to prevent over-fitting of the model to the noise. Stradview offers two methods for regularized, nonrigid registration. The first is the locally affine registration algorithm of Feldmar and Ayache 34 . Associated with each vertex k of the template is a set of neighbouring vertices N k , where each member of N k lies within a distance d of vertex k. At iteration i, every vertex on the template is paired with the closest point in the cloud. Then, for each vertex k on the template, the rigid transformation R k,i is found that minimizes the sum of the squared distances between the transformed vertices in N k and their partners in the cloud. The local displacement of vertex k is then calculated using a proximity-weighted average of all the rigid transformations R k,i within N k . At iteration i + 1 , the closest neighbours and consequent rigid transformations R k,i+1 are recomputed, and so on, until convergence. d is the algorithm's only parameter, its effect being to control the amount of allowable deformation. Smaller values of d permit more deformation and closer alignment of the template to the point cloud, while larger values of d favour smooth displacement fields over alignment accuracy. We shall refer to this algorithm using the acronym LAD (Locally Affine Deformation).
In Stradview's second method, the nonrigid deformation is governed by the SSM, with the template's vertices constrained according to Eq. (1). The registration again proceeds within an ICP framework. Each of the template's vertices is paired with the closest point in the cloud. Then, the SSM shape coefficients S i are found that minimize the sum of the squared distances between the deformed template vertices and their partners in the point cloud. At iteration i + 1 , the closest neighbours and consequent shape coefficients S i are recomputed, and so on, until convergence. This algorithm is parameter-free, since we use all the available SSM modes in Eq. (1). Figure 4 summarises the three measurements we make on the cochlear surfaces, to compare the similarity of the meshes fitted to pseudo-clinical CT data with their gold-standard counterparts. All measurements are made on the curve that delineates the cochlear outer wall, with particular emphasis on the first 270 • of the basal turn, a range that covers the most common sites of insertion trauma [5][6][7][8] . In a one-off process, the outer wall contour was traced on the template mesh by means of a semi-automatic heuristic requiring an expert user to click on points defining the cochlear apex, the centre of the round window and the coiling axis. A contour was then followed automatically from the apex to the round window, passing through those points on the mesh where the surface normal is perpendicular to the coiling axis. The contour was then divided into 100 equal intervals, producing the 101 outer wall points shown in Fig. 4.

Clinically relevant shape and size measurements.
The first measurement, which we shall refer to as "reach" (and is comparable with common measures of cochlear size, including the "A-value" of Escudé et al. 13 ), is the distance from the round window to the furthest point on the first 270 • of the curve, as defined in Fig. 4. The second is the total cochlear duct length, measured along the outer (lateral) wall, as is common for image-based estimation of this quantity 21 . The third concerns the cochlea's vertical trajectory, in which a down-then-up "rollercoaster" profile was identified by Avci et al. 15 as a potential risk factor for insertion trauma. However, Demarcy et al. 35 observed that vertical trajectories are sensitive to the definition of "vertical", which is normally taken to be the modiolar axis 15 . We further explore this point in Fig. 5, which shows two vertical trajectories of the same cochlea, with the vertical axis defined by the modiolar axis in (a) and the normal to the basal plane in (b). We note not only the sensitivity to the vertical direction, but also that the rollercoaster profile in (a) does not necessarily imply a challenging insertion.
We therefore propose an alternative way to characterize the vertical trajectory. We define the "basal plane" as the best fit plane to the first 270 • of the outer wall contour. Vertical trajectories are measured along the normal to this plane, as in Fig. 5b, thus avoiding any sensitivity to the less germane anatomy of the middle and apical turns. Having established a reliable vertical trajectory, Fig. 4 illustrates how we summarise the "non-planarity" www.nature.com/scientificreports/ of the basal turn as the mean absolute distance between the first 270 • of the outer wall contour and the basal plane. The hypothesis is that cochleas with lower non-planarity are less susceptible to insertion trauma than those with higher non-planarity. The consensus approach to cochlear coordinate systems 36 is somewhat ambiguous, in that the basal plane is assumed to be perpendicular to the modiolus. By anchoring our coordinate system to the basal plane and not the modiolus, we break from the usual interpretation and, unfortunately but inevitably, hinder comparability with previous studies. www.nature.com/scientificreports/

Experiments, results and discussion
The otic capsule model was fitted to each of the 54 pseudo-clinical scans (18 specimens, 3 different resolutions) at grayscale thresholds of 160, 170 and 180. The threshold of 170 was observed to produce visually appropriate segmentations in most cases, with ±10 re-runs to assess sensitivity. For each data set at each threshold, the model was fitted three times: using the "full" SSM, trained using all 18 micro-CT data sets; using a "leave-one-out" SSM, trained using 17 of the micro-CT data sets, but not the specimen on which it was being evaluated; and using the LAD method, with a fixed parameter d = 5 mm.
The full SSM provides an upper bound on SSM performance, with an effectively perfect model and fitting compromised only by the image resolution and detector noise. In contrast, the leave-one-out results are indicative of expected performance on unseen specimens with a model trained using only 17 exemplars. Since the LAD method does not require training, the results presented here are expected to generalise to new specimens without gross malformations. The parameter d = 5 mm was chosen since it produced visually plausible nonrigid deformations in all cases, without over-fitting to noise. d = 5 mm represents a high degree of regularization, as befitting the low-resolution, noisy, pseudo-clinical data: considerably smaller values of d would be preferable when fitting to micro-CT data. 50 ICP iterations were used throughout. Figure 6 illustrates the LAD method's ability to recover the vertical trajectories of the least and most nonplanar cochleas, when applied to the standard pseudo-clinical MDCT data. Sensitivity to the grayscale segmentation threshold appears to be reasonable. Figure 7 shows the non-planarity, reach and duct length results for standard . Clinically relevant measurements. In evaluating the success or otherwise of the model fit, we consider the cochlea's non-planarity, reach and duct length. The non-planarity and reach measurements are made on the first 270 • of the basal turn (red), ignoring the rest of the spiral (blue). Cochlear duct length is measured from the round window to the apex, along the outer (lateral) wall. The basal plane (black dotted line) is defined as the best fit plane to the first 270 • of the outer wall contour. Since we do not detect the modiolus in this study, the 270 • angle is not measured in the usual polar coordinate system defined by the modiolus (origin) and the round window ( 0 • ). Instead, we consider the angle through which the tangent to the outer wall contour has turned with respect to its initial trajectory at the round window. 270 • in this paper's notation corresponds to somewhat more than 270 • in round window/modiolar polar coordinates.  , a). However, estimation of this axis is neither straightforward nor (in this context) helpful, since "rollercoaster" height profiles that go down and then up may nevertheless correspond to planar insertion trajectories that present little risk of trauma. An alternative is to replace the modiolar axis with the normal to the best fit plane through the first 270 • of the basal turn (solid line, b). In this coordinate system, the "height" axis corresponds to deviation from the best fit plane, and it is clear at which point the insertion trajectory becomes nonplanar and potentially traumatic to the cochlear structures. www.nature.com/scientificreports/ www.nature.com/scientificreports/ MDCT, with the gold-standard measurements on the x-axis and the measurements derived from the pseudoclinical images on the y-axis. Similar graphs for next-generation CBCT and poor MDCT are omitted here for concision, but may be scrutinized in Gee et al. 37 : they show the expected degradation in performance with lower resolution, more noisy data. Also as expected, the full model performs significantly better than the leave-one-out model. The specimen with the implanted electrode is identified by half-sized markers and is a frequent outlier, since the scans suffered from beam-hardening artefacts that corrupted the thresholded point cloud. Performance might be improved by preprocessing such scans to suppress these artefacts, for instance using the freely available software of Treece 38 , though true clinical scans would be required to test this hypothesis. The full set of results for all three pseudo-clinical resolutions is summarised in Table 1, omitting the specimen with the implanted electrode. The tabulated numbers are the mean absolute error expressed as a percentage of the gold-standard sample range. Thus, for example, when applied to standard MDCT data, the LAD method produces reach estimates with an expected error of around 12% of the gold-standard range (maximum minus minimum). For comparison, the corresponding errors for the unfitted model (i.e. measured directly on the template without fitting to the individual) are 24.0% for non-planarity, 22.9% for reach and 24.1% for duct length. Table 2 quantifies the segmentation accuracy of the 17 non-implanted specimens thresholded at a grayscale value of 170. For comparison, the average vertex error for the unfitted model is 0.185 mm . While such results are difficult to interpret from a clinical perspective, they do allow tentative comparison with the work of Noble et al. 22 , who achieved average vertex errors of around 0.2 mm (fitted) and 0.27 mm (unfitted). Considerable caution is required though, since Noble et al. 22 segmented the scala tympani, while Table 2 is for the entire otic capsule. There are also likely differences in the way the gold-standard and clinical coordinate systems were aligned. Kjer et al. 23 reported mean surface errors of 0.11 mm for the cochlear scalae.
On the basis of these results, the LAD approach would appear to offer a practicable way to estimate clinically relevant anatomy of the human cochlea from standard, clinical MDCT. Analysis of one cochlea requires around 1 min of expert interaction followed by several minutes of computation. The expert does need to exercise reasonable care when selecting the segmentation threshold: the one outlying result for the LAD method, for non-planarity at a threshold of 180 with the poor MDCT data, was due to this threshold failing to capture part of the outer wall in many of the scans.
At the central threshold of 170, the LAD approach is able to estimate cochlear reach with a mean absolute error of 11.5% of the gold-standard sample range, or 1.16% ± 0.88% (mean ± one standard deviation) of the gold-standard values. This compares favourably with the method of Iyaniwura et al. 10 , where the absolute error in comparable "A-value" estimates was 2.7% ± 2.1% of the gold-standard values. Kjer et al. 23   www.nature.com/scientificreports/ At the same threshold of 170, cochlear non-planarity was estimated with an average absolute error of 10.6% of the gold-standard sample range. This is a novel metric that we suggest might correlate with the risk of insertion trauma, and may be more reliable than the "rollercoaster" classification of Avci et al. 15 , which is sensitive to estimation of the modiolar axis 35 . While van der Jagt et al. 24 demonstrated automatic estimation of cochlear vertical trajectories from clinical CT scans, to the best of our knowledge this is the first study to validate such measurements against micro-CT gold standards.
The core LAD experiments in this paper were all conducted with the same template, d = 5 mm and 50 ICP iterations, these parameters corresponding to convenient Stradview settings that produced visually plausible deformations and apparently full convergence. Figure 8 and the error bars in Fig. 7 give an indication of parameter sensitivity, using a different template that is independent of each test specimen, double the number of iterations, and both smaller and larger values of d. The only material sensitivity appears to be with d, where smaller values allow more local deformation to which the non-planarity metric, though not the general form of the vertical trajectory, is sensitive. The metric is a blunt tool for characterizing trajectories, conceivably penalising a series of small, local ripples more than a single, significant hurdle.
A limitation of this study is the focus on just three specific shape and size descriptors. Two other measurements were attempted: lumen area 37 and the φ angle between the plumb line of the round window and the tangent to the inner wall of the basal turn 18 , of interest since it constrains the initial insertion and bending angles of the electrode. Neither measurement was successful. For example, at the central threshold of 170, the LAD approach was able to estimate lumen area with a mean absolute error of 18.0% of the gold-standard sample range, which barely improves on the corresponding error of 18.8% for the unfitted model. The prognostic differentiating factor between successful and unsuccessful measurements was involvement of the cochlear inner wall in the latter. Clinical CT contrast at the inner wall is significantly worse than at the outer wall, resulting in poor model fitting around the modiolus. We conclude that anatomical measurements involving the cochlear inner wall are currently infeasible with this methodology.
A further limitation of this study is the use of pseudo-clinical data for the low-resolution model fitting. Real MDCT scans of the temporal bones would arguably have provided a more sound basis for the work, but they were not available. That said, real MDCT data is no panacea: a dissected temporal bone imaged in a clinical MDCT scanner would not appear identical to the same bone scanned intact in a living human being. A reassuring indicator of the validity of the present study is the failure to estimate lumen area or any metric involving identification of the cochlear inner wall.
In comparison with the LAD method, the performance of the SSM approach was disappointing. It was a failure to generalise that limited the SSM's efficacy in the present study, as evidenced by the relative performance of the full and leave-one-out models. Improved performance may be achievable through more sophisticated grayscale modelling 23 , or by using more training data, or by constraining the shape coefficients to their variation in the training sample, or by limiting the model to the cochlea alone 22,23 . That said, a benefit of including the canals is to leverage the posterior-anterior canal bifurcation as a readily identifiable landmark for initial model positioning.

Conclusions
We have demonstrated a simple, rapid and freely available technique for estimating cochlear morphology from pseudo-clinical MDCT scans. Average vertex errors are comparable with the state of the art, as are estimates of cochlear size and duct length. A further contribution of this study is an enhanced understanding of the cochlea's vertical trajectory, leading to a novel metric for characterizing the non-planarity of the basal turn. The www.nature.com/scientificreports/ non-planarity metric can be estimated from pseudo-clinical scans with an average absolute error of 10.6% of the gold-standard sample range. The hope is that these techniques will perform equally well with true clinical scans, and may one day assist in personalized implant selection and surgical planning, in the same way that similar methods have already been shown to improve implant programming 39 .

Data availability
Stradview, wxRegSurf and wxDicom are available for free download: links are provided in the text. The micro-CT scans are not publicly available, since unrestricted publication would breach the terms of the donors' consent.
Reasonable requests for sharing of this data can be made through A.H.G.