Compact-Morphology-based Poly-Metallic Nodule Delineation

Poly-metallic nodules are a marine resource considered for deep sea mining. Assessing nodule abundance is of interest for mining companies and for monitoring potential environmental impact. Optical seafloor imaging allows poly-metallic nodule abundance to be quantified at spatial scales from centimetres to square kilometres. Towed cameras and diving robots acquire high-resolution imagery that allows individual nodules to be detected and their sizes to be measured. Spatial abundance statistics, e.g. seafloor coverage in percent and the nodule size distribution, can be computed from these size measurements. Detecting nodules requires segmenting nodule pixels from pixels showing the sediment background. Semi-supervised pattern recognition has been proposed to automate this task. Existing nodule segmentation algorithms employ machine learning to train a classifier that segments the nodules in a high-dimensional feature space. Here, a rapid nodule segmentation algorithm is presented. It omits computationally intensive feature-based classification and employs image processing only, which keeps the algorithm simple and fast. It exploits a nodule compactness heuristic to delineate individual nodules. The algorithm has successfully been applied to different image data sets, acquired by different cameras and camera platforms and under varying illumination conditions. The successful analysis of these data sets indicates that the method is broadly applicable.

Poly-metallic nodules (PMNs, also referred to as manganese nodules) are a marine mineral resource. They contain relevant concentrations of copper, nickel and cobalt 1 . Most PMN research and resource assessment focusses on the deep abyssal plains of the eastern Pacific Ocean. Economically interesting areas were found within the Clarion Clipperton Zone (CCZ), where several countries are exploring PMN claims, i.e. licence areas granted to these countries through the International Seabed Authority.
International research projects are investigating aspects of nodule mining (Ecological Aspects of Deep-Sea Mining: https://jpio-miningimpact.geomar.de; Managing Impacts of Deep-Sea Resource Exploitation: https://www.eu-midas.net/). The current focus of these research projects lies on environmental aspects, e.g. epifauna associated with PMNs, mining-induced habitat disturbances and the resilience of the ecosystem to this anthropogenic pressure 2,3 .
Quantitative measurements of PMN abundance are required for various applications. Individual nodule sizes (in cm²) provide basic measurements that can be aggregated into descriptive spatial statistics. Examples are PMN density (nodules/m²) and seafloor coverage (in %), which can be related to environmental parameters like faunal densities or chemical gradients.
Quantitative nodule information is traditionally provided by two methods: physical seafloor sampling, e.g. using box cores, and hydro-acoustic sensors, e.g. Multi-beam Echo Sounders (MBES) or Side Scan Sonar (SSS). Physical sampling by box cores provides precise size measurements of individual PMNs but covers only a minimal area. MBES and SSS provide high areal coverage but only minimal quantitative resolution.
It is possible to correlate hydro-acoustic data with physical samples for specific PMN abundances, but only if nodules are frequent and homogeneously distributed. Box cores can provide misleading data in heterogeneously populated areas and might even return empty samples in sparsely populated areas despite PMNs actually occurring there.
Optical imaging provides a bridge technology between these two methods. It features higher quantitative resolution than hydro-acoustics and higher areal coverage than physical sampling (see Fig. 1) 4 . It can be used to measure local PMN heterogeneity as well as regional abundance.
Semantics have to be assigned to pixels to determine nodule quantities in optical imagery. The objects of interest, i.e. PMNs, have to be segmented from the sediment background, which makes this semantic segmentation a binary task: each pixel of an image needs to be assigned to either the PMN class ω_0 or the sediment background class ω_1.
After the segmentation, individual PMNs need to be delineated. Like physical sampling, this delineation provides nodule sizes in cm². By aggregating those size measurements, imaging provides spatial nodule abundance statistics for large seafloor areas, like hydro-acoustic mapping. Examples of spatial statistics are size distributions, seafloor percent coverage and the number of nodules per area.
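The aggregation step can be made concrete with a short Python sketch. It assumes that the delineated nodule sizes (in cm²) and the seafloor footprint of the image or tile (in m²) are already known; the function name `abundance_statistics` and the example values are illustrative and are not taken from the CoMoNoD code.

```python
import math
import statistics


def abundance_statistics(nodule_areas_cm2, footprint_m2):
    """Descriptive PMN statistics for one image or one seafloor tile.

    nodule_areas_cm2: sizes of the delineated nodules in cm^2
    footprint_m2:     seafloor area covered by the image or tile in m^2
    """
    footprint_cm2 = footprint_m2 * 1e4  # 1 m^2 = 10,000 cm^2
    n = len(nodule_areas_cm2)
    stats = {
        "density_per_m2": n / footprint_m2,  # nodules per m^2
        "coverage_percent": 100.0 * sum(nodule_areas_cm2) / footprint_cm2,
    }
    if n:
        median = statistics.median(nodule_areas_cm2)
        stats["median_size_cm2"] = median
        # radius of a circle with the median area, for intuition
        stats["median_radius_cm"] = math.sqrt(median / math.pi)
    return stats


print(abundance_statistics([20.0, 48.0, 60.0], footprint_m2=1.8))
```

Applied per image or per grid tile, the same function yields the spatial abundance statistics described above.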
Traditionally, manual image annotation is conducted to add semantic information to images and specific tools for manual annotation of underwater imagery have been developed (e.g. Squidle or BIIGLE 6,7 ). In manual annotation, human experts inspect images and perform two steps: detection and classification of objects of interest. Detections are quantified by geometrical markers placed on top of the images (e.g. rectangles, polygons). Classifications are quantified by category names (e.g. "nodule").
Classification is easy for the binary nodule case (ω_0 or ω_1); detection, however, is complicated. Nodules are embedded within the sediment, and the transition between the two classes is frequently gradual. Delineating nodules manually cannot be achieved in a pixel-perfect way, and early methods used percent coverage to measure nodule abundance in images 8,9 . These coverage measurements have to be seen as subjective inspections rather than objective annotations. Also, all manual image interpretation (inspection and annotation) is prone to bias by human factors like fatigue. Several studies showed that manual underwater image annotation is an error-prone task [10][11][12], as it is in other image analysis domains 13 .
Pattern Recognition (PR) has been proposed to reduce human observer bias. PR-based image analysis methods are usually developed for imagery acquired in air. Their application to in-water imagery is complicated by the underwater environment due to physical factors like scattering, wavelength-dependent light absorption and inhomogeneous illumination. Biological factors add turbidity, marine snow and biofouling of image acquisition gear as further challenges. Some underwater imagery can be pre-processed to make in-air PR methods applicable 14 .
Successful underwater application of PR has been demonstrated for specific use cases. Fish were classified by a deep neural network 15 and by manually tuning shape features 16 . Nematode biomass was assessed by manually tuning intensity thresholds 17 , and scallop abundance has been quantified using AdaBoost 18 . General-purpose megafauna detection has been approached using rich feature representations and Support Vector Machines 11 .
Efforts to automate PMN segmentation have emerged as well. Because of the aforementioned challenges of underwater imaging, however, traditional segmentation methods like region growing 19 , pyramid linking 20 , normalized cuts 21 , mean shift 22 and Otsu thresholding are not directly applicable 23 . Otsu thresholding, for example, selects the threshold that maximises the between-class variance of the grey-level histogram, i.e. it uses only the colour frequencies in an image and does not take the spatial distribution of colours into account.
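A minimal Python sketch of Otsu's method, written from its textbook definition rather than from any of the cited implementations, makes this limitation explicit: two images with identical grey-value histograms but different pixel arrangements receive the same threshold.

```python
import random


def otsu_threshold(pixels, levels=256):
    """Grey level t that maximises Otsu's between-class variance.

    Class 0 holds values <= t, class 1 values > t. Only the histogram is
    used, so the pixel positions have no influence on the result.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(g * n for g, n in enumerate(hist))
    w0 = sum0 = 0  # pixel count and intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class empty: no valid split
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # up to a constant factor
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Same grey values, once as two clean halves and once randomly scattered.
halves = [40] * 500 + [200] * 500
scattered = halves[:]
random.Random(0).shuffle(scattered)
print(otsu_threshold(halves), otsu_threshold(scattered))  # prints: 40 40
```

Scattering the pixels turns a clean two-region image into noise, yet the threshold does not change; a criterion based on co-occurrences of neighbouring pixels would distinguish the two cases.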
Successful PMN segmentation methods are characterised by varying manual annotation efforts and long computational runtimes that exceed the image acquisition time. An early approach to assess PMNs provides seafloor percent coverage only 9 . This approach requires few manual annotations as input for an unsupervised pixel clustering. A more sophisticated method implements manual annotation of clusters in a feature space 24 . Parts of this 'pixel clustering' method have been sped up using high-performance computing techniques 25 . Other parts were fully automated, at increased computational cost, based on a compactness criterion 23 .
All methods mentioned above were tuned for specific image data sets. Those data sets are characterised by a particular illumination pattern, prevalence of backscatter and camera view angle. One method was successfully linked to hydro-acoustic measurements and contributed to PMN abundance assessment on a spatial scale larger than 1,000 m² 26 .
Here, the Compact Morphology-based Nodule Delineation (CoMoNoD) method is presented. It combines the advantages of the above-mentioned methods while overcoming some of their individual shortcomings. CoMoNoD gains speed by omitting time-consuming feature computation and machine learning. It requires no manual image annotation, applies data-driven parameter tuning and is computationally optimised. CoMoNoD maximises the between-class contrast using a compactness criterion similar to the one proposed in 23 . As in the Otsu method, a colour threshold is determined; in CoMoNoD, however, the threshold is based on the spatial colour distribution within the image. The algorithm is implemented in a GPU-optimised way to allow rapid image processing. Images acquired at 1 Hz can be processed in realtime on one computer. This provides PMN abundance assessments shortly after image acquisition and thus helps scientists, e.g., to pinpoint subsequent sampling locations during research cruises based on the determined spatial nodule statistics.

Material
CoMoNoD was applied to two diverse image sets, I^(1) and I^(2), showing poly-metallic nodules. Both image sets were acquired in the deep sea of the Pacific Ocean. The images show a vertical view down onto the seafloor (see Fig. 2).
Image set I^(1) (|I^(1)| = 34,200) was acquired by GEOMAR using the Deep Survey Camera 5 on board AUV Abyss. Images were acquired in the DISCOL experimental area of the Peru Basin (station SO242-1_083_AUV10 27 ). The imaged area was chosen to reinvestigate a benthic disturbance experiment conducted in 1989 28 . I^(1) is a subset of a much larger AUV image collection acquired during the Ecological Impacts of Deep Sea Mining cruises SO239 29 and SO242-1 27 . Image and metadata are available through OSIS (https://portal.geomar.de) and the web-based image annotation software DIAS (https://dias.geomar.de). The images in I^(1) were acquired from an average altitude of 7.5 m above the seafloor. A fisheye lens with a 90° field of view was used. Fisheye undistortion was applied to create rectified images of 4096 × 3072 px (width × height), with a resolution of ca. 1 px/cm. These rectified images were then analysed with CoMoNoD.
Image set I^(2) (|I^(2)| = 88,630) was acquired by the National Oceanography Centre Southampton (NOCS) in an Area of Particular Environmental Interest (APEI No. 6) in the CCZ, using AUV Autosub6000 30 . The images in I^(2) were acquired from an average altitude of 3 m. They feature a footprint of ca. 1.8 m² and a resolution of ca. 16 px/cm.
The segmentation of nodules and sediment is treated as a binary task for I^(1) and I^(2). As less than 3% of the image pixels show other objects like fauna, these objects are neglected in the analysis. Both image sets come with metadata for each image, including latitude and longitude for geo-referencing as well as altitude to compute image footprints in m².

Method
The proposed CoMoNoD method consists of two phases: 1) contrast maximisation (see Figs 3 and 4) and 2) nodule delineation (see Fig. 5). The core heuristic, namely that PMNs are elliptical, mostly convex objects, is exploited in both phases. All image processing is conducted using the OpenCV C++ library, and GPU acceleration is used where applicable.
Contrast maximisation. Each image I_i of an image set I is first filtered with a 3 × 3 px median filter to remove sensor noise and small artefacts. Artefacts can be caused by floating particles like marine snow or by shell fragments on the seafloor. The images are then scaled with the factor $s_i = \sqrt{A_i / \bar{A}}$, where A_i is the footprint of image I_i and $\bar{A}$ is the median image footprint in the image set I. Cubic interpolation is used for scaling, as it is the most accurate interpolation method in the GPU-accelerated part of OpenCV. Scaling leads to a uniform px/cm ratio within the scaled image set.
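The scaling step can be sketched as follows, assuming that all images in a set share the same pixel dimensions and a 4:3 aspect ratio (both assumptions of this sketch). Under that assumption, scaling an image with footprint A_i by the linear factor √(A_i/Ā) gives it the px/cm ratio of an image with the median footprint Ā. The footprints used below are hypothetical.

```python
import math
import statistics


def scale_factors(footprints_m2):
    """Linear scale factor per image so that every image gets the px/cm
    ratio of an image with the median footprint (all images assumed to
    have the same pixel dimensions)."""
    a_med = statistics.median(footprints_m2)
    return [math.sqrt(a / a_med) for a in footprints_m2]


def px_per_cm(width_px, footprint_m2, aspect=4 / 3):
    """Horizontal resolution, assuming a rectangular footprint with the
    given width/height ratio (footprint = width * height)."""
    width_cm = math.sqrt(footprint_m2 * aspect) * 100.0
    return width_px / width_cm


footprints = [1.5, 1.8, 2.4]  # hypothetical footprints in m^2
factors = scale_factors(footprints)
for a, s in zip(footprints, factors):
    print(f"{a} m^2: {px_per_cm(4096, a):.2f} px/cm -> {px_per_cm(4096 * s, a):.2f} px/cm")
```

Images with a larger footprint are enlarged (factor > 1) and those with a smaller footprint are shrunk, so that afterwards all images show the same number of pixels per centimetre.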
Scaled images are smoothed by a Gaussian filter of 3 × 3 px size. Afterwards, they are colour corrected using the fSpice algorithm (see Figs 3b) and 6b)) 11 . This data-driven colour correction method removes an illumination cone from individual images and equalises the colour histograms of all images within an image set. The algorithm was developed to reconstruct the natural appearance of the seafloor; here, fSpice is applied in a way that increases the contrast between nodules and the sediment. The colour corrected images are then converted to 8 bit grey-scale.
Each grey-scale image is transformed to a binary image B_i to maximise the contrast, using an image-specific threshold t_i (see Fig. 3c)). To determine t_i, a grey level co-occurrence matrix G_i(·) is first computed, considering the four Moore-neighbouring pixels at 1 px distance. At this point the compactness heuristic is exploited: few co-occurrences of grey levels below t_i (likely PMNs, i.e. ω_0) and above t_i (likely background, i.e. ω_1) are targeted. Hence, for each candidate threshold t, a compactness value γ(t) is computed from G_i. Next, the first derivative γ′(t) is computed and its peak position p_i is determined. The threshold t_i is then selected based on p_i and θ_γ, one of the CoMoNoD input parameters.
The rationale for selecting t_i in this way is as follows. The compactness γ(p_i) decreases when the binary image B_i becomes noisy. This is the case when background pixels are erroneously added to the PMN class ω_0 (see Figs 3d) and 4), which happens when t_i is chosen too high. The maximum compactness change γ′(p_i) thus serves as the upper limit for the threshold value t_i.
Figure 4. Compactness γ(·) (dashed, grey curve) and its first derivative γ′(·) (solid, black curve) for one image from I^(1).
Nodule delineation. After contrast maximisation, the binary image B_i is subject to multiple blob detection, splitting and fusion steps. These steps delineate individual nodules in the second phase of CoMoNoD (see Fig. 5).
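The co-occurrence statistics used in the contrast-maximisation phase can be illustrated with a small Python sketch. It builds a symmetric grey level co-occurrence matrix over the four 1 px offsets and, for a given threshold, reports the share of neighbouring pixel pairs that straddle that threshold. This share is an illustrative stand-in for the quantity the compactness heuristic aims to keep small; it is not the exact CoMoNoD compactness γ. The helper names `glcm` and `cross_fraction` are hypothetical.

```python
def glcm(img, levels=256):
    """Symmetric grey level co-occurrence matrix for pixel pairs at 1 px
    distance in four directions (0, 45, 90 and 135 degrees), which together
    cover all eight Moore neighbours of a pixel."""
    g = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    a, b = img[y][x], img[yy][xx]
                    g[a][b] += 1  # count each pair in both orders
                    g[b][a] += 1
    return g


def cross_fraction(g, t):
    """Share of neighbouring pixel pairs that straddle threshold t, i.e. one
    pixel <= t (candidate PMN) and one pixel > t (candidate background).
    Illustrative stand-in only, not the CoMoNoD compactness value."""
    levels = len(g)
    total = sum(map(sum, g))
    cross = sum(g[a][b] for a in range(t + 1) for b in range(t + 1, levels))
    return 2 * cross / total  # g is symmetric: each pair appears twice in total


tiny = [[10, 10, 200, 200],
        [10, 10, 200, 200]]
g = glcm(tiny)
print([round(cross_fraction(g, t), 3) for t in (5, 100, 250)])  # prints: [0.0, 0.25, 0.0]
```

Unlike a histogram-only criterion, this quantity depends on where the dark and bright pixels are: a noisy binarisation produces many straddling pairs, while compact blobs produce few.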
First, the distance image D_i, which has the same size as B_i, is computed. Each pixel value in D_i encodes the shortest distance to a pixel in B_i assigned to class ω_1 (see Fig. 5a)).
Local maxima are determined in D_i, i.e. pixels whose values exceed those of their Moore-neighbouring pixels. These local maxima constitute the initial nodule candidate centroids. Neighbouring local maxima are filtered, and only the highest peak within a radius of 5·θ_r px is retained (see Fig. 5b)). θ_r is another input parameter of CoMoNoD; it represents the minimum nodule radius in pixels and depends on the image resolution.
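A brute-force Python sketch of these two steps, computing the distance image and thinning its local maxima within a 5·θ_r radius, is given below. It is written for clarity rather than speed, treats a pixel as a maximum when no Moore neighbour is larger, and processes the highest peaks first; these details, the function names and the toy mask are assumptions of the sketch.

```python
import math


def distance_image(b):
    """D_i: Euclidean distance from each w0 pixel (value 1) to the nearest
    w1 pixel (value 0); w1 pixels get 0. Brute force for clarity; assumes
    at least one w1 pixel exists."""
    h, w = len(b), len(b[0])
    bg = [(y, x) for y in range(h) for x in range(w) if b[y][x] == 0]
    return [[0.0 if b[y][x] == 0 else
             min(math.hypot(y - by, x - bx) for by, bx in bg)
             for x in range(w)] for y in range(h)]


def candidate_centroids(d, theta_r):
    """Local maxima of d (no Moore neighbour is larger), thinned so that only
    the highest peak is kept within a radius of 5 * theta_r px."""
    h, w = len(d), len(d[0])
    maxima = []
    for y in range(h):
        for x in range(w):
            if d[y][x] <= 0:
                continue
            neighbours = [d[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
            if all(d[y][x] >= v for v in neighbours):
                maxima.append((d[y][x], y, x))
    kept = []
    for value, y, x in sorted(maxima, reverse=True):  # highest peaks first
        if all(math.hypot(y - ky, x - kx) > 5 * theta_r for ky, kx in kept):
            kept.append((y, x))
    return kept


# Two hypothetical disc-shaped nodules (radius 5 px) in a 30 x 60 px mask.
mask = [[1 if min((y - 15) ** 2 + (x - cx) ** 2 for cx in (12, 45)) <= 25 else 0
         for x in range(60)] for y in range(30)]
print(candidate_centroids(distance_image(mask), theta_r=2))
```

With θ_r = 2 the suppression radius (10 px) is smaller than the spacing of the two discs, so both centres survive; with a much larger θ_r the second disc's peak would be suppressed.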
Each pixel in B_i that is set to ω_0 needs to be assigned to one of these peaks. Here, it is again assumed that PMNs are compact objects of elliptical shape. The pixels set to ω_0 form connected pixel clusters, or blobs.
Bottlenecks in the blob contours are evaluated to separate adjacent PMNs in the binary image (see red lines in Fig. 5b)). Each blob is iteratively split into virtual blobs to find the optimal separation of the peaks within the blob. The iterative splitting is discontinued when each peak is contained in its own blob (see Fig. 5b)).
All pixel blobs smaller than π·θ_r² pixels are fused with their largest neighbour, provided that this neighbour is closer than 2·θ_r pixels. Fusion avoids over-segmentation. Small blobs are discarded if no such neighbour exists.
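The fusion rule can be sketched in Python as follows. The source does not specify how the distance between blobs is measured or in which order small blobs are processed; this sketch assumes the smallest pixel-to-pixel distance and handles the smallest blob first. Blobs are represented as sets of (y, x) pixel coordinates, and the example blobs are hypothetical.

```python
import math


def fuse_small_blobs(blobs, theta_r):
    """Merge every blob smaller than pi * theta_r**2 px into its largest
    neighbour closer than 2 * theta_r px; drop it if there is none.

    blobs: list of sets of (y, x) pixel coordinates. The blob distance is
    taken as the smallest pixel-to-pixel distance, and the smallest blob is
    handled first; both are assumptions of this sketch.
    """
    min_area = math.pi * theta_r ** 2
    max_gap = 2 * theta_r
    blobs = [set(b) for b in blobs]  # work on copies

    def gap(a, b):
        return min(math.hypot(ay - by, ax - bx) for ay, ax in a for by, bx in b)

    while True:
        small = [i for i, b in enumerate(blobs) if len(b) < min_area]
        if not small:
            return blobs
        i = min(small, key=lambda k: len(blobs[k]))
        near = [j for j in range(len(blobs))
                if j != i and gap(blobs[i], blobs[j]) < max_gap]
        if near:
            j = max(near, key=lambda k: len(blobs[k]))  # largest neighbour
            blobs[j] |= blobs[i]
        del blobs[i]  # fused into a neighbour or discarded


big = {(y, x) for y in range(5) for x in range(5)}   # 25 px nodule
fragment = {(2, 7), (2, 8)}                         # 2 px, 3 px away
speck = {(20, 20), (20, 21), (20, 22)}              # 3 px, isolated
print([len(b) for b in fuse_small_blobs([big, fragment, speck], theta_r=2)])  # prints: [27]
```

With θ_r = 2 the size limit is about 12.6 px and the distance limit 4 px: the nearby fragment joins the large nodule, while the isolated speck is discarded.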
Nodule candidate blobs are delineated by their convex hull to account for gaps in the nodule blobs. Gaps can be caused by sediment coverage or epifauna (see Fig. 5c)). Each convex hull is finally fitted with an ellipse (see Fig. 5d)). The areas of these ellipses provide individual nodule sizes in cm², from which descriptive nodule statistics can be computed. The main axes of each ellipse also provide measurements of two of the nodule axes.
Figure 6 shows an example of the delineation result for I^(1) (in d)). It also shows intermediate results of the fSpice pre-processing and the binarisation step. An example delineation for I^(2) is shown in Fig. 5d). By applying CoMoNoD to all 34,200 images in I^(1), nodule abundance was assessed within a contiguous seafloor area of ca. 500 m × 400 m. To render the map (see Fig. 7), nodule abundance was quantised to a grid with 0.25 m × 0.25 m resolution. To this end, each image was subdivided into 25 cm × 25 cm tiles, and CoMoNoD's descriptive statistics were computed for each tile.
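The final ellipse fit can be illustrated with a moment-based fit in Python: the ellipse is chosen to have the same centroid and second central moments as the (hull-filled) pixel blob. This is a swapped-in technique for illustration; the source states only that OpenCV is used, not which fitting routine. Converting the area from px² to cm² assumes a known, uniform px/cm ratio; the 16 px/cm below is a hypothetical value.

```python
import math


def moment_ellipse(pixels):
    """Ellipse with the same centroid and second central moments as a pixel
    blob. Returns (cy, cx, a, b, area_px) with semi-axes a >= b in px."""
    n = len(pixels)
    cy = sum(y for y, _ in pixels) / n
    cx = sum(x for _, x in pixels) / n
    syy = sum((y - cy) ** 2 for y, _ in pixels) / n
    sxx = sum((x - cx) ** 2 for _, x in pixels) / n
    sxy = sum((y - cy) * (x - cx) for y, x in pixels) / n
    # eigenvalues of the 2 x 2 covariance matrix
    mean = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    l1, l2 = mean + spread, max(mean - spread, 0.0)
    # a uniformly filled ellipse has variance (semi-axis)^2 / 4 along each axis
    a, b = 2 * math.sqrt(l1), 2 * math.sqrt(l2)
    return cy, cx, a, b, math.pi * a * b


# Hypothetical blob: filled ellipse with semi-axes 30 px and 10 px.
blob = [(y, x) for y in range(-12, 13) for x in range(-32, 33)
        if (y / 10) ** 2 + (x / 30) ** 2 <= 1]
cy, cx, a, b, area_px = moment_ellipse(blob)
px_per_cm = 16.0  # hypothetical resolution
print(f"a={a:.1f} px, b={b:.1f} px, size={area_px / px_per_cm ** 2:.2f} cm^2")
```

The recovered semi-axes are close to the 30 px and 10 px used to draw the blob, and the ellipse area matches the pixel count to within a few percent.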

Results
The average nodule number per square metre for the entire area is Φ_N = 2.7 nodules/m², the average seafloor coverage is …, and Φ_{s,0.5} = … cm² (corresponding to a radius of 3.9 cm). Figure 8 shows the nodule size distribution and the coverage distribution.
More than 2.6 million nodules were delineated in the surveyed area. They cover a total area of 8.8 thousand square metres. The nodule volume can be estimated by assuming that the nodules are discoidal rotational ellipsoids; together, they would form a contiguous manganese nodule cube of 6.6 m edge length. Assuming a metal composition as reported in the literature (cobalt: 0.26%, nickel: 1.2%, copper: 1%) 33 , the contained metal amounts can be deduced. The nodule cube would have an estimated worth of ca. 1 million EUR at current market prices (cobalt: …).
Further nodule statistics are being computed for the remainder of the image set to which I^(1) belongs (cruises SO239 and SO242-1, see above). Results will be presented in a follow-up publication.
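The cube estimate can be reproduced in principle with a few lines of Python. The source does not state how the two measured ellipse axes map onto the three ellipsoid axes; this sketch assumes that the fitted ellipse is rotated about its minor axis, giving an oblate (discoidal) spheroid with semi-axes a, a and b. The semi-axes in the example are hypothetical.

```python
import math


def nodule_volume_cm3(a_cm, b_cm):
    """Volume of a discoidal (oblate) rotational ellipsoid, assuming the
    fitted ellipse is rotated about its minor axis: semi-axes a, a, b."""
    return 4.0 / 3.0 * math.pi * a_cm ** 2 * b_cm


def cube_edge_m(volumes_cm3):
    """Edge length in m of a cube holding the summed nodule volume."""
    total_m3 = sum(volumes_cm3) * 1e-6  # 1 m^3 = 1,000,000 cm^3
    return total_m3 ** (1.0 / 3.0)


# Hypothetical semi-axes (cm) of three delineated nodules.
volumes = [nodule_volume_cm3(a, b) for a, b in [(3.0, 2.0), (4.5, 3.0), (2.0, 1.5)]]
print(f"{sum(volumes):.1f} cm^3 -> cube edge {cube_edge_m(volumes) * 100:.1f} cm")
```

For orientation, a cube of 6.6 m edge length holds 6.6³ ≈ 287 m³, i.e. roughly 110 cm³ per nodule for 2.6 million nodules.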

Runtime.
CoMoNoD was implemented and tested on a high-end desktop computer equipped with a GeForce GTX 980 Ti GPU, an Intel Xeon E5-1650 CPU and 64 GB RAM. An average runtime of ca. 0.1 s/MPix was observed. The computational complexity of the contrast maximisation is linear in the number of pixels. The complexity of the nodule delineation steps is quadratic in the number of nodule segments, so images showing more nodules lead to longer runtimes.
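Because the observed throughput is roughly constant per megapixel, total runtimes can be estimated with simple arithmetic. The Python sketch below uses the 0.1 s/MPix average together with the image counts and image sizes reported for I^(1) and I^(2); it ignores the dependence of the delineation runtime on the number of nodules per image.

```python
def runtime_hours(n_images, mpix_per_image, s_per_mpix=0.1):
    """Processing time in hours for a constant throughput in s/MPix."""
    return n_images * mpix_per_image * s_per_mpix / 3600.0


# Image counts and sizes as reported for the two image sets.
for name, n, mpix in [("I(1)", 34_200, 20), ("I(2)", 88_630, 5)]:
    print(f"{name}: {runtime_hours(n, mpix):.1f} h")  # prints 19.0 h, then 12.3 h
```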
Images in I^(1) are ca. 20 MPix in size, and the runtime for the entire data set was about 19 hours. Images in I^(2) are ca. 5 MPix in size, resulting in a runtime for the entire data set of about 12 hours. Table 1 shows average runtimes for image set I^(1).

Source Code.
The source code for CoMoNoD is available from the GEOMAR OpenSource Git repository (https://git.geomar.de/open-source/comonod). Code in this repository is maintained to include future algorithmic updates. A snapshot of the code used for this publication has been archived in Pangaea 34 . Additional software required to run the algorithm comprises the OpenCV image processing library (available through http://opencv.org/) and the fSpice algorithm (https://git.geomar.de/open-source/oceancv).

Discussion
So far, it has not been possible to assess variations in PMN abundance over multiple spatial scales in near realtime. Physical sampling provides local abundance measurements: in the case of box corers, the sampled area represents 0.25 m² of the seafloor, so patterns on scales larger than 0.5 m are missed. Ship-based hydro-acoustic data enable the assessment of nodule patterns at scales of several hundred square kilometres, but their beam footprint of ca. 50 m × 50 m misses patterns on smaller scales.
Optical imaging provides a tool to capture PMN abundance variations over several scales. Automated nodule delineation allows quantitative data to be extracted rapidly from those images. Applied as a joint methodology, imaging and automated analysis can assess PMN abundance and abundance variations from centimetre to kilometre scale.

Figure 8. Nodule size distribution (left) and coverage distribution (right) for I^(1).

Table 1. Average runtimes for image set I^(1).
Method | Training [min/MPix] | Training for I^(1) [h] | Application [s/MPix] | Application to I^(1) [h]
CoMoNoD | - | - | 0.… | …
CoMoNoD is currently being applied to image data sets of several hundred thousand photos acquired across the Pacific Ocean. This ongoing study will provide nodule abundance statistics at local, regional and ocean-basin scales.
Statistically assessing the natural heterogeneity requires selecting a sampling size, i.e. the size of the seafloor area quants. In physical sampling, this is restricted to one specific size (e.g. 0.25 m² for box cores). For hydro-acoustics, the minimum sampling size depends on the beam opening angle and the distance to the seafloor (metre scale for AUV-based data, 40-100 m for ship-based data). For image-based methods, the area quant can be selected, and its size will affect the derived descriptive statistics. When a very small area quant is chosen (e.g. 10 cm × 10 cm), nodules will rarely be detected, but when a nodule does occur within the area, the coverage can reach 100%. Conversely, local nodule abundance heterogeneity might be obscured when the area is chosen too large (e.g. 100 m × 100 m), because the aggregated statistics Φ_i then average out those variations. A study on assessing nodule abundance heterogeneity on various scales is in preparation and will provide suggestions for sampling size selection.
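The effect of the area quant size can be explored with a small Python sketch that computes percent coverage per square tile of a binary nodule mask. The mask below is synthetic (randomly placed 4 × 4 px "nodules"); with 1 px tiles every value is either 0% or 100%, and larger tiles progressively smooth the spatial variation while the mean coverage stays the same.

```python
import random
import statistics


def tile_coverage(mask, tile):
    """Percent nodule coverage (mask value 1) in non-overlapping square tiles
    of `tile` px; incomplete tiles at the right/bottom border are skipped."""
    h, w = len(mask), len(mask[0])
    cov = []
    for y0 in range(0, h - tile + 1, tile):
        for x0 in range(0, w - tile + 1, tile):
            hits = sum(mask[y][x] for y in range(y0, y0 + tile)
                       for x in range(x0, x0 + tile))
            cov.append(100.0 * hits / tile ** 2)
    return cov


# Hypothetical 64 x 64 px mask with 40 randomly placed 4 x 4 px "nodules".
rng = random.Random(1)
mask = [[0] * 64 for _ in range(64)]
for _ in range(40):
    y, x = rng.randrange(61), rng.randrange(61)
    for yy in range(y, y + 4):
        for xx in range(x, x + 4):
            mask[yy][xx] = 1

for tile in (1, 4, 16, 64):
    cov = tile_coverage(mask, tile)
    print(f"tile {tile:2d} px: {len(cov):4d} quants, "
          f"mean {statistics.mean(cov):5.1f} %, spread {statistics.pstdev(cov):5.1f} %")
```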
Abundance maps like Fig. 7 provide a subjective impression of the quality of the nodule delineation algorithm. Ground truth information on nodule abundance would be needed to verify the detection accuracy. As manual annotations are lacking for nodule images, and as the quality of manual annotations is disputable, robust verification is currently impossible. Ground truth data from physical sampling could be linked with our image analysis method, yet the natural abundance heterogeneity and the limited accuracy of underwater navigation currently prevent such an assessment. No data set exists in which an image was taken prior to physical sampling to ensure a one-to-one comparison of image-derived and ground truth data. Video material from TV-guided box cores is of too low quality to be analysed quantitatively.
Figure caption: The contrast maximisation step provides good binary images in all cases; sediment coverage creates holes in the third example. Nodule delineation succeeds for the first two examples; for the latter two, only percent coverage could be measured.
To overcome these shortcomings, an imaging survey should be conducted prior to the physical seafloor sampling. This would allow the visual baseline of the undisturbed seafloor to be assessed. The subsequent physical sampling would provide precise ground truth measurements of individual nodules. However, underwater navigation is far from the accuracy of a few tens of centimetres required for a direct comparison. To avoid ambiguities, sampling sites need to be sufficiently distant from each other (>50 m). This will allow each sampling location at the seafloor to be identified based on underwater navigation data. A second imaging survey over the same area would then determine the exact positions of the sampling impact sites using visual navigation.
Frequently occurring fauna can become a challenge for CoMoNoD. Under such conditions, the assumption that these objects do not contribute to the nodule class no longer holds. An example of a misinterpretation is shown in Fig. 9. There, a xenophyophore has been erroneously detected as a PMN, contributing an additive error to the coverage and nodule count measures and likely affecting Φ^i_s as well. Interestingly, it has been shown that CoMoNoD can also be used to detect specific fauna by selecting images with outliers for Φ^i_{s,0.99}. These outliers can be an indicator for large holothurians. However, most fauna are characterised by more complex visual features, and such objects require pattern recognition systems that are more sophisticated than CoMoNoD. Deep sea fauna can be colourful, complex-shaped or textured. Even nodules can present rich morphologies and colour features when imaged at high resolution; in those cases, CoMoNoD can be challenged as well (see below). CoMoNoD should be accompanied by other detection methods when dark fauna represents more than 10% of the pixels. Bright fauna will be assigned to the sediment class and will not skew the nodule statistics. For I^(1) and I^(2), dark fauna contributes less than 3% of the pixels.
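Outlier-based image selection can be sketched in Python as below. It assumes that Φ^i_{s,0.99} denotes the 0.99 quantile of the nodule sizes in image i and flags images whose quantile lies far above the data-set median, using a median-absolute-deviation rule with k = 3. Both the interpretation of the symbol and the outlier rule are assumptions of this sketch, and the sizes are hypothetical.

```python
import statistics


def size_quantile(sizes, q):
    """q-quantile of one image's nodule sizes (linear interpolation)."""
    s = sorted(sizes)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def outlier_images(sizes_per_image, q=0.99, k=3.0):
    """Indices of images whose q-quantile size lies more than k median
    absolute deviations above the median over all images (hypothetical rule)."""
    qs = [size_quantile(s, q) if s else 0.0 for s in sizes_per_image]
    med = statistics.median(qs)
    mad = statistics.median([abs(v - med) for v in qs]) or 1e-9  # avoid /0
    return [i for i, v in enumerate(qs) if (v - med) / mad > k]


# Hypothetical nodule sizes (cm^2) for ten images; the last one contains a
# large dark object that was delineated as a "nodule".
images = [[20, 30, 40, 50 + i] for i in range(9)] + [[20, 30, 40, 900]]
print(outlier_images(images))  # prints: [9]
```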
As binary segmentation and the delineation of convex objects are relevant in other fields too, CoMoNoD could be applied in those contexts as well. Potential fields are microscopy (e.g. for blood cells) and remote sensing (e.g. for permafrost thaw craters).
CoMoNoD was applied to further nodule image data sets (see Fig. 10). These were obtained by different camera systems mounted on different platforms, with different light sources and illumination directions. The data sets are mostly in-situ imagery like I^(1) and I^(2); one data set shows ex-situ images of box core samples acquired on the deck of a research vessel. For all of these data sets, appropriate settings for θ_γ provided binary images with maximised contrast between nodules and the sediment background.
For the two image data sets with the highest pixel resolution, and thus the highest number of pixels per nodule, no appropriate nodule delineation could be achieved. This is likely because the high resolution makes the internal heterogeneity of the nodules visible to CoMoNoD. In such cases, CoMoNoD could be extended by a fallback system using the colour information of surrounding pixels (as described above). This would derive nodule outlines more robustly when the initial delineation does not provide satisfying results. Further studies are needed to determine the limits of the delineation process.
CoMoNoD is governed by the parameters θ_γ and θ_r. An intelligent parameter guessing system could be implemented that would again use the nodule convexity heuristic. This system would compute delineations for various settings of θ_γ and θ_r in a brute-force manner. The most promising setting could then be determined by picking the one that produces the most convex pixel blobs. It would also be possible to analyse multiple images with one parameter setting and to assess the resulting nodule statistics as a quality criterion for the chosen parameters.
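The proposed brute-force parameter search could look like the Python sketch below. It scores each (θ_γ, θ_r) setting by the mean solidity of the resulting blobs (pixel area divided by convex hull area, 1.0 for perfectly convex blobs). Solidity as the convexity measure and the `delineate` callable are assumptions of this sketch; in practice, such a callable would run both CoMoNoD phases with the given parameters.

```python
import itertools


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(p[0] * q[1] - q[0] * p[1] for p, q in pairs)) / 2


def solidity(blob):
    """Pixel count divided by the convex hull area of the pixel squares
    (1.0 for convex blobs, smaller for concave ones)."""
    corners = [(y + dy, x + dx) for y, x in blob for dy in (0, 1) for dx in (0, 1)]
    return len(blob) / polygon_area(convex_hull(corners))


def best_setting(delineate, gammas, radii):
    """Brute-force search for the (theta_gamma, theta_r) pair whose blobs have
    the highest mean solidity. `delineate(theta_gamma, theta_r)` is a
    hypothetical callable returning a list of pixel blobs."""
    def score(params):
        blobs = delineate(*params)
        return sum(map(solidity, blobs)) / len(blobs) if blobs else 0.0
    return max(itertools.product(gammas, radii), key=score)


# Toy stand-in for a delineation run: one setting yields a compact square,
# the other an L-shaped (merged) blob.
square = {(y, x) for y in range(4) for x in range(4)}
l_shape = {(y, 0) for y in range(4)} | {(3, x) for x in range(4)}
results = {(0.5, 3): [square], (0.9, 3): [l_shape]}
print(best_setting(lambda g, r: results.get((g, r), []), [0.5, 0.9], [3]))  # prints: (0.5, 3)
```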
In the current implementation, t_i is estimated for each image independently. An improved CoMoNoD could use the estimate of the previous image to reduce the parameter search space for the current image. When images are acquired in quick succession (e.g. at 1 Hz), consecutive images can be assumed to be similar: the nodule distribution, image illumination and seafloor distance should remain comparable, and hence t_i should be similar.
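A warm-started threshold search can be sketched in Python as follows. The per-image criterion `score` is a hypothetical placeholder (CoMoNoD's actual selection uses the derivative peak p_i and θ_γ), and the search window of ±10 grey levels is an arbitrary choice; the sketch only shows how the candidate range shrinks when the previous image's threshold is reused.

```python
def select_threshold(score, t_prev=None, window=10, levels=256):
    """Grey level maximising `score` (a hypothetical per-image criterion).

    Without a previous estimate the full range 0..levels-1 is searched;
    otherwise only t_prev - window .. t_prev + window (clipped). If the true
    optimum lies outside that window, the result is the best in-window value.
    """
    if t_prev is None:
        candidates = range(levels)
    else:
        candidates = range(max(0, t_prev - window), min(levels, t_prev + window + 1))
    return max(candidates, key=score)


evaluated = []


def toy_score(t):
    """Toy criterion with its optimum at t = 120; records every evaluation."""
    evaluated.append(t)
    return -(t - 120) ** 2


t_first = select_threshold(toy_score)                     # first image: full search
n_full = len(evaluated)
evaluated.clear()
t_next = select_threshold(toy_score, t_prev=t_first - 5)  # next image: warm start
print(t_first, t_next, n_full, len(evaluated))            # prints: 120 120 256 21
```

The warm start evaluates 21 instead of 256 candidate thresholds; the trade-off is that an abrupt change between consecutive images (e.g. a sudden altitude change) can push the optimum outside the window.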
The second phase of CoMoNoD merges pixel blobs where nodules were cut into multiple segments. It also breaks up pixel blobs where nodules are connected to each other in B_i. These steps can fail when large epifauna occurs on a nodule or when sediment cover obscures the nodules. Both steps are steered by the compactness heuristic and θ_r. Using areal feature descriptors, further information about pixel neighbourhoods could be included in the nodule fusion and decision process. This additional information would come at an additional computational cost. Further intelligence could be added to CoMoNoD by making the initial median filter adaptive, i.e. applying the median only where the intensity change exceeds a threshold. These improvements were purposely omitted from CoMoNoD to maintain its rapid data analysis capability.
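The adaptive median filter mentioned above corresponds to what is often called a switching median filter. A minimal Python sketch, with a hypothetical threshold of 30 grey levels, is shown below: isolated specks are removed, while smooth intensity gradients pass unchanged.

```python
import statistics


def adaptive_median(img, thresh=30):
    """3 x 3 switching median filter: a pixel is replaced by the median of its
    neighbourhood only if it differs from that median by more than `thresh`
    grey levels. Border pixels use the part of the window inside the image."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            med = statistics.median_low(window)  # stays an integer grey level
            if abs(img[y][x] - med) > thresh:
                out[y][x] = med
    return out


flat = [[100] * 5 for _ in range(5)]
flat[2][2] = 250                                       # isolated bright speck
ramp = [[10 * x for x in range(5)] for _ in range(5)]  # smooth gradient
print(adaptive_median(flat)[2][2], adaptive_median(ramp) == ramp)  # prints: 100 True
```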
For new users of the algorithm, it is easy to explore the influence of different settings for θ_γ and θ_r. A graphical user interface with two sliders would enable users to rapidly create nodule delineations for visual inspection. No feature computation, data normalisation or model training is necessary, and CoMoNoD can be applied directly to single images. When promising settings have been determined, they can be applied to the remainder of the data set in an automated way.

Conclusion
The proposed CoMoNoD algorithm segments poly-metallic nodules from the sediment background, delineates individual nodules and extracts quantitative abundance data from optical seafloor imagery. It links to the traditional methods for nodule abundance assessment, which use physical sampling and hydro-acoustic mapping.