Underwater image restoration with Haar wavelet transform and ensemble of triple correction algorithms using Bootstrap aggregation and random forests

This paper presents both a new strategy for traditional underwater image restoration using the Haar wavelet transform and a new learned model that generates an ensemble of triple correction algorithm labels from the histogram quadrants' cumulative distribution features instead of generating pixel intensities. The Haar wavelet transform implements our tentative dynamic stretching plan: it is applied to the input image and to its contrast-stretched version to generate the degraded wavelet coefficients, which are blended using a Gaussian pyramid of the saliency weights to restore the original image. The ensemble of triple corrections applies three color correction algorithms sequentially to the degraded image for restoration. The ensemble of algorithms entails the superposition effect of red channel mean shifting, global RGB adaptation, global luminance adaptation, global saturation adaptation, luminance stretching, saturation stretching, contrast stretching, adaptive Gamma correction for the red spectrum, even-to-odd middle intensity transference using a look-up table, green-to-red spectrum transference using histogram equalization, local brightening, Dark Channel Prior, fusion restoration, and our Haar wavelet transform restoration. The source is available at https://github.com/vahidr213/Underwater-Image-Restoration-And-Enhancement-Collection.

Various researchers have recently proposed mathematical methods and ideas to address color losses in underwater images, and the many works in the machine vision literature have approached the underwater image restoration problem from different perspectives. Underwater image restoration is by no means trivial; it is as challenging as most problems in machine vision and image processing.
The primary factor in underwater image degradation is the high attenuation of the light radiated from the surfaces of objects 1 . An early and famous single-image haze removal method for outdoor images, the Dark Channel Prior (DCP), introduced the assumption that at least one of the RGB channels is dark in any non-sky patch of the image. The Underwater Dark Channel Prior (UDCP) 2 is derived from the DCP 3 for underwater images; observing that the red channel is almost always dark underwater, it uses only the green and blue channels to estimate the medium transmission matrix.
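As a concrete illustration, the dark channel on which DCP relies can be sketched as below (a minimal NumPy sketch; the function name and patch size are our own choices, and a practical pipeline would go on to estimate the background light and transmission from this map):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel: per-pixel minimum over the RGB channels, followed by
    a minimum over a local patch.

    img: float array of shape (H, W, 3) with values in [0, 1].
    DCP assumes this map is near zero for haze-free outdoor patches;
    UDCP would instead take the minimum over only the G and B channels.
    """
    # Minimum across the color channels first.
    min_rgb = img.min(axis=2)
    h, w = min_rgb.shape
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    out = np.empty_like(min_rgb)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out
```

A dark pixel in any one channel lowers the dark channel over its whole neighbourhood, which is what makes the prior sensitive to haze-free dark structures.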
From the application point of view, 3D reconstruction of the seafloor encounters objects in images that require automatic true-color recovery, since the degradation effects can limit the ability to assess the changes that happen to underwater organisms such as those living in the benthic zone 4 .
The Sea-thru method 5 examined the dark channel prior in detail, finding the optimized medium transmission matrix by estimating the range-dependent attenuation coefficient using local space average color. An alternative to the dark channel prior, the red channel prior 6 , is a rearrangement of the original DCP equation that also refines the estimated transmission map with a guided filter instead of a matting technique. Recently, the transmission map of the DCP method for the red channel has been estimated 7 by applying adaptive non-convex and non-smooth variation to a rough initial transmission map, using this optimized transmission map to obtain the red global background light via thermal exchange optimization, and then recombining the refined red channel with the green and blue channels to form the restored image. As mentioned before, a complex image formation model has been proposed 1 that has drawn attention and is utilized in 8 for depth and background light estimation. In 9 , …

Independent Author, Ahvaz, Iran. email: v.rowghanian@gmail.com

Proposed method
In this paper, our contributions to the problem of underwater image restoration include a traditional method as well as a learned model using new features. Traditional methods solve portions of the whole problem remarkably well. We used an alternative single-image solution that blends the Haar wavelet coefficients of the raw image with those of its contrast-stretched version (static tolerance) to convey a sense of dynamic contrast stretching. Our approach is applied after the multi-scale fusion restoration 10 to further improve the quality of the restored image. The flowchart of this approach is shown in Fig. 1.
On the other hand, a method that generalizes to arbitrary degradation patterns requires trained models. In this paper, we use the histogram quadrants' cumulative distribution as a new feature to characterize each image instead of using pixel intensities. The features are then learned using Bootstrap Aggregation and Random Forests. We have trained three models separately. The responses of the models are numeric labels instead of pixel intensities. Each model generates a numeric label corresponding to a specific color correction algorithm, and the three selected algorithms are applied sequentially to restore the original image.
Input images. The input images of the fusion process can generally differ, because fusion combines multiple sources to preserve their significant features. In this regard, contrast stretching is one of the most popular initial color correction tools and has come to outperform significant white balancing methods. Numerous white balancing methods have been suggested, but none of them is experimentally appropriate in all underwater scenes. The Gray-World approach of Buchsbaum 25 is a representative example of white balancing methods, yet it creates color deviation by introducing reddish artifacts where the appearance is overall blue. The following method, which is simple and efficient and creates fewer red artifacts, is used to adjust the white balance initially. First, a three-element vector ratio is defined, where K is a constant (e.g. 0.005) that keeps the vector of probability ratios in the interval [0, 1]. The three cumulative probabilities expressed as ratio, and their complements (1 − ratio), determine the percentages of the data that should be saturated to the lowest and highest values, respectively, in each of the three RGB channels.
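As an illustration of this percentile-based saturation step, the sketch below takes the three per-channel ratios as given (the ratio vector itself follows the paper's K-based definition, which is not reproduced here; the function name is ours):

```python
import numpy as np

def stretch(img, ratios):
    """Per-channel contrast stretch. `ratios` is a three-element vector:
    for channel c, a fraction ratios[c] of the darkest pixels and a
    corresponding fraction of the brightest pixels are saturated, and
    the remaining range is rescaled to [0, 1]."""
    out = np.empty_like(img)
    for c, r in enumerate(ratios):
        ch = img[..., c]
        # quantiles bound the values that survive the saturation
        lo, hi = np.quantile(ch, [r, 1.0 - r])
        out[..., c] = np.clip((ch - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return out
```

With equal small ratios this reduces to the familiar symmetric stretch; unequal ratios let a channel with mass piled at one end be saturated more aggressively on that side.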
Haar wavelet inputs. The Haar wavelet transform is fed with the raw input image as well as a 0.5% contrast-stretched version of the raw image obtained through the method described above. The input image for the Haar transform is resized to 640 × 640 so that the Haar pyramid dimensions equal those of the Gaussian pyramid of the blending weights (W 1 , W 2 ), and so that the dimensions are powers of 2, which facilitates the wavelet transform. The input image can also be resized to 1024 × 1024 to increase the accuracy.
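A single level of the 2-D Haar transform and its inverse can be sketched as below (an orthonormal NumPy variant assuming even image dimensions, which is why the inputs are resized to powers of two; function names are ours, and a library such as PyWavelets would normally be used instead):

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar transform (orthonormal variant).
    Returns the approximation A and detail coefficients H, V, D."""
    a = img[0::2, :] + img[1::2, :]   # row sums
    d = img[0::2, :] - img[1::2, :]   # row differences
    A = (a[:, 0::2] + a[:, 1::2]) / 2.0   # approximation
    H = (d[:, 0::2] + d[:, 1::2]) / 2.0   # horizontal detail
    V = (a[:, 0::2] - a[:, 1::2]) / 2.0   # vertical detail
    D = (d[:, 0::2] - d[:, 1::2]) / 2.0   # diagonal detail
    return A, H, V, D

def ihaar2d(A, H, V, D):
    """Inverse of haar2d (perfect reconstruction)."""
    h, w = A.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2] = A + V; a[:, 1::2] = A - V
    d[:, 0::2] = H + D; d[:, 1::2] = H - D
    out = np.empty((2 * h, 2 * w))
    out[0::2, :] = (a + d) / 2.0
    out[1::2, :] = (a - d) / 2.0
    return out
```

Applying haar2d recursively to A yields the multi-level coefficient pyramid that is blended later.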
Multi-scale fusion input. The first image for the pyramid of multi-scale fusion 10 is the contrast stretched version of the input image explained above and the second one is an equalized luminance image provided by adaptive histogram equalization 26 .
Weights of multi-scale fusion. The choice of correct weights reduces computations and artifacts. There are four weights in 10 , which are explained below. Laplacian contrast weight (W L ). The Laplacian contrast weight is the absolute value of the Laplacian filter applied to the luminance component of the image.
Local contrast weight (W LC ). The local contrast weight is an additional contrast measure that recovers contrast in the regions where the Laplacian contrast weight (W L ) is not sufficient (e.g. ramp and flat regions). It improves the transitions between dark and bright regions. W LC is computed as the square of the difference between the luminance and its locally averaged luminance (e.g. a 5 × 5 separable [1, 4, 6, 4, 1]/16 filter, with the difference clamped to the range [−π/2.75, π/2.75]).
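A minimal sketch of this local contrast weight, assuming the luminance lies in [0, 1] and using the separable binomial kernel named above (helper names are ours):

```python
import numpy as np

def blur5(x):
    """Separable 5x5 binomial low-pass: [1, 4, 6, 4, 1] / 16 applied
    along rows and then columns, with edge padding."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    p = np.pad(x, 2, mode='edge')
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, rows)

def local_contrast_weight(lum):
    """W_LC: squared difference between the luminance and its local
    average, with the difference clamped to +/- pi/2.75 first."""
    diff = lum - blur5(lum)
    diff = np.clip(diff, -np.pi / 2.75, np.pi / 2.75)
    return diff ** 2
```

Flat regions yield a weight near zero, while transitions between dark and bright regions receive a large weight, as intended.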
Saliency weights (W S ). Saliency weights emphasize the pixels standing farthest from the mean of the reddish, bluish and luminance components collectively. In other words, salient objects in a scene are recognized biologically, and are defined mathematically, as regions of center-surround contrast. The saliency algorithm of Achanta et al. 27 , defined in the Lab color space, is used. Exposedness weights (W E ). Exposedness weights are a rough estimate of how well a pixel is exposed to light. This weight is defined by a bell-shaped Gaussian profile with mean value 0.5 and standard deviation 0.25. Therefore, pixels close to the average value are emphasized, producing better results when combined with the saliency weights. The final weights W 1 and W 2 combine the four weights mentioned above and have the same size as the input image (like all four weights). The final weights determine the amount of modification that each pixel receives after the fusion is applied. Figure 2 demonstrates the structure of the coefficient blending.
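The exposedness profile, together with a per-pixel normalization of two final weight maps so that they sum to one, can be sketched as follows (the exact combination of the four weights into W 1 and W 2 follows the paper's equations, which are not reproduced here; the normalization step is our assumption of the usual form):

```python
import numpy as np

def exposedness_weight(lum):
    """W_E: Gaussian profile around mid-gray (mean 0.5, sigma 0.25),
    with lum assumed to be in [0, 1]."""
    return np.exp(-((lum - 0.5) ** 2) / (2 * 0.25 ** 2))

def normalize_weights(w1, w2, eps=1e-6):
    """Per-pixel normalization so the two final weights sum to one;
    eps guards against division by zero where both weights vanish."""
    total = w1 + w2 + eps
    return w1 / total, w2 / total
```

Well-exposed mid-gray pixels get a weight near one, while under- and over-exposed pixels are attenuated before the fusion.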
In the blending process, the saliency weights are used solely, since the saliency weights capture the principal structure among the four weights described above. The refinement or blending weights (W 1 , W 2 ) are computed according to the following equations. The blending of the degraded Haar wavelet coefficients is performed according to the following equations, where H_l, V_l and D_l are the horizontal, vertical and diagonal coefficients of the kth image I_k^c for the channel c ∈ {R, G, B} at level l, and A_k is the approximation coefficient matrix for image k.
The inverse Haar transform is then applied on the refined coefficients (A new , H new , V new , D new ) to restore the original image.
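The per-level blending can be sketched as below (a hedged NumPy illustration; we assume the usual per-pixel weight normalization, and the weights passed in are the saliency maps already downsampled to the coefficient level):

```python
import numpy as np

def blend_coeffs(c1, c2, w1, w2):
    """Blend one pair of corresponding Haar coefficient matrices with
    the Gaussian-pyramid saliency weights at the same level."""
    total = w1 + w2 + 1e-6   # per-pixel normalization
    return (w1 * c1 + w2 * c2) / total

def blend_level(coeffs1, coeffs2, w1, w2):
    """coeffs1/coeffs2: (A, H, V, D) tuples of the two inputs at one
    level; returns the blended (A, H, V, D) tuple for that level."""
    return tuple(blend_coeffs(a, b, w1, w2) for a, b in zip(coeffs1, coeffs2))
```

Running this at every pyramid level and then applying the inverse Haar transform to (A_new, H_new, V_new, D_new) yields the restored image.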
The Haar wavelet restoration is able to restore underwater images independently. Nevertheless, our experiments showed an overall improvement in efficiency when the Haar wavelet restoration method is used as a complementary algorithm. Thus, we applied the Haar wavelet restoration method to the restored image produced by multi-scale fusion 10 . Figure 1 shows the flowchart of our combined Haar wavelet restoration and multi-scale fusion 10 structure. The overall algorithm showed a record-breaking result, with an average mean square error (MSE) better than almost all trained models and traditional methods except for the Dive+ algorithm. In the fusion equations, R is the restored image, and I_k can generally be any color-corrected form of the raw input image. Due to the halos and artifacts that are introduced in R(x, y) by directly applying the fusion equation, both the weights and the input images are decomposed into multi-scale pyramids, where l represents the number of pyramid levels. Each scale is derived by downsampling the previous level; the initial level is the original image.
To preserve the important and desired information, an operation such as Gaussian or Laplacian filtering can be applied before downsampling. In this regard, we decomposed the weights into a Gaussian pyramid and the images into a Laplacian pyramid. Therefore, the fusion equation can be rewritten in terms of L_l, the Laplacian of the kth image I_k at level l, and G_l, the Gaussian of the kth weight W_k at level l. The Gaussian pyramid is built by Gaussian filtering (e.g. with the separable 5 × 5 [1, 4, 6, 4, 1]/16 kernel) and downsampling the image repeatedly until the maximum level is reached, with the first layer being a filtered version of the input image at its original size. The Laplacian pyramid is created by first building a pyramid of downsampled images and then calculating the difference between each lower-level image and the resized (enlarged) upper-level image (using bi-cubic interpolation). The first level of both the Gaussian pyramid and the Laplacian pyramid is the same size as the input image, which increases the accuracy.
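Under simplifying assumptions (nearest-neighbour upsampling and plain decimation instead of the bi-cubic interpolation and Gaussian filtering described above), the pyramid construction and fusion can be sketched as:

```python
import numpy as np

def downsample(x):
    """2x decimation (a real implementation low-passes first)."""
    return x[::2, ::2]

def upsample(x, shape):
    """Nearest-neighbour enlargement to `shape` (the paper uses
    bi-cubic interpolation)."""
    h, w = shape
    ry = np.arange(h) * x.shape[0] // h
    rx = np.arange(w) * x.shape[1] // w
    return x[np.ix_(ry, rx)]

def gaussian_pyramid(x, levels):
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(x, levels):
    g = gaussian_pyramid(x, levels)
    pyr = [g[l] - upsample(g[l + 1], g[l].shape) for l in range(levels - 1)]
    pyr.append(g[-1])   # coarsest level keeps the residual
    return pyr

def collapse(lap):
    """Invert the Laplacian decomposition from coarse to fine."""
    x = lap[-1]
    for l in range(len(lap) - 2, -1, -1):
        x = lap[l] + upsample(x, lap[l].shape)
    return x

def fuse(images, weights, levels):
    """Multi-scale fusion: weight each Laplacian level of each input by
    the Gaussian pyramid of its (pre-normalized) weight map, sum the
    contributions, and collapse the fused pyramid."""
    fused = None
    for img, w in zip(images, weights):
        lp = laplacian_pyramid(img, levels)
        gp = gaussian_pyramid(w, levels)
        contrib = [g * l for g, l in zip(gp, lp)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]
    return collapse(fused)
```

With these deterministic operators the Laplacian pyramid is exactly invertible, so fusing with weights (1, 0) returns the first input unchanged, a convenient sanity check.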

Ensemble of triple corrections
Traditional methods present a global solution (or limited joint solutions) under typical prior assumptions such as homogeneous lighting along the line of sight, an unbalanced color spectrum across the RGB channels, saturated colors, low contrast and brightness, blurriness, and especially degradation based on the underwater image formation model. On the other hand, whether any specific traditional method can comprehensively handle the different patterns of underwater degradation is as yet undecided.
There are several conventional color correction techniques aimed at increasing the visual quality of an image, including stretching, global adaptation, sharpening, unsharp masking, histogram equalization and many others. In this paper, we propose to predict and choose the best three-method combination from an ensemble of conventional color correction techniques, together with our newly proposed methods, to be applied to a single image according to the features extracted from that image. We have also proposed new features to be learned, which are explained in detail in the feature extraction section. Three models are trained with these extracted features using Bootstrap Aggregation and Random Forests. Each model is responsible for generating a numeric label for its corresponding stage that indicates a specific color correction method among the fifteen available methods. The triple corrections are applied sequentially, as the block diagram of the proposed method ETCA shows in Fig. 3. The ensemble of color correction methods entails the superposition effect of red channel mean shifting and sharpening, global RGB adaptation, global luminance adaptation, global saturation adaptation, luminance stretching, saturation stretching, contrast stretching, adaptive Gamma correction for the red spectrum, our even-to-odd middle intensity transference using a look-up table, green-to-red spectrum transference using histogram equalization, local brightening, DCP 3 , fusion restoration 10 , and our Haar wavelet transform restoration. For comparison purposes, Fig. 4 is used for all the subjective visualizations. Figures 5 and 6 illustrate the visual and spectral effects of the DCP and local brightening methods applied to Fig. 4. The total number of available methods, including no action, is fifteen. It should be noted that the null operation leaves the image intact.
The null operation was necessary due to the existence of some fully enhanced images in the UIEB dataset that have a negligible difference (negligible mean square error) from their reference image. Each of the other color correction algorithms is assigned a special numeric label identifying the corresponding operation. The color correction methods are described in detail in the feature extraction section.
Feature extraction. Feature extraction plays a key role in the quality of deep learning-based underwater enhancement methods. As far as we know, most recent deep learning models have used RGB values as their features. A majority of the state-of-the-art underwater deep learning methods map pixel to pixel or pixel to difference. Pixel-to-pixel mapping produces an output pixel in response to each input pixel. Pixel-to-difference mapping produces a positive or negative amount to be added to or subtracted from the input pixel intensity. However, the possibility of generating multiple RGB combinations in the outputs of these networks remains unresolved and limits the generalization of such conventional deep learning-based underwater restoration methods.
In this regard, we propose a probability-based feature as an alternative to pixel intensity. We use the cumulative distribution function (CDF) over 4 closed intervals, or quadrants, of the probability distribution function (PDF) of the input image. Each quadrant of the image's PDF is therefore assigned a CDF scalar. Since 8-bit RGB images have three channels, a total of twelve scalars are extracted for each input image. We use these twelve CDF values as input features. The cumulative distribution of each histogram quadrant decreases the number of training features significantly in contrast to the huge training data used in the above-mentioned networks; the histogram quadrants also decrease the training time significantly.
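A sketch of extracting the twelve quadrant-CDF features, using the boundaries {63, 127, 191, 255} that follow from a 256-bin histogram as stated in the experiments (the function name is ours):

```python
import numpy as np

def quadrant_cdf_features(img):
    """Twelve-scalar feature vector for an 8-bit RGB image: for each
    channel, the CDF of the 256-bin histogram evaluated at the four
    quadrant boundaries 63, 127, 191 and 255."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(img[..., c], bins=256, range=(0, 256))
        pdf = hist / hist.sum()          # normalized histogram (PDF)
        cdf = np.cumsum(pdf)             # cumulative distribution
        feats.extend(cdf[[63, 127, 191, 255]])
    return np.array(feats)
```

The last boundary always evaluates to 1, so in effect each channel contributes three informative degrees of freedom describing how its mass is spread across the intensity quarters.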
The training process also needs the corresponding responses for the extracted input features. In this work, the response of the model is also not a pixel intensity. The response to each input feature vector is an ensemble of three optimized labels minimizing the cost function (in our case, the mean square error between the reference image and the restored image). Each numeric label corresponds to a single color correction method, and each ensemble of triple labels has to be applied to the input image sequentially to restore the original image. Therefore, an optimization has been performed to generate the optimal triple correction methods (labels) for each input image.
A total of fifteen color correction methods are used in the evaluations, one of which is no action. The other color correction methods are described in detail below. It should be mentioned that the superiority of contrast stretching and the pyramidal methods over the other proposed methods, which have shallower positive effects, is not a strange phenomenon; nevertheless, these complementary approaches remain indispensable for other aspects of visual improvement.
• The static contrast stretching in RGB color space, static luminance stretching in Lab color space, and static saturation stretching in HSV color space are performed simply by an equation that saturates a fixed fraction of the lowest and highest values and rescales the remaining range.
• The global RGB, luminance and saturation adaptations use the mapping I_g(x, y) = log(I(x, y)/Ī + 1) / log(I_max/Ī + 1), where the scalar I_max is the image's maximum pixel intensity and the scalar Ī is the log-average of the image, which enables I_g to adapt to the scene and is computed by Ī = exp((1/(MN)) Σ_{x,y} log(δ + I(x, y))), where I(x, y) is an M × N gray-scale image and δ is a small value to avoid singularity (e.g. δ = 10⁻³). As the log-average Ī increases to larger values, the global adaptation mapping I_g behaves more linearly than logarithmically. Therefore, darker scenes (i.e. contrast, saturation and luminance spectrums) are brightened more than brighter scenes through global adaptation. Figures 10, 11, 12 illustrate the visual and spectral effects of these global adaptation methods applied to Fig. 4.
• Adaptive Gamma correction is performed using the method proposed in 30 , which provides an automatic way to adaptively compute the Gamma for a given image instead of using a constant scalar. For a gray-scale image I, the method is computed from the scalar µ, the average value of the image, the scalar σ, the standard deviation of the image, and δ, an infinitesimal value to avoid a zero input to the Heaviside function. Figure 13 illustrates the visual and spectral effect of this method applied to Fig. 4.
• Red channel mean shifting and sharpening is composed of two operations. It should be noted that this approach is used in only a few cases, since it requires a user-defined scalar for blending. The initial operation shifts the mean of the red channel toward the gray image's mean and then blends the green spectrum into the red spectrum. The second operation sharpens the red channel I_R using unsharp masking, which subtracts a blurred (unsharp) version of the image from the initial image to form the mask that is added back. Figure 14 illustrates the visual and spectral effect of this method applied to Fig. 4.
• In this work, we propose the idea of even-to-odd middle intensity transference using a look-up table to augment the probability of the middle intensities (roughly doubling them). After transferring the even middle intensities to their adjacent odd intensities, every odd middle intensity will have roughly double its probability. This follows from the continuity of the cumulative distribution (CDF) of discrete and continuous data: two adjacent intensities have only a slight difference in their probabilities. The selected middle intensities lie in [8, 250]; other intensities are left intact. This method only partially affects the image and has no extensive positive or negative effect on it. Figure 15 illustrates the visual and spectral effect of this method applied to Fig. 4.
• Green-to-red spectrum transference using histogram equalization is performed since we have observed many positive effects when using the green spectrum, in contrast to the blue spectrum. The transference equalizes the red channel against the green spectrum ρ_green, which supplies the distribution used in the histogram equalization. Figure 16 illustrates the visual and spectral effect of this method applied to Fig. 4.
• The local brightening 31 is used as one of our color correction methods, but it should be noted that the local brightening method considerably degrades most underwater images. Due to this effect, we have set the blending option of the brightening method to 0.1.
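The even-to-odd transference is straightforward to express as a look-up table (a sketch; mapping each even value v to v + 1 is our reading of "adjacent odd intensity"):

```python
import numpy as np

def even_to_odd_lut():
    """256-entry look-up table sending each even intensity in the middle
    range [8, 250] to its adjacent odd neighbour (v -> v + 1); all other
    intensities are left intact, roughly doubling the probability mass
    of the odd middle intensities."""
    lut = np.arange(256, dtype=np.uint8)
    for v in range(8, 251):
        if v % 2 == 0:
            lut[v] = v + 1
    return lut

def apply_lut(img, lut):
    """Apply the table to an 8-bit image via fancy indexing."""
    return lut[img]
```

Because adjacent intensities have nearly equal probabilities, the merged odd bins end up with roughly twice their original mass while the overall appearance barely changes.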
We have used Bootstrap Aggregation (Bagging) and Random Forests 32,33 to create the three trained models with which the three labels are generated. The first, second and third labels are generated separately by their designated trained models, and these labels are then applied to the input sequentially, in order.
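The bagging step can be illustrated with a toy bootstrap-aggregated classifier over the twelve quadrant features (a stand-in weak learner for brevity; the paper uses Random Forests, i.e. bagged decision trees, with one trained model per correction stage):

```python
import numpy as np

class BaggedNearestCentroid:
    """Toy bootstrap aggregation: each member is trained on a bootstrap
    resample of the data and predictions are combined by majority vote.
    The per-class-centroid weak learner here is our simplification; a
    real forest would bag decision trees."""

    def __init__(self, n_members=25, seed=0):
        self.n_members = n_members
        self.rng = np.random.default_rng(seed)
        self.members = []

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        n = len(X)
        for _ in range(self.n_members):
            idx = self.rng.integers(0, n, size=n)  # bootstrap resample
            Xi, yi = X[idx], y[idx]
            classes = np.unique(yi)
            cents = np.array([Xi[yi == c].mean(axis=0) for c in classes])
            self.members.append((classes, cents))
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        votes = []
        for classes, cents in self.members:
            d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
            votes.append(classes[d.argmin(axis=1)])
        votes = np.stack(votes)            # (n_members, n_samples)
        return np.array([np.bincount(v).argmax() for v in votes.T])
```

In practice each of the three stage models maps the twelve-dimensional quadrant-CDF feature vector to the numeric label of the correction to apply at that stage.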

Experimental results
A comprehensive and fair experiment on several traditional underwater image enhancement methods is performed in the UIEB dataset paper 34 , from which our raw and reference images are drawn. Likewise, a comprehensive evaluation of several recent learned (deep learning) models is performed in 35 , from which we quote results in Table 1. The UIEB dataset 34 is evaluated with metrics including the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity Index for Measuring image quality (SSIM). In the UIEB dataset, the degraded images are paired with references by asking 50 volunteers to select one image out of 12 enhanced results. The enhanced results were produced by 12 traditional methods, one of which was the Dive+ app (with predefined settings). Table 1 quotes the performance of our methods as well as that of the other methods according to 34,35 . Our simulations are executed in Matlab and the source is freely available for evaluation. The speed performance depends on the image size, the methods applied to the image and the processor. The larger the image, the slower the result is produced, since all of our color correction algorithms are applied to the whole image; therefore, the number of pixels directly influences the speed. The maximum latency occurs when both multi-scale fusion 10 and our proposed Haar wavelet restoration appear together inside the ensemble of triple algorithms. The frame rates (FPS) for red channel mean shifting, contrast stretching, luminance stretching, saturation stretching, global RGB adaptation, global luminance adaptation, global saturation adaptation, DCP, even-to-odd transference, local brightening, green-to-red transference, multi-scale fusion 10 , and Haar wavelet restoration are 0.60, 35, 0.70, 4.95, 3.08, 0.67, 3.87, 0.28, 312.0, 2.70, 5.5, 0.24, and 0.93 respectively, measured as averages with an Intel Core i3-2100 3.1 GHz processor on a 1120 × 1380 image after 300 repetitions.
The number of levels used in decomposing the input images and weights in both the multi-scale fusion and the Haar wavelet transform is l = 5. The maximum boundary of each quadrant depends on the number of histogram bins. We have used a 256-bin histogram; therefore, the maximum boundaries of quadrants 1, 2, 3 and 4 are {63, 127, 191, 255}.
The underwater image restoration using Haar wavelet coefficient refinement has shown a superior improvement in terms of average MSE, PSNR and SSIM on the whole dataset over the traditional and deep learning methods, as can be seen in Table 1. The only method superior to our proposed traditional method is the commercial app Dive+. As mentioned before, the images are resized to 640 × 640 in the Haar wavelet restoration; due to this, the accuracy decreases while the processing speed increases. It is possible to resize the input image to higher dimensions to increase the precision, since the Gaussian pyramid of the saliency weights will then carry more information to be used in the modification of the final pixels.
Our method ETCA is flexible, and it is possible to substitute more robust color correction algorithms for some of our current fifteen methods. As can be seen in Table 1, the best performance among all the … During optimization, more than 1460 triple permutations are evaluated to reach the best ensemble of labels. The number of images to which the ETCA has applied the null operation is 11.
Our technique showed limitations in the few cases where the optimization process produced a weak ensemble of correction algorithms. By evaluating the restored images visually, we found that most of the UIEB dataset images are well characterized and restored via the histogram quadrants, and only in very few cases are some reddish artifacts produced. Figure 17 shows the evaluation of multi-scale fusion 10 and our proposed methods on some test images that are not from the UIEB dataset (from unsplash.com).
Out of the fifteen color correction algorithms, only two rely heavily on experimental user-defined parameters: red channel mean shifting and local brightening. These two algorithms can create extreme changes when applied with inappropriate parameters. Both require a blending-amount parameter, which we have set to the scalar value 0.1. Our evaluations of the robustness of the histogram quadrants have proven these features to be discriminative enough for raw images (before any correction). This indicates that with each stage of correction, the histogram is disguised into a shape that makes the histogram quadrants less distinguishable. This situation entails a considerable risk if images must be characterized again before the second stage. To confront this situation, we opt for histogram octants instead of quadrants. Another feature of the ETCA is that it relies on few user-defined parameters.
An alternative to random forests and bootstrap aggregation is a pattern recognition neural network. We have also built the ETCA using pattern recognition neural networks with fewer than one hundred neurons in a single hidden layer. While the quadrants are easily learned to characterize raw images, they are strikingly infeasible for characterizing a largely corrected image.
The Haar wavelet coefficient blending is our tentative plan to incorporate a dynamic contrast stretching method by blending the coefficients of a 0.5% contrast-stretched image with those of the raw image (as a counterpart to static 1% stretching).
The idea of merging multi-scale fusion with wavelet coefficient refinement has shown a mediocre performance in some cases, since the method generally produces stretched/equalized spectrums, which is insufficient for all types of degradation. Nevertheless, among all the state-of-the-art methods in Table 1, only the Dive+ app outperforms ours.
Numerical and visual inspections of carrying out the global RGB adaptation, global luminance adaptation and global saturation adaptation on raw images have shown displeasing results when these are used as the initial color correction operation for underwater images. The global RGB adaptation exerted as the second or third correction operation yields an image spectrum with confined upward compression around the spectrum's average, provided that the first color correction operation broadens the initial spectrum. Likewise, the global saturation adaptation and global luminance adaptation exerted as the second or third operation yield upward-compressed saturation and luminance spectrums respectively, dynamically amplifying the saturation and luminance spectrums according to the log-average value.
The contrast stretching, luminance stretching and saturation stretching are far more attractive than the three global adaptations as initial color correction methods, since most underwater images initially suffer from a dense, narrow red spectrum. These three stretching methods also, to some extent, outperform the three global adaptation methods as the second or third correction method.
The static 1% contrast stretching is prone to produce reddish artifacts or sometimes to supplant green water pixels with blue water pixels. This limitation brings up the idea of dynamic contrast stretching, for example with a variable stretching tolerance.
Likewise, the ample green and blue pixels in a scene may occasionally turn the luminance-stretched image into a more displeasing greenish or bluish scene if luminance stretching is applied as the initial color correction method. On the other hand, when it is not applied as the first color correction operation, luminance stretching usually leaves the red spectrum with a prominent peak (a sufficiently large bin, though not the global peak) at zero, while one or two such peaks may appear at either end of the green and/or blue spectrums (introducing a wide range of pixel exposures).
Saturation stretching leaves the first quadrant (Q1) of the red spectrum either replaced by a substantial non-empty Q1 or pushed down to lower values due to the imbalance between the RGB spectrums, thereby confining its usage as an initial color correction.
Saturation stretching is also prone to introducing darker colors, or to turning the blue underwater sunlight into bright white sunlight, when it is applied as the second or third operation.
Adaptive Gamma correction replaces an initially repressed red spectrum (having a large Q1 CDF and a low mean) with a strikingly wider, raised spectrum whose peak settles between Q2 and Q3. This method may occasionally create displeasing reddish artifacts.
A resemblance between the green and red spectrums is frequently seen across different environments and scenes in visual inspections of image histograms. Therefore, an attractive tentative solution for red channel compensation is to convey the green spectrum's distribution to the red channel.