Multiscale relevance of natural images

We use an agnostic information-theoretic approach to investigate the statistical properties of natural images. We introduce the Multiscale Relevance (MSR) measure to assess the robustness of images to compression at all scales. Starting in a controlled environment, we characterize the MSR of synthetic random textures as a function of the image roughness H and other relevant parameters. We then extend the analysis to natural images and find striking similarities with critical (H ≈ 0) random textures. We show that the MSR is more robust and informative of image content than classical methods such as power spectrum analysis. Finally, we confront the MSR with classical measures for the calibration of common procedures such as color mapping and denoising. Overall, the MSR approach appears to be a good candidate for advanced image analysis and image processing, while providing a good level of physical interpretability.


INTRODUCTION
Recent advances in image processing have benefited from the emergence of powerful learning frameworks combining efficient architectures [2][3][4] with large high-quality databases [5,6]. In particular, neural networks, layering simple linear and non-linear operators such as convolution matrices or activation functions, have proven very efficient at classifying or generating high-dimensional data. They are now able to capture similarities between images with unprecedented success. However, while their performance increases with the depth of the architecture, it generally comes at the cost of physical interpretability. Understanding the learning dynamics and the statistical features of the resulting images remains a challenge for the community [7,8].
Before the advent of machine learning algorithms, tasks such as compression [9,10], denoising [11] or edge detection were (and in some cases still are) performed using signal processing methods. Among the classical approaches, the first kind is based on specific measures, such as the widely used Peak Signal-to-Noise Ratio (PSNR) [12], built upon common signal processing metrics (Euclidean distance, power spectrum, etc.). The second family uses vision-based experiments to construct semi-empirical measures of similarity, such as the Structural Similarity Index (SSI) [13]. In both cases the approach is fully deterministic, which means that stochastic properties such as roughness, stationarity, or local correlations are ignored.
In the context of statistical physics, the problem of high-dimensional data inference has recently been addressed using a novel, fully agnostic approach. Developed to measure specific properties of finite-size samples [14], the approach consists in assessing the influence of a prescribed compression procedure on simple entropy measures. Applications in biological inference [1], finance [15], language models [14] or optimal machine learning [16,17] have already shown exciting results. In this paper, we adapt the latter formalism to image analysis and image processing, focusing specifically on the case of natural images. Natural scenes or landscapes have long been studied, as they display distinguishable statistical features such as scale invariance [18][19][20], non-Gaussianity [21], or patch criticality [22].
The outline of the paper is as follows. In Section I, we introduce the Resolution/Relevance formalism using an illustrative example, and adapt it to the purpose of image analysis. In Section II, we analyse a class of parameterizable images, namely random 1/f^α Gaussian fields, and introduce the Multiscale Relevance (MSR). In Section III, we extend the analysis to natural images and their gradient magnitudes, and discuss meaningful statistical similarities with the synthetic Gaussian fields. In Section IV, we show how the MSR approach can be used in the context of common image processing tasks.

I. THE RESOLUTION/RELEVANCE FRAMEWORK
Here we present the information-theoretic framework recently built by one of us [14] for the agnostic analysis of high-dimensional data samples and their behaviour under compression procedures. Relevant metrics are derived from simple statistics of the compressed samples.

A. Tradeoff between precision and interpretability
Let us consider the problem of binning, namely clustering samples of a random variable X into groups characterized by a similar value of X. If the sampled data points S = {x_1, ..., x_N} all take different states (e.g. when the distribution of X is continuous), the empirical distribution is a Dirac comb. In order to gain insight into the sampled variable, one can visualize the data using histograms with well-chosen bins. Indeed, this procedure enforces the emergence of structure by reducing data resolution through compression, allowing for more interpretability. One can then make assumptions about the underlying process and find the optimal parameters to best describe the data.
We illustrate this intuition by sampling N = 100 realizations of a Gaussian variable X ∼ N(0, 1) in Fig. 1. The data are binned into n identical boxes, for three different values n = 5, 23 and 400. We also define the bin width λ as a compression parameter transforming the original sample S into a compressed sample S_λ. The compression step consists in replacing each data point by the index of its histogram bin. Figure 1(a1) (large λ) displays a situation of oversampling: with only five bins, a considerable amount of data resolution is lost. On the contrary, Fig. 1(a3) (small λ) corresponds to an undersampling regime, with very narrow bins (mostly containing only one data point) and a resulting distribution close to a Dirac comb. Figure 1(a2) (intermediate λ) appears as a reasonable compromise, in which the histogram is visually close to the generator density, indicating that we might be close to the optimal level of data compression. From the latter observation, one is tempted to go for a Gaussian model, with suitable estimators for the mean and variance. However, such a decision relies solely on a specific compression level, and thus does not make full use of the sample at play. The formalism that we introduce in the next section provides a principled framework to connect the choice of the compression level with an optimality criterion that is agnostic to the nature of the generative model from which the data is sampled.
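This binning compression can be sketched in a few lines of NumPy; the equal-width bins and the seed are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100)  # N = 100 realizations of X ~ N(0, 1)

def compress(sample, n_bins):
    """Compress a 1D sample: replace each point by its histogram bin index."""
    edges = np.linspace(sample.min(), sample.max(), n_bins + 1)
    return np.digitize(sample, edges[1:-1])  # interior edges -> indices 0..n_bins-1

# Wide bins (n = 5) cluster many points per state (oversampling);
# narrow bins (n = 400) leave most states with a single point (undersampling).
s_coarse, s_fine = compress(x, 5), compress(x, 400)
```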

B. Resolution and Relevance
Previous work from Marsili et al. [15] addressed the issue of the oversampling/undersampling transition by introducing observables that allow one to monitor changes in a reduced sample S_λ = {s_1, ..., s_N} obtained by compressing S with a parameter λ. First, let us define k_s as the number of data points in identical state s, and m_k as the number of states appearing k times in S_λ (see for example the compressed sample displayed in Fig. 1). It follows that ∑_s k_s = ∑_k k m_k = N. The Resolution and the Relevance are then defined as

Ĥ[s] = −∑_s (k_s/N) log(k_s/N),    Ĥ[k] = −∑_k (k m_k/N) log(k m_k/N).    (1)

The Resolution is the entropy of the empirical distribution {p_s = k_s/N}_s and describes the average amount of bits needed to code a state in S_λ. The compression clusters data points together, hence reducing the average coding cost. The Resolution is maximal for raw data and monotonically decreases with λ, until it reaches the minimally entropic, fully compressed sample. The Relevance is the entropy of the distribution {q_k = k m_k/N}_k, that is, the probability that a data point sampled from S_λ appears k times in the sample. This is a compressed version of p_s, where identical-frequency states are clustered, dropping their label s in the process. Knowing q_k is then sufficient to build a histogram without labels, and is equivalent to assuming indistinguishability of states sampled the same number of times. Sorting them in decreasing frequency values would yield the famous Zipf plot. In the end, the Relevance encodes the height of each bar and is maximal when {k m_k/N}_k is uniformly distributed, leading to m_k ∝ k^{−1}. We report in Tab. I the typical sampling situations and their corresponding values of Resolution/Relevance.
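The definitions above translate directly into code; a minimal sketch (entropies in bits), assuming only the list of compressed state labels as input:

```python
import numpy as np
from collections import Counter

def resolution_relevance(states):
    """Resolution H[s] and Relevance H[k] from the compressed state labels."""
    N = len(states)
    k_s = Counter(states)                 # k_s: number of points in state s
    p = np.array(list(k_s.values())) / N  # p_s = k_s / N
    H_s = -np.sum(p * np.log2(p))         # Resolution

    m_k = Counter(k_s.values())           # m_k: number of states seen k times
    q = np.array([k * m / N for k, m in m_k.items()])  # q_k = k m_k / N
    H_k = -np.sum(q * np.log2(q))         # Relevance
    return H_s, H_k
```

For example, a fully undersampled set of labels (all distinct) gives maximal Resolution and zero Relevance, while a fully compressed one (all identical) gives zero for both.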
Coming back to the Gaussian sampling example, Figs. 1(a1) and (a3) respectively correspond to oversampling and undersampling. Let us emphasize at this point that, despite the visual impression in this specific example, the sample (a2) does not necessarily minimize the distance between the underlying and empirical distributions. Interestingly, the Resolution/Relevance properties depend only on the raw sample S and the compression parameter λ, making the overall approach agnostic to the generating process. What is most interesting is thus the way in which the sample evolves with compression, while transitioning from undersampling to oversampling. As a result, one must choose a compression procedure that allows one to cross over between these two regimes.

C. Application to images
Images are usually described as fields h(r), where r ∈ {1, ..., N_X} × {1, ..., N_Y}. This is equivalent to a sample S = {(r, h(r))} of size N = N_X N_Y, describing the position and color of each pixel. Naturally, S lies in the full undersampling regime, as each data point is unique.
To compress grayscale images, we therefore propose a simple procedure consisting of two steps: (i) segmentation, and (ii) spatial compression, as illustrated in Fig. 2. Segmentation means that grayscale levels are transformed into black and white pixels using a threshold level a (the fraction of black pixels), leading to the binary image h_a(r) (Fig. 2(b)). This lowers the number of possible color states in the sample, a necessary condition to reach the full oversampling regime. Note that one can reconstruct the original image by averaging over all segmentations. This step generalizes to colors, for example by using a triplet (a_R, a_G, a_B) in RGB space. The second step consists in the compression of pixel positions (Fig. 2(c)). One replaces each coordinate r by the index r_λ of its position on a grid of stepsize λ. One ends up with a compressed sample S_a^λ = {(r_λ, h_a(r))}. Each pixel value is then replaced by its average in the reduced grid (Fig. 2(d)). Finally, k_(r_λ,0) and k_(r_λ,255) are defined as the numbers of black and white pixels in cell r_λ, and m_k as the number of cells with k black or white pixels at scale λ. Using Eq. (1), one can compute the values of Ĥ[s] and Ĥ[k] that will be used in the sequel.

One can make a direct analogy between this compression procedure and image processing architectures such as Convolutional Neural Networks (CNN) [2]. Their constitutive layers usually combine a spatial compression step, that is a first linear convolution, with a trainable or prescribed layer; a segmentation step is then performed using a nonlinear transformation of pixel values called an activation function. In a similar fashion, our procedure is a one-layer network, taking S as input and returning S_a^λ. Interestingly, we do not need to specify a particular convolution matrix as input to the algorithm, but only a size parameter, thereby making our approach more agnostic. Ultimately, note that any compression procedure allowing the undersampling/oversampling transition could have been selected. For example, one could use Discrete Fourier or Wavelet coefficients, classically used in JPEG compression algorithms [9,10]. Another approach would consist in using intermediate representations of trained or untrained networks with binary activation functions (perceptron-like) and tunable layer size, as in the Resolution/Relevance trade-offs of deep neural architectures [16].
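The two-step procedure can be sketched as follows, assuming a quantile-based threshold for the fraction a of black pixels and non-overlapping λ × λ cells (boundary pixels that do not fill a cell are dropped, an implementation choice of ours):

```python
import numpy as np

def segment(h, a):
    """Step (i): threshold h so that a fraction `a` of pixels is black (0)."""
    return np.where(h <= np.quantile(h, a), 0, 255).astype(np.uint8)

def coarse_grain(h_a, lam):
    """Step (ii): replace each lam x lam cell by its average pixel value."""
    ny, nx = h_a.shape[0] // lam, h_a.shape[1] // lam
    blocks = h_a[:ny * lam, :nx * lam].reshape(ny, lam, nx, lam)
    return blocks.mean(axis=(1, 3))
```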

II. RELEVANCE OF RANDOM TEXTURES
In this section we illustrate the use of the metrics (Ĥ[s], Ĥ[k]) on a simple yet widely encountered class of processes: two-dimensional 1/f^α random Gaussian fields. We first recall the properties of such fields and then study the influence of α on Resolution and Relevance.
A. On 1/f^α Gaussian fields

1/f^α Gaussian fields are obtained by linear filtering of an initially uncorrelated 2D white noise (see Appendix A). The latter presents a flat Fourier spectrum that is multiplied by 1/f^α, leading to a power spectrum scaling as 1/f^{2α} and thereby forcing spatial correlations in direct space. Such a power-law filter introduces scaling properties that are usually described by the Hurst roughness exponent H := α − d/2, where d is the field dimension (here d = 2). Depending on the sign of H, one recovers two types of processes. When H < 0 the random field is stationary, that is with fixed mean and correlations C(δr) ∝ δr^{2H} at lag distance δr. The specific case H = −d/2 corresponds to an unmodified spectrum (white noise). When H > 0, the process is no longer stationary but possesses stationary increments with scaling ⟨[h(r + δr) − h(r)]²⟩ ∝ δr^{2H}. We generate three samples of distinct roughness values H ∈ {−1/2, 0, 1/2}, shown in Fig. 3; visual smoothness increases as H increases. Figure 3(d) shows the azimuthally averaged power spectrum S(f) = ⟨|ĥ(f, θ)|²⟩_θ, allowing one to check that the generating method is robust, as the expected scaling behavior and exponents are recovered.
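A minimal sketch of this Fourier-filtering construction, assuming periodic boundaries and discarding the zero mode (normalization and seed handling are our choices, not the authors' exact generator):

```python
import numpy as np

def gaussian_field(n, H, seed=0):
    """n x n Gaussian field with spectrum ~ 1/f^(2*alpha), alpha = H + d/2, d = 2."""
    alpha = H + 1.0
    rng = np.random.default_rng(seed)
    white = rng.normal(size=(n, n))           # flat-spectrum white noise
    fx = np.fft.fftfreq(n)
    f = np.sqrt(fx[:, None]**2 + fx[None, :]**2)
    f[0, 0] = np.inf                          # discard the zero mode
    field = np.fft.ifft2(np.fft.fft2(white) * f**(-alpha)).real
    return (field - field.mean()) / field.std()
```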

B. Multiscale Relevance of random textures
We now perform the segmentation described above on the fields presented in Fig. 3. The resulting textures for threshold value a = 0.5 are displayed in Fig. 4(a)-(c), and the corresponding Resolution/Relevance curves (Ĥ[s], Ĥ[k]) for λ ∈ {1, ..., N} are plotted in Fig. 4(d).
One can see that while the patterns remain quasi-identical for H = −0.5 (Fig. 4(a)) and H = 0 (Fig. 4(b)), this is not the case for H = 0.5 (Fig. 4(c)), where large areas of uniform tint are created by the segmentation procedure. This is due to the presence of stronger spatial correlations, inducing more persistence of patterns and fewer fluctuations around the average. Further, one can see that the H = 0 texture displays interesting visual features at all scales, as reported in visual quality assessment experiments [23], while they appear limited to small scales for H = −0.5. It is not straightforward to connect these observations with the Relevance curves in Fig. 4(d), as the relative Relevance varies with Resolution. It thus seems more natural to consider the Relevance across all levels of compression. To do so, we introduce a measure that quantifies the overall robustness of a sample to compression, called the Multiscale Relevance (MSR) and defined as

MSR := ∫ Ĥ[k] dĤ[s],    (2)

which is none other than the area under the Resolution/Relevance curve. This measure was introduced in [1] as an order parameter characterizing neuronal activity time series, and was successful at distinguishing useful information from ambient noise, as expected from a complexity measure [24]. Note that while several measures of complexity based on multi-scale entropy contributions have already been introduced in the literature [25,26], the MSR differs in that the contribution of each scale is naturally weighted by the Resolution; other measures generally give identical weights to each compression level. For the images in Fig. 4, one obtains MSR(H = 0.5) < MSR(H = −0.5) < MSR(H = 0). This is consistent with our previous visual impression that the texture in Fig. 3(b) seems to contain more information at different scales.
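Given one (Ĥ[s], Ĥ[k]) pair per compression scale, the area under the Resolution/Relevance curve can be estimated with a simple trapezoidal rule (a sketch; the paper does not prescribe a quadrature scheme):

```python
import numpy as np

def multiscale_relevance(h_s, h_k):
    """Area under the Resolution/Relevance curve (trapezoidal rule)."""
    h_s, h_k = np.asarray(h_s, float), np.asarray(h_k, float)
    order = np.argsort(h_s)                     # sort points by Resolution
    x, y = h_s[order], h_k[order]
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))
```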

C. Most relevant segmentation(s)
One naturally expects the segmentation threshold a to influence the Relevance. Indeed, at given H < 0, the most relevant representations do not seem to correspond to a = 0.5. This is confirmed in Fig. 5(a), where the Relevance curve for H = −0.8 is higher for a = 0.66 than for a = 0.5. Figure 5(b) displays the MSR as a function of a for three values of H. For H = −0.8 (dashed curve) one observes two symmetric maxima at a_c = 0.5 ± 0.13, consistent with Fig. 5(a). Interestingly, breaking the symmetry in the distribution of pixels by choosing a "background canvas" leads to more interesting samples in terms of Resolution/Relevance. As one can see in Fig. 5(c), there is a bifurcation at H ≈ 0 below which two maxima of MSR coexist. The obtained values of a_c for H < −1/2 fall close to the classic percolation threshold a* ≈ 0.59 on the 2D square lattice [27]. Indeed, our segmented images are equivalent to samples of the correlated site percolation problem. In particular, Prakash et al. [28] observed, as we do here, that when H → 0 from below both maxima continuously meet at a_c = 0.5 while the MSR(a) curve flattens around this value (see Fig. 5(b)). At this critical point, the information content of images becomes less sensitive to the segmentation process.
When H ≳ 0, MSR(a) displays one unique maximum at a_c = 0.5. However, as H increases further, so does the range of correlations, leading to finite-size effects. The resulting a_c becomes very noise dependent, as different samples lead to different critical thresholds. Interestingly, such behavior was also reported in the percolation of 2D fractional Brownian motion [29].

III. RELEVANCE OF NATURAL IMAGES
We now focus on natural images, namely pictures of natural scenes and landscapes. These have long been studied in the literature [18][19][20][21][30], as they display robust statistical features, such as scale invariance and criticality.
A. On the grayscale field h(r)

Figure 6(a) shows the photograph from Tkacik et al. [31], taken in the Okavango Delta in Botswana and described as a "[...] tropical savanna habitat similar to where the human eye is thought to have evolved". The image is subdivided into fifteen patches of size 512 × 512 pixels. One can observe a wide variety of patterns, ranging from uniform shades of light gray in the sky to strong discontinuities with tree branches and noisy vegetation textures.
A power spectrum analysis for all patches is shown in Fig. 6(b). The shape in the high-frequency limit is due to camera calibration, optical blurring, or post-processing procedures, which are independent of the patch content. At low frequency we observe a decaying power law with exponent −2.0 ± 0.1. Note that, although there are small fluctuations that may be related to patch features [30], the power spectrum analysis seems rather unable to capture the visual heterogeneity from one patch to another mentioned above.
This being said, S(f) ∼ 1/f² translates to H = 0.0 ± 0.1 in terms of roughness exponents, which is precisely the range in which the MSR displayed critical and nontrivial behaviour for random textures in Sec. II. We thus expect the MSR approach to allow for a finer characterization of each patch. Another issue with classical spectral analysis is that the power spectrum of the image is expected to be extremely sensitive to non-linear transformations of its color histogram, even monotonic ones, that keep the visuals identical. With the MSR method, there is no such issue, as the segmentation parameter a defines the proportion of black and white pixels regardless of the shape of the color histogram.
Figure 6(c) shows the MSR curves for all patches. A first observation is that the range of MSR values is similar in magnitude to that of H ≈ 0 textures in Sec. II. Then, one clearly sees significant differences between the MSRs of each patch. Patches containing mainly bushy textures with no abrupt changes in patterns display a unique maximum in the MSR(a) curve. Note that the singularities that appear in some cases are due to specific colors being disproportionately represented in the histogram (uniform sky). Patches containing heterogeneous shades, or physical objects of different sizes combining tree trunks, branches and bush (e.g. bottom left in Fig. 6(a)), tend to display two maxima, similarly to H < 0 (see Sec. II).

Figure 7 focuses on the bottom-left patch of Fig. 6(a). This sub-image seems to display two distinct dominant color levels. Such levels actually correspond to the maxima of the MSR curve in Fig. 7(b). This is visually confirmed by the segmentations in Figs. 7(c) and 7(d), which best capture the fluctuations at the top and bottom of the image respectively. We emphasize that the latter representations together constitute the most informative segmentations of (a). Superimposing them (Fig. 7(e)) indeed leads to a good approximation of the original image with only three color levels {0, 127, 255}. The MSR method thus seems to account well for the diversity of content of natural images, inaccessible through classical power spectrum analysis.

B. On the gradient magnitude |∇h|
To understand further the architecture of natural images, we now focus on the gradient magnitude field, intended to capture strong spatial irregularities such as contours or borders. In addition, taking the gradient has the advantage of making the initial field stationary. Gradient analysis is a fundamental building block of various image processing procedures, from classic edge detection [32] to supervised [33] or unsupervised [2] classification architectures in machine learning. From a more perception-based psychophysical perspective, it has been shown that essential information such as orientations, geometries and positions can be directly inferred from the visual assessment of the gradient field [34][35][36]. We compute the gradients |∇h| from wavelet convolutions; this method is now extensively used, as it shows excellent robustness for signal processing tasks [37][38][39][40][41]. One has

|∇h|_j(r) = [(ψ_{j,x} ∗ h)²(r) + (ψ_{j,y} ∗ h)²(r)]^{1/2},

where ψ_j := (ψ_{j,x}, ψ_{j,y}) is a wavelet gradient filter of characteristic dyadic size 2^j. This wavelet mixes gradient and Gaussian windows, the latter being of standard deviation σ_j = 2^j pixels. The procedure with j = 0 yields the image in Fig. 8(a). As expected, one obtains a strong signal (bright shades) for fluctuating textures of vegetation or sharp contours like branches, and low values (dark shades) for smooth and uniform regions like the sky.
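A sketch of this computation, using SciPy's derivative-of-Gaussian filters of width σ_j = 2^j as a stand-in for the wavelet gradient filter ψ_j (the exact wavelet used in the paper may differ):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_magnitude(h, j=0):
    """|grad h| at dyadic scale 2**j from derivative-of-Gaussian filters."""
    h = np.asarray(h, float)
    sigma = 2.0**j                                 # sigma_j = 2**j pixels
    gx = gaussian_filter(h, sigma, order=(0, 1))   # psi_{j,x} * h
    gy = gaussian_filter(h, sigma, order=(1, 0))   # psi_{j,y} * h
    return np.hypot(gx, gy)
```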
We then conduct the MSR analysis on these new patches (Fig. 8(b)) and observe that most patches give flat MSR curves. This is tantamount to the critical H ≈ 0 case with logarithmic correlations described in Section II (see Fig. 5). One may indeed think of natural images as a patchwork of objects of various sizes; such a superposition of patterns is reminiscent of additive cascade processes [42], which also display logarithmic correlations.
We now explore the effect of changing the wavelet size (see Fig. 9). We chose the top middle patch in Fig. 8(a), as it contains both large objects and small-scale details. As one can see in Figs. 9(c) and (d), increasing j has the effect of coarse-graining small fluctuations to leave only larger ones. This translates into a smaller Relevance at low compression, which in turn reduces the overall MSR (Fig. 9(b)). Finally, the segmented gradient fields at critical threshold values (Figs. 9(e) and (f)) remain visually close to the initial fields (Figs. 9(c) and (d)). This is expected, as gradient magnitudes already show a large proportion of black and white pixels at the contours of physical objects.

IV. APPLICATION TO IMAGE PROCESSING
Here we illustrate the potential of MSR in the context of common digital image processing tasks, namely color mapping and denoising.

A. Color mapping
Consider the color mapping problem consisting in projecting pixel values onto a reduced palette. For the sake of simplicity, let us consider the case of an initial grayscale palette projected onto binary values {0, 255} (B&W). We implement a stochastic mapping procedure using the Boltzmann distribution P(c|h_ij) ∝ e^{−(h_ij − c)²/T}, where h_ij is the original color of the pixel with coordinates ij, c ∈ {0, 255} is the color in the reduced palette [43], and T is a temperature parameter, see [44]. T = 0 corresponds to the choice of the closest color in the reduced palette, while T → ∞ leads to uniform noise.
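The stochastic mapping can be sketched as follows; the inverse-transform sampling and the seed handling are implementation choices of ours, not taken from the paper:

```python
import numpy as np

def boltzmann_map(h, T, palette=(0.0, 255.0), seed=0):
    """Map each pixel to palette color c with P(c|h_ij) ~ exp(-(h_ij - c)^2 / T)."""
    rng = np.random.default_rng(seed)
    c = np.asarray(palette, dtype=float)
    d2 = (np.asarray(h, float)[..., None] - c)**2   # squared distance to each color
    if T == 0:
        idx = d2.argmin(axis=-1)                    # T = 0: closest palette color
    else:
        logits = -d2 / T
        logits -= logits.max(axis=-1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=-1, keepdims=True)
        u = rng.random(p.shape[:-1] + (1,))
        idx = (u > p.cumsum(axis=-1)).sum(axis=-1)  # inverse-transform sampling
    return c[idx]
```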
Optimizing the procedure consists in calibrating T to maximize some advanced similarity measure between the original and reduced images, in the hope that it will capture more interesting properties than a simple pixel-to-pixel Euclidean distance minimization. Here we propose an alternative approach consisting in maximizing an information measure, the MSR, and compare it to classical metrics, namely the Peak Signal-to-Noise Ratio (PSNR) [12] and the Structural Similarity Index (SSI) [13]. PSNR is directly related to the Mean Squared Error (MSE) between original and mapped images through PSNR = 10 log_10(Δ²/MSE), where Δ is the range of the signal, that is 255 for typical grayscale encoding. SSI is based on the comparison of patches between two images and takes into account properties such as luminance and contrast. Both are widely used in the digital image processing community. Figure 10(a) displays the original patch extracted from Fig. 6(a). Figure 10(b) shows the evolution of each metric with temperature T. One sees that the PSNR between the original and mapped images is maximized at T = 0. This is not surprising, as the PSNR is monotonically related to the MSE by definition. The corresponding mapping in Fig. 10(c) appears too sharp and contrasted, clearly separating vegetation from sky while introducing thresholding artifacts. Optimization of the SSI yields a non-zero yet small temperature T = 0.1, barely improving the resulting image (see Fig. 10(d)). We then compute the MSR for both direct and gradient fields. The maximization of MSR(T) leads to the image shown in Fig. 10(e), which contains more faithful visual features and a decent similarity to the original image at large scales, at the cost of artificial small-scale features. Finally, the maximization of the gradient magnitude MSR [45], shown in Fig. 10(f), seems like a good compromise between (c), (d) and (e), as it also displays medium-scale features (tree trunk details) without blurring finer ones (small branches).
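The PSNR follows directly from its definition; a minimal sketch:

```python
import numpy as np

def psnr(original, mapped, delta=255.0):
    """PSNR = 10 log10(delta^2 / MSE), in dB."""
    mse = np.mean((np.asarray(original, float) - np.asarray(mapped, float))**2)
    return 10.0 * np.log10(delta**2 / mse)
```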
Hence, for strong color reduction, a Multiscale Relevance approach can bring better visuals than classical metrics such as the Structural Similarity Index, which, in addition, requires a priori semantic knowledge of the original image. Note that the analysis could be extended to more elaborate color mapping procedures such as error diffusion [46,47] or Monte-Carlo based algorithms [48].

B. Denoising with Rudin-Osher-Fatemi algorithm
We now focus on a denoising procedure, which consists in correcting unwanted noise caused by signal processing or camera artefacts. A classic algorithm to tackle this problem is Rudin-Osher-Fatemi (ROF) [11], which minimizes the functional

E[f] = ∫ (f − h)² dr + λ ∫ |∇f| dr,

where h is the original noisy image, f the target denoised image, and λ a regularization/penalty term preventing gradient explosion and allowing for smooth solutions. The free parameter λ is generally chosen by the operator through visual assessment.
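A minimal explicit gradient-descent sketch of a smoothed version of this functional; production implementations rather use, e.g., Chambolle's projection algorithm, and the ε-smoothing of |∇f|, the step size and the boundary handling below are our assumptions:

```python
import numpy as np

def rof_denoise(h, lam, n_iter=200, tau=0.1, eps=1e-3):
    """Gradient descent on E[f] = sum (f - h)^2 + lam * sum |grad f|,
    with |grad f| smoothed as sqrt(gx^2 + gy^2 + eps^2)."""
    h = np.asarray(h, float)
    f = h.copy()
    for _ in range(n_iter):
        gx = np.diff(f, axis=1, append=f[:, -1:])        # forward differences
        gy = np.diff(f, axis=0, append=f[-1:, :])
        norm = np.sqrt(gx**2 + gy**2 + eps**2)
        px, py = gx / norm, gy / norm                     # grad f / |grad f|
        # divergence of (px, py) via backward differences (periodic at edges)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        f -= tau * (2.0 * (f - h) - lam * div)            # descent step
    return f
```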
Here we propose to calibrate such a model using again the PSNR, SSI and MSR_∇ metrics. We consider the image in Fig. 11(a), obtained by adding Gaussian white noise to the patch in Fig. 6(a). We intentionally choose a high noise level to make the denoising procedure difficult, such that some details from the original image may never be recovered. Our goal is to seek the optimal λ* leading to the best visual. The scores obtained for each method as a function of λ are displayed in Fig. 11(b). Optimally denoised images using PSNR, SSI and MSR_∇ are shown in Fig. 11(c), (d) and (e) respectively. With PSNR, one is left with a rather high level of noise, while details on the trunk surface or in the branches are conserved. In contrast, SSI removes a significant part of the noise, but at the cost of blurring small-scale details. Although less obvious than for the color mapping procedure, optimal denoising with MSR_∇ seems like a good compromise between a too-noisy PSNR image and an overly smoothed SSI image.

V. CONCLUSION
Let us summarize what we have achieved. We first introduced the Resolution/Relevance framework through a simple illustrative example and showed how such a formalism can be applied to image analysis. With the aim of investigating the framework in a controlled environment, we started by studying random textures. We then defined the Multiscale Relevance (MSR), which measures the entropy contribution at all compression scales, and obtained statistical features reminiscent of the correlated percolation problem. In particular, we highlighted the existence of a critical roughness parameter H_c ≈ 0, corresponding to logarithmic correlations, and discussed optimal segmentation. We then extended the analysis to natural images and drew a successful comparison with random textures; we observed strong similarities with critical random Gaussian fields. Looking at gradient magnitude fields revealed an even stronger similarity to roughness criticality. Finally, we confronted the MSR procedure with classical signal processing measures in the context of simple image processing tasks: color mapping and denoising. We obtained interesting results, thereby demonstrating the potential of the agnostic MSR approach for image processing.
This last section would benefit from an extension to more elaborate image processing techniques, beyond the scope of the present paper. Future research should also focus on analytically tractable developments of Relevance and Resolution in simple cases, e.g. Gaussian white noise with well-chosen cascading processes. Also note that we considered a straightforward compression procedure in direct space, but equivalent representations, for example Discrete Cosine [9] or Wavelet harmonics [10], could be used to define the reduced sample S. Finally, we have seen that the MSR is able to capture the most relevant segmentation values, which may be used as a pre-processing method for learning frameworks.


FIG. 1 .
FIG. 1. Relevance analysis of a Gaussian distribution sample (N = 100). (a) Influence of the number of bins n on the normalized histogram (black bars), for (a1) n = 5, (a2) n = 23 and (a3) n = 400. The red curve corresponds to the underlying distribution. The bottom markers (+) represent the initial sample data points, with color indicating local data density. (b) Resolution/Relevance curve.

FIG. 2 .
FIG. 2. Illustration of the segmentation/compression procedure on a classic benchmark image. (a) Original image. (b) Thresholded image at a given quantile value a. (c) Thresholded image with reduced grid. (d) Reduced sample where each grid cell is replaced by the average pixel color.

FIG. 5 .
FIG. 5. Influence of the segmentation value a. (a) Relevance curves for H = −0.8 for two values of a. (b) MSR as a function of a for H = −0.8 (black dashed line), H = −0.1 (red dash-dotted line) and H = 0.5 (black dotted line). (c) Density plot of MSR(H, a). The maxima are indicated by black markers.
FIG. 6. (a) Natural grayscale image from [31], segmented into patches of size 512 × 512. (b) Power spectrum for each patch. The dotted line is a decaying power law with exponent −2. (c) MSR as a function of a for each patch.

TABLE I .
Typical sampling situations.