Unsupervised geochemical classification and automatic 3D mapping of the Bolshetroitskoe high-grade iron ore deposit (Belgorod Region, Russia)

We stated and solved three successive problems concerning automatization of geological mapping using the case of the Bolshetroitskoe high-grade iron ore deposit in weathered crust of Banded Iron Formation (Kursk Magnetic Anomaly, Belgorod Region, Russia). (1) Selecting a classification (clustering) method of geochemical data without reference sampling, i.e., solution of an “unsupervised clustering task”. We developed 5 rock classifications based on different principles, i.e., classification by visual description, by distribution of economic component (Fe2O3), by cluster analysis of raw data and centered log-ratio transformation of the raw data, and by artificial neural network (Kohonnen’s self-organized map). (2) Non-parametric comparison of quality of the classifications and revealing the best one. (3) Automatic 3D geological mapping in accordance with the best classification. The developed approach of automatic 3D geological mapping seems to be rather simple and plausible.

In statistics, this kind of problems is known as an "unsupervised clustering task". For this task, a problem of quantitative comparison of different classifications quality is not solved in general form 9 , in contrast to a supervised learning task, where the clustering quality is estimated by cross-validation, bootstrap, etc. 10 . In this work, we suggest a particular method of quantitative comparison of classifications quality, however, we did not solve a problem of optimal quantity of clusters (i.e., rock types, in our case)-we used quantity of clusters defined by a visual geological description of drill core.
We developed the method on the Bolshetroitskoe high-grade iron ore deposit in weathered crust of banded iron formation (BIF), Belgorod Region, Russia (Figs. 1,2). In this paper, we developed several geochemical classifications of the deposit rocks based on different principles, i.e., visual description, one-dimensional statistics, multiple regression, and artificial neural networks. Then we introduced a method of quantitative comparison of classifications. Based on the best classification, we automatically built a 3D geological map of the deposit.

Geological setting
The Bolshetroitskoe high-grade iron deposit is a part of the Belgorod ore district (Fig. 1). The Belgorod district is a world's largest iron ore district. Beside the Bolshetroitskoe, there are the Yakovlevskoe, Gostishchevskoe, Shemraevskoe, Vislovskoe (measured resources), Razumenskoe, Olimpiyskoe, Melikhovo-Shebekinskoe, Olkhovatskoe (indicated and inferred resources) high-grade iron deposits in the district. All these deposits are large and unique.
High-grade iron ore of the Belgorod metallogenic district is considered to be a weathered crust of Banded Iron Formation (BIF) and, to a lesser degree, ferriferous schists of the pre-Visean age, the Carboniferous system, 346 ± 1 Ma 11 . The weathered crust rocks usually preserve structures and partly mineral composition of maternal rocks.
Main minerals of the high-grade ores are hematite (including martite-pseudomorph after magnetite, and microplaty hematite), goetite-limonite, magnetite, siderite, Fe-rich micas and hydromicas, quartz, clay minerals, boxites. When described visually, ore types are classified by proportions of these minerals, as well as by mechanical properties.
The Bolshetroitskoe high-grade iron deposit (Fig. 2) is located in the SW part of the Belgorod ore district and confined to a sharp bend of the Korochan-Mukhin regional magnetic anomaly. The Bolshetroitskoe deposit is considered to be a syncline, a part of the Belgorod regional graben-syncline. In a core of the Bolshetroitskoe syncline, there are Early Proterozoic BIF (Kursk series), and in limbs of the syncline, there are Archean rocks.  [11][12][13] . Drawn by Adobe Illustrator CS6 (https ://www.adobe .com).  14 .
The Bolshetroitskoe deposit was discovered in 1947 as a part of the Korochan-Mukhin regional magnetic anomaly. Research and development of a hydraulic borehole mining of the deposit was performed in 1988-1991. A detailed exploration was carried out in 2006-2013, and hydraulic borehole mining of the deposit took place by "Belgorodskaya GDK" (Belgorod, Russia) in 2008-2014. This innovative method allows mining loose rock from under a thick (~ 500 m) sedimentary cover 15,16 . Measured resources of the Bolshetroitskoe deposit are 410 Mt of ore and indicated resources are 2150 Mt of ore at Fe total = 62.4%.

Description of the approach
The 3D automatic mapping of ore deposits without reference sampling consists in three general tasks: (1) Selecting a classification (clustering) method of geochemical data ("unsupervised clustering task"). (2) Interpolation of the input data. (3) Joining the results of the first two tasks, i.e., applying the selected clustering method to an interpolation block model. This block model will be a 3D geological map of a deposit. The more detailed approach is applied as follows (a flowchart is shown in Fig. 3).
1. Collecting and preparing representative data on whole-rock chemistry of a deposit in 3D. 2. Finding functions of data clustering by whole-rock chemistry (i.e., finding parameters of determination of rock type). 3. Choosing the best clustering function, if the best way of clustering is unknown beforehand. 4. Interpolation of whole-rock chemistry data taking part in the clustering. Joining the interpolation models in a single block model. 5. Applying the clustering function found in step 2 to each block of the single block model built in step 3, i.e., computation of rock type. 6. Visualization the computed rock type as a set of cross-sections, a 3D body, or a grid, etc.

Results
A general characteristic of the sample set (step 1). 1029 samples with an average length of 4 m were sampled along 28 drill holes (see details in the "Materials and methods" section). In the sample set, there are both ore of different quality and host rocks, so the set is obviously heterogeneous. Descriptive statistics of the sample set are given in Table 1, and correlations are shown in Fig. 4.
Strong verifiable correlation relationships are typical of Fe 2 O 3 (with SiO 2 , Al 2 O 3 , MgO, TiO 2 ). It is clear that these relationships are negative, i.e., the richer ore, the less impurities. Beside the component of interest, Fe 2 O 3 , there is a strong positive correlation of Al 2 O 3 vs TiO 2 (r = 0.97) and FeO vs MgO (r = 0.61). Nearly all scatterplots (even highly correlated Al 2 O 3 vs TiO 2 ) have minimum two trends, which is typical of heterogeneous samples.
Distribution of Fe 2 O 3 is left-asymmetric (Q-normal), and the rest of the components have right-asymmetric distribution (lognormal or exponential).

Geochemical classification of rocks (step 2).
To define the best approach to geochemical classification of the deposit rocks, we used four different methods, plus visual ('manual') geological classification as a basis for comparison.
1. Geological classification of rock via visual description of a drill core by geologists of the "Belgorodskaya GDK" (Belgorod, Russia). They picked out 13 rock types: appreciably martite ore; banded martite ore; martite with magnetite and platy-hematite ore; appreciably platy-hematite limonitized ore; banded platy hematite limonitized ore with carbonate cement; martite with magnetite and platy-hematite ore with carbonate cement;  www.nature.com/scientificreports/ appreciably limonite with hematite and martite ore; banded limonitized ore with silicates and carbonates; martite limonitized and sideritized ore with magnetite and platy-hematite; weathered intra-ore schist and allite; weathered above-ore schist; banded iron formation (BIF); breccia. In this work, we excluded the last rock type because it has no geochemical and mineralogical sense. So, we have 12 rock types, and in other classification, we picked out the same number of rock types. 2. Classification by content of the principal economic component, Fe 2 O 3 , based on its multimodal distribution (Fig. 5). We accepted local minima in the histogram as borders between rock types. This approach to ore classification is common since it suits for economic (technological) ore classification of single-component deposits (e.g., rich, intermediate, poor ores).
3. Cluster analysis of raw chemical composition. We used five rock-forming components: Fe 2 O 3 , FeO, SiO 2 , Al 2 O 3 , CaO. Method: k-means clustering. Initial cluster centers were taken by choosing observations to maximize initial between-cluster distances. Cluster: cases (rows). Number of clusters was 12, solution was obtained after three iterations. Missing data were casewise deleted.
4. Cluster analysis of chemical composition with centered log-ratio transformation of the raw data to avoid spurious correlation in closure numerical systems 18,19 . We used five rock-forming components: Fe 2 O 3 , FeO, SiO 2 , Al 2 O 3 , CaO. Method: k-means clustering. Initial cluster centers were taken by choosing observations to maximize initial between-cluster distances. k-means clustering. Initial cluster centers were taken by choosing observations to maximize initial between-cluster distances. Number of clusters was 12, solution was obtained after six iterations. Missing data were casewise deleted. 5 Comparison of the classifications (step 3). As we obtained five classifications of the same object, and these classifications are based on different principles, a problem to choose the best classification raised.
Method of comparison of approximation and interpolation is known in statistics, e.g., cross-validation 20 and bootstrap 10 . However, approximation and interpolation problem differs from clustering (classification) problem. www.nature.com/scientificreports/ It has a reference sample set, and measure of fitting quality is based on comparison of the reference sample set and approximation/interpolation model. Sometimes there is a reference set for classification problems, it is known as a supervised learning task, see review in 21 . However, we did not have a reference set, i.e., we had an unsupervised learning task. For such type of problems, there is no general solution, and the method of quantitative comparison of classifications is usually developed for a specific problem, see review in 9 . Statement of the  Table 2). www.nature.com/scientificreports/ problem in a general form and an approach to its solution were introduced in 22 . Our case was simpler because we did not take into account the problem of selection of cluster number. We took 12 rock types since this number of rock types (excluding the 'breccia' type) is used for the Bolshetroitskoe deposit, and the approximately same number of rock types is used for other high-grade iron ore deposits of the region.
In accordance with the approach suggested by Dy and Brodley 22 , we supposed that a sum of 'inhomogeneity' of all classes (from 1 to 12) of the classification m could be a measure of negative quality Q m . I.e., the less sum of 'inhomogeneity' of all classes, the better a classification under the condition that a number of classes is equal in all compared classifications. Thus, the optimal classification has a minimal sum of 'inhomogeneity' Q * : This approach is in agreement with a definition of clustering as grouping of similar objects 23 . We used the standard deviation σ as a measure of inhomogeneity.
A flowsheet for comparing the quality of classifications (in our case) is as follows. A result of application of the flowsheet to the sample set of the Bolshetroitskoe deposit is shown in Table 3. The result shows that the classification by neural network is the best. The neural network is available online in Supplementary Materials 1. So, this classification became a basis for a 3D automatic geological mapping of the Bolshetroitskoe deposit. A chemical composition of the rock types picked out by this classification is shown in Table 4.

Interpolation, rock type evaluation, and visualization of the 3D geological model (steps 4-6).
Application of the automatic 3D geological modelling method to the Bolshetroitskoe deposit (steps 4-6 in accordance with the flowchart, Fig. 3) is described below.
Step 4.1. Interpolation of the determinative components: Fe 2 O 3 , FeO, Al 2 O 3 , SiO 2, and CaO. The interpolation was conducted using the anisotropic inverse distance weighted method, power = 2. Search ellipsoid was determined by variography (set of directional semivariograms) of the dominant component Fe 2 O 3 , taking into account distances between boreholes. Interpolation of each component was carried out in three runs with a successive increase in the search radius (130, 270, and 560 m) and a decrease in the threshold number of points falling into the search ellipsoid (4, 3, and 1). A number of sectors of the search ellipsoid is 4, a maximum number of points in a sector is 5. Parameters of the search ellipsoid: azimuth of the 1st axis is 90°, dip is 0°, factor is 1; azimuth of the 2nd axis is 180°, dip is 0°, factor is 1; azimuth of the 3rd axis is 0°, dip is 90°, factor is 0.1. Sections of the interpolation block models of the five components across the line I-I' (Fig. 2) are shown in Fig. 6.
Step 4.2. Conjugation of the interpolation block models into one table. As a result, we have a block model with value of each chemical component for each block.
Step 5. Evaluation of a rock type for each block using the previously created classification [see the sections "Geochemical classification of rocks (step 2)" and "Comparison of the classifications (step 3)"], i.e., clustering by artificial neural network (Kohonnen's self-organized map), the program code of which is presented in the Supplementary Material file.
Step 6. Visualization of the block model. The final result is shown in Fig. 7.

Discussion
Model of the Bolshetroitskoe deposit. We got the 3D geological model of the deposit without human decisions. It is based only on structure of spatial variation of rock-forming components (Fe 2 O 3 , FeO, Al 2 O 3 , SiO 2 , CaO). The spatial variation cannot be utilized during manual drawing of geological model or cross-section of a deposit. Plus, geologists usually have a priori model of genesis and a structure of a deposit that can influence on a geological model. For example, there are two manually built cross-sections of the Bolshetroitskoe deposit ( Fig. 8) 15,24 . We can see that the authors of the first cross-section had a conception that the regional folding formed the deposit, and the authors of the second one supposed subhorizontal bedding with tectonically induced permeability, which resulted in a thick zone of the high-grade ore. An automatic mapping approach

Scientific Reports
| (2020) 10:17861 | https://doi.org/10.1038/s41598-020-74505-y www.nature.com/scientificreports/ is data-driven and free of any a priori conception. These two circumstances (basing on spatial variation and absence of a priori conception) forced us to suggest that the automatic approach can be more precise than the manual one. A general agreement of our automatic model and the most recent manual cross-section (Fig. 8b) supports this suggestion. However, the automatically built geological model has some constrains or shortcomings.
-Some rock types are named as formation, viz. BIF, and moderately weathered BIF. Of course, they are conditional names since they should be referred to a certain formation based on textural, structural, and mineral properties. Yet, in our case, we use these names for more lucidity because it usually coincides with a visual description of drill cores.
-Most of rock types are named in accordance with their chemical composition, and these names does not have geological sense of the rock types. We suppose that it is the main shortcoming of the developed geological model. However, the shortcoming arises from quality of input data, in this case. If we obtained an exactly determined mineral composition and quantitative description of textural/structural properties throughout the drill cores, we would build a geologically interpretable model, as we did in the works 25,26 .
-Classification of some blocks seems to be wrong from the geological point of view. E.g., near the bottom of the drill hole 26p (Fig. 6), there is a block of BIF surrounded by a moderately weathered BIF above high-grade ore blocks. Table 3. A comparison of quality of five geochemical classifications (see Table 2) of rocks of the Bolshetroitskoe deposit. σ is a standard deviation of the component of the rock type. N is a number of samples in the class. σ is a standard deviation of the responding component, as in Eq. (2). www.nature.com/scientificreports/ In general, the automatically built structure of the deposit seems to be geologically correct: BIF and moderately weathered BIF are in the bottom of the cross-section; there are iron ores of different grades above maternal BIF; and, finally, iron ores are overburden by silica-and alumina-rich rocks; the highest-grade ore (rock type #1) forms the central bulge (around drill hole 2p).

Comparison of the classifications of rocks of the Bolshetroitskoe deposit. It was foreseeable that
the one-parameter classification by Fe 2 O 3 histogram became the worst one, its rate sum is twice worse than the leader's, the classification by ANN (21 vs 10, Table 3). Surprisingly, the manual classification became the worst too (rate sum is 20), and logratio transformation of raw data did not enhance the classification: rate sum of the classification by cluster analysis of raw data is 12, and logratio transformed is 12, too ( Table 3). Clustering of the data by artificial neural network became the best (rate sum is 10). It is expectable because it is known that multiparametric non-linear methods of clustering are better than k-means cluster analysis, see e.g. 27,28 .
We used standard deviation as σ in Eq. (2) since it is the most common measure that is used to quantify the degree of variation of a data value set, although other measures such as interquartile range, median absolute deviation, mean absolute difference, average absolute deviation, etc. can be used. In our case, we tested other 'measures of unhomogenity' (interquartile range and weighted standard deviation) and found that the classification by artificial neural network is still a leader. Justification of the best 'measure of unhomogenity' and its usability condition is an independent mathematical problem and is outside the scope of this work. Rigorous mathematical investigation and generalization of the approach are in our future plans.
Approach of automatic 3D geological mapping. In general, the approach consists of the following three steps: (1) interpolation of variables required for rock type determination in a single block model; (2) rock type determination for each block of the block model by a certain classification algorithm; (3) visualization. The most difficult problem is step 2. A classification algorithm depends on available data. The ideal (or simplest) case is when a directly determined mineral composition and quantified textural properties of rocks are available 25,26 , the classification algorithm will be just a logical evaluation of rock type in accordance with the commonly accepted classification (as in our aforementioned works) or any local classification. The case is more complex when a mineral composition is calculated from a chemical composition of rocks 6 , i.e., a chemistry-tomineral conversion problem 29,30 should be taken into account. The most complex case-rock classification by "plain" rock chemistry-is investigated here. In all these cases, the principal workflow is the same. It suggests that the developed approach is sufficiently universal. It seems that the most difficult problem (rock classification algorithm) can be reduced by total determination of mineral composition during a deposit exploration, e.g., by automatized mineralogical systems like QEMSCAN 29,31 . The second important source of errors of the geostatistical-based approach is interpolation. However, this field of knowledge is being actively developed now, and nearly all problems of uncertainty usually have an acceptable solution, general or special (see review e.g., in 32,33 ).
Except for a chemistry-to-mineral conversion problem, the developed approach does not require a specially developed mathematical software or code. Commonly used mining or geographical information systems (e.g., Micromine, Datamine, Mineframe, or ArcGIS, etc.) are suitable for implementing the approach.
One more feature of the approach is a requirement of quantified data (mineral and chemical composition of rock, quantified structural/textural properties, etc.), and qualitative data (e.g., visually described textural properties) cannot be used since such type of data cannot be interpolated. We suppose that it is an advantage because an interpolation-based analysis of spatial structure of a deposit is more founded than intuitively drawn manual cross-section (see the example in Fig. 8). Besides, the approach can be used for a geometallurgical modelling: in this case, technological ore types should be taken into account (determined by mineral processing or metallurgical technology) instead of common geological rock typessup 34 . Table 4. Mean chemical composition of the rock types of the Bolshetroitskoe deposit picked out by artificial neural network (Classification #5).