A novel method for correcting scanline-observational bias of discontinuity orientation

Scanline observation is known to introduce an angular bias into the probability distribution of orientation in three-dimensional space. In this paper, numerical solutions expressing the functional relationship between the scanline-observational distribution (in one-dimensional space) and the inherent distribution (in three-dimensional space) are derived using probability theory and calculus under the independence hypothesis of dip direction and dip angle. Based on these solutions, a novel method for obtaining the inherent distribution (also for correcting the bias) is proposed, an approach which includes two procedures: 1) Correcting the cumulative probabilities of orientation according to the solutions, and 2) Determining the distribution of the corrected orientations using approximation methods such as the one-sample Kolmogorov-Smirnov test. The inherent distribution corrected by the proposed method can be used for discrete fracture network (DFN) modelling, which is applied to such areas as rockmass stability evaluation, rockmass permeability analysis, rockmass quality calculation and other related fields. To maximize the correction capacity of the proposed method, the observed sample size is suggested through effectiveness tests for different distribution types, dispersions and sample sizes. The performance of the proposed method and the comparison of its correction capacity with existing methods are illustrated with two case studies.

Rockmass is a discrete medium composed of rock material and discontinuities including faults, fractures, joints, veins, bedding planes, cleavage planes, and schistosity planes, among others. Such discontinuities dominate the kinematical and mechanical behaviour of engineering rockmass [1][2][3][4][5] , with the analysis of this behaviour extending to various applications in such areas as rockmass stability evaluation, rockmass permeability analysis, rockmass quality calculation and other related fields, using three-dimensional rockmass models frequently generated via discrete fracture network (DFN) modelling with input geometrical variables including orientation [6][7][8][9][10][11][12][13][14][15] . These orientations are primarily measured on fresh rock exposures, with previous studies reporting various representative techniques [16][17][18][19] . More recently, a circular window sampling has been used to measure geometrical features 20 , and a careful window mapping was used for geometric fracture measurement at a surface outcrop in Kilve on the southern margin of the Bristol Channel Basin 21 , while a Lidar scanning technology was applied to determine the discontinuity orientation along a highway in Canada 22,23 .
However, some studies have used a line sampling technique, the most common being scanline sampling or borehole sampling. For example, scanlines were fixed on the rock faces of the San Manual copper mine in Arizona, USA, to collect orientations 24 . Boreholes were also used to obtain the orientation sample of a trending anticline at an anonymous field 25 . Both a single scanline and a multiple scanline method were used at Big Quarry site in northeast Wisconsin on the Door Peninsular between Lake Michigan and Green Bay 26 . These studies have shown that line sampling introduces a bias because a scanline preferably samples those discontinuities having large intersection angles, meaning the sampled probability increases with the intersection angle. To illustrate this bias, Supplementary Fig. S1(a) arbitrarily supposes two sets of discontinuities, where (1) the intensities of the two discontinuity sets are equal and (2) the intersection angle between the scanline and the Discontinuity Set 1 is smaller than that between the scanline and Set 2, that is, θ 1 < θ 2 . Consequently, fewer discontinuities in Set 1 are sampled by the scanline than for Set 2. In the extreme case, none of Set 1 is sampled if θ 1 = 0°, and consequently

Results
The effect of sample size on correction capacity. When preparing a discontinuity survey, it is important to know the sample size required for maximizing the capacity of the bias correction, referred to as the optimal sample size. To analyse the effect of sample size on the capacity and to select the optimal sample size for the proposed method, we compared the accuracies of the correction results of 84 artificial datasets from 4 distribution types, 3 dispersions and 7 sample sizes.
The accuracy is expressed by the root square error between the true and the corrected probability densities. For dip direction this value is represented by c t 2 min max where p c (α) is the corrected probability density function of the dip direction, p t (α) the true probability density function of the dip direction, α min the lower limit of the definition domain, α max the upper limit of the definition domain, and δ(α) the root square error between the true and the corrected probability densities of the dip direction. For dip angle the root square error is calculated using the formula below, e t 2 min max where p c (β) is the corrected probability density function of the dip angle, p t (β) the true probability density function of the dip angle, β min the lower limit of the definition domain, β max the upper limit of the definition domain, and δ(β) the root square error between the true and the corrected probability densities of the dip angle. The lower the root square errors δ(α) and δ(β), the more accurate the correction is. The investigation method is shown in the Methods Section. The result ( Fig. 1 (15,75) and U (140, 220)/U (5, 85), the error curves are approximately horizontal at 2.9 × 10 −3 /2.0 × 10 −3 and 2.1 × 10 −3 /3.2 × 10 −3 . These results suggest that the change in sample size has little effect on the correction capacity. Furthermore, these errors are close to 0, suggesting that the proposed method is quite effective for uniformly distributed orientations. (d) Exponential distribution: The error curves of Exp (180) dip direction/Exp (45) dip angle reach up to more than 1.1 × 10 −2 /2.9 × 10 −2 . In particular, the error is more than 2.1 × 10 −2 and 3.6 × 10 −2 when the sample size is between 300 and 1000. Similarly, in the cases of Exp (185)/Exp (50) and Exp (190)/Exp (55), the errors are fairly large, more than 0.0177/0.0134 and 0.0111/0.0116. This shows the proposed method is unsuitable for exponentially distributed orientations.
The actual distribution type and dispersion of orientations of any of the samples are not known until the correction is calculated. Thus, taking into account these findings, the optimal sample size, if possible, is 150 for the proposed correction method. However, this optimal sample size is applicable for each discontinuity set rather than for the entire sample of multiple discontinuity sets, and it may vary depending on the correction method. In other words, the optimal sample size may be a different value or may not exist for various correction methods.
Case I. This case of natural discontinuities is used to compare the correction capacities of the proposed and the existing methods. The existing methods for correcting orientation bias include the Terzaghi method 17,18,27-29 and its modified versions 31,32 . The Terzaghi method obtains the corrected frequency by weighting the observed frequencies using the bias-compensatory factor: where w is the bias-compensatory factor, θ the intersection angle between the scanline and the discontinuity defined at each cell centre. A modification of this method, the Fouché method, adds the discontinuity sample size to the original Terzaghi equation to improve the correction capacity 32 : where n is the sample size of observed discontinuities and K denotes the largest integer less than or equal to K. It was found that the original Terzaghi method is more applicable to the case n → ∞ while this modified method is applicable to an arbitrary value of n. The comparison conducted here is limited to the Fouché method.
Case I involves joints. The background and test method for Case I are shown in the Methods Section. The test returned two-tailed significances corresponding to the Fouché method of 0.739 for the dip direction and 0.782 for the dip angle, and for the proposed method of 0.913 for the dip direction and 0.918 for the dip angle. The significances for the proposed method are higher than that for the Fouché method, indicating that the proposed method performs more effectively than the Fouché method.
Case II. To support the comparison result above, a case with natural discontinuities, Case II, is used. In addition, this case is further used to check the optimal sample size previously determined from the artificial data. Case II involves bedding planes. The background and test method for this case are shown in the Methods Section. The result ( Fig. 2) indicates that for the proposed method, the highest correction capacity is achieved at a sample size of approximately 150; when the sample size is over 150, the correction capacity does not significantly improve as the sample size increases. This result is in good agreement with the optimal sample size previously determined from the artificial data. Additionally, the comparison of significances corresponding to the Fouché method and the proposed method in this figure reveals that the proposed method provides a more accurate correction result than the Fouché method.

Discussion
As suggested in the Results Section, the method proposed here is more effective than the Fouché method because of its higher correction capacity. The reason for this is that the Terzaghi method is based on the assumption that all of the discontinuities in each counting cell are parallel 33 (see Supplementary Appendix A). Because all discontinuities are not parallel, this discrepancy between the assumption and the fact leads to an error. The revised version of this method is also based on this assumption, meaning the Fouché method includes the same error. Unlike the Fouché method, the proposed method does not make this assumption and, thus, does not introduce this error.
The proposed method is derived from Supplementary Equation (11) based on an implicit assumption that the sampling tool is 0 m in diameter, an assumption fulfilled by a scanline. However, it is not clear if this method can be applied to diameters greater than 0 m, for example a borehole/well. We believe that the applicability of this proposed method in this situation could be determined if adequate fracture data from boreholes can be obtained, or more importantly, a revised method could be derived as long as the underlying Supplementary Equation (11) can be rewritten to address a diameter greater than 0 m. While this study did not include these data, future research can investigate the borehole/well case.
The proposed method is based on probability theory, which assumes that discontinuity orientations follow probability distributions, an assumption also made by numerous geologists. More specifically, many geologists have assumed or verified that the orientations of fractures investigated by the line sampling technique follow several theoretical random distribution types, such as the uniform distribution 34,35 , exponential distribution 36 , normal distribution 37-40 , Fisher distribution 41-44 , Kent distribution 45-47 , Weibull distribution 48 , and Bingham distribution [49][50][51][52][53] . In addition, engineers who have found that these theoretical distribution types do not fit the orientation sample have concluded that the data may follow empirical probability distributions not yet reported 54,55 ; some observed orientation samples may fit a lognormal distribution. Hence, the probability distribution assumption is reasonable for most observed fractures. Unlike fractures, another type of discontinuity, bedding planes, are nearly parallel. In fact, the orientation of bedding planes can be considered as a special case following a probability distribution defined within a narrow domain, for instance from 132 to 141° in dip direction and from 68 to 76° in dip angle shown here in Case I. From this perspective, such a special case of sub-parallel discontinuities, thus, belongs to a probability distribution, where the proposed method can be applied.
The distribution after correction in Case I is altered slightly relative to the raw distribution, while that in Case II is altered more significantly as seen in Fig. 3 even though the averages of the intersection angles between the scanline and the discontinuities (48.1° for Case I and 51.7° for Case II) are similar. The dispersion of the orientations of the non-parallel discontinuities (e.g. 12.1° and 10.0° standard deviations corresponding to dip directions and dip angles, respectively, of the 1,016 joints in Case II) is responsible for this observational bias. Compared with joints, sub-parallel discontinuities are less dispersed in orientation (e.g. 2.9° and 2.2° standard deviations corresponding to dip directions and angles of bedding planes, respectively, in Case I), meaning that the observational bias is smaller, even negligible at times. Consequently, the correction for bedding planes seems less significant than for joints. In particular, for extremely parallel bedding planes, the correction may not be required due to two primary reasons. The first is that, to our knowledge, the observational bias is so small that the data sufficiently approximate the distribution in three-dimensional space which is just required for DFN modelling. The second is that the improved accuracy will be very limited even using the proposed method, owing to the appropriately high approximation of the raw data in relation to the three-dimensional distribution as mentioned previously. Our next study will analyse this issue more fully.
The proposed method uses dip direction/dip angle to delineate orientation. This linear delineation is limited because when a discontinuity cluster crosses the 0° dip direction, the statistics of dip direction will break down. However, the conversions of the dip direction in Supplementary Appendix B can be used to avoid this statistical break. A similar limitation may appear with a discontinuity cluster close to the 0° dip angle, where the same conversion of the dip direction is also required before correction.
In addition to the orientation focused on in this paper, another geometric element of discontinuity, volume intensity, sometimes referred to as volume density, is important in DFN modelling. Three aspects are needed to calculate volume intensity, scanline or borehole orientation, the orientation probability distribution, and the diameter or radius probability distribution. As a prerequisite for this calculation, the orientation probability distribution corrected by the proposed method may contribute to a more accurate intensity calculation.

Conclusions
This paper presents a novel method for correcting the scanline-observational bias of discontinuity orientation. This method includes two steps that 1) correct the cumulative probabilities of orientation based on numerical solutions of orientation probability density in three-dimensional space and 2) determine the distribution of the corrected orientations using approximation methods. The numerical solutions in the first step were derived using probability theory and calculus under the distribution independence hypotheses of dip direction and dip angle. The approximation in the second step can be easily implemented using the one-sample Kolmogorov-Smirnov test. The results corrected by the proposed method, the probability distribution function of orientation in three-dimensional space, can be used as one of the parameters for discrete fracture network (DFN) modelling. As is known, DFN can be applied in various research areas, including the evaluation of rockmass stability, the analysis of rockmass permeability, the calculation of rockmass quality and other such related fields.
To maximize the correction capacity of the proposed method with few observations, the optimal sample size is determined from capacity comparisons of different sample sizes, distribution types and dispersions of orientation data. The results revealed that the highest correction capacity is achieved at an approximate sample size of 150; once it exceeds this value, the correction capacity does not significantly improve with a larger sample size, meaning the optimal sample size tends to be this value. However, this optimal sample size is only applicable for each discontinuity set, not the entire sample comprised of multiple discontinuity sets, and the determination of the optimal sample size depends on the correction method. In other words, the optimal sample size is approximately 150 for the proposed method, but may take different values or even not exist for other correction methods. This optimal sample size determined from artificial data is supported by an actual orientation sample of dacite joints exposed on a tunnel wall in Mankang, Tibet, China.
The proposed method was subsequently compared with the existing Fouché method using two observed orientation samples from both the tunnel wall and a second case involving lithic arkose beddings exposed on an outcrop in Wenchuan, Sichuan, China. The results demonstrate that the proposed method provides a more accurate corrected orientation distribution for DFN modelling than the Fouché method. The reasons for this higher correction capacity of the proposed method were discussed as well as the low correction capacity when applied to bedding planes.

Methods
The proposed method. The proposed method is based on numerical approximate solutions and includes one hypothesis and two procedures. Supplementary Appendix C presents a detailed derivation of the functional relationship between the observed distribution by scanline (in one-dimensional space) and the inherent distribution (in three-dimensional space). The derived result shows that it is difficult to obtain the exact analytic solutions of the inherent distribution because of an unsolved integral. Thus, numerical approximate solutions are derived in terms of piecewise functions (see Supplementary Equations (33) for dip direction and (34) for dip angle).
Hypothesis. As shown in Supplementary Appendix C, the numerical approximate solutions are derived under the hypothesis that the observed dip direction and dip angle are independent of each other. The proposed method uses these solutions and accordingly relies on this hypothesis. Although the dip direction and dip angle are dependent for some discontinuities like bedding planes in plunging folds 56 , there are several cases that consider them as independent variables 17,34-36,42-44,57-60 . For this reason, it is necessary to check whether the observed orientation sample meets the independence hypothesis before using the proposed method. Only if the independence test is met, can the proposed method be applied.
There are many methods for checking for independence, one of which is the Pearson's chi-square (χ 2 ) test. This method assesses whether paired observations of two variables are independent of each other as expressed in a contingency table. It returns a two-tailed significance characterizing the independence ranging from 0 to 1; the greater the value, the more significant is the independence. It is generally believed that the independence hypothesis can be accepted if the significance is above the confidence level of 0.05. Its properties were first investigated in the last century 61 and further details can be found in recent literature 62,63 .
Procedure. The procedure for the proposed method is as follows. (33) and (34), respectively. 2. Determine the distribution of the corrected orientations using approximation methods such as the one-sample Kolmogorov-Smirnov test. This nonparametric test was developed to compare a sample with a hypothesized probability distribution 64,65 . It returns a two-tailed significance quantifying the distance between the empirical distribution function of the sample and the cumulative distribution function of the hypothesized distribution. This test is usually applied to approximating the distribution of a sample with a hypothesized distribution type 66,67 . It allows for various hypothesized distribution types to be selected to obtain the fittest distribution.

Correct the cumulative probabilities of the dip direction and angle based on the numerical solutions Supplementary Equations
Investigation of the effect of sample size. This investigation focuses on the effect of sample size on the correction capacity of the proposed method. It contains the following four steps. Firstly, the true distribution of the orientations is arbitrarily hypothesized, in addition to other geometric parameters that are necessary for the modelling as listed in Table 1. This hypothesized orientation includes 4 common probability distribution types: normal, lognormal, uniform and exponential. For each distribution type, dispersion is set at 3 different values, while for each dispersion, the sample size is assigned 7 different values: 50, 100, 150, 200, 300, 500 and 1,000.
Then, entering these parameters into discrete fracture modelling as described in the previous literature [68][69][70] results in the construction of 12 models of discontinuity networks, corresponding to the 12 combinations of the 4 distribution types and the 3 dispersions. Supplementary Fig. S2 shows only the model for the uniform distribution (i.e.,   Table 1) due to space constraints.
Thirdly, the observations for each of the 12 models are conducted under 7 different sample sizes, so that a total of 84 samples of orientations of discontinuities are derived. Supplementary Fig. S3  samples from Groups 7, 28, 49 and 70 in Table 1. A Pearson's chi-square (χ 2 ) test is then executed to check the independence of the observed dip directions and dip angles, with results listed in Supplementary Table S1. As this table shows, the two-tailed significances are all above the confidence level of 0.05, indicating that the observed orientations meet the independence hypothesis, meaning the proposed method can be used. Finally, the 84 observation samples were corrected using the proposed method, with results being listed in Table 2. Then, the root square errors between the corrected and the true distributions are calculated based on Equations (1) and (2), with results being shown in Fig. 1.
Case I. The study area of this case is located near Yingxiu town in Wenchuan, Sichuan Province, China, about 1,800 m east of the epicentre of the 2008 Wenchuan earthquake. The specific roadcut is 11 m long, 5 m wide and 6 m high, and consists of Upper Triassic lithic arkose of the Xujiahe Formation. The rockmass has two primary discontinuity sets, one of which comprises the bedding planes seen in Fig. 4.
A scanline with the trend/plunge 108/15° was fixed on the outcrop to sample these bedding planes. Their orientations were measured with a geologic compass. As geologists have reported 71-75 , a single measurement by a geologic compass will introduce a large measurement error to the bedding plane orientation data. Because of this error, the data are frequently unable to represent the natural distribution. To reduce such an error, ten repeated measurements were conducted here and their average was considered as the observed orientation. Supplementary  Fig. S4(a) shows the orientations of 121 observed bedding planes. A Pearson's chi-square (χ 2 ) test was used to   Table 1. Parameters for discontinuity modelling. Some of technical terms and notations used in this table are defined as follows: volumetric intensity = number of discontinuity centres per rock volume; N(i, j 2 ) = normal distribution, where i represents the mean and j the standard derivation; Exp(k) = exponential distribution, where k represents the mean; lnN(l, m 2 ) = lognormal distribution, where l represents the location parameter and m the scale parameter; and U(n, p) = uniform distribution, where n represents the lower limit and p the upper limit.
calculate the independence of the dip direction and dip angle. The two-tailed significance obtained from this test was 0.86, above the confidence level of 0.05, indicating that the observed orientations met the independence hypothesis of the proposed method, meaning it can be used for these observed orientations. Firstly, the sampling bias of the observed orientations was corrected using both the Fouché method and the proposed method. The result corrected by the Fouché method is shown in Supplementary Fig. S4(b); the result corrected using the proposed method indicated that the dip direction follows the normal distribution N(140.9, 5.0 2 ) and the dip angle, the normal distribution N(77.4, 4.0 2 ). Moreover, other essential parameters for modelling, specifically the volumetric intensity, the diameter and the aperture, were calculated, with the results being listed in Table 3.
Secondly, using these corrected geometric parameters, two three-dimensional models of the rock were constructed (see Supplementary Fig. S5). A scanline with the same orientation as the field scanline was applied to the model outcrop, and the discontinuities intersected by this scanline were then "measured". Here, the sample size of these "measured" discontinuities was set equal to that of the observed discontinuities in the field. To   Table 2. Probability distribution of orientations corrected using the proposed method.
distinguish them from the discontinuities observed, these "measured" discontinuities generated artificially were named "modelled" discontinuities. Supplementary Fig. S6 shows these "modelled" discontinuity orientations. Finally, the distribution difference between the observed and the "modelled" orientations was tested using the two-sample Kolmogorov-Smirnov test, a nonparametric hypothesis test that evaluates the difference between the cumulative distribution functions of two sample data vectors [76][77][78][79][80] . This test returns a two-tailed significance characterizing the difference. The significance ranges from 0 to 1, with the higher the significance, the lower the difference. The result is shown in the text of the Results Section.
Case II. This case involves the Rumei Dam slope on the Lancang River, located in Mankang, Tibet, China (29° 34′ 30.0″ N, 98° 20′ 49.2″ E). The study area is primarily composed of two exposed strata: one is a light-gray, ash-black or dark-green dacite originating from the Zhuka Formation, Triassic, and the other is a fuchsia or lime-green sandstone from the Huakai Formation, Middle Jurassic. Three primary sets of discontinuities (bedding, Joint 1 and Joint 2) can be found in the rockmass. We conducted a geometrical measurement of the discontinuity orientation, spacing, aperture and trace length along a scanline with a trend/plunge of 243/4° on a tunnel wall. This paper considers only Joint 1; the orientations observed with a compass are shown in Supplementary  Fig. S7(a). To check the independence of the dip direction and dip angle, a Pearson's chi-square test was executed on the orientation sample, resulting in a significance of 0.56, above the confidence level of 0.05, suggesting that the observed orientation sample satisfies the independence hypothesis of the proposed method, meaning it can be applied to this sample.
Similar to the steps used in Case I, the orientations corrected using the Fouché method (see, for example, Supplementary Fig. S7(b) showing the corrected result from the sample of the first 1,000 observed orientations) and that using the proposed method (in terms of probability functions; see Table 4) were obtained, as well as the "modelled" orientations derived from the corrected results of these two methods (see Supplementary Fig. S7(c) and (d)). Then, the significances of 7 sample sizes (the first 50, 100, 150, 200, 300, 500 and 1,000 of 1,016 observed orientations) were tested, with the results being shown in Fig. 2.   Table 4. Probability distribution of orientations corrected using the proposed method (Case II).