Soil plays an important role in the material cycles of the terrestrial ecosystem1. However, with rapid urbanization and the overexploitation of mineral resources, many heavy metals (HMs) enter the soil2,3. HMs in soil do not easily degrade, and they accumulate in the food chain, potentially causing serious harm to human health4. Many studies have shown that HMs beyond the threshold can harm humans. For example, excessive amounts of Cr, Cu, and Cd in the environment pose certain noncarcinogenic damage, and liver disease5. Meanwhile, HMs such as Pb and Cd are potential carcinogens. Cd entering the human body can lead to lung cancer, hypertension, and renal dysfunction. Pb can lead to neurological, kidney, and gastrointestinal diseases6,7. While Zn is an essential element of the human body, excessive intake of Zn can lead to loss of appetite, diarrhea, and anemia, and Hg can cause Minamata disease. Thus, the problem of soil HM pollution has received extensive attention worldwide8. HMs in the soil can enter the human body through ingestion, dermal contact, and inhalation. Therefore, to control pollution, it is important to quantitatively identify the sources of HMs in soils, assess the potential health risks of each source, and identify priority control factors9.

Source apportionment methods for HMs include qualitative and quantitative pollution source identification10,11. Qualitative methods mainly identify possible pollution sources and heavily HM-polluted areas using geostatistics12. However, such methods have difficulty analyzing the contribution rate of each pollution source to provide scientific guidance for pollution control. The receptor model, meanwhile, can give the contribution rate of possible sources based on the concentration of multiple pollutants. Common receptor models include APCS-MLR, PMF, Unmix, and CMB13,14,15. Unmix is a receptor model recommended by the US Environmental Protection Agency (USEPA) for quantitatively analyzing pollutant sources. It was first used to quantitatively identify air pollutant sources. In recent years, some studies have applied it to the source analysis of soil HM pollution16. Most previous health-risk assessment (HRA) methods for HMs were concentration-driven17. The sources of HMs in soil can be divided into both natural and artificial sources. Since it is difficult to intervene in natural sources, the control of HMs in soil has mainly focused on artificial sources18. It is difficult, however, to determine the contribution of each source in concentration-oriented HRA, which poses difficulties for decision-making regarding reducing HM risks in the soil. In addition, the HRA of HMs should adopt appropriate strategies. The traditional health risk assessment models mainly rely on fixed exposure parameters and pollutant concentrations, and assume that the exposure parameters are the same for all children, adult males, and adult females. This approach may lead to inaccurate assessment results. However, uncertainty analysis models, such as Monte Carlo risk simulation is an effective way to address such problems19. Compared with other uncertainty analysis models, the Monte Carlo method can achieve more accurate analysis results with fewer sample data. In recent years, it has been widely used in the HRA of HMs in soil and water20,21,22. However, there have been few studies on the human health risks of HMs in soil using Monte Carlo simulation combined with source analysis. The contribution rate of heavy metal pollution sources to regional human health risks is still unclear. Monte Carlo simulation combined with source analysis can explain the contribution rate of heavy metal sources to soil pollution and the resulting pollution to human health risks. It can also eliminate to a certain extent the problem of inaccurate human health risk assessment caused by fixed parameters23.

The economic belt of the northern slope of the Tianshan Mountains is located in Xinjiang Uygur Autonomous Region, China. This region is far from the sea and has scarce precipitation. It has a typical temperate continental climate and a fragile ecological environment. This economic belt is the most developed area in Xinjiang, with the densest population and the most extensive industrial and agricultural activity. The area accounts for 69.1% of Xinjiang’s GDP. The GDP of the core Urumqi–Changji area accounts for 37.6% of the region’s GDP, and its population accounts for 38.9% of that of Xinjiang. Meanwhile, large amounts of HMs are discharged into the soil in this area. This study aimed to accurately assess soil HM–related health risks for people living in this area. It provides scientific guidance for decision-makers in the region to govern the area and reduce human health risks of soil HMs through scientific means. This study aims to (1) identify and quantify pollution sources by using the Unmix model for source apportionment of soil HMs; (2) assess the human health risks of soil HMs for different population groups in the study area using Monte Carlo simulation, and evaluate the health risk status of each population group; and (3) taking the sensitive population in the study area as an example, use Monte Carlo simulation to evaluate the human health risks of various sources of heavy metals and determine the primary pollution source for control.

Materials and methods

Study area

Fukang, Jimsar, Qitai, and Midong are located in the middle of the northern slope of the Tianshan Mountains and the southern margin of the Junggar Basin desert (Fig. 1). Urbanization in this area is rapid and much higher than the average level in Xinjiang24, and productivity in the area is highly concentrated. It is the leading area for the development of modern industry, agriculture, and transportation in Xinjiang. It also has the largest integrated coal field in China—the Zhundong coal field—which is an important energy base for power and gas transmission25. Given the high intensity of human activity, the soil in this area is seriously polluted by HMs, and the human health risk is high.

Figure 1
figure 1

Sketch map of the study area. Map generated with ArcGIS 10.7 (ESRI).URL:

Measurement methods for HM concentration

Based on a comprehensive survey of the study area, sampling was conducted in July 2019, with the aim of distributing the sampling points as evenly as possible. The sampling quadrats were set according to the diagonal method, and the size of the quadrats was determined to be 10 m × 10 m. Five samples were collected from each quadrat according to the diagonal line; only topsoil (0–20 cm) was taken during collection. Five samples in the quadrat were mixed evenly as a representative sample of the sampling point and stored in a clean self-sealing bag. Each representative sample included about 1 kg of original soil samples. GPS was used to locate and record the latitude and longitude of the sampling sites. A total of 171 samples were collected from the study area. Figure 1 shows the distribution of sampling points in the study area. The collected soil samples were pretreated after natural air drying in a dark and ventilated laboratory. First, impurities such as stones, plant impurities, and plastic fragments in the soil samples were sieved. The soil was then ground and passed through a 100-mesh sieve. To avoid introducing other impurities during the grinding process, corundum mortar was used for grinding. Finally, about 100 g of soil was used to determine HM concentration. After digestion with HNO3, four HMs (Zn, Cu, Cr, and Pb) were determined by flame atomic absorption spectrometry (Hitachi Z-2000 atomic absorption spectrophotometer, Tokyo Hitachi High-Tech Co., Ltd.). Hg was detected by atomic fluorescence spectrometry (Hitachi Z-2000 atomic absorption spectrophotometer). For the whole analysis process, the national soil primary standard material (GSS-1) was used as the quality control standard. The recovery rate of all elements was within the range of 100% ± 10%.

Research methods

Unmix model

Unmix identifies pollution sources and their contributions using self-modeling curve analysis26. The model assumes the data are a linear combination of an unknown number of mixed sources, and the contribution of different sources to each sample is unknown27. The principle can be expressed by the following formula:

$$C_{ij} = \sum\limits_{k = 1}^{m} {G_{ik} p_{kj} + e}$$

where Cij is the concentration of j in sample i, Gik is the contribution of source k in i samples, pkj is the mass fraction of item j in source k (i.e., the composition of the source), and e is the error of model estimation.

HRA model

Human health risks include carcinogenic risks (CR) and noncarcinogenic risks (NCR), which are generally calculated by the HRA model provided by the USEPA28. CR assesses the probability that an individual will develop cancer owing to long-term exposure to a specific pollutant or mixture of pollutants. NCR is related to individual chronic exposure, including genetic and teratogenic effects. To assess the health risks posed by HMs in soil, the population was divided into three groups: females, males, and children. Different from atmospheric particulates, soil HMs have little risk of inhalation exposure. Therefore, the average daily exposure dose (ADD) considers only two main exposure pathways: ingestion and dermal contact29. The calculation formula is as follows30:

$$ADD_{ing} \, = \, \frac{{C_{soil} \, \times \, IR_{ing} \, \times \, CF \times \, EF \, \times \, ED \, }}{BW \, \times \, AT}$$
$$ADD_{dermal} \, = \, \frac{{C_{soil} \, \times \, CF \times \, SA \, \times \, AF \, \times \, ABF \, \times \, EF \, \times \, ED \, }}{BW \, \times \, AT}$$

Csoil is HM concentration in soil (mg/kg). Refer to Table 1 for BW, ED, CF, SA, AF, IRing, ABF, AT, EF, and other HM exposure risk parameters.

Table 1 Calculation parameters and values used in health-risk assessment to evaluate soil exposure risk with Monte Carlo simulation.

NCR is assessed by the total hazard index (HI), and CR is assessed by the total carcinogenic risk (TCR) of HMs in the soil, calculated as follows31:

$$TCR = \sum\limits_{i = 1}^{n} {CR_{i} = \sum\limits_{i = 1}^{n} {(ADD_{i} } } \times SF_{i} )$$
$$HI \, = \, \sum HQ_{i} \, = \, \sum \frac{{ADD_{i} }}{{RfD_{i} }}$$

where CRi is the CR of each HM, SFi is the carcinogenic slope factor of each HM, and its reference is shown in Table 2. If TCR > 10–4, the risk is unacceptable; if TCR < 10–6, the risk is the opposite. NCR was evaluated by HI. HQi is the hazard quotient of each HM, and RfDi is the corresponding reference value of each HM. When HI < 1, NCR is acceptable; when 1 < HI < 4, NCR is moderate; and when HI > 4, NCR is high.

Table 2 Reference dose (RfD) and slope factor (SF) values of heavy metal(loid)s in soil by different exposure pathways used in health-risk assessment with Monte Carlo simulation.

Hybrid model combining Unmix and HRA

The health risks of HMs from different sources were quantitatively evaluated by combining the Unmix and HRA models. This hybrid model involves four steps:

  1. A)

    Analyze HM sources based on Unmix.

  2. B)

    Calculate HM concentration in each sample from each source. The calculation formula is as follows:

    $$C_{ij \, }^{k} { = }F_{ij \, }^{k} \times X_{ij}$$

where \({\text{C}}_{\text{ij}}^{\text{k}}\) is the concentration of each HM in each sample from each source, \({\text{F}}_{\text{ij}}^{\text{k}}\) is the estimated contribution rate of the i element in the kth source in the jth sample, and Xij is the measured concentration (mg/kg) of the i element in the jth sample.

  1. C)

    Fit the probability distribution curve of the ith element of the kth source in all samples.

  2. D)

    Quantitatively evaluate the human health risks of different HM sources using Monte Carlo simulation. HM health risks from each source are added by the ith element of the kth source in the jth sample. The calculation formula is as follows:

    $$ADD_{ij,ing}^{k} \, = \, \frac{{C_{ij}^{k} \, \times \, IR_{ing} \times CF \times \, EF \, \times \, ED \, }}{BW \, \times \, AT}$$
    $$ADD_{ij,dermal}^{k} \, = \, \frac{{C_{ij}^{k} \, \times CF \, \times SA \, \times \, AF \, \times \, ABF \, \times \, EF \, \times \, ED \, }}{BW \, \times \, AT}$$

The CR \(CR_{ij,n}^{k}\) and NCR \(HQ_{ij,n}^{k}\) of HMs from different sources were determined by

$$CR_{ij,n}^{k} { = (}ADD_{ij,n}^{k} \times SF_{i} {) }$$
$$HI = HQ_{ij,n}^{k} = \frac{{ADD_{ij,n}^{k} }}{{RfD_{i} }}$$

where \(CR_{ij,n}^{k}\) is the CR of the kth source to the nth exposure path in the jth sample, and SFi is the slope factor for each HM (Table 2). NCR from different sources is determined by the formula where \(HQ_{ij,n}^{k}\) is the hazard factor for the kth source of the ith metal in the jth sample on the nth exposure path.

Statistical analysis

Excel 2019 was used for the descriptive statistics. EPA Unmix 6.0 was used for the quantitative analysis of sources. The Monte Carlo risk simulation of HRA was based on Crystal Ball Origin 2021 was used for the risk distribution probability map.


Descriptive statistics of soil HMs

Table 3 shows the descriptive statistics for soil HMs in the study area. The mean values of Zn, Cu, Cr, Pb, and Hg in the soil were 55.34, 27.16, 37.31, 79.60, and 0.03, respectively. Compared with the background values of HMs in the soil, except for Zn and Cr, other elements exceeded the standard to varying degrees. Cu was slightly higher than the background value of the soil in Xinjiang, and Pb and Hg were 4.1 and 2.0 times the background value of the soil, respectively. In addition, the maximum content of all elements in the region was higher than the background value of the soil. Compared with the national risk screening value of agricultural land, the average value of all HMs did not exceed the risk threshold, but the values of Cu and Pb at some sites did exceed the risk threshold. In addition, the coefficient of variation of HMs ranged from 0.22 to 0.36, indicating that the soil environment in the study area was strongly affected by human activity. Furthermore, HM concentration in the soil had a high degree of spatial heterogeneity. These results indicate that HM contamination of soil in the region is serious, and the possible health risks need to be emphasized.

Table 3 Descriptive statistical results for the heavy metals in the soil.

Source analysis of HMs in soils

Unmix was used to quantitatively analyze the possible sources of HMs in the soil. When using Unmix, data do not need to be standardized, which could change the pollution source information and affect the accuracy of the results. After eliminating outliers, the data were input into Unmix. The total HMs concentration was set to Total and Norm, and the model was run. At this time, the minimum R2 was 0.98, and the minimum signal-to-noise ratio was 2.23, indicating that the quantitative analysis results were reliable. Figure 2 shows the quantitative analysis results.

Figure 2
figure 2

Unmix results. Software: Origin2021. URL:

The main loadings on factor 1 are Pb (55%), Zn (16%), and Cu (15%); Cr and Hg have no loadings on the source. Previous studies have indicated that the main source of Pb in the soil is transportation emissions. The wear of automobile engines and the combustion of lead-containing gasoline will emit Pb32,33. Meanwhile, the wear of automobile tires and related galvanized parts will emit Zn and Cu34. Roads are dense in the study area, which is an important logistics and transportation hub in Xinjiang. Traffic flow is large, and there is frequent movement of large coal transport vehicles in the Zhundong coalfield. High-frequency traffic activity emits these HMs into the soil; thus, factor 1 represents the traffic source. This is consistent with the results of previous studies35,36.

Factor 2 had higher loadings on Zn (26%), Cu (19%), and Cr (11%) and lower loadings on Pb (8%) and Hg (7%). The mean values of Zn and Cr in the study area were lower than the soil background value in Xinjiang. The mean value of Cu was equivalent to the soil background value and was far lower than the national secondary standard. Some studies have investigated how the weathering of soil parent materials and rock components produce Zn and Cu, among others37. Therefore, factor 2 is likely to represent natural sources.

Factor 3 had the highest loadings on Hg (71%), Cr (30%), and Cu (29%), followed by Pb (16%); it had a low loading on Zn (12%). Many studies have shown that Hg in the soil is strongly related to coal. As mentioned previously, the large Zhundong coalfield is near the study area. There are also many coal-related industrial and mining enterprises in the study area. Coal dust formed by coal mining and accumulated coal gangue will transfer a large of HMs to the soil. Cr is the result and proof of coal-dust diffusion38. In addition, the southern part of the study area has the densest urban agglomeration in Xinjiang. Urban development requires a great deal of electricity. China’s power sources are still dominated by coal combustion, which is an important source of soil Hg. Therefore, factor 3 represents the coal source.

Factor 4 had the highest loadings on Cr (59%), followed by Zn (46%), Cu (37%), Hg (22%), and Pb (22%). Previous studies have shown that industrial emissions of Cr will indirectly enter the soil through waste gas, wastewater, and solid waste. There are many coal processing–related industrial enterprises in the study area39, such as coal washing and metal smelting, resulting in increases in Cr in the soil environment. Meanwhile, studies have also shown that industrial production is related to Zn, Cu, and Pb in the soil environment. For example, Cu and Pb can enter the soil from burning fuel. The smelting and electroplating industries will also discharge Cu-containing compounds into the soil40. Zn is an excellent anticorrosion material. Galvanized materials are widely used, and their production processes will produce pollution. At the same time, mining, coal combustion, and battery manufacturing also produce Zn. Thus, factor 4 represents the industrial source.

HRA based on Monte Carlo simulation

Using Oracle Crystal Ball, Monte Carlo simulation was used to evaluate the health-risk probability of soil HM concentration and pollution sources. Previous studies have shown that simulation results are stable after 10,000 simulations41; thus, the number of simulations was set to 10,000.

Concentration oriented HRA

The health-risk probability distribution for children, females, and males in the study area was evaluated using Monte Carlo simulations. Figure 3 shows the HI distribution for different populations. The mean HI of the three groups from large to small was children (0.218), females (0.044), and males (0.039). According to the probability distribution, the HIs of children, women, and men were all below the critical value recommended by USEPA (HI = 1) and within the safe range. The HI of all populations in this study area was below the acceptable risk threshold. In summary, there was no significant NCR of HMs in soils for regional populations; it can therefore be ignored.

Figure 3
figure 3

Distribution for TCR and HI: (a) total carcinogenic risk (TCR); (b) hazard index (HI). Software: Origin2021. URL:

Figure 3b shows the probability distribution of TCR in soil HMs in the study area. The mean TCR of each group is in descending order of children (4.09 × 10–6), females (3.53 × 10–6), and males (2.96 × 10–6). The TCR values of nearly 77.52% of children, 69.09% of adult women, and 65.63% of adult men in the study area exceeded the critical value of 1 × 10–6. This result shows that the CR of soil HMs in the study area is high, which might lead to a higher prevalence of cancer among the population in the region42. Therefore, the CR of soil HMs in the study area cannot be ignored and requires attention. Moreover, children were found to have a more serious CR than adults. Children are usually more susceptible to soil pollution than adults. This is consistent with previous research results43. On the one hand, children are lighter in weight; on the other hand, it relates to children’s higher oral intake rate and skin adsorption factor. It is necessary, therefore, to keep hands and mouth clean (e.g., avoid sucking fingers), and children should be cleaned in a timely manner after contact with soil44. Thus, sensitive groups—namely, children—should be prioritized in source-oriented HRA in the study area.

Source oriented CR assessment

The concentration-oriented health assessment of HMs in soil can help us intuitively understand pollution levels, but it cannot help decision-makers control the sources of HM pollution in soil29. It is necessary, therefore, to carry out the HRA of HMs from different sources. In the source-oriented assessment of health risks, it is necessary to use Monte Carlo simulation to fit the distribution of HM concentration in each source. Table 4 shows the probability distribution types and key parameters of different sources of HMs.

Table 4 Fitting types and parameters of the probability distribution of different HM sources.

Results for source-oriented CR assessment

Figure 4 shows the probability distribution of CR for children and the CR of HMs from different sources. From large to small, the mean CR values were factor 4 (2.35 × 10–6), factor 3 (1.20 × 10–6), factor 2 (4.46 × 10–7), and factor 1 (7.95 × 10–8). The contribution rate of Pb to CR in the four sources was far less than 1.0 × 10–6. The industrial source was the main source of CR in the study area, accounting for 57.7% of the total carcinogenic risk contribution rate of children. Its mean value was 2.35 times the acceptable CR threshold (1 × 10–6), indicating that CR is serious. The source of the second-highest CR risk was factor 3 coal source, accounting for 29.4% of the CR. The mean value exceeded the acceptable threshold by 1.20 times, and the mean values of factors 1 and 2 were below the acceptable CR threshold. Cr was the main element contributing to CR in the study area, which is related to the lighter weight of children and the higher risk of Cr skin contact carcinogenesis (SF). Cr emissions from industrial and coal sources in the study area pose a higher CR to human health in the region. Therefore, in the future, more attention should be paid to industrial activities and coal mining in the region. The planning of residential gathering areas should be as far away from factories as possible. The activities of the mining industry should be further rationally planned to ensure that residents in the region are not affected by the carcinogenic risk of HMs.

Figure 4
figure 4

Probability distribution of total carcinogenic risk for children and carcinogenic risk from heavy metals based on each source. Software: Origin2021. URL:

Source-oriented NCR assessment

Figure 5 shows the probability distribution of NCR for children and the NCR of HMs from different sources. The mean value of the NCR of each source was roughly consistent with the mean value of CR: factor 4 (0.102) > factor 3 = factor 1 (0.063) > factor 2 (0.017). Among them, factor 4 (industrial sources) is more important for HI than other sources, mainly from the contribution of Pb and Cr elements. The HI of all sources was less than 1, which is within the acceptable range, indicating that the potential NCR for children in the study area can be ignored.

Figure 5
figure 5

Probability distribution of the total NCR for children based on different sources and the NCR of each HM. Software: Origin2021. URL:


In this study, 170 soil samples were systematically collected from Fukang, Jimsar, Qitai and other areas on the northern slope of the Tianshan Mountains in Xinjiang, China. A new receptor model Unmix model was used to analyze the source of soil heavy metals. The R2 of the source analysis results reached more than 0.98, and the minimum signal-to-noise ratio S/N reached 2.23, which proved that the source analysis results were reliable. In future studies, other receptor models such as PMF and APCS-MLR can be combined to further improve the interpretability and reliability of the source analysis results. In addition, the human health risk assessment model is considered to be an effective method to determine the degree of harm of soil heavy metals to human body. However, the traditional human health risk assessment model is subject to fixed parameters and the use of a limited number of heavy metal content data. It is difficult to avoid errors in human health risk assessment23. In this study, Monte Carlo simulation was used in combination with human risk assessment model to eliminate this error to a certain extent, and the sources of Unmix model analysis were used for Monte Carlo simulation of human health risk. The health risks of heavy metals from various sources to sensitive people and the contribution rate of various sources to human health risks were obtained, and the primary control pollution sources were determined. However, at the same time, this study only evaluates the human health risks of five kinds of soil heavy metals, so the human health risk assessment in the study area is still incomplete and limited. In future research, the effects of As and Cd on human health risks should be taken into account. Through the soil heavy metal assessment pollution assessment model and the spatial distribution of soil heavy metals, a more comprehensive understanding of the pollution degree and spatial distribution of heavy metal contaminated soil can be obtained to avoid the health threat of soil heavy metals to the population.


The study area in the present research included Fukang, Jimsar, Qitai, and Midong in the economic belt of the northern slope of Tianshan Mountains in Xinjiang, China. The soil environment in this area is seriously polluted by HMs. Soil samples were taken from the area to conduct HRA on the sources of soil HMs. The results showed the following. (1) The mean values of Zn and Cr lover the background value of Xinjiang soil, but the maximum value exceeded the background value of Xinjiang soil. The mean values of Pb and Hg were 4.1 and 2.0 times of the background value of Xinjiang soil, which were lower than the national standard. (2) Unmix analysis showed that the main sources of HMs were traffic, natural, coal, and industrial sources. Cr (59%), Zn (46%) and Cu (37%) were mainly from industrial sources, Hg (71%) was mainly from coal sources, and Pb (54%) was mainly from traffic sources. (3) The concentration-oriented Monte Carlo HRA showed that the NCR of children was 0.218, that adult females was 0.044, and that of adult was 0.039, all within the acceptable range, while CR was at a high risk. The mean value of children was 4.06 × 10–6, that of adult females was 3.53 × 10–6, and that of adult males was 2.96 × 10–6. Children have higher NCR and CR than adults because of their lighter weight and higher oral intake rate. Children are therefore the most sensitive population in the study area. (4) Source-oriented HRA based on Unmix and Monte Carlo quantitatively analyzed the relationship between HMs, pollution sources, and health risks for the sensitive population. The average CR of industrial sources exceeded the acceptable threshold by 2.35 times, the average CR of coal sources exceeded the threshold by 1.20 times, and the average value of natural sources and traffic sources was below the acceptable threshold. Among them, Cr was the main element contributing to CR. Based on the findings, industrial sources should be the main HM source factors that are preferentially controlled in the region. Furthermore, Cr emitted from coal sources also requires attention.