Introduction

Background and motivation

As the global greenhouse effect intensifies, how to address climate change effectively has become an issue for all nations1,2. The Paris Agreement, a legally binding climate agreement that sets long-term temperature goals, was signed by about 200 nations in 20153. Since then, a growing number of nations have developed national strategies aimed at a carbon-free future4. China, the largest energy consumer and carbon emitter in the world, has committed to reaching carbon neutrality by 2060 and to implementing “stronger and more powerful policies and measures” to peak emissions by 20305. This implies that China will face great challenges in reducing emissions, and that formulating effective “dual carbon” development strategies is one of the priorities of the Chinese government.

Energy consumption is the main source of carbon emissions in China6. Furthermore, because of its resource endowment, which is characterized as “rich in coal, poor in oil, and short of gas”7, China has developed an energy consumption pattern dominated by fossil fuels. The combustion of fossil energy (e.g., coal and oil) generates large amounts of carbon dioxide. As a result, adjusting and optimizing the energy consumption structure is crucial for China to reduce carbon emissions.

The energy consumption structure is mostly made up of four categories: coal, crude oil, gas, and others (e.g., hydroelectric power, nuclear power)8. As shown in Fig. 1, fossil energy has historically dominated the energy consumption structure in China; however, its percentage is decreasing every year, while the share of cleaner energy (e.g., natural gas and hydropower) is increasing. The Chinese government’s series of emission reduction initiatives have thus been effective, and the energy consumption structure improves each year. However, it is unknown whether the current emission reduction initiatives will achieve the government’s stated policy goals as expected. Therefore, forecasting the trend of the energy consumption structure can not only verify the feasibility of existing policies, but also facilitate the adjustment and formulation of related policies, which can strengthen the government’s capacity to govern.

Figure 1. The structure of energy consumption in China during 2012–2022. Note: Data from China National Statistical Yearbook.

Literature review

Due to the inherent complexity and asymmetry of multiple interacting elements, energy consumption forecasting has become a challenging problem in the field of time series forecasting9. At present, numerous studies in the energy field address the dynamic evolution of the energy structure. The quantitative methods used by scholars fall into two main categories: univariate forecasting models and multivariate forecasting models.

Univariate forecasting models of energy consumption rely mainly on the raw series data, without introducing additional influencing factors10. In particular, the autoregressive integrated moving average (ARIMA) model and the grey model (GM) are the most widely used in energy consumption forecasting11,12. Jiang et al.13 estimated coal costs, consumption, and investment for 2016–2030 in China. Akram et al.14 applied an ARIMA model to forecast residential energy consumption in the household sector of the Eurozone countries. Ding et al.15 proposed a structural adaptive grey model with adjustable temporal power terms to address the nonlinearity of nuclear energy consumption time series. Yuan et al.16 projected the primary energy consumption in China using the ARIMA, GM(1,1), and GM-ARIMA hybrid models. Meanwhile, Li et al.17 developed two combined models, the metabolism grey model with autoregressive integrated moving average model (MGM-ARIMA) and the back-propagation neural network with autoregressive integrated moving average model (BPNN-ARIMA), for forecasting energy consumption in India during 2018–2030. Ma and Wang18 constructed a nonlinear grey model-autoregressive integrated moving average model (NGM-ARIMA) to forecast energy consumption in South Africa during 2017–2030.

Multivariate forecasting models of energy consumption are mainly constructed by exploring the relevant influencing factors16. Numerous external factors affect energy consumption, and identifying the most relevant ones among the vast number of potential factors is the crucial issue from this perspective19. Scholars have adopted various methods to explore the influencing factors, such as the logarithmic mean Divisia index (LMDI) method20 and stepwise regression21. In addition, artificial intelligence algorithms22, such as support vector machines23 and neural networks24, are frequently employed for energy forecasting. Xia and Wang25 calculated the contributions of the factors affecting the energy consumption structure with the LMDI method, and used an empirical mode decomposition model to break down the factors with large contributions into modal components at various scales. Based on the LMDI method, Chai et al.26 classified the drivers of gas consumption into indicators of economic progress and cleanliness, and constructed a stochastic impacts by regression on population, affluence, and technology (STIRPAT) model, combined with partial least squares regression (PLSR), to analyze scenarios of natural gas consumption in China during 2016–2025. He et al.19 utilized the stepwise regression method to identify major influencing factors and developed two probability density forecasting methods to estimate energy consumption in Anhui Province during 2015–2023. Liu et al.27 used the LMDI method to analyze the driving factors of carbon emissions in Beijing, Tianjin, Shanghai and Chongqing.

The literature reviewed above shows that most current research has focused on the absolute amount of specific types of energy consumption, while few studies have examined the relative information underlying total energy consumption; in other words, studies that consider the energy consumption structure as a whole are rare28. The energy consumption structure is essentially a holistic system, and the variability among energy types should be considered jointly29. The structure comprises four sub-structures (coal, oil, natural gas, and other energy resources) whose shares must be non-negative and sum to one30. However, studies of the energy consumption structure based on traditional models do not fully investigate the relative information behind the entire structure and overlook its holistic nature. To address this issue, this paper incorporates the theory of compositional data into the investigation of the energy consumption structure.

Compositional data is a class of complex data with a special structure, which mainly describes the relative information among the components rather than their absolute values, so that any statement about the components must be based on their ratios31. The basic idea behind modeling compositional data is to first transform the initial data into unconstrained intermediate variables using appropriate techniques. The intermediate variables are then modeled with standard methods. Finally, the results are converted back to compositional data by the corresponding inverse transformation32,33. Compositional data theory has been applied to forecast regional industrial and economic structure34, study shifts in population structure35, and analyze the distribution of rock composition36. It has thus been implemented successfully in areas including agriculture, economics and geology, but it is used less in the energy sector. Qian et al.30 suggested an adaptive discrete grey forecasting model based on compositional data. He et al.28 developed a dimension reduction through hyperspherical transformation and composite quantile regression neural network (DRHT-CQRNN) model to forecast the structure of total energy consumption in Chongqing during “the 14th Five-Year Plan period (2021–2025)”. Zhang et al.37 forecasted the structure of bioenergy generation in China based on an innovative grey compositional data model.

At present, few scholars consider the energy consumption structure as a whole system, and the relative information on its constituent components is insufficiently explored. Therefore, the theory of compositional data is introduced into the study of the energy consumption structure in this paper to explore the internal features of the structure and their interrelations. In addition, all current studies of energy consumption structure based on compositional data use single models. In contrast, a combined model can draw on the advantages of each single forecast model to enhance overall forecast accuracy and make fitting and forecasting more stable38,39. Therefore, a combined model is constructed based on the theory of compositional data in this paper. The key to developing a combined model is determining the weights of the single models, yet weight selection is a major challenge for combined forecasting methods40. Common methods for determining the weights of a combined model include minimization of the sum of squared errors41 and the reciprocal variance method42. However, since compositional data are vectors, their error cannot be calculated by directly subtracting the predicted value from the true value; instead, all internal features must be considered. Therefore, the distance between compositional vectors is used as the measure of prediction error in this paper: the weights are derived by minimizing the sum of squared Aitchison distances between the forecast and true values. A combined MGM-BPNN-ARIMA model based on compositional data is then proposed to forecast the energy consumption structure of China in 2023–2040.

Contribution and research structure

The following are the contributions of this paper.

  1. At present, few studies have considered the energy consumption structure as a whole system, and the relative information about the components of the structure has not been adequately explored. In this paper, we introduce the theory of compositional data into the study of the energy consumption structure and systematically consider its internal features, which fully satisfies the requirement that its components be non-negative and sum to a constant.

  2. Considering the vector property of compositional data, this paper proposes a combined MGM-BPNN-ARIMA model based on the DRHT method, with weights derived from the sum of squared Aitchison distances, which has higher prediction accuracy than a single model.

  3. The model forecast results are compared with the current policy goals proposed by the Chinese government. This makes it possible to assess whether China will meet its policy objectives on time, and to make practical recommendations to policymakers.

The remainder of the paper is organized as follows. "Materials and methods" section describes the methodology of this paper, which includes the theory of compositional data and methods for combining forecast models on compositional data; "Model establishment and analysis" section explains the construction of the combined MGM-BPNN-ARIMA model; "Forecast results and discussion" section contains the results and analysis of the forecast for China’s energy consumption structure during 2023–2040; "Conclusions" section presents the conclusions.

Materials and methods

Methodology

Compositional data are positive data that carry only relative information and, in most instances, add up to a constant43. Figure 2 illustrates the main process of forecasting from the perspective of compositional data in this paper, which consists of the following four steps.

Figure 2. The basic process of compositional data forecast.

Step 1: Convert the original data into compositional data (the components are mutually constrained and sum to a fixed constant).

Step 2: Use an appropriate transformation technique to create unconstrained variables from the compositional data.

Step 3: Apply a suitable time series prediction model to the transformed variables.

Step 4: Apply the inverse of the transformation used in Step 2 to convert the predictions into the final compositional data values.

Compositional data

For a compositional vector, each component \(x_{i}\) (i = 1, 2, …, D) is strictly greater than 0 and the components satisfy \(\sum\limits_{i = 1}^{D} {x_{i} } = c\), where c is a constant. Therefore, the space \(S^{D}\) formed by compositional data satisfying these conditions can be described as follows.

$$S^{D} = \left\{ {X = \left[ {x_{1} ,x_{2} ,...,x_{D} } \right]:x_{i} > 0,i = 1,2,...,D;\sum\limits_{i = 1}^{D} {x_{i} = c} } \right\}$$
(1)

The elements are D-dimensional row vectors, but because the sum of the components is fixed, only D-1 components are free, so the data effectively lie in a space of D-1 dimensions.

Due to the fixed-sum constraint of compositional data, however, typical statistical approaches cannot be applied to them directly44. To overcome the limitations that the constraint imposes on general statistical analysis, Aitchison31 proposed the logistic normal distribution model and addressed the fixed-sum constraint with the log-ratio transformation. Egozcue et al.32 put forward the isometric log-ratio transformation to handle overlapping subcomponents in compositional decomposition. However, all of the above approaches require the components to be nonzero, which limits their applicability. To handle zero components, Wang et al.45 proposed dimension reduction through hyperspherical transformation (DRHT), which resolves the problem of zero components in the compositional transformation. The application of the hyperspherical coordinate transformation to time series forecasting is described as follows.

Let \(X = \left[ {x_{1} ,x_{2} ,...,x_{D} } \right]\) be a compositional vector that satisfies:

$$\sum\limits_{i = 1}^{D} {x_{i} = 1} ,0 \leqslant x_{i} \leqslant 1$$
(2)

If each component of the compositional vector is treated with a square root, then the following results can be obtained:

$$y_{i} = \sqrt {x_{i} } \left( {i = 1,2,...,D} \right)$$
(3)

It follows that \(\sum\limits_{i = 1}^{D} {y_{i}^{2} } = 1\).

Hence, the vector \(Y = \left[ {y_{1} ,y_{2} ,...,y_{D} } \right]\) can be regarded as a point on the unit hypersphere. The spherical coordinate transformation maps the D-dimensional vector \(Y = \left[ {y_{1} ,y_{2} ,...,y_{D} } \right]\) to the hyperspherical coordinates \(\left[ {r,\theta_{2} ,\theta_{3} ,...,\theta_{D} } \right]\), where \(r^{2} = ||Y||^{2} = 1\).

Thus, the computation procedure of the DRHT can be summarized as follows:

$$\left\{ {\begin{array}{*{20}l} {\theta_{D} = {\text{arc}}\cos y_{D} } \hfill \\ {\theta_{D - 1} = {\text{arc}}\cos \left( {\frac{{y_{D - 1} }}{{\sin \theta_{D} }}} \right)} \hfill \\ {\theta_{D - 2} = {\text{arc}}\cos \left( {\frac{{y_{D - 2} }}{{\sin \theta_{D} \sin \theta_{D - 1} }}} \right)} \hfill \\ {...} \hfill \\ {\theta_{2} = {\text{arc}}\cos \left( {\frac{{y_{2} }}{{\sin \theta_{D} \sin \theta_{D - 1} \cdots \sin \theta_{3} }}} \right)} \hfill \\ \end{array} } \right.$$
(4)

The calculation process of the DRHT inverse transformation can be summarized as follows:

$$\left\{ {\begin{array}{*{20}l} {y_{1} = \sin \theta_{2} \sin \theta_{3} \sin \theta_{4} \cdots \sin \theta_{D} } \hfill \\ {y_{2} = \cos \theta_{2} \sin \theta_{3} \sin \theta_{4} \cdots \sin \theta_{D} } \hfill \\ {y_{3} = \cos \theta_{3} \sin \theta_{4} \cdots \sin \theta_{D} } \hfill \\ {...} \hfill \\ {y_{D - 2} = \cos \theta_{D - 2} \sin \theta_{D - 1} \sin \theta_{D} } \hfill \\ {y_{D - 1} = \cos \theta_{D - 1} \sin \theta_{D} } \hfill \\ {y_{D} = \cos \theta_{D} } \hfill \\ \end{array} } \right.$$
(5)
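As an illustration of Eqs. (3) to (5), the forward and inverse DRHT can be sketched in a few lines of Python with NumPy. This is a minimal sketch rather than the implementation used in this paper; the function names `drht_forward` and `drht_inverse` and the sample composition are hypothetical, and the angles are returned in the order \(\theta_{2} ,...,\theta_{D}\).

```python
import numpy as np

def drht_forward(x):
    """Map a composition x (non-negative, sums to 1) to angles (theta_2, ..., theta_D)."""
    y = np.sqrt(np.asarray(x, dtype=float))      # Eq. (3): point on the unit hypersphere
    D = y.size
    theta = np.zeros(D + 1)                      # theta[k] stores theta_k; slots 0 and 1 are unused
    sin_prod = 1.0                               # running product sin(theta_D) ... sin(theta_{k+1})
    for k in range(D, 1, -1):                    # k = D, D-1, ..., 2, as in Eq. (4)
        ratio = y[k - 1] / sin_prod if sin_prod > 0 else 0.0
        theta[k] = np.arccos(np.clip(ratio, -1.0, 1.0))  # clip guards against rounding error
        sin_prod *= np.sin(theta[k])
    return theta[2:]

def drht_inverse(theta):
    """Recover the composition from angles (theta_2, ..., theta_D) using Eq. (5)."""
    theta = np.asarray(theta, dtype=float)
    D = theta.size + 1
    y = np.empty(D)
    sin_prod = 1.0
    for k in range(D, 1, -1):
        y[k - 1] = np.cos(theta[k - 2]) * sin_prod   # y_k = cos(theta_k) * prod_{j>k} sin(theta_j)
        sin_prod *= np.sin(theta[k - 2])
    y[0] = sin_prod                                  # y_1 is the product of all sines
    return y ** 2                                    # undo the square root of Eq. (3)

# Hypothetical four-part composition (illustrative values, not data from this paper)
x = [0.6, 0.2, 0.1, 0.1]
angles = drht_forward(x)
x_back = drht_inverse(angles)
```

Because only square roots and arccosines are involved, a composition with zero components such as `[0.7, 0.3, 0.0, 0.0]` passes through the same functions, which is the property that motivates the DRHT over log-ratio transformations.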

Single model

(1) MGM model

Grey system theory46 was proposed by Professor Deng Julong to deal with information uncertainty within a system. The GM(1,1) model is an essential component of grey system theory and is concerned with forecasting small samples with incomplete information. The essential idea of the GM(1,1) model is to accumulate the original series once, fit a differential equation model to the accumulated series, and use its solution to approximate the original series and forecast its subsequent development. The specific process of building the GM(1,1) model is as follows.

Step 1: Apply the first-order accumulated generating operation (1-AGO) to the initial sequence \(x^{\left( 0 \right)} = \left\{ {x^{\left( 0 \right)} \left( 1 \right),x^{\left( 0 \right)} \left( 2 \right),...,x^{\left( 0 \right)} \left( n \right)} \right\}\) to obtain the new sequence \(x^{\left( 1 \right)}\).

$$x^{\left( 1 \right)} \left( m \right) = \sum\limits_{i = 1}^{m} {x^{\left( 0 \right)} \left( i \right)} ,m = 1,2,...,n$$
(6)

Step 2: Compute the mean of the immediate neighbors of the series \(x^{\left( 1 \right)}\) to generate the series \(z^{\left( 1 \right)} = \left( {z^{\left( 1 \right)} \left( 2 \right),z^{\left( 1 \right)} \left( 3 \right),...,z^{\left( 1 \right)} \left( n \right)} \right)\).

$$z^{\left( 1 \right)} \left( m \right) = \frac{1}{2}x^{\left( 1 \right)} \left( m \right) + \frac{1}{2}x^{\left( 1 \right)} \left( {m - 1} \right),m = 2,3,...,n$$
(7)

Step 3: Construct the whitening differential equation for GM(1,1) based on the above formula.

$$\frac{{dx^{\left( 1 \right)} \left( t \right)}}{dt} + ax^{\left( 1 \right)} \left( t \right) = b$$
(8)

where b denotes the grey action quantity and -a denotes the development coefficient.

Step 4: Construct the data matrices B and Y.

$$B = \left[ {\begin{array}{*{20}c} { - z^{\left( 1 \right)} \left( 2 \right)} & 1 \\ { - z^{\left( 1 \right)} \left( 3 \right)} & 1 \\ {...} & {...} \\ { - z^{\left( 1 \right)} \left( n \right)} & 1 \\ \end{array} } \right],\;Y = \left[ {\begin{array}{*{20}c} {x^{\left( 0 \right)} \left( 2 \right)} \\ {x^{\left( 0 \right)} \left( 3 \right)} \\ {...} \\ {x^{\left( 0 \right)} \left( n \right)} \\ \end{array} } \right]$$
(9)

Step 5: Apply the least squares method to estimate the parameters a and b.

$$\hat{u} = \left( {\begin{array}{*{20}c} {\hat{a}} \\ {\hat{b}} \\ \end{array} } \right) = \left( {B^{T} B} \right)^{ - 1} B^{T} Y$$
(10)

Step 6: Substitute the estimates \(\hat{a},\hat{b}\) into the whitening differential equation to derive its time response function.

$$\hat{x}^{\left( 1 \right)} \left( {m + 1} \right) = \left[ {x^{\left( 0 \right)} \left( 1 \right) - \frac{{\hat{b}}}{{\hat{a}}}} \right]e^{{ - \hat{a}m}} + \frac{{\hat{b}}}{{\hat{a}}},m = 1,2,...,n - 1$$
(11)

Step 7: Perform the inverse accumulated generating operation to obtain the predicted values of the original sequence \(x^{\left( 0 \right)}\).

$$\hat{x}^{\left( 0 \right)} \left( {m + 1} \right) = \hat{x}^{\left( 1 \right)} \left( {m + 1} \right) - \hat{x}^{\left( 1 \right)} \left( m \right),m = 1,2,...,n - 1$$
(12)

In practical forecasting, as the system evolves, the informational value of old data gradually declines. By continuously adding new information while promptly removing old information, the modeling sequence represents the present features of the system more closely. The metabolism grey model MGM(1,1) is an updated version of the conventional grey model built on this idea17. Its forecast principle is to use the latest value \(X^{\left( 0 \right)} \left( {k + 1} \right)\) predicted by the GM(1,1) model to replace the oldest value \(X^{\left( 0 \right)} \left( 1 \right)\) in the original data series \(X^{\left( 0 \right)}\), so that the length of the series is maintained. The GM(1,1) model is then refitted to the updated series \(X^{\left( 0 \right)} = \left\{ {X^{\left( 0 \right)} \left( 2 \right),X^{\left( 0 \right)} \left( 3 \right),...,X^{\left( 0 \right)} \left( {k + 1} \right)} \right\}\) to predict \(X^{\left( 0 \right)} \left( {k + 2} \right)\); this new value is appended to the series and \(X^{\left( 0 \right)} \left( 2 \right)\) is removed, forming a new series to which the GM(1,1) model is applied again. The procedure continues in this manner until the forecast horizon is reached.
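The GM(1,1) steps and the metabolic rolling scheme can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in this paper: the helper names `gm11_next` and `mgm_forecast` are hypothetical, the data matrix B is taken with a column of ones next to the background values (the standard GM(1,1) form), and the sample series is synthetic.

```python
import numpy as np

def gm11_next(x0):
    """Fit GM(1,1) to the window x0 and return the one-step-ahead forecast x^(0)(n+1)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    x1 = np.cumsum(x0)                               # Eq. (6): 1-AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # Eq. (7): means of adjacent neighbours
    B = np.column_stack([-z1, np.ones(n - 1)])       # Eq. (9): data matrix
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]      # Eq. (10): least squares estimate
    # Eq. (11): x1_hat(m) returns the fitted accumulated value at position m + 1.
    # Assumes a != 0, i.e. the window is not perfectly constant.
    x1_hat = lambda m: (x0[0] - b / a) * np.exp(-a * m) + b / a
    return x1_hat(n) - x1_hat(n - 1)                 # Eq. (12): inverse accumulation

def mgm_forecast(series, window, steps):
    """Metabolic rolling: after each forecast, drop the oldest value and append the newest."""
    buf = list(series[-window:])
    out = []
    for _ in range(steps):
        nxt = gm11_next(buf)
        out.append(nxt)
        buf = buf[1:] + [nxt]                        # keep the window length fixed
    return np.array(out)

# Synthetic series with 10% growth per step (illustrative, not data from this paper)
series = 2.0 * 1.1 ** np.arange(8)
one_step = gm11_next(series[:6])
rolled = mgm_forecast(series[:6], window=6, steps=2)
```

On a smooth exponential series such as this one, the one-step forecast is close to, but not exactly equal to, the true next value, because the continuous whitening equation only approximates the discrete grey equation.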

(2) BPNN model

The BPNN model is a multilayer feedforward neural network trained by the error back-propagation algorithm47. The training process consists mainly of the forward propagation of the signal and the backward propagation of the error. First, the input signal is weighted and passed through the activation function, sent to the hidden layer, and finally propagated to the output layer. If the model error requirement is not met, the weights and thresholds of the BPNN are adjusted by gradient descent and the signal is propagated forward again; the cycle repeats until the output of the output layer meets the accuracy requirement of the model.
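The forward and backward passes described above can be illustrated with a very small network. The sketch below is an assumption-laden example, not the network used in this paper: it uses one hidden layer with tanh activation, plain full-batch gradient descent on the mean squared error, and hypothetical names (`make_windows`, `train_bpnn`, `predict_bpnn`) and hyperparameters; the input series is synthetic.

```python
import numpy as np

def make_windows(series, lag):
    """Turn a series into (inputs, target) pairs: `lag` past values predict the next one."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return X, series[lag:]

def train_bpnn(X, y, hidden=5, lr=0.02, epochs=1000, tol=1e-6, seed=0):
    """Train a one-hidden-layer network by back-propagation; returns (params, MSE history)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # forward pass: hidden layer
        out = h @ W2 + b2                        # forward pass: output layer
        err = out - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse < tol:                            # stop once the error threshold is met
            break
        g_out = 2.0 * err / len(y)               # gradient of the MSE w.r.t. the output
        gW2, gb2 = h.T @ g_out, g_out.sum()
        g_h = np.outer(g_out, W2) * (1.0 - h ** 2)   # error propagated back through tanh
        gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history

def predict_bpnn(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2

# Synthetic, slowly decreasing angle series (illustrative, not data from this paper)
angles = np.linspace(0.90, 0.70, 20)
X, y = make_windows(angles, lag=3)
params, history = train_bpnn(X, y)
next_angle = float(predict_bpnn(params, angles[-3:]))
```

The stopping rule mirrors the description above: training ends when the error threshold is reached or the iteration limit is exhausted.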

(3) ARIMA model

The ARIMA model was introduced by Box and Jenkins in the early 1970s as a time series forecasting method. It has found applications in statistics and computational economics, where it is one of the most widely employed models for time series forecasting. It builds on the AR, MA, and ARMA models. Essentially, the ARIMA model first applies differencing to make non-stationary data stationary and then fits an ARMA model to the stationary data. Moreover, the ARMA model is made up of two components: the AR model and the MA model48.

The equation of the AR(p) model is defined as:

$$y_{t} = \mu + \sum\limits_{i = 1}^{p} {\gamma_{i} y_{t - i} } + \varepsilon_{t}$$
(13)

The equation for the MA(q) model is defined as:

$$y_{t} = \mu + \sum\limits_{i = 1}^{q} {\theta_{i} \varepsilon_{t - i} } + \varepsilon_{t}$$
(14)

The equation for the ARMA(p, q) model is defined as:

$$y_{t} = \mu + \sum\limits_{i = 1}^{p} {\gamma_{i} y_{t - i} } + \varepsilon_{t} + \sum\limits_{i = 1}^{q} {\theta_{i} \varepsilon_{t - i} }$$
(15)

where \(\mu\) is the constant term, \(\gamma_{i}\) is the AR model coefficient, \(\theta_{i}\) is the MA model coefficient, \(\varepsilon_{t}\) is the white noise series, \(p\) is the autoregressive order, and \(q\) is the moving average order.
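To make the "difference, fit, integrate back" logic concrete, the following sketch handles the special case ARIMA(p, d, 0). It is a deliberate simplification and an assumption of this example: the AR coefficients of Eq. (13) are estimated by ordinary least squares rather than by the maximum likelihood routines of statistical packages, MA terms are omitted, and the names `fit_ar` and `forecast_arima_pd0` are hypothetical.

```python
import numpy as np

def fit_ar(y, p):
    """Least squares estimate of (mu, gamma_1..gamma_p) in the AR(p) model of Eq. (13)."""
    y = np.asarray(y, dtype=float)
    # Each row is [1, y_{t-1}, ..., y_{t-p}]; for p = 0 the row is just the constant.
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, len(y))]
    coef = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)[0]
    return coef[0], coef[1:]

def forecast_arima_pd0(series, p, d, steps):
    """Difference d times, fit AR(p), forecast, then undo the differencing."""
    series = np.asarray(series, dtype=float)
    levels = [series]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))           # levels[k] is the k-th difference
    w = levels[-1]
    mu, gamma = fit_ar(w, p)
    hist = list(w)
    fc = []
    for _ in range(steps):
        lags = np.array(hist[len(hist) - p:][::-1])  # most recent value first
        nxt = mu + float(gamma @ lags)               # one-step AR(p) prediction
        fc.append(nxt)
        hist.append(nxt)
    fc = np.array(fc)
    for k in range(d - 1, -1, -1):                   # integrate back, one level at a time
        fc = levels[k][-1] + np.cumsum(fc)
    return fc

# ARIMA(0,1,0) with drift on a synthetic linear trend: each step adds the mean difference
trend_fc = forecast_arima_pd0([1.0, 3.0, 5.0, 7.0, 9.0], p=0, d=1, steps=2)

# AR(1) on a synthetic series generated by y_t = 1 + 0.5 * y_{t-1}
ar1 = [0.0]
for _ in range(7):
    ar1.append(1.0 + 0.5 * ar1[-1])
ar1_fc = forecast_arima_pd0(ar1, p=1, d=0, steps=1)
```

With `p=0` and `d=1` the sketch reduces to a random walk with drift, the same structural form as an ARIMA(0,1,0) model with a constant term.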

Combined model

By setting appropriate weights, a combined model integrates the forecasts obtained from the individual forecasting methods in a weighted manner. Building on each single model, the combined model makes fuller use of the available information and can thus improve the forecast results. The combined model is formulated as follows.

$$\left\{ {\begin{array}{*{20}l} {f\left( t \right) = \sum\limits_{i = 1}^{n} {\omega_{i} \hat{f}_{i} \left( t \right)} } \hfill \\ {s.t{\mkern 1mu} {\mkern 1mu} \sum\limits_{i = 1}^{n} {\omega_{i} } = 1} \hfill \\ \end{array} } \right.$$
(16)

where \(\hat{f}_{i} \left( t \right)\) is the predicted value of the ith method at time t and \(\omega_{i}\) is the weight of the ith model in the combination.

Since compositional data are vectors, their error cannot be calculated by simply subtracting the forecast value from the true value; instead, all of their internal characteristics must be considered. Therefore, in this paper, the distance between compositional vectors is used as the measure of forecast error, and the weights are derived by minimizing the sum of squared Aitchison distances between the forecast and true values. The Aitchison distance is a key measure for compositional data, since it reflects the difference between the proportions of the data. It is defined as follows.

For any \(x,y \in S^{D}\), then the Aitchison distance between x and y would be equal to:

$$d_{S} (x,y) = \sqrt {\sum\limits_{{i{ = }1}}^{D} {\left( {\ln \frac{{x_{i} }}{g\left( x \right)} - \ln \frac{{y_{i} }}{g\left( y \right)}} \right)^{2} } }$$
(17)
$$g\left( x \right) = \sqrt[D]{{\prod\limits_{{i{ = }1}}^{D} {x_{i} } }},g\left( y \right) = \sqrt[D]{{\prod\limits_{{i{ = }1}}^{D} {y_{i} } }}$$
(18)
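Equations (17) and (18) amount to taking the centred log-ratio of each composition, that is \(\ln \left( {x_{i} /g\left( x \right)} \right)\), and then the Euclidean distance between the results. A minimal sketch follows; the function names `clr` and `aitchison_distance` and the sample vectors are hypothetical, and all components are assumed to be strictly positive because logarithms are taken.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: ln(x_i / g(x)), with g(x) the geometric mean of Eq. (18)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Eq. (17): Euclidean distance between the centred log-ratios of x and y."""
    return float(np.sqrt(np.sum((clr(x) - clr(y)) ** 2)))

# Illustrative two-part compositions chosen so the result is easy to check by hand:
# ln(q[0] / q[1]) = 1, so clr(q) = [0.5, -0.5] and the distance to [0.5, 0.5] is sqrt(0.5).
p = [0.5, 0.5]
q = [np.e / (1 + np.e), 1 / (1 + np.e)]
d = aitchison_distance(p, q)
```

The distance depends only on ratios between components, so rescaling either vector (for example expressing shares in percent instead of fractions) leaves it unchanged.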

The error of the combined forecast of the compositional data, measured as the sum of squared Aitchison distances and used for weight calculation, is expressed as:

$$J = \sum\limits_{t = 1}^{T} {d_{S}^{2} \left( {x^{t} ,\hat{x}^{t} } \right)} = \sum\limits_{t = 1}^{T} {\sum\limits_{i = 1}^{D} {\left( {\ln \,\frac{{x_{i}^{t} }}{{g\left( {x^{t} } \right)}} - \ln \,\frac{{\hat{x}_{i}^{t} }}{{g\left( {\hat{x}^{t} } \right)}}} \right)^{2} } }$$
(19)

The error value of a single prediction model for each compositional data point at t is:

$$e_{it} = \left[ {\ln \,\frac{{x_{i}^{t} }}{{g\left( {x^{t} } \right)}} - \ln \,\frac{{\hat{x}_{1i}^{t} }}{{g\left( {\hat{x}_{1}^{t} } \right)}},\ln \,\frac{{x_{i}^{t} }}{{g\left( {x^{t} } \right)}} - \ln \,\frac{{\hat{x}_{2i}^{t} }}{{g\left( {\hat{x}_{2}^{t} } \right)}},...,\ln \,\frac{{x_{i}^{t} }}{{g\left( {x^{t} } \right)}} - \ln \,\frac{{\hat{x}_{ni}^{t} }}{{g\left( {\hat{x}_{n}^{t} } \right)}}} \right]^{T}$$
(20)

Summing over all periods and components, the error information matrix of the n single prediction models is expressed as:

$$E = \sum\limits_{t = 1}^{T} {\sum\limits_{i = 1}^{D} {e_{it} e_{it}^{T} } }$$
(21)

According to the above equations, the combined model error can be expressed as:

$$J = \alpha^{T} E\alpha$$
(22)

where \(\alpha\) denotes the weighted coefficient vector of the combined model.

By introducing the n-dimensional vector \(R = \left[ {1,1,..,1} \right]^{T}\), the constraint on the weighting coefficients can be expressed as:

$$R^{T} \alpha = 1$$
(23)

With the above formulation, solving for the weights amounts to minimizing J subject to Eq. (23), which is done by introducing the Lagrange multiplier \(\lambda\). Setting the first-order partial derivative of the augmented objective with respect to \(\alpha\) to zero yields the final weight coefficients as follows.

$$J = \alpha^{T} E\alpha + \lambda \left( {R^{T} \alpha - 1} \right)$$
(24)
$$\frac{\partial J}{{\partial \alpha }} = 2E\alpha + \lambda R = 0$$
(25)
$$\alpha = \frac{{E^{ - 1} R}}{{R^{T} E^{ - 1} R}}$$
(26)
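The weight calculation of Eqs. (20) to (26) can be sketched numerically as follows. This is an illustrative sketch, not the computation carried out in this paper: the function name `aitchison_weights`, the array layout and the sample values are all hypothetical, and the matrix E is assumed to be invertible.

```python
import numpy as np

def clr(x):
    """Centred log-ratio ln(x_i / g(x)) applied along the last axis."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_weights(actual, forecasts):
    """actual: (T, D) true compositions; forecasts: (n, T, D) fitted values of n models.

    Returns (alpha, E), where alpha follows Eq. (26) and E is the matrix of Eq. (21).
    """
    err = clr(actual)[None, :, :] - clr(forecasts)   # entries of e_it in Eq. (20)
    n = err.shape[0]
    e = err.reshape(n, -1)                           # one column per (t, i) pair
    E = e @ e.T                                      # Eq. (21): sum of e_it e_it^T
    R = np.ones(n)
    Einv_R = np.linalg.solve(E, R)                   # E^{-1} R without forming the inverse
    alpha = Einv_R / (R @ Einv_R)                    # Eq. (26)
    return alpha, E

# Hypothetical true shares and fitted shares from two models (illustrative values only)
actual = np.array([[0.60, 0.30, 0.10], [0.55, 0.33, 0.12], [0.50, 0.35, 0.15]])
f1 = np.array([[0.62, 0.28, 0.10], [0.56, 0.32, 0.12], [0.52, 0.34, 0.14]])
f2 = np.array([[0.58, 0.31, 0.11], [0.53, 0.34, 0.13], [0.49, 0.35, 0.16]])
alpha, E = aitchison_weights(actual, np.stack([f1, f2]))
J_combined = float(alpha @ E @ alpha)                # Eq. (22) at the optimal weights
```

Because the only constraint is that the weights sum to one (Eq. 23), individual weights can be negative; by construction, the resulting value of J is never larger than the diagonal entries of E, which are the values of J obtained when a single model receives all the weight.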

Accuracy of the model

In terms of forecast accuracy metrics, this paper builds on the common evaluation metrics root mean square error (RMSE) and mean absolute percentage error (MAPE), and adopts their counterparts for compositional data, CRMSE and CMAPE49. The specific formulas are as follows.

$$CRMSE = \frac{1}{T - M}\sum\limits_{t = M + 1}^{T} {d_{S} \left( {x^{\left( t \right)} ,\hat{x}^{\left( t \right)} } \right)}$$
(27)
$$CMAPE = \frac{1}{T - M}\sum\limits_{t = M + 1}^{T} {\frac{{d_{S} \left( {x^{\left( t \right)} ,\hat{x}^{\left( t \right)} } \right)}}{{x_{s}^{\left( t \right)} }}}$$
(28)
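A short sketch of how Eqs. (27) and (28) could be evaluated is given below. The function names are hypothetical, and the arrays passed in are assumed to contain only the evaluation periods t = M+1, …, T. Since the denominator \(x_{s}^{\left( t \right)}\) of Eq. (28) is not spelled out in the text above, the sketch does not guess it and instead takes it as a caller-supplied argument `scale`.

```python
import numpy as np

def aitchison_distance(x, y):
    """Eq. (17): Euclidean distance between centred log-ratios."""
    lx, ly = np.log(np.asarray(x, dtype=float)), np.log(np.asarray(y, dtype=float))
    return float(np.sqrt(np.sum(((lx - lx.mean()) - (ly - ly.mean())) ** 2)))

def crmse(actual, forecast):
    """Eq. (27): mean Aitchison distance over the evaluation periods."""
    return float(np.mean([aitchison_distance(a, f) for a, f in zip(actual, forecast)]))

def cmape(actual, forecast, scale):
    """Eq. (28): mean of distance / scale, where scale[t] plays the role of x_s^(t)."""
    d = np.array([aitchison_distance(a, f) for a, f in zip(actual, forecast)])
    return float(np.mean(d / np.asarray(scale, dtype=float)))

# Illustrative single-period example: the distance between these vectors is sqrt(0.5)
actual = [[0.5, 0.5]]
forecast = [[np.e / (1 + np.e), 1 / (1 + np.e)]]
score_crmse = crmse(actual, forecast)
score_cmape = cmape(actual, forecast, scale=[2.0])   # scale value is arbitrary here
```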

Framework for the study

Considering the vector nature of compositional data, and to further improve the accuracy of the energy consumption structure forecast, this paper proposes a combined compositional model based on minimizing the sum of squared Aitchison distance errors of the compositional data. The study process is divided into three main steps: (1) Data pre-processing. The primary energy consumption structure is transformed into angle values (intermediate variables) by applying the DRHT method to the compositional data. (2) Construction of forecast models. The MGM, BPNN, and ARIMA models are established separately and fitted to the angle values; the fitted angles are converted back to compositional data by the inverse DRHT, and the weight of each model is calculated by minimizing the sum of squared Aitchison distances. (3) Model forecasting. The best forecast model is selected by the minimum CRMSE and CMAPE values, and the inverse DRHT is used to transform the forecast angle values back to the actual forecast values. The concrete prediction framework is shown in Fig. 3.

Figure 3. The methodological framework for this paper.

Model establishment and analysis

Data

The research object of this paper is the energy consumption structure of China for 2000–2022, and the data are obtained from the China National Statistical Yearbook and the National Bureau of Statistics. The energy consumption structure covered in this paper is divided into four categories: coal, oil, natural gas, and other energy sources (like hydropower and wind energy). The structure of Chinese energy consumption during 2000–2022 is given in Table 1, which reveals that coal has long dominated the energy consumption structure, although the percentage of coal has shown a more pronounced decreasing trend in recent years. The percentage of oil has also been falling each year, while the percentage of clean energy (like natural gas) has increased significantly. Despite the continuous improvement and adjustment of the energy consumption structure in China, it is still unbalanced in general.

Table 1 The structure of energy consumption in China during 2000–2022.

Transformation of compositional data

Because the energy consumption structure may contain zero subcomponents, the DRHT approach is applied in this study to analyze the Chinese energy consumption structure during 2000–2022. The MGM, BPNN, and ARIMA models are adopted as the single models of the combined model. Before modeling, the original consumption structure data in Table 1 are first transformed with the DRHT method. In this paper, \(\left( {y_{1} ,y_{2} ,y_{3} ,y_{4} } \right)\) represent the four major components of the energy consumption structure, and \(\left( {\theta_{2} ,\theta_{3} ,\theta_{4} } \right)\) denote the angle values of the compositional data transformed with DRHT; the angle values after conversion are presented in Table 2.

Table 2 Results of DRHT transformation of energy consumption structure from 2000 to 2022.

Construction of the single model

Based on the data in Table 2, the MGM(1,1) model is used to forecast the three groups of angle values \(\left( {\theta_{2} ,\theta_{3} ,\theta_{4} } \right)\). The rolling window length (loop node) of the MGM model is set to 6 after repeated fitting trials, meaning that the data from the previous six years are used to forecast the angle value for the upcoming year. Since the MGM(1,1) model is applied to all three angle series \(\left( {\theta_{2} ,\theta_{3} ,\theta_{4} } \right)\) and the data span 2000–2022, fitted values are obtained for the 17 years from 2006 to 2022, giving 51 angle values in total. The fitted angle values are then inverse-transformed to compositional data to derive the fitted values for each component of the energy consumption structure from 2006 to 2022, as summarized in Table 3.

Table 3 The fitting results of the single model for 2006–2022.

For the time series prediction of the energy consumption structure with the BPNN model, the values of three consecutive years are used as inputs to the neural network to predict the value of the following year, a setting chosen through multiple fitting trials. Since there are three sets of angle values after the DRHT conversion, three different network models need to be constructed. Regarding the initial parameters of the neural network, training uses a maximum of 1000 iterations and an error threshold of 1e-6. The number of hidden layers of the three networks is set to five, chosen by comparing models trained with different numbers of layers. To further improve the prediction accuracy and generalization ability of the BPNN model, the genetic algorithm (GA) is employed to optimize its weights and thresholds. For the initial parameters of the GA, the selection probability is set to 0.09, the crossover probability is set to 0.4, and the mutation operator is non-uniform mutation (nonUnifMutation). The trained GA-BPNN model is then employed to forecast the three sets of angle values. Finally, the predicted angle values are inverse transformed to compositional data to obtain the fitted values for the components of the energy consumption structure, which are summarized in Table 3.
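The input layout described above (three consecutive years predicting the following year) can be sketched as a simple lagged-sample builder. The helper name and the stand-in series are hypothetical, and the GA-optimized network itself is not reproduced here.

```python
def make_lagged_samples(series, n_lags=3):
    """Pair each run of n_lags consecutive values with the value that follows it."""
    inputs = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return inputs, targets

# Years stand in for one angle series so the alignment is easy to read.
years = list(range(2000, 2023))
inputs, targets = make_lagged_samples(years)
# inputs[0] is [2000, 2001, 2002] and targets[0] is 2003
```

With 23 annual observations this yields 20 training pairs per angle series, the first of which targets 2003. The six-year window of the MGM model is therefore the binding constraint, which is why 2006–2022 is the period common to all three single models.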

According to the data in Table 2, three independent ARIMA models are constructed to predict the three sets of angle values separately. The optimal parameter values of the three models are selected by minimizing the AIC and BIC information criteria, and the models finally adopted are ARIMA(1,0,2), ARIMA(1,1,0) and ARIMA(0,1,0). The goodness of fit of all three models is above 0.85, which indicates a good fit. The fitted values of the three ARIMA models, after inversion to compositional data, are shown in Table 3.
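For reference, the two criteria in their standard textbook form are given below. The symbols are introduced here for illustration and do not appear in the original: \(k\) is the number of estimated parameters, \(n\) the number of observations, and \(\hat{L}\) the maximized likelihood of the fitted model. Lower values indicate a better trade-off between fit and model complexity; how any disagreement between the two criteria was resolved is not stated in the text.

\[ \mathrm{AIC} = 2k - 2\ln \hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln \hat{L} \]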

Optimal model selection

In this paper, the three single models constructed above are used as benchmark models for compositional prediction, and the weights of the combined MGM-BPNN-ARIMA model are derived by minimizing the sum of squared Aitchison distances on the compositional data. To select the optimal combined model, combined models based on every pair of the single models are also constructed. The weights and error values of each combined model are given in Table 4. Next, the CMAPE and CRMSE values of the combined models are compared, and the model with the lowest error is chosen as the forecast model. Because the single models use different numbers of initial data points, the values from 2006 to 2022 serve as the common basis for weight assignment and error comparison. Moreover, Fig. 4 compares the CRMSE and CMAPE values of all candidate combined models.
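The weighting idea can be illustrated with the sketch below. The Aitchison distance is written in its standard centred log-ratio form, which requires strictly positive parts. Everything else is an assumption made for illustration: the combination rule (a convex linear combination of the fitted shares), the grid search over the weight simplex, the function names, and the toy fitted values. The paper's own combination rule and solver, defined in its methods section, take precedence.

```python
import math

def aitchison_distance(x, y):
    """Aitchison distance between two strictly positive compositions."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(lx, ly)))

def combine(forecasts, weights):
    """Convex linear combination of component shares (assumed rule)."""
    parts = len(forecasts[0])
    return [sum(w * f[j] for w, f in zip(weights, forecasts)) for j in range(parts)]

def best_weights(actual, model_fits, step=0.01):
    """Grid-search three non-negative weights summing to 1 that minimise the
    sum over years of squared Aitchison distances to the observed shares."""
    n = round(1 / step)
    best_loss, best_w = float("inf"), None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i / n, j / n, (n - i - j) / n)
            loss = sum(
                aitchison_distance(a, combine([m[t] for m in model_fits], w)) ** 2
                for t, a in enumerate(actual)
            )
            if loss < best_loss:
                best_loss, best_w = loss, w
    return best_w, best_loss

# Toy fitted shares for two years from three hypothetical models.
fit_a = [[0.60, 0.18, 0.05, 0.17], [0.58, 0.18, 0.06, 0.18]]
fit_b = [[0.55, 0.20, 0.07, 0.18], [0.56, 0.17, 0.08, 0.19]]
fit_c = [[0.57, 0.19, 0.04, 0.20], [0.54, 0.19, 0.07, 0.20]]
observed = [combine([fit_a[t], fit_b[t], fit_c[t]], (0.2, 0.3, 0.5)) for t in range(2)]
weights, loss = best_weights(observed, [fit_a, fit_b, fit_c], step=0.1)
```

In this toy case the observed shares are built from known weights, so the search recovers (0.2, 0.3, 0.5). Setting one weight to zero reduces the same routine to the pairwise combined models.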

Table 4 Combined weight allocation and error value summary results of each model.

Figure 4 Comparison of CRMSE and CMAPE values for each model.

Table 4 summarizes the error values (CRMSE and CMAPE) of each model and the weights assigned in the combined models. The error values of the combined models are all lower than those of the single models, among which the ARIMA model has the lowest error. The combinations of the benchmark models perform well, with CRMSE values below 6% and CMAPE values below 3.25%. The best prediction is achieved by the MGM-BPNN-ARIMA combined model, whose weights are (0.181, 0.275, 0.544) and which yields a CRMSE of 5.739% and a CMAPE of 3.150%. Compared with the ARIMA model, which has the smallest error among the single models, its CMAPE is lower by 0.173 percentage points and its CRMSE by about 0.353 percentage points. Compared with the ARIMA-based combined models BPNN-ARIMA and MGM-ARIMA, its CMAPE is lower by 0.076 and 0.064 percentage points, and its CRMSE by about 0.146 and 0.175 percentage points, respectively. This implies that the combined MGM-BPNN-ARIMA model improves forecast accuracy. It also illustrates that forecasting compositional data by minimizing the sum of squared Aitchison distances has clear advantages, as it makes full use of the internal structural features of the compositional data.

Forecast results and discussion

Based on the values of China’s energy consumption structure during 2000–2022, the model with the lowest CRMSE and CMAPE values (the DRHT-transformed MGM-BPNN-ARIMA combined model) is adopted in this paper to forecast the energy consumption structure of China for 2023–2040. The forecasted percentages of the four categories of the energy consumption structure are shown in Table 5, while the trends of each type of energy consumption are depicted in Fig. 5.

Table 5 Forecast results of the structure of energy consumption in China for 2023–2040.

Figure 5 The energy consumption structure in China for 2023–2040.

As indicated in Table 5 and Fig. 5, the future energy consumption structure of China will continue to be adjusted and improved. The proportion of coal consumption will keep decreasing, reaching 53.5% in 2030 and 48.4% in 2040, which means that coal will still hold a major position in China’s energy consumption structure. The share of oil consumption will also decrease, from about 17.5% in 2023 to 13.6% in 2040, so the proportion of fossil energy consumption will show an obvious declining trend, further indicating the optimized adjustment of China’s energy consumption structure in the future. Meanwhile, the proportion of natural gas consumption will maintain an upward trend, rising from 8.5% in 2023 to 11.6% in 2040. The proportion of other clean energy (e.g., wind power and hydropower) will reach 21.5% in 2030, 23.9% in 2035, and 26.3% in 2040, a significant increase from 18.4% in 2023. To compare the forecast results with the actual policy goals, a concrete comparison of the energy consumption structure of China in 2025, 2030, 2035 and 2040 is shown in Fig. 6. According to these figures, non-fossil energy develops rapidly and its share in the energy consumption structure increases each year, but the overall energy consumption structure remains unbalanced, which means that it still needs further adjustment and optimization.

  1. Coal. Coal will remain a substantial part of China’s energy consumption structure in 2023–2040, but its share shows a decreasing trend, falling from 55.7% in 2023 to 48.4% in 2040. However, China is a major energy consumer whose total energy consumption has always been large, and coal is still used to some extent at peak consumption levels. Therefore, China should stick to the objective of exploring new energy sources to replace coal consumption, so that coal gradually loses its dominance in energy consumption.

  2. Oil. From 2023 to 2040, China’s oil share shows a clear downward trend, from 17.5% in 2023 to 13.6% in 2040. Fossil energy (i.e., oil and coal) therefore trends downward, but it will continue to dominate the energy consumption structure in China throughout the period. It is essential for the Chinese government to take measures to develop non-fossil energy sources and reduce oil consumption, thus promoting an optimal transformation of the energy consumption structure.

  3. Natural gas. The “Strategy for the Energy Production and Consumption Revolution (2016–2030)” mentions that by 2030, China will reach a natural gas consumption share of about 15%. However, the forecasted share of gas consumption is only 9.8% in 2030 and 11.6% in 2040, falling short of the proposed policy target. As such, it is critical to make effective policy adjustments to increase the production and supply of natural gas and thus promote its substitution for conventional high-carbon fossil energy sources.

  4. Others. The “Action Plan to Achieve Carbon Peak by 2030” states that during “the 14th Five-Year Plan (2021–2025)” China will make major strides in the optimization and adjustment of its energy structure, and that by 2025 the percentage of non-fossil energy consumption will be close to 20%. During “the 15th Five-Year Plan (2026–2030)”, the percentage of non-fossil energy consumption will increase further, reaching about 25% by 2030. However, the forecast indicates that the percentage of non-fossil energy consumption in 2030 would only be 21.5%, which falls short of the 2030 policy target and only matches the Chinese government’s 2025 policy target. As a result, China still needs to step up its energy reform efforts, accelerate the development of renewable energy technologies such as wind and solar power, and grow the clean energy industry.
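The shortfalls discussed in the list above can be checked with a few lines of arithmetic. The figures are those quoted in the text (the targets are stated there as approximate), and the variable names are illustrative.

```python
# Shares in percent of total energy consumption, as quoted in the text.
targets_2030 = {"natural gas": 15.0, "non-fossil": 25.0}    # approximate policy targets
forecast_2030 = {"natural gas": 9.8, "non-fossil": 21.5}    # combined-model forecast

gaps = {name: round(targets_2030[name] - forecast_2030[name], 1) for name in targets_2030}
for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points below the 2030 target")
```

On these numbers, natural gas falls 5.2 percentage points short of its 2030 target and non-fossil energy 3.5 percentage points short.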

Figure 6 Comparison of China’s energy consumption structure in 2025, 2030, 2035 and 2040.

To further support the improvement and adjustment of China’s energy consumption structure, the following recommendations are made. First, a more detailed and clear development roadmap should be drawn up to ensure that the policy goals can be pursued in a reasonable and orderly manner. Second, high technology should be vigorously developed to accelerate the transformation of the industrial structure, for example by improving the energy utilization efficiency of key industries with “high energy consumption” and “high emissions” (e.g., the iron and steel industry), to ultimately achieve “coal reduction”. Third, a diversified energy mix should be actively encouraged, which implies expanding clean energy sources such as hydropower and wind power and replacing fossil energy consumption, such as coal, with clean renewable energy in a progressive and orderly manner. Finally, public awareness of green and low-carbon development should be increased, and green consumption by all should be encouraged.

Conclusions

The energy consumption structure is fundamentally a holistic system made up of mutually exclusive parts that are non-negative and sum to one. However, because the information contained in the energy consumption structure has not been sufficiently explored, few scholars have conducted research in this area. At the same time, classic time series forecasting methods determine the percentage of each component independently, ignoring structural integrity and failing to thoroughly examine internal development trends. This paper therefore treats the energy consumption structure as compositional data and evaluates it as a whole system. This not only satisfies the numerical restrictions on the components (non-negative, with a constant sum), but also effectively reveals the intrinsic development trend of each component within the system. We use historical data from 2000 to 2022 to forecast the trend of China’s energy consumption structure. In terms of model selection, this paper builds on traditional single models to propose a combined MGM-BPNN-ARIMA model, which has the best predictive performance, and uses it to forecast the evolution of China’s energy consumption structure during 2023–2040.

With the overall objectives of “carbon peaking” and “carbon neutrality”, the Chinese authorities have taken a series of practical steps to optimize the energy structure, and have also set policy goals for the energy consumption structure of China. The forecasts presented in this paper can, to some extent, test whether these policy objectives can be achieved as expected. Based on the predictions of the combined MGM-BPNN-ARIMA model after the DRHT conversion, the Chinese energy consumption structure will remain unbalanced during 2023–2040, with coal still dominating the structure but gradually declining in importance. Non-fossil energy will account for 19.3% of consumption in 2025, and in 2030 natural gas will account for 9.8% and non-fossil energy for 21.5%. These 2030 values fall considerably short of the policy targets, although the combined share of clean energy consumption (natural gas plus non-fossil energy) will have increased to 31.3%.

This paper incorporates compositional data into the forecasting of China’s energy consumption structure, which fully accounts for both the overall structure and its internal characteristics. However, this predictive approach is based mainly on historical data and does not take into account external influences such as actual policies. Therefore, future work should integrate the key factors influencing the energy consumption structure with the theory of compositional data to construct a multi-factor dynamic predictive model.