Introduction

Enterprises worldwide currently face both the challenges and the opportunities of digital transformation. With the rapid development of information technology and the widespread adoption of the Internet, digital transformation has become a key path for enterprises to enhance their competitiveness and adapt to market demand, while the digital economy has become increasingly prominent in the wider economy1. In 2022, the Cyberspace Administration of China released the Digital China Development Report2, which states that the scale of China’s digital economy reached 50.2 trillion yuan in 2022, ranking second in the world, with nominal year-on-year growth of 10.3% and a share of GDP that rose to 41.5%. A number of core sectors of the digital economy, such as electronic information manufacturing, software, the industrial Internet, and agricultural digitization, have seen rapid year-on-year growth. Meanwhile, the White Paper on the Development of China’s Digital Economy3, issued by the China Academy of Information and Communications Technology in 2022, shows that the average annual growth rate of China’s digital economy since 2012 has been as high as 15.9%, significantly higher than the average GDP growth rate over the same period. The Digital Economy Report 2021, published by UNCTAD4, makes it clear that the United States and China stand out in their ability to participate in and benefit from a data-driven digital economy. These two countries have the world’s highest 5G penetration rates, are home to half of the world’s hyperscale data centers, and account for 94% of the world’s total AI startup funding over the past 5 years, 70% of the world’s top AI researchers, and nearly 90% of the market capitalization of the world’s largest digital platforms.
Given this background, more and more scholars have begun to focus on the research field of enterprise digital transformation, exploring the future direction and prospects of enterprise digital transformation5,6.

Many academic studies have examined the factors that influence enterprise digital transformation. Some focus on the impact of technological innovation, such as the use of web platforms7, artificial intelligence8, big data analytics9, and other emerging technologies in enterprise transformation. Others analyze, from an organizational perspective, the importance of factors such as organizational structure10, leadership thinking11, and employee competence12 for the success of digital transformation. Environmental factors such as market competition, policies and regulations, and industry characteristics have also been studied13. Furthermore, some studies examine corporate digital strategy and explore the impact of different strategies on digital transformation14,15. Although the existing literature has empirically demonstrated the effects of variables from different dimensions on digital transformation, these effects do not operate in isolation: individual characteristics can complement or substitute for one another, so that multiple factors jointly produce a compound effect. In addition, existing studies mostly rely on traditional linear regression models, whereas in practice data on digital transformation often violate the linearity assumption, that is, the relationships between variables may be nonlinear. As a result, traditional linear regression models often fit the data poorly and are limited in handling nonlinear data.

To address the problem of multiple factors, this paper adopts the TOE (Technology-Organization-Environment) theoretical framework to assess the degree of enterprise digital transformation. The TOE framework was originally proposed to analyze comprehensively the factors that affect the adoption of innovative technologies by enterprises, and it classifies these factors into three levels: technology, organization, and environment16. Examining the interactions of the three levels of factors within the same theoretical framework allows for a holistic view of the drivers of digital transformation. The technical level includes the application and innovation of existing digital technologies and the degree of knowledge intensity; the organizational level focuses on organizational and governance structure, including the characteristics of the executive team, corporate competence, and financial status; and the environmental level concentrates on external macro factors such as the construction of digital infrastructure and monetary policy. Previous studies have shown that the TOE framework has broad applicability and explanatory power in the study of technology, organization and environment17. Scholars continue to extend this framework according to the nature of different enterprises or the specific situation of an industry, for example by proposing the TOE-I model or by combining TOE with the TAM model, and analyses of data from many countries have confirmed the effectiveness and fundamental significance of the TOE framework18,19,20.
Meanwhile, this paper uses machine learning models to process the data, which addresses the challenges of nonlinear, high-dimensional, and large-scale data that arise in the research process. In addition, machine learning models have stronger predictive ability and adaptability and can adjust and optimize themselves as the data change, which significantly improves prediction accuracy and provides richer and more trustworthy predictions21. In summary, this paper analyzes the role of the above set of factors in enterprise digitization through a machine learning approach, quantifies the impact of each factor, and compares the different driving forces. The aim is to provide a more accurate and comprehensive understanding of the current situation and development trend of enterprise digital transformation, and to offer theoretical guidance and practical suggestions for enterprises implementing digital transformation in the future.

Compared with the existing literature, the possible marginal contributions of this paper are as follows. First, at the theoretical level, taking a holistic perspective, this paper finds that the multiple drivers of enterprise digital transformation do not act in isolation; it evaluates and compares the predictive ability of driver characteristics from different dimensions and thereby enriches the configurational perspective. Second, at the methodological level, most existing studies are still dominated by causal inference based on multiple linear regression, and only a few examine configuration effects using fuzzy-set qualitative comparative analysis (fsQCA). Although some scholars have used fsQCA to study the composite effects of multiple factors, the method is better suited to explaining complex nonlinear causal relationships between conditions and outcomes; it is useful for qualitative research but cannot quantitatively predict the driving forces of enterprise digital transformation22,23,24. Moreover, fsQCA is better suited to a small number of easily classified cases. To conduct a more general predictive analysis of the drivers of digital transformation in Chinese enterprises, this paper takes A-share listed companies across industries in China from 2010 to 2020 as the initial sample and, for the first time, applies interdisciplinary machine learning methods to analyze the factors affecting enterprise digital transformation. It constructs a more accurate prediction model for the intensity of enterprise digital transformation and enriches the application of machine learning methods in economics.
Third, at the practical level, this paper adopts the TOE model to consider the three factors of technology, organization and environment together, and adds a group of benchmark variables. The individual influence and joint effect of each factor are quantified and compared, so as to predict the driving forces of Chinese enterprises’ digital transformation and provide a better reference for enterprises formulating digital transformation strategies.

Literature review

Application of machine learning in the economic field

Economics attaches great importance to empirical data, and the analysis of empirical data depends on analytical methods. Although machine learning has so far been applied more widely in the natural sciences than in the social sciences, its powerful learning and self-correcting abilities make it well suited to the quantitative analysis of causal relationships among variables in economics. As more scholars study and refine machine learning algorithms, machine learning models have gained advantages in analysis speed, accuracy and comprehensiveness of results25,26, and their application to the digital transformation of enterprises has begun to thrive. This study examines the application of machine learning in the economic field, which can be summarized as follows: (1) Akbari et al.27 used Random Forest Regression to study the driving factors of economic and financial integration, concluding that integration is a gradual process. Combining Random Forest Regression with evidence theory can also effectively improve the efficiency of enterprise financial risk early warning28. (2) Kamalov et al.29 used Logistic Regression (LR), Random Forest Regression (RFR), Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) to compare how effectively stock prices and stock returns predict stock movements, finding that predictions based on stock prices perform better. (3) Nazareth and Reddy30 examined the performance of machine learning in stock market forecasting, portfolio management, cryptocurrency, foreign exchange markets, financial crises, and bankruptcy and insolvency prediction; another study31 used a machine learning model to examine how financial indicators forecast returns in the Chinese stock market.
(4) Another study32 confirmed that machine learning provides stronger early warning of economic crises than traditional logic models and integration models. Samitas et al.33 also used machine learning as an early warning system for financial crises. (5) Achakzai and Peng34 developed a new machine learning model, Dynamic Ensemble Selection (DES), to detect fraud in financial statements. (6) Murugan35 used cluster-based XGBoost and cluster-based K-nearest neighbor (KNN) to analyze financial risk. (7) Mashrur et al.36 stated that machine learning can predict the likelihood of default by individuals or enterprises by identifying loan applicants and enterprises with similar characteristics.

The motivation for digital transformation

The core of digital transformation is to use digital technology to improve the existing organizational mode of enterprise management, bridge the “data gap” between different departments of the enterprise, and redesign the production and operation structure, so as to improve the efficiency of resource allocation and innovate the management mode37. By studying the driving factors, enterprises can understand the internal and external environment they face and carry out digital transformation more effectively.

In recent years, many scholars in China and abroad have discussed the antecedents of enterprise digital transformation from the perspectives of environment, organization, and management. The existing literature identifies multiple dimensions of motivation for enterprise digital transformation: (1) Technical motivation. Digital skills directly or indirectly affect digital transformation38. Investment in IT alone cannot produce the expected results; to have a positive impact on digital transformation, IT infrastructure must be combined with other capabilities of the company to develop relevant transformation strategies39. (2) Organizational motivation. Both digital strategy and organizational ability have positive effects on the digital transformation of enterprises40,41. (3) Manager motivation. Compared with other factors such as technology, the awareness of managers is the biggest obstacle to digital transformation42,43. In addition, Hu et al.44 concluded that the overseas education and work experience of senior executives are positively correlated with the level of enterprise digital transformation. (4) Digital economy motivation. Li et al.45 argued that the digital economy helps enterprises obtain the key elements of digital transformation; digital financial inclusion can also significantly promote enterprise digital transformation46. (5) Intergenerational inheritance motivation. The intergenerational inheritance of family businesses promotes digital transformation to some extent, but its inhibitory effect is greater than its incentive effect47. (6) Internal enterprise factors. In addition to enterprise size48, enterprise resources, enterprise capabilities and enterprise spirit also affect digital transformation49. (7) Operating environment motivation. Luo et al.50 found that the business environment can promote enterprise digital transformation by attracting high-tech talent and increasing technology investment. (8) Policy motivation. Wang et al.51 found that government support, including government subsidies and tax incentives, has a positive influence on enterprise digital transformation by alleviating financing constraints, increasing R&D investment and improving risk-bearing capacity. Moreover, climate policy52 and low-carbon strategy53 also influence enterprise digital transformation. (9) Human capital motivation. Enterprise digitization not only involves upgrading digitization-related hardware assets but also requires the “software” support of employees’ knowledge and skills54. (10) Huang et al.55 argued that changes in consumer behavior, together with the experience of leading enterprises that have transformed themselves by building digital platforms, continually encourage other enterprises to embark on transformation. The degree of industry competition56 and the level of regional big data development57 are also key factors affecting enterprise digital transformation.

However, the above studies mainly focus on a single feature within a single dimension; they lack comprehensive consideration and comparative analysis of the motivations for digital transformation, and their conclusions are difficult to apply to the whole sample. To account for the interaction and configuration effects among dimensions, the indicators of each dimension can be classified and discussed. After comparing the similarities and differences among the characteristics of different motivations, this study applies TOE theory16, which divides the driving factors of digital transformation into technical, organizational and environmental motivations. Technical motivation is an important support for enterprise digital transformation and incorporates enterprise innovation ability and absorptive capacity; organizational motivation focuses on internal governance and structure; and environmental motivation is mainly reflected in government regulation and the market environment. This classification allows the motivations for enterprise digital transformation to be discussed more comprehensively, with the aim of identifying its key drivers.

Methods

Research design

Research methods

Machine learning algorithms build on traditional statistical and mathematical models to identify patterns and regularities in existing data and to make predictions or decisions based on these patterns. This study applies ensemble learning, a method that integrates multiple learners to achieve stronger out-of-sample generalization than a single learner. Following the existing literature27,35, the study chooses Gradient Boosting Regression (GBR) and Random Forest Regression (RFR), together with the more advanced ensemble learning methods LightGBM and XGBoost, and compares them with two linear methods, multiple linear regression and LASSO. The regression mechanisms of the six methods used in this article are as follows:

Firstly, linear regression. Linear regression is a fundamental regression model that assumes a linear relationship between the dependent variable and the independent variables, as in Formula 1.

$$\begin{array}{c}y={\uptheta }_{0}+{\uptheta }_{1}{x}_{1}+{\uptheta }_{2}{x}_{2}+\dots +{\uptheta }_{n}{x}_{n}+\epsilon \end{array}$$
(1)

In Formula 1, \(y\) is the dependent variable, \({x}_{1},{x}_{2},\dots ,{x}_{n}\) are the independent variables, \({\uptheta }_{0},{\uptheta }_{1},\dots ,{\uptheta }_{n}\) are the model parameters, and \(\epsilon\) is the error term. The goal of linear regression is to estimate the model parameters by minimizing the mean squared error (MSE), as shown in Formula 2.

$${\mathop {\min }\limits_{\theta } \frac{1}{m}\sum\limits_{{i = 1}}^{m} {\left( {y^{{\left( i \right)}} - \widehat{{y^{{\left( i \right)}} }}} \right)^{2} } }$$
(2)

Here, \(m\) is the number of samples, \({y}^{\left(i\right)}\) is the true value of the i-th sample, and \(\widehat{{y}^{\left(i\right)}}\) is the predicted value of the i-th sample. With the estimated regression coefficients, the dependent variable can be predicted for new values of the independent variables, and the relative importance of different independent variables to the dependent variable can be evaluated.

Secondly, LASSO regression. LASSO regression is an extension of linear regression that adds an L1 regularization term to the mean squared error being minimized, as shown in Formula 3.

$$\begin{array}{*{20}c} {\mathop {\min }\limits_{\theta } \frac{1}{m}\sum\limits_{{i = 1}}^{m} {\left( {y^{{\left( i \right)}} - \widehat{{y^{{\left( i \right)}} }}} \right)^{2} } + \alpha \sum\limits_{{j = 1}}^{n} {\left| {\theta _{j} } \right|} } \\ \end{array}$$
(3)

Here, \(\alpha\) is the regularization parameter that controls the complexity of the model, and \({\uptheta }_{j}\) is a model parameter other than the intercept term. LASSO regression aims to prevent overfitting and improve generalization by penalizing large parameter values; because the L1 penalty can shrink some coefficients exactly to zero, it also performs variable selection.
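To make Formulas 1 to 3 concrete, the following is a minimal sketch of coordinate descent for the objective in Formula 3, written in plain Python. It is an illustration under stated assumptions rather than the estimation routine used in this study: the function names, the toy data and the choice of coordinate descent are all hypothetical. Setting \(\alpha = 0\) recovers the least-squares solution of Formula 2.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator induced by the L1 penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/m) * sum((y - y_hat)^2) + alpha * sum(|theta_j|).

    The intercept theta0 is not penalized, matching the text above.
    """
    m, n = len(X), len(X[0])
    theta0, theta = 0.0, [0.0] * n
    for _ in range(n_iter):
        # Intercept: mean of the residuals computed without the intercept.
        theta0 = sum(
            y[i] - sum(theta[k] * X[i][k] for k in range(n)) for i in range(m)
        ) / m
        for j in range(n):
            # Partial residual that leaves feature j out of the prediction.
            r = [
                y[i] - theta0 - sum(theta[k] * X[i][k] for k in range(n) if k != j)
                for i in range(m)
            ]
            rho = sum(X[i][j] * r[i] for i in range(m)) / m
            z = sum(X[i][j] ** 2 for i in range(m)) / m
            # With the 1/m scaling of Formula 3 the threshold is alpha / 2.
            theta[j] = soft_threshold(rho, alpha / 2.0) / z
    return theta0, theta


# Hypothetical toy data: y = 1 + 3 * x1 + 0.1 * x2, with orthogonal, centered features.
X = [[-2, 1], [-1, -2], [0, 0], [1, 2], [2, -1]]
y = [1 + 3 * a + 0.1 * b for a, b in X]

intercept_ols, coef_ols = lasso_coordinate_descent(X, y, alpha=0.0)  # Formula 2
intercept_l1, coef_l1 = lasso_coordinate_descent(X, y, alpha=1.0)    # Formula 3
```

In this toy example the penalty shrinks the coefficient of the strong predictor and sets the coefficient of the weak predictor exactly to zero, which is the variable-selection behavior that distinguishes LASSO from ordinary linear regression.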

Thirdly, Gradient Boosting Regression (GBR). Gradient boosting regression is an ensemble learning method based on tree models: it builds multiple trees sequentially over successive iterations and then takes a weighted sum of the trees’ predictions as the final predicted value. The objective function of the gradient boosting decision tree is given in Formula 4.

$$\begin{array}{*{20}c} {\mathop {{\text{min}}}\limits_{{{\theta }} } \sum\limits _{{i = 1}}^{m} l\left( {y^{{\left( i \right)}} ,\widehat{{y^{{\left( i \right)}} }}} \right) + \sum\limits _{{k = 1}}^{K} \Omega \left( {f_{k} } \right)} \\ \end{array}$$
(4)

Here, \(l\) is the loss function that measures the difference between the true and predicted values, \(\Omega\) is the regularization term that controls the complexity of each tree, \({f}_{k}\) is the function represented by the k-th tree, and \(K\) is the number of trees. The advantage of gradient boosting decision trees is that they optimize the loss function by gradient boosting and can handle different types of loss functions, such as squared loss, absolute loss, and logarithmic loss. The parameters of gradient boosting decision trees can be estimated by methods such as gradient boosting or Newton boosting.
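The boosting procedure behind Formula 4 can be sketched in a few lines of Python. The sketch below is an illustration under simplifying assumptions, not the implementation used in this study: it uses squared loss, one-split trees (stumps) on a single feature, and omits the regularization term \(\Omega\). The names `fit_stump`, `gradient_boost`, `n_trees` and `learning_rate`, as well as the toy data, are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(x, r):
    """Fit a one-split regression tree to the residuals r by least squares."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between two adjacent values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def gradient_boost(x, y, n_trees=50, learning_rate=0.3):
    """Additive model: start from the mean, then add stumps fitted to residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees, history = [], []
    for _ in range(n_trees):
        history.append(mse(y, pred))
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
    history.append(mse(y, pred))

    def model(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    return model, history


# Hypothetical nonlinear toy data.
x = list(range(10))
y = [xi ** 2 for xi in x]
model, history = gradient_boost(x, y)
```

Each round fits a new tree to what the current ensemble still gets wrong, so the training error recorded in `history` never increases from one round to the next for squared loss and a learning rate of at most 1.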

Fourthly, Random Forest Regression (RFR). Random forest is an ensemble learning method based on tree models: it grows multiple decision trees on repeated random samples of the data and then averages (for regression) or votes on (for classification) the trees’ predictions to obtain the final predicted value. The objective function of a random forest is given in Formula 5.

$$\begin{array}{*{20}c} {\mathop {{\text{min}}}\limits_{{{\theta }} } \sum\limits _{{i = 1}}^{m} l\left( {y^{{\left( i \right)}} ,\widehat{{y^{{\left( i \right)}} }}} \right) + \sum\limits _{{k = 1}}^{K} \Omega \left( {f_{k} } \right)} \\ \end{array}$$
(5)

\(l\), \(\Omega\), \({f}_{k}\) and \(K\) have the same meaning as in GBR. The advantage of random forest is that it improves the efficiency and effectiveness of the model through techniques such as parallel computing, bootstrap sampling, and random feature selection. It can also handle problems such as missing values and categorical features. The trees of a random forest can be built by methods such as bootstrap aggregation or extremely randomized trees.
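The two sources of randomness mentioned above, bootstrap sampling and random feature selection, can be illustrated with a small pure-Python sketch. It is a simplified illustration rather than the RFR implementation used in this study: each tree is a one-split stump, and the names `random_forest`, `max_features` and `seed`, along with the toy data, are assumptions made for the example.

```python
import random


def fit_stump(X, y, features):
    """Best single-split regression tree over the candidate features."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No split possible (all sampled rows identical): predict the mean.
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def random_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Grow each tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]   # sample rows with replacement
        feats = rng.sample(range(n), max_features)   # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Regression forest prediction: the average of the individual trees."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical toy data: the first feature is informative, the second is not.
X = [[v, (7 * v) % 5] for v in range(10)]
y = [v ** 2 for v in range(10)]
trees = random_forest(X, y)
predictions = [predict(trees, row) for row in X]
```

Unlike boosting, the trees here are grown independently of one another, which is why random forests are straightforward to train in parallel.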

Fifth, XGBoost. XGBoost is an ensemble learning algorithm based on gradient boosted trees that can be used for both regression and classification problems. First, it uses an optimization strategy called Extreme Gradient Boosting, which can build and train models in parallel on multi-core CPUs, greatly improving computational speed and efficiency. Second, it adds a regularization term that controls model complexity and the risk of overfitting. The regularization term includes the number of leaf nodes in the tree and the sum of squares of the leaf weights (the score of each leaf node), among others. The loss function is given in Formula 6.

$$\begin{array}{c}L\left(\phi \right)=\sum\limits_{i} l\left({\widehat{y}}_{i},{y}_{i}\right)+\sum\limits_{k} \Omega \left({f}_{k}\right)\end{array}$$
(6)

where \(L(\phi )\) represents the loss function, \({\widehat{y}}_{i}\) represents the predicted value of the i-th sample, \({y}_{i}\) represents its true value, and \(\Omega ({f}_{k})\) represents the regularization term of the k-th tree.
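As an illustration of the regularization term described above, the form used in the original XGBoost formulation can be written as follows. The symbols \(T_{k}\) (the number of leaves of the k-th tree), \(w_{k,j}\) (the weight of its j-th leaf), and the penalty coefficients \(\gamma\) and \(\lambda\) are introduced here only for illustration and are not defined elsewhere in this paper; in particular, this \(\gamma\) is unrelated to the weight coefficient used in the LightGBM formula.

$$\Omega \left({f}_{k}\right)=\gamma {T}_{k}+\frac{1}{2}\lambda \sum\limits_{j=1}^{{T}_{k}}{w}_{k,j}^{2}$$

A larger \(\gamma\) discourages additional leaves, while a larger \(\lambda\) shrinks the leaf weights toward zero.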

Sixth, LightGBM. LightGBM is a machine learning method based on the Gradient Boosting Decision Tree (GBDT). It has the following characteristics. It supports categorical features and can process numerical and categorical data directly, without one-hot encoding. It uses histogram-based optimization, which reduces the number of passes over the full data set and speeds up decision tree construction. Gradient-based One-Side Sampling (GOSS) keeps the samples with large gradients and randomly subsamples those with small gradients, which reduces computation while largely preserving accuracy. Exclusive Feature Bundling (EFB) merges mutually exclusive features, that is, features that rarely take non-zero values at the same time, into a single feature to reduce the feature dimension and the amount of computation. Finally, it grows trees leaf-wise with a depth limit to avoid over-fitting and premature convergence. The loss function value of each sample at each leaf node is formulated as follows:

$$\begin{array}{*{20}c} {L\left( \phi \right) = \frac{1}{2}\sum\limits_{{i = 1}}^{n} {\left[ {{\text{log}}\left( {\frac{{f\left( {x_{i} } \right)}}{{f\left( {x_{{i + 1}} } \right)}}} \right) + \gamma \sum\limits_{{j = 1}}^{m} {y_{i} } \left( {f\left( {x_{i} } \right) - f\left( {x_{i} } \right)} \right)} \right]} } \\ \end{array}$$
(7)

where \(n\) is the number of training samples, \(m\) is the number of categories, \({x}_{i}\) is the feature vector of the i-th sample, \({y}_{i}\) is the category label of the i-th sample, \(\gamma\) is the weight coefficient, and \(f\left(x\right)\) is the predicted value.

In summary, ensemble learning methods effectively compensate for endogeneity and other shortcomings that arise when linear models are applied to non-linear relationships and interactions between variables, and thus perform well in out-of-sample prediction tasks58. Therefore, ensemble learning methods are expected to predict the intensity of enterprise digital transformation better than linear methods such as multiple linear regression.

Model setting

To select a more effective prediction model, model performance is assessed in terms of explanatory power and prediction error. For explanatory power, following the existing literature29, this study adopts three indicators. (1) In-sample goodness of fit \({\text{R}}_{\text{Is}}^{2}\). This index evaluates how well the machine learning model fits the training data; the higher the in-sample goodness of fit, the stronger the explanatory ability of the model on the training set. (2) Out-of-sample goodness of fit \({\text{R}}_{\text{oos}}^{2}\). Because the in-sample goodness of fit cannot fully reflect how well the model generalizes to new data, this article further uses the out-of-sample goodness of fit \({\text{R}}_{\text{oos}}^{2}\) to measure the generalizability of the model. (3) Explained variance score \({{\text{EVS}}}_{{\text{oos}}}\). It measures the share of the variability of the dependent variable that the model explains, based on the variance of the difference between the predicted and observed values, and thus assesses the generalization ability of the model from the perspective of variance.

For prediction error, following existing research59,60, the out-of-sample mean squared error \({MSE}_{oos}\) is selected to measure the deviation between the predicted and actual values. If the model performs well on the training data but has a high mean squared error on the test data, it may be overfitting, that is, failing to adapt to new data. Calculating the out-of-sample mean squared error therefore allows the study to evaluate model performance more comprehensively and to determine whether the model generalizes well. Meanwhile, to reduce the influence of extreme values, the out-of-sample mean absolute error \({MAE}_{oos}\) and median absolute error \({MedAE}_{oos}\) are also used to assess prediction accuracy. The meanings and calculations of the evaluation indicators are shown in Table 1.
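As a minimal sketch, the indicators above can be computed as follows, assuming their standard definitions; the function name and the example values are hypothetical. Passing training-set values gives the in-sample version of each indicator, and passing held-out test-set values gives the out-of-sample version.

```python
from statistics import mean, median


def evaluation_metrics(y_true, y_pred):
    """Goodness of fit, explained variance and three error measures."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    e_bar = mean(errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    ss_res = sum(e ** 2 for e in errors)              # residual sum of squares
    ss_err = sum((e - e_bar) ** 2 for e in errors)    # centered error sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "EVS": 1.0 - ss_err / ss_tot,  # variance ratio; the 1/m factors cancel
        "MSE": ss_res / len(errors),
        "MAE": mean([abs(e) for e in errors]),
        "MedAE": median([abs(e) for e in errors]),
    }


# Hypothetical observed and predicted values.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
metrics = evaluation_metrics(y_true, y_pred)
```

The goodness of fit and the explained variance differ only in whether the mean prediction error is removed before the comparison, so they coincide when the predictions are unbiased on the evaluated sample.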

Table 1 Model evaluation indicators and calculation methods.

Moreover, a main advantage of ensemble learning is that combining multiple base models reduces the weaknesses of any single model; the price is that the results cannot be interpreted as directly as those of a single learner. This study therefore uses relative importance and partial dependence plots to interpret the practical meaning of the ensemble learning results. First, relative importance refers to the relative contribution or influence of each factor on the outcome during model fitting. Following Nazareth and Reddy30, with the rest of the model held constant, the relative importance of a variable is obtained by measuring the decrease in the loss function caused by adding that variable to the model. The greater the relative importance, the stronger the ability of the variable to predict the intensity of enterprise digital transformation. Second, a partial dependence plot measures the influence of a given variable on the predicted digital transformation of an enterprise while the other features are held unchanged, and displays it graphically so that the relationship can be seen more intuitively. It also shows more precisely how a single variable relates to the predicted degree of enterprise digital transformation61.
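The computation behind a partial dependence plot can be sketched as follows. This is an illustrative sketch, not the plotting routine used in this study: the function name, the stand-in model and the toy data are hypothetical. For each grid value, the chosen feature is set to that value for every observation, the other features are left as observed, and the predictions are averaged.

```python
def partial_dependence(model, X, feature, grid):
    """Average model prediction when `feature` is fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)        # copy so the data set is not altered
            modified[feature] = value   # overwrite only the feature of interest
            total += model(modified)
        curve.append(total / len(X))
    return curve


# Hypothetical fitted model standing in for an ensemble learner.
def toy_model(row):
    return 2.0 * row[0] + row[1] ** 2


X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
```

Plotting `pd_curve` against `grid` gives the partial dependence plot for that feature; a nonlinear curve indicates a relationship that a linear model would not capture.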

Data sources and variable definitions

Data source

In this study, A-share listed companies from 2010 to 2020, that is, companies listed on the Shenzhen Stock Exchange and the Shanghai Stock Exchange of China, are taken as the initial sample. Company data come from the Wind and CSMAR databases. To exclude the interference of special observations with the prediction results, the data are processed as follows: (1) enterprises with abnormal listing status such as ST and PT are excluded, so that abnormal operation of the enterprise itself does not distort the overall prediction; (2) samples with seriously missing data are removed; (3) continuous variables are winsorized at the 1% and 99% quantiles to limit the influence of extreme outliers. In the end, 8310 observations are obtained; their yearly distribution is shown in Table 2.
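Step (3) can be illustrated with a short sketch of winsorization at the 1% and 99% quantiles. The quantile rule (linear interpolation between order statistics) and the function names are assumptions made for the example; statistical packages differ slightly in how they define sample quantiles.

```python
import math


def quantile(sorted_values, q):
    """Sample quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values below the lower quantile and above the upper quantile."""
    ordered = sorted(values)
    lo_cut = quantile(ordered, lower)
    hi_cut = quantile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Hypothetical variable with one extreme value at each end.
raw = [float(v) for v in range(101)]
clipped = winsorize(raw)
```

Unlike trimming, winsorization keeps every observation and only replaces the extreme values with the cut-off values, so the sample size is unchanged.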

Table 2 Yearly distribution of observations.

Variable definition

This study selects the Digital Transformation Index (Digitaltransindex) from the CSMAR database as the response variable. According to CSMAR, the index is based on word-frequency statistics of digital-transformation-related terms in enterprises’ annual reports and comprises five parts: artificial intelligence (AI), blockchain (BC), cloud computing (CC), big data (BD) and the application of digital technology (ADT). This measure can effectively reflect the degree of enterprise digital transformation; the detailed calculation is given in the variable table.

According to the TOE framework and the existing research on the drivers of enterprise digital transformation, this study selects the driver characteristics of the model from three dimensions. For the technical dimension, following Tamayo et al.38, the intensity of R&D expenses and the technical size are selected as measures of innovation ability and absorptive capacity. For the organizational dimension, referring to Li et al.57, Schoar and Zuo62, Chen et al.63 and Bandiera et al.64, the study selects ten variables to measure organizational driver characteristics: senior management team size (Manager Number), executives’ knowledge level (Education Level), executives’ social capital (Social Network), profitability (ROA), growth (Growth), enterprise value (TobinQ), solvency (Lev), equity concentration (Top Ten Holders Rate), duality of chairman and general manager (Duality), and proportion of independent directors (IndDirector Ratio). For the environmental dimension, referring to Li et al.49, Luo et al.50, and Wu and Wang65, financial support (Financial Support), infrastructure index (Infrastructure Score), monetary policy easing (Monetary Policy), intellectual property protection level (IP Protection), and industry competition pressure (HhiD) are taken as variables to measure the environmental characteristics of enterprises.

In addition, for the benchmark variable group, referring to Li et al.57,66, Zhao et al.67 and Hanelt et al.68, we include past performance (Past Revenue), cash flow ratio (Cash Flow Ratio), enterprise age (Firm Age), enterprise size (Size), and ownership (SOE). The variable definitions are shown in Table 3.

Table 3 Variable definition.

Empirical results and the analysis

Descriptive statistics

According to Table 4, the mean of Digitaltransindex is 37.7564 and its standard deviation is 11.8132, which indicates that the degree of digital transformation differs considerably across enterprises. The other variables show no outliers, which supports the reasonableness of the prediction.

Table 4 Descriptive statistics.

The fitting results of the model based on the enterprise digital transformation index prediction

Table 5 lists the prediction results of the models constructed by the different methods for the degree of enterprise digital transformation. Column (1) shows that the in-sample goodness of fit \({\text{R}}_{\text{Is}}^{2}\) of multiple linear regression, the LASSO model and GBR is below 0.54 in each case, whereas RFR, XGBoost and LightGBM all exceed 0.9, with XGBoost reaching 0.9867, which shows that these ensemble learning methods achieve a better in-sample fit. In addition, columns (2) and (3) of Table 5 show that LightGBM has the highest out-of-sample goodness of fit \({\text{R}}_{\text{oos}}^{2}\) and explained variance \({\text{EVS}}_{\text{oos}}\), at 0.7350 and 0.7353 respectively, followed by XGBoost; all four values (two indexes for each of the two methods) are higher than 0.72. This illustrates that ensemble learning methods can better predict the degree of enterprise digital transformation. As can be seen from column (4), the out-of-sample mean squared errors \({\text{MSE}}_{\text{oos}}\) of XGBoost and LightGBM are smaller than those of the other four methods. Finally, columns (5) and (6) show that XGBoost and LightGBM have lower mean absolute errors \({\text{MAE}}_{\text{oos}}\) (5.3023 and 5.2542) and lower median absolute errors \({\text{MedAE}}_{\text{oos}}\) than the linear regression methods. This indicates that the improvement is less pronounced once the influence of extreme values is reduced.

Table 5 Results of model fitting.
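As an illustration of the evaluation indicators reported in Table 5, the following is a minimal sketch of how the goodness of fit, explained variance, mean squared error, mean absolute error and median absolute error can be computed from observed and predicted values. The function name `regression_metrics` and the toy numbers are assumptions made for illustration; they are not the data or code of this study.

```python
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return R^2, explained variance, MSE, MAE and median absolute error."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r ** 2 for r in residuals)         # residual sum of squares
    r_bar = mean(residuals)
    var_res = sum((r - r_bar) ** 2 for r in residuals) / n  # residual variance
    var_y = ss_tot / n                                      # variance of y_true
    abs_res = [abs(r) for r in residuals]
    return {
        "R2": 1 - ss_res / ss_tot,
        "EVS": 1 - var_res / var_y,
        "MSE": ss_res / n,
        "MAE": mean(abs_res),
        "MedAE": median(abs_res),
    }


# Toy values only (hypothetical, not taken from Table 5).
metrics = regression_metrics([30.0, 40.0, 50.0, 60.0], [32.0, 38.0, 53.0, 57.0])
```

The goodness of fit and the explained variance coincide whenever the residuals average to zero, which is one reason the two columns in Table 5 can be so close to each other.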

In summary, among the ensemble learning methods, XGBoost and LightGBM fit the data better, so a research model with more accurate predictions can be built on them. The following sections further discuss the driving forces and key factors of enterprise digital transformation.

Differences in the driving force dimensions of enterprises’ digital transformation prediction ability

To explore how the ability to predict the intensity of enterprise digital transformation differs across driving forces, this study follows Chen63 and builds the benchmark model from past performance (Past Revenue), cash flow ratio (Cash Flow Ratio), enterprise age (Firm Age), enterprise size (Firm Size) and ownership (SOE). Then, following Bertomeu et al.69, it calculates and compares the predictive performance obtained when different combinations of the TOE driving forces are added to the benchmark model. Because the conclusions based on different evaluation indicators are basically the same, this study analyzes the out-of-sample goodness of fit \({\text{R}}_{\text{oos}}^{2}\), and the results are shown in Table 6.

Table 6 Prediction performance under different combinations of driving forces.
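The combinations compared in Table 6 can be enumerated mechanically. The sketch below, written under stated assumptions, builds the feature set for the benchmark model and for every combination of the technical (T), organizational (O) and environmental (E) groups. The variable lists are abbreviated placeholders built from variable names mentioned in the text; the full variable lists of the study are not reproduced here, and the function name `feature_sets` is hypothetical.

```python
from itertools import combinations

# Placeholder feature groups (abbreviated; assumptions for illustration only).
BENCHMARK = ["PastRevenue", "CashFlowRatio", "FirmAge", "FirmSize", "SOE"]
TOE_GROUPS = {
    "T": ["RDExpenses"],
    "O": ["TopTenShareHolderRate", "EducationLevel"],
    "E": ["HhiD"],
}


def feature_sets(benchmark, groups):
    """Map a label such as 'Benchmark+T+O' to the features used by that model."""
    sets = {"Benchmark": list(benchmark)}
    names = list(groups)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            label = "Benchmark+" + "+".join(combo)
            added = [feature for group in combo for feature in groups[group]]
            sets[label] = list(benchmark) + added
    return sets


all_sets = feature_sets(BENCHMARK, TOE_GROUPS)  # benchmark plus 7 combinations
```

Each feature set would then be passed to the same learner and scored out of sample, so that differences in \({\text{R}}_{\text{oos}}^{2}\) can be attributed to the added driving forces.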

Firstly, the predictive ability of each single driving force for the intensity of enterprise digital transformation is considered separately. As shown in the second row of Table 6, the prediction effect is the best when the technical features are added to the benchmark model. Taking LightGBM as an example, the out-of-sample goodness of fit of the model improves to 0.7073, 0.7111 and 0.6583 after the technical, organizational and environmental driving forces are added to the benchmark model respectively. Secondly, when combinations of two different types of driving forces are considered and the out-of-sample goodness of fit is compared among the groups in Table 6, the models whose combination includes the organizational driving force are found to fit best. Finally, when all three driving forces are integrated, LightGBM has the strongest explanatory power, followed by XGBoost. According to the prediction results, enterprises need to pay attention to improving organizational driving forces, such as the shareholding ratio of the top ten shareholders and the knowledge level of the top management team. At the same time, enterprises need to pay attention to changes in the external business environment, so as to seize the opportunity of favorable policies and improve the intensity of digital transformation. The following section analyzes the differences among single factors in detail based on LightGBM and XGBoost and puts forward more specific suggestions for enterprises.

Differential analysis of the prediction ability of digital transformation by key factors under different driving forces

The above analysis shows that XGBoost and LightGBM predict better. Therefore, these two ensemble learning methods are used to compare, through relative importance, how strongly different variables in the machine learning model predict the intensity of enterprise digital transformation. Figures 1 and 2 report the ranking of the relative importance of the variables, and Table 7 shows the 15 variables with the highest relative importance under the LightGBM and XGBoost prediction methods, which indicates that these characteristics are the key factors affecting the digital transformation of Chinese companies.

Figure 1 Relative importance ranking based on XGBoost.

Figure 2 Relative importance ranking based on LightGBM.

Table 7 Ranking of relative importance (Top 15).
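A ranking such as the one in Table 7 can be derived from the raw importance scores returned by a fitted model. The sketch below assumes that the scores are rescaled to sum to 1 before ranking; the scaling actually used in this study is not stated, and the function name `rank_importance` and the example scores are hypothetical.

```python
def rank_importance(raw_scores, top_k=15):
    """Rescale raw importance scores to sum to 1 and return the top_k variables."""
    total = sum(raw_scores.values())
    relative = {name: score / total for name, score in raw_scores.items()}
    ranked = sorted(relative.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical raw scores, for illustration only (not values from Table 7).
example_scores = {"RDExpenses": 50.0, "HhiD": 20.0, "PastRevenue": 30.0}
top_two = rank_importance(example_scores, top_k=2)
```

Because only the ordering matters for the ranking, any positive rescaling of the raw scores would give the same Top 15 list.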

Prediction model of the intensity of digital transformation of enterprises by important driving factors

Based on the relative importance and ranking of the variables in Figs. 1 and 2 and Table 7, this study selects innovation ability (R&D Expenses), equity concentration (Top Ten Share Holder Rate), executive knowledge level (Education Level), degree of industry competition (HhiD) and past performance (Past Revenue). These variables have high relative importance within the technical, organizational, environmental and benchmark dimensions respectively, and have a stronger impact on the digital transformation of enterprises. Meanwhile, they are of universal significance for the digital transformation of companies in different industries. Figures 3, 4, 5, 6 and 7 show the partial dependence diagrams under the LightGBM and XGBoost methods.

Figure 3 Partial dependence on R&D expenses.

Figure 4 Partial dependence on Top Ten Share Holder Rate.

Figure 5 Partial dependence on Education Level.

Figure 6 Partial dependence on HhiD.

Figure 7 Partial dependence on Past Revenue.
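For readers unfamiliar with partial dependence diagrams, the following minimal sketch shows the underlying calculation: one feature is fixed at each value of a grid for every observation, and the model predictions are averaged. The function name `partial_dependence`, the toy model and the toy data are assumptions for illustration; any fitted XGBoost or LightGBM model could play the role of `predict`.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value.

    predict       -- callable taking one feature row and returning a number
    rows          -- list of feature rows (lists of numbers)
    feature_index -- position of the feature of interest within a row
    grid          -- values at which the feature is fixed
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # copy so the data stay unchanged
            modified[feature_index] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Toy model and data (hypothetical): prediction = 2 * x0 + x1.
toy_predict = lambda row: 2 * row[0] + row[1]
toy_rows = [[1.0, 10.0], [3.0, 20.0]]
toy_curve = partial_dependence(toy_predict, toy_rows, 0, [0.0, 5.0])
```

Plotting the returned curve against the grid gives a diagram of the same kind as Figs. 3 to 7, in which the horizontal axis is the selected variable and the vertical axis is the average predicted degree of digital transformation.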

Figure 3 shows the partial dependence diagram of R&D expenses. This research selects the R&D investment ratio of enterprises as the proxy variable for innovation capability. As shown in the figure, once the R&D investment of an enterprise exceeds 10%, the degree of digital transformation shows a fluctuating upward trend as the proportion of investment increases, and it peaks when the R&D investment reaches about 42%. When the R&D investment exceeds 45%, the degree of transformation remains at a high level and tends to flatten. R&D investment has the highest relative importance in the technical dimension, indicating that it plays the strongest driving role in the process of digital transformation. Therefore, managers should attach great importance to innovation, avoid blindly increasing R&D expenses, and adjust the process of digital transformation in a timely manner.

Figure 4 shows the partial dependence diagram of equity concentration. This paper selects the shareholding ratio of the top ten shareholders as the proxy variable. In general, the curve fluctuates considerably, but it still shows a negative trend. When the ratio is around 40%, the degree of transformation is relatively high, and it declines markedly after the ratio reaches 57%. This shows that high equity concentration is not conducive to digital transformation, which is also related to the principal-agent problem within the enterprise. To promote digital transformation as well as innovation and sustainable development, enterprises can introduce more shareholders and stakeholders so that decisions are made more reasonably.

Figure 5 shows the partial dependence diagram of executives’ knowledge level, which is calculated by assigning values to and weighting the education levels of senior executives. As shown in Fig. 5, the general trend is that the higher the knowledge level of management, the higher the degree of digital transformation. In particular, the degree of digital transformation rises steeply when the knowledge level reaches 2.7, and then increases gradually. After peaking at around 3.6, it begins to decline rapidly. As decision-makers, senior executives with a higher education level are better able to accept and implement innovation strategies. At the same time, they possess professional knowledge and leadership, and can lead the enterprise team to maintain smooth operation in technology research and development, operation and management. Therefore, enterprises should recruit more highly educated talent, optimize the composition of the top management team, further improve its overall quality and ability, and lay a solid foundation for digital transformation.

Figure 6 shows the partial dependence diagram of industrial competitive pressure, for which the proxy variable is the Herfindahl index of the industry in which the enterprise is located. The higher the Herfindahl index, the higher the market concentration and the lower the level of competition. As shown in the figure, the relationship between the digital transformation of enterprises and the competitive pressure of the industry cannot be described by a simple linear relationship. The degree of digital transformation is highest when the Herfindahl index is around 0.02. It then drops sharply and remains relatively stable in the range of 0.05–0.10, with a small peak. After the index reaches 0.18, the digital transformation intensity continues to decline. In general, the greater the competitive pressure in the industry, the higher the degree of digital transformation. Therefore, enterprises in highly competitive industries need to pay attention to the market environment in a timely manner, strengthen the implementation of their digital transformation strategy, and establish competitive advantages.

Figure 7 shows the partial dependence diagram of the past performance of the enterprise, for which the natural logarithm of the company’s operating income at the end of the year is used as the proxy variable. As shown in Fig. 7, the past performance of enterprises shows a positive trend. Once it reaches 21.5, the positive impact of past performance on digital transformation gradually becomes larger, accompanied by small peaks. Therefore, the annual operating income of the enterprise promotes its digital transformation, and the gradient of this influence increases beyond a certain value. As a benchmark variable, past performance also ranks high in relative importance among all variables, which demonstrates its universality. Enterprises should first pay attention to their main business, which secures the funds and operational capacity needed for digital transformation, so that they can carry out digital reform according to their business situation and the two can promote each other.

Robustness test

First, the method of dividing the training set is changed. In the main test of this study, the training set and test set are determined by a random 8:2 split, which reduces the influence of chance to some extent. To evaluate the performance and generalization ability of the model more accurately, K-fold cross-validation is used in place of the single training/test split. The basic principle of K-fold cross-validation is to divide the original data set into K subsets of similar size, where K − 1 subsets are used as the training data and the remaining subset is used as the validation data. This is repeated K times, with a different subset selected as the validation data each time, which yields K performance evaluations of the model. The average of these results is usually taken as the final performance evaluation index of the model. The advantage is that a limited dataset is fully utilized and the variance of the evaluation results is reduced. Through repeated validation and averaging, the performance of the model on different subsets of the data can be evaluated more accurately, the evaluation bias caused by one specific partition is reduced, and the evaluation results are more reliable. The steps of K-fold cross-validation in machine learning are as follows:

  1. Divide the original dataset into K subsets of similar size, with K = 10.

  2. For each subset i (i from 1 to K), take it as the validation set and combine the other K − 1 subsets as the training set.

  3. In each round, train the model on the training set and evaluate it on the validation set.

  4. Calculate the evaluation indicators of the model on the validation set, such as the goodness of fit and error measures used in Table 5.

  5. Repeat steps 2 to 4 with a different subset as the validation set until each subset has served once as the validation set.

  6. Average the K validation results to obtain the final performance evaluation index of the model.
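The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `k_fold_indices` and `cross_validate` are hypothetical, the folds are taken in order without shuffling (in practice the observations would usually be shuffled first), and `fit` and `score` stand for any training routine and any evaluation indicator.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]                 # step 2: validation fold
        train_idx = indices[:start] + indices[start + size:]  # remaining K - 1 folds
        yield train_idx, val_idx
        start += size


def cross_validate(fit, score, X, y, k=10):
    """Train on K - 1 folds, score on the held-out fold, and average the K scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])  # step 3
        scores.append(
            score(model, [X[i] for i in val_idx], [y[i] for i in val_idx])  # step 4
        )
    return sum(scores) / len(scores)  # step 6
```

Passing the same `fit` and `score` callables for each of the six methods would give averaged indicators that are directly comparable across methods.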

By repeating this process, K-fold cross-validation yields more stable evaluation results and reduces the chance effects caused by a particular division of the data. Meanwhile, for small data sets, K-fold cross-validation evaluates the performance of the model better and reduces the overfitting or underfitting caused by a lack of data. As shown in Table 8, after the training and test sets are replaced using K-fold cross-validation, the findings are unchanged relative to Table 5.

Table 8 Test of robustness - Panel A.

Second, the indicator measuring the intensity of digital transformation is changed. To rule out outliers or other factors that may introduce uncertainty, this study replaces the measure of the intensity of digital transformation in enterprises. Following Xiao et al.54, we use a different set of terms to measure the intensity of digital transformation, removing the application-level terms under “digital technology application” and keeping only the terms at the basic digital technology level: “artificial intelligence”, “blockchain technology”, “cloud computing” and “big data technology”. After adding 1 to the total frequency, we take the natural logarithm as the new response variable. The model is re-trained and evaluated using the new response variable. The specific test results are shown in Table 9. The results after the change are consistent with the main test, indicating that the model in this study is robust.

Table 9 Test of robustness - Panel B.
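The alternative response variable can be illustrated with a short sketch: the frequencies of the basic digital technology terms are summed, 1 is added, and the natural logarithm is taken. The dictionary keys and the function name `digital_intensity` are assumptions for illustration; the exact keyword lists behind each category are not reproduced here.

```python
import math

# Category labels assumed for illustration; application-level terms are excluded.
BASIC_TERMS = (
    "artificial intelligence",
    "blockchain technology",
    "cloud computing",
    "big data technology",
)


def digital_intensity(term_counts, terms=BASIC_TERMS):
    """Return ln(total frequency of the basic technology terms + 1)."""
    total = sum(term_counts.get(term, 0) for term in terms)
    return math.log(total + 1)


# Hypothetical counts for one firm-year; the application-level entry is ignored.
example = digital_intensity(
    {"big data technology": 3, "cloud computing": 2, "digital technology application": 10}
)
```

Adding 1 before taking the logarithm keeps the measure defined, and equal to zero, for firms whose reports contain none of the terms.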

Discussion

A review of the existing literature shows that scholars mainly focus on the correlation between a single-dimension factor and the intensity of enterprise digital transformation and only make in-sample predictions, without comprehensively considering the driving forces of enterprise digital transformation. In this study, the driving force of enterprise digital transformation is divided into three dimensions: technical driving force, organizational driving force and environmental driving force. By combining and comparing the driving forces of two or three dimensions, the differences in the predictive ability of the different dimensions are identified, along with the relatively key driving factors. Meanwhile, most existing studies rely only on traditional econometric tools, which find it difficult to deal with interactions between factors and are subject to certain endogeneity issues.

This study takes data on Chinese A-share listed companies from 2010 to 2020 as the sample, discusses the driving forces of digital transformation in enterprises, and innovatively uses ensemble learning methods for the analysis, which can improve the accuracy of model prediction and enhance its generalization ability. Using relative importance rankings and partial dependence diagrams, and comparing the fit obtained by adding factors of different dimensions to the benchmark model, it is found that technical factors predict the digital transformation behavior of enterprises more effectively and accurately. This means that the technical driving force dominates when enterprises pursue digital transformation. Compared with linear methods such as multiple linear regression, the ensemble learning method achieves higher explanatory power and lower prediction error, and XGBoost and LightGBM perform best on the samples used in this study. Among the many driving force characteristics, equity concentration and the knowledge level of executives in the organizational dimension, and innovation ability in the technical dimension, have the best prediction effect.

Based on the above conclusions, this study proposes the following policy suggestions:

  (1) Governments should provide enterprises with policy support, financial support, technical support and cooperation opportunities. Financial and tax incentives can be provided to encourage enterprises to invest in the construction of digital technology and information systems. Special funds can be set up to expand digital infrastructure such as network infrastructure, cloud computing centers and data centers. To enhance the operating performance of enterprises, governments can organize professional teams and partner institutions to train technical staff, and encourage higher education institutions, research institutions and others to participate in research and innovation on digital transformation.

  (2) The senior management team in enterprises should clarify the strategic goal and path of digital transformation. They should strengthen the reserve of high-level talent and reasonably adjust the proportion of investment in technology research and development. As shown in Fig. 3, when the R&D investment of an enterprise is around 40%, it plays a greater role in promoting digital transformation. Enterprises should maintain this proportion as far as possible and avoid blindly investing in R&D, so as to maximize the effect on transformation. At the same time, enterprises should assess the risks in the process of digital transformation, take appropriate risk control and response measures, and pay attention to industry policy direction and enterprise value. They can make use of a favorable economic situation to plan their transformation. Performance management is important in the process of transformation. Enterprises should actively adjust and innovate their organizational structure, business processes and working modes, first ensuring the stable growth of the main business, and then seize the opportunity to carry out digital technology research and development, implement the digital transformation strategy, and ensure sufficient funds and organizational stability throughout the transformation.

  (3) Scholars should continue to follow the trend of digital transformation. They should write professional reports and application cases that provide valuable information and guidance for enterprises and governments, actively apply research results to practical scenarios, help enterprises solve practical problems, advance the process of digital transformation, and promote the mutual flow of knowledge and technology.

The limitations of this study are as follows. First, because the data in this study are not randomly sampled but selected on the basis of data availability, they may differ significantly from the industry and size distribution of China’s A-share companies, which may affect the prediction performance of the fitted models. Second, the TOE framework cannot cover all relevant variables and driving factors. For example, the differences in digital transformation modes caused by the characteristics of different industries are not examined. A separate discussion of the degree of digital transformation in each industry will be one of our future research directions. Third, the machine learning methods used in this paper are all black-box algorithms. Despite the robustness tests, there is still a risk that the empirical results are biased by errors generated by the algorithms themselves. Therefore, other analysis methods could be combined with them to give a more comprehensive account of enterprise digital transformation.