Introduction

In the digital economy era, media has become ubiquitous in our lives. With the help of digital technologies such as mobile Internet, big data, cloud computing, the Internet of Things (IoT) and artificial intelligence, companies involved in the media industry are facing new development opportunities. The 14th Five-Year Plan (2021–2025) in the National Economic and Social Development and Vision 2035 of the People’s Republic of China lists digital transformation as a national development strategy. This plan clearly outlines the implementation of the digital cultural industry, in accelerating the development of new cultural enterprises, cultural forms, and cultural consumption patterns, and strengthening digital creativity, network audio-visuals, digital publishing, digital entertainment, and the online broadcast industry. COVID-19 has had a serious negative impact on China’s economic and social development. However, driven by digital technology, the traditional media industry pattern has undergone fundamental changes, and the digital transformation and development of media has become an inevitable trend1. In 2022, China’s media industry output fell 2.11% at a value of 290,825 billion yuan, presenting an overall decline but a local rise. Internet advertising, Internet marketing services, mobile data and Internet business, online games and other traditional high-output areas show different degrees of negative growth. The total revenue of radio and television advertising, book sales, and newspaper industry was less than that of the network audio-visual field2.

Although media companies are part of the media industry, their driving forces and the pressures of digital transformation differ due to their different businesses. China’s media industry has certain ideological attributes, especially political attachments represented in publishing, radio, and television mainstream media3,4. Based on guidance promoting the amalgamation of traditional and emerging media in 2014, the Chinese government provides huge external policy support for media companies whose main businesses are in publishing, radio, and television. Advertising and film industries have also been affected by the digital transformation from business processes to organizations and business models. Hence, it is necessary to understand how to overcome the internal differences in the industry and make a comprehensive evaluation and prediction of the digital transformation of Chinese media companies.

Existing research has largely focused on the correlation between single-dimension characteristics and media digital transformation, only making predictions within a sample. There is little comprehensive consideration of the driving forces in the digital transformation of media companies5,6,7. This study is based on the theory of TOE (Technology-Organization-Environment), taking the media industry category as an important dimension in predicting the digital transformation of media companies, and building an out-of-sample prediction model for digital transformation (i.e. a “TOE-I” model) of media companies from the dimensions of technology, organization, environment, and media industry classifications. We aim to analyze the differences in the prediction ability of digital transformation behavior of different types of elements in varying dimensions and to identify the main factors and patterns of influence that drive media companies to participate in digital transformation. Our highlight can be divided into two aspects: (1) the advanced empirical research methods based on machine learning, and (2) the innovative theoretical framework which is applicable to the prediction of Chinese media companies’ digital transformation.

This research innovatively adopted the ensemble learning method in machine learning to explore the main cause behind the digital transformation of Chinese media companies. The advantages of this method are as follows: Firstly, the existing literature regarding media companies’ digital motivation has tended to use multiple linear regression, which cannot accurately reflect the complex relationship between variables due to the nonlinear characteristics of normal relationships. In contrast, ensemble learning can effectively handle the nonlinear relationship and possible interactions between variables; thus the resulting model can better reflect the information. Furthermore, the ensemble learning method does not need to preset the functional form of variables in the model but fits the relationship between variables as much as possible according to the training set data. This is more suitable for predictive analysis than traditional linear research methods8. Secondly, ensemble learning can make use of relative importance and partial dependence plots and other means to analyze different variables on the prediction ability of digital behavior, and can describe the specific variables affecting the data transformation of media companies to facilitate comparative analysis between variables. Thirdly, although machine learning methods are applied more in the field of natural sciences than in social science9, the advanced learning power and self-correction ability of machine learning are suitable for quantitative analysis of causal relationships between economic variables. They can also produce more accurate estimates of control variables, such as fixed effects outside the sample. Therefore, compared to traditional multiple linear regression, machine learning has the advantages of flexibility, accuracy, and foresight.

The media has undergone significant changes in a short period, due to technological changes. From publishing, broadcasting, and television, to the digital platform-based Internet and marketing, there is a significant gap between the business of companies in the media industry and the production of content and products. As mentioned earlier, the media industry represented by radio, television, and newspapers has strong political attributes that differ from the advertising, marketing, and gaming industries that actively integrate into international capital operations and look at the global market. Therefore, following existing literature practices10,11,12, this article innovatively divides the driving forces of the digital transformation of Chinese media companies into the following four dimensions of driving forces (adding “I” to “TOE”): (1) Technical, (2) Organizational, (3) Environmental, and (4) Industrial. This article explores the factors that drive or hinder the digital transformation of media companies from these perspectives. Based on the research results, corresponding policy implications are proposed.

This study is based on the “TOE-I” framework. Part 1 is the introduction of our research and in Part 2 we conduct the literature review mainly focused on the application of machine learning in media economics and the basic “TOE” framework. Part 3 is the methodology including the research design, data sources and variable definitions and empirical results and analysis. Part 4 is the conclusion of our main findings and we provide suggestions for policy makers and companies.

Literature review

The influencing factors of enterprise digital transformation

Current relevant research into enterprises’ digital transformation largely focuses on two perspectives: the driving and hindering factors of digital transformation, and the impact of digital transformation on all aspects of the enterprise. Regarding the drivers of digital transformation, Verhoef et al.13 focus on the response of enterprises to the changes in digital technology, increased digital competition, and the consequential digitalized customer behavior. While emerging digital technologies reduce labor costs, competition among companies intensifies and consumer preferences change accordingly. Yan et al.14 found that mixed ownership reform is the catalyst for digital transformation and is also the key driving force for the sustainable development of China’s state-owned enterprises. In addition, digitalization has become an important factor affecting the decision-making processes of entrepreneurs, and digital strategy is an important part of the corporate strategy of enterprises15. Regarding the factors hindering the digital transformation of enterprises, Roman and Rusu16 found that a lack of technology and capital negatively influences the digital transformation of enterprises, and that digital infrastructure has become an external factor affecting the digital transformation of enterprises. Looking at the impact of digital transformation, there is a focus on the relationship between digital transformation and the performance of enterprises17. Driven by the pursuit of profits, digital transformation has promoted the financialization of enterprises, especially among companies with poor internal and external governance18. The digital transformation of enterprises significantly promotes mergers and acquisitions by reducing internal organizational costs and is more significant among private enterprises19.

There are also many studies on digital transformation in different industries and different types of enterprises. For instance, Lange et al.20 collected data from semi-structured interviews and concluded capital to be a driver for digital start-ups in massive and rapid business scaling (MRBS). Roman and Rusu16 established an econometric-based model and highlighted the relationship between the performance of SMEs and digital transformation indicators. Ardolino et al.21 focused on how digital capabilities (IoT, cloud computing, and predictive analytics) support the service transformation of industrial enterprises.

Application of machine learning in the field of media economics

Media economics is a discipline at the intersection of economics, management, and communication. It has shifted from the traditional media industry represented by the printing, television, and film industries to the new media research period with the Internet, digital platforms, and mobile communication media as the main focus22. The study of media economics under the corporate paradigm, and the evaluation of the operational results of media organizations has always been an issue. To evaluate business performance, Huang23 put forward an evaluation system to measure the financialization level of a media company, including the index of the ownership structure, shareholder value, financial asset holding ratio, and financial investment rate. Sheng et al.24 established an evaluation system on the performance of media organizations’ mergers and acquisitions to evaluate the value-creation ability after mergers and acquisitions. Xie and Li5 looked at the evaluation of the competitiveness of listed media companies during the big data era.

At the core of artificial intelligence technology, machine learning technology has been widely used in the field of journalism and communication; for example, in the mode reformation of content production25,26, the prediction and discovery of social media trends, and the emotional analysis of users27,28,29. These methods are occasionally used in predicting and analyzing the operation and development of media companies. Pan and Wang6 used machine learning methods to conduct text analysis of the annual report information of media companies to identify the relationship between digital transformation and the value of cultural enterprises. Shi and Wang30 focused on the advertising industry, combining artificial neural network (ANN) algorithms to achieve intelligent evaluation and predictive analysis of advertising publishing and click results, and to optimize the resource utilization efficiency of the advertising industry. Sun et al.31 used text mining and natural language processing (NLP) technology to conduct an emotional analysis on negative reports on the operation and financial status of media companies and established a warning mechanism for adverse impact on financial status.

In summary, this study has found that in the field of media economics, research is focused on macro perspective industry characteristics such as the operation and management of integrated media and the development and operation models of new media formats. From the perspective of micro market entities, however, there is still a lack of theoretical and empirical research on the transmission path to the industry change brought by digital transformation within media companies. Analysis and research on the driving factors of the implementation of digital transformation strategies in media companies through the predictive ability of machine learning is rare. This research aims to find indicators to measure the degree of digital transformation in the media industry and applies the machine learning method to identify the key elements driving the digital transformation of the media industry in China.

The “TOE-I” prediction model

The theoretical framework of TOE (Technology, Organization, Environment) was first proposed by Tornatzky and Fleischer to comprehensively study and analyze the influencing factors that may cause interference when enterprises adopt innovative technologies32. At the technology level, the TOE framework considers the influence of the internal technical level and technical support-related factors within an organization; that is, whether the enterprise can apply existing technologies, which is the basis for enterprises to adopt innovative technology33,34. At the organizational level, to achieve the future application of innovative technologies within the organization, the focus is on the composition of specialties and responsibilities of organizational personnel at different levels35,36. Environmental factors represent the macro external characteristics of the specific environment where the organization operates, such as government policies37, competitive pressure, and the business environment38. The TOE framework systematically considers technology and organizational factors both inside and outside organizations so it has strong systematization and operability.

However, industrial segmentation in different industries faces various digital transformation challenges in the digital era. Some scholars have proposed that to determine the specific factors in these three backgrounds and establish the potential relationships between these factors, the TOE framework serves as a basic framework to integrate other relative elements33. Influenced by technological changes, media has changed dramatically in a short period of time. From publishing to radio and television then to the Internet and marketing on a digital platform, there is a big gap between the business and the produced content of the media industry. Therefore, based on the characteristics of Chinese media companies, this study takes “I”-Industry as an important dimension to predict their digital transformation and integrates it into the TOE framework.

Based on existing literature practices10,11,12, this study divides the driving force of the digital transformation of Chinese media companies into the following four dimensions: (1) Technical driving force. Digital technology is the basis of the digitalization of the media industry and should be applied in all fields, particularly the production and operation process of the media industry10,12. The technical upgrading of enterprises is directly reflected in the investment in technical research and development and in the scale of technical personnel39,40. (2) Organizational driving force. The heterogeneity of corporate internal governance subjects, such as the characteristics of senior executives, enterprise organization, and governance structure will lead to different behavior in digital transformation among media companies41. Based on previous research6,7, the characteristics of the organizational driving force include the size, knowledge level, and social resources of the senior management team, as well as the revenue ability, debt repayment ability and continuous growth ability. (3) Environmental driving force. In the tide of media globalization, Chinese media companies represented by advertising and games expand their overseas markets, while media companies such as radio, television, and publishing (combining social and corporate attributes) are influenced by policy. Therefore, this study includes the opening rate of monetary policy, financial support, and competition pressure in the industry, as well as the level of protection of intellectual property by local government as environmental driving factors. (4) Industrial driving force. Publishing, radio and television, advertising and film, games and digital media face different industry bases and characteristics in the digital transformation. Mainstream media, represented by publishing and radio and television, actively develop new forms and content based on new media platforms. They also undertake political tasks in guiding public opinion and “narrating Chinese stories well”3,4. Big data and intelligent algorithms have continued to erode the boundaries of the traditional advertising industry, causing collective concerns in the advertising industry42. Within the industrial driving force dimension, China’s listed media companies are subdivided into six industries of games, advertising marketing, film and television cinema, digital media, publishing, and television broadcasting in predicting the driving force of the industry.

This article therefore aims to use the TOE model and takes “Industry” as one of the influencing factors based on the particularity of the Chinese media industry to explain why Chinese media companies conduct digital transformation, thereby filling the theoretical gap in the interaction mechanism between companies and industry characteristics in the Chinese media industry. To obtain a more accurate model, the study chose machine learning methods. Based on the practices of other research, we innovatively used ensemble learning models other than text analysis, such as Random Forest Regression (RFR) and Gradient Boosting Regression (GBR) models to expand the application of machine learning methods in the media field.

Method

Research design

Research methods

This study uses an integrated machine learning method to construct and integrate multiple base learners to achieve more accurate prediction effects than using a single one. According to the degree of independence among the base learners, the method of Nie et al.43 and Parzinger et al.44 selected the Gradient Boosting Regression (GBR) and Random Forest Regression (RFR) in serial and parallelization methods, then compared them with multiple linear regression and LASSO in a linear research method. The integrated machine learning method effectively illustrates the non-linear relationships and interactions between the variables in the linear relationship, so that it performs well in out-of-sample prediction tasks 8. Therefore, this study predicts that integrated machine learning methods will outperform linear research methods in predicting the degree of digital transformation of media companies.

Model design

The model performance is examined from two perspectives: the model interpretation ability and the prediction error. In terms of model interpretation ability, Chen et al.45 and Ghazwani and Begum46 illustrate that ensemble learning can adjust itself based on the deviation between the model fitting value and the observation value in the previous calculation and can self-check the accuracy of the model. Therefore, we believe that the difference between the estimated values of the model and the observations can be used as a standard to evaluate the interpretation ability of the prediction model. The following two indicators are used: (1) Intra-sample goodness of fit (\({\text{R}}_{\text{Is}}^{2}\)) to evaluate the fitting effect of each research method on the training set sample. With the higher within-sample goodness of fit, the model is also more interpretable to the training set samples. (2) Out-of-sample goodness of fit (\({\text{R}}_{\text{oos}}^{2}\)) to measure the universality of the model. In addition, this study measures the generalization ability of the model from the perspective of variance, and chooses the explanatory variance to measure the dispersion of the actual value \(\text{(}{\text{EVS}}_{\text{oos}}\)).

In terms of model prediction error, we followed the practice of Chen et al.47 in selecting the out-of-sample mean variance (\({\text{MSE}}_{\text{oos}}\)) to investigate the deviation degree between the predicted and actual value. The out-of-sample mean square error is positively correlated with the accuracy of the model prediction. To avoid the large deviation value in the test set, which leads to estimated mean square error inconsistency, the average absolute error (\({\text{MAE}}_{\text{oos}}\)) and absolute median \(({\text{MedAE}}_{\text{oos}})\) differences were used to evaluate the accuracy of the model prediction. The specific methods of each evaluation index are shown in Table 1.

Table 1 Model evaluation indicators and calculation methods.

In interpreting the model results, the integrated machine learning method includes multiple learners so it cannot be directly explained as much as a single learner48. To solve this problem, we used a relative importance and partial dependency graph to interpret the practical significance in the ensemble machine learning model. Relative importance refers to the degree of importance of one variable relative to the others in the process of model fitting. According to the method of Supsermpol et al.49, the relative importance of the variable can be assessed by measuring the decrease of the variable after its introduction. If the relative importance of a variable is high, it has a stronger influence in predicting the digital transformation of media companies. The partial dependency graph illustrates measurement of the influence of the changing degree of a certain variable on the digital transformation of a media company, assuming that other features are unchanged. Moreover, it is displayed in the form of images, which have more visual features. The single variable is more accurate in predicting the degree of digital transformation of media companies50.

Data sources and variable definitions

Data source

This study selected media companies listed on A-shares in 2010–2020 as the initial sample, with company data from Wind and CSMAR databases. To exclude the interference of any special observation samples to the prediction results, the data were processed as follows: (1) exclusion of enterprises with ST, PT, and other abnormal listing status to avoid interference with the overall prediction effect due to abnormal operation of the enterprise itself; (2) elimination of samples with missing data; and (3) continuous variables in the data were treated by 1% and 99% quantile to avoid extreme outlier interference. A final set of 395 observations were obtained. The classification of the media industry adopts the 2021 SHENYIN&WANGUO classification method in the CSMAR database.

Variable definition

The digital transformation index (Digitaltransindex) in the CSMAR database was selected as the response variable. According to the CSMAR variable, the response variable using the annual report of enterprise digital transformation-related word frequency statistics can effectively reflect the enterprise digital transformation and transformation degree. It is divided into five parts: artificial intelligence (AI), blockchain (BD), cloud computing (CC), big data (BD) and the application of digital technology level (ADT). Table 2 shows the detailed calculations.

Table 2 Response variable definition.

Based on the existing research of digital transformation drivers, this study selected the driving force characteristics of the model from the following four dimensions, as shown in Fig. 1:

Figure 1
figure 1

Four dimensions in TOE-I model.

This study draws on Yang and Xu39 and Li9 to select the intensity of R&D expenses and the proportion of technical personnel as the measurement indicators of innovation ability and absorption ability, as shown in Table 3.

Table 3 Technical variable definitions.

In terms of organizational dimensions, there are two main factors that affect the strategic decisions in a company’s digital transformation. The first is the leadership style of the company’s CEO and management. The attitude of the executive team towards risk, as well as the decision-making style and decision-making power of the management are closely related to the implementation level of the digital transformation strategy. The second factor lies in the internal operations and cash flow of the company. The implementation of digital transformation strategy requires a large amount of capital participation so the use and fundraising of internal and external funds of enterprises should be listed as influencing factors. Referring to Bernile et al.51, Schoar and Zuo52, and Bandiera et al.53, Manager Number, Education Level, Social Network, ROA, Growth, TobinQ, Lev, Top Ten Holders’ Rate, Duality, and IndDirector Ratio were selected as these variables51,52,53. Detailed calculations are shown in Table 4.

Table 4 Organizational variable definitions.

Also, with reference to Xu et al.54, Sun and Zheng55, Wu and Ma56, this study took Financial Support, Infrastructure Score, Monetary Policy, IP Protection, and HhiD as variables to measure the environmental characteristics of media companies. The above indicators reflect the overall business environment of the media industry and the support from governments in different regions for innovative development in the media industry, as shown in Table 5.

Table 5 Environmental variable definition.

The fourth dimension is the industry classification. According to the revised version of the SHENYIN&WANGUO classification 2021, the media industry is subdivided into six categories: games, advertising and marketing, film and television cinema, digital media, publishing, and TV broadcasting, with a total of 141 listed companies, as shown in Table 6.

Table 6 Industry variable definitions.

Similarly, this study draws on Li et al.57,58, Zhao et al.59, Hanelt et al.60 in taking Past Revenue, Cash Flow Ratio, Firm Age, Firm Size, and SOE as benchmark variable groups, as shown in Table 7.

Table 7 Benchmark variable definitions.

Empirical results and analysis

Descriptive statistics

As shown in Table 8, the mean value of the digital transformation index of media companies is 493.9037975, and the standard deviation is 297.7080399, indicating that the degree of digital transformation differs significantly among industries, and the characteristics of other variables have no outliers.

Table 8 Descriptive statistics.

The prediction effect of the model constructed based on the machine learning method on the digital transformation index of media companies

Table 9 shows the prediction results of the models constructed by different integrated machine learning methods on the degree of digital transformation of media companies. The results in column (1) show that the within-sample goodness of fit of the multiple linear regression and LASSO model is lower than that of the GBR and RFR, indicating that the within-sample fitting effect of the integrated learning method is superior. In addition, the results of columns (2) and (3) in Table 9 show that the out-of-sample goodness of fit and interpretable variance of GBR have the highest values, 0.59123809 and 0.55601084, respectively, followed by RFR. Four indicators of both methods are higher than 0.5, indicating that machine learning methods can better predict the degree of digital transformation of media companies. It is clear that in column (4) the out-of-sample mean square error of the GBR and the RFR is smaller than the multiple linear regression and LASSO. Finally, columns (5) and (6) indicate that the GBR and RFR have low mean absolute errors, 0.57720771 and 0.58604578, respectively. This indicates that the model improvement effect is not obvious after excluding the deviation value.

Table 9 Results of model fitting.

In conclusion, the GBR and RFR in the ensemble machine learning method fit better to the data, thus constructing a more accurate prediction research model. This study further discusses the driving forces of the digital transformation of media companies and the key factors.

Differences in the driving force dimensions of media companies' digital transformation prediction ability

To explore the different driving force dimensions of media company digital transformation prediction ability, this study first constructed the listed years (Firm Age) and company size (Size) using control characteristics such as benchmark model calculation and comparison to add different driving force combinations of prediction performance. As the research conclusions obtained based on different evaluation indicators are largely the same, this study analyzes the out-of-sample goodness of fit and the research results are as shown in Table 10.

Table 10 Prediction performance under different combinations of driving forces.

Firstly, we considered the difference in the ability of single-dimensional drivers to predict the digital transformation of media companies. In comparison with other driving forces, the addition of environmental driving force features to the benchmark model achieves the best prediction effect. Taking RFR as an example, after adding technology, organization, environment, and industry drivers to the benchmark model, the predicted value increased by 92.86%, 100.53%, 145.17%, and 128.04%, respectively. Secondly, we considered differences in the ability of different combinations of drivers to predict the digital transformation of media companies. A combination including environmental driving force dimensions has the best performance: when the two types of driving force characteristics are combined, the benchmark model adds the environmental driving force and the industry driving force can obtain a higher model interpretation ability. When the technology driving force, environmental driving force and industry driving force are added to the benchmark model, the highest model interpretation ability is achieved. The results show that the environment driving force is more accurate in predicting the digital transformation degree, indicating that stable monetary policy, comprehensive infrastructure construction, government financial support for digital transformation, and good industry concentration are key elements in driving the digital transformation of Chinese media companies.

Differential analysis of the prediction ability of digital transformation by key factors under different driving forces

Based on the GBR and RFR, the relative importance of variables in the machine learning model is clear. Figures 2 and 3 report the relative importance ranking of the variables. Table 11 shows the variables ranked in the top 15 of the GBR and RFR methods, indicating that these characteristics are the key elements affecting the digital transformation of Chinese media companies.

Figure 2
figure 2

Relative importance ranking based on GBR.

Figure 3
figure 3

Relative importance ranking based on RFR.

Table 11 Order of relative importance (top 15).

Prediction model of digital transformation of media companies by important driving factors

Following the prediction method of GBR and RFR (The order in Figures 47 is as follows), among the many factors that affect the digital transformation of media companies, this study found that monetary policy, industry competition pressure, the proportion of technical size, return on assets of enterprises, age of the listed companies and industry classification have the best effect on predicting the digital transformation of media companies.

Monetary policy

Figure 4 is a partial dependency graph of monetary policy. The agent variable of monetary policy is the growth rate of M2. As shown in the figure, when the growth rate of M2 is less than 12.5%, there is no obvious impact on the degree of digital transformation of enterprises. However, when the growth rate is higher than 12.5%, the degree of digital transformation in enterprises shows a downward trend. Therefore, this study holds that the impact of monetary policy on the digital transformation of enterprises is not monotonous, and that enterprise managers should pay attention to the external environment at all times and adjust the process of the digital transformation of media companies timely.

Figure 4
figure 4

Partial dependence on monetary policy.

Industry competition pressure

Figure 5 shows the HhiD of the industry as a tool to measure the level of competition among companies. Industry competition reflects the intensity in the competition for limited resources among companies. When the index is less than 0.05%, its impact on the digital transformation of media companies is significant and monotonous. When the index is 0.05% − 0.3%, the degree of digital transformation of media companies slowly decreases under its action, and when it reaches 0.3%, the impact effect is small. This shows that when the HhiD is high, media companies have low entry barriers and stable profit flow, thus enterprises have little or no demand to achieve differentiation in homogeneous competition.

Figure 5
figure 5

Partial dependence chart of industry competitiveness.

The proportion of technical size

Figure 6 shows a partial dependence diagram of the proportion of technical size. This study selected the proportion of technical size as the agent variable. When the proportion of technical size is less than 10%, its influence on the digital transformation index of media companies shows a rapid increasing trend. When the proportion approaches 20%, the impact on the digital transformation index is highest. Above this, the impact effect is significantly reduced. Therefore, media companies should determine the proportion of technical size according to the requirements of digital transformation.

Figure 6
figure 6

Partial dependence of technical size.

Return on assets of enterprises

Figure 7 shows a partial dependency graph of the return on assets of an enterprise. As shown, when the ROA of a media company is negative, this indicates problems in the operation of the enterprise. Managers will put more time and energy into business activities, rather than focusing on digital transformation, so the impact is small. When the return on equity of media companies is positive, this shows a temporary trend of rapid increase, followed by stability, indicating that its impact on the intensity of digital transformation is fluctuating.

Figure 7
figure 7

Partial dependency of the ROA.

Age of the listed companies

Figure 8 shows the age of the listed company, used to describe the companies’ characteristics in the benchmark variables. The results indicated that when the establishment period of a media company is less than 15 years, the impact on the digital transformation of companies is not significant. When the establishment period of a media company reaches 15 years, the effect on the digital transformation is minimal, and then shows a significant increasing trend until the establishment period reaches 25 years. Media company management experience affects the strength of the digital transformation. When the company age is small, and lacks operating experience, the digital transformation is relatively low. In contrast, established companies with greater ability and resources can effectively support the digital transformation reform process, making full use of information advantage and achieving scale effects.

Figure 8
figure 8

Partial dependence on the age of listed companies.

Industry classification

This research innovatively integrates the dimension of the industry driving force using the theoretical framework of TOE and forms the “TOE-I” model to predict the intensity of digital transformation of Chinese media companies. Firstly, through the two integrated machine learning methods of GBR and RFR, the model is shown to have a significant prediction effect on the advertising industry and the film and television industry, indicating that “TOE-I” can be better applied to the digital transformation prediction of the advertising industry and the film and television industry. Secondly, the prediction ability of radio and television, digital media, and the publishing industry is weak. As traditional mainstream media, radio, television and publishing media extend China’s mainstream media to the Internet field, the stronger the political attribute, the stronger the uncertainty of digital transformation, and the more difficult it is to accurately predict using the “TOE-I” model. Thirdly, game companies have usually been established more recently and are based on digital technology, so taking the digital transformation index as the measurement standard, the effect of digital transformation prediction is not significant enough. See Table 12 for details.

Table 12 Comparison of industry factors on digital transformation of media companies.
Robustness test

First, changing the training set division method. This study used the 8:2 proportion random classification to determine the training set and the test set, which weakens the randomness to some extent. Therefore, the K-fold method was used for further random division in the robustness test. In machine learning, K-fold cross-validation is a common method of model evaluation. It can help us accurately evaluate the performance of machine learning models and provides more reliable results particularly if the data is limited. The validation steps of the K-fold method were as follows:

  • The original dataset was split randomly into K subsets of similar size, taking K values of 10.

  • One subset was selected as the validation set and the remaining K-1 subset as the training set.

  • The model was trained using the training set and evaluated on the validation set.

  • Steps 2 and 3 were repeated until each subset is used as a validation set.

  • The results of K times of evaluation were integrated to obtain the final model evaluation index.

Based on this, K-fold cross validation can be repeated through the process of more stable evaluation results to reduce the contingency caused by different data divisions. For small data sets, K-fold cross validation can better evaluate the performance of the model, reducing the data caused by overfitting or underfitting problems.

As shown in Table 13, after replacing the training and test sets using the K-fold test, the correlation findings are compared to Table 9 with no change.

Table 13 Test of robustness-Panel A.

Second, changing the measure of the intensity of digital transformation. Drawing on Xiao61, this study used different entries of enterprise digital transformation strength, eliminated the term “digital technology application” at the application level, and only retained the terms “artificial intelligence”, “blockchain”, “cloud computing”, and “big data” at the basic digital technology level. Add 1 to the total occurrence frequency and take the natural logarithm as the replacement variable for robustness testing. Using new response variables and refitting the model, the results were consistent with the main test, and the specific tests are shown in Table 14.

Table 14 Test of robustness-Panel B.

Conclusion

Previous research has focused on the correlation between a single factor of a single dimension feature and the digital transformation of media industry, only conducting predictions within the sample, and lacking a comprehensive consideration of the driving forces in the digital transformation of media companies5,6,7. In this study, the driving forces of the digital transformation of Chinese media companies are divided into four dimensions: technology, organization, environmental, and industry drivers (i.e., the “TOE-I” model). The purpose of classifying the driving forces of digital transformation is to explore the differences in the prediction ability of media companies concerning digital transformation. This research analyzed the key driving factors of the digital transformation of media companies and the specific mode influenced by the above factors of the digital transformation of the Chinese media industry.

This study innovatively used ensemble learning methods, taking relative importance indicators and partial dependency graphs to help realize the research purpose. By comparing the fitting effect of the combination of different dimensions of driving forces in the benchmark model, we found that the environmental driving force can predict the digital transformation behavior of the Chinese media industry effectively and accurately, showing environmental drivers to be the dominant factor in influencing Chinese media companies’ strategies for digital transformation. Compared to linear methods such as multiple linear regression, the ensemble learning method achieved better performance in both model interpretation ability and minimization of prediction error, with the RFR method having the best predictive performance. The driving factors of (1) monetary policy, competition pressure in the industry, and the infrastructure index in the environmental driving force, (2) the equity concentration, enterprise value, executive team knowledge level and social networks in the organizational driving force, and (3) advertising, film and television industries in the industry driving forces all have significant predictive effects on the digital transformation of media industry in China.

Based on the above conclusions, the following policy implications are suggested:

  1. (1)

    Policy makers are supposed to provide stable monetary policies. Media companies like game companies and Internet advertising and marketing companies that think globally should be provided with stable economic and financial policies to facilitate their digital transformation. As shown in Fig. 4, the intensity of digital transformation is highest when the M2 growth rate is 12.5%. Therefore, government managers should provide a stable monetary policy to promote the digital transformation of media companies. Moreover, as demonstrated by the prediction ability of the digital transformation, the dimension of the external environmental driving force has the greatest impact on the digital transformation of media companies. Therefore, in addition to providing a stable monetary policy, competition between industries should be guided to provide the matching infrastructure conditions for digital transformation.

  2. (2)

    Managers should maintain stable profit sources to promote digital transformation in media companies; for example for companies in radio, television, and newspapers with strong political attributes and little focus on income output and income. This study indicates that ensuring a positive cash flow of enterprises is an important driving factor for the digital transformation. Attention should also be paid to the decision-making role of core managers in the process of digital transformation. This research adopted two machine learning prediction methods—GBR and RFR that showed enterprise core managers to be an important influence in the prediction factors of China media companies (refer to ranking 2 of GBR and ranking 3 of RFR in Table 11). Thus, to ensure digital transformation, media companies should focus on the decision-making role of core managers. Furthermore, media companies should pay attention to the cultivation and preservation of digital technical talent. In the gradual conversion from traditional content and news production to digital, platform-based production, the cultivation and preservation of technical talent are crucial. Figure 6 illustrates that when the proportion of technical personnel in the company is 20%, digital transformation is the highest. Therefore, media companies should recruit digital technical personnel to maintain this level.

  3. (3)

    Within the media industry, it is necessary to seize the opportunity of technological change and pay close attention to policy changes. As shown in this study, the establishment of enterprises, the change of media, and the internal gap in the media industry all cause differences in the digital transformation of the media industry.

  4. (4)

    There is a gap in the application of empirical research and machine learning methods in existing media economics research, so the application of machine learning in the fields of media economics, journalism, and communication should be continuously promoted.