
# Market-oriented job skill valuation with cooperative composition neural network

## Abstract

The value assessment of job skills is important for companies to select and retain the right talent. However, few quantitative methods are available for this assessment. Therefore, we propose a data-driven solution to assess skill value from a market-oriented perspective. Specifically, we formulate the task of job skill value assessment as a Salary-Skill Value Composition Problem, where each job position is regarded as the composition of a set of required skills attached with the contextual information of jobs, and the job salary is assumed to be jointly influenced by the context-aware value of these skills. Then, we propose an enhanced neural network with cooperative structure, namely Salary-Skill Composition Network (SSCN), to separate the job skills and measure their value based on massive job postings. Experiments show that SSCN not only assigns meaningful value to job skills but also outperforms benchmark models for job salary prediction.

## Introduction

In the era of the knowledge economy, skilled talents are precious assets. Modern jobs require talents to make substantial and continuous investments in their job skills1,2,3. Therefore, understanding the value of job skills can help bridge the so-called "Skill Gap"4,5 between employers and talents, and give both a competitive edge in coping with the accelerating pace of technological change. At the micro level, it can not only help individuals proactively assess their competencies and decide which skills to learn, but also help companies develop appropriate salary systems for their job positions to attract and retain the best possible talent. Moreover, at the macro level, job skill value is an important indicator of the economic equilibrium of the labour market and reflects the supply and demand relationship associated with knowledge investments6.

During the past decades, researchers have devoted great effort to assessing the value of job skills in different ways. Many surveys and studies have shown evidence of a worldwide positive association between the distributions of job skill mastery and job salary2,3,7,8. However, due to the dynamic and indistinct nature of job skill value, traditional market survey-based approaches usually fail to provide a fine-grained and up-to-date analysis. In recent years, online recruitment services have accumulated abundant job advertisement data9,10, which provides an unparalleled opportunity for Labour Market Intelligence11,12 and data-driven job skill analysis13,14. Nevertheless, most existing studies focus on modeling job skill demand4,5,15,16, and a quantitative way to assess the value of job skills from the perspective of their influence on job salary is still lacking.

Indeed, quantitative job skill value assessment is far from a trivial task. On the one hand, the value of a specific skill is not fixed but varies across job contexts. For example, talents experienced in algorithm-related skills may be offered high-paid jobs by a high-tech AI company, while engineering skills may be the most valuable ones in a traditional software company. On the other hand, job skills are usually not isolated but are integrated with each other as a holistic requirement that determines the job salary. Along this line, the most critical challenge is that ground truth data on skill value are usually unavailable for building an effective quantitative assessment model. Therefore, how to separately assess the value of job skills and model their impact on job salary under various job contexts remains an open question.

To this end, in this paper, we propose a data-driven solution for skill value assessment from a market-oriented perspective by mining job advertisement data. Specifically, we introduce a market-oriented definition of skill value, and formulate the task of skill value assessment as the Salary-Skill Value Composition Problem, where each job position is regarded as the composition of a set of required skills attached with the job's contextual information, and the job salary is assumed to be influenced by the context-aware value of these skills. Along this line, we propose an enhanced neural network with cooperative structure, namely Salary-Skill Composition Network (SSCN), to separate the job skills and measure their value from massive job postings. SSCN regards salary prediction as a cooperative task for skill valuation and holistically models the relationship between skills and the job salary, considering both skill value and domination. Figure 1 shows the schematic diagram of the key idea in this study. Indeed, SSCN provides a cooperative framework for training neural network models to discover knowledge from unlabeled data by quantitatively linking that data with a supervised learning task. Extensive experiments on a real-world dataset validate that SSCN not only assigns meaningful value to job skills in various job contexts but also outperforms state-of-the-art models in job salary prediction. Meanwhile, the results of SSCN reveal many interesting findings, such as which skills lead to high-paid jobs.

Job skill value has long been studied, yet the market value of skills is abstract and is measured differently in different application scenarios4,17. Different from existing studies, in this paper, we aim to introduce a market-oriented definition of skill value with job context awareness, emphasizing the direct impact of skills on job salary. To be specific, the value of a skill is defined as the expected salary of a job that requires only this skill, given a specific job context. Note that in this paper, context refers to all the factors other than the skill requirement that can influence the job salary, such as the company, recruitment time, work location, and required working experience.

Indeed, the above definition directly measures how much salary a skill brings when people make full use of it in a job. The motivation behind this definition is to ensure that the value of different skills can be measured in an independent and comparable manner. To estimate this value precisely under various job contexts, we train a model f with parameters Θ that calculates the skill value $$v=f(s,{\rm{lv}},{\bf{C}}| {{\Theta }})$$ given a set of observable job contexts C and a skill s with level lv (i.e., the degree of mastery; see Fig. 2a for examples). To train such a model directly, we would need training data consisting of job postings that each require only one skill. However, in real-world scenarios, job requirements are complicated and can rarely be satisfied by a single skill. As a result, each job posting is associated with multiple required skills, which makes it difficult to train the skill valuation model under the supervised learning paradigm.

Fortunately, the job salary can be regarded as a mixture of the values of the corresponding required skills: a job requiring many valuable skills should have a high salary. This intuition provides effective, albeit indirect, supervision for skill value assessment. In other words, if we can model the relationship between skill value and job salary, we can use job salary data to supervise the training of the skill valuation model. Specifically, the job postings can be formulated as $${\mathcal{J}}=\{({{\bf{C}}}_{{\bf{j}}},{{\bf{S}}}_{{\bf{j}}},{{\bf{Y}}}_{{\bf{j}}})| j=1,2,\cdots \ \}$$, where Cj denotes a set of job contexts, Sj denotes the required skill set, and Yj denotes the job salary. In particular, Sj consists of the corresponding skill-level pairs $${{\bf{S}}}_{{\bf{j}}}=\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)})| i=1,2,\cdots \ \},$$ where $${s}_{j}^{(i)}$$ is a skill and $${{\rm{lv}}}_{j}^{(i)}$$ is its level. If we have a model that can precisely estimate the salary Yj of a job posting given the value of its required skills, a proper estimation of skill value leads to a good estimation of the job salary. Therefore, in this paper, we regard job salary prediction as a cooperative task for skill valuation. Formally, we define the task of this paper as a Salary-Skill Value Composition Problem, which aims to jointly learn a context-aware skill value assessment model f: (skill, context → value) and a skill-based salary prediction model g: (<skill, value> → salary) from the job postings set $${\mathcal{J}}$$. Note that, although there may exist more complicated relationships among job skills, context and salary, in the problem formulation we only assume that skill value is context-aware and that skill values are combined linearly to reflect the job salary. In this way, our model facilitates measuring both the influence of contexts on individual skills and the influence of skills on job salary.
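To make the notation concrete, one element (Cj, Sj, Yj) of the posting set can be mirrored by a small data structure. This is a minimal sketch assuming a hypothetical `JobPosting` container; the field names, context keys and salary figures are illustrative and not taken from the dataset, and the level labels reuse those mentioned in the Results (Know, Versatile).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class JobPosting:
    """Hypothetical container for one element (C_j, S_j, Y_j) of the posting set J."""
    context: Dict[str, str]        # C_j: company, time, location, experience, ...
    skills: List[Tuple[str, str]]  # S_j: (skill, level) pairs
    salary: Tuple[float, float]    # Y_j: (lower, upper) salary in K RMB


def skill_vocabulary(postings: List[JobPosting]) -> List[str]:
    """Sorted list of distinct skills required across the postings."""
    return sorted({skill for p in postings for skill, _ in p.skills})


# Illustrative postings; all values are made up.
postings = [
    JobPosting({"company": "A", "time": "2018-H1", "experience": "1-3 years"},
               [("Java", "Versatile"), ("MySQL", "Know")], (15.0, 25.0)),
    JobPosting({"company": "B", "time": "2019-H1", "experience": "3-5 years"},
               [("Python", "Versatile"), ("Machine Learning", "Know")], (25.0, 40.0)),
]
vocab = skill_vocabulary(postings)  # ['Java', 'Machine Learning', 'MySQL', 'Python']
```

Keeping the salary as a (lower, upper) pair reflects that salaries are given as ranges in the data.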

Based on the above, the salary of a job j can be formulated as $${\widetilde{y}}_{j}=g(\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)},{v}_{j}^{(i)})| i=1,2,\cdots \ \},{{\bf{C}}}_{{\bf{j}}}| {{\Phi }}),$$ where Φ and Θ denote the parameters of g and f, respectively, and $${v}_{j}^{(i)}=f({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)},{{\bf{C}}}_{{\bf{j}}}| {{\Theta }})$$. By comparing the predicted job salary with the real salary, both the skill value assessment model f and the skill-based salary prediction model g can be trained simultaneously.
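The composition can be sketched as a single forward pass. This is a toy stand-in, not the SSCN architecture: it assumes a log-linear valuation function for f and a fixed domination vector in place of what g's parameters Φ would produce, and all sizes and parameters are hypothetical. It illustrates that the loss depends on f's parameters only through the predicted salary, which is what allows salary data to supervise the valuation model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; Theta is randomly initialised for illustration only.
n_skills, n_levels, ctx_dim = 5, 3, 4
theta = {
    "skill": rng.normal(size=n_skills),           # per-skill base term
    "level": rng.normal(size=n_levels),           # per-level adjustment
    "ctx": rng.normal(size=(n_skills, ctx_dim)),  # skill-specific context weights
}


def f(skill, level, ctx, theta):
    """Toy context-aware valuation v = f(s, lv, C | Theta); exp keeps v positive."""
    return np.exp(theta["skill"][skill] + theta["level"][level] + theta["ctx"][skill] @ ctx)


def g(values, domination):
    """Toy composition: salary as the domination-weighted average of skill values."""
    return float(np.dot(domination, values))


ctx = rng.normal(size=ctx_dim)    # encoded job context C_j
job_skills = [(0, 1), (3, 2)]     # (skill index, level index) pairs of S_j
values = np.array([f(s, lv, ctx, theta) for s, lv in job_skills])
domination = np.array([0.6, 0.4])  # fixed here; in SSCN it is produced by g's parameters Phi
y_pred = g(values, domination)
y_true = 20.0                      # observed salary Y_j (hypothetical)
loss = (y_pred - y_true) ** 2      # depends on theta only through y_pred
```

In a real implementation, an automatic differentiation framework would propagate the gradient of `loss` to both sets of parameters, so that f and g are trained jointly.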

To solve the Salary-Skill Value Composition Problem, we propose SSCN, a cooperative neural network that performs two modeling steps to achieve skill valuation (the main task) and salary prediction (the cooperative task) simultaneously. The structure of SSCN is shown in Fig. 2b. Specifically, SSCN takes a job posting as input, calculates the value of all the involved skills and then combines them into the job salary in a straightforward yet interpretable way.

The first part of SSCN is a specially designed Context-aware Skill Valuation Network (CSVN), as shown in Fig. 2c. It dynamically models the skills, extracts the context-skill interaction and estimates the context-aware skill value. According to our definition, skill value can be regarded as a special case of job salary, and since salary is given as a range in our data, CSVN models the skill value as a range as well. Specifically, CSVN assigns each skill a non-negative lower bound and a non-negative upper bound, with the constraint that the upper bound is no less than the lower bound.
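One common way to enforce both constraints by construction is to predict the lower bound through a softplus and the upper bound as the lower bound plus a second non-negative gap. The sketch below assumes this parametrisation for illustration; how CSVN actually imposes the constraint is described in Methods, and `value_range` and the raw scores are hypothetical.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)), which is always positive."""
    return np.logaddexp(0.0, x)


def value_range(raw_lower, raw_gap):
    """Map two unconstrained scores to a value range with 0 <= lower <= upper.

    Assumed parametrisation (hypothetical): lower = softplus(raw_lower) and
    upper = lower + softplus(raw_gap), so ordering holds for any network output.
    """
    lower = softplus(raw_lower)
    upper = lower + softplus(raw_gap)
    return lower, upper


# Hypothetical raw outputs for three skills: column 0 -> lower, column 1 -> gap.
raw = np.array([[-3.0, 0.5], [1.2, -4.0], [10.0, 2.0]])
lo, hi = value_range(raw[:, 0], raw[:, 1])
```

Predicting a gap rather than the upper bound itself avoids any need to clip or penalise inverted ranges during training.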

In real-world work, employees allocate their time and effort among skills according to the importance of different job duties. Intuitively, the more you use a specific skill at work, the more it influences your salary. Simulating this process, we propose to model the job salary as the weighted average of the skill values. We call this weight skill domination. This agrees with our definition of skill value: when a job involves only one skill, that skill has full domination and the salary reduces to its value. In this way, skill values are comparable and independent of each other. Considering that skills may have combinatorial influences on salary, we let the model capture skill interactions through the domination. Specifically, skill co-appearance is considered to influence the domination of each skill, which allows the model to extract explainable skill values that depend only on context while maintaining its ability to fit general job postings. To model the domination, the second part of SSCN is a specially designed Attentive Skill Domination Network (ASDN), as shown in Fig. 2d. Considering that skill domination can be affected by related skills (e.g., one skill may play an important role in a job when many related skills are also required), ASDN models the domination with a graph-based approach. Specifically, we attach to each job posting a skill graph, where nodes represent the involved skills and an edge between two skills represents their relationship. ASDN combines this skill graph with the context-skill interaction information extracted by CSVN and calculates skill domination with a graph-based attention mechanism. Since the two salary bounds may correspond to different allocations of job duties (for example, common skills may raise the salary lower bound rather than the upper bound), ASDN outputs separate skill domination for the two bounds.
The details of training both CSVN and ASDN can be found in “Methods”.
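The domination-weighted composition can be sketched as follows. This is a simplified stand-in for ASDN, assuming that each skill's attention logit is a linear score of its own features concatenated with the mean features of its neighbours in the skill graph, followed by a softmax over the job's skills, with one scoring row per salary bound. The functions `domination` and `compose` and all inputs are hypothetical; the actual architecture is given in Methods.

```python
import numpy as np


def softmax(x):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def domination(h, adj, w):
    """Hypothetical graph-attentive domination for the skills of one job.

    h:   (n, d) context-skill interaction features, one row per required skill
    adj: (n, n) 0/1 adjacency of the job's skill graph
    w:   (2, 2d) scoring weights; row 0 for the lower bound, row 1 for the upper
    Returns (2, n): for each bound, non-negative weights summing to 1 over skills.
    """
    deg = adj.sum(axis=1, keepdims=True)
    neigh = adj @ h / np.maximum(deg, 1.0)             # mean of related skills (0 if none)
    scores = w @ np.concatenate([h, neigh], axis=1).T  # (2, n) attention logits
    return softmax(scores)


def compose(values, dom):
    """Salary bounds as domination-weighted averages of per-skill value bounds.

    values: (2, n) lower/upper value of each skill; dom: (2, n) domination.
    """
    return (values * dom).sum(axis=1)


rng = np.random.default_rng(1)
h = rng.normal(size=(3, 4))                    # three required skills, 4-d features
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
w = rng.normal(size=(2, 8))
values = np.array([[10.0, 20.0, 30.0],         # lower-bound skill values (K RMB)
                   [15.0, 25.0, 40.0]])        # upper-bound skill values (K RMB)
dom = domination(h, adj, w)
salary = compose(values, dom)                  # (lower, upper) salary

# A single-skill job: the skill gets full domination, so salary equals its value.
single = compose(values[:, :1], domination(h[:1], np.zeros((1, 1)), w))
```

The last line checks the property stated above: with only one skill, the softmax assigns it full domination and the composed salary equals that skill's value.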

Indeed, SSCN models the relationship among skills, context and salary from observations of job advertisement data in an end-to-end manner. As is common for deep learning models, all the influencing factors and their complicated relationships are modeled implicitly as a black box, which is hard to interpret theoretically. Nevertheless, this also brings the advantage that we only need to attend to the input (i.e., context and job skills) and output (i.e., job salary and skill value), while other latent influencing factors and relationships are learned automatically by the hidden layers. In this way, the model is easy to operate, and the skill value influenced by observable contexts can be estimated explicitly, which supports further explainable analysis.

## Results

To validate the models proposed in this paper, we collected IT-related job postings from a popular online recruitment website in China, namely Lagou (https://www.lagou.com/). Our dataset contains over 800,000 postings of various job positions across a time span of 36 months, from July 2016 to June 2019. After filtering the data through several preprocessing steps, we obtained 215,308 samples, which we used to train and validate our model. The details of data preprocessing, feature selection, network configurations, numerical statistics, and additional experimental results can be found in Methods and Supplementary Information. In particular, we also conducted supplementary experiments on an additional dataset of designer-related job postings to validate the generalizability of our model.

### Skill value analysis under different job contexts

Here we demonstrate the value of skills estimated by CSVN under different kinds of job contexts. During our experiments, we found that the lower and upper bounds of skill value consistently follow similar trends, so, unless noted otherwise, we mainly report results for the lower bound.

We define level influence as the average ratio of value increase when a level is specified. Figure 3a shows the average influence of each level (see Supplementary Fig. S8a for the influence distribution), estimated from all skill-level pair instances involving that level. Detailed information on sample size and influence distribution can be found in Supplementary Table S10. We can observe that CSVN clearly distinguishes the impact of different levels. In general, most levels have a similar influence on both bounds, and more sophisticated levels raise skill value more. In particular, the level Can Read, i.e., the lowest degree of mastery in our dataset, decreases the skill value by 10%, while the level Versatile contributes about a 10% increase. To get more insights, we show the level influence on some specific skills in Table 1. In addition, we conducted significance tests to better validate the results. It can be observed that, ignoring the insignificant entries (i.e., p-value > 0.05), the table is generally consistent with the averaged influence. Nevertheless, the model also learns biases in some special cases. For example, while Know is a relatively low level of mastery, it has a positive influence on skill value when describing JavaScript. The reason is that, while JavaScript mostly appears in jobs related to web development, the statement Know JavaScript usually acts as an additional requirement for some complicated and higher-paid jobs like architecture design. Therefore, the model overestimates the skill value due to the imbalanced data distribution. Indeed, this result is explainable from a market-oriented view. Specifically, the mastery level of a specific skill usually indicates the role that it plays in the job, and therefore the skill value highly depends on the market pricing of the relevant jobs. However, as shown in Fig. 3a, the model still works for the general cases.
Furthermore, we calculated the ratio of skill-level observations that might cause biased level influence estimations. The result shows that only very few samples (0.96% of the whole dataset) are affected by this bias. The detailed calculation can be found in the Supplementary Information. A possible way to alleviate this kind of bias is to increase the diversity of the recruitment market data, which is a valuable direction for our future studies. Supplementary Fig. S6a shows the level influence on the designer dataset. The result differs slightly from that on the IT dataset, which further indicates that level influence varies across occupations.
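The level-influence statistic defined above can be sketched as a mean relative change. This sketch assumes that each skill-level instance is scored twice by the trained valuation model, once with its level and once with the level left unspecified; that pairing, the function name and the numbers are hypothetical, since the exact estimation procedure is in the Supplementary Information.

```python
import numpy as np


def level_influence(v_with_level, v_without_level):
    """Average relative change in skill value when a level is specified.

    Both arrays hold model-estimated values for the same skill-context
    instances, with and without the level (an assumed pairing).
    """
    v1 = np.asarray(v_with_level, dtype=float)
    v0 = np.asarray(v_without_level, dtype=float)
    return float(np.mean((v1 - v0) / v0))


# Hypothetical values (K RMB) for three instances carrying the same level.
influence = level_influence([18.0, 27.5, 9.0], [20.0, 25.0, 10.0])
```

Here the per-instance changes are -10%, +10% and -10%, so the averaged influence is about -3.3%.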

In this study, time is also regarded as a kind of job context. CSVN assigns skills temporal embeddings, which supports dynamic skill value analysis. From Fig. 3b, we can observe that skill value fluctuates and that skills have different trends of value change (see Supplementary Table S12 for numerical statistics). Some interesting findings can also be observed from the figure. On the whole, Architecture has a relatively stable trend of value increase. Specifically, in 2016-H2, its value was 21.8 K RMB on average. It then increased by about 5% every half-year and reached 27.6 K RMB in 2019-H1. This indicates rising market demand for this skill, which is good news for architects. However, some hot skills like GoLang and Recommender System seem to be less stable. In particular, GoLang shows sharp value increases and decreases. For example, in 2019-H1, its value decreased by 26%, from 28.2 K RMB to 20.8 K RMB on average. This reminds students not to simply pursue the hottest new skills on the market, because their related industries may still be unstable. In our experiments, we found that many high-value skills experienced a value decrease in the first half of 2019. We conjecture that this phenomenon is due to the so-called Internet Winter in China that year. The trend of value for designer skills can be found in Supplementary Fig. S6b. Interestingly, designer skills are stable and show no general value decrease in the first half of 2019, which suggests that recent market changes had more influence on IT practitioners than on designers.
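The growth figures quoted above can be checked with simple arithmetic. The sketch below uses only the values reported in the text and assumes a compound per-period rate; the text does not state whether its "about 5%" is a compound rate or an average of half-year changes. Function names are illustrative.

```python
def mean_period_growth(start, end, periods):
    """Compound average growth rate per period between two values."""
    return (end / start) ** (1.0 / periods) - 1.0


def relative_change(before, after):
    """Relative change from `before` to `after` (negative for a decrease)."""
    return (after - before) / before


# Architecture: 2016-H2 -> 2019-H1 spans five half-year steps.
arch_growth = mean_period_growth(21.8, 27.6, 5)  # about 0.048 per half-year
# GoLang in 2019-H1: 28.2 -> 20.8 K RMB.
golang_change = relative_change(28.2, 20.8)      # about -0.26
```

The compound rate of roughly 4.8% per half-year is consistent with the rounded 5% reported for Architecture.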

Skill value under different experience requirements can provide talents with a long-term reference for choosing skills to learn. CSVN considers working experience requirements as a kind of job context and has a strong ability to infer experience-aware value, even for new skills. For example, although GoLang was officially released in 2009, we can still estimate its value for working experience longer than 10 years as 32.0 K RMB by smoothly extending the curve. Figure 3c shows that longer experience leads to higher skill value (see Supplementary Table S13 for numerical statistics). Compared with graduates, 10 years of working experience increases skill value by 2.5 times on average. This is reasonable because highly experienced talents usually get higher salaries. However, the speed of value increase differs among skills. For example, Architecture and Project Management increase slowly in the first several years but quickly after 3–5 years. Specifically, although Algorithm has a higher value (12.8 K RMB) for graduates, in the long term the value of Project Management (10.2 K RMB for graduates) increases faster and reaches a value similar to that of Algorithm after 10 years. Similarly, Machine Learning has a higher value (16.8 K RMB) than Architecture (16.4 K RMB) for graduates and increases fast in the first several years. With 1–3 years' experience, the value of Machine Learning (24.2 K RMB) is more than 20% higher than that of Architecture (19.9 K RMB). However, the ranking is reversed after 5 years. This result makes sense, because abilities in Architecture and Project Management accumulate during work, while talents' programming skills usually grow fast in the first several years of their career and may decline as they get older. We can conclude that CSVN provides good experience-aware skill value assessment.
This provides students with a reference for considering their longer-term career when choosing a skill to learn, instead of only comparing entry-level salaries. In addition to skills that pay well right after graduation, learning skills that will be valuable in the future may also be a good choice. We also show the experience influence on designer skills in Supplementary Fig. S6c, which shows a trend similar to that of the IT dataset.

For job seekers, the best choice is to work in companies that value the skills they possess. Figure 3d shows the skill value distribution in different companies, estimated from all instances of each corresponding skill-company pair. Detailed information on sample size and numerical statistics can be found in Supplementary Table S14. It can be observed that, due to differences in business strategy, skills are valued differently by different companies, which reveals the traits of these companies. For example, while most of these companies give a much higher value to Architecture than to Algorithm, ByteDance values them similarly. Besides, ByteDance is the only company that values Python (23.9 K RMB on average) more than Java (21.0 K RMB on average). This suggests that ByteDance attaches high importance to research-oriented work. In JD.com, Java has a wider value distribution than in other companies. Specifically, the interquartile range of Java in JD.com is 13 K RMB, which is much larger than the 7 K RMB in the other 4 companies. This implies a higher possibility of salary increase for a Java engineer in JD.com. Meanwhile, unlike the others, the value of skills in Baidu is quite stable, which means its demand for different skills is more balanced. In Supplementary Fig. S6d, we show the distribution of designer-related skill value in these companies. It can be observed that the companies also have different preferences for designer-related skills.
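The spread comparison above uses the gap between the two quartiles, i.e., the interquartile range. A minimal sketch of how such a per-company spread could be computed from per-posting skill values follows; the company names and values are hypothetical placeholders, not the real distributions, and NumPy's default linear interpolation is assumed for the percentiles.

```python
import numpy as np


def interquartile_range(values):
    """Gap between the 75th and 25th percentiles (NumPy's default linear method)."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(q3 - q1)


# Hypothetical per-posting Java values (K RMB) for two placeholder companies.
java_values = {
    "Company X": [12.0, 15.0, 18.0, 21.0, 24.0],
    "Company Y": [16.0, 17.0, 18.0, 19.0, 20.0],
}
iqr = {company: interquartile_range(v) for company, v in java_values.items()}
```

Both placeholder companies share the same median (18 K RMB) but differ in spread, which is the kind of contrast the box plots in Fig. 3d make visible.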

### Evaluation on salary prediction

We compared the performance of SSCN on salary prediction with several baseline methods (see details in "Methods"). Performance is evaluated with root mean square error (RMSE) and mean absolute error (MAE)18, both popular metrics for measuring the difference between observations and predictions. The results are listed in Table 2, from which we make several observations. First, SSCN outperforms all the baseline models, especially in terms of RMSE, with a 3.5% decrease on lower bound prediction and a 5.2% decrease on upper bound prediction compared to BERT, the strongest baseline. Though SSCN has a larger variance due to its complex structure, its worst performance is still significantly better than the best performances of the others. Second, SSCN outperforms the linear models (i.e., SVM and LR). To preserve the physical meaning of skill value, SSCN simplifies the last layer of skill composition into a linear form. However, SSCN is still a complex non-linear deep learning model that can capture the complicated relations among skill, context and salary, so it performs much better than truly linear models. Third, since accurately predicting context-aware job salary is a more difficult problem than standard salary benchmarking, HSBMF does not perform well, whereas SSCN achieves more accurate salary prediction under specific job contexts. Fourth, replacing ASDN with a mean pooling layer decreased the model's performance substantially, which demonstrates the effectiveness of skill domination for job salary modeling. Fifth, estimating the two bounds of the range simultaneously in a single model improves performance. This is because the lower and upper bounds of job salary are strongly correlated. In addition to constraining the bounds, CSVN also extracts a shared shallow representation for them.
In this way, each bound receives part of its supervision from the other, which reduces the chance of over-fitting. The results of salary prediction on the designer dataset can be found in Supplementary Table S8 and are consistent with those on the IT dataset. Furthermore, we conducted parameter sensitivity experiments to demonstrate the robustness of our model (see Supplementary Fig. S5 and Supplementary Table S7). The results show that SSCN is insensitive to its hyper-parameters and can be adopted without careful tuning.
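For reference, the two metrics and the relative improvement quoted above can be written out as follows. This is a minimal sketch with made-up numbers, not values from Table 2; the 4.0 and 3.86 in the last line are hypothetical error values chosen only to illustrate a 3.5% reduction.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    d = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(d ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    d = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.abs(d)))


def relative_decrease(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


y_true = [10.0, 20.0, 30.0]  # hypothetical salaries (K RMB)
y_pred = [12.0, 18.0, 33.0]  # hypothetical predictions (K RMB)
rmse_val = rmse(y_true, y_pred)
mae_val = mae(y_true, y_pred)
gain = relative_decrease(4.0, 3.86)  # hypothetical errors -> 0.035, i.e. 3.5%
```

Because RMSE squares the residuals, it penalises large errors more heavily than MAE and is never smaller than MAE on the same predictions.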

It can be concluded that, with the cooperation of the salary prediction task, SSCN trains a quantitative and accurate skill valuation model without using any labeled skill value data. Since skill valuation is an essential component of job salary prediction in SSCN, SSCN's performance on job salary prediction also provides quantitative evidence of the effectiveness of our skill valuation model.

## Discussion

With the Salary-Skill composition structure, SSCN decomposes the job salary into the values of all involved skills by modeling skill domination. Here, we analyze this composition process holistically and show the effects of its factors.

### Skill domination versus skill value

The product of the value and domination of a skill in a job posting is its actual contribution to the salary. To analyze the effect of domination and value, we display the averaged value, domination, and salary contribution of machine learning-related skills in Fig. 4. The numerical statistics can be found in Supplementary Tables S16–S18. On the whole, more generic skills have higher domination, while more specific skills have higher value. For example, Unsupervised Learning (with domination 37.8% on average) and Multivariable Regression (with domination 46% on average) have high domination, showing that many jobs need them. Graph Algorithm (with domination 18.2% on average) has lower domination but higher value (35.2 K RMB on average), indicating that although fewer jobs can make full use of it, you can easily get a high salary if you find one. Indeed, most jobs in the market are not highly specialized and are dominated by some generic skills. In these jobs, some high-value skills may also be involved, but they are usually not a major part of the work. Also, the rapid emergence of new skills driven by fast technological change enlarges the skill gap between job candidates and employers19. As a result, from the employers' viewpoint, although it is usually difficult to find candidates who perfectly meet their specific skill requirements, talents with generic skills are usually able to quickly learn and adapt to the required skills20. Accordingly, higher education in recent years has focused on teaching theoretical and basic knowledge and cultivating students' learning and problem-solving abilities rather than teaching specific skills21. This phenomenon enlarges the domination of more generic skills in the job market.
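Because contribution is computed per posting before averaging, the averaged contribution of a skill is generally not the product of its averaged value and averaged domination: the two differ whenever value and domination vary together across postings. A minimal sketch with hypothetical (value, domination) pairs for a single skill; the function name is illustrative.

```python
import numpy as np


def average_contribution(values, dominations):
    """Mean salary contribution (value x domination) over a skill's postings."""
    v = np.asarray(values, float)
    d = np.asarray(dominations, float)
    return float(np.mean(v * d))


# Hypothetical (value, domination) pairs for one skill in three postings.
vals = [30.0, 40.0, 35.0]
doms = [0.10, 0.30, 0.15]
contrib = average_contribution(vals, doms)       # mean of 3.0, 12.0, 5.25 = 6.75
naive = float(np.mean(vals) * np.mean(doms))     # 35.0 * 0.1833... ≈ 6.42
```

Here the posting with the highest value also has the highest domination, so the per-posting average exceeds the naive product.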

Our experimental result implies that the breadth of your knowledge determines how easily you can find a job, while the depth of your skills helps raise your salary. Hence, choosing a skill to learn involves a trade-off between domination and value, and the averaged contribution becomes a good reference. As shown in Fig. 4c, Topic Model (with contribution 8.5 K RMB on average) is a good learning choice. Note that a low averaged domination does not mean the skill never dominates a job. When you have excellent knowledge of some specific skills (as is often the case for Ph.D. students), you can be confident of finding somewhere to make full use of your ability. Wordclouds for the designer dataset can be found in Supplementary Fig. S7, where we can distinguish generic and specific skills for designer-related jobs.

### The influence of skill on job salary

For a skill required in a job posting, we can estimate its influence by calculating how much the salary would decrease if this skill were removed from the requirements. By fixing the relative domination of the other skills and taking the weighted average of their values, the new salary can be estimated as $$y^{\prime} =\frac{y-vd}{1-d},$$ where v and d represent the value and domination of the removed skill. The ratio of decrease is $$r=\frac{y-y^{\prime} }{y},$$ where y denotes the original job salary. In Table 3, we can observe that, generally, high value and high domination lead to high influence. For example, Matrix Calculation has a high value and high domination; dropping it decreases the job salary by 18.4% on average. According to this table, machine learning-related skills have a positive influence on job salary. In the next part, we show that some skills may have negative influences on job salary.
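The removal analysis can be sketched directly from the two formulas, assuming the dominations of a job's skills sum to one so that the salary is their weighted average. The function name and the three-skill job below are hypothetical.

```python
import numpy as np


def removal_influence(values, dominations, k):
    """Salary before/after removing skill k, and the ratio of decrease r.

    Assumes dominations sum to 1. The other skills keep their relative
    domination, so the new salary is their renormalised weighted average:
    y' = (y - d_k * v_k) / (1 - d_k). Requires d_k < 1.
    """
    v = np.asarray(values, float)
    d = np.asarray(dominations, float)
    y = float(d @ v)
    y_new = (y - d[k] * v[k]) / (1.0 - d[k])
    return y, y_new, (y - y_new) / y


# Hypothetical job with three skills (values in K RMB).
y, y_new, r = removal_influence([10.0, 20.0, 30.0], [0.5, 0.3, 0.2], k=2)
_, _, r_low = removal_influence([10.0, 20.0, 30.0], [0.5, 0.3, 0.2], k=0)
```

Here y = 17 and removing the 30 K RMB skill gives y' = 13.75, so r ≈ 19.1%. Removing the 10 K RMB skill instead raises the salary to 24, giving a negative r. Algebraically, y - y' = d(v - y)/(1 - d), so under these assumptions a skill's influence is positive exactly when its value exceeds the job's salary, and its magnitude grows with its domination.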

### Case study on a job posting

Everyone wants a job where they can give full play to their abilities. However, a job description may contain duties you are good at as well as duties you are not. Understanding the role of each required skill in a job can help job seekers decide whether the job suits them. For each job posting, SSCN predicts the value of each skill under the specified context, calculates the skill domination based on skill co-appearance, and finally combines the skill values into the job salary according to the domination. Figure 4d shows a case study illustrating how SSCN works on a job posting. Specifically, we used the trained SSCN to decompose a randomly selected job posting and analyzed the domination, contribution, and overall influence on salary of the involved skills. The posting seeks an algorithm engineer with two parts of job duties: data mining on business data and product development. Compared with the coding skills, Machine Learning and Deep Learning have much higher domination and contribution to the job salary, indicating that the job expects a data mining expert rather than an experienced engineer. Though their domination is similar, Deep Learning has a much higher contribution than Machine Learning because it has a higher value in this job context. We can also observe that Deep Learning contributes a lot to the upper-bound salary, which agrees with the job description, where Deep Learning is listed as an additional requirement. From the above analysis, job seekers who are good at machine learning and deep learning could try this job without worrying much if they are mediocre at coding. Also, the data mining duty has a positive influence on the salary, while the development duty has a negative influence. This suggests that a data mining expert might earn a higher salary in a full-time data mining job.

### Potential applications

Through the experiments, we show that our skill valuation model has the potential to be applied to various real-world applications. First, our model can be applied to talent recruitment. As can be observed in Table 2, our model achieves high performance on salary prediction. Therefore, it can provide salary references for jobs in the market when the job descriptions are specified. With the predicted salary information, recruiters can evaluate the market competitiveness of their offered salaries, and job seekers can form reasonable salary expectations. Second, our model can be directly applied to business market analysis. For example, as can be observed in Fig. 3b, our model reveals the overall trend of skill value in the market. Third, our model can be applied to student education. Specifically, the skill value provides students with market-oriented guidance for skill learning. For example, with the experience-aware skill value shown in Fig. 3c, students are able to make better personalized curriculum choices to achieve long-term career development. Fourth, our model can be applied to knowledge management and talent development. For example, as shown in Fig. 3d, companies can analyze the value of skills for their own business and then develop specific curricula to continuously train their employees in valuable skills. Fifth, our model can be applied to job recommendation. For example, by measuring the average value of skills in different companies, as shown in Fig. 3d, job seekers can receive effective guidance on which companies are more suitable for them to pursue.

### Technical contribution

Since indirect supervision is common in the real world, we believe that this work not only provides an intelligent and accurate solution to the skill valuation problem but can also inspire readers who work on data analysis in other application fields. Specifically, in many real-world scenarios, obtaining labeled training data is far from easy, and often only indirect supervision from a related task is available. Learning a skill valuation model from job salary data is one such problem: we have no labeled data of skill value, but we do have job salary data as indirect supervision, with the intuition that high skill value usually leads to high job salary. To this end, we proposed a machine learning-based solution that uses a neural network with a cooperative structure to model the relationship between jobs and skills, where salary prediction is regarded as a cooperative task for training the skill valuation model. In this way, we obtain an effective skill valuation model under the indirect supervision of job salary data.

### Limitations

The first limitation of this paper is the limited data. On the one hand, since our work is based on job advertisements accumulated on an online recruitment website with a short history, we are not able to provide insights into long-term job skill development. On the other hand, since our research places certain requirements on data quality (e.g., detailed skill requirements, job salary and contextual information), in this paper we only evaluated our model on two datasets collected from one of the largest and most popular Chinese online recruitment websites for the Internet-related industry. This may introduce bias into the analysis. With larger-scale and more comprehensive data, our model could yield more significant insights. The second limitation is the empirical validation of skill value. Since market-oriented skill valuation is a new research problem, we are not able to obtain ground truth for quantitatively validating the accuracy of our model. Therefore, in this paper, we evaluated the performance of our model on the task of salary prediction. The rationale behind this evaluation is that, because the relationship between salary and skill is explicitly formulated, the effectiveness of the skill values will be reflected in the salary prediction performance. In the future, we plan to continue this research by seeking more data sources and collaborations to further validate our model.

## Methods

### Job posting formulation

A summary of the notations in this paper can be found in Supplementary Table S2. As shown in Fig. 2a, we formulate a job posting $${J}_{j}$$ as $$({{\bf{C}}}_{{\bf{j}}},{{\bf{S}}}_{{\bf{j}}},{{\bf{Y}}}_{{\bf{j}}})$$, where $${{\bf{C}}}_{{\bf{j}}}$$ denotes a set of job contexts, $${{\bf{S}}}_{{\bf{j}}}$$ denotes the required skill set, and $${{\bf{Y}}}_{{\bf{j}}}$$ denotes the expected range of job salary. $${{\bf{S}}}_{{\bf{j}}}=\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)})| i=1,2,\cdots \ \}$$ is a set of skill-level pairs involved in the job description, where $${s}_{j}^{(i)}$$ is a skill and $${{\rm{lv}}}_{j}^{(i)}$$ is its required degree of mastery; for example, "Proficient in JavaScript" yields the pair (JavaScript, Proficient). Considering that the relations between the involved skills may affect job salary (e.g., a skill affects the job salary more if many skills related to it are also required), we attach to $${{\bf{S}}}_{{\bf{j}}}$$ a skill graph $${{\bf{A}}}_{{\bf{j}}}$$, where each node represents an involved skill and the edge weights represent the co-appearing relations between them.
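A posting record under this formulation could be held in a small container such as the hypothetical `JobPosting` dataclass below. The field names, the (skill, level) string pairs, the dictionary encoding of the skill graph, and the example values are illustrative assumptions, not the authors' data schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class JobPosting:
    """Hypothetical container for J_j = (C_j, S_j, Y_j) together with its skill graph A_j."""
    contexts: Dict[str, str]               # C_j, e.g. city, company, experience
    skills: List[Tuple[str, str]]          # S_j: (skill, level) pairs
    salary: Tuple[float, float]            # Y_j: (lower bound, upper bound)
    skill_graph: Dict[Tuple[str, str], float] = field(default_factory=dict)  # A_j edge weights

    def __post_init__(self):
        low, high = self.salary
        if low > high:
            raise ValueError("lower-bound salary must not exceed the upper bound")

    def skill_names(self) -> List[str]:
        return [skill for skill, _ in self.skills]


# Illustrative record; all values are made up.
posting = JobPosting(
    contexts={"city": "Beijing", "experience": "3-5 years"},
    skills=[("JavaScript", "Proficient"), ("React", "Familiar")],
    salary=(15.0, 25.0),
    skill_graph={("JavaScript", "React"): 0.8},
)
```

Validating the salary range at construction time keeps malformed records out before any range-constrained modeling.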

### Data preprocessing

We extracted 14 level words and 1374 IT-related skill words, so that the job descriptions can be formulated into structured records. The detailed descriptions of data preprocessing can be found in Supplementary Information. Then, we counted the co-appearing frequency of every pair of skills in the job advertisements. If the frequency is larger than a pre-defined threshold, we added an edge between the two skills, whose weight is the normalized co-appearing frequency. To reduce noise, we first kept only full-time job postings. Next, we ranked the cities by the number of samples they involve and kept the job postings of the top 16 cities, which cover over 90% of the data. Then, we dropped the records whose upper-bound or lower-bound salary is a boxplot outlier22 in the dataset (see Supplementary Fig. S3 for the salary distribution). Finally, we ranked the companies by the number of involved samples and kept the job postings of the top 1000 companies. After the above preprocessing, we obtained 215,308 job postings. Based on the observable contexts, we extracted continuous and discrete features to form the input of the model. The detailed descriptions of feature extraction can be found in Supplementary Table S4.
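Two of the steps above, building the co-appearance skill graph and dropping boxplot salary outliers, might look like the following sketch. The text does not say how co-appearing frequencies are normalized, so dividing by the largest retained count is an assumption, as is using Tukey's 1.5 × IQR fences with quartiles from Python's `statistics.quantiles` (default "exclusive" method); other quartile conventions move the fences slightly. All function names and the tuple layout of postings are hypothetical.

```python
import statistics
from collections import Counter
from itertools import combinations


def build_skill_graph(postings, threshold):
    """Add edge (a, b) if skills a and b co-appear in more than `threshold` postings.

    postings: iterable of skill collections, one per job posting.
    Edge weight = count / largest retained count (normalization is assumed).
    """
    counts = Counter()
    for skills in postings:
        for a, b in combinations(sorted(set(skills)), 2):
            counts[(a, b)] += 1
    kept = {pair: c for pair, c in counts.items() if c > threshold}
    if not kept:
        return {}
    top = max(kept.values())
    return {pair: c / top for pair, c in kept.items()}


def boxplot_inliers(values, k=1.5):
    """True where a value lies inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def drop_salary_outliers(postings):
    """Keep postings whose lower AND upper salary bounds are both inliers.

    postings: list of (lower, upper, payload) tuples (hypothetical layout).
    """
    lower_ok = boxplot_inliers([p[0] for p in postings])
    upper_ok = boxplot_inliers([p[1] for p in postings])
    return [p for p, lo, hi in zip(postings, lower_ok, upper_ok) if lo and hi]


print(build_skill_graph([{"Python", "SQL"}, {"Python", "SQL"}, {"Python", "Java"}], threshold=1))
```

Sorting each skill set before pairing makes the edge key order-independent, so (Python, SQL) and (SQL, Python) are counted together.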

### Overall process

The pseudocode of the overall model training and applying process of this paper can be found in Algorithm 1.

Algorithm 1. Overall process

Require: Dtrain: training set; Dtest: testing set; η: learning rate; MaxIter: the number of training iterations.

1: Build model $${\mathcal{M}}$$ with initial parameter Φ;

2: /**Training**/

3: for it = 1 to MaxIter do

4:  Sbatch ← randomly split Dtrain into batches;

5:  for each Dbatch ∈ Sbatch do

6:    dΦ = 0;

7:    for each (skillset, context, y) ∈ Dbatch do

8:     $$d{{\Phi }}=d{{\Phi }}+\frac{\partial Loss({\mathcal{M}}(skillset,context;{{\Phi }}),y)}{\partial {{\Phi }}}$$;

9:   Φ = Φ − ηdΦ;

10: /**Validation**/

11: Ypred, Ytrue ← empty lists;

12: for each (skillset, context, y) ∈ Dtest do

13:  Predict salary range $$\widetilde{y}={\mathcal{M}}(skillset,context;{{\Phi }})$$;

14:  Store $$\widetilde{y}$$ in Ypred;

15:  Store y in Ytrue;

16: Calculate MAE(Ypred, Ytrue) and RMSE(Ypred, Ytrue);

17: /**Value Estimation**/

18: for each (skillset, context, y) ∈ Dtrain ∪ Dtest do

19:  for each (level, skill) ∈ skillset do

20:   Estimate value v and domination d for (level, skill, context) with $${\mathcal{M}}(\cdot ;{{\Phi }})$$.

21: Analyze skill value;
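The training and validation steps of Algorithm 1 can be sketched in plain Python as below. The model is a hypothetical stand-in (a one-dimensional linear regressor with squared loss) so that the loop runs end to end; in SSCN, `grad_fn` would be the gradient of the loss with respect to Φ, obtained by automatic differentiation. As in step 8, gradients are summed over a batch before the single update in step 9.

```python
import math
import random


def train(grad_fn, params, data, lr, max_iter, batch_size, seed=0):
    """Steps 3-9 of Algorithm 1: sum per-sample gradients over a batch, then update."""
    rng = random.Random(seed)
    for _ in range(max_iter):                                      # step 3
        shuffled = list(data)
        rng.shuffle(shuffled)                                      # step 4: random batches
        for start in range(0, len(shuffled), batch_size):          # step 5
            d_phi = [0.0] * len(params)                            # step 6
            for x, y in shuffled[start:start + batch_size]:        # steps 7-8
                d_phi = [d + g for d, g in zip(d_phi, grad_fn(params, x, y))]
            params = [p - lr * d for p, d in zip(params, d_phi)]   # step 9
    return params


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def linear_grad(params, x, y):
    """Hypothetical stand-in for dLoss/dPhi: model y = w*x + b with squared loss."""
    w, b = params
    err = (w * x + b) - y
    return [2 * err * x, 2 * err]


# Steps 10-16 on toy data: train, predict the test set, report MAE and RMSE.
train_set = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
test_set = [(1.5, 4.0), (2.0, 5.0)]
w, b = train(linear_grad, [0.0, 0.0], train_set, lr=0.05, max_iter=500, batch_size=4)
y_pred = [w * x + b for x, _ in test_set]
y_true = [y for _, y in test_set]
print(f"MAE={mae(y_pred, y_true):.4f}  RMSE={rmse(y_pred, y_true):.4f}")
```

Summing rather than averaging the batch gradient (as written in step 8) ties the effective step size to the batch size, so η has to be chosen with the batch size in mind.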

### Context-aware skill valuation network

#### Temporal skill embedding

Considering that skills’ traits change over time, CSVN assigns temporal embeddings to skills at each time interval. To reduce model complexity, we use the idea of Matrix Factorization23 and assume that the skill embedding is composed of a low-rank embedding and a latent projection matrix. Formally, $${{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}={({{\bf{W}}}^{{\bf{us}}})}^{({\bf{t}})}{{\bf{W}}}^{{\bf{vs}}},\quad t=1,2,\cdots \ ,T,$$ where $${{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}\in {{\mathbb{R}}}^{{N}_{s}\times {\rm{de}}}$$ stores the skill embeddings of the t-th time interval, T is the number of time intervals, $${N}_{s}$$ denotes the size of the skill vocabulary, $${({{\bf{W}}}^{{\bf{us}}})}^{({\bf{t}})}\in {{\mathbb{R}}}^{{N}_{s}\times {\rm{dl}}}$$ holds the low-rank skill embeddings of the t-th time interval, $${{\bf{W}}}^{{\bf{vs}}}\in {{\mathbb{R}}}^{{\rm{dl}}\times {\rm{de}}}$$ is the latent projection shared by all the time intervals, de is the embedding dimension, and dl is the number of latent factors. Although the temporal embedding gives CSVN the ability to model skills’ dynamic changes, it increases model complexity. To avoid over-fitting, we add a temporal regularization term to the model, formulated as

$${L}_{t}=\mathop{\sum }\limits_{t=1}^{T-1}\parallel {{\bf{E}}}_{{\bf{s}}}^{({\bf{t}}+{\bf{1}})}-{{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}{\parallel }_{F},$$
(1)

where $$\parallel \cdot {\parallel }_{F}$$ denotes the Frobenius norm. $${L}_{t}$$ prevents the temporal embeddings from changing sharply between adjacent intervals. With the temporal skill embedding, our model can capture how skill semantics develop and change over time while maintaining low model complexity. However, it should also be noted that our model is not a forecasting model, as training data from each time period is needed to train the corresponding embedding.
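The low-rank parameterization of the temporal embeddings and the regularizer in Eq. (1) can be sketched as follows. The sizes passed to `parameter_counts` (de = 64, dl = 8, T = 6) are hypothetical; only the vocabulary size of 1374 skills comes from the data preprocessing step. Matrices are plain nested lists to keep the example dependency-free, and all function names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Product of plain-list matrices a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gaussian_matrix(rows, cols, rng, std=0.1):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def temporal_skill_embeddings(n_skills, de, dl, T, seed=0):
    """E_s^(t) = (W^us)^(t) W^vs for t = 1..T."""
    rng = random.Random(seed)
    W_vs = gaussian_matrix(dl, de, rng)                             # shared projection: dl x de
    W_us = [gaussian_matrix(n_skills, dl, rng) for _ in range(T)]   # per interval: Ns x dl
    return [matmul(W_us[t], W_vs) for t in range(T)]


def parameter_counts(n_skills, de, dl, T):
    """(independent per-interval embeddings, low-rank factorized form)."""
    full = T * n_skills * de
    low_rank = T * n_skills * dl + dl * de
    return full, low_rank


def frobenius(m):
    return math.sqrt(sum(x * x for row in m for x in row))


def temporal_regularizer(E):
    """Eq. (1): L_t = sum over t of ||E_s^(t+1) - E_s^(t)||_F."""
    total = 0.0
    for prev, nxt in zip(E, E[1:]):
        diff = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(prev, nxt)]
        total += frobenius(diff)
    return total


E = temporal_skill_embeddings(n_skills=5, de=4, dl=2, T=3)
print(parameter_counts(1374, 64, 8, 6), temporal_regularizer(E))
```

With these assumed sizes, the factorized form needs roughly one eighth of the parameters of independent per-interval embeddings, which is the complexity saving the text refers to.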

#### Skill-context interaction extraction

To increase fitting ability, CSVN takes both continuous context vectors (e.g., salary statistics of a city) and discrete contexts (e.g., city index) as inputs. Then, inspired by DeepFM24, a well-known click-through rate (CTR) prediction model in the field of recommender systems, CSVN extracts both deep and shallow interactions between these job contexts and the skill. Specifically, the input contexts are processed in different manners and go through a linear projection, a multiplicative operation and a Multi-Layer Perceptron (MLP) to extract interactions of different orders. Formally, each continuous context $$i\in {\mathcal{C}}$$ inputs a feature vector, written as $${{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}\in {{\mathbb{R}}}^{{d}_{i}}$$, where $${\mathcal{C}}$$ denotes the set of continuous job contexts. Each discrete context $$i\in {\mathcal{D}}$$, where $${\mathcal{D}}$$ denotes the set of discrete job contexts, inputs an index, which CSVN encodes into a one-hot representation $${{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}\in {{\mathbb{R}}}^{{m}_{i}}$$, where $${m}_{i}$$ is the number of possible values of this context. Then the linear projection extracts the first-order interaction as

$${{\bf{h}}}_{{\bf{1}}}=\mathop{\sum} \limits_{i\in {\mathcal{C}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{c}}l}{{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}+\mathop{\sum} \limits_{i\in {\mathcal{D}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{dl}}}{{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}+{{\bf{W}}}^{{\bf{sl}}}{{\bf{e}}}^{{\bf{s}}}+{{\bf{b}}}^{{\bf{l}}},$$
(2)

where $${{\bf{e}}}^{{\bf{s}}}\in {{\mathbb{R}}}^{{\rm{de}}}$$ denotes the input skill’s current embedding vector, $${{\bf{W}}}_{{\bf{i}}}^{{\bf{cl}}}\in {{\mathbb{R}}}^{{\rm{do}}_{1}\times {d}_{i}}$$, $${{\bf{W}}}_{{\bf{i}}}^{{\bf{dl}}}\in {{\mathbb{R}}}^{{\rm{do}}_{1}\times {m}_{i}}$$, $${{\bf{W}}}^{{\bf{sl}}}\in {{\mathbb{R}}}^{{\rm{do}}_{1}\times {\rm{de}}}$$ and $${{\bf{b}}}^{{\bf{l}}}\in {{\mathbb{R}}}^{{\rm{do}}_{1}}$$ are the trainable parameters, and $${\rm{do}}_{1}$$ is the output dimension. Then, the multiplicative operation extracts the second-order interactions. Specifically, each discrete context $$i\in {\mathcal{D}}$$ is first assigned an embedding $${{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}={{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{e}}},$$ where $${{\bf{W}}}_{{\bf{i}}}^{{\bf{e}}}\in {{\mathbb{R}}}^{{m}_{i}\times {\rm{de}}}$$ stores the value embeddings of context i. For each continuous context $$i\in {\mathcal{C}}$$, we project the feature vector into the embedding space of the discrete job contexts, written as $${{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}={{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{p}}}+{{\bf{b}}}^{{\bf{p}}},$$ where $${{\bf{W}}}_{{\bf{i}}}^{{\bf{p}}}\in {{\mathbb{R}}}^{{d}_{i}\times {\rm{de}}}$$ and $${{\bf{b}}}^{{\bf{p}}}\in {{\mathbb{R}}}^{{\rm{de}}}$$ are trainable parameters. The multiplicative operation is formulated as

$${{\bf{h}}}_{{\bf{2}}}=\mathop{\sum}\limits _{i\in {\mathcal{C}}}\mathop{\sum}\limits _{i\ne j,j\in {\mathcal{C}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{c}}}+\mathop{\sum}\limits _{i\in {\mathcal{D}}}\mathop{\sum}\limits _{i\ne j,j\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{d}}}+\mathop{\sum}\limits _{i\in {\mathcal{C}}}\mathop{\sum}\limits _{j\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{d}}}+{{\bf{e}}}^{{\bf{s}}}(\mathop{\sum}\limits _{i\in {\mathcal{C}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}+\mathop{\sum}\limits _{i\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}),$$
(3)

where ⊙ denotes element-wise multiplication. Finally, an MLP composed of several stacked fully connected layers extracts the higher-order information, formulated as

$${{\bf{x}}}^{({\bf{0}})}={{\bf{o}}}_{{\bf{0}}}^{{\bf{c}}}| {{\bf{o}}}_{{\bf{1}}}^{{\bf{c}}}| \cdots | {{\bf{e}}}_{{\bf{0}}}^{{\bf{d}}}| {{\bf{e}}}_{{\bf{1}}}^{{\bf{d}}}| {{\bf{e}}}^{{\bf{s}}},\quad {{\bf{x}}}^{({\bf{k}})}=\sigma ({{\bf{x}}}^{({\bf{k}}-{\bf{1}})}{({{\bf{W}}}^{{\bf{m}}})}^{({\bf{k}})}),\quad k=1,2,\cdots \ ,K$$
(4)

where K is the depth, $${({{\bf{W}}}^{{\bf{m}}})}^{({\bf{k}})}\in {{\mathbb{R}}}^{{d}_{m}^{(k-1)}\times {d}_{m}^{(k)}}$$ and $${{\bf{x}}}^{({\bf{k}})}$$ denote the parameter and the output of the k-th layer, respectively, σ denotes the activation function, and | denotes the concatenation of two vectors. We set the final output $${{\bf{x}}}^{({\bf{K}})}$$ as the high-order interaction $${{\bf{h}}}_{{\bf{3}}}$$.
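The first- and second-order interactions of Eqs. (2) and (3) can be sketched directly from their definitions, with one-hot vectors and embeddings as plain Python lists. Two assumptions: the product e^s(...) in the last term of Eq. (3) is read as an element-wise product, and the pair sums run over ordered pairs i ≠ j exactly as written (so each unordered pair is counted twice). Function names and the toy inputs are hypothetical.

```python
def matvec(W, x):
    """W x for a plain-list matrix W (list of rows) and vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def vsum(vectors, dim):
    out = [0.0] * dim
    for v in vectors:
        out = [o + x for o, x in zip(out, v)]
    return out


def first_order(o_c, W_cl, d_idx, d_sizes, W_dl, e_s, W_sl, b_l):
    """Eq. (2): h1 = sum_i W_i^cl o_i^c + sum_i W_i^dl o_i^d + W^sl e^s + b^l."""
    terms = [matvec(W, o) for W, o in zip(W_cl, o_c)]
    terms += [matvec(W, one_hot(i, m)) for W, i, m in zip(W_dl, d_idx, d_sizes)]
    terms += [matvec(W_sl, e_s), b_l]
    return vsum(terms, len(b_l))


def second_order(e_c, e_d, e_s):
    """Eq. (3) over context embeddings e_c (continuous) and e_d (discrete)."""
    terms = [hadamard(a, b) for i, a in enumerate(e_c) for j, b in enumerate(e_c) if i != j]
    terms += [hadamard(a, b) for i, a in enumerate(e_d) for j, b in enumerate(e_d) if i != j]
    terms += [hadamard(a, b) for a in e_c for b in e_d]
    terms.append(hadamard(e_s, vsum(e_c + e_d, len(e_s))))   # skill-context term
    return vsum(terms, len(e_s))


h1 = first_order(o_c=[[1.0, 2.0]], W_cl=[[[1.0, 0.0], [0.0, 1.0]]],
                 d_idx=[2], d_sizes=[3], W_dl=[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]],
                 e_s=[1.0, 1.0], W_sl=[[1.0, 1.0], [0.0, 1.0]], b_l=[0.5, 0.5])
h2 = second_order(e_c=[[1.0, 2.0], [3.0, 4.0]], e_d=[[1.0, 1.0]], e_s=[2.0, 0.0])
```

Multiplying a weight matrix by a one-hot vector simply selects one of its columns, which is why the discrete-context term in Eq. (2) behaves like a per-value bias lookup.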

To provide context-skill representations for domination modeling, this MLP has a multi-head structure. Specifically, the outputs of the shallow layers capture general context-skill interactions, whereas the full MLP extracts value-related information; therefore, the output of a shallow intermediate layer is also fed into ASDN to extract domination-related features. The details are described in the following sections.
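The multi-head MLP of Eq. (4) amounts to returning an intermediate activation alongside the final output. In this hypothetical sketch, the output of layer `tap_layer` plays the role of the features passed on to ASDN; the Leaky ReLU slope of 0.01, the layer sizes, and the toy weights are assumptions, not values from Table 4.

```python
def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x


def vecmat(x, W):
    """Row vector x times matrix W (len(x) x out_dim)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def mlp_with_tap(parts, weights, tap_layer):
    """Eq. (4) with an extra head: returns (x^(K), x^(tap_layer)).

    parts: vectors concatenated into x^(0) (continuous contexts, discrete
    context embeddings and the skill embedding e^s).
    """
    x = [v for part in parts for v in part]      # x^(0) = o_0^c | ... | e^s
    tapped = None
    for k, W in enumerate(weights, start=1):
        x = [leaky_relu(v) for v in vecmat(x, W)]
        if k == tap_layer:
            tapped = x                             # shallow features routed to ASDN
    return x, tapped


h3, asdn_input = mlp_with_tap(
    parts=[[1.0], [2.0]],
    weights=[[[1.0, 0.0], [0.0, -1.0]], [[1.0], [1.0]]],
    tap_layer=1,
)
```

Sharing the lower layers lets the domination and value branches reuse the same context-skill features while only the deeper layers specialize in value.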

#### Constrained value range modeling

CSVN estimates the value range by predicting its bounds. To ensure that the predicted bounds form a meaningful value range, we impose two constraints. First, since skill value is a special case of salary, its lower bound is non-negative. Second, the upper-bound value is no less than the lower-bound value. We concatenate the extracted interactions of different orders and then estimate the range with two constrained linear projections, formulated as

$${v}^{l}=[{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{l}}}+{b}^{l},\quad {v}^{u}=[{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{u}}}+{b}^{u},\quad {\rm{s.t.}}\quad 0\le {v}^{l}\le {v}^{u}.$$
(5)

As $${v}^{l}$$ and $${v}^{u}$$ are intermediate variables of SSCN, its whole training process becomes a constrained optimization. However, it is hard for deep learning models to handle constraints. Although we could add a soft-constraint regularization term to the loss function, it cannot guarantee that the constraints are strictly satisfied and can easily cause the model to fail to converge. To avoid constrained optimization and enable gradient descent, we adjust the network structure so that the constraints are naturally satisfied. Specifically, we add a non-negative activation to the lower-bound output, formulated as

$${v}^{l}=\max ([{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{l}}}+{b}^{l},0).$$
(6)

Next, instead of directly predicting the upper-bound value, we repurpose the second linear projection to output the gap p between the bounds, so that the upper bound is calculated as $${v}^{u}={v}^{l}+p$$. The upper bound is guaranteed to be no smaller than the lower bound as long as the gap is non-negative, which we enforce as $$p=\max ([{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{g}}}+{b}^{g},0).$$
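The gap parameterization makes both constraints hold by construction, whatever the weights are. A minimal sketch, with `h` standing for the concatenated interactions [h1 | h2 | h3] and all weights hypothetical; the random loop at the end simply checks the constraints over many arbitrary weight draws.

```python
import random


def relu(x):
    return max(x, 0.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def value_range(h, w_l, b_l, w_g, b_g):
    """v^l = max(h w^l + b^l, 0), p = max(h w^g + b^g, 0), v^u = v^l + p."""
    v_l = relu(dot(h, w_l) + b_l)
    gap = relu(dot(h, w_g) + b_g)
    return v_l, v_l + gap


rng = random.Random(0)
for _ in range(1000):
    h = [rng.uniform(-3, 3) for _ in range(4)]
    w_l = [rng.uniform(-1, 1) for _ in range(4)]
    w_g = [rng.uniform(-1, 1) for _ in range(4)]
    v_l, v_u = value_range(h, w_l, rng.uniform(-1, 1), w_g, rng.uniform(-1, 1))
    assert 0.0 <= v_l <= v_u    # both constraints hold for arbitrary weights
```

Because the constraints are structural, plain unconstrained gradient descent can be used; the price is that a bound stuck at zero receives no gradient through the ReLU.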

### Attentive skill domination network

In Fig. 2d, we show the structure of ASDN. ASDN uses the features extracted by CSVN as its input, denoted by $${I}_{A}$$. From $${I}_{A}$$, it independently extracts two kinds of skill representations with MLPs. First, ASDN extracts an importance representation for each skill, which captures the traits of the skill that affect its domination; e.g., some skills are common and easily become the major part of a job. Meanwhile, ASDN extracts an influence representation for each skill to model how the skills influence each other’s domination. We use $${{\bf{X}}}_{{\bf{imp}}}\in {{\mathbb{R}}}^{N\times {\rm{dp}}}$$ and $${{\bf{X}}}_{{\bf{inf}}}\in {{\mathbb{R}}}^{N\times {\rm{di}}}$$ to denote the importance and influence representations, respectively, where each row is a skill’s representation and N denotes the number of skills that appear in the job posting.

ASDN assumes that the domination of a skill is affected by three factors: its own importance, the global influence from all the skills, and the local influence from the related skills. The global influence is calculated as the averaged influence vector of all the skills, written as $${\bf{Q}}=\frac{{{\mathbb{1}}}^{{\mathsf{T}}}{{\bf{X}}}_{{\bf{inf}}}}{N},$$ where $${\mathbb{1}}\in {{\mathbb{R}}}^{N}$$ is a vector whose elements are all 1. Since the global influence is the same for all the skills, we regard it as the query in the attention mechanism. To model the influence from neighboring skills, we apply a simple Graph Convolutional Network (GCN)25 to the skill graph to extract the local influence, formulated as

$${{\bf{U}}}^{({\bf{0}})}={{\bf{X}}}_{{\bf{inf}}},\quad {{\bf{U}}}^{({\bf{k}})}=\sigma ({\bf{A}}{{\bf{U}}}^{({\bf{k}}-{\bf{1}})}{({{\bf{W}}}^{{\bf{g}}})}^{({\bf{k}})}),\,k=1,2,\cdots \ ,{K}_{c},$$
(7)

where $${K}_{c}$$ is the depth of the GCN, $${\bf{A}}\in {{\mathbb{R}}}^{N\times N}$$ is the adjacency matrix of the skill graph, $${A}_{i,j}$$ denotes the edge weight from skill i to skill j, $${{\bf{U}}}^{({\bf{k}})}\in {{\mathbb{R}}}^{N\times {d}^{(k)}}$$ stores the output vectors of all the nodes in the k-th layer, $${d}^{(k)}$$ is the output dimension of the k-th layer (with $${d}^{(0)}={\rm{di}}$$), and $${({{\bf{W}}}^{{\bf{g}}})}^{({\bf{k}})}\in {{\mathbb{R}}}^{{d}^{(k-1)}\times {d}^{(k)}}$$ is the trainable parameter of the k-th layer. We concatenate the importance vectors with the local influence vectors as the keys and calculate the domination of each skill with an attention layer, formulated as

$$\widetilde{{\bf{a}}}=\tanh \left({\bf{Q}}{{\bf{W}}}^{{\bf{q}}}+[{{\bf{U}}}^{({{\bf{K}}}_{{\bf{c}}})}| {{\bf{X}}}_{{\bf{imp}}}]{{\bf{W}}}^{{\bf{k}}}\right){{\bf{W}}}^{{\bf{v}}},\quad {\bf{a}}={\rm{softmax}}(\widetilde{{\bf{a}}}),$$
(8)

where $${\bf{a}}\in {{\mathbb{R}}}^{N}$$ and its element $${a}_{i}$$ represents the domination of the i-th skill, and $${{\bf{W}}}^{{\bf{q}}}\in {{\mathbb{R}}}^{{\rm{di}}\times {\rm{da}}}$$, $${{\bf{W}}}^{{\bf{k}}}\in {{\mathbb{R}}}^{({d}^{({K}_{c})}+{\rm{dp}})\times {\rm{da}}}$$ and $${{\bf{W}}}^{{\bf{v}}}\in {{\mathbb{R}}}^{{\rm{da}}}$$ are the trainable parameters. To give each skill separate domination factors for the lower-bound and upper-bound salary, ASDN trains two sets of the above attention parameters.
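The global influence Q, the GCN of Eq. (7), and the attention of Eq. (8) can be sketched as below for a single posting and one salary bound. As written in Eq. (7), the adjacency matrix is used directly, without the self-loops and degree normalization of the original GCN formulation; using Leaky ReLU as σ follows the network configuration, and all function names and toy sizes are hypothetical.

```python
import math


def matmul(a, b):
    """Product of plain-list matrices a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x


def global_influence(X_inf):
    """Q = 1^T X_inf / N: the mean influence vector over the N skills of a posting."""
    n = len(X_inf)
    return [sum(col) / n for col in zip(*X_inf)]


def gcn(A, X_inf, layer_weights):
    """Eq. (7): U^(0) = X_inf, U^(k) = sigma(A U^(k-1) W^(k)) for k = 1..K_c."""
    U = X_inf
    for W in layer_weights:
        U = [[leaky_relu(v) for v in row] for row in matmul(matmul(A, U), W)]
    return U


def domination(Q, U, X_imp, W_q, W_k, w_v):
    """Eq. (8): a = softmax(tanh(Q W^q + [U | X_imp] W^k) W^v)."""
    q = matmul([Q], W_q)[0]                        # the query is shared by all skills
    keys = [u + x for u, x in zip(U, X_imp)]       # row-wise concatenation [U | X_imp]
    scores = []
    for k_row in matmul(keys, W_k):
        hidden = [math.tanh(a + b) for a, b in zip(q, k_row)]
        scores.append(sum(h * v for h, v in zip(hidden, w_v)))
    top = max(scores)                              # shift scores for a stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy posting with N = 2 skills, di = 2, dp = 1, da = 1 (all hypothetical).
X_inf = [[1.0, 0.0], [0.0, 1.0]]
X_imp = [[2.0], [0.0]]
A = [[0.0, 1.0], [1.0, 0.0]]
U = gcn(A, X_inf, [[[1.0, 0.0], [0.0, 1.0]]])
Q = global_influence(X_inf)
a = domination(Q, U, X_imp, W_q=[[0.0], [0.0]], W_k=[[0.0], [0.0], [1.0]], w_v=[1.0])
```

Since the softmax normalizes over the skills of one posting, the dominations always sum to 1, which is what makes the salary in Eq. (9) a weighted average of skill values.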

### Job salary prediction

For a job posting $${J}_{j}$$, SSCN models its job salary as the weighted average of the skill values. The lower-bound salary $${\widetilde{y}}_{j}^{l}$$ and upper-bound salary $${\widetilde{y}}_{j}^{u}$$ are estimated as

$${\widetilde{y}}_{j}^{l}=\mathop{\sum }\limits_{i}^{| {{\bf{S}}}_{{\bf{j}}}| }{({v}^{l})}_{j}^{(i)}{({a}^{l})}_{j}^{(i)}\quad {\widetilde{y}}_{j}^{u}=\mathop{\sum }\limits_{i}^{| {{\bf{S}}}_{{\bf{j}}}| }{({v}^{u})}_{j}^{(i)}{({a}^{u})}_{j}^{(i)},$$
(9)

where $${({v}^{* })}_{j}^{(i)}$$ represents the value bound of the i-th skill in $${J}_{j}$$ and $${({a}^{* })}_{j}^{(i)}$$ represents the corresponding domination factor. We set the loss function to be the squared difference between the predicted and the real salary bounds, formulated as

$${L}_{s}=\frac{{\lambda }_{l}}{| {\mathcal{J}}| }\mathop{\sum }\limits_{j}^{| {\mathcal{J}}| }{({\widetilde{y}}_{j}^{l}-{y}_{j}^{l})}^{2}+\frac{{\lambda }_{u}}{| {\mathcal{J}}| }\mathop{\sum }\limits_{j}^{| {\mathcal{J}}| }{({\widetilde{y}}_{j}^{u}-{y}_{j}^{u})}^{2},$$
(10)

where $${y}_{j}^{* }$$ denotes the observed job salary bounds, λl and λu are hyper-parameters balancing the importance of the two losses, $${\mathcal{J}}$$ denotes the set of job postings, and $$| {\mathcal{J}}|$$ is its size.

Combining $${L}_{s}$$ with the temporal regularizer $${L}_{t}$$ of the skill embeddings, we formulate the loss function of SSCN as

$$L=\frac{{\lambda }_{l}}{| {\mathcal{J}}| }\sum _{j}{({\widetilde{y}}_{j}^{l}-{y}_{j}^{l})}^{2}+\frac{{\lambda }_{u}}{| {\mathcal{J}}| }\sum _{j}{({\widetilde{y}}_{j}^{u}-{y}_{j}^{u})}^{2}+\beta \mathop{\sum }\limits_{t=1}^{T-1}\parallel {{\bf{E}}}_{{\bf{s}}}^{({\bf{t}}+{\bf{1}})}-{{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}{\parallel }_{F},$$
(11)

where β is a hyperparameter balancing the importance of the temporal regularizer.
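Eqs. (9) to (11) combine into a short loss computation. In this sketch λl = 2, λu = 1 and β = 0.004 are the values given under Network configuration; the predictions, targets, and the precomputed value of Lt passed in are hypothetical, and the function names are illustrative.

```python
def predict_bounds(v_l, a_l, v_u, a_u):
    """Eq. (9): each salary bound is the domination-weighted sum of skill value bounds."""
    y_l = sum(v * a for v, a in zip(v_l, a_l))
    y_u = sum(v * a for v, a in zip(v_u, a_u))
    return y_l, y_u


def sscn_loss(preds, targets, l_t, lambda_l=2.0, lambda_u=1.0, beta=0.004):
    """Eq. (11): weighted mean squared errors of both bounds (Eq. (10)) plus beta * L_t."""
    n = len(preds)
    sq_l = sum((p[0] - t[0]) ** 2 for p, t in zip(preds, targets)) / n
    sq_u = sum((p[1] - t[1]) ** 2 for p, t in zip(preds, targets)) / n
    return lambda_l * sq_l + lambda_u * sq_u + beta * l_t


# Two hypothetical postings; l_t would come from the temporal regularizer of Eq. (1).
preds = [predict_bounds([10.0, 20.0], [0.5, 0.5], [20.0, 30.0], [0.25, 0.75]),
         (12.0, 18.0)]
targets = [(14.0, 27.5), (12.0, 20.0)]
loss = sscn_loss(preds, targets, l_t=5.0)
```

Weighting the lower bound twice as much compensates for its smaller magnitude, so both bounds contribute comparably to the gradient.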

### Network configuration

The network configurations can be found in Table 4. Since the lower-bound salary is smaller than the upper bound, we set λl = 2 and λu = 1. The weight β of the temporal regularizer was set to 0.004. We use a residual structure26 to accelerate training and Leaky ReLU27 as the activation function. The weights are initialized with the Glorot normal initializer28. For optimization, we use the Adam optimizer29. We found that slight changes in these hyper-parameters did not affect performance much; additional parameter experiments can be found in Supplementary Information.
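The configuration choices above can be sketched as follows. Only λl, λu and β come from the text; the Leaky ReLU slope of 0.01, the untruncated Glorot normal variant (Keras's `glorot_normal` draws from a truncated normal), and applying the activation before the skip connection are assumptions, since Table 4 is not reproduced here.

```python
import math
import random

# Only these three values are stated in the text; everything else below is assumed.
LOSS_WEIGHTS = {"lambda_l": 2.0, "lambda_u": 1.0, "beta": 0.004}


def glorot_normal(fan_in, fan_out, rng):
    """Zero-mean normal weights with std = sqrt(2 / (fan_in + fan_out)), not truncated."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def leaky_relu(x, slope=0.01):
    """Leaky ReLU; the 0.01 negative slope is an assumed default."""
    return x if x >= 0 else slope * x


def residual_dense(x, W):
    """One residual block: x + LeakyReLU(x W). W must be square so shapes match."""
    y = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]
    return [xi + leaky_relu(yi) for xi, yi in zip(x, y)]


rng = random.Random(0)
W = glorot_normal(8, 8, rng)
out = residual_dense([1.0, -2.0, 0.5, 0.0, 3.0, -1.0, 2.0, 0.25], W)
```

The identity path lets gradients bypass the dense layer, which is the usual reason residual connections speed up training of deeper stacks.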

### Baseline methods for salary prediction

Our baseline methods for salary prediction fall into four groups:

• Classic regression models including linear regression (LR), Support Vector Machine (SVM), and Gradient Boosting Decision Tree (GBDT). Since these methods process the structured feature vectors of fixed size, we concatenated the one-hot skillset representation, the averaged features of skills, and job context as their input.

• A deep neural network (DNN) with the same depth and a similar number of parameters as SSCN, for a fair comparison. Its input was also the concatenated feature vector.

• Holistic Salary Benchmarking Matrix Factorization (HSBMF)30. HSBMF is a state-of-the-art salary benchmarking model that groups job advertisements into posts and predicts their salaries with matrix factorization. We used the job contextual information and skill requirements to build the regularization matrices in HSBMF to ensure that it considers the same information as SSCN.

• State-of-the-art text mining-based methods. We compared two groups of typical methods that model job postings as texts. The first group consists of widely adopted Natural Language Processing (NLP) architectures trained end-to-end on our data, including Convolutional Neural Network (TextCNN)31,32, Hierarchical Attention Network (HAN)33, and the recently proposed Transformer-XL34. In these models, we used pre-trained Chinese word embeddings35 to initialize the parameters. The second group consists of state-of-the-art pre-trained models, including Bidirectional Encoder Representations from Transformers (BERT)36, Robustly optimized BERT approach (RoBERTa)37, and XLNet38. To better process our input data, we adopted versions of these models pre-trained on Chinese corpora39.
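The fixed-size input used for the classic regression and DNN baselines can be sketched as the concatenation below. The skill set is encoded as a multi-hot vector (the "one-hot skillset representation" mentioned above); the vocabulary, per-skill feature vectors, context vector, and the function name `baseline_features` are hypothetical.

```python
def baseline_features(skills, skill_index, skill_feats, context_vec):
    """Concatenate [multi-hot skill set | mean skill features | context vector]."""
    multi_hot = [0.0] * len(skill_index)
    for s in skills:
        multi_hot[skill_index[s]] = 1.0
    dim = len(next(iter(skill_feats.values())))
    if skills:
        mean_feats = [sum(skill_feats[s][k] for s in skills) / len(skills) for k in range(dim)]
    else:
        mean_feats = [0.0] * dim
    return multi_hot + mean_feats + list(context_vec)


skill_index = {"Python": 0, "SQL": 1, "Java": 2}
skill_feats = {"Python": [1.0, 2.0], "SQL": [3.0, 4.0], "Java": [0.0, 0.0]}
x = baseline_features(["Python", "SQL"], skill_index, skill_feats, context_vec=[0.5])
```

Averaging the skill features is what gives every posting the same input length regardless of how many skills it lists.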

We also disabled parts of SSCN to show their effectiveness, yielding two variants:

• “CSVN + Mean”, where we replaced ASDN with a mean pooling layer.

• “SSCN (Independ)”, where we disabled the range prediction part and trained the models for the upper bound and lower bound independently.

For all the compared methods that are not designed for range prediction, we separately trained lower-bound and upper-bound regression models and validated their performance independently.

### Validation

We repeated hold-out validation 10 times. Specifically, each time, we randomly split the data into training and testing sets with a ratio of 4:1, used the training data for model training, and used the testing data for performance evaluation.
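Repeated hold-out validation with a 4:1 split can be sketched as below. The `evaluate` callback, the per-repeat seeding, and summarizing the 10 runs by their mean and standard deviation are our assumptions about how the repeats are aggregated.

```python
import random
import statistics


def repeated_holdout(data, evaluate, n_repeats=10, train_ratio=0.8, seed=0):
    """Repeat a random train/test split n_repeats times (4:1 by default)."""
    scores = []
    for r in range(n_repeats):
        rng = random.Random(seed + r)            # a different split each repeat
        order = list(range(len(data)))
        rng.shuffle(order)
        cut = round(train_ratio * len(data))
        train = [data[i] for i in order[:cut]]
        test = [data[i] for i in order[cut:]]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical evaluator: returns the test-set size if the split is disjoint.
mean_score, sd_score = repeated_holdout(
    list(range(100)),
    lambda tr, te: len(te) if set(tr).isdisjoint(te) else -1,
)
```

Reporting the spread across repeats indicates how sensitive each model's MAE and RMSE are to the particular random split.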

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

The job posting data that support the findings of this study are available in figshare with the identifier “10.6084/m9.figshare.14060498”40. All data generated or analyzed during this study are included in this published article (and its Supplementary information files). Source data are provided with this paper.

## Code availability

The code for this paper is available on CodeOcean with the identifier “10.24433/CO.0239280.v1”41.

## References

1. Ng, T. W. & Feldman, D. C. A conservation of resources perspective on career hurdles and salary attainment. J. Vocat. Behav. 85, 156–168 (2014).

2. Dix-Carneiro, R. & Kovak, B. K. Trade liberalization and the skill premium: a local labor markets approach. Am. Econ. Rev. 105, 551–557 (2015).

3. Burstein, A. & Vogel, J. International trade, technology, and the skill premium. J. Political Econ. 125, 1356–1412 (2017).

4. Xu, T., Zhu, H., Zhu, C., Li, P. & Xiong, H. Measuring the popularity of job skills in recruitment market: A multi-criteria approach. In AAAI 2018 (2018).

5. Wu, X. et al. Trend-aware tensor factorization for job skill demand analysis. In IJCAI 2019 (2019).

6. Card, D. & DiNardo, J. E. Skill-biased technological change and rising wage inequality: some problems and puzzles. J. Labor Econ. 20, 733–783 (2002).

7. Desjardins, R. et al. OECD Skills Outlook 2013: first results from the survey of adult skills. J. Appl. Econom. 30, 1144–1168 (2013).

8. Kankaraš, M., Montt, G., Paccagnella, M., Quintini, G. & Thorn, W. Skills matter: Further results from the survey of adult skills. OECD Skills Studies (OECD Publishing, 2016).

9. Yan, R. et al. Interview choice reveals your preference on the market: to improve job-resume matching through profiling memories. In ACM KDD 2019 (2019).

10. Zhu, C. et al. Person-job fit: Adapting the right talent for the right job with joint representation learning. ACM Transactions on Management Information Systems (TMIS) (2018).

11. Boselli, R., Cesarini, M., Mercorio, F. & Mezzanzanica, M. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 330–342 (Springer, 2017).

12. Boselli, R. et al. Wolmis: a labor market intelligence system for classifying web job vacancies. J. Intell. Inf. Syst. 51, 477–502 (2018).

13. European Centre for the Development of Vocational Training (Cedefop). The online job vacancy market in the EU: driving forces and emerging trends (2019).

14. European Centre for the Development of Vocational Training (Cedefop). Online job vacancies and skills analysis: a Cedefop pan-European approach (2019).

15. Lovaglio, P. G., Cesarini, M., Mercorio, F. & Mezzanzanica, M. Skills in demand for ICT and statistical occupations: Evidence from web-based job vacancies. Stat. Anal. Data Min. 11, 78–91 (2018).

16. Colombo, E., Mercorio, F. & Mezzanzanica, M. Applying machine learning tools on web vacancies for labour market and skill analysis. Terminator or the Jetsons? The Economics and Policy Implications of Artificial Intelligence (2018).

17. Arnett, K. P. & Litecky, C. R. Career path development for the most wanted skills in the MIS job market. J. Syst. Manag. 45, 6 (1994).

18. Willmott, C. J. & Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82 (2005).

19. McGowan, M. A. & Andrews, D. Skill mismatch and public policy in OECD countries (2015).

20. Badcock, P. B., Pattison, P. E. & Harris, K.-L. Developing generic skills through university study: a study of arts, science and engineering in Australia. High. Educ. 60, 441–458 (2010).

21. National Research Council et al. Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century (National Academies Press, 2012).

22. Dawson, R. How significant is a boxplot outlier? J. Stat. Educ. https://www.amstat.org/publications/jse/v19n2/dawson.pdf (2011).

23. Lee, D. D. & Seung, H. S. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems (2001).

24. Guo, H., Tang, R., Ye, Y., Li, Z. & He, X. DeepFM: a factorization-machine based neural network for CTR prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (ed. Sierra, C.) 1725–1731 (AAAI Press, California, 2017).

25. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR, California, 2017).

26. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE CVPR 2016 (2016).

27. Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML (2013).

28. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, 249–256 (2010).

29. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR, California, 2015).

30. Meng, Q., Zhu, H., Xiao, K. & Xiong, H. Intelligent salary benchmarking for talent recruitment: A holistic matrix factorization approach. In IEEE ICDM 2018 (2018).

31. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1746–1751 (2014).

32. Zhang, X., Zhao, J. & LeCun, Y. Character-level convolutional networks for text classification. Adv. Neural Inf. Process. syst. 28, 649–657 (2015).

33. Yang, Z. et al. Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, 1480–1489 (2016).

34. Dai, Z. et al. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2978–2988 (2019).

35. Li, S. et al. Analogical reasoning on Chinese morphological and semantic relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics Vol. 2 (Short Papers), 138–143 (2018).

36. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers), 4171–4186 (2019).

37. Liu, Y. et al. RoBERTa: A robustly optimized BERT pretraining approach. Preprint at https://arxiv.org/abs/1907.11692v1 (2019).

38. Yang, Z. et al. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, 5753–5763 (2019).

39. Cui, Y. et al. Revisiting pre-trained models for Chinese natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, 657–668 (2020).

40. Sun, Y., Zhuang, F., Zhu, H., Zhang, Q. & Xiong, H. Job posting data. https://figshare.com/articles/dataset/Job_Posting_Data/14060498/1 (2021).

41. Sun, Y., Zhuang, F., Zhu, H., Zhang, Q. & Xiong, H. Market-oriented job skill valuation with cooperative composition neural network (code). https://codeocean.com/capsule/7695173/tree/v1 (2021).

## Acknowledgements

We thank the members of the Baidu Talent Intelligence Center for their support, ideas, and encouragement. This research was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1004300), the National Natural Science Foundation of China (Grant Nos. U1836206, U1811461, 61773361, 91746301, and 61836013), and the Project of Youth Innovation Promotion Association CAS (Grant No. 2017146).

## Author information


### Contributions

This work was accomplished when Y.S. and Q.Z. were working as interns at Baidu under the supervision of H.S.Z. H.S.Z. came up with the idea of market-oriented skill valuation. Y.S. and H.S.Z. formulated the Salary-Skill Value Composition Problem. Y.S. designed and implemented Salary-Skill Composition Network under the guidance of F.Z.Z. and H.S.Z. Q.Z. gave important advice on model structure. Y.S. and Q.Z. processed the data. Y.S., F.Z.Z., and H.S.Z. conceived the experiments and evaluated the results. F.Z.Z., H.S.Z., Q.H. and H.X. advised on the literature review, data process and technical design of this work. Y.S., H.S.Z. and H.X. wrote the paper. H.S.Z., F.Z.Z. and H.X. managed this project.

### Corresponding authors

Correspondence to Fuzhen Zhuang, Hengshu Zhu or Hui Xiong.

## Ethics declarations

### Competing interests

H.S.Z. is currently affiliated with Baidu. Y.S. and Q.Z. are currently affiliated with Baidu as research interns. The other authors declare no competing interests.

Peer review information Nature Communications thanks Fabio Mercorio and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Sun, Y., Zhuang, F., Zhu, H. et al. Market-oriented job skill valuation with cooperative composition neural network. Nat Commun 12, 1992 (2021). https://doi.org/10.1038/s41467-021-22215-y


• DOI: https://doi.org/10.1038/s41467-021-22215-y
