Introduction

In the era of the knowledge economy, skilled talents are a precious resource. Modern jobs require talents to make substantial and continuous investments in their job skills1,2,3. Therefore, understanding the value of job skills helps bridge the so-called "Skill Gap"4,5 between employers and talents, and gives both sides a competitive edge in coping with the accelerating pace of technological change. At the micro level, it can not only help individuals proactively assess their competencies and decide which skills are the right ones to learn, but also help companies design appropriate salary systems for their job positions to attract and retain the best possible talent. Moreover, at the macro level, job skill value is an important indicator of the economic equilibrium of the labour market and reflects the supply-demand relationship associated with knowledge investments6.

During the past decades, researchers have devoted substantial effort to assessing the value of job skills in different manners. Many surveys and studies have shown evidence of a worldwide positive association between the distributions of job skill mastery and job salary2,3,7,8. However, due to the dynamic and indistinct nature of job skill value, traditional market survey-based approaches usually fail to provide fine-grained and up-to-date analysis. In recent years, newly available online recruitment services have accumulated abundant job advertisement data9,10, which provides an unprecedented opportunity for Labour Market Intelligence11,12 and data-driven job skill analysis13,14. Nevertheless, most existing studies focus on job skill demand modeling4,5,15,16, and a quantitative way to assess the value of job skills from the perspective of their influence on job salary is still lacking.

Indeed, achieving quantitative job skill value assessment is far from a trivial task. On one hand, the value of a specific skill is not immutable but varies with respect to different job contexts. For example, talents experienced in algorithm-related skills command high-paid jobs at a high-tech AI company, while engineering skills may be the most valuable ones at a traditional software company. On the other hand, job skills are usually not isolated, but integrated with each other as a holistic requirement that decides the job salary. Along this line, the most critical challenge is that there is usually no ground-truth data of skill value for building an effective and quantitative assessment model. Therefore, how to separately assess the value of job skills and model their impact on job salary under various job contexts remains an open problem.

To this end, in this paper, we propose a data-driven solution to skill value assessment from a market-oriented perspective through mining job advertisement data. Specifically, we introduce a market-oriented definition of skill value, and formulate the task of skill value assessment as the Salary-Skill Value Composition Problem, where each job position is regarded as the composition of a set of required skills attached with the job's contextual information, and the job salary is assumed to be influenced by the context-aware value of these skills. Along this line, we propose an enhanced neural network with a cooperative structure, namely the Salary-Skill Composition Network (SSCN), to separate the job skills and measure their value from massive job postings. SSCN regards salary prediction as a cooperative task for skill valuation and holistically models the relationship between skills and the job salary, considering both skill value and domination. Figure 1 shows a schematic diagram of the key idea in this study. Indeed, SSCN provides a cooperative framework for training neural network models for knowledge discovery from unlabeled data, by quantitatively linking them with a supervised learning task. Extensive experiments on a real-world dataset clearly validate that SSCN can not only assign meaningful value to job skills in various job contexts, but also outperform state-of-the-art models in terms of job salary prediction. Meanwhile, based on the results of SSCN, many interesting findings can be revealed, such as which skills lead to high-paid jobs.

Fig. 1: A schematic diagram of the key idea in this study.

(1) Our main task is to train a skill valuation model with machine learning technology. Under the paradigm of supervised learning, we need a set of training data with explicit labels of skill value to provide supervision for the model. The model can then learn a function that maps the input (i.e., context and skills) to the observation (i.e., skill value). However, labeled skill value data are unavailable in our dataset. (2) We have abundant job posting data with salary labels, which can provide supervision for training a salary prediction model. Therefore, with the intuition that valuable skills should lead to high job salaries, we regard salary prediction as a cooperative task that provides indirect supervision for the skill valuation model. (3) We propose a model, SSCN, to achieve the skill valuation and salary prediction tasks simultaneously, where the skill valuation model is a component of the salary prediction model. Specifically, SSCN estimates the skill value and composes skill value into job salary. In this way, the skill valuation model can be trained with feedback from the salary prediction task.
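The indirect-supervision idea above can be sketched in a few lines. In this minimal numpy sketch (an illustration only, not the authors' implementation), a table of per-skill values is the only learnable quantity, the predicted salary is simply the mean value of a job's required skills, and gradient steps on the salary loss are the only feedback; the jobs and numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(5.0, 15.0, 4)        # learnable per-skill values (K RMB)

# Toy job postings: (required skill ids, observed salary label in K RMB).
jobs = [([0, 1], 15.0), ([1, 2], 25.0), ([2, 3], 35.0), ([0, 3], 25.0)]

lr = 0.1
for _ in range(500):
    for skills, salary in jobs:
        pred = values[skills].mean()      # cooperative task: salary prediction
        grad = 2.0 * (pred - salary) / len(skills)
        values[skills] -= lr * grad       # feedback reaches the skill values

# Every composed salary now matches its label, although no skill value
# was ever labeled directly.
print(np.round(values, 1))
```

The same principle scales up in SSCN, where this per-skill table is replaced by context-aware neural networks.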

As a long-standing research direction, the value of job skills in the market has always been abstract, with different measurements for different application scenarios4,17. Different from existing studies, in this paper, we introduce a market-oriented definition of skill value with job context awareness, emphasizing the direct impact of skills on job salary. To be specific, the value of a skill is defined as the expected salary of a job that only requires this skill, given a specific job context. It should be noted that in this paper, context refers to all the factors other than the skill requirement that can influence the job salary, such as the company, recruitment time, work location, and required working experience.

Indeed, the above definition directly measures how much salary a skill will bring when people make full use of it in their job. The motivation behind this definition is to guarantee that the values of different skills can be measured in an independent and comparable manner. In order to precisely estimate this value under various job contexts, we train a model f with parameters Θ that calculates the skill value v = f(s, lv, C | Θ) given a set of observable job contexts C and a skill s with level lv (i.e., the degree of mastery; refer to Fig. 2a for examples). To train the model, it is essential to obtain a set of training data containing job postings that require only one skill. However, in real-world scenarios, job requirements are always complicated and can rarely be fulfilled with only one skill. As a result, each job posting is always associated with multiple required skills, which makes it difficult to train the skill valuation model under the supervised learning paradigm.

Fig. 2: A schematic diagram of SSCN based skill valuation framework.

a An example of a job posting in our data, which consists of some structured contextual information (e.g., company name, timestamp of publishing, work location, and required working experience), the expected range of monthly salary (i.e., lower/upper bound in RMB), and a detailed job description that introduces the requirements on candidates' job skills. In particular, each skill usually has a descriptive requirement on the degree of mastery, such as Proficient in JavaScript and Familiar with AS. b We formulate the job posting as a set of skills formed into a skill graph, plus some contextual inputs. Our proposed SSCN estimates the skill values and combines them into the job salary. The colors gray, blue, yellow, and pink indicate inputs, model structures, outputs, and loss functions, respectively. c The detailed structure of CSVN. d The detailed structure of ASDN.

Fortunately, the job salary can be regarded as a mixed value of the corresponding required skills, and a job requiring many valuable skills should have a high salary. This intuition implies effective supervision for skill value assessment in an indirect way. In other words, if we can model the relationship between skill value and job salary, we can use job salary data to supervise the training of the skill valuation model. Specifically, the job postings can be formulated as \({\mathcal{J}}=\{({{\bf{C}}}_{{\bf{j}}},{{\bf{S}}}_{{\bf{j}}},{{\bf{Y}}}_{{\bf{j}}})| j=1,2,\cdots \ \}\), where Cj denotes a set of job contexts, Sj denotes the required skill set, and Yj denotes the job salary. In particular, Sj consists of the corresponding skill-level pairs \({{\bf{S}}}_{{\bf{j}}}=\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)})| i=1,2,\cdots \ \},\) where \({s}_{j}^{(i)}\) is a skill and \({{\rm{lv}}}_{j}^{(i)}\) is its level. If we have a model that can precisely estimate the salary Yj of a job posting given the values of its required skills, then a proper estimation of skill value leads to a good estimation of the job salary. So in this paper, we regard job salary prediction as a cooperative task for skill valuation. Formally, we define the task of this paper as the Salary-Skill Value Composition Problem, which aims to jointly learn a context-aware skill value assessment model f: (skill, context → value) and a skill-based salary prediction model g: (<skill, value> → salary) from the job posting set \({\mathcal{J}}\). It should be noted that, although there might exist more complicated relationships among job skills, context, and salary, in the problem formulation we only assume that the skill value is context-aware and can be combined linearly to reflect the job salary. In this way, our model can facilitate measuring the influence of contexts on individual skills as well as the influence of skills on job salary.

Based on the above, the salary of a job j can be formulated as \({\widetilde{y}}_{j}=g(\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)},{v}_{j}^{(i)})| i=1,2,\cdots \ \},{{\bf{C}}}_{{\bf{j}}}| {{\Phi }}),\) where Φ and Θ denote the parameters, \({v}_{j}^{(i)}=f({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)},{{\bf{C}}}_{{\bf{j}}}| {{\Theta }})\). By comparing the predicted job salary with the real salary, both the skill value assessment model f and skill-based salary prediction model g can be trained simultaneously.

To solve the Salary-Skill Value Composition Problem, we propose SSCN, a cooperative neural network with two modeling steps that achieves skill valuation (the main task) and salary prediction (the cooperative task) simultaneously. The structure of SSCN is shown in Fig. 2b. Specifically, SSCN takes a job posting as the input, calculates the value of all the involved skills, and then combines them into the job salary in a straightforward but interpretable way.

The first part of SSCN is a specially designed Context-aware Skill Valuation Network (CSVN), as shown in Fig. 2c. It dynamically models the skills, extracts the context-skill interaction, and estimates the context-aware skill value. According to our definition, skill value can be regarded as a special case of job salary, and since salary is given as a range in our data, CSVN models the skill value as a range. Specifically, CSVN assigns each skill a non-negative lower bound and a non-negative upper bound, with the constraint that the upper bound is no less than the lower bound.
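One simple way to realize such a constrained range output, an assumption for illustration rather than necessarily CSVN's exact parameterization, is to emit two unconstrained scores and map them through a softplus, predicting the lower bound plus a non-negative gap:

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def value_range(raw_lower, raw_gap):
    """Map two raw network outputs to a valid skill value range."""
    lower = softplus(raw_lower)           # non-negative lower bound
    upper = lower + softplus(raw_gap)     # upper bound never below the lower
    return lower, upper

lo, up = value_range(np.array([-1.0, 0.5, 2.0]), np.array([0.3, -2.0, 1.0]))
print(lo, up)
```

Whatever the raw scores are, the outputs satisfy 0 ≤ lower ≤ upper by construction.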

In real-world working scenarios, employees allocate their time and effort among skills according to the importance of different job duties. Intuitively, the more you use a specific skill during work, the more it influences your salary. Simulating this process, we propose to model the job salary as the weighted average of the skill values, and we call the weight skill domination. This agrees with our definition of skill value, because when a job involves only one skill, that skill has full domination and the salary degenerates into its value. In this way, skill values are comparable and independent of each other. Considering that skills may have combinatorial influences on salary, we let the model capture skill interactions through modeling the domination. Specifically, skill co-appearance is considered to influence the domination of each skill, which allows the model to peel off explainable skill values that are only context-dependent while maintaining the model's fitting ability on general job postings. To model the domination, the second part of SSCN is a specially designed Attentive Skill Domination Network (ASDN), as shown in Fig. 2d. Considering that skill domination can be affected by the related skills (e.g., one skill may play an important role in the job when many related skills are also required), ASDN models the domination with a graph-based approach. Specifically, we attach each job posting with a skill graph, where the nodes represent the involved skills and the edges between skills represent their relationships. ASDN combines this skill graph with the context-skill interaction information extracted from CSVN and calculates skill domination with a graph-based attention mechanism. Considering that the two salary bounds may correspond to different job duty allocations (for example, common skills may raise the salary lower bound instead of the upper bound), ASDN outputs different skill dominations for the two bounds.
The details of training both CSVN and ASDN can be found in “Methods”.
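ASDN itself is considerably more elaborate, but its core mechanics, namely attention scores masked by the job's skill graph, normalized into dominations, and used to mix skill values into a salary, can be sketched as follows (the embeddings, graph, and values are all made up for illustration):

```python
import numpy as np

def dominations(emb, adj):
    """emb: (n, d) skill representations; adj: (n, n) 0/1 co-appearance graph."""
    scores = emb @ emb.T                       # pairwise attention scores
    scores = np.where(adj > 0, scores, -1e9)   # attend only along graph edges
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)    # row-wise softmax
    d = attn.sum(axis=0)                       # attention each skill receives
    return d / d.sum()                         # dominations sum to one

emb = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
adj = np.ones((3, 3))                          # fully connected skill graph
d = dominations(emb, adj)
salary = float(d @ np.array([20.0, 15.0, 30.0]))  # weighted average of values
print(d, salary)
```

Because the dominations sum to one, the predicted salary is a weighted average of the skill values, so a job with a single skill degenerates to that skill's value, matching the definition given earlier.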

Indeed, SSCN models the relationships among skills, context, and salary based on observations of job advertisement data in an end-to-end manner. As a common issue of deep learning models, all the influencing factors and their complicated relationships are implicitly modeled as a black box, which is hard to interpret in a theoretical way. Nevertheless, this also brings the advantage that we only need to pay attention to the input (i.e., context and job skills) and output (i.e., job salary and skill value), while other latent influencing factors and relationships are automatically learned by the hidden layers. In this way, the model is easy to operate, and meanwhile, the skill value influenced by observable contexts can be explicitly estimated, which strongly supports further explainable analysis.

Results

To validate the models proposed in this paper, we collected IT-related job postings from a popular online recruitment website in China, namely Lagou (https://www.lagou.com/). Our dataset contains over 800,000 postings of various job positions across a time span of 36 months, ranging from July 2016 to June 2019. After filtering the data with several preprocessing steps, we obtained 215,308 samples, which we used to train and validate our model. The details of data preprocessing, feature selection, network configurations, numerical statistics, and additional experimental results can be found in Methods and Supplementary Information. In particular, we also conducted supplementary experiments on an additional designer-related job posting dataset to validate the generalization of our model.

Skill value analysis under different job contexts

Here we demonstrate the value of skills estimated by CSVN under different kinds of job contexts. During our experiments, we found that the lower bound and upper bound of skill value always follow a similar trend, so we mainly report the results for the lower bound, unless noted otherwise.

We define level influence as the average ratio of value increase when a level is specified. Figure 3a shows the levels' average influence (see Supplementary Fig. S8a for the influence distribution), where we used all the skill-level pair instances involving each level for the estimation. Detailed information on sample size and influence distribution can be found in Supplementary Table S10. We can observe that CSVN can significantly distinguish the impact of different levels. In general, most levels have a similar influence on both bounds, and sophisticated levels raise skill value more. In particular, the level Can Read, i.e., the lowest degree of mastery in our dataset, decreases the skill value by 10%, while the level Versatile contributes about a 10% increase to the value. To gain more insight, we show level influence on some specific skills in Table 1. In addition, we conducted significance tests to better validate the results. It can be observed that, ignoring the insignificant entries (i.e., p-value > 0.05), the table is generally consistent with the averaged influence. Nevertheless, the model also learns bias in some special cases. For example, while Know is a relatively low level of mastery, it has a positive influence on skill value when describing JavaScript. The reason is that while JavaScript mostly appears in jobs related to web development, the statement Know JavaScript usually acts as an additional requirement for some complicated and higher-paid jobs like architecture design. Therefore, the model overestimates the skill value due to the imbalanced data distribution. Indeed, this result is explainable from a market-oriented view. Specifically, the mastery level of a specific skill usually indicates the role it plays in the job; therefore, the skill value highly depends on the market pricing of the relevant jobs. However, as shown in Fig. 3a, the model still works for the general cases.
Furthermore, we calculated the ratio of skill-level observations that might cause the biased level influence estimations. The result shows that only very few samples (0.96% of the whole dataset) encounter this bias. The detailed calculation can be found in the Supplementary Information. A possible solution for alleviating this kind of bias is to enlarge the diversity of the recruitment market data, which is a valuable direction for our future studies. Supplementary Fig. S6a shows the level influence on the designer dataset. The result slightly differs from the result on the IT dataset, which further indicates that level influence varies with respect to occupations.
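The level-influence statistic itself is straightforward to compute: for each mastery level, average the relative deviation of an instance's value from its skill's mean value. The toy (skill, level, value) records below are made up for illustration:

```python
import numpy as np
from collections import defaultdict

records = [("Java", "Proficient", 22.0), ("Java", "Know", 18.0),
           ("Java", "Know", 20.0), ("Python", "Proficient", 27.5),
           ("Python", "Know", 22.5)]

# Mean value of each skill across all of its instances.
skill_values = defaultdict(list)
for skill, _, v in records:
    skill_values[skill].append(v)
skill_mean = {s: float(np.mean(vs)) for s, vs in skill_values.items()}

# Level influence: mean relative deviation from the skill's average value.
ratios = defaultdict(list)
for skill, lv, v in records:
    ratios[lv].append((v - skill_mean[skill]) / skill_mean[skill])
influence = {lv: float(np.mean(r)) for lv, r in ratios.items()}
print(influence)
```

Here Proficient raises value by 10% on average, while Know lowers it, mirroring the pattern described above.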

Fig. 3: Skill valuation concerning different job contexts.

a We calculated the influence of level lv as \({r}_{s}^{{\rm{lv}}}=\frac{{\sum }_{i,j}{\mathbb{1}}\{{{\rm{lv}}}_{j}^{(i)}={\rm{lv}}\}({v}_{j}^{(i)}-{v}_{{s}_{j}^{(i)}})/{v}_{{s}_{j}^{(i)}}}{{\sum }_{i,j}{\mathbb{1}}\{{{\rm{lv}}}_{j}^{(i)}={\rm{lv}}\}}\), where \({v}_{{s}_{j}^{(i)}}\) denotes the averaged value of skill \({s}_{j}^{(i)}\). We also show the 95% confidence interval (CI) in the figure, where data are presented as mean values ± CI. We use different colors to indicate level influence on the different bounds. b CSVN assigns the skills temporal embeddings to capture their dynamic changes; we show the average value of some skills at different time intervals. The shadow shows the 95% confidence interval, where data are presented as mean values ± CI. We use different colors to indicate different skills. c The value of some randomly selected skills with different lengths of working experience. The shadow shows the 95% confidence interval, where data are presented as mean values ± CI. We use different colors to indicate different skills. d To analyze the value of skills with respect to different companies, we draw the value distributions of some popular skills in five famous Chinese Internet companies as boxplots. The box shows the quartiles of the dataset. The whiskers extend to show the rest of the distribution except for outliers. Specifically, as a common practice, we regarded the samples outside 1.5 times the interquartile range (IQR) above the upper quartile or below the lower quartile as outliers. We use different colors to indicate different skills.

Table 1 The level influence on 6 kinds of programming skills.

In this study, time is also regarded as one kind of job context. CSVN assigns the skills temporal embeddings, which supports dynamic skill value analysis. From Fig. 3b, we can observe that fluctuations exist in skill value, and the skills have different trends of value change (see Supplementary Table S12 for numerical statistics). Some interesting findings can also be observed from the figure. On the whole, Architecture has a relatively stable trend of value increase. Specifically, in 2016-H2, its value was 21.8 K RMB on average. It then increased by about 5% every half-year and reached 27.6 K RMB in 2019-H1. This indicates a rising market demand for this skill, which is good news for architects. However, some hot skills like GoLang and Recommender System seem less stable. In particular, GoLang shows sharp value increases and decreases. For example, in 2019-H1, its value decreased by 26%, from 28.2 K RMB to 20.8 K RMB on average. This reminds students not to simply pursue the hottest new skills on the market, because the related industries may still be unstable. According to our experiments, many skills with high value experienced a value decrease in the first half of 2019. We conjecture that this phenomenon is due to the so-called Internet Winter of China in that year. The trend of value for designer skills can be found in Supplementary Fig. S6b. Interestingly, the designer skills are stable and show no general value decrease in the first half of 2019, which indicates that recent market changes have had more influence on IT practitioners than on designers.

Skill value under different experience requirements can provide talents with a long-term reference for choosing skills to learn. CSVN considers working experience requirements as one kind of job context and has a strong ability to infer experience-aware value, even for new skills. For example, although GoLang was only officially released in 2009, we can still estimate its value with working experience of longer than 10 years as 32.0 K RMB by smoothly extending the line. Figure 3c shows that longer experience leads to higher skill value (see Supplementary Table S13 for numerical statistics). Compared with graduates, 10 years of working experience increases the skill value by 2.5 times on average. This is reasonable, because a highly experienced talent can usually get a higher salary. But the speed of value growth differs among skills. For example, Architecture and Project Management increase slowly in the first several years, but quickly after 3–5 years. Specifically, although Algorithm has a higher value (12.8 K RMB) for graduates, in the long term, the value of Project Management (10.2 K RMB for graduates) increases faster and reaches a similar value to Algorithm after 10 years. Similarly, Machine Learning has a higher value (16.8 K RMB) than Architecture (16.4 K RMB) for graduates and increases fast in the first several years. It can be observed that, with 1–3 years' experience, the value of Machine Learning (24.2 K RMB) is 20% higher than that of Architecture (19.9 K RMB). However, the rank is reversed after 5 years. This result makes sense, because ability in Architecture and Project Management accumulates during work, while talents' programming skills usually improve fast in the first several years of their career and may decline as they get older. We can conclude that CSVN can provide good experience-aware skill value assessment.
This gives students a reference for considering their longer-term career when choosing a skill to learn, instead of only comparing job salaries at the entry level. In addition to skills that earn a fortune at the moment of graduation, learning skills that will be valuable in the future may also be a good choice. We also show the experience influence on designer skills in Supplementary Fig. S6c, which shows a similar trend to that of the IT dataset.

For job seekers, the best choice is to work at companies that treasure the skills they possess. Figure 3d shows skill value distributions in different companies, where we used all the instances of each skill-company pair for the estimation. Detailed information on sample size and numerical statistics can be found in Supplementary Table S14. It can be observed that, due to differences in business strategy, skills are valued differently by different companies, which reveals the traits of the companies. For example, while most of these companies give a much higher value to Architecture than to Algorithm, ByteDance values them similarly. Besides, ByteDance is the only company that values Python (23.9 K RMB on average) more than Java (21.0 K RMB on average). This implies that ByteDance attaches high importance to research work. At JD.com, Java has a wider value distribution than at the other companies. Specifically, the gap between the two quartiles of Java at JD.com is 13 K RMB, which is much larger than the gaps of 7 K RMB at the other four companies. This implies a higher possibility of salary increases for a Java engineer at JD.com. Meanwhile, different from the others, the value of skills at Baidu is quite stable, which suggests that its demand for different skills is more balanced. In Supplementary Fig. S6d, we show the distributions of designer-related skill value at these companies. It can be observed that the companies also have different preferences in designer-related skills.

Evaluation on salary prediction

We compared the performance of SSCN on salary prediction with several baseline methods (see details in "Methods"). The performance is evaluated with root mean square error (RMSE) and mean absolute error (MAE)18, which are both popular metrics for measuring the difference between observations and predictions. The evaluation results are listed in Table 2, and there are several observations. First, SSCN outperforms all the baseline models, especially in terms of RMSE, with a 3.5% decrease on lower bound prediction and a 5.2% decrease on upper bound prediction compared to BERT, which outperforms the rest of the baseline models. Although SSCN has a larger variance due to its complex structure, its worst performance is still significantly better than the others' best performances. Second, SSCN outperforms the linear models (i.e., SVM and LR). To preserve the physical meaning of the skill value, SSCN simplifies the last layer of skill composition into a linear form. However, SSCN is still a non-linear deep learning model that can capture the complicated relationships between skill, context, and salary, so it performs much better than the truly linear models. Third, since accurately predicting context-aware job salary is a more difficult problem than standard salary benchmarking, HSBMF does not perform well, while SSCN can achieve more accurate salary prediction under specific job contexts. Fourth, replacing ASDN with a mean pooling layer decreases the model's performance considerably, which demonstrates the effectiveness of skill domination for job salary modeling. Fifth, simultaneously estimating the two bounds of the range in a single model improves performance. This is because the lower bound and upper bound of job salary are strongly correlated. In addition to imposing constraints on the bounds, CSVN also extracts a shared shallow representation for them.
In this way, the two bounds can get part of their supervision from each other, which reduces the chance of over-fitting. The experimental results on salary prediction on the designer dataset can be found in Supplementary Table S8 and are consistent with the results on the IT dataset. Furthermore, we conducted parameter experiments to demonstrate the robustness of our model, which can be found in Supplementary Fig. S5 and Supplementary Table S7. The results show that SSCN is insensitive to its parameters and can be easily adopted without carefully tuning the hyper-parameters.
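For reference, the two metrics in Table 2 are computed as follows, shown here on a made-up pair of salary observations and predictions (in K RMB):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

y_true = np.array([15.0, 20.0, 30.0])
y_pred = np.array([14.0, 22.0, 27.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred))  # ≈ 2.16 and 2.0
```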

Table 2 Performance evaluation on salary prediction.

It can be concluded that, with the cooperation of the salary prediction task, SSCN trains a quantitative and accurate skill valuation model without using any labeled skill value data. Since skill valuation is an essential component of job salary prediction in SSCN, SSCN's performance on job salary prediction also quantitatively demonstrates the effectiveness of our skill valuation model.

Discussion

With the Salary-Skill composition structure, SSCN decomposes the job salary into the values of the involved skills by modeling skill domination. Here, we analyze this composition process holistically and show the effect of its factors.

Skill domination versus skill value

The product of a skill's value and domination in a job posting is its actual contribution to the salary. To analyze the effect of domination and value, we display the averaged value, domination, and salary contribution of machine learning-related skills in Fig. 4. The numerical statistics can be found in Supplementary Tables S16–S18. On the whole, more generic skills have higher domination, while more specific skills have higher value. For example, Unsupervised Learning (with domination 37.8% on average) and Multivariable Regression (with domination 46% on average) have high domination, showing that many jobs need them. Graph Algorithm (with domination 18.2% on average) has lower domination but higher value (35.2 K RMB on average), indicating that although fewer jobs can make full use of it, you can easily get a high salary if you find one. Indeed, most jobs in the market are not so specialized and are dominated by generic skills. In these jobs, some high-value skills may also be involved, but they are usually not a major part of the work. Also, the new skills rapidly emerging with fast technological change enlarge the skill gap between job candidates and employers19. As a result, from the viewpoint of employers, although it is usually difficult to find candidates who perfectly meet their specific skill requirements, talents with generic skills are usually able to quickly learn and adapt to the required skills20. Accordingly, higher education in recent years has been focusing on teaching theoretical and basic knowledge and cultivating students' learning ability and problem-solving skills rather than teaching specific skills21. This phenomenon enlarges the domination of the more generic skills in the job market.

Fig. 4: Visualizations of skill salary decompositions.

a We calculated the averaged context-aware skill value estimated by CSVN and drew a word cloud of machine learning-related skills, where the size of each word shows the skill value. b Similar to the word cloud of skill value, we drew a word cloud for the averaged skill domination estimated by ASDN. c For each skill in a job posting, we can calculate its actual salary contribution as the product of its value and domination; we show the averaged contribution of each skill in the word cloud. d A case study on a job posting, where the role of each skill is analyzed by calculating its domination, contribution, and influence on job salary. The color of words in the job description shows the skills' influence on salary: blue/yellow/red means the salary will increase/remain/decrease upon dropping the skill. The pie plots show skill domination and contribution, where the colors distinguish different skills.

Our experimental results imply that the breadth of your knowledge decides how easily you can find a job, while the depth of your skills helps to raise your salary. Choosing a skill to learn thus becomes a trade-off between domination and value, for which the averaged contribution is a good reference. As shown in Fig. 4c, Topic Model (with contribution 8.5 K RMB on average) is a good learning choice. It should be noted that having a low averaged domination does not mean the skill never dominates a job. When you have excellent knowledge of some specific skills (which is often the case for Ph.D. students), you should be confident that you can find somewhere to make full use of your ability. Word clouds for the designer dataset can be found in Supplementary Fig. S7, where we can distinguish generic and specific skills for designer-related jobs.

The influence of skill on job salary

For a skill required in a job posting, we can estimate its influence by calculating how much the salary would decrease if we removed this skill from the requirement. By fixing the dominations of the other skills and taking the weighted average of their values, the new salary can be estimated as \(y^{\prime} =\frac{y-vd}{1-d},\) where v and d represent the value and domination of the removed skill. The ratio of decrease is \(r=\frac{y-y^{\prime} }{y},\) where y denotes the previous job salary. In Table 3, we can observe that, generally, high value and high domination lead to high influence. For example, Matrix Calculation has high value and high domination; by dropping it, the job salary decreases by 18.4% on average. According to this table, machine learning-related skills have a positive influence on job salary. We will show in the next part that some skills may have negative influences on job salary.
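As an illustration, the removal procedure above can be sketched in a few lines, assuming the salary is the domination-weighted sum of skill values (so dropping a skill removes its contribution v·d and renormalizes by the remaining weight 1 − d); the function name and interface are hypothetical, not from the paper's code:

```python
def influence_of_removal(values, dominations, idx):
    """Estimate how much the salary drops when skill `idx` is removed.

    Assumes y = sum(v_i * d_i) with dominations summing to 1; the
    remaining skills keep their relative dominations (renormalized
    by 1 - d). Illustrative sketch, not the paper's implementation.
    """
    y = sum(v * d for v, d in zip(values, dominations))
    v, d = values[idx], dominations[idx]
    y_new = (y - v * d) / (1.0 - d)   # weighted average of remaining skills
    r = (y - y_new) / y               # ratio of salary decrease
    return y_new, r
```

A negative ratio would mean the salary increases after removal, matching the "negative influence" skills discussed below.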

Table 3 Skill’s average influence on salary.

Case study on a job posting

Everyone wants a job where they can give full play to their ability. However, a job description may contain duties you are good at as well as duties you are not good at. Understanding the role of each required skill in a job can help job seekers decide whether a job is suitable for them. For each job posting, SSCN predicts the value of each skill under the specified context, calculates the skill dominations based on skill co-appearance, and finally combines the skill values into the job salary according to the dominations. Figure 4d shows a case study illustrating how SSCN works on a job posting. Specifically, we used the trained SSCN to decompose a randomly selected job posting and analyzed the domination, contribution, and overall influence on salary of the involved skills. This job description aims to employ an algorithm engineer with two parts of job duties, namely data mining with business data and product development. Compared with the coding skills, Machine Learning and Deep Learning have much higher domination and contribution on the job salary, indicating that the job expects a data mining expert instead of an experienced engineer. Though with similar domination, Deep Learning has a much higher contribution than Machine Learning because it has a higher value under this job context. We can also observe that Deep Learning contributes a lot to the upper-bound salary, which agrees with the job description, where Deep Learning is listed as an additional requirement. From the above analysis, we find that job seekers can try this job if they are good at machine learning and deep learning; there is no need to worry much if they are mediocre at coding. It can also be observed that the data mining duty has a positive influence on the salary while the development duty has a negative influence. This indicates that if you are an expert in data mining, you may look for a full-time data mining job, which may bring you a higher salary.

Potential applications

Through the experiments, we show that our skill valuation model has the potential to be applied to various real-world applications. First, our model can be applied to talent recruitment. As can be observed in Table 2, our model achieves high performance on salary prediction. Therefore, it can provide salary references for jobs in the market when the job descriptions are specified. With the predicted salary information, recruiters can evaluate the market competitiveness of their offered salaries, and job seekers can get an idea about their salary expectations. Second, our model can be directly applied to business market analysis. For example, as can be observed in Fig. 3b, our model reveals the overall trend of skill value in the market. Third, our model can be applied to student education. Specifically, the skill value provides students with market-oriented guidance for skill learning. For example, with the experience-aware skill value shown in Fig. 3c, students are able to make better personalized curriculum choices to achieve long-term career development. Fourth, our model can be applied to knowledge management and talent development. For example, as shown in Fig. 3d, companies can analyze the value of skills for their own business. Then, they can develop specific curricula to continuously train their employees in valuable skills. Fifth, our model can be applied to job recommendation. For example, by measuring the average value of skills in different companies, as shown in Fig. 3d, job seekers can receive effective guidance on which company is more suitable for them to pursue.

Technical contribution

Since indirect supervision is common in the real world, we believe that this work not only provides an intelligent and accurate solution to the skill valuation problem but also can inspire readers who work on data analysis in other application fields. Specifically, in many real-world scenarios, obtaining labeled training data is far from easy. It is often the case that we can only obtain indirect supervision from a related task. Learning a skill valuation model from job salary data is one of these problems: we have no labeled data of skill value, but we have job salary data as indirect supervision, with the intuition that high skill values usually lead to high job salaries. To this end, we proposed a machine learning-based solution that uses a neural network with a cooperative structure to model the relationship between jobs and skills, where salary prediction is regarded as a cooperative task for training the skill valuation model. In this way, we obtain an effective skill valuation model under the indirect supervision of job salary data.

Limitations

The first limitation of this paper is the limited data. On the one hand, since our work is based on job advertisements accumulated in online recruitment websites, which have a short history, we are not able to provide insights into long-term job skill development. On the other hand, since our research has certain requirements on data quality (e.g., detailed skill requirements, job salary, and contextual information), in this paper we only evaluated our model with two datasets collected from one of the largest and most popular Chinese online recruitment websites for Internet-related industries. This may bring bias to the analysis. If provided with larger-scale and more comprehensive data, our model would yield more significant insights. The second limitation is the empirical validation of skill value. Since market-oriented skill valuation is a new research problem, we are not able to obtain ground truth for quantitatively validating the accuracy of our model. Therefore, in this paper, we evaluated the performance of our model on the task of salary prediction. The rationale behind our evaluation is that, with the explicitly formulated relationship between salary and skill, the effectiveness of the skill values will be revealed by the salary prediction performance. In the future, we plan to continuously update our research by seeking more data sources and collaborations for further validating our model.

Methods

Job posting formulation

A summary of the notations in this paper can be found in Supplementary Table S2. As shown in Fig. 2a, we formulate a job posting Jj as (Cj, Sj, Yj), where Cj denotes a set of job contexts, Sj denotes the required skill set, and Yj denotes the expected range of job salary. \({{\bf{S}}}_{{\bf{j}}}=\{({s}_{j}^{(i)},{{\rm{lv}}}_{j}^{(i)})| i=1,2,\cdots \ \}\) is a set of skill-level pairs involved in the job description, where \({s}_{j}^{(i)}\) is a skill and \({{\rm{lv}}}_{j}^{(i)}\) is its level of mastery, for example, Proficient in JavaScript. Considering that the relations between involved skills may affect job salary (e.g., a skill affects the job salary more if many skills related to it are also required), we attach to Sj a skill graph Aj, where each node represents an involved skill and the edge weights represent the co-appearing relations between skills.
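As a concrete illustration of this formulation, a posting Jj = (Cj, Sj, Yj) together with its skill graph Aj could be laid out as below; all field names and example values are hypothetical, not taken from the paper's data schema:

```python
from dataclasses import dataclass, field

@dataclass
class JobPosting:
    """One posting J_j = (C_j, S_j, Y_j) plus its skill graph A_j (sketch)."""
    contexts: dict    # C_j, e.g. {"city": ..., "experience": ...}
    skills: list      # S_j: (skill, mastery level) pairs
    salary: tuple     # Y_j: (lower bound, upper bound), e.g. in K RMB
    skill_graph: dict = field(default_factory=dict)  # A_j: (s_i, s_j) -> weight

posting = JobPosting(
    contexts={"city": "Beijing", "experience": "3-5 years"},
    skills=[("JavaScript", "Proficient"), ("Python", "Familiar")],
    salary=(15, 25),
)
```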

Data preprocessing

We extracted 14 level words and 1374 IT-related skill words, so that the job descriptions could be formulated into structured records. The detailed description of data preprocessing can be found in Supplementary Information. Then, we counted the co-appearing frequency of every pair of skills in the job advertisements. If the frequency was larger than a pre-defined threshold, we added an edge between the two skills, whose weight is the normalized co-appearing frequency. To reduce noise, we first kept only full-time job postings. Next, we ranked the cities according to the number of samples they involve and kept the job postings of the top 16 cities, which cover over 90% of the data. Then, we dropped the records whose upper-bound or lower-bound salary is a boxplot outlier22 in the dataset (see Supplementary Fig. S3 for the salary distribution). Finally, we ranked the companies according to the number of involved samples and kept the job postings of the top 1000 companies. After the above preprocessing, we obtained 215,308 job postings. Based on the observable contexts, we extracted continuous and discrete features to form the input of the model. The detailed description of feature extraction can be found in Supplementary Table S4.
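The graph-building step can be sketched as follows. The threshold value and the max-count normalization are assumptions for illustration only; the paper states a pre-defined threshold and normalized co-appearing frequencies without giving the exact normalization:

```python
from collections import Counter
from itertools import combinations

def build_skill_graph(postings, threshold=2):
    """Count pairwise skill co-appearances across postings and keep edges
    whose count exceeds `threshold`; weights are counts normalized by the
    maximum kept count (an assumed normalization)."""
    counts = Counter()
    for skills in postings:
        for a, b in combinations(sorted(set(skills)), 2):
            counts[(a, b)] += 1
    kept = {pair: c for pair, c in counts.items() if c > threshold}
    if not kept:
        return {}
    max_c = max(kept.values())
    return {pair: c / max_c for pair, c in kept.items()}
```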

Overall process

The pseudocode of the overall model training and applying process of this paper can be found in Algorithm 1.

Algorithm 1. Overall process

Require: Dtrain: training set; Dtest: testing set; η: learning rate; MaxIter: the number of training iterations.

  1: Build model \({\mathcal{M}}\) with initial parameter Φ;
  2: /**Training**/
  3: for it = 1 to MaxIter do
  4:   Sbatch ← randomly split Dtrain into batches;
  5:   for each Dbatch ∈ Sbatch do
  6:     dΦ = 0;
  7:     for each (skillset, context, y) ∈ Dbatch do
  8:       \(d{{\Phi }}=d{{\Phi }}+\frac{\partial Loss({\mathcal{M}}(skillset,context;{{\Phi }}),y)}{\partial {{\Phi }}}\);
  9:     Φ = Φ − ηdΦ;
10: /**Validation**/
11: Ypred, Ytrue ← empty lists;
12: for each (skillset, context, y) ∈ Dtest do
13:   Predict salary range \(\widetilde{y}={\mathcal{M}}(skillset,context;{{\Phi }})\);
14:   Store \(\widetilde{y}\) in Ypred;
15:   Store y in Ytrue;
16: Calculate MAE(Ypred, Ytrue) and RMSE(Ypred, Ytrue);
17: /**Value Estimation**/
18: for each (skillset, context, y) ∈ Dtrain ∪ Dtest do
19:   for each (level, skill) ∈ skillset do
20:     Estimate value v and domination d for (level, skill, context) with \({\mathcal{M}}(\cdot ;{{\Phi }})\);
21: Analyze skill value;
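The training phase of Algorithm 1 (lines 3–9) can be mirrored in plain Python. The parameter list and `grad_fn` are illustrative stand-ins for SSCN's parameters Φ and the loss gradient ∂Loss/∂Φ, which would come from automatic differentiation in practice:

```python
import random

def train(phi, grad_fn, data, lr=0.01, max_iter=100, batch_size=32):
    """Accumulate per-sample gradients over each batch, then take one
    gradient-descent step (a sketch of Algorithm 1's training loop).
    `phi` is a flat parameter list; `grad_fn(phi, sample)` stands in for
    dLoss(M(skillset, context; phi), y)/dphi."""
    data = list(data)
    for _ in range(max_iter):
        random.shuffle(data)
        for start in range(0, len(data), batch_size):
            d_phi = [0.0] * len(phi)
            for sample in data[start:start + batch_size]:
                d_phi = [a + b for a, b in zip(d_phi, grad_fn(phi, sample))]
            phi = [p - lr * g for p, g in zip(phi, d_phi)]
    return phi
```

For instance, with a one-parameter linear model y = φx and squared loss, this loop converges to the least-squares slope.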

Context-aware skill valuation network

Temporal skill embedding

Considering that the skills’ traits change over time, CSVN assigns temporal embeddings to skills at each time interval. To reduce model complexity, we use the idea of Matrix Factorization23 and assume the skill embedding is composed of a low-ranked embedding and a latent projection matrix. Formally, \({{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}={({{\bf{W}}}^{{\bf{us}}})}^{({\bf{t}})}{{\bf{W}}}^{{\bf{vs}}},\quad t=1,2,\cdots \ ,T,\) where \({{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}\in {{\mathbb{R}}}^{{N}_{s}\times {\rm{de}}}\) stores the skill embeddings of the t-th time interval, T is the number of time intervals, Ns denotes the size of the skill vocabulary, \({({{\bf{W}}}^{{\bf{us}}})}^{({\bf{t}})}\in {{\mathbb{R}}}^{{N}_{s}\times {\rm{dl}}}\) stores the low-ranked skill embeddings of the t-th time interval, \({{\bf{W}}}^{{\bf{vs}}}\in {{\mathbb{R}}}^{{\rm{dl}}\times {\rm{de}}}\) is the latent projection shared by all the time intervals, de is the embedding dimension, and dl is the number of latent factors. Though the temporal embedding gives CSVN the ability to model skills’ dynamic changes, it brings higher model complexity. To avoid over-fitting, we add a temporal regularization to the model, formulated as

$${L}_{t}=\mathop{\sum }\limits_{t=1}^{T-1}\parallel {{\bf{E}}}_{{\bf{s}}}^{({\bf{t}}+{\bf{1}})}-{{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}{\parallel }_{F},$$
(1)

where \({\parallel \cdot \parallel }_{F}\) denotes the Frobenius norm. Lt constrains the temporal embeddings from changing sharply. With the temporal skill embedding, our model can capture the development and change of skill semantics over time while maintaining low model complexity. However, it should also be noticed that our model is not a forecasting model, as training data of each time period is needed to train the corresponding embedding.
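A toy version of the factorized temporal embedding and the regularizer of Eq. (1), using plain lists of lists in place of tensors (function names are illustrative):

```python
import math

def temporal_skill_embeddings(w_us_per_t, w_vs):
    """E_s^(t) = (W^us)^(t) W^vs: per-interval low-rank embeddings projected
    through a matrix shared by all time intervals."""
    def matmul(a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]
    return [matmul(w_t, w_vs) for w_t in w_us_per_t]

def temporal_regularizer(embeddings):
    """L_t of Eq. (1): sum of Frobenius norms of consecutive differences,
    penalizing skill embeddings that change sharply between intervals."""
    total = 0.0
    for e_prev, e_next in zip(embeddings, embeddings[1:]):
        sq = sum((x - y) ** 2
                 for r1, r2 in zip(e_prev, e_next)
                 for x, y in zip(r1, r2))
        total += math.sqrt(sq)
    return total
```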

Skill-context interaction extraction

To increase fitting ability, CSVN takes both continuous context vectors (e.g., salary statistics of a city) and discrete contexts (e.g., city index) as inputs. Then, inspired by the famous CTR prediction model DeepFM24 from the field of recommender systems, CSVN extracts both deep and shallow interactions between these job contexts and the skill. Specifically, the input contexts are processed in different manners and go through linear projection, multiplicative operation, and a Multi-Layer Perceptron (MLP) to extract interactions of different orders. Formally, each continuous context \(i\in {\mathcal{C}}\) is input as a feature vector, written as \({{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}\in {{\mathbb{R}}}^{{d}_{i}}\), where \({\mathcal{C}}\) denotes the continuous job contexts. Each discrete context \(i\in {\mathcal{D}}\) is input as an index, which CSVN encodes into a one-hot representation \({{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}\in {{\mathbb{R}}}^{{m}_{i}}\), where mi is the maximum possible value of this context. Then the linear projection extracts the first-order interaction as

$${{\bf{h}}}_{{\bf{1}}}=\mathop{\sum} \limits_{i\in {\mathcal{C}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{cl}}}{{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}+\mathop{\sum} \limits_{i\in {\mathcal{D}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{dl}}}{{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}+{{\bf{W}}}^{{\bf{sl}}}{{\bf{e}}}^{{\bf{s}}}+{{\bf{b}}}^{{\bf{l}}},$$
(2)

where \({{\bf{e}}}^{{\bf{s}}}\in {{\mathbb{R}}}^{{\rm{de}}}\) denotes the input skill’s current embedding vector, \({{\bf{W}}}_{{\bf{i}}}^{{\bf{cl}}}\in {{\mathbb{R}}}^{{d}_{o1}\times {d}_{i}}\), \({{\bf{W}}}_{{\bf{i}}}^{{\bf{dl}}}\in {{\mathbb{R}}}^{{d}_{o1}\times {m}_{i}}\), \({{\bf{W}}}^{{\bf{sl}}}\in {{\mathbb{R}}}^{{d}_{o1}\times {\rm{de}}}\) and \({{\bf{b}}}^{{\bf{l}}}\in {{\mathbb{R}}}^{{d}_{o1}}\) are the trainable parameters, and do1 is the output dimension. Then, the multiplicative operation extracts the second-order interactions. Specifically, each discrete context \(i\in {\mathcal{D}}\) is first assigned an embedding \({{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}={{\bf{o}}}_{{\bf{i}}}^{{\bf{d}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{e}}},\) where \({{\bf{W}}}_{{\bf{i}}}^{{\bf{e}}}\in {{\mathbb{R}}}^{{m}_{i}\times {\rm{de}}}\) stores the value embeddings of context i. For each continuous context \(i\in {\mathcal{C}}\), we project the feature vector into the same space as the discrete job contexts, written as \({{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}={{\bf{o}}}_{{\bf{i}}}^{{\bf{c}}}{{\bf{W}}}_{{\bf{i}}}^{{\bf{p}}}+{{\bf{b}}}^{{\bf{p}}},\) where \({{\bf{W}}}_{{\bf{i}}}^{{\bf{p}}}\in {{\mathbb{R}}}^{{d}_{i}\times {\rm{de}}}\) and \({{\bf{b}}}^{{\bf{p}}}\in {{\mathbb{R}}}^{{\rm{de}}}\) are trainable parameters. The multiplicative operation is formulated as

$${{\bf{h}}}_{{\bf{2}}}=\mathop{\sum}\limits _{i\in {\mathcal{C}}}\mathop{\sum}\limits _{i\ne j,j\in {\mathcal{C}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{c}}}+\mathop{\sum}\limits _{i\in {\mathcal{D}}}\mathop{\sum}\limits _{i\ne j,j\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{d}}}+\mathop{\sum}\limits _{i\in {\mathcal{C}}}\mathop{\sum}\limits _{j\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}\odot {{\bf{e}}}_{{\bf{j}}}^{{\bf{d}}}+{{\bf{e}}}^{{\bf{s}}}\odot \left(\mathop{\sum}\limits _{i\in {\mathcal{C}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{c}}}+\mathop{\sum}\limits _{i\in {\mathcal{D}}}{{\bf{e}}}_{{\bf{i}}}^{{\bf{d}}}\right),$$
(3)

where \(\odot\) denotes element-wise multiplication. At last, an MLP composed of several fully connected layers extracts the higher-order information, formulated as

$${{\bf{x}}}^{({\bf{0}})}={{\bf{o}}}_{{\bf{0}}}^{{\bf{c}}}| {{\bf{o}}}_{{\bf{1}}}^{{\bf{c}}}| \cdots | {{\bf{e}}}_{{\bf{0}}}^{{\bf{d}}}| {{\bf{e}}}_{{\bf{1}}}^{{\bf{d}}}| {{\bf{e}}}^{{\bf{s}}},\quad {{\bf{x}}}^{({\bf{k}})}=\sigma ({{\bf{x}}}^{({\bf{k}}-{\bf{1}})}{({{\bf{W}}}^{{\bf{m}}})}^{({\bf{k}})}),\quad k=1,2,\cdots \ ,K$$
(4)

where K is the depth, \({({{\bf{W}}}^{{\bf{m}}})}^{({\bf{k}})}\in {{\mathbb{R}}}^{{d}_{m}^{(k-1)}\times {d}_{m}^{(k)}}\) and x(k) denote the parameter and the output of the k-th layer, σ denotes the activation function, and \(|\) denotes the concatenation of two vectors. We set the final output x(K) as the high-order interaction h3.

To provide a context-skill representation for domination modeling, this MLP has a multi-head structure. Specifically, since the outputs of the shallow layers are general context-skill interactions, while the whole MLP extracts value-related information, the output of a shallow middle layer is fed into ASDN to extract domination-related features. The details will be described in the salary prediction part.
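A simplified sketch of the multiplicative operation of Eq. (3): for brevity it treats all contexts as one list of equal-dimension embeddings rather than separating continuous and discrete contexts, so the three pairwise sums collapse into one:

```python
def second_order_interaction(context_embs, skill_emb):
    """Sum of element-wise products of all ordered pairs of context
    embeddings, plus the skill embedding times the sum of context
    embeddings (a merged toy version of Eq. (3))."""
    dim = len(skill_emb)
    h2 = [0.0] * dim
    for i, ei in enumerate(context_embs):
        for j, ej in enumerate(context_embs):
            if i != j:
                h2 = [h + x * y for h, x, y in zip(h2, ei, ej)]
        # accumulate e^s ⊙ e_i as part of the skill-context term
        h2 = [h + s * x for h, s, x in zip(h2, skill_emb, ei)]
    return h2
```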

Constrained value range modeling

CSVN estimates the value range by predicting its bounds. To ensure that the predicted bounds form a meaningful value range, we impose two constraints. First, since skill value is a special case of salary, its lower bound must be non-negative. Second, the upper-bound value must be no less than the lower-bound value. We concatenate the extracted interactions of different orders, then estimate the range with two constrained linear projections, formulated as

$${v}^{l}=[{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{l}}}+{b}^{l},\quad {v}^{u}=[{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{u}}}+{b}^{u},\quad {\rm{s.t.}}\quad 0\le {v}^{l}\le {v}^{u}.$$
(5)

Since vl and vu are intermediate variables of SSCN, the whole training process becomes a constrained optimization. However, it is hard for deep learning models to deal with constraints. Though we could add a soft constraint regularization to the loss function, it cannot guarantee that the constraints are strictly satisfied and can easily cause the model to fail to converge. To avoid constrained optimization and enable gradient descent, we adjust the network structure so that the constraints are naturally satisfied. Specifically, we add a non-negative activation to the lower-bound output, formulated as

$${v}^{l}=\max ([{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{l}}}+{b}^{l},0).$$
(6)

Next, instead of directly predicting the upper-bound value, we change the task of the second linear projection to outputting the gap p between the bounds, so the upper bound is calculated as vu = vl + p. The upper bound is guaranteed to be no smaller than the lower bound if we constrain the gap to be non-negative, formulated as \(p=\max ([{{\bf{h}}}_{{\bf{1}}}| {{\bf{h}}}_{{\bf{2}}}| {{\bf{h}}}_{{\bf{3}}}]{{\bf{W}}}^{{\bf{g}}}+{b}^{g},0).\)
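The constraint trick condenses to two max(·, 0) activations. In this sketch the concatenated interactions and weights are collapsed to 1-D vectors standing in for [h1 | h2 | h3], W^l, and W^g:

```python
def predict_value_range(h, w_l, b_l, w_g, b_g):
    """Predict a valid value range without constrained optimization:
    max(., 0) keeps the lower bound non-negative, and the second head
    predicts a non-negative gap p so v_u = v_l + p >= v_l always holds."""
    dot = lambda w, x: sum(a * b for a, b in zip(w, x))
    v_l = max(dot(w_l, h) + b_l, 0.0)   # non-negative lower bound
    p = max(dot(w_g, h) + b_g, 0.0)     # non-negative gap
    return v_l, v_l + p
```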

Attentive skill domination network

In Fig. 2d, we show the structure of ASDN. ASDN uses features extracted by CSVN as its input, denoted by IA. From IA, it independently extracts two kinds of skill representations with MLPs. First, ASDN extracts an importance representation for each skill, which captures the traits of the skill that impact its domination; e.g., some skills may be common and therefore easily become the major part of a job. Meanwhile, ASDN extracts an influence representation for each skill to model the skills’ influence on each other’s domination. We use \({{\bf{X}}}_{{\bf{imp}}}\in {{\mathbb{R}}}^{N\times {\rm{dp}}}\) and \({{\bf{X}}}_{{\bf{inf}}}\in {{\mathbb{R}}}^{N\times {\rm{di}}}\) to denote the importance and influence representations, where each row is a skill’s representation and N denotes the number of appeared skills.

ASDN supposes the domination of a skill is affected by three factors: its own importance, the global influence from all the skills, and the local influence from related skills. The global influence is calculated as the averaged influence vector of all the skills, written as \({\bf{Q}}=\frac{{{\mathbb{1}}}^{{\mathsf{T}}}{{\bf{X}}}_{{\bf{inf}}}}{N},\) where \({\mathbb{1}}\in {{\mathbb{R}}}^{N}\) is a vector whose elements are all 1. Since the global influence is the same for all the skills, we regard it as the query in the attention mechanism. To model the influence from neighboring skills, we apply a simple Graph Convolutional Network (GCN)25 on the skill graph to extract the local influence, formulated as

$${{\bf{U}}}^{({\bf{0}})}={{\bf{X}}}_{{\bf{inf}}},\quad {{\bf{U}}}^{({\bf{k}})}=\sigma ({\bf{A}}{{\bf{U}}}^{({\bf{k}}-{\bf{1}})}{({{\bf{W}}}^{{\bf{g}}})}^{({\bf{k}})}),\,k=1,2,\cdots \ ,{K}_{c},$$
(7)

where Kc is the depth of the GCN, \({\bf{A}}\in {{\mathbb{R}}}^{N\times N}\) is the adjacency matrix of the skill graph, Ai,j denotes the edge weight from skill i to skill j, \({{\bf{U}}}^{({\bf{k}})}\in {{\mathbb{R}}}^{N\times {d}^{(k)}}\) stores the output vectors of all the nodes in the k-th layer, d(k) is the output dimension, and \({({{\bf{W}}}^{{\bf{g}}})}^{({\bf{k}})}\in {{\mathbb{R}}}^{{d}^{(k-1)}\times {d}^{(k)}}\) is the trainable parameter. We concatenate the importance vectors with the local influence vectors as the keys and calculate the domination of each skill with an attention layer, formulated as

$$\widetilde{{\bf{a}}}=\tanh \left({\bf{Q}}{{\bf{W}}}^{{\bf{q}}}+[{{\bf{U}}}^{({{\bf{K}}}_{{\bf{c}}})}| {{\bf{X}}}_{{\bf{imp}}}]{{\bf{W}}}^{{\bf{k}}}\right){{\bf{W}}}^{{\bf{v}}},\quad {\bf{a}}={\rm{softmax}}(\widetilde{{\bf{a}}}),$$
(8)

where \({\bf{a}}\in {{\mathbb{R}}}^{N}\) and the element ai represents the domination of the i-th skill; \({{\bf{W}}}^{{\bf{q}}}\in {{\mathbb{R}}}^{{\rm{di}}\times {\rm{da}}}\), \({{\bf{W}}}^{{\bf{k}}}\in {{\mathbb{R}}}^{({d}^{({K}_{c})}+{\rm{dp}})\times {\rm{da}}}\) and \({{\bf{W}}}^{{\bf{v}}}\in {{\mathbb{R}}}^{{\rm{da}}}\) are the trainable parameters. To guarantee that each skill has separate domination factors for the lower-bound and upper-bound salary, ASDN trains two sets of the above attention parameters.
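A scalar toy version of the attention step of Eq. (8): ASDN uses matrix-valued queries, keys, and values, while here every quantity is collapsed to a number to show how the softmax yields dominations that are positive and sum to one over a posting's skills:

```python
import math

def dominations(query, keys, w_q, w_k, w_v):
    """score_i = tanh(q*w_q + k_i*w_k) * w_v, then softmax over the skills
    of one job posting (1-D stand-in for Eq. (8))."""
    scores = [math.tanh(query * w_q + k * w_k) * w_v for k in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```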

Job salary prediction

For a job posting Jj, SSCN models its job salary as the weighted average of the skill values. The lower-bound salary \({\widetilde{y}}_{j}^{l}\) and upper-bound salary \({\widetilde{y}}_{j}^{u}\) are estimated as

$${\widetilde{y}}_{j}^{l}=\mathop{\sum }\limits_{i}^{| {{\bf{S}}}_{{\bf{j}}}| }{({v}^{l})}_{j}^{(i)}{({a}^{l})}_{j}^{(i)}\quad {\widetilde{y}}_{j}^{u}=\mathop{\sum }\limits_{i}^{| {{\bf{S}}}_{{\bf{j}}}| }{({v}^{u})}_{j}^{(i)}{({a}^{u})}_{j}^{(i)},$$
(9)

where \({({v}^{* })}_{j}^{(i)}\) represents the value bound of the i-th skill in Jj and \({({a}^{* })}_{j}^{(i)}\) represents the corresponding domination factor. We set the loss function to be the difference between the predicted and the real salary bounds, formulated as

$${L}_{s}=\frac{{\lambda }_{l}}{| {\mathcal{J}}| }\mathop{\sum }\limits_{j}^{| {\mathcal{J}}| }{({\widetilde{y}}_{j}^{l}-{y}_{j}^{l})}^{2}+\frac{{\lambda }_{u}}{| {\mathcal{J}}| }\mathop{\sum }\limits_{j}^{| {\mathcal{J}}| }{({\widetilde{y}}_{j}^{u}-{y}_{j}^{u})}^{2},$$
(10)

where \({y}_{j}^{* }\) denotes the observed job salary bounds, λl and λu are hyper-parameters balancing the importance of the two losses, and \({\mathcal{J}}\) denotes the set of job postings.

Combining Ls with the temporal regularizer Lt of the skill embeddings, we formulate the loss function of SSCN as

$$L=\frac{{\lambda }_{l}}{| {\mathcal{J}}| }\sum _{j}{({\widetilde{y}}_{j}^{l}-{y}_{j}^{l})}^{2}+\frac{{\lambda }_{u}}{| {\mathcal{J}}| }\sum _{j}{({\widetilde{y}}_{j}^{u}-{y}_{j}^{u})}^{2}+\beta \mathop{\sum }\limits_{t=1}^{T-1}\parallel {{\bf{E}}}_{{\bf{s}}}^{({\bf{t}}+{\bf{1}})}-{{\bf{E}}}_{{\bf{s}}}^{({\bf{t}})}{\parallel }_{F},$$
(11)

where β is a hyperparameter balancing the importance of the temporal regularizer.
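The salary-prediction core of Eqs. (9) and (10) can be sketched per posting and per bound; the λ defaults follow the setting reported in the network configuration:

```python
def predict_salary(values, dominations):
    """Eq. (9): one salary bound is the domination-weighted sum of the
    corresponding skill-value bounds for a posting."""
    return sum(v * a for v, a in zip(values, dominations))

def salary_loss(pred_l, pred_u, true_l, true_u, lambda_l=2.0, lambda_u=1.0):
    """Eq. (10) for a single posting: weighted squared errors on both
    bounds, with the lower bound weighted more heavily."""
    return lambda_l * (pred_l - true_l) ** 2 + lambda_u * (pred_u - true_u) ** 2
```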

Network configuration

The network configurations can be found in Table 4. Since the lower-bound salary is smaller than the upper bound, we set λl and λu to 2 and 1, respectively. The temporal regularization weight β was set to 0.004. We use a residual structure26 to accelerate training and Leaky ReLU27 as the activation function. The weights are initialized with the Glorot normal initializer28. For optimization, we use the Adam optimizer29. We found that slight changes in the parameters did not affect the performance much. The additional parameter experiments can be found in Supplementary Information.

Table 4 The network configurations.

Baseline methods for salary prediction

Our baseline methods for salary prediction fall into four parts:

  • Classic regression models including linear regression (LR), Support Vector Machine (SVM), and Gradient Boosting Decision Tree (GBDT). Since these methods process the structured feature vectors of fixed size, we concatenated the one-hot skillset representation, the averaged features of skills, and job context as their input.

  • Deep Neural Network with the same depth and a similar number of variables as SSCN for fairness of comparison. The input was also the concatenated feature vector.

  • Holistic Salary Benchmarking Matrix Factorization (HSBMF)30. HSBMF is a state-of-the-art salary benchmarking model that groups the job advertisements into posts and predicts their salaries with matrix factorization. We used the job contextual information and skill requirements to build the regularization matrices in HSBMF, ensuring it considers the same information as SSCN.

  • State-of-the-art text mining-based methods. We compared two groups of typical methods that model the job postings as texts. The first group consists of well-adopted Natural Language Processing (NLP) network architectures trained in an end-to-end manner with our data, including Convolutional Neural Network (TextCNN)31,32, Hierarchical Attention Network (HAN)33, and the recently proposed Transformer-XL34. In these models, we used pre-trained Chinese word embeddings35 to initialize the parameters. The second group consists of state-of-the-art pre-trained models, including Bidirectional Encoder Representations from Transformers (BERT)36, Robustly optimized BERT approach (RoBERTa)37, and XLNet38. To better process our input data, we have adopted models trained with Chinese corpus39.

We also disabled some parts of SSCN to show their effectiveness, yielding two variants:

  • “CSVN + Mean”, where we replaced ASDN with a mean pooling layer.

  • “SSCN (Independ)”, where we disabled the range prediction part and trained the models for the upper bound and lower bound independently.

For all the compared methods that are not designed for range prediction, we separately trained lower-bound and upper-bound regression models and validated their performances independently.

Validation

We repeated hold-out validation 10 times. Specifically, each time we randomly split the data into training and testing sets with a ratio of 4:1. We used the training data for model training and the testing data for performance evaluation.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.