Abstract
Evidence from psychology and cognitive neuroscience indicates that the human brain’s semantic system contains several specific subsystems, each representing a particular dimension of semantic information. Word ratings on these different semantic dimensions can help investigate the behavioral and neural impacts of semantic dimensions on language processes and build computational representations of language meaning according to the semantic space of the human cognitive system. Existing semantic rating databases provide ratings for hundreds to thousands of words, which can hardly support a comprehensive semantic analysis of natural texts or speech. This article reports a large database, the Six Semantic Dimension Database (SSDD), which contains subjective ratings for 17,940 commonly used Chinese words on six major semantic dimensions: vision, motor, socialness, emotion, time, and space. Furthermore, using computational models to learn the mapping relations between subjective ratings and word embeddings, we include the estimated semantic ratings for 1,427,992 Chinese and 1,515,633 English words in the SSDD. The SSDD will aid studies on natural language processing, text analysis, and semantic representation in the brain.
Background & Summary
Accumulating behavioral and neural evidence indicates that semantic representation of words is distributed across multiple neural subsystems, each representing a particular dimension of semantic information1,2. These semantic subsystems and dimensions provide important clues for the organization of the human semantic system. To investigate how word meaning is represented and processed in the human brain, many studies have built databases of word ratings on the psychologically and neurobiologically plausible semantic dimensions1,3,4,5,6. Different from free-association-based7 and feature-generation-based8,9 semantic databases, dimension-based semantic databases provide quantified rating scores for words on experiential semantic dimensions, which enable investigating the behavioral and neural impacts of semantic dimensions on language processes3,4,10 and building computational representations of language meanings11,12,13,14,15. However, the existing databases typically contain hundreds to thousands of words and are not large enough to support comprehensive semantic analysis of natural texts or speech. For example, if a researcher wants to analyze the behavioral or neural effects of particular semantic dimensions of words during natural text processing, then they may want to conduct an analysis using the semantic ratings of all or most of the words of the text; however, in most cases, the existing semantic rating databases can only provide the ratings of a small part of the words.
This article reports a large semantic rating database named the Six Semantic Dimension Database (SSDD)16. The SSDD focuses on six major semantic dimensions: vision, motor, socialness, emotion, time, and space. The visual and motor dimensions are included to reflect the impact of sensory-motor experience on semantic representation. Sensory and motor dimensions are probably the most frequently investigated semantic dimensions, and their importance for object and action concepts has been well established1,4,17,18,19,20. Among the multiple sensory dimensions associated with semantic representation, we chose the visual dimension because vision is the dominant sensory modality. The behavioral and neural impacts of visual and motor semantics on cognitive processing have been indicated by many previous studies18,19,21,22,23,24,25. The social and emotional dimensions are included to reflect the impact of social-emotional experience on semantic representation. These dimensions have dissociable neural correlates10,24,26,27,28,29,30,31,32,33 and are especially important for the representation of mental and abstract concepts5,32,33,34,35. Huth et al.2 investigated the organization of semantic representation in the brain using a data-driven approach and found that social-emotional and sensory-motor semantics are associated with the opposite ends of the most important data-driven semantic dimension. Therefore, the social and emotional dimensions can serve as important supplements to the visual and motor dimensions to reflect semantic representation. The time and space dimensions are especially important for the representation of events and situations36,37,38. Dissociable neural correlates of these dimensions have also been indicated by neuropsychological and neuroimaging research37,39. The representativeness of the six dimensions has been reflected by a comprehensive review of experiential semantic attributes by Binder et al.1. 
Binder et al.1 summarized 65 semantic dimensions belonging to 14 domains, among which more than 2/3 of the dimensions belong to the domains of vision, motor, socialness, emotion, time, and space. The SSDD treats these six domains as coarse-grained semantic dimensions and provides general ratings for each of them.
The SSDD contains two datasets: the first is the subjective ratings for 17,940 commonly used Chinese words on the six semantic dimensions. The second is a computational extension of the subjective rating data. We combined the subjective ratings with computational models and then estimated the semantic ratings of 1,427,992 Chinese and 1,515,633 English words. The SSDD makes it possible to analyze the semantic components of various natural language materials, such as natural texts, speeches, and the language produced by neurological and psychiatric patients.
Methods
Subjective rating dataset
Participants
A total of 85 healthy undergraduate and graduate students (52 women, M age = 22.73 years, SD age = 2.24) participated in the rating experiment. All participants were native Chinese speakers. No participant had suffered from psychiatric or neurological disorders or sustained a head injury. Each participant read and signed the informed consent form before the experiment. All experiments were approved by and performed in accordance with the guidelines and regulations of the Institutional Ethics Committee at the Institute of Psychology of the Chinese Academy of Sciences. Participants were asked to complete at least one rating experiment session (see Procedure of the rating experiments) and were compensated with 30 RMB per session. Each participant could complete as many sessions as they wanted, as long as they passed the quality evaluation each time. Those who failed the quality evaluation once were not allowed to complete more sessions. Following exclusions (see Procedure of the rating experiments), the final sample comprised 80 participants (49 women, M age = 22.88 years, SD age = 2.21) who provided at least one session of valid data.
Stimuli
The stimuli were 17,940 items that could be separated into three sets based on their sources. The first set of items was 12,814 high-frequency Chinese words selected from the Wikipedia Chinese corpus (https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2). These items were selected based on four inclusion criteria: (1) They are among the 20,000 most frequent items of the Wikipedia Chinese corpus; (2) they are also included in at least one of two supplementary Chinese corpora, that is, the Contemporary Chinese Dictionary40 and the Chinese Linguistic Data Consortium (2003) corpus (https://catalog.ldc.upenn.edu/LDC2003T09); (3) they do not contain any non-Chinese characters; and (4) they were judged to be Chinese words (rather than phrases or nonwords), and not proper nouns, by at least two of three independent raters (two authors, Nan Lin and Weiting Shi, and a graduate volunteer). We used supplementary Chinese corpora and subjective assessments because the boundaries between words and phrases in Chinese are vague, and there is often a discrepancy between different corpora and between corpora and subjective judgments41. We excluded proper nouns because participants' knowledge of them may depend highly on personal experiences and interests.
The second set of items was 4,915 Chinese words selected from the stimuli of two recently published fMRI datasets42,43, a published study10, and several unpublished experiments of ours. Items were excluded if they contained non-Chinese characters or were evaluated as nonwords, phrases, or proper nouns by at least two of three independent raters.
The last set of items was 211 Chinese translations for the English stimuli of the semantic rating experiments from Binder et al.1 and Tamir et al.5. The two studies included 535 and 166 English words, respectively. Most of their Chinese translations had already been included in the first two sets of items. The remaining 211 translations (which include a small number of phrases) were included as the last set of items. The rating data of these items were used to validate the results (see Technical Validation).
Procedure of the rating experiments
We conducted six rating experiments on the 17,940 items, each focusing on one semantic dimension. Each experiment was separated into 18 sessions of 1,000 words each (the last session contained 940 words). The data were collected through the free-access online platform "Wen Juan Xing" (https://www.wjx.cn/). Except for the rating experiment on the semantic dimension of emotion, which used a 13-point scale (−6 = very negative, 0 = neutral, and 6 = very positive), all other rating experiments used 7-point scales (7 = very high, and 1 = very low). Before each rating session, participants read instructions containing the working definition of the semantic dimension to be rated (see Table 1) plus a few example words with high and low ratings. For the semantic dimension of motor, we further specified the working definition based on the charade/pantomime ratings from previous studies18,44,45,46: "Please rate the extent to which the meaning of a word can easily and quickly trigger corresponding body actions in your mind. Specifically, suppose you were playing a pantomime game in which one person had to identify a word based on how another person mimicked various actions that might be associated with its meaning. The easier a word is for the game, the higher its rating score should be; the harder a word is, the lower its rating score should be."
To control the quality of the rating data, after each rating session, we calculated the correlation between the ratings of each participant and the mean ratings of the remaining participants using Jamovi (https://www.jamovi.org/). For a given session, if the correlation between a participant's ratings and those of the others was lower than 0.5, then that participant's data were excluded1, and the participant was excluded from the rest of our experiments. This criterion resulted in the rejection of 28 sessions, or 0.87% of the data. If the data of a participant were excluded, a new participant was recruited to complete the rating session. For each session of each experiment, 30 valid participants were recruited.
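This leave-one-out agreement check can be sketched as follows. This is a minimal NumPy illustration of the criterion described above; the actual analysis was performed in Jamovi, and the function name `flag_low_agreement` is ours:

```python
import numpy as np

def flag_low_agreement(ratings, threshold=0.5):
    """Flag participants whose ratings correlate below `threshold` with
    the mean ratings of the remaining participants.

    ratings: (n_words, n_participants) array of scores for one session.
    Returns a boolean array marking participants to exclude.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_words, n_raters = ratings.shape
    exclude = np.zeros(n_raters, dtype=bool)
    for j in range(n_raters):
        # Mean rating of every participant except participant j
        others = np.delete(ratings, j, axis=1).mean(axis=1)
        r = np.corrcoef(ratings[:, j], others)[0, 1]
        exclude[j] = r < threshold
    return exclude
```

An excluded participant's column would then be dropped and replaced with data from a newly recruited rater, as described above.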
Data analysis
For each experiment, we calculated the average rating for each word to represent its value on the rated semantic dimension. In addition to the six rated dimensions, a seventh semantic dimension was obtained by calculating the absolute value of the average emotion rating for each word. We believe this dimension (i.e., valenced vs. neutral) reflects the relatedness of word meanings to emotion (see Technical Validation for evidence supporting this argument). We added 1 to this measure to match its scale with that of the five nonemotion ratings.
As shown in Fig. 1, the distributions of the word ratings on all dimensions for the 17,940 items are skewed, indicating that for each dimension, only a small proportion of words contain rich semantic information. Because our experimental stimuli comprise the most commonly used Chinese words, these distributions should be representative of the Chinese vocabulary as a whole. Figure 2 shows the correlations between the seven dimensions of rating data. Most correlations are low, indicating that the semantic dimensions are largely independent. The highest correlations were found between the dimensions Vision and Motor (r = 0.49) and Vision and Space (r = 0.40). These correlations are reasonable because the visual system plays an important role in perceiving and acquiring motor and spatial information.
Computational extension dataset
Chinese
By combining the subjective rating data with computational models, we estimated the semantic ratings of a vocabulary of 1,427,992 Chinese words. This vocabulary was constructed by including the words that consist of Chinese characters and have counts of no less than 5 in the Xinhua news corpus (19.7 GB in total, collected from http://www.xinhuanet.com/whxw.htm).
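The vocabulary filter can be sketched as follows. This is a simplified illustration: the regular expression covers only the basic CJK Unified Ideographs block, and the corpus is assumed to be tokenized elsewhere:

```python
import re
from collections import Counter

# Basic CJK Unified Ideographs block only; a simplification of the
# "consists of Chinese characters" criterion.
CJK = re.compile(r'^[\u4e00-\u9fff]+$')

def build_vocabulary(tokenized_corpus, min_count=5, chinese_only=True):
    """Keep every word with at least `min_count` occurrences in the corpus.

    tokenized_corpus: iterable of token lists (one per sentence/document).
    """
    counts = Counter()
    for tokens in tokenized_corpus:
        counts.update(tokens)
    return {w for w, c in counts.items()
            if c >= min_count and (not chinese_only or CJK.match(w))}
```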
We first tested a variety of context-insensitive models (GloVe and Word2Vec, whose word embeddings are static) and context-sensitive models (GPT2, BERT, ERNIE, and MacBERT47, whose word embeddings vary with context) in predicting semantic ratings. The results show that Word2Vec and MacBERT achieved the best performance in their respective categories in the cross-validation analysis (average Pearson correlation between the predicted and actual rating scores across all dimensions: 0.613, 0.782, 0.850, 0.877, 0.881, and 0.886 for GloVe, Word2Vec, GPT2, BERT, ERNIE, and MacBERT, respectively). Therefore, in the following experiments, we utilized these two representative models to extract word representations for Chinese words. Specifically, for Word2vec, we used the default parameters (Skip-Gram architecture with an embedding dimension of 300). To obtain the word embeddings from MacBERT, following Chersoni et al.48, we extracted 10 to 1,000 sentences for each word (depending on the word's count) from the Xinhua corpus and used MacBERT to calculate the representations of these sentences. We then calculated the average sentence representation and used it as the word representation. We obtained 1,427,992 word representations from Word2vec and 900,243 from MacBERT (for MacBERT, we only used words with counts greater than 10).
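The sentence-averaging step can be sketched as follows; `encode_sentences` is a hypothetical placeholder for a MacBERT (or BERT) forward pass that returns one vector per sentence:

```python
import numpy as np

def contextual_word_embedding(word, sentences, encode_sentences):
    """Average the contextual representations of sentences containing `word`.

    encode_sentences: callable mapping a list of sentences to an
    (n_sentences, dim) array, e.g. a MacBERT encoder (placeholder here).
    Returns a single static vector used as the word's representation.
    """
    vectors = np.asarray(encode_sentences(sentences))
    return vectors.mean(axis=0)
```

In this scheme, each word's between 10 and 1,000 sampled sentences are encoded once, and the mean vector serves as a static embedding compatible with the ridge regression models described below.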
Afterward, for each of the seven semantic dimensions and each word embedding method, we trained a ridge regression model with a 10-fold cross-validation method to learn the mapping function from word representations to the mean semantic ratings of the corresponding words. We then used the best-trained regression model (the one that achieved the lowest error on its validation fold among the 10 models from the 10-fold cross-validation) to estimate the semantic ratings for the extended Chinese vocabulary.
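A minimal sketch of this training procedure, using scikit-learn's `Ridge` and `KFold`; the ridge penalty `alpha=1.0` is an illustrative default, not necessarily the value used in the study:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def fit_rating_model(embeddings, ratings, n_splits=10, alpha=1.0):
    """Train ridge models with k-fold CV; keep the fold model with the
    lowest validation error and report the mean validation correlation.

    embeddings: (n_words, dim) word representations.
    ratings:    (n_words,) mean subjective ratings on one dimension.
    """
    best_model, best_err, corrs = None, np.inf, []
    kf = KFold(n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(embeddings):
        model = Ridge(alpha=alpha).fit(embeddings[train_idx], ratings[train_idx])
        pred = model.predict(embeddings[val_idx])
        err = np.mean((pred - ratings[val_idx]) ** 2)
        corrs.append(np.corrcoef(pred, ratings[val_idx])[0, 1])
        if err < best_err:
            best_model, best_err = model, err
    return best_model, float(np.mean(corrs))
```

The returned mean validation correlation corresponds to the internal-validity measure reported in the Technical Validation section, and the best fold model is the one applied to the extended vocabulary.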
English
We also extended the computational dataset to an English vocabulary. This extension is based on the assumption that the Chinese and English semantic spaces share the coarse-grained semantic dimensions that we studied. This assumption is directly supported by the high cross-language validity of our Chinese dataset: for all semantic dimensions, the ratings of Chinese words were strongly correlated with the ratings of their English translations from previously published English rating datasets (see Technical Validation). The English vocabulary includes 1,515,633 words. This vocabulary was constructed by including the words with counts of no less than 5 in the Wikipedia corpus (13 GB, downloaded from https://dumps.wikimedia.org/enwiki/latest/). Consistent with the methods used for the extensional Chinese dataset, we utilized Word2vec with default parameters and the pretrained BERT model (which has been shown to achieve the best performance among its variants at predicting semantic features49) to extract word representations for each English word. Specifically, to obtain word embeddings from BERT, we first extracted 10 to 1,000 sentences (depending on the word's count) from the Wikipedia corpus for each word and used BERT to calculate the sentence representations. We then used the average sentence representation as the word representation. We obtained 1,515,633 word representations from Word2vec and 930,668 from BERT (for BERT, we only used words with counts greater than 10).
To estimate the semantic ratings of English words, we first trained a model to align the English embedding space with the Chinese embedding space. Specifically, we extracted all single-word translation pairs (i.e., we removed Chinese-English pairs in which the English translation contained more than one word) from a Chinese-English dictionary, the CC-CEDICT (https://www.mdbg.net/chinese/dictionary?page=cc-cedict), which is, to our knowledge, the largest open-source Chinese-English dictionary, and obtained a Chinese-English bilingual lexicon of 19,424 word pairs. Next, we trained a ridge regression model to learn the mapping from the English embeddings to the Chinese embeddings based on the bilingual word pairs.
Finally, the semantic ratings of the English words were estimated in two steps. First, we projected each English word representation from the English semantic space into the Chinese semantic space. Second, the projected word representation was fed into the semantic rating prediction models for Chinese words, and the model output was taken as the estimated semantic rating for the English word.
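The two-step procedure can be sketched as follows, reusing ridge regression for both the cross-lingual alignment and the rating prediction (a simplified sketch under the assumptions above; hyperparameters are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_alignment(en_vecs, zh_vecs, alpha=1.0):
    """Learn a linear map from English to Chinese embedding space using
    the embeddings of bilingual translation pairs (rows are aligned)."""
    return Ridge(alpha=alpha).fit(en_vecs, zh_vecs)

def estimate_english_ratings(en_vecs, alignment, rating_model):
    """Step 1: project English embeddings into the Chinese space.
       Step 2: apply the Chinese rating model to the projections."""
    projected = alignment.predict(en_vecs)
    return rating_model.predict(projected)
```

With this decomposition, the Chinese rating models never need to be retrained: any English embedding that can be projected into the Chinese space inherits rating estimates on all seven dimensions.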
Data Records
The SSDD16 is available on the OSF repository at https://doi.org/10.17605/OSF.IO/N5VKE. The data are sorted into two main folders. The first main folder is “Main_Data,” in which we provided the final subjective and estimated rating results. The second main folder is called “Supplementary_Data,” in which we provided the information of participants, the instructions for the rating experiments, the raw rating data, the validation data for the subjective ratings and computational extension ratings, the word embeddings, and the code for calculating and validating the estimated ratings. More details about the data are provided below.
Main Data
The average ratings across participants for the 17,940 Chinese words on the six rated semantic dimensions are provided in the file "Rated_semantic_dimensions.csv." We also provided the absolute value of the average emotion rating for each word as the seventh dimension, called "emotion_abs + 1" in the file. The estimated semantic ratings for the extensional Chinese and English vocabularies using the different computational models are provided in four files named "Estimated_semantic_dimensions_word2vec_Chinese.csv", "Estimated_semantic_dimensions_macbert_Chinese.csv", "Estimated_semantic_dimensions_word2vec_English.csv", and "Estimated_semantic_dimensions_bert_English.csv".
Supplementary data
Information of participants
The file named "Information_Participants.xlsx" provides the age and sex of the participants, the number of valid and invalid sessions and words each participant contributed, and which of the six experiments each participant took part in and provided valid data for.
Instructions for the rating experiments
At the start of each session of each rating experiment, participants were provided an instruction that contained the working definition of the semantic dimension and a few examples. These instructions are provided in the file named “Instructions.docx.”
Raw rating data
The raw data of the 6 rating experiments are provided under the subfolder named “Raw_rating_data.” The data are sorted into six folders named by the rated dimensions (Vision, Motor, Socialness, Emotion, Time, and Space). Under each folder, 18 files (named “session*.csv,” in which * is 1 to 18) correspond to the 18 sessions. In each file, the column named “Word” provides the items for which the semantic ratings were collected, for example, ‘花朵’ (meaning “Flower” in English). The remaining columns are named by the initials of 30 participants who rated the words and show the rating scores from each participant.
Validation data for the subjective ratings
In the file “Validation_Ratings.xlsx,” we provided the data and results of the validation analyses for our ratings. In addition to the validation analyses and results mentioned in the section “Validity of the subjective rating dataset” for each dimension of ratings, we also provided the correlations of our ratings to several fine-grained semantic dimensions of ratings provided by Binder et al.1.
Validation data for the computational extension ratings
In the file “Validation_Computational_Ratings.xlsx,” we provided the data and results of the validation analyses for our computational extension dataset.
Word embeddings
The word vectors used to compute the computational extension datasets are provided in the subfolder named “Word embeddings,” including the Word2vec and MacBERT embeddings for Chinese words and the Word2vec and BERT embeddings for English words.
Code for calculating and validating the estimated ratings
See the section “Code Availability.”
Technical Validation
Reliability of the subjective rating dataset
We examined the reliability of the ratings by computing intraclass correlation coefficients (ICCs) for each experiment and each session. For each experiment, we calculated the one-way random ICC because different participants rated different items; for each session, we calculated the two-way random ICC because the same 30 participants rated all items50,51. The results are summarized in Tables 2 and 3. For all experiments and all sessions, the ICCs were above 0.9, indicating good reliability of the ratings. In addition, in the SSDD, we rerated the socialness of 945 words from a prior study of ours10 and obtained a cross-study correlation of 0.955.
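For reference, the one-way random, average-measures ICC (ICC(1,k)) used for the per-experiment analysis can be computed as follows (a minimal NumPy sketch; the two-way session-level ICC is not covered here):

```python
import numpy as np

def icc_one_way(ratings):
    """One-way random, average-measures ICC (ICC(1,k)).

    ratings: (n_items, k_raters) matrix; under the one-way model, the
    raters for each item may be a random sample, so rater effects are
    absorbed into the within-item variance.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    item_means = x.mean(axis=1)
    grand = x.mean()
    # Between-item mean square (df = n - 1)
    msb = k * np.sum((item_means - grand) ** 2) / (n - 1)
    # Within-item mean square (df = n * (k - 1))
    msw = np.sum((x - item_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / msb
```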
Validity of the subjective rating dataset
We examined the validity of the ratings by calculating the correlations between the ratings obtained in the current study and those provided in several previous studies. The results are shown in Table 4. The full set of validation data is provided in the Supplementary Data of the database.
For the semantic dimension of vision (visual imageability), the ratings were validated based on Binder et al.1, Liu et al.22, and Su et al.52. The rating instructions used in the four studies are similar. Binder et al.1, Liu et al.22, and Su et al.52 obtained their ratings using English words, single-character Chinese words, and two-character Chinese words, respectively. Their correlations with the current study are 0.756, 0.627, and 0.821. The relatively lower correlation with Liu et al.22 than with Su et al.52 might be because many Chinese characters have multiple meanings, making single-character words more often semantically ambiguous than two-character words.
For the semantic dimension of motor, the ratings were validated based on Heard et al.45 and Binder et al.1. The rating instructions used in the current study and Heard et al.45 are very similar, both focusing on how easily a word's referent can be pantomimed. Similar motor-semantic ratings have been used to reflect the general impact of motor-semantic representation on cognition and neural activities in several previous studies18,44,46,53,54. The correlation between Heard et al.45 and the current study is 0.806. Binder et al.1 did not set any general rating for the motor dimension. We therefore correlated our ratings with the four fine-grained motor ratings of Binder et al.1, i.e., Head, UpperLimb, LowerLimb, and Practice. The correlations are in the range of 0.133 to 0.342. We further correlated our ratings with the mean of the four motor ratings and obtained a correlation of 0.426. These relatively low correlations indicate that the four motor dimensions rated by Binder et al.1 may not fully explain the content of our ratings. It is likely that our ratings reflect more dimensions of motor knowledge than those included in Binder et al.1, such as postures and gestures. For example, the word '怀孕' (meaning "pregnant") received a relatively high motor rating in our study. Being pregnant is associated with specific postures and whole-body motor features but not with specific motor features of the head, feet, or hands. In addition, people often use gestures to represent particular concepts, especially when performing pantomimes or playing charades. These gestures should also be viewed as a type of motor knowledge as long as people can reach a consensus on their meanings. The motor-rating instructions used in Heard et al.45 and the current study should be more sensitive in detecting these additional types of motor knowledge than those used by Binder et al.1.
For the semantic dimension of socialness, the ratings were validated based on Diveica et al.3 and Binder et al.1. The core ideas of the instructions used in the three studies are all centered on interpersonal interactions and relationships. However, the instructions used in the current study and Binder et al.1 were both brief, while those used by Diveica et al.3 were much more detailed, that is, "a social characteristic of a person or group of people, a social behavior or interaction, a social role, a social space, a social institution or system, a social value or ideology, or any other socially relevant concept." The correlations of our ratings with those of Diveica et al.3 and Binder et al.1 are both 0.724.
For the semantic dimension of emotion (valence), the ratings were validated based on Xu et al.55 and Binder et al.1. The instructions used in the current study and Xu et al.55 are similar, and the correlation between the two studies is 0.935. The emotion ratings of the current study are closely associated with two dimensions of Binder et al.1, that is, Pleasant and Unpleasant. Therefore, we calculated composite scores of the two dimensions by subtracting the ratings of Unpleasant from those of Pleasant and correlated the scores with our ratings. The correlation is 0.795.
We also validated the absolute values of our emotion ratings. As mentioned above, this measure, which can be referred to as the dimension of "valenced vs. neutral," reflects the relatedness of word meanings to emotion. To validate this measure, we correlated the absolute values with the emotion ratings collected by Tamir et al.5. The correlation is 0.617, indicating that the absolute values of our emotion ratings can reflect the general emotional relatedness of words. Additionally, the absolute value of valence is also related to another important dimension of emotion, arousal. Arousal increases as a function of both positive and negative valence56, and the absolute value of valence ratings has been used to represent arousal in some previous studies57. We correlated the absolute values of our emotion ratings with the arousal ratings provided by Xu et al.55 and Binder et al.1. The correlations are 0.585 and 0.532, respectively, which is consistent with the findings in the literature.
Finally, for the semantic dimensions of time and space, we validated the ratings based on Binder et al.1. Binder et al.1 did not set any general rating for these dimensions. Therefore, we averaged the ratings of two time-related dimensions (Time and Duration) to correspond to our time ratings and averaged the ratings of six space-related dimensions (Landmark, Path, Scene, Near, Toward, and Away) to correspond to our space ratings. The correlations are 0.715 and 0.716 for time and space ratings, respectively.
The validation results indicate good validity of our ratings. They also indicate that the semantic ratings of the current study can, to a large extent, be generalized from Chinese to English. As shown in Table 4, for all six rated dimensions, the cross-language correlations are above 0.7. The only low correlation is with the "Motor_General" measure of Binder et al.1, which has been explained above. The absolute values of our emotion ratings also have a correlation of 0.617 with the emotion ratings of Tamir et al.5. These cross-language correlations are close to some of the reported correlations between English studies; for example, as reported by Diveica et al.3, their correlation with Binder et al.1 is 0.76, only slightly higher than ours. Therefore, although language and cultural differences may affect semantic ratings, our ratings show sufficiently high cross-language validity to support computational estimation for English words.
Validity of the computational extension dataset
We conducted two validation analyses to test the validity of the estimated ratings for the extensional vocabularies. The first analysis aimed to examine the internal validity of the outputs of our computational predictive models. To this end, we calculated the average correlation between the predicted and actual ratings for Chinese words across the 10-fold cross-validation training. Specifically, the 10-fold cross-validation procedure first split the dataset into 10 folds, then used 9 folds to train the model and predicted the semantic ratings for the words in the remaining fold, and finally calculated the correlation between the predicted and actual semantic ratings to evaluate the model. The results are shown in Table 5. For both word embedding methods, the models performed well for all semantic dimensions, indicating good internal validity.
The second analysis aimed to examine the external validity of the estimated/predicted ratings for the extended Chinese and English words. To this end, we conducted a validation analysis following the same method that we used for validating the subjective ratings. The results are shown in Tables 6–9. For all dimensions, the estimated/predicted ratings of all models showed good external validity, which is very close to that of the subjective ratings (see Table 4).
Note that the semantic ratings in the computational extension dataset were calculated using distributional language models, which may not produce ratings as accurate as human annotations. Existing work has shown that word embeddings from Word2vec, BERT, and other language models encode rich semantic information. However, it is unclear exactly which semantic features they encode; that is, word embeddings may not encode all of the information in the semantic dimensions of vision, motor, socialness, emotion, time, and space. Moreover, word embeddings are derived from word co-occurrence statistics in a large corpus, which differs from how humans learn and understand word meaning. Therefore, the computational extension dataset may show different patterns than human-annotated data. Furthermore, the mismatch between the English and Chinese semantic spaces is a potential limitation of our method because we projected English words from the English semantic space into the Chinese semantic space to accomplish our estimation.
Code availability
The code for calculating and evaluating the computational extension scores is available in the subfolder named "Code" under the folder named "Supplementary_Data" at https://doi.org/10.17605/OSF.IO/N5VKE16. Specifically, to estimate the semantic ratings of Chinese words, we first used "train_decode.py" to learn a mapping function from the Chinese embedding space to the semantic ratings based on the 17,940 subjective ratings and their corresponding word embeddings. We then utilized "predict.py" to generate the semantic ratings of all extensional Chinese words.
To estimate the semantic ratings for English words, we first needed to align the English and Chinese embedding spaces. To achieve this, we first used "extract_en.py" and "extract_zh.py" to extract the word representations in the Chinese-English bilingual lexicon and then used "match.py" and "train_align.py" to learn the mapping function from English to Chinese word representations. Finally, based on the two mapping functions (English embeddings to Chinese embeddings, and Chinese embeddings to semantic ratings), we used "predict.py" to project the English embeddings into the Chinese embedding space and generate the semantic ratings of all extensional English words.
To validate the above results, we used “corr_binder.py” and “corr_binder_cn.py” to compute correlations between the extensional ratings and corresponding scores in Binder et al.1.
Change history
22 August 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41597-023-02479-3
References
Binder, J. R. et al. Toward a brain-based componential semantic representation. Cognitive Neuropsychology 33, 130–174 (2016).
Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).
Diveica, V., Pexman, P. M. & Binney, R. J. Quantifying social semantics: an inclusive definition of socialness and ratings for 8388 English words. Behavior Research Methods 1–13 (2022).
Hoffman, P. & Ralph, M. A. L. Shapes, scents and sounds: quantifying the full multi-sensory basis of conceptual knowledge. Neuropsychologia 51, 14–25 (2013).
Tamir, D. I., Thornton, M. A., Contreras, J. M. & Mitchell, J. P. Neural evidence that three dimensions organize mental state representation: Rationality, social impact, and valence. Proceedings of the National Academy of Sciences 113, 194–199 (2016).
Troche, J., Crutch, S. & Reilly, J. Clustering, hierarchical organization, and the topography of abstract and concrete nouns. Frontiers in Psychology 5, 360 (2014).
Nelson, D. L., McEvoy, C. L. & Schreiber, T. A. The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers 36, 402–407 (2004).
Cree, G. S. & McRae, K. Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). Journal of Experimental Psychology: General 132, 163 (2003).
Deng, Y. et al. A Chinese conceptual semantic feature dataset (CCFD). Behavior Research Methods 53, 1697–1709 (2021).
Zhang, G., Xu, Y., Zhang, M., Wang, S. & Lin, N. The brain network in support of social semantic accumulation. Social Cognitive and Affective Neuroscience 16, 393–405 (2021).
Wang, S., Zhang, J., Lin, N. & Zong, C. Investigating inner properties of multimodal representation and semantic compositionality with brain-based componential semantics. Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018).
Sun, J., Wang, S., Zhang, J. & Zong, C. Towards sentence-level brain decoding with distributed representations. Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019).
Wang, S., Zhang, J., Lin, N. & Zong, C. Probing brain activation patterns by dissociating semantics and syntax in sentences. Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020).
Wang, S., Zhang, J., Wang, H., Lin, N. & Zong, C. Fine-grained neural decoding with distributed word representations. Information Sciences 507, 256–272 (2020).
Sun, J., Wang, S., Zhang, J. & Zong, C. Neural encoding and decoding with distributed sentence representations. IEEE Transactions on Neural Networks and Learning Systems 32, 589–603 (2020).
Wang, S. et al. The six semantic dimension dataset: A large dataset of semantic ratings and its computational extension. Open Science Framework https://doi.org/10.17605/OSF.IO/N5VKE (2022).
Kemmerer, D., Castillo, J. G., Talavage, T., Patterson, S. & Wiley, C. Neuroanatomical distribution of five semantic components of verbs: evidence from fMRI. Brain and Language 107, 16–43 (2008).
Lin, N., Guo, Q., Han, Z. & Bi, Y. Motor knowledge is one dimension for concept organization: further evidence from a Chinese semantic dementia case. Brain and Language 119, 110–118 (2011).
Mahon, B. Z. & Caramazza, A. Concepts and categories: a cognitive neuropsychological perspective. Annual Review of Psychology 60, 27 (2009).
Martin, A. et al. The representation of object concepts in the brain. Annual Review of Psychology 58, 25 (2007).
Fernandino, L. et al. Concept representation reflects multimodal abstraction: a framework for embodied semantics. Cerebral Cortex 26, 2018–2034 (2016).
Liu, Y., Shu, H. & Li, P. Word naming and psycholinguistic norms: Chinese. Behavior Research Methods 39, 192–198 (2007).
Liu, Y., Hao, M., Li, P. & Shu, H. Timed picture naming norms for Mandarin Chinese. PLoS One 6, e16505 (2011).
Lin, N. et al. Fine subdivisions of the semantic network supporting social and sensory–motor semantic processing. Cerebral Cortex 28, 2699–2710 (2018).
Lin, N. et al. Premotor cortex activation elicited during word comprehension relies on access of specific action concepts. Journal of Cognitive Neuroscience 27, 2051–2062 (2015).
Lin, N., Bi, Y., Zhao, Y., Luo, C. & Li, X. The theory-of-mind network in support of action verb comprehension: evidence from an fMRI study. Brain and Language 141, 1–10 (2015).
Lin, N. et al. Neural correlates of three cognitive processes involved in theory of mind and discourse comprehension. Cognitive, Affective, & Behavioral Neuroscience 18, 273–283 (2018).
Lin, N. et al. Coin, telephone, and handcuffs: Neural correlates of social knowledge of inanimate objects. Neuropsychologia 133, 107187 (2019).
Lin, N. et al. Dissociating the neural correlates of the sociality and plausibility effects in simple conceptual combination. Brain Structure and Function 225, 995–1008 (2020).
Zhang, G., Hung, J. & Lin, N. Coexistence of the social semantic effect and non-semantic effect in the default mode network. Brain Structure and Function 1–19 (2022).
Yang, H. & Bi, Y. From words to phrases: neural basis of social event semantic composition. Brain Structure and Function 227, 1683–1695 (2022).
Vigliocco, G. et al. The neural representation of abstract words: the role of emotion. Cerebral Cortex 24, 1767–1777 (2014).
Wang, X., Wang, B. & Bi, Y. Close yet independent: dissociation of social from valence and abstract semantic dimensions in the left anterior temporal lobe. Human Brain Mapping 40, 4759–4776 (2019).
Kousta, S.-T., Vigliocco, G., Vinson, D. P., Andrews, M. & Del Campo, E. The representation of abstract words: why emotion matters. Journal of Experimental Psychology: General 140, 14 (2011).
Thornton, M. A. & Mitchell, J. P. Theories of person perception predict patterns of neural activity during mentalizing. Cerebral Cortex 28, 3505–3520 (2018).
Kranjec, A., Cardillo, E. R., Schmidt, G. L., Lehet, M. & Chatterjee, A. Deconstructing events: the neural bases for space, time, and causality. Journal of Cognitive Neuroscience 24, 1–16 (2012).
Speer, N. K., Reynolds, J. R., Swallow, K. M. & Zacks, J. M. Reading stories activates neural representations of visual and motor experiences. Psychological Science 20, 989–999 (2009).
Zwaan, R. A. & Radvansky, G. A. Situation models in language comprehension and memory. Psychological Bulletin 123, 162 (1998).
Kemmerer, D. The spatial and temporal meanings of English prepositions can be independently impaired. Neuropsychologia 43, 797–806 (2005).
Jiang, L., Tan, J. & Cheng, R. The Contemporary Chinese Dictionary (6th edition). Beijing: The Commercial Press (2012).
Liu, P.-P., Li, W.-J., Lin, N. & Li, X.-S. Do Chinese readers follow the national standard rules for word segmentation during reading? PLoS One 8, e55440 (2013).
Wang, S., Zhang, X., Zhang, J. & Zong, C. A synchronized multimodal neuroimaging dataset for studying brain language processing. Scientific Data 9, 1–10 (2022).
Wang, S. et al. An fmri dataset for concept representation with semantic feature annotations. Scientific Data 9, 1–9 (2022).
Guérard, K., Lagacé, S. & Brodeur, M. B. Four types of manipulability ratings and naming latencies for a set of 560 photographs of objects. Behavior Research Methods 47, 443–470 (2015).
Heard, A., Madan, C. R., Protzner, A. B. & Pexman, P. M. Getting a grip on sensorimotor effects in lexical–semantic processing. Behavior Research Methods 51, 1–13 (2019).
Mahon, B. Z. et al. Action-related properties shape object representations in the ventral stream. Neuron 55, 507–520 (2007).
Cui, Y. et al. Revisiting pre-trained models for Chinese natural language processing. In Findings of the Association for Computational Linguistics: EMNLP 2020, 657–668 (2020).
Chersoni, E., Santus, E., Huang, C.-R. & Lenci, A. Decoding word embeddings with brain-based semantic features. Computational Linguistics 47, 663–698 (2021).
Turton, J., Smith, R. E. & Vinson, D. Deriving contextualised semantic features from BERT (and other transformer model) embeddings. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), 248–262 (2021).
McGraw, K. O. & Wong, S. P. Forming inferences about some intraclass correlation coefficients. Psychological Methods 1, 30 (1996).
Shrout, P. E. & Fleiss, J. L. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86, 420 (1979).
Su, Y., Li, Y. & Li, H. Imageability ratings for 10,426 Chinese two-character words and their contribution to lexical processing. Current Psychology 1–12 (2022).
Brodeur, M. B., Dionne-Dostie, E., Montreuil, T. & Lepage, M. The Bank of Standardized Stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PLoS One 5, e10773 (2010).
Magnié, M., Besson, M., Poncet, M. & Dolisi, C. The Snodgrass and Vanderwart set revisited: norms for object manipulability and for pictorial ambiguity of objects, chimeric objects, and nonobjects. Journal of Clinical and Experimental Neuropsychology 25, 521–560 (2003).
Xu, X., Li, J. & Chen, H. Valence and arousal ratings for 11,310 simplified Chinese words. Behavior Research Methods 54, 26–41 (2022).
Kron, A., Pilkiw, M., Banaei, J., Goldstein, A. & Anderson, A. K. Are valence and arousal separable in emotional experience? Emotion 15, 35 (2015).
Yang, Q., Zhou, S., Gu, R. & Wu, Y. How do different kinds of incidental emotions influence risk decision making? Biological Psychology 154, 107920 (2020).
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant numbers: 62036001, 31871105, 31871108) and the Scientific Foundation of the Institute of Psychology, Chinese Academy of Sciences (No. E2CX3625CX). This work was also supported by the Youth Innovation Promotion Association CAS.
Author information
Authors and Affiliations
Contributions
Overall design and supervision: S. Wang, N. Lin, J. Zhang, C. Zong; Conceiving and designing the rating experiments: N. Lin, S. Wang, W. Shi, G. Zhang; Rating data collection and inspection: Y. Zhang; Calculating the word embeddings and building the prediction model: Y. Zhang; Technical validation: S. Wang, N. Lin; First draft writing: S. Wang, N. Lin; Reviewing and revising the manuscript: all authors.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, S., Zhang, Y., Shi, W. et al. A large dataset of semantic ratings and its computational extension. Sci Data 10, 106 (2023). https://doi.org/10.1038/s41597-023-01995-6