Clinical standard terms and ontologies are necessary for computing systems to promote unhindered communication, improve workflows, and build applications for quality control, clinical decision support, and clinical trials1,2,3. Natural language allows for widely varied expressiveness but can also be ambiguous, and jargon and acronyms abound in medical settings4. Non-standardized reporting of imaging observations in patients is a source of diagnostic risk, leading to inconsistent communication and inaccurate risk estimation by medical experts5. Radiological technology is a field covering the principles of radiation, magnetic fields, and ultrasound, engineering for radiation therapy, and imaging technology in addition to medical knowledge. Among medical fields, radiological technology therefore draws substantially on engineering, and a variety of specialists are involved in this area. For these reasons, terminology plays an important role in radiological technology.

Terminologies and ontologies cover the terms of a specific subject field or field of activity and describe the interrelationships between those terms. Many ontologies have been created in the medical field, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), the International Classification of Diseases (ICD), the Logical Observation Identifiers, Names, and Codes (LOINC), and RadLex. SNOMED CT is a comprehensive concept system for healthcare and is becoming adopted as a standard terminology for electronic health records3. ICD is a health statistics coding tool for the classification of human diseases, syndromes, and conditions. LOINC aims to provide a means of uniquely identifying the information elements in electronic health records6. The first release of LOINC in May 1995 contained only terms for laboratory testing, but LOINC has since grown significantly in other fields, including radiology, standardized survey instruments and patient-reported outcome measures, clinical documents, nursing management data, and nursing assessments7. RadLex, which is published by the Radiological Society of North America, aims to provide a comprehensive resource for image-related terms, spanning areas such as imaging technologies, image findings, anatomy, and pathology8.

Together with these, terminologies in radiological technology have been developed. The International Organization for Standardization (ISO) notes that “health terminology is complex and multifaceted, more so than most other language fields,” and that “it has been estimated that between 500,000 and 45 million different concepts are needed to adequately describe concepts9.” By contrast, the Japanese terminology for radiological technology published by the Japanese Society of Radiological Technology has fewer than 10,000 entries. Since this may not be enough, it is necessary to add terms and to maintain and update the body of recommended terminology continuously.

Developing and maintaining terminologies is difficult work10, particularly because the language of medicine is in constant flux11. Computational support therefore plays an important role in providing efficient and accurate terminology expansion. The goal of this study is to edit and maintain the terminology of radiological technology automatically and efficiently. One of the important tasks is to identify synonyms automatically. The same concept may be described with different expressions in text data; when this happens, a computer treats the expressions as different words, and accurate processing becomes impossible.

One useful approach to automatic synonym detection is distributed representation, as implemented in Word2vec and fastText, and it is widely applied in the medical field12. A distributed representation expresses words as vectors of hundreds of dimensions: each concept is represented by an activity pattern across many dimensions, and each dimension contributes to many concepts.

A further challenge in this study is to apply distributed representation methods to Japanese technical terms, since these methods were originally developed for languages written in the Latin alphabet. The Japanese language uses three sets of characters, each with specific grammatical uses: kanji, hiragana, and katakana. Kanji are ideograms, each of which has its own meaning and can correspond to a word. Hiragana and katakana are native sets of syllable characters developed from kanji characters and are used only in Japanese. A Japanese word can be written in a mixture of kanji, hiragana, and katakana, and it is sometimes shortened. Katakana is also frequently used in transliterations13. Moreover, alphabetic characters are also found in medical documents as technical phrases and acronyms. For example, “MRI” can be represented in kanji as “磁気共鳴画像 (jikikyoumeigazou)” or as a combination of kanji and an abbreviation, as in “MR画像 (MR-gazou).” The term “X-ray” can be translated into Japanese as “X線 (x-sen),” which only translates the word “ray”; as “エックス線 (x-sen),” using a katakana transliteration and kanji; or as “レントゲン線 (Rentogen-sen),” named after its discoverer. Synonymous terms in Japanese can thus differ in script selection, loanword integration, and context, so expressions in Japanese are intrinsically more intricate than those in Latin-alphabet languages. While research on synonym identification using distributed representations in Japanese exists14,15,16,17,18, the nuances of complex expression patterns remain underexplored. It is therefore crucial to discern the advantages and limitations of distributed representations for managing terminology in radiological technology.

The purpose of this study was to evaluate the accuracy of automatic synonym identification in expression patterns using distributed representations in the field of radiological technology.

Methods

The methodology of this study encompassed several stages: data collection and preprocessing, the development of word embedding models, the formation of synonym sets, prediction of synonyms, and finally, evaluation, as illustrated in Fig. 1.

Figure 1
figure 1

Study flow.

Data collection and preprocessing

We collected data from 337,479 abstracts (206 MB), published from 1980 to the data collection date (June 20, 2019), from Ichushi-Web, a Japanese database19. Ichushi-Web archives approximately 400,000 periodicals published in Japan annually. This comprehensive collection includes journals from academic societies, medical publishers, and university bulletins, and spans disciplines such as medicine, dentistry, pharmacy, nursing, and related fields. To date, the database holds more than 15 million records. When collecting data, we used “diagnosis imaging” as a keyword along with descriptors that are synonyms of the keyword (Table 1).

Table 1 List of descriptors.

Our preprocessing phase involved several steps. First, we extracted the main text of the abstracts and inserted spaces before and after symbols such as brackets, colons, and semicolons. Next, we edited English technical terms consisting of two or more words: we inserted underscores before, after, and between words appearing in RadLex 4.020, because distributed representation tools recognize word boundaries by the spaces between words. Finally, we inserted spaces between words using MeCab21, a morphological analyzer, with the mecab-ipadic-NEologd dictionary22 and the terminology for radiological technology23,24. Figure 2 shows an example of the preprocessing.

Figure 2
figure 2

Example of data preprocessing (□: space).
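The symbol-spacing and term-joining steps can be sketched in Python. This is a minimal illustration, not the study's actual pipeline: the short term list stands in for the full RadLex 4.0 vocabulary, and the MeCab morphological segmentation step is omitted to keep the sketch dependency-free.

```python
import re

# Hypothetical mini-list standing in for the multiword RadLex 4.0 terms
# used in the study.
RADLEX_MULTIWORD = ["magnetic resonance imaging", "computed tomography"]

def space_symbols(text: str) -> str:
    """Insert spaces before and after brackets, colons, and semicolons."""
    return re.sub(r"([()\[\]:;])", r" \1 ", text)

def join_radlex_terms(text: str) -> str:
    """Join multiword English terms with underscores so that each term
    is treated as a single token by the embedding tools."""
    for term in RADLEX_MULTIWORD:
        text = text.replace(term, term.replace(" ", "_"))
    return text

def preprocess(text: str) -> str:
    # Morphological segmentation with MeCab and the custom dictionaries
    # would follow here; it is omitted in this sketch.
    return space_symbols(join_radlex_terms(text))
```

After these steps, each underscore-joined term survives tokenization as one unit, which is the precondition for it to receive a single embedding vector.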

Creation of word embedding models

We used two methods: Word2vec25,26 and fastText27. Word2vec learns, via a neural network, the contexts in which a particular word appears. There are two architectures for producing a distributed representation of words: CBOW and skip-gram. CBOW predicts the current word from the surrounding context words; skip-gram predicts the neighboring context words from the current word. fastText improves on Word2vec by using character n-grams to address the problem that Word2vec ignores sub-word information and out-of-vocabulary (OOV) words.

The parameters of Word2vec and fastText shown in Table 2 were used to create the distributed representations. The Gensim package for Word2vec28 and the fastText library29 were used to train the vectors. All training experiments were run on a computer with the Ubuntu 18.04 operating system, an Intel Core i7-9700K CPU, and 64 GB RAM, using Python 3.8.3.

Table 2 Parameters in Word2vec and fastText.

Synonym set creation

Two radiological technologists collaborated in developing a synonym set for the evaluation dataset, using terms pertinent to radiology and radiologic technology. Each set in this collection included a Japanese term along with its associated synonyms, which comprised either the corresponding English term and its acronym or the English term alone. This process yielded 1029 sets that became the foundation for this study. Finally, we categorized each set by the field relevant to its terms. The designated fields were image engineering, physical phenomena, radiation management, equipment, informatics, diagnostic imaging, radiation therapy, and basic medicine (Table 3). For the evaluation of synonym identification by expression pattern, we classified the synonyms into seven patterns based on how they are expressed: transliteration variants, different Japanese spellings with the same meaning, Japanese shortened forms, conversion to transliteration, English words, English acronyms, and plural expressions (Table 4).

Table 3 Eight fields in radiological technology and their descriptions.
Table 4 Expression patterns in synonym sets.

Synonym prediction

Subsequently, the preferred terms from the synonym sets were input into the distributed representations. Cosine similarities were then calculated, and the top 10 synonym candidates with the highest cosine similarity to the input term were obtained. Cosine similarity is defined as follows:

$${\text{cosine}} \, \mathrm{ similarity}= \frac{{\text{A}}\cdot {\text{B}}}{\Vert {\text{A}}\Vert \Vert {\text{B}}\Vert }$$
(1)

where A is a vector of an input word and B is a vector of a word in a distributed representation.
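Equation (1) and the top-10 retrieval step can be sketched as follows; `vocab_vecs` is a hypothetical word-to-vector mapping, such as one taken from a trained model (Gensim's `most_similar` performs the same ranking internally).

```python
import numpy as np

def cosine_similarity(a, b):
    """Eq. (1): A.B / (||A|| ||B||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_candidates(query_vec, vocab_vecs, k=10):
    """Rank every vocabulary word by cosine similarity to the query
    vector and return the k most similar words."""
    sims = {w: cosine_similarity(query_vec, v) for w, v in vocab_vecs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```

With k = 10 this yields exactly the candidate list evaluated in the next section.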

Evaluation

The ten synonym candidates obtained were compared against the synonyms listed in the set. If the spelling of a synonym matched that of a candidate, the candidate was judged correct. Precision, recall, F1-score, and accuracy were then computed for each word embedding model as follows.

$${\text{Precision}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right)$$
(2)
$${\text{Recall}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FN}}} \right)$$
(3)
$${\text{F1-score}} = 2 \times {\text{Precision}} \times {\text{Recall}}/\left( {{\text{Precision}} + {\text{Recall}}} \right)$$
(4)
$${\text{Accuracy}} = \left( {{\text{TP}} + {\text{TN}}} \right)/\left( {{\text{TP}} + {\text{FP}} + {\text{FN}} + {\text{TN}}} \right)$$
(5)

where TP stands for true positive and refers to cases in which a word is correctly identified as a synonym; FP denotes false positive, cases in which a word is erroneously identified as a synonym when it is not; FN signifies false negative, cases in which a word is incorrectly identified as not being a synonym; and TN refers to true negative, cases in which a word is correctly identified as not being a synonym. TN was set to 0 in this study because only synonym candidate words are output in this task.
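For one synonym set, Eqs. (2)–(5) with TN fixed at 0 reduce to the following sketch; `predicted` and `gold` are illustrative names for the candidate words and the reference synonyms.

```python
def evaluate(predicted: set, gold: set) -> dict:
    """Compute Eqs. (2)-(5) for one synonym set.
    TP: gold synonyms found among the candidates;
    FP: candidates that are not gold synonyms;
    FN: gold synonyms that were missed; TN is fixed at 0."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn) if tp + fp + fn + tn else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Note that with TN = 0, accuracy can never exceed precision or recall, which is consistent with the score patterns reported in the Results.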

For each model, cumulative accuracies across ranks were derived by consolidating the individual prediction accuracies within the model using Eq. (5). Moreover, to gauge the accuracy of synonym identification by expression pattern, we computed precision, recall, F1-score, and accuracy over the eight fields of radiological technology and the different expression patterns in the synonym sets. In the analysis of English terms and abbreviations, the output was also assessed when restricted to words consisting solely of alphabetic characters.
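One plausible reading of a rank-wise cumulative accuracy, counting a query as correct when at least one reference synonym appears among its top-k candidates, can be sketched as follows (a simplification of the Eq. (5)-based consolidation described above):

```python
def cumulative_accuracy(ranked_candidates, gold_sets, k):
    """Fraction of queries for which at least one gold synonym
    appears within the top-k ranked candidates."""
    hits = sum(
        1 for cands, gold in zip(ranked_candidates, gold_sets)
        if any(c in gold for c in cands[:k])
    )
    return hits / len(gold_sets)
```

Sweeping k from 1 to 10 produces the cumulative curves of the kind shown in Fig. 3.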

Ethical approval

This article does not include any studies with human participants or animals that were performed by any of the authors.

Results

Comparison among models

Comparing the evaluation indices of the synonym predictions across the word embedding models, the model that achieved the highest precision, recall, F1-score, and accuracy was fastText with CBOW at 300 vector dimensions, registering scores of 0.5567, 0.8872, 0.6841, and 0.5199, respectively (Table 5). In all evaluation indices, fastText consistently outperformed Word2vec, and CBOW tended to be superior to skip-gram. However, in fastText with CBOW, the 100-dimensional representation yielded the lowest performance. There was a general trend of improved performance with increasing dimensions in both fastText with skip-gram and Word2vec with CBOW; similarly, in Word2vec with skip-gram, performance declined as dimensionality decreased.

Table 5 Cumulative ratio of synonym prediction in each word embedding model (FT: fastText, W2V: Word2vec, SKIP: skip-gram, number: vector dimensions).

Figure 3 illustrates the cumulative ratios observed for each model. For Word2vec with CBOW, the average variation in the cumulative ratio over the top 1–10 words was approximately 14.8% to 19.5%. In contrast, fastText with CBOW exhibited a range of about 24.1% to 29.5%, and fastText with skip-gram a span of 19.0% to 28.2%. It is noteworthy that in the fastText models, cumulative accuracy improved markedly when terms appearing at lower ranks were incorporated. Table 6 shows examples of the top-10 synonym candidates from the most accurate model of each architecture.

Figure 3
figure 3

Evaluation of each model (top left: Precision, top right: Recall, bottom left: F1-score, bottom right: Accuracy).

Table 6 Examples of the Top-10 synonym candidates in the models with the greatest accuracy.

Analysis for the synonym expression patterns in Japanese

Table 7 presents the results for synonym expression patterns in Japanese. For “different Japanese spellings with the same meaning,” almost all fastText models demonstrated superior performance over Word2vec across all indices. The fastText model employing CBOW at 400 dimensions reported precision, recall, F1-score, and accuracy values of 0.5365, 0.6649, 0.5938, and 0.4222, respectively.

Table 7 Results for synonym expression patterns in Japanese (FT: fastText, W2V: Word2vec, SKIP: skip-gram, number: vector dimensions).

For “Japanese shortened forms,” the indices for fastText notably surpassed those of Word2vec. The best performance was observed for fastText with CBOW at 900 dimensions, yielding precision, recall, F1-score, and accuracy values of 0.9038, 0.8952, 0.9000, and 0.8174, respectively. Furthermore, performance improved discernibly as the number of dimensions increased.

For “conversion to transliteration,” Word2vec models with CBOW outperformed the other models in all indices. Specifically, the model at 400 dimensions achieved a precision of 0.328, a recall of 0.625, an F1-score of 0.430, and an accuracy of 0.274.

For “transliteration variants,” fastText models consistently outperformed Word2vec. The skip-gram approach at both 200 and 400 dimensions stood out, registering precision, recall, F1-score, and accuracy values of 0.833, 0.935, 0.882, and 0.789, respectively.

In the category of “plural expressions,” fastText models significantly outperformed Word2vec models, particularly in precision, recall, and overall accuracy. The fastText model employing a 300-dimensional CBOW architecture was the most effective. Figure 4 presents the distribution of the most prominent synonym expression patterns for plural expressions across the models. For the categories “transliteration variants,” “different Japanese spellings with the same meaning,” and “Japanese shortened forms,” optimal performance was achieved within the 200- to 400-dimensional range using fastText with CBOW. In contrast, the other notation patterns tended to favor Word2vec, with “conversion to transliteration” peaking with a 500-dimensional CBOW model.

Figure 4
figure 4

Frequency of synonym expression patterns detected in multiple expressions (FT: fastText, W2V: Word2vec, SKIP: skip-gram, number: vector dimensions).

Analysis for the synonym expression patterns in English terms and abbreviations

Table 8 shows the results for English terms and abbreviations. For “Japanese and English terms,” all indices were poor for all models. The best F1-score and accuracy were 0.1836 and 0.1011, for Word2vec with CBOW at 800 dimensions. When the output was restricted to alphabetic-only words, all indicators improved, and the improvement was particularly pronounced for fastText. The best model was fastText with skip-gram at 800 and 900 dimensions, with precision, recall, F1-score, and accuracy of 0.3253, 0.3718, 0.3470, and 0.2099, respectively. The rate of increase ranged from 0.15 to 0.25 for all indices.

Table 8 Results for synonym expression patterns in English terms and acronyms (FT: fastText, W2V: Word2vec, SKIP: skip-gram, number: vector dimensions).

For “Japanese words and English acronyms,” fastText with skip-gram performed well on all indicators. In particular, the 400-dimensional model showed the best values for precision, F1-score, and accuracy, at 0.4643, 0.6341, and 0.4643, respectively. For fastText with CBOW at 400 dimensions, these indices increased by about 0.25, while for the other fastText skip-gram models, they increased by about 0.11 to 0.13.

Analysis for eight fields in radiological technology

Across the evaluative indices, the fields of “Image Engineering,” “Physical Phenomena,” “Equipment,” “Radiation Therapy,” “Medicine,” and “Imaging Diagnosis” showed optimal values with fastText with CBOW. In these fields, the optimal vector dimensionality was consistently below 500, gravitating towards approximately 300 dimensions. “Imaging Diagnosis” was the best performing of all categories, with precision, recall, F1-score, and accuracy of 0.7137, 0.9586, 0.8182, and 0.6923, respectively. “Radiation Control” performed best with fastText using the skip-gram approach at 600 dimensions, and “Informatics” showed the best values on all evaluation metrics with Word2vec with CBOW at 800 dimensions. In the investigation of Japanese notation patterns, plural expressions exceeded 30% across all domains apart from Radiation Therapy. In the fields of Equipment, Informatics, and Radiation Therapy, the frequency of “conversion to transliteration” was notably higher than in other categories, accounting for approximately 20% or more. “Japanese shortened forms” were prevalent, constituting over 20%, within the domains of Medicine and Imaging Diagnosis. Moreover, “different Japanese spellings with the same meaning” appeared most frequently, exceeding 40%, in the areas of Physical Phenomena and Radiation Control (Table 9).

Table 9 The percentage of each Japanese expression pattern in the eight fields.

Discussion

Comparison between Word2vec and fastText

Across all indices, fastText consistently outperformed Word2vec. Notably, fastText with the CBOW architecture peaked in performance at 300 dimensions, and its scores remained relatively stable across vector dimensions: the disparity between the maximum and minimum values was a mere 0.03, translating to a difference of about 30 words attributable to the choice of dimensionality. In the case of Word2vec, models built with the skip-gram approach outperformed those based on CBOW, with the exception of the 100-dimensional representation. Given these outcomes, the most effective architecture for synonym extraction within the domain of radiological technology is the fastText model with the CBOW approach at between 300 and 400 vector dimensions. However, considering the tendency for synonyms to appear at lower ranks, effective automation would require extracting terms from multiple ranks and implementing a robust filtering mechanism to refine the synonym selection.

Analysis for the synonym expression patterns

In the evaluation of the seven expression patterns, fastText outperformed Word2vec in four categories: “transliteration variants,” “different Japanese forms with the same pronunciation and meaning,” “Japanese shortened forms,” and “plural expressions.” A common feature of the synonym sets in these four categories was that the words within a set differed by only a few characters; expressed differently, the synonyms in these categories shared a common n-gram. Figure 5 shows the distribution of word vectors obtained by t-distributed stochastic neighbor embedding (t-SNE), an unsupervised dimension reduction technique30, focusing on the sets of “Japanese shortened forms.” In fastText, terms with the same n-gram tend to be located closer together, whereas in Word2vec, clusters of words that include the same n-gram tend to spread widely and overlap. This result also suggests that fastText is advantageous for detecting synonyms with a shared n-gram at high rankings.

Figure 5
figure 5

t-SNE map of terms in synonym sets. The left panel is for Word2vec with CBOW and 800 vector dimensions, and the right panel is for fastText with CBOW and 400 dimensions. Words are sets in “shortened Japanese.” Green words include “irradiation(照射)” or “irradiation method(照射法).” Blue words show “contrast (造影)” or “contrast method (造影法).” Red words show “-graphy (撮影)” or “-graphy method (撮影法).”
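The projection in Fig. 5 can be reproduced in outline with scikit-learn's t-SNE. The random vectors below are a stand-in for the actual term embeddings (the 800-dimensional Word2vec or 400-dimensional fastText vectors of the “Japanese shortened forms” terms); in practice each row would be a model vector such as `model.wv[term]`.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for term embeddings; replace with rows of model.wv[term].
vectors = rng.normal(size=(30, 50))

# perplexity must be smaller than the number of points.
tsne = TSNE(n_components=2, perplexity=5.0, init="random", random_state=0)
coords = tsne.fit_transform(vectors)  # 2-D coordinates for plotting
```

The resulting two-dimensional coordinates are what is plotted, colored by synonym group, in a map like Fig. 5.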

The optimal architecture (CBOW or skip-gram) and the optimal number of vector dimensions in these four categories differed depending on the category of the synonym sets. Regarding architecture, skip-gram was preferable only for “different Japanese forms with the same pronunciation and meaning.” With fastText and CBOW, the tendency is to propose words containing a common n-gram as synonyms, whereas skip-gram tended to extract synonym pairs that did not share a common n-gram. The more sets that include a common n-gram, the more advantageous CBOW becomes; otherwise, it may be equivalent to skip-gram, or skip-gram may be advantageous. Regarding vector dimensions, the ratio improved or did not change significantly as the number of dimensions grew. As reported in previous studies16,28, accuracy improves as the number of vector dimensions increases. However, the optimal vector size may change depending on the characteristics of the synonym sets; this matter is a subject for future investigation.

For “conversion to transliteration,” “Japanese words and English acronyms,” and “Japanese and English words,” the four indices for Word2vec were equivalent or superior to those of the fastText models, depending on the number of vector dimensions. However, the best accuracy was only about 50% for “Japanese words and English acronyms” and less than 30% for the others, inferior to that in the four categories above. The synonym sets in these categories had few or no common character strings, which makes it difficult for fastText to perform well, so ways of improving the accuracy of Word2vec need to be considered.

It has been observed that when generating outputs involving English words and abbreviations, Japanese words tend not to rank at the top. This phenomenon is likely attributable to the models being predominantly trained on a Japanese corpus, thereby limiting their exposure to sufficient English expressions. Notably, the accuracy for both English words and abbreviations improved significantly when the models were constrained to output only alphabet characters. This finding suggests that specifying character sets can be a beneficial strategy when aiming to generate outputs in a language different from that of the training corpus.

Synonym expression patterns

In the domains of “Image Engineering,” “Physical Phenomena,” “Equipment,” “Radiation Therapy,” “Medicine,” and “Imaging Diagnosis,” optimal outcomes were observed with fastText and the CBOW model. In these fields, synonyms commonly included words with shared n-grams. Conversely, in the area of “Radiation Control,” the fastText model with skip-gram was favored. This preference could be attributed to skip-gram's enhanced capability for detecting words that pose challenges for fastText with CBOW, thereby potentially contributing to the improved performance in the evaluation indices. In the field of “Informatics,” a considerable number of synonyms were categorized under “different Japanese writings of the same meaning” and “Japanese word and Japanese transliteration,” categories distinguished by a markedly lower frequency of shared n-grams. In scenarios characterized by a reduced prevalence of common n-grams among synonyms, the Word2vec model may exhibit a comparative advantage over fastText.

Comparison with previous studies

When contrasting the skip-gram and CBOW architectures within Word2vec, several studies have indicated superior performance by CBOW in tasks such as similar word detection and text classification31,32. In the context of fastText, existing research has posited that skip-gram surpasses CBOW, especially in sentiment analysis-based classification33,34. The outcomes of our research align with these findings for Word2vec. In our study, however, the difference in the cumulative ratio between the most accurate model (CBOW) and skip-gram was marginal, at approximately 1.9%, as detailed in Table 5. This marginal difference could hint at an inherent advantage for skip-gram depending on the nature of the task at hand.

Regarding the dimensionality of word embeddings, various studies have explored the relationship between accuracy and vector dimensions. For instance, Mikolov et al. reported that accuracy increased as the vector size increased26, while Pennington et al. highlighted a peak in accuracy around the 300-dimensional mark35. Our results resonate with these observations, underscoring the general trend seen in word embedding research. However, it is worth noting that these prior studies did not exclusively target medical terminology and differed from our research in their specific tasks.

Limitations

The synonym set employed in this study was curated by two experts drawing on a glossary provided by academic societies; this approach is not impervious to potential omissions. Additionally, as part of the preprocessing, spaces were introduced between words in Japanese text using a morphological analysis tool. A significant challenge in morphological analysis is the handling of out-of-vocabulary (OOV) words: when OOV words not covered in the dictionary are analyzed, there is a risk of incorrect segmentation. This becomes especially problematic when the OOV word is a technical term, as it may hinder the successful identification of synonym candidates. Finally, because the text collected for the learning corpus primarily focuses on “image diagnosis,” learning may have been inadequate for areas of low relevance, such as radiation measurement.

Conclusions

The application of Word2vec and fastText models for automatic synonym detection in the field of radiological technology indicated that fastText with CBOW at 300 dimensions was the most accurate. In the detailed analysis of synonym notation patterns, fastText with CBOW excelled in cases where synonyms shared common n-grams, whereas fastText with skip-gram and Word2vec with CBOW were more effective in cases where synonyms did not share common n-grams. In the eight fields pertinent to radiological technology, the fastText with CBOW model proved particularly beneficial owing to the frequent occurrence of common n-grams. However, in the field of informatics, where English terms, acronyms, and transliterations are commonly employed, the Word2vec model with the CBOW architecture showed greater utility.