Introduction

Pediatrics is a field of great research diversity,1 reflected in the origin of publications, the large number of topics, and the wide range of article types. To maintain their expertise, pediatricians must continually keep up to date with new knowledge.2,3 In 1998, Bergman presented an analysis of trends in pediatric publications that relied on manual classification of research topics.1 However, the number of pediatric publications has increased exponentially over the past decades, making it impractical to summarize a topic manually.4

In text-mining, information is extracted from texts using computer algorithms.5 Text-mining can be applied to identify trends and to investigate the dynamics of a research field.4,6,7,8 Examples include anti-epileptic drug research,9 adolescent substance abuse,10 and diatom research.9 However, such algorithms have rarely been applied to pediatric research. In this work, we aimed to use text-mining to provide a high-level analysis of trends in the pediatric literature over the past two decades.

Methods

Data set

The U.S. National Library of Medicine releases an annual version of the MEDLINE/PubMed data set, which is freely available for download.11 We retrieved all available MEDLINE/PubMed annual data sets up to December 31, 2018.

We extracted the following data for each entry: PubMed unique article ID (PMID), title, publishing journal, abstract text, keywords (if any), and authors (including the first author's affiliation, if available).
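The field extraction can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, assuming the records follow the public MEDLINE/PubMed XML layout (`PubmedArticle` > `MedlineCitation` > `Article`); the function names and the output dictionary keys are hypothetical and are not taken from the study's code.

```python
import xml.etree.ElementTree as ET


def _text(node):
    """All text inside an element, including nested markup such as <i>."""
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_pubmed_article(article):
    """Extract the study fields from one <PubmedArticle> element."""
    citation = article.find("MedlineCitation")
    art = citation.find("Article")

    # Structured abstracts are split into several labelled <AbstractText> sections.
    abstract = " ".join(_text(n) for n in art.findall("Abstract/AbstractText"))

    authors = []
    for author in art.findall("AuthorList/Author"):
        name = f"{author.findtext('ForeName', '')} {author.findtext('LastName', '')}"
        authors.append(name.strip())

    # Only the first author's affiliation is kept (used later for the country).
    first = art.find("AuthorList/Author")
    affiliation = first.findtext("AffiliationInfo/Affiliation") if first is not None else None

    return {
        "pmid": citation.findtext("PMID"),
        "title": _text(art.find("ArticleTitle")),
        "journal": art.findtext("Journal/Title"),
        "abstract": abstract,
        "keywords": [_text(k) for k in citation.findall("KeywordList/Keyword")],
        "authors": authors,
        "first_author_affiliation": affiliation,
    }


def parse_medline_xml(xml_text):
    """Parse every <PubmedArticle> in a MEDLINE/PubMed XML document."""
    root = ET.fromstring(xml_text)
    return [parse_pubmed_article(a) for a in root.iter("PubmedArticle")]
```

The annual files are distributed as gzip-compressed XML, so a file could be read with `gzip.open(path).read()` before calling `parse_medline_xml`; for files of this size, `ET.iterparse` would keep memory use lower.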

We retrieved the number of times each article was cited, using an application provided by the National Center for Biotechnology Information (NCBI).11 Data lock and citation retrieval were performed on May 1, 2019.
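The source does not name the NCBI application used for citation counts. One possibility, shown here only as an assumption, is the E-utilities ELink service with the `pubmed_pubmed_citedin` link name, which returns the PubMed articles citing a given PMID. The sketch builds the request URL and counts citing articles in an already downloaded response; the helper names are hypothetical and no network call is made. To our understanding these links are derived from citing articles in PubMed Central, so such counts can be lower than those of other citation indexes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def citedin_url(pmids):
    """Build an ELink request; one id= parameter per PMID keeps results separate."""
    params = [("dbfrom", "pubmed"), ("db", "pubmed"),
              ("linkname", "pubmed_pubmed_citedin")]
    params += [("id", str(p)) for p in pmids]
    return f"{EUTILS_ELINK}?{urlencode(params)}"


def count_citations(elink_xml):
    """Map each queried PMID in an ELink XML response to its number of citing articles."""
    root = ET.fromstring(elink_xml)
    counts = {}
    for linkset in root.iter("LinkSet"):
        pmid = linkset.findtext("IdList/Id")
        # A PMID with no citations has no matching <LinkSetDb>, giving a count of 0.
        counts[pmid] = sum(
            len(db.findall("Link"))
            for db in linkset.findall("LinkSetDb")
            if db.findtext("LinkName") == "pubmed_pubmed_citedin"
        )
    return counts
```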

Data processing

Data processing and visualization code was written in Python (version 3.6.5, 64-bit).

For text-mining, punctuation and repeated spaces were removed. The first author's country was extracted from the first author's affiliation data.
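The cleaning step and the country extraction can be sketched as follows. Both functions are hypothetical illustrations: the source does not describe how the country was parsed, and taking the last comma-separated field of the affiliation is a common heuristic, not the authors' stated method.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Remove punctuation and collapse runs of whitespace into single spaces."""
    no_punct = text.translate(_PUNCT)
    return re.sub(r"\s+", " ", no_punct).strip()


def country_from_affiliation(affiliation):
    """Heuristic: the country is usually the last comma-separated field."""
    if not affiliation:
        return None
    # Drop trailing e-mail details such as "Electronic address: x@y.org."
    affiliation = re.sub(r"\S+@\S+", "", affiliation)
    affiliation = re.sub(r"Electronic address:?", "", affiliation)
    last = affiliation.rstrip(" .;").split(",")[-1]
    return last.strip(" .") or None
```

In practice, the extracted strings would still need normalization (for example, mapping "USA" and "United States" to one country).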

Inclusion criteria

We included articles published in journals categorized as “Pediatrics, Perinatology, and Child Health” in the Scimago ranking for medicine journals.12 The time frame for article inclusion was the past 20 years (1/1/1999–31/12/2018). Because the text-mining analysis relies on abstract text, we included only entries with abstracts longer than 50 words.
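The three inclusion criteria translate into a simple filter. This sketch assumes each parsed record already carries a `journal` name, a `pub_date` as a `datetime.date`, and the `abstract` text; these field names, and matching journals by name rather than by ISSN, are assumptions.

```python
from datetime import date

# Inclusion window stated in the text: 1/1/1999 to 31/12/2018, inclusive.
START, END = date(1999, 1, 1), date(2018, 12, 31)


def is_included(record, pediatric_journals, min_words=50):
    """Return True if a parsed record meets all three inclusion criteria."""
    in_pediatric_journal = record["journal"] in pediatric_journals
    in_time_frame = START <= record["pub_date"] <= END
    # "Abstracts >50 words": split on whitespace and require strictly more than 50.
    long_abstract = len(record["abstract"].split()) > min_words
    return in_pediatric_journal and in_time_frame and long_abstract
```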

Topic modeling

Included studies were categorized into topics using latent Dirichlet allocation (LDA).13 This well-documented algorithm is commonly used for topic modeling. LDA represents each document as a mixture of latent topics and each topic as a probability distribution over words, so that words that tend to occur in the same documents are grouped together. Each resulting group is characterized by its most probable words. We set the algorithm to output 200 groups of word sets. A domain expert (S.L.-M.) then manually allocated each word set to a topic according to Bergman et al.,1 with minor modifications (see Supplement). The final number of topics was 35. Each abstract in the corpus was assigned to one of the 35 topics.
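The LDA step can be sketched with a from-scratch collapsed Gibbs sampler, a standard way of fitting LDA. The paper does not name its LDA implementation, so this is not the authors' code: the hyperparameters (`alpha`, `beta`, `n_iter`), the function names, and the rule of labelling each abstract by its dominant word set are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns the vocabulary, per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    corpus = [[index[w] for w in doc] for doc in docs]

    doc_topic = [[0] * n_topics for _ in corpus]           # tokens of topic k in doc d
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    assignments = []
    for d, doc in enumerate(corpus):
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            update(d, w, k, +1)
        assignments.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)  # leave this token out
                # Full conditional p(z = k | all other assignments), up to a constant.
                weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + n_words * beta) for k in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                update(d, w, k, +1)
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=10):
    """The n most frequent words of each topic: the word sets an expert labels."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:n]]
            for row in topic_word]


def assign_topics(doc_topic, word_set_label):
    """Label each document with the expert label of its dominant word set."""
    return [word_set_label[max(range(len(row)), key=row.__getitem__)]
            for row in doc_topic]
```

For the setup described above, `n_topics` would be 200 and `word_set_label` would map those 200 word sets onto the 35 expert-defined topics (a many-to-one mapping). On a corpus of roughly 200,000 abstracts, an optimized library such as gensim or scikit-learn would be the practical choice over this pure-Python loop.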

Results

Two hundred and twenty-five journals were categorized as Pediatrics, Perinatology, and Child Health in the Scimago ranking for medicine journals. Of the 29,137,794 entries in PubMed, 610,826 papers were published in one of these 225 pediatrics-related journals. Of these, 392,826 had abstracts longer than 50 words. From this corpus, we included the 201,141 papers published between 1999 and 2018.

Country of origin

The pediatric articles came from 177 countries and 8 continents. The United States had the highest number of publications (n = 69,263), followed by the United Kingdom (n = 12,332), Turkey (n = 9614), and Canada (n = 8676). In terms of the citations/publications (C/P) ratio, the United States ranked highest (C/P = 6.2), followed by the United Kingdom (C/P = 5.9), Sweden (C/P = 5.8), and Canada (C/P = 5.1) (Fig. 1).
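We read the C/P ratio as the total number of citations received by a group's papers divided by the number of papers in that group, that is, the mean number of citations per paper. A minimal sketch, assuming each record is a dictionary with a hypothetical `citations` count and a grouping field such as `country`:

```python
from collections import defaultdict


def cp_ratio_by(records, key):
    """Citations/publications ratio per group: total citations / number of papers."""
    citations = defaultdict(int)
    papers = defaultdict(int)
    for rec in records:
        group = rec[key]
        if group is None:  # e.g. no parsable first-author country
            continue
        citations[group] += rec["citations"]
        papers[group] += 1
    return {g: citations[g] / papers[g] for g in papers}
```

Countries can then be ranked with `sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)`. The same function applied with `key="topic"` would give per-topic ratios, provided each paper carries a single value for that field.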

Fig. 1: Citations/publications ratio per country.

Citations/publications ratio per country over the past 20 years.

Article type

MEDLINE/PubMed article types were analyzed for the following subcategories: meta-analysis, clinical trial, randomized controlled trial, multicenter study, editorial, letter, review, and guideline. Most of these articles were reviews (56%), followed by clinical trials (13%) (Fig. 2). Clinical guidelines and meta-analyses had the highest C/P ratios (12.2 and 10.6, respectively; Fig. 3).
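The article-type percentages can be computed by counting MEDLINE publication types. This is a sketch under stated assumptions: the type names below are the MEDLINE `PublicationType` values we believe correspond to the listed subcategories, the `publication_types` field is hypothetical, and because one article can carry several publication types, the percentages here are taken over all matching type assignments rather than over articles.

```python
from collections import Counter

# Assumed MEDLINE PublicationType names for the subcategories listed in the text.
TYPES_OF_INTEREST = {
    "Meta-Analysis", "Clinical Trial", "Randomized Controlled Trial",
    "Multicenter Study", "Editorial", "Letter", "Review", "Guideline",
}


def type_shares(records, types_of_interest=TYPES_OF_INTEREST):
    """Percentage of each publication type among all matching type assignments."""
    counts = Counter(
        t for rec in records for t in rec["publication_types"] if t in types_of_interest
    )
    total = sum(counts.values())
    return {t: 100 * n / total for t, n in counts.items()}
```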

Fig. 2: Number of articles per type of article.

Total number of articles per article type. For each article type, the total number of articles over the past 20 years and its percentage of the total are shown.

Fig. 3: Citations/publications ratio per article type.

Citations/publications ratio per article type over the past 20 years.

Trends of topics

The trends of the ten leading topics over the past two decades are presented in Fig. 4. When examining the proportion of each topic over time, we observed that epidemiology (from 15% in 1999–2004 to 26% in 2014–2018) and psychology (from 5% in 1999–2004 to 9% in 2014–2018) increased, whereas neurology (from 6% to 3%), infectious diseases (from 5.4% to 3.3%), and pulmonology (from 4.9% to 3.3%) decreased in relative terms. Psychology and psychiatry had the highest C/P ratio (6.8), followed by neurology, epidemiology, and endocrinology (Fig. 5).
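The per-period topic proportions can be computed by grouping papers into year ranges and counting topics within each range. In this minimal sketch the records' `year` and `topic` fields are hypothetical, and the period boundaries are a parameter, so the example bins in the test-style usage below are illustrative rather than the study's exact ones.

```python
from collections import Counter


def topic_shares(records, periods):
    """Percentage of papers per topic within each (start_year, end_year) period."""
    shares = {}
    for start, end in periods:
        counts = Counter(r["topic"] for r in records if start <= r["year"] <= end)
        total = sum(counts.values())
        # An empty period yields an empty dict, so no division by zero occurs.
        shares[(start, end)] = {t: 100 * n / total for t, n in counts.items()}
    return shares
```

For example, `topic_shares(records, [(1999, 2003), (2004, 2008), (2009, 2013), (2014, 2018)])` would give four 5-year periods; comparing a topic's share in the first and last period gives the relative increase or decrease.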

Fig. 4: Number of publications per topic over the 2 decades.

Number of publications per topic in each 5-year period over the past two decades. Each topic has four bars, each representing a 5-year period.

Fig. 5: Citations/publications ratio per topic.

Citations/publications ratio per topic over the last 20 years.

Discussion

This study used text-mining to provide a high-level view of pediatric research over the past two decades. The number of articles published over the past 20 years has grown, reflecting growing interest in pediatric research.

The United States leads in the number of pediatric publications, which reflects the prominent role of the United States pediatric community in global pediatric research. Although low-income countries have higher morbidity, they have a disproportionately low number of publications, as previously reported by Keating et al.14 This is most likely attributable to the lack of resources directed to research and academic centers in those countries.

When analyzing article types, we found that clinical guidelines and meta-analyses received the most citations per publication; both analyze and summarize large amounts of data and can influence clinical practice.

The shift in focus toward epidemiology may result from several factors. Understanding the epidemiology of diseases can support disease prevention and better resource planning, improving quality of life and longevity in an era in which economic considerations strongly affect the delivery of medical treatment. The field of psychology and psychiatry has also grown, underscoring the importance of mental health in medicine, which should be addressed in pediatric research as well.

Text-mining enabled us to summarize the past 20 years of pediatric literature and to better understand trends in this field. Text-mining can help the academic medical community cope with the ever-growing number of publications, a task that would otherwise be impractical.

Our research has several limitations. First, this comprehensive study covers 20 years of research and can therefore provide only a high-level view of pediatric research. Second, papers published in the last year of our analysis had little time to accumulate citations, which may have lowered their C/P ratio. Third, there was some topic overlap, and some papers addressed more than one topic; this may have affected the results. Fourth, LDA was applied to words taken from abstracts rather than full texts; that said, abstracts reflect the essence of a paper. Fifth, this study analyzed only pediatric journals, so pediatric papers published in non-pediatric journals were not included.

In conclusion, the topics in the pediatric literature have shifted over the past two decades, reflecting changing trends in the field. Text-mining enables analysis of publication trends and can serve as a high-level academic tool that highlights where additional medical education and research are needed.