Academic journals are the repositories of mankind’s gradually accumulating knowledge of the surrounding world. Just as knowledge is organized into classes ranging from major disciplines, subjects and fields, to increasingly specific topics, journals can also be categorized into groups using various metrics. In addition, they can be ranked according to their overall influence. However, according to recent studies, the impact, prestige and novelty of journals cannot be characterized by a single parameter such as, for example, the impact factor. To increase understanding of journal impact, the knowledge gap we set out to explore in our study is the evaluation of journal relevance using complex multi-dimensional measures. Thus, for the first time, our objective is to organize journals into multiple hierarchies based on citation data. The two approaches we use are designed to address this problem from different perspectives. We use a measure related to the notion of m-reaching centrality and find a network that shows a journal’s level of influence in terms of the direction and efficiency with which information spreads through the network. We find we can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. In this case, in a self-organized way, the journals become branches according to the major scientific fields, where the local structure of the branches reflects the hierarchy within the given field, with usually the most prominent journal (according to other measures) in the field chosen by the algorithm as the local root, and more specialized journals positioned deeper in the branch. This can make the navigation within different scientific fields and sub-fields very simple, and equivalent to navigating in the different branches of the nested hierarchy. We expect this to be particularly helpful, for example, when choosing the most appropriate journal for a given manuscript.
According to our results, the two alternative hierarchies show a somewhat different, but also consistent, picture of the intricate relations between scientific journals, and, as such, they also provide a new perspective on how scientific knowledge is organized into networks.
Providing an objective ranking of scientific journals and mapping them into different knowledge domains are complex problems of significant importance, which can be addressed using a number of different approaches. Probably the most widely known quality measure is the impact factor (Garfield, 1955, 1999), corresponding to the number of citations a journal receives in a given year to the papers it published in the preceding 2 years, divided by the number of papers published in those 2 years. Although it is a rather intuitive quantity, the impact factor has serious limitations (Harter and Nisonger, 1997; Opthof, 1997; Seglen, 1997; Bordons et al., 2002). This consequently led to the introduction of alternative measures such as the H-index for journals (Braun et al., 2006), the g-index (Egghe, 2006), the Eigenfactor (Bergstrom, 2007), the PageRank and the Y-factor (Bollen et al., 2007), the Scimago Journal Rank (The Scimago Journal & Country Rank, 2015), as well as the use of various centralities such as the degree, closeness or betweenness centrality in the citation network between the journals (Bollen et al., 2005; Leydesdorff, 2007). Comparing the advantages and disadvantages of the different impact measures and examining their correlation has attracted considerable interest in the literature (Bollen et al., 2009; Franceschet, 2010a, b; Glänzel, 2011; Kaur et al., 2013). However, according to the results of the principal component analysis of 39 quality measures carried out by Bollen et al. (2009), scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator. Thus, the development of higher-dimensional quality indicators for scientific journals provides an important objective for current research.
In this study, we consider different possibilities for defining a hierarchy between scientific journals based on their citation network. The advantage of using the network approach for representing the intricate relations between journals is that networks can show substantially different aspects compared with any parametric method representing the journals with points in single- or even in multi-dimensional space. When organized into a hierarchy, the most important and prestigious journals are expected to appear at the top, while lesser known journals are expected to be ranked lower. However, a hierarchy offers a more complex view of the ranking between journals compared with a one-dimensional impact measure. For example, if the branches of the hierarchy are organized according to the different scientific fields, then the journals in a given field can be compared simply by zooming into the corresponding branch in the hierarchy. Possible scenarios for hierarchical relations between scientific journals have already been suggested recently by Iyengar and Balijepally (2015). However, the main objective in this earlier study was to examine the validity of a linear ordering between the journals based on a dominance ranking procedure (Iyengar and Balijepally, 2015). Here, we construct and visualize multiple hierarchies between the journals, offering a far more complex view of the ranking between journals compared with a one-dimensional impact measure.
Hierarchical organization is in general a widespread phenomenon in nature and society. This is supported by several studies, focusing on the transcriptional regulatory network of Escherichia coli (Ma et al., 2004), the dominant–subordinate hierarchy among crayfish (Goessmann et al., 2000), the leader–follower network of pigeon flocks (Nagy et al., 2010), the rhesus macaque kingdoms (Fushing et al., 2011), neural networks (Kaiser et al., 2010), technological networks (Pumain, 2006), social interactions (Guimerà et al., 2003; Pollner et al., 2006; Valverde and Solé, 2007), urban planning (Batty and Longley, 1994; Krugman, 1996), ecological systems (Hirata and Ulanowicz, 1985; Wickens and Ulanowicz, 1988) and evolution (Eldredge, 1985; McShea, 2001). However, hierarchy is a polysemous word, and in general, we can distinguish between three different types of hierarchies when describing a complex system: the order, the nested and the flow hierarchy. In the case of order hierarchy, we basically define a ranking, or more precisely a partial ordering, of the set of elements under investigation (Lane, 2006). Nested hierarchy (also called inclusion hierarchy or containment hierarchy) represents the idea of recursively aggregating the items into larger and larger groups, resulting in a structure where higher-level groups consist of smaller and more specific components (Wimberley, 2009). Finally, a flow hierarchy can be depicted as a directed graph, where the nodes are layered in different levels so that the nodes that are influenced by a given node (that is, are connected to it through a directed link) are at lower levels.
Hierarchical organization is an important concept also in network theory (Ravasz et al., 2002; Trusina et al., 2004; Pumain, 2006; Clauset et al., 2008; Corominas-Murtra et al., 2011; Mones et al., 2012; Corominas-Murtra et al., 2013). The network approach has become a ubiquitous tool for analysing complex systems—from the interactions within cells, transportation systems, the Internet and other technological networks, through to economic networks, collaboration networks and society (Albert and Barabási, 2002; Mendes and Dorogovtsev, 2003). Grasping the signs of hierarchy in networks is a non-trivial task with a number of possible different approaches, including the statistical inference of an underlying hierarchy based on the observed network structure (Clauset et al., 2008), and the introduction of various hierarchy measures (Trusina et al., 2004; Mones et al., 2012; Corominas-Murtra et al., 2013). What makes the analysis of hierarchy even more complex is that it may also be context dependent. According to a recent study on homing pigeons, the hierarchical pattern of in-flight leadership does not build upon the stable, hierarchical social dominance structure (pecking order) evident among the same birds (Nagy et al., 2013).
In this study, we show that in a somewhat similar fashion, scientific journals can also be organized into multiple hierarchies of different types. Our studies rely on the citation network between scientific papers obtained from Web of Science (ISI Web of Knowledge, 2012). On the one hand, the flow hierarchy analysis of this network based on the m-reaching centrality (Borgatti, 2003; Mones et al., 2012) reveals the structure relevant from the point of view of knowledge spreading and influence. On the other hand, the alternative hierarchy obtained from the same network with the help of an automated tag hierarchy extraction method (Tibély et al., 2013) highlights a nested structure with the most interdisciplinary journals at the top and the very specialized journals at the bottom of the hierarchy.
Scientific publication data
The dataset on which our studies rely consists of all the available publications in Web of Science (ISI Web of Knowledge, 2012) between 1975 and 2011. The downloading scripts we used are available in WOS publication data downloading scripts (2012), and the Harvard Dataverse repository (Palla et al., 2015). To take into account as wide a list of papers as possible, we did not apply any specific filtering. Thus, conference proceedings and technical papers also appear in the dataset used. However, since the network we study builds upon citations between papers (or journals), the conference proceedings, technical papers (or even journals) with no incoming citations fall out of the flow hierarchy analysis automatically. (Nevertheless, in the event that they have outgoing citations, these are included in the evaluation of the m-reaching centrality of other journals.) Furthermore, even when cited, a conference proceedings does not have a real chance of getting high in any of the hierarchies considered here, due to its very limited number of publications compared with journals. Although highly cited individual conference proceedings publications may appear, they cannot boost the overall citation of the proceedings to the level of journals (for example, whenever a scientific breakthrough is published in a conference proceedings first, it is usually also published in a more prestigious journal soon afterwards, which eventually drives the citations to the journal instead of the proceedings). For these reasons, the conference proceedings are ranked at the bottom of the hierarchies we obtained.
We used the 11-character-long abbreviated journal issue field in the core data for identifying the journal of a given publication. The advantage of using this field is that it contains only an abbreviated journal name without any volume numbers, issue numbers, years and so on (in contrast, the full journal name in some cases may contain the volume number or the publication year as well, which of course vary over time). The total number of publications for which the mentioned data field was non-empty reached 35,372,038, and the number of different journals identified based on this data field was 13,202. As mentioned previously, in the case of conference proceedings, the 11-character-long abbreviated journal issue field was treated the same as in the case of journal publications, without any filtering.
Flow hierarchy based on the m-reaching centrality
A recently introduced approach for quantifying the position of a node in a flow hierarchy is based on the m-reaching centrality (Mones et al., 2012). The basic intuition behind this idea is that reaching the rest of the network should be relatively easy for the nodes high in the hierarchy, and more difficult for the nodes at the bottom of the hierarchy. Thus, the position of the node i in the hierarchy is determined by its m-reaching centrality (Borgatti, 2003), Cm(i), corresponding to the fraction of nodes that can be reached from i, following directed paths of at most m steps (where m is a system-dependent parameter). Naturally, a higher Cm(i) value corresponds to a higher position in the hierarchy, and the node with the maximal Cm(i) is chosen as the root. However, this approach does not specify the ancestors or descendants of a given node in the hierarchy; instead it provides only a ranking between the nodes of the underlying network according to Cm(i). Nevertheless, hierarchical levels can be defined in a simple way: after sorting the nodes in ascending order of Cm, we can sample and aggregate nodes into levels so that in each level the standard deviation of Cm is lower than a predefined fraction of the standard deviation in the whole network. This method of constructing a flow hierarchy based on the m-reach (and the standard deviation of the m-reach) has already been shown to provide meaningful structures for a couple of real systems, including electric circuits, transcriptional regulatory networks, e-mail networks and food webs (Mones et al., 2012).
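As an illustration of the node-level definition, the following minimal Python sketch evaluates Cm(i) by a breadth-first search truncated at depth m. The function name, the adjacency-dictionary input and the toy graph are hypothetical, and normalizing by the number of other nodes is an assumption about how the "fraction of nodes" is counted.

```python
from collections import deque

def m_reaching_centrality(out_links, node, m):
    """Fraction of the other nodes reachable from `node` along
    directed paths of at most m steps (truncated breadth-first search)."""
    # Collect every node that appears as a source or as a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == m:  # do not expand beyond m steps
            continue
        for nxt in out_links.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    # Assumption: normalize by the number of nodes other than `node`.
    return (len(seen) - 1) / max(len(nodes) - 1, 1)

# Hypothetical toy graph; the node with the largest Cm is taken as the root.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
scores = {v: m_reaching_centrality(toy, v, m=2) for v in toy}
root = max(scores, key=scores.get)
```

In this toy example node "a" reaches three of the four other nodes within two steps, so it is selected as the root.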
When applying this approach to the study of the hierarchy between scientific journals, we have to take into account that journals are not directly connected to each other; instead they are linked via a citation network between the individual publications. In principle, we may assume different “journal strategies” for obtaining a large reach in this system: for example, a journal might publish a very high number of papers of poor quality with only a few citations each. Nevertheless, taken together they can still provide a large number of aggregated citations. Another option is to publish a lower number of high-quality papers, obtaining a lot of citations individually. To avoid having a built-in preference for one type of journal over the other, we define a reaching centrality that is not sensitive to such details, and which only depends on the number of papers that can be reached in m steps from publications appearing in a given journal.
First, we note that when calculating the reach of the publications, the citation links have to be followed backwards: that is, if paper i is citing j, then the information presented in j has reached i. Thus, the reaching centralities are evaluated in a network where the links are pointing from a reference article to all papers citing it. The m-reach of a journal J, denoted by Cm(J), is naturally given by the number of papers that can be reached in at most m steps from any article appearing in the given journal. Thus, the mathematical definition of Cm(J) is based on the set of m-reachable nodes, Rm(J), given by
$$R_m(J) = \{\, i \notin J \;:\; \exists\, j \in J \ \text{such that}\ d_{\mathrm{out}}(j,i) \le m \,\}, \qquad (1)$$
where dout(j, i) denotes the out-distance from paper j to i (that is, the distance of the papers when only consecutive out-links are considered). The set Rm(J) is equivalent to the set of papers outside J that can be reached in at most m steps, provided that the starting publication is in J. The m-reaching centrality of J is simply the size of the m-reachable set, Cm(J) = |Rm(J)| (that is, the number of papers in Rm(J)). Figure 1 shows an illustration of the calculation of the m-reach of the journals detailed above. We note that a closely related impact measure for judging the influence of research papers based on deeper layers of other papers in the citation network is given by the wake-citation-score (Klosik and Bornholdt, 2014). A comparison study between the m-reach and the wake-citation-score is given in the Supplementary Information S1.
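The journal-level m-reach described above can be sketched as a multi-source breadth-first search started from all papers of a journal at once. This is only an illustrative sketch: the function name, the `cited_by` and `journal_of` dictionaries and the toy data are hypothetical, and links are assumed to run from a cited paper to the papers citing it, as stated in the text.

```python
def journal_m_reach(cited_by, journal_of, journal, m):
    """Number of papers outside `journal` that can be reached in at most
    m steps from any paper of `journal`, following links that point
    from a cited paper to the papers citing it."""
    sources = {p for p, j in journal_of.items() if j == journal}
    seen = set(sources)
    frontier = set(sources)
    for _ in range(m):
        # Expand one citation step from the current frontier.
        frontier = {c for p in frontier
                    for c in cited_by.get(p, ())
                    if c not in seen}
        seen |= frontier
    # Only papers outside the journal count towards the reach.
    return sum(1 for p in seen if journal_of.get(p) != journal)

# Hypothetical toy data: two papers in journal X, two in Y, one in Z.
journal_of = {"p1": "X", "p2": "X", "q1": "Y", "q2": "Y", "r1": "Z"}
cited_by = {"p1": ["q1", "p2"], "p2": ["q2"], "q1": ["r1"]}
reach_x_1 = journal_m_reach(cited_by, journal_of, "X", 1)
reach_x_2 = journal_m_reach(cited_by, journal_of, "X", 2)
reach_y_1 = journal_m_reach(cited_by, journal_of, "Y", 1)
```

Because the search starts from all papers of the journal simultaneously, each outside paper is counted once regardless of how many papers of the journal lead to it, so the measure does not favour many weakly cited papers over a few highly cited ones by construction.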
To determine the optimal value of m, we calculated the m-reach for all journals in our dataset for a wide range of m values. According to the results detailed in the Supplementary Information S2, at around m=4 the m-reach starts to saturate for the top journals. To provide a fair and robust ordering between the journals, here we set m=3, corresponding to an optimal setting: on the one hand, we still allow multiple steps in the paths contributing to the reach. On the other hand, we also avoid the saturation effect caused by the exponential increase in the reach as a function of the maximal path length and the finite system size. More details on the tuning of m are given in the Supplementary Information S2, and the results obtained for other m values are shown in the Supplementary Information S3.
Before considering the results, we note that an alternative approach for studying the citation between journals is to aggregate all papers in a given journal into a single node, representing the journal itself, in similar fashion to the works by Leydesdorff et al. (2013, 2014). In this case, the weight of the link between two journals is given by the total number of citations from papers appearing in one journal to papers appearing in the other. In the Supplementary Information S4, we analyse the flow hierarchy obtained by evaluating the m-reaching centrality in this aggregated network between the journals. However, recent works have pointed out that aggregations of this nature can lead to serious misjudgement of the importance of nodes (Pfitzner et al., 2013; Rosvall et al., 2014). For instance, an interesting memory effect of the citation network between individual papers is that a paper citing mostly biological papers that appear in interdisciplinary journals is still much more likely to be cited back by other biological papers, compared with other disciplines (Rosvall et al., 2014). Such phenomena can have a significant influence on the m-reaching centrality. However, by switching to the aggregated network between journals we wipe out these effects and introduce a distortion in the m-reach. Thus, here we stick to the most detailed representation of the system, given by the citation network between individual papers, and leave the analysis of the aggregated network between journals to the Supplementary Information S4. (An illustration of the difference between the m-reach calculated on the level of papers and on the aggregated level of journals is given in Fig. 1.)
The results for the top journals according to the m-reaching centrality at m=3 based on the publication data available from the Web of Science between 1975 and 2011 are given in Fig. 2. The hierarchy levels were defined by allowing a maximal standard deviation of 0.13·σ(Cm) for Cm within a given level, where σ(Cm) denotes the standard deviation of Cm over all journals. (The effect of changes in the within-level standard deviation of Cm on the shape of the hierarchy is discussed in the Supplementary Information S5.) According to our analysis, Science is the most influential journal based on the flow hierarchy, followed by Nature, with PNAS coming third, while Lancet and the New England Journal of Medicine form the fourth level. In general, the top of the hierarchy is strongly dominated by medical, biological and biochemical journals. For instance, the top physics journal, the Physical Review Letters, appears only on the 13th level, and the top chemistry journal, the Journal of the American Chemical Society, is positioned at the 11th level.
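The level construction can be illustrated with a short Python sketch. The greedy scheme below (scan the journals in ascending order of Cm and open a new level whenever adding the next journal would push the within-level standard deviation above the limit) is one possible reading of the procedure and should be treated as an assumption; the function name and the toy scores are hypothetical.

```python
from statistics import pstdev

def build_levels(scores, fraction=0.13):
    """Group nodes into levels so that the standard deviation of the
    scores within each level stays below fraction * sigma(all scores)."""
    limit = fraction * pstdev(list(scores.values()))
    ordered = sorted(scores, key=scores.get)  # ascending order of Cm
    levels, current = [], []
    for name in ordered:
        candidate = current + [name]
        spread = pstdev([scores[n] for n in candidate])
        if current and spread > limit:
            levels.append(current)  # close the level, start a new one
            current = [name]
        else:
            current = candidate
    if current:
        levels.append(current)
    return levels[::-1]  # top of the hierarchy first

# Hypothetical toy scores, not taken from the data set.
toy_scores = {"A": 1.0, "B": 0.98, "C": 0.5, "D": 0.49, "E": 0.1}
toy_levels = build_levels(toy_scores)
```

With these toy values the sketch yields three levels, with A and B at the top, C and D in the middle and E at the bottom; a larger `fraction` merges levels, a smaller one splits them.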
For comparison, in Fig. S7 in the Supplementary Information S4, we show the top of the flow hierarchy obtained from the citation network aggregated to the level of journals. Although Science, Nature and PNAS preserve their position as the top three journals, relevant changes can be observed in the hierarchy levels just below, as physical and chemical journals take over the biological and medical journals. For instance, Physical Review Letters is raised from Level 13 to Level 3, while Lancet is pushed back from Level 4 to Level 17. This reorganization is likely to be caused by the “memory” of the citation network described in the work by Rosvall et al. (2014)—the fact that a paper citing mostly biological articles is more likely to be cited by other biological papers, even if it appears in an interdisciplinary journal. Since biology and medicine have the highest publication rate among different scientific fields, the aggregation to the level of journals has the most severe effect on the reach of entities obtaining citations mostly from these fields. Thus, the notable difference between the flow hierarchy obtained from the citation network of individual papers and from the aggregated network between journals is yet another indication of the distortion in centralities caused by link aggregation, pointed out in related, but somewhat different contexts by Rosvall et al. (2014) and by Pfitzner et al. (2013).
Extracting a nested hierarchy
Categorizing items into a nested hierarchy is a general idea that has been around for a long time in, for instance, library classification systems, biological classification and also in the content classification of scientific publications. A very closely related problem is that given by the automatized categorization of free tags appearing in various online content (Heymann and Garcia-Molina, 2006; Schmitz, 2006; Damme et al., 2007; Plangprasopchok et al., 2011; Tibély et al., 2013; Velardi et al., 2013). In recent years, the voluntary tagging of photos, films, books and so on, with free words has become popular on the Internet in blogs, various file-sharing platforms, online stores and news portals. In some cases, these phenomena are referred to as collaborative tagging (Lambiotte and Ausloos, 2006; Cattuto et al., 2007; Cattuto et al., 2009; Floeck et al., 2011), and the resultant large collections of tags are referred to as folksonomies, highlighting their collaborative origin and the “flat” organization of the tags in these systems (Mika, 2005; Lambiotte and Ausloos, 2006; Spyns et al., 2006; Cattuto et al., 2007, 2009; Voss, 2007; Tibély et al., 2012). The natural mathematical representation of tagging systems is given by hypergraphs (Ghosal et al., 2009; Zlatić et al., 2009).
Revealing the hidden hierarchy between tags in a folksonomy or a tagging system in general can significantly help to broaden or narrow the scope of search in the system, to give recommendations about yet unvisited objects to the user, or to categorize newly appearing objects (Juszczyszyn et al., 2010; Lu et al., 2012). Here we apply a generalized version of a recent tag hierarchy extraction method (Tibély et al., 2013) for constructing a nested hierarchy between scientific journals. In its original form, the input of the tag hierarchy extraction algorithm is given by the weighted co-occurrence network between the tags, where the weights correspond to the number of shared objects. Based on the z-score of the connected pairs and the centrality of the tags in the co-occurrence network, the hierarchy is built bottom up, as the algorithm eventually assigns one or a few direct ancestors to each tag (except for the root of the hierarchy). The details of the algorithm are described in the “nested hierarchy extraction algorithm” subsection.
To study the nested hierarchy between scientific journals, we simply replace the weighted co-occurrence network between tags by the weighted citation network between journals at the input of the algorithm. Although a tag co-occurrence network and a journal citation network are different, the two most important properties needed for the nested hierarchy analysis are the same in both: general tags and multidisciplinary journals have a significantly larger number of neighbours compared with more specific tags and specialized journals. Furthermore, closely related tags co-appear more often compared with unrelated tags, as journals focusing on the same field cite each other more often compared with journals dealing with independent disciplines. Based on this, the hierarchy obtained from the journal citation network in this approach is expected to be organized according to the scope of the journals, with the most general multidisciplinary journals at the top and the very specialized journals at the bottom.
We note that since in this case we have to determine which journals are the most closely related to each other and which are unrelated, rather than evaluating the overall influence of the journals, we simply use the number of direct citations from one journal to the other as the weight for the connections. This is equivalent to taking the m-reach calculated on the publication level at m=1, sorting according to the source of the citations and then summing up the results for the papers appearing in one given journal. Thus, when constructing the flow hierarchy, we start from the publication-level citation network and evaluate the m-reach at m=3, whereas in the case of the nested hierarchy we calculate the publication-level m-reach at m=1, which technically becomes equivalent to the journal-level citation numbers when summed over papers appearing in one given journal.
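The construction of the journal-level weights from paper-level citations can be sketched in a few lines of Python. The function name and the input format (a list of (citing paper, cited paper) pairs plus a paper-to-journal dictionary) are hypothetical, and leaving out citations within the same journal is an assumption made here for illustration.

```python
from collections import Counter

def journal_citation_weights(citations, journal_of):
    """Count direct citations between journals.
    `citations` holds (citing_paper, cited_paper) pairs; the result maps
    (citing_journal, cited_journal) to the number of such citations."""
    weights = Counter()
    for citing, cited in citations:
        src, dst = journal_of[citing], journal_of[cited]
        if src != dst:  # assumption: within-journal citations are ignored
            weights[(src, dst)] += 1
    return weights

# Hypothetical toy data.
journal_of = {"p1": "X", "p2": "X", "q1": "Y", "q2": "Y", "r1": "Z"}
citations = [("q1", "p1"), ("q2", "p1"), ("q2", "p2"),
             ("r1", "q1"), ("p2", "p1")]
weights = journal_citation_weights(citations, journal_of)
```

In the toy example, journal Y cites journal X three times and journal Z cites journal Y once, while the citation from p2 to p1 stays inside journal X and is skipped.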
Nested hierarchy extraction algorithm
Our algorithm corresponds to a generalized version of “Algorithm B” presented in Tibély et al. (2013). The main differences are that here we force the algorithm to produce a directed acyclic graph consisting of a single connected component, and we allow the presence of multiple direct ancestors. In contrast, in its original form “Algorithm B” can provide disconnected components, and each component in the output corresponds to a directed tree. A further technical improvement we introduce is given by the calculation of the node centralities. Thus, the outline of the method used here is the following: first we carry out “Algorithm B” given in Tibély et al. (2013) with modified centrality evaluation, obtaining a directed tree between the journals. This is followed by a second iteration where we “enrich” the hierarchy by occasionally assigning further direct ancestors to the nodes.
Since “Algorithm B” is presented in full detail in Tibély et al. (2013), here we provide only a brief overview. The input of the algorithm is a weighted directed network between the journals based on the z-score for the citation links. After throwing away unimportant connections by using a weight threshold, the node centralities are evaluated in the remaining network. Here we used a centrality based on random walks on the citation network between journals with occasional teleportation steps, in a similar fashion to PageRank. We adopted the method proposed by Lambiotte and Rosvall (2012), calculating the dominant right eigenvector of the matrix $M_{ij} = (1-\alpha)\,\omega_{ij} + \alpha\, s_i^{\mathrm{in}}$, where $\omega_{ij}$ is the link weight, $s_i^{\mathrm{in}}$ denotes the in-strength of journal i (in number of citations) and $\alpha$ corresponds to the teleportation probability. We have chosen the widely used parameter value α=0.15; however, the ordering of the journals according to the centralities was quite robust with respect to changes in α.
Based on the centralities, a directed tree representing the backbone of the hierarchy is built from the bottom up as described in “Algorithm B” in Tibély et al. (2013). In the event that we cannot find a suitable “parent” for node i according to the original rules, we choose the node with the highest accumulated z-score from all journals that have a higher centrality than i (where the accumulation is running over the already found descendants of the given node). This ensures the emergence of a single connected component, since a single direct ancestor is assigned to every node (except for the root of the tree). This is followed by a final iteration over the nodes where we examine whether further “parents” have to be assigned or not. The criteria for accepting a node as the second, third, and so on, direct ancestor of journal i are that it must have a higher centrality compared with i, and also its z-score with i has to be larger than the z-score between i and its first direct ancestor. Note that the first parent is chosen based on aggregated z-score instead of the simple pairwise z-score, as explained by Tibély et al. (2013).
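For illustration, the dominant right eigenvector of a matrix of the form given above can be approximated by power iteration. The sketch below rests on several assumptions that are not fixed by the text: `omega[i][j]` is taken to be the weight of the link arriving at journal i from journal j, the in-strength of i is the corresponding row sum, and the result is normalized to sum to one. The function name and the toy weights are hypothetical.

```python
def dominant_right_eigenvector(omega, alpha=0.15, iterations=200):
    """Power iteration for M[i][j] = (1 - alpha) * omega[i][j] + alpha * s_in[i].
    Assumption: omega[i][j] is the weight of the link from j to i,
    so the in-strength of i is the sum of row i."""
    n = len(omega)
    s_in = [sum(row) for row in omega]
    M = [[(1 - alpha) * omega[i][j] + alpha * s_in[i] for j in range(n)]
         for i in range(n)]
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]  # rescale so the entries sum to one
    return v

# Hypothetical toy network: journal 0 receives most of the citations.
toy_omega = [[0, 4, 2],
             [1, 0, 0],
             [1, 0, 0]]
centrality = dominant_right_eigenvector(toy_omega)
```

Since only the ordering of the journals by centrality enters the hierarchy construction, the overall scale of the eigenvector is irrelevant and the normalization can be chosen for convenience.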
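The acceptance rule for additional direct ancestors can be written down compactly. The following Python sketch covers only this final step and assumes that the first parent, the centralities and the pairwise z-scores are already available; the function name, the dictionary layout (z-scores keyed by (node, candidate) pairs) and the toy values are hypothetical.

```python
def extra_parents(node, first_parent, centrality, z):
    """Return the further direct ancestors of `node`: candidates with a
    higher centrality than `node` whose pairwise z-score with `node`
    exceeds the z-score between `node` and its first parent."""
    base = z[(node, first_parent)]
    return sorted(
        cand for cand in centrality
        if cand not in (node, first_parent)
        and centrality[cand] > centrality[node]
        and z.get((node, cand), float("-inf")) > base
    )

# Hypothetical toy values.
toy_centrality = {"root": 0.9, "A": 0.8, "B": 0.3, "leaf": 0.1, "other": 0.05}
toy_z = {("leaf", "B"): 5.0, ("leaf", "root"): 2.0,
         ("leaf", "A"): 6.0, ("leaf", "other"): 9.0}
parents = extra_parents("leaf", "B", toy_centrality, toy_z)
```

Here only "A" is accepted: "root" has a lower z-score with "leaf" than the first parent "B", and "other" has a lower centrality than "leaf" despite its high z-score.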
Nested hierarchy of scientific journals
In Fig. 3, we show the top of the obtained nested hierarchy between the journals, with Nature appearing as the root, while PNAS, Science, New Scientist and Astrophysical Journal form the second level. Several prominent field specific journals such as Physical Review Letters, Brain Research, Ecology and Journal of the American Chemical Society have both Nature and Science as direct ancestors. Interestingly, the Astrophysical Journal is a direct descendant only of Nature, and is not linked under Science or PNAS. Nevertheless, it serves as a local root for a branch of astronomy-related journals, in a similar fashion to Physical Review Letters, which can be regarded as the local root of physics journals, or Journal of the American Chemical Society, corresponding to the local root of chemical journals. The biological, medical and biochemical journals form a rather mingled branch under PNAS, with Journal of Biological Chemistry as the local root and New England Journal of Medicine corresponding to a sub-root for medical journals. However, Cell and New England Journal of Medicine are direct descendants of Nature and Science as well. Interestingly, the brain- and neuroscience-related journals form a rather well-separated branch with Brain Research as the local root, linked directly under PNAS, Science and also under Nature.
Comparing the hierarchies
Although the hierarchies presented in Fig. 2 and Fig. 3 show a great deal of similarity, some interesting differences can also be observed. The figures show the top of the corresponding hierarchies, and seemingly, a significant portion of the journals ranked high in the hierarchy are the same in both cases. However, the root of the hierarchies is different (Science in case of the flow hierarchy and Nature in case of the nested hierarchy), and also the level-by-level comparison of Fig. 2 and Fig. 3 shows that a very high position in the flow hierarchy is not always accompanied by an outstanding position in the nested hierarchy, and vice versa. For example, the Lancet and New England Journal of Medicine appear much higher in Fig. 2 compared with Fig. 3, while Geophysical Research Letters is just below Nature and Science in the nested hierarchy and is not even shown in the top of the flow hierarchy.
To make the comparison between the two types of hierarchies more quantitative, we successively aggregated the levels in the hierarchies starting from the top, and calculated the Jaccard similarity coefficient between the resulting sets as a function of the level depth ℓ. Thus, when ℓ=1, we are actually comparing the roots, when ℓ=2, the journals on the top two levels and so on. However, since the total number of levels is different in the two hierarchies, we refine the definition of the similarity coefficient by allowing different ℓ values in the two hierarchies, and always choosing the pairs of aggregated sets with the maximal relative overlap. Therefore, we actually have two similarity functions,
$$J_f(\ell_f) = \max_{\ell_n} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \qquad (2)$$
$$J_n(\ell_n) = \max_{\ell_f} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \qquad (3)$$
where Sf(ℓf) and Sn(ℓn) denote the set of aggregated journals from the root to level ℓf in the flow hierarchy and to level ℓn in the nested hierarchy, respectively. When evaluating Jf(ℓf) at a given level depth ℓf according to equation (2), the set of aggregated journals in the flow hierarchy, Sf(ℓf), is fixed, and we search for the most similar set of aggregated journals from the nested hierarchy by scanning over the entire range of possible ℓn values, and choose the one giving the maximal Jaccard similarity. Similarly, when calculating Jn(ℓn) according to equation (3), the set of aggregated journals taken from the nested hierarchy, Sn(ℓn), is fixed, and the set Sf(ℓf) yielding the maximal Jaccard similarity is chosen from the flow hierarchy.
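The two similarity functions can be evaluated with a few lines of Python. In this sketch each hierarchy is represented simply as a list of levels (top level first), which is an assumption about the data layout; the function names and the toy hierarchies are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)

def aggregate(levels):
    """Cumulative sets of journals from the root down to each level."""
    out, acc = [], set()
    for level in levels:
        acc = acc | set(level)
        out.append(acc)
    return out

def similarity_profile(fixed_levels, other_levels):
    """For each depth in the fixed hierarchy, the maximal Jaccard
    similarity over all depths of the other hierarchy."""
    other_sets = aggregate(other_levels)
    return [max(jaccard(s, t) for t in other_sets)
            for s in aggregate(fixed_levels)]

# Hypothetical toy hierarchies with different roots.
flow = [["S"], ["N"], ["P", "L"]]
nested = [["N"], ["S", "P"], ["L"]]
J_f = similarity_profile(flow, nested)   # flow hierarchy fixed
J_n = similarity_profile(nested, flow)   # nested hierarchy fixed
```

Both profiles end at 1, since the final aggregates contain all journals, while the values at small depths depend on how quickly the top journals of one hierarchy appear in the other.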
In Fig. 4, we show the result obtained for Jf(ℓf) as a function of the level depth ℓf in the flow hierarchy (while the corresponding Jn(ℓn) plot for the nested hierarchy is given in Fig. S10 in the Supplementary Information S6). Besides Jf(ℓf), in Fig. 4 we also plotted the expected similarity between the aggregated sets of journals and a random set of journals of the same size. Since the roots of the hierarchies are different, the curves start from 0 at ℓf=1, and naturally, as we reach the maximal level depth, the similarity approaches 1, since all journals are included in the final aggregate. However, at the top levels below the root, a prominent increase can be observed in Jf(ℓf), while the similarity between random sets of journals increases very slowly in this region. Thus, the flow hierarchy and the nested hierarchy revealed by our methods show a significant similarity also from the quantitative point of view. This is also supported by the remarkably small generalized Kendall-tau distance of τ=0.16, obtained by treating the two hierarchies as partial orders and applying a natural extension of the standard distance measure between total orders. The definition of the distance measure and the details of the calculation are given in the Supplementary Information S7.
Finally, our hierarchies can also be compared with traditional impact measures. According to the results detailed in the Supplementary Information S8, both the flow and the nested hierarchy show moderate correlations with the impact factor, the Scimago Journal Rank and the closeness centrality of journals in the aggregated citation network. Therefore, the general trends shown by the hierarchies are consistent with previously introduced, widely used impact measures. However, when examined in detail, they also provide an alternative point of view with important differences, and they do not show large correlation values with the earlier one-dimensional characterizations of journal rank.
Ranking and comparing the importance, prestige and popularity of scientific journals is far from trivial, and quite a few different impact measures are available (Garfield, 1955, 1999; Braun et al., 2006; Egghe, 2006; Bergstrom, 2007; Bollen et al., 2005, 2007; Leydesdorff, 2007; Bollen et al., 2009; Franceschet, 2010a, b; Glänzel, 2011; Kaur et al., 2013). However, it seems that the overall impact of journals cannot be adequately characterized by a single one-dimensional quality measure (Bollen et al., 2009). In this light, our results offer an informative overview of the ranking of journals and the intricate relations between them: instead of, for example, simply ordering them according to a one-dimensional parameter, we organize them into multiple hierarchies.
First, we defined a flow hierarchy between the journals based on the m-reaching centrality in the citation network between the scientific papers. This structure organizes the journals according to their potential for spreading new scientific ideas, with the most influential information spreaders sorted at the top of the hierarchy. In this respect Science turned out to be the root, followed by Nature and PNAS, and the top dozen levels of the hierarchy were dominated by multidisciplinary, biological, biochemical and medical journals.
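To make the notion of m-reaching centrality more concrete, the following sketch counts the nodes that can be reached in at most m steps from a set of source nodes in a directed network. It is only an illustration under stated assumptions: the edge direction (pointing along the spread of information), the use of a journal's papers as a joint source set, and the names `m_reach`, `successors` and `sources` are all hypothetical, and the actual journal-level definition used in the study is not reproduced here.

```python
from collections import deque


def m_reach(successors, sources, m):
    """Number of nodes reachable from `sources` in at most m steps,
    not counting the source nodes themselves."""
    sources = set(sources)
    distance = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if distance[node] == m:
            continue  # do not expand beyond m steps
        for neighbour in successors.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return len(distance) - len(sources)


# Toy chain a -> b -> c -> d (assumed direction: information flows a to d).
toy_network = {"a": ["b"], "b": ["c"], "c": ["d"]}
reach_from_a = m_reach(toy_network, ["a"], 2)
```

Ranking nodes (or groups of nodes) by such a count places the most efficient spreaders first, which is the intuition behind sorting the most influential journals to the top of the flow hierarchy.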
We also constructed a nested hierarchy between the journals by generalizing a recent tag hierarchy extraction algorithm. In this case the journals were organized into branches according to the major scientific fields, with a clear separation between unrelated fields, and relatively strong mixing and overlap between closely related fields. Mapping the different journals into well-oriented knowledge domains is a complex problem on its own (Chen et al., 2001a, b; Shiffrin and Börner, 2004; Rosvall and Bergstrom, 2008; Börner, 2010), especially from the point of view of multi- and interdisciplinary fields. Our nested hierarchy provides a natural tool for the visualization of the intricate nested and overlapping relations between scientific fields as well. An important feature is that the organization of the branches roughly highlights the local hierarchy of the given field, with usually the most prominent journal in the field serving as the local root, and more specialized journals positioned at the bottom. Thus, zooming into a specific field for computing and ranking the journals that publish in the given field becomes simple: we just have to select the corresponding branch in the nested hierarchy.
Another interesting perspective is that, based on the position of a journal in the nested hierarchy, we gain immediate information on its standing within its particular field. Accordingly, we can select those journals with which a fair comparison can be made, and we can exclude journals in faraway branches from any comparative study. Moreover, similarly to judging the position of a journal within its specific field (a local branch), we can also judge the standing of this sub-field in a larger scientific domain (a main branch) and so on, and thereby compare the ranking of the different scientific fields and sub-fields (each being composed of multiple journals). When zooming out completely to the overall hierarchy between the journals, Nature was observed to be in the top position, while Science, PNAS, the Astrophysical Journal and New Scientist formed the second level, with the field-dependent branches starting at the third level.
The comparison between the two types of hierarchy reveals a strong similarity accompanied by significant differences. Science, Nature and PNAS are the top three journals in both cases, and the top few hundred nodes of the two hierarchies have a far larger overlap than expected at random. However, a closer level-by-level inspection showed that a very high position in, for example, the flow hierarchy does not guarantee a similarly outstanding ranking in the nested hierarchy, and vice versa. Both hierarchies showed moderate correlations with the impact factor, the Scimago Journal Rank and the closeness centrality of the journals in the citation network. This supports our view that the hierarchical organization of scientific journals provides an interesting alternative for the description of journal impact, one that is broadly consistent with the previously introduced measures but shows important differences when examined in detail.
In summary, the two hierarchies we constructed offer a compound view of the interrelations between scientific journals, and provide a higher-dimensional characterization of journal impact instead of a ranking according to a single one-dimensional parameter. Naturally, hierarchies between scientific journals can be defined in other ways too (Iyengar and Balijepally, 2015). For example, when building a flow hierarchy, the overall influence of journals could alternatively be measured with other quantities such as the wake-citation-score (Klosik and Bornholdt, 2014), the PageRank or the Y-factor (Bollen et al., 2007). In parallel, a nested hierarchy might also be constructed by suitably modifying a community finding algorithm that produces inherently nested and overlapping communities, such as Infomap (Rosvall and Bergstrom, 2008; Rosvall and Bergstrom, 2011; Rosvall et al., 2014) or the clique percolation method (Palla et al., 2005). Another interesting aspect we have not taken into account here is the time evolution of the citation network between the journals. Obviously, the ranking of the journals changes with time, and by treating all publications between 1975 and 2011 in a uniform framework we neglected this effect. However, examining further possibilities for hierarchy construction and studying the time evolution of the journal hierarchies are beyond the scope of the present work, although they provide interesting directions for future research.
The datasets analyzed during the current study are available from the Web of Science repository, owned by Thomson Reuters (http://scientific.thomson.com/isi/) but restrictions apply to the availability of these data, which were used under license from Thomson Reuters, and so are not publicly available. Data are however available from the authors upon reasonable request and permission of Thomson Reuters.
The downloading scripts used in the study are available in the Dataverse repository: http://dx.doi.org/10.7910/DVN/MCXTHF
How to cite this article: Palla G, Tibély G, Mones E, Pollner P and Vicsek T (2015) Hierarchical networks of scientific journals. Palgrave Communications 1:15016 doi: 10.1057/palcomms.2015.16.
Albert R and Barabási A-L (2002) Statistical mechanics of complex networks. Reviews of Modern Physics; 74 (1): 47–97.
Batty M and Longley P (1994) Fractal Cities: A Geometry of Form and Function. Academic: San Diego, CA.
Bergstrom CT (2007) Eigenfactor: Measuring the value and prestige of scholarly journals. C&RL News; 68 (5): 314–316.
Bollen J, de Sompel HV, Smith J and Luce R (2005) Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management; 41 (6): 1419–1440.
Bollen J, Rodriguez MA and de Sompel HV (2007) Journal status. Scientometrics; 69 (3): 669–687.
Bollen J, de Sompel HV, Hagberg A and Chute R (2009) A principal component analysis of 39 scientific impact measures. PLoS ONE; 4 (6): e6022.
Bordons M, Fernandez MT and Gomez I (2002) Advantages and limitations in the use of impact factor measures for the assessment of research performance. Scientometrics; 53 (2): 195–206.
Borgatti SP (2003) The key player problem. In: Breiger R, Carley K and Pattison P (eds) Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. National Academy of Sciences Press: Washington D.C., pp 241–252.
Börner K (2010) Atlas of Science: Visualizing What We Know. The MIT Press: Cambridge, Massachusetts, USA.
Braun T, Glänzel W and Schubert A (2006) A Hirsch-type index for journals. Scientometrics; 69 (1): 169–173.
Cattuto C, Barrat A, Baldassarri A, Schehr G and Loreto V (2009) Collective dynamics of social annotation. Proceedings of the National Academy of Sciences of the USA; 106 (26): 10511–10515.
Cattuto C, Loreto V and Pietronero L (2007) Semiotic dynamics and collaborative tagging. Proceedings of the National Academy of Sciences of the USA; 104 (5): 1461–1464.
Chen C, Kuljis J and Paul RJ (2001a) Visualizing latent domain knowledge. IEEE Transactions on Systems, Man, and Cybernetics; 31 (4): 518–529.
Chen C, Paul RJ and O’Keefe B (2001b) Fitting the jigsaw of citations: Information visualization in domain analysis. Journal of the Association for Information Science and Technology; 52 (3): 315–330.
Clauset A, Moore C and Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature; 453 (7191): 98–101.
Corominas-Murtra B, Goñi J, Solé RV and Rodríguez-Caso C (2013) On the origins of hierarchy in complex networks. Proceedings of the National Academy of Sciences of the USA; 110 (33): 13316–13321.
Corominas-Murtra B, Rodríguez-Caso C, Goñi J and Solé R (2011) Measuring the hierarchy of feedforward networks. Chaos; 21 (1): 016108.
Damme CV, Hepp M and Siorpaes K (2007) Folksontology: An integrated approach for turning folksonomies into ontologies. In Proceedings of the ESWC Workshop ‘Bridging the Gap between Semantic Web and Web 2.0’, pp. 57–70.
Egghe L (2006) Theory and practice of the g-index. Scientometrics; 69 (1): 131–152.
Eldredge N (1985) Unfinished Synthesis: Biological Hierarchies and Modern Evolutionary Thought. Oxford University Press: New York.
Floeck F, Putzke J, Steinfels S, Fischbach K and Schoder D (2011) Imitation and quality of tags in social bookmarking systems—Collective intelligence leading to folksonomies. In: Bastiaens TJ, Baumöl U and Krämer BJ (eds) On Collective Intelligence, Volume 76 of Advances in Intelligent and Soft Computing. Springer: Berlin, Heidelberg, pp 75–91.
Franceschet M (2010a) The difference between popularity and prestige in the sciences and in the social sciences: A bibliometric analysis. Journal of Informetrics; 4 (1): 55–63.
Franceschet M (2010b) Ten good reasons to use the eigenfactor metrics. Information Processing & Management; 46 (5): 555–558.
Fushing H, McAssey MP, Beisner B and McCowan B (2011) Ranking network of captive rhesus macaque society: A sophisticated corporative kingdom. PLoS ONE; 6 (3): e17817.
Garfield E (1955) Citation indexes for science: A new dimension in documentation through association of ideas. Science; 122 (3159): 108.
Garfield E (1999) Journal impact factor: A brief review. Canadian Medical Association Journal; 161 (8): 979–980.
Ghosal G, Zlatić V, Caldarelli G and Newman MEJ (2009) Random hypergraphs and their applications. Physical Review E; 79 (6): 066118.
Glänzel W (2011) The application of characteristic scores and scales to the evaluation and ranking of scientific journals. Journal of Information Science; 37 (1): 40–48.
Goessmann C, Hemelrijk C and Huber R (2000) The formation and maintenance of crayfish hierarchies: Behavioral and self-structuring properties. Behavioral Ecology and Sociobiology; 48 (6): 418–428.
Guimerà R, Danon L, Díaz-Guilera A, Giralt F and Arenas A (2003) Self-similar community structure in a network of human interactions. Physical Review E; 68 (6): 065103.
Harter SP and Nisonger TE (1997) ISI’s impact factor as misnomer: A proposed new measure to assess journal impact. Journal of the American Society for Information Science and Technology; 48 (12): 1146–1148.
Heymann P and Garcia-Molina H (2006) Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical Report, Stanford InfoLab.
Hirata H and Ulanowicz R (1985) Information theoretical analysis of the aggregation and hierarchical structure of ecological networks. Journal of Theoretical Biology; 116 (3): 321–341.
ISI Web of Knowledge. (2012) http://scientific.thomson.com/isi/, accessed 1 January 2012.
Iyengar K and Balijepally V (2015) Ranking journals using the dominance hierarchy procedure: An illustration with IS journals. Scientometrics; 102 (1): 5–23.
Juszczyszyn K, Kazienko P and Katarzyna M (2010) Personalized ontology based recommender systems for multimedia objects. In: Håkansson A, Hartung R and Nguyen N (eds) Agent and Multi-Agent Technology for Internet and Enterprise Systems, Volume 289 of Studies in Computational Intelligence. Springer: Berlin, Heidelberg, pp 275–292.
Kaiser M, Hilgetag CC and Kötter R (2010) Hierarchy and dynamics of neural networks. Frontiers in Neuroinformatics; 4: 112.
Kaur J, Radicchi F and Menczer F (2013) Universality of scholarly impact metrics. Journal of Informetrics; 7 (4): 924–932.
Klosik DF and Bornholdt S (2014) The citation wake of publications detects Nobel laureates’ papers. PLoS ONE; 9 (12): e113184.
Krugman PR (1996) Confronting the mystery of urban hierarchy. Journal of the Japanese and International Economies; 10 (4): 399–418.
Lambiotte R and Ausloos M (2006) Collaborative tagging as a tripartite network. Lecture Notes in Computer Science; 3993: 1114–1117.
Lambiotte R and Rosvall M (2012) Ranking and clustering of nodes in networks with smart teleportation. Physical Review E; 85 (5): 056107.
Lane D (2006) Hierarchy, Complexity, Society. Springer: Dordrecht, the Netherlands.
Leydesdorff L (2007) Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. Journal of the American Society for Information Science and Technology; 58 (9): 1303–1319.
Leydesdorff L, de Moya-Anegón F and Guerrero-Bote VP (2013) Journal maps, interactive overlays, and the measurement of interdisciplinarity on the basis of scopus data. arXiv:1310.4966 [cs.DL], accessed 31 October 2014.
Leydesdorff L, de Moya-Anegón F and de Nooy W (2014) Aggregated journal-journal citation relations in scopus and web-of-science matched and compared in terms of networks, maps, and interactive overlays. arXiv:1404.2505 [cs.DL], accessed 31 October 2014.
Lu L, Medo M, Yeung CH, Zhang Y-C, Zhang Z-K and Zhou T (2012) Recommender systems. Physics Reports; 519 (1): 1–49.
Ma HW, Buer J and Zeng AP (2004) Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach. BMC Bioinformatics; 5 (1): 199.
McShea DW (2001) The hierarchical structure of organisms. Paleobiology; 27 (2): 405–423.
Mendes JFF and Dorogovtsev SN (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press: Oxford.
Mika P (2005) Ontologies are us: A unified model of social networks and semantics. In International Semantic Web Conference, 3729, 522–536.
Mones E, Vicsek L and Vicsek T (2012) Hierarchy measure for complex networks. PLoS ONE; 7 (3): e33799.
Nagy M, Ákos Z, Biro D and Vicsek T (2010) Hierarchical group dynamics in pigeon flocks. Nature; 464 (7290): 890–893.
Nagy M, Vásárhelyi G, Pettit B, Roberts-Mariani I, Vicsek T and Biro D (2013) Context-dependent hierarchies in pigeons. Proceedings of the National Academy of Sciences of the USA; 110 (32): 13049–13054.
Opthof T (1997) Sense and nonsense about the impact factor. Cardiovascular Research; 33 (1): 1–7.
Palla G, Derényi I, Farkas I and Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature; 435 (7043): 814–818.
Palla G, Tibély G, Mones E, Pollner P and Vicsek T (2015) Project, Hiertags, Source code of the crawler. Dataverse. http://dx.doi.org/10.7910/DVN/MCXTHF
Pfitzner R, Scholtes I, Garas A, Tessone CJ and Schweitzer F (2013) Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks. Physical Review Letters; 110 (19): 198701.
Plangprasopchok A, Lerman K and Getoor L (2011) A probabilistic approach for learning folksonomies from structured data. In Fourth ACM International Conference on Web Search and Data Mining (WSDM), ACM: New York, NY, USA, pp 555–564.
Pollner P, Palla G and Vicsek T (2006) Preferential attachment of communities: The same principle, but a higher level. Europhysics Letters; 73 (3): 478–484.
Pumain D (2006) Hierarchy in Natural and Social Sciences, Volume 3 of Methodos Series. Springer: Dordrecht, the Netherlands.
Ravasz E, Somera AL, Mongru DA, Oltvai ZN and Barabási A-L (2002) Hierarchical organization of modularity in metabolic networks. Science; 297 (5586): 1551–1555.
Rosvall M and Bergstrom C (2011) Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE; 6 (4): e18209.
Rosvall M and Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences of the USA; 105 (4): 1118–1123.
Rosvall M, Esquivel AV, Lancichinetti A, West JD and Lambiotte R (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications; 5: 4630.
Schmitz P (2006) Inducing ontology from flickr tags. Paper presented at Collaborative Web Tagging Workshop at the 15th Int. Conf. on World Wide Web (WWW).
Seglen PO (1997) Why the impact factor of journals should not be used for evaluating research. British Medical Journal; 314 (7079): 498–502.
Shiffrin RM and Börner K (2004) Mapping knowledge domains. Proceedings of the National Academy of Sciences of the USA; 101 (Suppl 1): 5183–5185.
Spyns P, Moor AD, Vandenbussche J and Meersman R (2006) From Folksologies to Ontologies: How the Twain Meet. Lecture Notes in Computer Science; 4275: 738–755.
The Scimago Journal & Country Rank. (2015) http://www.scimagojr.com, accessed 16 March 2015.
Tibély G, Pollner P, Vicsek T and Palla G (2012) Ontologies and tag-statistics. New Journal of Physics; 14 (5): 053009.
Tibély G, Pollner P, Vicsek T and Palla G (2013) Extracting tag hierarchies. PLoS ONE; 8 (12): e84133.
Trusina A, Maslov S, Minnhagen P and Sneppen K (2004) Hierarchy measures in complex networks. Physical Review Letters; 92 (17): 178702.
Valverde S and Solé RV (2007) Self-organization versus hierarchy in open-source social networks. Physical Review E; 76 (4): 046118.
Velardi P, Faralli S and Navigli R (2013) Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics; 39 (3): 665–707.
Voss J (2007) Tagging, folksonomy & Co—Renaissance of manual indexing? arXiv:cs/0701072v2 [cs.IR], accessed 31 October 2014.
Wickens J and Ulanowicz R (1988) On quantifying hierarchical connections in ecology. Journal of Social and Biological Structures; 11 (3): 369–378.
Wimberley ET (2009) Nested Ecology: The Place of Humans in the Ecological Hierarchy. Johns Hopkins University Press: Baltimore, MD.
WOS Publication Data Downloading Scripts. (2012) http://hiertags.elte.hu/downloads/datasets/wos/, accessed 1 January 2012.
Zlatić V, Ghosal G and Caldarelli G (2009) Hypergraph topological quantities for tagged social networks. Physical Review E; 80 (3): 036118.
The research was partially supported by the European Union and the European Social Fund through project FuturICT.hu (grant no: TAMOP-4.2.2.C-11/1/KONV-2012-0013), by the Hungarian National Science Fund (OTKA K105447) and by the EU FP7 ERC COLLMOT project (grant no: 227878). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing financial interests.