Introduction

Providing an objective ranking of scientific journals and mapping them into different knowledge domains are complex problems of significant importance, which can be addressed using a number of different approaches. Probably the most widely known quality measure is the impact factor (Garfield, 1955, 1999). It is defined as the number of citations a journal receives in a given year to the papers it published in the preceding 2 years, divided by the number of papers it published over those 2 years. Although it is a rather intuitive quantity, the impact factor has serious limitations (Harter and Nisonger, 1997; Opthof, 1997; Seglen, 1997; Bordons et al., 2002). This has led to the introduction of alternative measures such as the H-index for journals (Braun et al., 2006), the g-index (Egghe, 2006), the Eigenfactor (Bergstrom, 2007), the PageRank and the Y-factor (Bollen et al., 2007), and the Scimago Journal Rank (The Scimago Journal & Country Rank, 2015). Various centralities, such as the degree, closeness or betweenness centrality in the citation network between the journals, have also been used (Bollen et al., 2005; Leydesdorff, 2007). Comparing the advantages and disadvantages of the different impact measures and examining their correlations has attracted considerable interest in the literature (Bollen et al., 2009; Franceschet, 2010a, b; Glänzel, 2011; Kaur et al., 2013). However, according to the principal component analysis of 39 quality measures carried out by Bollen et al. (2009), scientific impact is a multi-dimensional construct that cannot be adequately measured by any single indicator. Thus, the development of higher-dimensional quality indicators for scientific journals is an important goal of current research.
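For reference, the standard two-year impact factor of a journal for year Y can be written as below. The symbols are introduced here only for illustration. c_Y(y) is the number of citations received in year Y by the items the journal published in year y. n_y is the number of citable items it published in year y.

$$\mathrm{IF}_Y = \frac{c_Y(Y-1) + c_Y(Y-2)}{n_{Y-1} + n_{Y-2}}$$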

In this study, we consider different possibilities for defining a hierarchy between scientific journals based on their citation network. Using a network to represent the intricate relations between journals has an advantage: it can reveal substantially different aspects than any parametric method that represents the journals as points in a one- or even multi-dimensional space. When organized into a hierarchy, the most important and prestigious journals are expected to appear at the top, while lesser known journals are expected to be ranked lower. However, a hierarchy offers a more complex view of the ranking between journals than a one-dimensional impact measure. For example, if the branches of the hierarchy are organized according to the different scientific fields, then the journals in a given field can be compared simply by zooming into the corresponding branch of the hierarchy. Possible scenarios for hierarchical relations between scientific journals have recently been suggested by Iyengar and Balijepally (2015). However, the main objective of that study was to examine the validity of a linear ordering between the journals based on a dominance ranking procedure (Iyengar and Balijepally, 2015). Here, we construct and visualize multiple hierarchies between the journals. These offer a far richer view of the relations between journals than a single linear ordering or a one-dimensional impact measure.

Hierarchical organization is a widespread phenomenon in nature and society. Several studies support this, covering the following systems:

- the transcriptional regulatory network of Escherichia coli (Ma et al., 2004)
- the dominant–subordinate hierarchy among crayfish (Goessmann et al., 2000)
- the leader–follower network of pigeon flocks (Nagy et al., 2010)
- the rhesus macaque kingdoms (Fushing et al., 2011)
- neural networks (Kaiser et al., 2010)
- technological networks (Pumain, 2006)
- social interactions (Guimerà et al., 2003; Pollner et al., 2006; Valverde and Solé, 2007)
- urban planning (Batty and Longley, 1994; Krugman, 1996)
- ecological systems (Hirata and Ulanowicz, 1985; Wickens and Ulanowicz, 1988)
- evolution (Eldredge, 1985; McShea, 2001)

However, hierarchy is a polysemous word. In general, we can distinguish between three different types of hierarchies when describing a complex system: order, nested and flow hierarchies. An order hierarchy basically defines a ranking, or more precisely a partial ordering, of the set of elements under investigation (Lane, 2006). A nested hierarchy (also called inclusion or containment hierarchy) recursively aggregates the items into larger and larger groups, so that higher-level groups consist of smaller and more specific components (Wimberley, 2009). Finally, a flow hierarchy can be depicted as a directed graph whose nodes are arranged in levels. The nodes influenced by a given node (that is, connected to it through a directed link) are at lower levels.

Hierarchical organization is also an important concept in network theory (Ravasz et al., 2002; Trusina et al., 2004; Pumain, 2006; Clauset et al., 2008; Corominas-Murtra et al., 2011; Mones et al., 2012; Corominas-Murtra et al., 2013). The network approach has become a ubiquitous tool for analysing complex systems (Albert and Barabási, 2002; Mendes and Dorogovtsev, 2003). Its applications range from the interactions within cells, transportation systems, the Internet and other technological networks, through to economic networks, collaboration networks and society. Detecting hierarchy in networks is a non-trivial task that can be approached in several ways. These include the statistical inference of an underlying hierarchy from the observed network structure (Clauset et al., 2008) and the introduction of various hierarchy measures (Trusina et al., 2004; Mones et al., 2012; Corominas-Murtra et al., 2013). The analysis of hierarchy is further complicated by the fact that it may be context dependent. According to a recent study on homing pigeons, the hierarchical pattern of in-flight leadership does not build upon the stable, hierarchical social dominance structure (pecking order) evident among the same birds (Nagy et al., 2013).

In this study, we show that, in a somewhat similar fashion, scientific journals can also be organized into multiple hierarchies of different types. Our study relies on the citation network between scientific papers obtained from Web of Science (ISI Web of Knowledge, 2012). On the one hand, a flow hierarchy analysis of this network based on the m-reaching centrality (Borgatti, 2003; Mones et al., 2012) reveals the structure relevant to knowledge spreading and influence. On the other hand, we also apply an automated tag hierarchy extraction method (Tibély et al., 2013) to the same network. This alternative hierarchy highlights a nested structure, with the most interdisciplinary journals at the top and the most specialized journals at the bottom.

Scientific publication data

Our dataset consists of all the publications available in Web of Science (ISI Web of Knowledge, 2012) between 1975 and 2011. The download scripts we used are available in WOS publication data downloading scripts (2012) and in the Harvard Dataverse repository (Palla et al., 2015). To include as wide a range of papers as possible, we did not apply any specific filtering. Thus, conference proceedings and technical papers also appear in the dataset.

However, the network we study is built from citations between papers (or journals). Therefore, conference proceedings, technical papers (or even journals) with no incoming citations automatically drop out of the flow hierarchy analysis. (Nevertheless, any outgoing citations they have are still included in the evaluation of the m-reaching centrality of other journals.) Furthermore, even when cited, conference proceedings have little chance of ranking high in any of the hierarchies considered here. The reason is their very limited number of publications compared with journals. Highly cited individual conference publications may appear, but they cannot raise the overall citations of the proceedings to the level of journals. For example, whenever a scientific breakthrough is first published in conference proceedings, it is usually also published in a more prestigious journal soon afterwards. The citations then eventually go to the journal instead of the proceedings. For these reasons, conference proceedings are ranked at the bottom of the hierarchies we obtained.

We used the 11-character abbreviated journal issue field in the core data to identify the journal of a given publication. The advantage of this field is that it contains only an abbreviated journal name, without volume numbers, issue numbers, years and so on. In contrast, the full journal name may in some cases also contain the volume number or the publication year, which of course vary over time. The total number of publications for which this data field was non-empty was 35,372,038, and the number of different journals identified from it was 13,202. As mentioned previously, for conference proceedings the 11-character abbreviated journal issue field was treated in the same way as for journal publications, without any filtering.

Flow hierarchy based on the m-reaching centrality

A recently introduced approach for quantifying the position of a node in a flow hierarchy is based on the m-reaching centrality (Mones et al., 2012). The basic intuition is that reaching the rest of the network should be relatively easy from the nodes high in the hierarchy, and more difficult from the nodes at the bottom. Thus, the position of node i in the hierarchy is determined by its m-reaching centrality C_m(i) (Borgatti, 2003). This is the fraction of nodes that can be reached from i along directed paths of at most m steps, where m is a system-dependent parameter. Naturally, a higher C_m(i) value corresponds to a higher position in the hierarchy, and the node with the maximal C_m(i) is chosen as the root.

However, this approach does not specify the ancestors or descendants of a given node in the hierarchy. It provides only a ranking of the nodes of the underlying network according to C_m(i). Nevertheless, hierarchical levels can be defined in a simple way. After sorting the nodes in ascending order of C_m, we group them into levels so that within each level the standard deviation of C_m is lower than a predefined fraction of the standard deviation over the whole network. This construction, based on the m-reach and its standard deviation, has already been shown to provide meaningful flow hierarchies for several real systems. These include electric circuits, transcriptional regulatory networks, e-mail networks and food webs (Mones et al., 2012).
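The following is a minimal sketch of the node-level m-reaching centrality described above. It assumes the network is given as a Python dictionary that maps every node to the list of its successors. The function name and input format are illustrative choices, not taken from Mones et al. (2012). A breadth-first search truncated at depth m visits exactly the nodes within m steps:

```python
from collections import deque


def m_reaching_centrality(adj, source, m):
    """Fraction of the other nodes reachable from `source` along directed
    paths of at most m steps. `adj` must list every node as a key, mapping
    it to its successors (hypothetical input format)."""
    n = len(adj)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue  # paths longer than m steps are not followed
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = len(dist) - 1  # the source itself is not counted
    return reached / (n - 1) if n > 1 else 0.0


adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
scores = {v: m_reaching_centrality(adj, v, 3) for v in adj}
root = max(scores, key=scores.get)  # node with maximal C_3 -> "a"
```

On the chain a → b → c → d with m=3, node a reaches all three other nodes (C_3(a) = 1) and is chosen as the root, while d reaches none.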

When applying this approach to the hierarchy between scientific journals, we have to take into account that journals are not directly connected to each other. Instead, they are linked via a citation network between individual publications. In principle, a journal could pursue different “strategies” for obtaining a large reach in this system. For example, a journal might publish a very high number of poor-quality papers with only a few citations each, which taken together can still accumulate a large number of citations. Another option is to publish a lower number of high-quality papers, each obtaining many citations. To avoid a built-in preference for one type of journal over the other, we define a reaching centrality that is insensitive to such details. It depends only on the number of papers that can be reached within m steps from publications appearing in a given journal.

First, we note that when calculating the reach of the publications, the citation links have to be followed backwards: if paper i cites paper j, then the information presented in j has reached i. Thus, the reaching centralities are evaluated in a network where the links point from a referenced article to all papers citing it. The m-reach of a journal J, denoted by C_m(J), is given by the number of papers outside J that can be reached in at most m steps from any article appearing in J. Its mathematical definition is therefore based on the set of m-reachable nodes, given by

$$\mathcal{C}_m(J) = \{\, i \mid d_{\mathrm{out}}(j, i) \le m,\ j \in J,\ i \notin J \,\}, \tag{1}$$

where d_out(j, i) denotes the out-distance from paper j to paper i, that is, the length of the shortest path from j to i when only out-links are followed. The set 𝒞_m(J) thus consists of the papers outside J that can be reached in at most m steps from a starting publication in J. The m-reaching centrality of J is simply the size of this m-reachable set, C_m(J) = |𝒞_m(J)|, that is, the number of papers in 𝒞_m(J). Figure 1 illustrates this calculation of the m-reach of journals. A closely related impact measure is the wake-citation-score (Klosik and Bornholdt, 2014), which judges the influence of research papers based on deeper layers of the citation network. A comparison between the m-reach and the wake-citation-score is given in the Supplementary Information S1.
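The definition in equation (1) can be sketched in Python as a multi-source breadth-first search. This illustrative implementation makes its own assumptions about the input. `citations` is a list of (citing, cited) paper pairs, and `journal_of` maps every paper to its journal. The helper name `journal_m_reach` is hypothetical. All papers of J start at distance 0, so the recorded distance of each paper equals the minimum of d_out(j, i) over j in J, as in equation (1):

```python
from collections import defaultdict, deque


def journal_m_reach(citations, journal_of, journal, m):
    """C_m(J): number of papers outside `journal` within m steps of at least
    one of its papers, following citation links backwards."""
    # Reversed links: cited paper -> papers citing it (direction of information flow).
    forward = defaultdict(list)
    for citing, cited in citations:
        forward[cited].append(citing)
    # Multi-source BFS: every paper of the journal starts at distance 0,
    # so dist[i] = min over j in J of d_out(j, i).
    sources = [p for p, jr in journal_of.items() if jr == journal]
    dist = {p: 0 for p in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue
        for v in forward[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Papers of the journal itself are excluded (i not in J in equation (1)).
    return sum(1 for p in dist if journal_of[p] != journal)


# Hypothetical toy data: each pair is (citing paper, cited paper).
citations = [("p2", "p1"), ("p3", "p2"), ("p5", "p3")]
journal_of = {"p1": "A", "p2": "B", "p3": "C", "p4": "A", "p5": "C"}
print([journal_m_reach(citations, journal_of, "A", m) for m in (1, 2, 3)])  # [1, 2, 3]
```

Here journal A reaches p2, p3 and p5 in one, two and three steps, respectively. Journal C has a 3-reach of zero, because its only citing paper, p5, belongs to C itself and is therefore excluded.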

Figure 1

A schematic illustration of the calculation of the journals’ m-reach. (a) The papers are represented by grey nodes connected by directed citation links, while the journals correspond to the coloured sets. (b) When calculating the m-reach, the links have to be reversed. For example, for the 3-reach, articles in journal B (orange) can reach four papers in total (the ones in journals A, C and D), excluding journal B itself. Journal H (dark blue) has a reach of six (given by articles in journals F and G). Note that if we switched to the network between journals, B would have a reach of five (as it can reach journals A, C, D, E and F). Journal H would then have a reach of two (as it could reach only journals F and G).

To determine the optimal value of m, we calculated C_m(J) for all journals in our dataset over a wide range of m values. According to the results detailed in the Supplementary Information S2, C_m(J) starts to saturate for the top journals at around m=4. To provide a fair and robust ordering between the journals, we set m=3, which corresponds to an optimal setting. On the one hand, it still allows multiple steps in the paths contributing to the reach. On the other hand, it avoids saturation. Saturation results from the exponential growth of the reach with the maximal path length, combined with the finite system size. More details on the tuning of m are given in the Supplementary Information S2, and the results obtained for other m values are shown in the Supplementary Information S3.
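Scanning m for the saturation point does not require a separate search for every m. A single breadth-first search truncated at m_max yields C_1(J), ..., C_{m_max}(J) at once: it counts how many new papers are first reached at each distance and accumulates these counts. The sketch below assumes that the reversed citation links are already given as a dictionary from each paper to the papers citing it. `reach_profile` is a hypothetical name:

```python
from collections import deque
from itertools import accumulate


def reach_profile(forward, sources, m_max):
    """[C_1(J), ..., C_{m_max}(J)] from one BFS truncated at depth m_max.
    `forward` maps a paper to the papers citing it; `sources` holds the
    papers of journal J."""
    dist = {p: 0 for p in sources}
    queue = deque(dist)
    new_at = [0] * (m_max + 1)  # new_at[d]: papers first reached at distance d
    while queue:
        u = queue.popleft()
        if dist[u] == m_max:
            continue
        for v in forward.get(u, ()):
            if v not in dist:  # sources are already in dist, so v is outside J
                dist[v] = dist[u] + 1
                new_at[dist[v]] += 1
                queue.append(v)
    return list(accumulate(new_at[1:]))


forward = {"a": ["b"], "b": ["c", "d"], "c": ["e"]}
print(reach_profile(forward, {"a"}, 4))  # [1, 3, 4, 4]
```

In this toy example the profile stops growing after m=3. This kind of flattening is what signals saturation when the profiles of the top journals are compared across m.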

Before considering the results, we note an alternative approach for studying the citations between journals. All papers of a given journal can be aggregated into a single node representing the journal itself, similarly to the works by Leydesdorff et al. (2013, 2014). In this case, the link weight from journal J to journal K is given by the total number of citations from papers appearing in J to papers in K. In the Supplementary Information S4, we analyse the flow hierarchy obtained by evaluating the m-reaching centrality in this aggregated network between the journals.

However, recent works have pointed out that aggregations of this kind can lead to serious misjudgement of the importance of nodes (Pfitzner et al., 2013; Rosvall et al., 2014). For instance, the citation network between individual papers shows an interesting memory effect. A paper that appears in an interdisciplinary journal but cites mostly biological papers is still much more likely to be cited by other biological papers than by papers from other disciplines (Rosvall et al., 2014). Such phenomena can have a significant influence on the m-reaching centrality. Switching to the aggregated network between journals wipes out these effects and introduces a distortion in the m-reach. Thus, we stick to the most detailed representation of the system, the citation network between individual papers. The analysis of the aggregated network between journals is left to the Supplementary Information S4. (The difference between the m-reach calculated at the level of papers and at the aggregated level of journals is illustrated in Fig. 1.)
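The following sketch illustrates how the aggregation described above can create paths that do not exist among the individual papers. It assumes the same hypothetical input format as before: (citing, cited) paper pairs plus a paper-to-journal mapping. The function names are illustrative:

```python
from collections import Counter, deque


def aggregate_to_journals(citations, journal_of):
    """weights[(J, K)]: total number of citations from papers in J to papers in K."""
    weights = Counter()
    for citing, cited in citations:
        weights[(journal_of[citing], journal_of[cited])] += 1
    return weights


def journal_level_reach(weights, journal, m):
    """m-reach on the aggregated network: number of other journals within
    m steps along reversed links (cited journal -> citing journal)."""
    forward = {}
    for citing_j, cited_j in weights:
        forward.setdefault(cited_j, set()).add(citing_j)
    dist = {journal: 0}
    queue = deque([journal])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue
        for v in forward.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1


# Hypothetical toy data: y1 cites x1, and an unrelated paper z1 cites y2.
citations = [("y1", "x1"), ("z1", "y2")]
journal_of = {"x1": "A", "y1": "B", "y2": "B", "z1": "C"}
weights = aggregate_to_journals(citations, journal_of)
print(journal_level_reach(weights, "A", 2))  # 2: journals B and C
```

At the paper level, journal A has a 2-reach of one, because only y1 cites x1 and y1 itself is never cited. After aggregation, however, the link from B to C, which is created by the unrelated paper y2, lets A reach both B and C.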

Fig. 2 shows the top journals according to the m-reaching centrality at m=3, based on the Web of Science publication data between 1975 and 2011. The hierarchy levels were defined by allowing a maximal standard deviation of 0.13·σ(C_m) for C_m within a given level, where σ(C_m) denotes the standard deviation of C_m over all journals. (The effect of changing the within-level standard deviation of C_m on the shape of the hierarchy is discussed in the Supplementary Information S5.) According to our analysis, Science is the most influential journal in the flow hierarchy. It is followed by Nature, with PNAS coming third, while Lancet and the New England Journal of Medicine form the fourth level. In general, the top of the hierarchy is strongly dominated by medical, biological and biochemical journals. For instance, the highest-placed physics journal, Physical Review Letters, appears only on the 13th level. The highest-placed chemistry journal, the Journal of the American Chemical Society, is positioned on the 11th level.
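The level construction can be read as a greedy pass over the nodes sorted in ascending order of C_m. In this reading, a new level is opened whenever adding the next node would push the within-level standard deviation above 0.13·σ(C_m). This is an assumption on our part; the exact procedure of Mones et al. (2012) may differ in detail. `flow_levels` is a hypothetical name:

```python
import statistics


def flow_levels(centrality, frac=0.13):
    """Greedy grouping of nodes into levels (an assumed reading of the method):
    walk through the nodes in ascending order of C_m and open a new level
    whenever the within-level population standard deviation would exceed
    frac * sigma, sigma being the standard deviation over all nodes.
    Returns the levels from the top (highest C_m) down."""
    limit = frac * statistics.pstdev(list(centrality.values()))
    levels, current = [], []
    for node in sorted(centrality, key=centrality.get):
        candidate = current + [node]
        if current and statistics.pstdev([centrality[v] for v in candidate]) > limit:
            levels.append(current)
            current = [node]
        else:
            current = candidate
    if current:
        levels.append(current)
    return levels[::-1]  # level 1 = top of the hierarchy


toy = {"a": 100, "b": 99, "c": 50, "d": 49, "e": 1}
print(flow_levels(toy))  # [['b', 'a'], ['d', 'c'], ['e']]
```

Recomputing the standard deviation at every step keeps the sketch short but is quadratic in the level size. A running mean and variance would make it linear.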

Figure 2

Top journals in the flow hierarchy according to the m-reaching centrality C_m at m=3, based on the scientific publication data from the Web of Science. The standard deviation of C_m within the individual hierarchy levels is at most 0.13·σ(C_m), where σ(C_m) denotes the standard deviation of C_m over all journals. The nodes are coloured according to the scientific field of the given journal.

For comparison, Fig. S7 in the Supplementary Information S4 shows the top of the flow hierarchy obtained from the citation network aggregated to the level of journals. Science, Nature and PNAS preserve their positions as the top three journals. However, marked changes can be observed in the hierarchy levels just below, as physics and chemistry journals overtake the biological and medical journals. For instance, Physical Review Letters rises from Level 13 to Level 3, while Lancet drops from Level 4 to Level 17.

This reorganization is likely to be caused by the “memory” of the citation network described by Rosvall et al. (2014): a paper citing mostly biological articles is more likely to be cited by other biological papers, even if it appears in an interdisciplinary journal. Biology and medicine have the highest publication rates among the scientific fields. Therefore, aggregation to the level of journals has the most severe effect on the reach of entities that obtain citations mostly from these fields. The hierarchy from the citation network of individual papers thus differs notably from the one obtained from the aggregated network between journals. This is yet another indication of the distortion in centralities caused by link aggregation, pointed out in related but somewhat different contexts by Rosvall et al. (2014) and by Pfitzner et al. (2013).

Extracting a nested hierarchy

Categorizing items into a nested hierarchy is a general idea that has long been used in, for instance, library classification systems, biological classification and the content classification of scientific publications. A closely related problem is the automated categorization of free tags appearing in various online content (Heymann and Garcia-Molina, 2006; Schmitz, 2006; Damme et al., 2007; Plangprasopchok et al., 2011; Tibély et al., 2013; Velardi et al., 2013). In recent years, tagging photos, films, books and so on with freely chosen words has become popular on the Internet in blogs, file-sharing platforms, online stores and news portals. In some cases, this phenomenon is referred to as collaborative tagging (Lambiotte and Ausloos, 2006; Cattuto et al., 2007; Cattuto et al., 2009; Floeck et al., 2011). The resulting large collections of tags are referred to as folksonomies (Mika, 2005; Lambiotte and Ausloos, 2006; Spyns et al., 2006; Cattuto et al., 2007, 2009; Voss, 2007; Tibély et al., 2012). The term highlights both their collaborative origin and the “flat” organization of the tags in these systems. The natural mathematical representation of tagging systems is given by hypergraphs (Ghosal et al., 2009; Zlatić et al., 2009).

Revealing the hidden hierarchy between tags in a folksonomy, or in a tagging system in general, can be of significant help (Juszczyszyn et al., 2010; Lu et al., 2012). It can broaden or narrow the scope of searches, support recommendations of yet unvisited objects to users, and aid the categorization of newly appearing objects. Here we apply a generalized version of a recent tag hierarchy extraction method (Tibély et al., 2013) to construct a nested hierarchy between scientific journals. In its original form, the input of the tag hierarchy extraction algorithm is the weighted co-occurrence network between the tags, where the weights correspond to the number of shared objects. The algorithm relies on the z-scores of the connected pairs and the centrality of the tags in the co-occurrence network. Based on these, it builds the hierarchy from the bottom up, assigning one or a few direct ancestors to each tag (except for the root of the hierarchy). The details of the algorithm are described in the “Nested hierarchy extraction algorithm” subsection.

To study the nested hierarchy between scientific journals, we simply replace the weighted co-occurrence network between tags with the weighted citation network between journals at the input of the algorithm. A tag co-occurrence network and a journal citation network are different. Nevertheless, the two properties most important for the nested hierarchy analysis are shared by both. First, general tags and multidisciplinary journals have a significantly larger number of neighbours than more specific tags and specialized journals. Second, closely related tags co-appear more often than unrelated tags, just as journals focusing on the same field cite each other more often than journals dealing with unrelated disciplines. Based on this, the hierarchy obtained from the journal citation network with this approach is expected to be organized according to the scope of the journals. The most general multidisciplinary journals should appear at the top and the most specialized journals at the bottom.

In this case we have to determine which journals are most closely related to each other and which are unrelated, rather than evaluating their overall influence. We therefore simply use the number of direct citations from one journal to the other as the weight of the connections. This is equivalent to the following procedure. We take the m-reach of each paper at m=1 and group the reached papers according to the journal in which they appear, that is, the source of the citations. We then sum up the results over the papers appearing in a given journal. In summary, both constructions start from the publication-level citation network. For the flow hierarchy, we evaluate the m-reach at m=3. For the nested hierarchy, we calculate the m-reach at m=1, which becomes technically equivalent to the journal-level citation counts when summed over the papers appearing in a given journal.
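The equivalence stated above can be checked on a toy example. The sketch below sums the per-paper 1-reach, grouped by the journal of the citing papers. It yields exactly the direct journal-to-journal citation counts, provided that each citation pair is listed only once. The function name is hypothetical, and the input format is the one assumed in the earlier sketches: (citing, cited) pairs and a paper-to-journal map:

```python
from collections import Counter, defaultdict


def citation_weights_via_1reach(citations, journal_of):
    """weights[(K, J)]: citations from journal K to journal J, computed as the
    per-paper 1-reach (papers citing it) grouped by the citing journal and
    summed over the papers of the cited journal J."""
    citers = defaultdict(set)  # reversed links: cited paper -> citing papers
    for citing, cited in citations:
        citers[cited].add(citing)
    weights = Counter()
    for paper, journal in journal_of.items():
        for citing in citers[paper]:  # 1-reach of this paper
            weights[(journal_of[citing], journal)] += 1
    return weights


citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p3")]
journal_of = {"p1": "A", "p2": "B", "p3": "B", "p4": "A"}
direct = Counter((journal_of[a], journal_of[b]) for a, b in citations)
assert citation_weights_via_1reach(citations, journal_of) == direct
```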

Nested hierarchy extraction algorithm

Our algorithm is a generalized version of “Algorithm B” presented in Tibély et al. (2013). The main differences are twofold. Here we force the algorithm to produce a directed acyclic graph consisting of a single connected component, and we allow multiple direct ancestors. In contrast, the original “Algorithm B” can produce disconnected components, each of which corresponds to a directed tree. A further technical improvement we introduce concerns the calculation of the node centralities. The method used here can thus be outlined as follows. First, we carry out “Algorithm B” of Tibély et al. (2013) with a modified centrality evaluation, obtaining a directed tree between the journals. This is followed by a second iteration in which we “enrich” the hierarchy by occasionally assigning further direct ancestors to the nodes.

Since “Algorithm B” is presented in full detail in Tibély et al. (2013), here we provide only a brief overview. The input of the algorithm is a weighted directed network between the journals based on the z-scores of the citation links. After discarding unimportant connections using a weight threshold, the node centralities are evaluated in the remaining network. Here we used a centrality based on random walks on the citation network between journals, with occasional teleportation steps, similar to PageRank. We adopted the method proposed by Lambiotte and Rosvall (2012), calculating the dominant right eigenvector of the matrix M_ij = (1 − α)ω_ij + α s_i^in. Here ω_ij is the link weight, s_i^in denotes the in-strength of journal i (in number of citations) and α is the teleportation probability. We used the widely adopted value α = 0.15. However, the ordering of the journals according to the centralities was quite robust with respect to changes in α.
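The text leaves the normalization of M_ij implicit. The following is therefore only a sketch of a random walk with teleportation in the spirit of Lambiotte and Rosvall (2012), under our own assumptions:

- The walker moves from a citing journal to a cited one with probability proportional to the link weight.
- With probability α, it teleports to a journal chosen in proportion to its in-strength.
- From journals without out-links, it always teleports.

The stationary distribution is found by power iteration. `teleport_centrality` is a hypothetical name:

```python
def teleport_centrality(weights, alpha=0.15, tol=1e-12, max_iter=1000):
    """Stationary distribution of a random walk with teleportation (a sketch).
    weights[(a, b)] = citations from journal a to journal b (walk a -> b).
    With probability 1 - alpha the walker follows an out-link in proportion to
    its weight; with probability alpha, or always from a journal without
    out-links, it jumps to a journal chosen in proportion to its in-strength."""
    nodes = sorted({n for edge in weights for n in edge})
    s_out = dict.fromkeys(nodes, 0.0)
    s_in = dict.fromkeys(nodes, 0.0)
    for (a, b), w in weights.items():
        s_out[a] += w
        s_in[b] += w
    total_in = sum(s_in.values())  # assumed > 0
    teleport = {n: s_in[n] / total_in for n in nodes}
    p = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(max_iter):
        # Mass that teleports: a fraction alpha everywhere, all of it on dangling nodes.
        jump = sum(p[n] * (alpha if s_out[n] > 0 else 1.0) for n in nodes)
        new = {n: jump * teleport[n] for n in nodes}
        for (a, b), w in weights.items():
            new[b] += (1 - alpha) * p[a] * w / s_out[a]
        if sum(abs(new[n] - p[n]) for n in nodes) < tol:
            return new
        p = new
    return p


c = teleport_centrality({("A", "B"): 3, ("B", "A"): 1, ("C", "A"): 2, ("C", "B"): 2})
ranking = sorted(c, key=c.get, reverse=True)  # ['B', 'A', 'C']
```

In this example, journal C cites others but is never cited. It therefore receives no teleportation weight and ends with zero centrality, while B, with the largest in-strength, ranks first.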

Based on the centralities, a directed tree representing the backbone of the hierarchy is built from the bottom up, as described in “Algorithm B” in Tibély et al. (2013). Sometimes no suitable “parent” for node i can be found according to the original rules. In that case, we choose the node with the highest accumulated z-score among all journals with a higher centrality than i. The accumulation runs over the already found descendants of the given node. This ensures the emergence of a single connected component, since a single direct ancestor is assigned to every node except the root of the tree. This is followed by a final iteration over the nodes, in which we examine whether further “parents” have to be assigned. A node is accepted as an additional (second, third and so on) direct ancestor of journal i if two conditions hold. It must have a higher centrality than i. Its z-score with i must also be larger than the z-score between i and its first direct ancestor. Note that the first parent is chosen based on the aggregated z-score instead of the simple pairwise z-score, as explained by Tibély et al. (2013).
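The criterion for additional parents can be written as a short filter. In this sketch, `zscore[(i, k)]` is a hypothetical lookup of the z-score between journal i and candidate k, and missing pairs are treated as unlinked. Both the data layout and the function name are our own assumptions:

```python
def extra_parents(node, first_parent, centrality, zscore):
    """Additional direct ancestors of `node`: journals with higher centrality
    whose z-score with `node` exceeds that of the first parent.
    `zscore[(i, k)]` is a hypothetical lookup; missing pairs count as unlinked."""
    threshold = zscore[(node, first_parent)]
    return sorted(
        k for k in centrality
        if k not in (node, first_parent)
        and centrality[k] > centrality[node]
        and zscore.get((node, k), float("-inf")) > threshold
    )


centrality = {"A": 10.0, "B": 9.0, "C": 5.0, "D": 2.0}
zscore = {("C", "A"): 4.0, ("C", "B"): 5.0, ("C", "D"): 9.0}
print(extra_parents("C", "A", centrality, zscore))  # ['B']: D has lower centrality
```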

Nested hierarchy of scientific journals

Fig. 3 shows the top of the obtained nested hierarchy between the journals, with Nature as the root. PNAS, Science, New Scientist and the Astrophysical Journal form the second level. Several prominent field-specific journals, such as Physical Review Letters, Brain Research, Ecology and the Journal of the American Chemical Society, have both Nature and Science as direct ancestors. Interestingly, the Astrophysical Journal is a direct descendant only of Nature and is not linked under Science or PNAS. Nevertheless, it serves as a local root for a branch of astronomy-related journals. In a similar way, Physical Review Letters can be regarded as the local root of physics journals, and the Journal of the American Chemical Society as the local root of chemistry journals. The biological, medical and biochemical journals form a rather mingled branch under PNAS. The Journal of Biological Chemistry is the local root of this branch, and the New England Journal of Medicine is a sub-root for medical journals. However, Cell and the New England Journal of Medicine are direct descendants of Nature and Science as well. Notably, the brain- and neuroscience-related journals form a rather well-separated branch with Brain Research as the local root. This branch is linked directly under PNAS, Science and Nature.

Figure 3

The top of the nested hierarchy between scientific journals. Due to the rapidly increasing number of nodes per level, the journals on Levels 4 and 5 are organized into multiple rows. The size of the nodes indicates the total number of descendants (on a logarithmic scale). The journals positioned above Level 5 with no out links shown (for example, Europhysics Letters or Bioscience) have descendants on levels that are out of the scope of the figure.

Comparing the hierarchies

The hierarchies presented in Fig. 2 and Fig. 3 show a great deal of similarity, but some interesting differences can also be observed. The figures show the top of the corresponding hierarchies, and a significant portion of the highly ranked journals appear to be the same in both cases. However, the roots of the hierarchies are different: Science for the flow hierarchy and Nature for the nested hierarchy. A level-by-level comparison of Fig. 2 and Fig. 3 also shows that a very high position in the flow hierarchy is not always accompanied by an outstanding position in the nested hierarchy, and vice versa. For example, the Lancet and the New England Journal of Medicine appear much higher in Fig. 2 than in Fig. 3. Geophysical Research Letters, by contrast, is just below Nature and Science in the nested hierarchy but is not even shown in the top of the flow hierarchy.

To make the comparison between the two types of hierarchies more quantitative, we successively aggregated the levels of the hierarchies, starting from the top. We then calculated the Jaccard similarity coefficient between the resulting sets as a function of the level depth ℓ. Thus, at ℓ=1 we are comparing the roots, at ℓ=2 the journals on the top two levels, and so on. However, the two hierarchies have different total numbers of levels. We therefore refine the definition of the similarity coefficient by allowing different ℓ values in the two hierarchies and always choosing the pair of aggregated sets with the maximal relative overlap. This yields two similarity functions,

$$J_f(\ell_f) = \max_{\ell_n} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \tag{2}$$
$$J_n(\ell_n) = \max_{\ell_f} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \tag{3}$$

where S_f(ℓ_f) and S_n(ℓ_n) denote the sets of aggregated journals from the root to level ℓ_f in the flow hierarchy and to level ℓ_n in the nested hierarchy, respectively. When evaluating J_f(ℓ_f) at a given level depth ℓ_f according to equation (2), the set of aggregated journals in the flow hierarchy, S_f(ℓ_f), is fixed. We then search for the most similar set of aggregated journals in the nested hierarchy by scanning over the entire range of possible ℓ_n values and choosing the one that gives the maximal Jaccard similarity. Similarly, when calculating J_n(ℓ_n) according to equation (3), the set of aggregated journals taken from the nested hierarchy, S_n(ℓ_n), is fixed. The set S_f(ℓ_f) yielding the maximal Jaccard similarity is then chosen from the flow hierarchy.
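Equations (2) and (3) translate directly into a few lines of Python. The sketch assumes that each hierarchy is given as a list of levels from the root downwards, with each level a list of journal names. The function names are illustrative:

```python
def aggregated_sets(levels):
    """S(1), S(2), ...: journals from the root down to each level depth."""
    sets, acc = [], set()
    for level in levels:
        acc = acc | set(level)
        sets.append(acc)
    return sets


def jaccard(a, b):
    return len(a & b) / len(a | b)


def best_match_similarity(levels_fixed, levels_other):
    """J(l) for each depth l of `levels_fixed`: maximal Jaccard similarity
    over all aggregated sets of the other hierarchy, as in equations (2)-(3)."""
    other_sets = aggregated_sets(levels_other)
    return [max(jaccard(s, t) for t in other_sets) for s in aggregated_sets(levels_fixed)]


flow = [["a"], ["b"], ["c", "d"]]
nested = [["b"], ["a", "c"], ["d"]]
j_f = best_match_similarity(flow, nested)  # [1/3, 2/3, 1.0]
j_n = best_match_similarity(nested, flow)  # [1/2, 3/4, 1.0]
```

Calling `best_match_similarity(flow, nested)` gives J_f, and swapping the arguments gives J_n.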

Fig. 4 shows J_f(ℓ_f) as a function of the level depth ℓ_f in the flow hierarchy. (The corresponding J_n(ℓ_n) plot for the nested hierarchy is given in Fig. S10 in the Supplementary Information S6.) Besides J_f(ℓ_f), Fig. 4 also shows the expected similarity between the aggregated sets of journals and a random set of journals of the same size. Since the roots of the hierarchies are different, the curves start from 0 at ℓ_f=1. Naturally, as we approach the maximal level depth, the similarity approaches 1, since all journals are included in the final aggregate. However, at the top levels below the root, a prominent increase can be observed in J_f(ℓ_f), while the similarity between random sets of journals increases very slowly in this region. Thus, the flow hierarchy and the nested hierarchy revealed by our methods are significantly similar from a quantitative point of view as well. This is also supported by the remarkably small generalized Kendall-tau distance of τ=0.16. It is obtained by treating the two hierarchies as partial orders and applying a natural extension of the standard distance measure between total orders. The definition of the distance measure and the details of the calculation are given in the Supplementary Information S7.
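Assume the random reference set is drawn uniformly without replacement from all N journals; this is an assumption on our part. The expected Jaccard similarity with a fixed set of the same size k can then be computed exactly. The overlap X follows a hypergeometric distribution, and since both sets have k elements, their union has 2k − X elements. A sketch (hypothetical function name):

```python
from math import comb


def expected_random_jaccard(k, n):
    """Expected Jaccard similarity between a fixed set of k journals and a set
    of k journals drawn uniformly without replacement from n journals.
    The overlap X is hypergeometric, and the union has 2k - X elements."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    total = comb(n, k)
    expected = 0.0
    for x in range(max(0, 2 * k - n), k + 1):
        prob = comb(k, x) * comb(n - k, k - x) / total  # P(X = x)
        expected += prob * x / (2 * k - x)
    return expected


print(expected_random_jaccard(1, 4))  # 0.25
```

For very large k and N, the exact sum becomes slow because of the huge binomial coefficients. Averaging over randomly drawn sets is a simple alternative in that case.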

Figure 4

Comparison between the flow hierarchy and the nested hierarchy of scientific journals. The Jaccard similarity coefficient Jf(ℓf) of the aggregated sets of journals is plotted as a function of the number of journals accumulated from the top of the hierarchy down to level ℓf in the flow hierarchy. Circles correspond to the similarity between the two hierarchies, while squares show the similarity between two random sets of journals of the corresponding size.

Finally, our hierarchies can also be compared with traditional impact measures. According to the results detailed in the Supplementary Information S8, both the flow and the nested hierarchy show moderate correlations with the impact factor, the Scimago Journal Rank and the closeness centrality of journals in the aggregated citation network. Therefore, the general trends shown by the hierarchies are consistent with previously introduced, widely used impact measures. However, when examined in detail, the hierarchies also provide an alternative point of view with important differences, avoiding strong correlations with the earlier, one-dimensional characterizations of journal ranking.
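One generic way to quantify such a correlation is a Spearman rank correlation between the level depth of each journal and an impact measure. The Python sketch below is an illustration only; the exact procedure of the Supplementary Information S8 is not reproduced here. Because many journals share the same level, tied values receive the average of the ranks they span, and the function names and toy data are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical data: level depth (1 = top) versus an impact score
levels = [1, 2, 2, 3, 4, 4]
impact = [30.0, 25.0, 12.0, 14.0, 3.0, 5.0]
print(spearman(levels, impact))  # negative: higher levels tend to have higher impact
```

A negative coefficient is expected here because a smaller level depth means a higher position in the hierarchy.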

Discussion

Ranking and comparing the importance, prestige and popularity of scientific journals is a far from trivial task, with quite a few different impact measures available (Garfield, 1955,1999; Braun et al., 2006; Egghe, 2006; Bergstrom, 2007; Bollen et al., 2005, 2007; Leydesdorff, 2007; Bollen et al., 2009; Franceschet, 2010a, b; Glänzel, 2011; Kaur et al., 2013). However, it seems that the overall impact of journals cannot be adequately characterized by a single one-dimensional quality measure (Bollen et al., 2009). In this light, our results offer an informative overview of the ranking and the intricate relations between journals, where, instead of simply ordering them according to a one-dimensional parameter, we organize them into multiple hierarchies.

First, we defined a flow hierarchy between the journals based on the m-reaching centrality in the citation network between the scientific papers. This structure organizes the journals according to their potential for spreading new scientific ideas, with the most influential information spreaders placed at the top of the hierarchy. In this respect, Science turned out to be the root, followed by Nature and PNAS, and the top dozen levels of the hierarchy were dominated by multidisciplinary, biological, biochemical and medical journals.
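At the level of individual nodes, an m-reaching centrality can be computed with a breadth-first search truncated at depth m. The Python sketch below assumes the normalization of the local reaching centrality (the number of nodes reachable within at most m directed steps, divided by N − 1) and assumes that edges point from a cited paper to the papers citing it, so that reachability follows the spread of ideas. How the paper-level values are aggregated into the journal-level flow hierarchy is not shown, and the function name is hypothetical.

```python
from collections import deque


def m_reaching_centrality(out_edges: dict, source, m: int) -> float:
    """Fraction of the other nodes reachable from `source` in at most m directed steps.

    `out_edges` maps each node to the nodes it points to (assumed here to be the
    direction of information flow, i.e. cited paper -> citing paper).
    """
    nodes = set(out_edges) | {v for targets in out_edges.values() for v in targets} | {source}
    if len(nodes) < 2:
        return 0.0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue  # do not expand beyond m steps
        for v in out_edges.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(dist) - 1) / (len(nodes) - 1)


# Toy chain of ideas: a is cited by b, b by c, c by d
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
print([m_reaching_centrality(chain, "a", m) for m in (1, 2, 3)])
```

Limiting the search to m steps keeps the measure local; with m equal to the network diameter it reduces to the fraction of all nodes reachable from the source.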

We also constructed a nested hierarchy between the journals by generalizing a recent tag hierarchy extraction algorithm. In this case, the journals were organized into branches according to the major scientific fields, with a clear separation between unrelated fields and relatively strong mixing and overlap between closely related fields. Mapping the different journals into well-oriented knowledge domains is a complex problem on its own (Chen et al., 2001a, b; Shiffrin and Börner, 2004; Rosvall and Bergstrom, 2008; Börner, 2010), especially from the point of view of multi- and interdisciplinary fields. Our nested hierarchy also provides a natural tool for visualizing the intricate nested and overlapping relations between scientific fields. An important feature is that the organization of the branches roughly reflects the local hierarchy of the given field, with the most prominent journal in the field usually serving as the local root and more specialized journals positioned at the bottom. Thus, zooming into a specific field for comparing and ranking the journals that publish in that field becomes simple: we just have to select the corresponding branch in the nested hierarchy.
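Selecting a branch amounts to collecting every node below a chosen local root. Because branches may overlap, a journal can have more than one parent, so the sketch below treats the nested hierarchy as a directed acyclic graph given by a hypothetical parent-to-children map and visits each node once. The node names are placeholders, not positions taken from the actual hierarchy.

```python
def branch(children: dict, top) -> set:
    """All nodes in the branch below `top`, including `top` itself.

    `children` maps each node to its child nodes; a node reachable through
    several parents (overlapping branches) is still counted only once.
    """
    seen = {top}
    stack = [top]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical nested hierarchy with one journal shared by two fields
children = {
    "root": ["physics-top", "biology-top"],
    "physics-top": ["physics-1", "biophysics"],
    "biology-top": ["biology-1", "biophysics"],
}
print(sorted(branch(children, "physics-top")))
```

The overlapping journal (here the placeholder "biophysics") appears in both field branches, which is how a nested hierarchy can represent interdisciplinary journals.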

Another interesting perspective is that the position of a journal in the nested hierarchy gives immediate information on its standing within its particular field. Accordingly, we can select the journals with which a fair comparison can be made, and exclude journals in faraway branches from any comparative study. Moreover, similarly to judging the position of a journal within its specific field (a local branch), we can also judge the standing of this sub-field in a larger scientific domain (a main branch), and so on, and thereby compare the ranking of the different scientific fields and sub-fields (each composed of multiple journals). When zooming out completely to the overall hierarchy between the journals, Nature was observed in the top position, while Science, PNAS, the Astrophysical Journal and New Scientist formed the second level, with the field-dependent branches starting at the third level. The comparison between the two types of hierarchy reveals a strong similarity accompanied by significant differences. Science, Nature and PNAS are the top three journals in both cases, and the top few hundred nodes in the two hierarchies have a far larger overlap than expected at random. However, a closer level-by-level inspection showed that a very high position in, for example, the flow hierarchy does not guarantee a similarly outstanding ranking in the nested hierarchy, and vice versa. Both hierarchies showed moderate correlations with the impact factor, the Scimago Journal Rank and the closeness centrality of the journals in the citation network. This supports our view that the hierarchical organization of scientific journals provides an interesting alternative for describing journal impact, which is broadly consistent with previously introduced measures but shows important differences when examined in detail.
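A simple rule for deciding which journals are fair to compare could be whether they share a common ancestor below the root, i.e. whether they sit in a common main branch. This criterion is a hypothetical illustration, not one stated in the study. The Python sketch assumes the nested hierarchy is given as a child-to-parents map, so that overlapping branches (several parents per journal) are allowed; the node names are placeholders.

```python
def ancestors(parents: dict, node) -> set:
    """All nodes above `node`, following every parent link (overlaps allowed)."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def share_branch(parents: dict, a, b, root) -> bool:
    """True if a and b have a common ancestor (or one contains the other)
    other than the root, i.e. they lie in a common branch."""
    common = (ancestors(parents, a) | {a}) & (ancestors(parents, b) | {b})
    return bool(common - {root})


# Hypothetical hierarchy: two main branches A and B below the root
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"], "B1": ["B"]}
print(share_branch(parents, "A1", "A2", "root"))  # same main branch
print(share_branch(parents, "A1", "B1", "root"))  # only the root in common
```

Replacing the root with a deeper node in `share_branch` gives a stricter criterion, mirroring the idea of judging a journal within a local branch rather than a main branch.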

In summary, the two hierarchies we constructed offer a compound view of the interrelations between scientific journals, and provide a higher-dimensional characterization of journal impact instead of simply ranking the journals according to a one-dimensional parameter. Naturally, hierarchies between scientific journals can be defined in other ways too (Iyengar and Balijepally, 2015). For example, when building a flow hierarchy, the overall influence of journals could alternatively be measured with other quantities such as the wake-citation-score (Klosik and Bornholdt, 2014), the PageRank or the Y-factor (Bollen et al., 2007). In parallel, a nested hierarchy might also be constructed by suitably modifying a community finding algorithm producing inherently nested and overlapping communities, such as Infomap (Rosvall and Bergstrom, 2008, 2011; Rosvall et al., 2014) or the clique percolation method (Palla et al., 2005). Another interesting aspect we have not taken into account here is the time evolution of the citation network between the journals. Obviously, the ranking of the journals changes with time, and by treating all publications between 1975 and 2011 in a uniform framework we neglected this effect. However, the examination of further possibilities for hierarchy construction and the study of the time evolution of the journal hierarchies are beyond the scope of the present work, although they provide interesting directions for future research.
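As an example of such an alternative influence score, the standard PageRank can be computed by power iteration. The Python sketch below is the textbook, unweighted formulation with damping factor 0.85, not necessarily the weighted variant used by Bollen et al. (2007). It assumes edges point from the citing journal to the cited journal, so that score accumulates on frequently cited journals, and nodes without outgoing edges spread their score uniformly. Journal names are placeholders.

```python
def pagerank(out_edges: dict, damping: float = 0.85, tol: float = 1e-12, max_iter: int = 1000) -> dict:
    """Unweighted PageRank by power iteration.

    `out_edges` maps each node to the nodes it cites (assumed: citing -> cited).
    Dangling nodes (no outgoing edges) redistribute their score uniformly.
    """
    nodes = list(set(out_edges) | {v for targets in out_edges.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        # Teleportation term plus the uniformly spread dangling mass
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = out_edges.get(u, ())
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical journal citation graph: two journals cite a common "hub"
scores = pagerank({"J1": ["hub"], "J2": ["hub", "J1"]})
print(max(scores, key=scores.get))  # the most cited node gets the top score
```

Sorting journals by such a score yields only a linear order; building a flow hierarchy from it would still require a rule for grouping journals into levels, which is not specified here.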

Data Availability

The datasets analyzed during the current study are available from the Web of Science repository, owned by Thomson Reuters (http://scientific.thomson.com/isi/), but restrictions apply to the availability of these data, which were used under license from Thomson Reuters, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Thomson Reuters.

The downloading scripts used in the study are available in the Dataverse repository: http://dx.doi.org/10.7910/DVN/MCXTHF

Additional information

How to cite this article: Palla G, Tibély G, Mones E, Pollner P and Vicsek T (2015) Hierarchical networks of scientific journals. Palgrave Communications 1:15016 doi: 10.1057/palcomms.2015.16.