Hierarchical networks of scientific journals

Scientific journals are the repositories of the gradually accumulating knowledge of mankind about the world surrounding us. Just as our knowledge is organised into classes ranging from major disciplines, subjects and fields to increasingly specific topics, journals can also be categorised into groups using various metrics. In addition to the set of topics characteristic of a journal, journals can also be ranked according to their overall influence. One widespread measure is the impact factor, but in the present paper we intend to reconstruct a much more detailed description by studying the hierarchical relations between the journals based on citation data. We use a measure related to the notion of m-reaching centrality and find a network which shows the level of influence of a journal in terms of the direction and efficiency with which information spreads through the network. We can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. The results are weakly methodology-dependent and reveal non-trivial relations among journals. The two alternative hierarchies show large similarity with some striking differences, together providing a complex picture of the intricate relations between scientific journals.


Introduction
Hierarchical organisation is a widespread phenomenon in nature and society. This is supported by several studies, focusing on the transcriptional regulatory network of Escherichia coli [1], the dominant-subordinate hierarchy among crayfish [2], the leader-follower network of pigeon flocks [3], the rhesus macaque kingdoms [4], neural networks [5], technological networks [6], social interactions [7,8,9], urban planning [10,11], ecological systems [12,13], and evolution [14,15]. However, hierarchy is a polysemous word, and in general we can distinguish between three different types of hierarchy when describing a complex system: the order, the nested and the flow hierarchy. In the case of order hierarchy, we basically define a ranking, or more precisely a partial ordering, on the set of elements under study [16]. Nested hierarchy (also called inclusion hierarchy or containment hierarchy) represents the idea of recursively aggregating the items into larger and larger groups, resulting in a structure where higher level groups consist of smaller and more specific components [17]. Finally, a flow hierarchy can be depicted as a directed graph, where the nodes are layered in different levels so that the nodes that are influenced by a given node (are connected to it through a directed link) are at lower levels.
Hierarchical organisation is a very relevant concept in network theory [18,19,20,6,21,22,23]. The network approach has become a ubiquitous tool for analysing complex systems ranging from the interactions within cells through transportation systems, the Internet and other technological networks to economic networks, collaboration networks and society [24,25]. Grasping the signs of hierarchy in networks is a non-trivial task with a number of possible different approaches, including the statistical inference of an underlying hierarchy based on the observed network structure [20], and the introduction of various hierarchy measures [19,22,23].
What makes the analysis of hierarchy even more complex is that it may also be context dependent. According to a recent study on homing pigeons, the hierarchical pattern of in-flight leadership does not build upon the stable, hierarchical social dominance structure (pecking order) evident among the same birds [26]. Here we show that multiple hierarchies of different types can also be observed between scientific journals. Our studies rely on the citation network between scientific papers obtained from the Web of Science [27]. The data consist of all the available publications between 1975 and 2011, altogether 35,372,039 papers appearing in 13,202 different journals. On the one hand, the flow hierarchy analysis of this network based on the m-reaching centrality [22,28] reveals the structure relevant from the point of view of knowledge spreading and influence. On the other hand, the alternative hierarchy obtained from the same network with the help of an automated tag hierarchy extraction method [29] highlights a nested structure with the most interdisciplinary journals at the top and the very specialised journals at the bottom of the hierarchy.

Flow hierarchy based on the m-reaching centrality
A recently introduced approach for quantifying the position of a node in a flow hierarchy is based on the m-reaching centrality [22]. The basic intuition behind this idea is that reaching the rest of the network should be relatively easy for the nodes high in the hierarchy, and more difficult for the nodes at the bottom of the hierarchy. Thus, the position of node i in the hierarchy is determined by its m-reaching centrality [28], $C_m(i)$, corresponding to the fraction of nodes that can be reached from i following directed paths of at most m steps (where m is a system-dependent parameter). Naturally, a higher $C_m(i)$ value corresponds to a higher position in the hierarchy, and the node with the maximal $C_m(i)$ is chosen as the root. However, this approach does not specify the ancestors or descendants of a given node in the hierarchy; instead, it provides only a ranking of the nodes of the underlying network according to $C_m(i)$. Nevertheless, hierarchical levels can be defined in a simple way: after sorting the nodes in ascending order of $C_m$, we can sample and aggregate nodes into levels so that in each level the standard deviation of $C_m$ is lower than a pre-defined fraction of the standard deviation in the whole network.
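The ranking and level construction described above can be illustrated with a minimal sketch. It rests on several assumptions that are not specified in the text: the network is given as a dictionary `adj` listing every node as a key, the fraction is normalised by N − 1, and the levels are built greedily while scanning the sorted nodes. The function names `m_reach_fraction` and `assign_levels` are hypothetical.

```python
from collections import deque
from statistics import pstdev

def m_reach_fraction(adj, source, m):
    """Fraction of the other nodes reachable from `source` in at most m directed steps.

    Assumption: `adj` maps every node to the list of its out-neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == m:          # do not expand beyond m steps
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return (len(dist) - 1) / (len(adj) - 1)

def assign_levels(centrality, fraction):
    """Greedy level construction (an assumed reading of the rule in the text).

    Nodes are scanned in ascending order of centrality; a new level is opened
    whenever adding the next node would push the standard deviation within the
    current level to `fraction` times the global standard deviation or above.
    """
    limit = fraction * pstdev(list(centrality.values()))
    levels, current = [], []
    for node in sorted(centrality, key=centrality.get):
        trial = current + [node]
        if current and pstdev([centrality[n] for n in trial]) >= limit:
            levels.append(current)
            current = [node]
        else:
            current = trial
    if current:
        levels.append(current)
    return levels[::-1]              # highest centrality (root level) first

# Toy example with m = 2
adj = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
centrality = {node: m_reach_fraction(adj, node, 2) for node in adj}
print(assign_levels(centrality, 0.3))
```

The greedy scan is only one possible way of satisfying the standard deviation criterion; other groupings of the sorted nodes may satisfy it as well.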
When applying this approach to the study of the hierarchy between scientific journals, we have to take into account that journals are not directly connected to each other; instead, they are linked via a citation network between the individual publications. In principle we may assume different "journal strategies" for obtaining a large reach in this system. For example, a journal might publish a very high number of papers, which are of poor quality with only a few citations each. Nevertheless, taken together they can still provide a large number of aggregated citations. Another option is to publish a lower number of high quality papers, obtaining a lot of citations also on the level of publications. To avoid having a built-in preference for one type of journal over the other, we define a reaching centrality which is not sensitive to such details, and depends only on the number of papers that can be reached in m steps from publications appearing in the given journal.
First we note that when calculating the reach of the publications, the citation links have to be followed backwards, i.e., if paper i is citing j, then the information presented in j has reached i. Thus, the reaching centralities are evaluated in a network where the links are pointing from a reference article to all papers citing it. The m-reach of a journal J, denoted by $C_m(J)$, is naturally given by the number of papers that can be reached in at most m steps from any article appearing in the given journal. Thus, the mathematical definition of $C_m(J)$ is based on the set of m-reachable nodes, given by
$$\mathcal{C}_m(J) = \{\, i \notin J : \exists\, j \in J,\ d_{\mathrm{out}}(j,i) \le m \,\}, \qquad (1)$$
where $d_{\mathrm{out}}(j,i)$ denotes the out-distance from paper j to i (i.e., the distance of the papers when only consecutive out-links are considered). The set $\mathcal{C}_m(J)$ is equivalent to the set of papers outside J that can be reached in at most m steps, provided that the starting publication is in J. The m-reaching centrality of J is simply the size of the m-reachable set, $C_m(J) = |\mathcal{C}_m(J)|$ (i.e., the number of papers in $\mathcal{C}_m(J)$). In Fig.1. we show an illustration of the calculation of the m-reach of the journals detailed above. In order to determine the optimal value of m, we calculated $C_m(J)$ for all journals in our data set for a wide range of m values. According to the results detailed in the Supplementary Information S1., around m = 4 the $C_m(J)$ is starting to saturate for the top journals. In order to provide a fair and robust ordering between the journals, here we set m = 3, corresponding to an optimal setting: on the one hand, we are still allowing multiple steps in the paths contributing to the reach, and on the other hand, we avoid the saturation effect caused by the exponential increase in the reach as a function of the maximal path length and the finite system size. (More details on the tuning of m are given in the Supplementary Information S1).
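As an illustration of the journal-level m-reach defined above, the following sketch counts the papers outside a journal that can be reached in at most m steps from any of its papers. The input format is an assumption: `citations` is a list of (citing, cited) pairs and `journal_of` maps every paper to its journal. The function names are hypothetical.

```python
from collections import deque, defaultdict

def build_spread_network(citations):
    """Reverse the citation links: an edge points from a cited paper to each paper citing it."""
    spread = defaultdict(list)
    for citing, cited in citations:
        spread[cited].append(citing)
    return spread

def journal_m_reach(spread, journal_of, journal, m):
    """Number of papers outside `journal` reachable in at most m steps from any of its papers."""
    sources = [paper for paper, j in journal_of.items() if j == journal]
    # A multi-source breadth-first search yields the shortest distance
    # from the closest paper of the journal.
    dist = {paper: 0 for paper in sources}
    queue = deque(sources)
    while queue:
        paper = queue.popleft()
        if dist[paper] == m:
            continue
        for nxt in spread.get(paper, ()):
            if nxt not in dist:
                dist[nxt] = dist[paper] + 1
                queue.append(nxt)
    return sum(1 for paper in dist if journal_of[paper] != journal)

# Toy example: (citing, cited) pairs and a paper-to-journal map
citations = [("p2", "p1"), ("p3", "p2"), ("p4", "p3"), ("p5", "p4"), ("p6", "p1")]
journal_of = {"p1": "A", "p2": "B", "p3": "A", "p4": "B", "p5": "C", "p6": "A"}
spread = build_spread_network(citations)
print(journal_m_reach(spread, journal_of, "A", 2))
```

Since only the size of the reachable set matters, a journal publishing many weakly cited papers and a journal publishing few highly cited papers are treated on the same footing, in line with the motivation given above.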
Before actually turning to the results, we note that an alternative approach for studying the citation between journals is to aggregate all papers in a given journal into a single node, representing the journal itself, in similar fashion to Refs. [30,31]. In this case the link weight from journal J to journal I is given by the total number of citations from papers appearing in J to papers in I. In the Supplementary Information S2. we analyse the flow hierarchy obtained by evaluating the m-reaching centrality in this aggregated network between the journals. However, recent works have pointed out that aggregations of this nature can lead to serious misjudgement of the importance of nodes [32,33]. For example, an interesting memory effect of the citation network between individual papers is that a paper citing mostly biological papers, but appearing in an interdisciplinary journal, is still much more likely to be cited by other biological papers than by papers from other disciplines [32]. Such phenomena can have a significant influence on the m-reaching centrality. However, by switching to the aggregated network between journals we wipe out these effects and introduce a distortion in the m-reach. Thus, here we stick to the most detailed representation of the system, given by the citation network between individual papers, and leave the analysis of the aggregated network between journals to the Supplementary Information S2. (An illustration of the difference between the m-reach calculated on the level of papers and on the aggregated level of journals is given in Fig.1.) The results for the top journals according to the m-reaching centrality at m = 3 based on the publication data available from the Web of Science between 1975 and 2011 are given in Fig.2. According to our analysis, Science is the most influential journal based on the flow hierarchy, followed by Nature, with PNAS coming 3rd, while Lancet and the New England Journal of Medicine form the 4th level.
In general, the top of the hierarchy is strongly dominated by medical, biological and biochemical journals, e.g., the top physics journal, Physical Review Letters, appears only on the 13th level, and the top chemistry journal, the Journal of the American Chemical Society, is positioned at the 11th level.
For comparison, in Fig.S3. in the Supplementary Information S2. we show the top of the flow hierarchy obtained from the citation network aggregated to the level of journals. Although Science, Nature and PNAS preserve their position as the top 3 journals, relevant changes can be observed in the hierarchy levels just below, as physical and chemical journals take over from the biological and medical journals. E.g., Physical Review Letters is raised from level 13 to level 3, while Lancet is pushed back from level 4 to level 17. This reorganisation is likely to be caused by the "memory" of the citation network described in Ref. [32], i.e., the fact that a paper citing mostly biological articles is more likely to be cited by other biological papers, even if it appears in an interdisciplinary journal. Since biology and medicine have the highest publication rate among different scientific fields, the aggregation to the level of journals has the most severe effect on the reach of entities obtaining citations mostly from these fields. Thus, the notable difference between the flow hierarchy obtained from the citation network of individual papers and from the aggregated network between journals is yet another indication of the distortion in centralities caused by link aggregation, pointed out in related, but somewhat different contexts in Refs. [32,33].

Extracting a nested hierarchy
Categorising items into a nested hierarchy is a general idea that has been around for a long time in, e.g., library classification systems, biological classification and also in the content classification of scientific publications. A very closely related problem is given by the automated categorisation of free tags appearing in various on-line content [34,35,36,37,29,38]. In recent years the voluntary tagging of photos, films, books, etc. with free words has become popular on the WWW, from blogs through various file sharing platforms to on-line stores and news portals. In some cases this phenomenon is referred to as collaborative tagging [39,40,41,42], and the arising large collections of tags are referred to as folksonomies, highlighting their collaborative origin and the "flat" organisation of the tags in these systems [39,40,42,43,44,45,46]. The natural mathematical representation of tagging systems is given by hypergraphs [47,48].
Revealing the hidden hierarchy between tags in a folksonomy or a tagging system in general can significantly help in broadening or narrowing the scope of search in the system, in giving recommendations about yet unvisited objects to the user, or in categorising newly appearing objects [49,50]. Here we apply a generalised version of a recent tag hierarchy extraction method [29] for constructing a nested hierarchy between scientific journals. In its original form, the input of the tag hierarchy extraction algorithm is given by the weighted co-occurrence network between the tags, where the weights correspond to the number of shared objects. Based on the z-score of the connected pairs and the centrality of the tags in the co-occurrence network, the hierarchy is built bottom up, as the algorithm eventually assigns one or a few direct ancestors to each tag (except for the root of the hierarchy). The details of the algorithm are described in the Methods.
In order to study the nested hierarchy between scientific journals, we simply replace the weighted co-occurrence network between tags by the weighted citation network between journals at the input of the algorithm. Although a tag co-occurrence network and a journal citation network are different, the two most important properties needed for the nested hierarchy analysis are the same in both: General tags and multidisciplinary journals have a significantly larger number of neighbours compared to more specific tags and specialised journals. Furthermore, closely related tags co-appear more often compared to unrelated tags, as journals focusing on the same field cite each other more often compared to journals dealing with independent disciplines. Based on this, the hierarchy obtained from the journal citation network in this approach is expected to be organised according to the scope of the journals, with the most general multidisciplinary journals at the top and the very specialised journals at the bottom.
In Fig.3. we show the top of the obtained nested hierarchy between the journals, with Nature appearing as the root, while PNAS, Science, New Scientist and Astrophysics Journal form the second level. Several prominent field specific journals such as Physical Review Letters, Brain Research, Ecology and Journal of the American Chemical Society have both Nature and Science as direct ancestors. Interestingly, the Astrophysics Journal is a direct descendant only of Nature, and is linked under neither Science nor PNAS. Nevertheless, it serves as a local root for a branch of astronomy related journals, in a similar fashion to Physical Review Letters, which can be regarded as the local root of physics journals, or Journal of the American Chemical Society, corresponding to the local root of chemical journals. The biological, medical and biochemical journals form a rather mingled branch under PNAS, with Journal of Biological Chemistry as the local root and New England Journal of Medicine corresponding to a sub-root for medical journals. However, Cell and New England Journal of Medicine are direct descendants of Nature and Science as well. Interestingly, the brain and neuroscience related journals form a rather well separated branch with Brain Research as the local root, linked directly under PNAS, Science and also under Nature.

Comparing the hierarchies
Although the hierarchies presented in Fig.2. and Fig.3. show a great deal of similarity, some interesting differences can also be observed. The figures show the top of the corresponding hierarchies, and a significant part of the journals ranked high in the hierarchy appear to be the same in both cases. However, the roots of the hierarchies are different (Science in the case of the flow hierarchy and Nature in the case of the nested hierarchy), and the level by level comparison of Fig.2. and Fig.3. shows that a very high position in the flow hierarchy is not always accompanied by an outstanding position in the nested hierarchy, and vice versa. E.g., Lancet and New England Journal of Medicine appear much higher in Fig.2 compared to Fig.3, while Geophysical Research Letters is just below Nature and Science in the nested hierarchy and is not even shown in the top of the flow hierarchy.
To make the comparison between the two types of hierarchies more quantitative, we successively aggregated the levels in the hierarchies starting from the top, and calculated the Jaccard similarity coefficient between the resulting sets as a function of the level depth $\ell$. Thus, when $\ell = 1$, we are actually comparing the roots, when $\ell = 2$, the journals on the top 2 levels, etc. However, since the total number of levels in the two hierarchies is different, we refine the definition of the similarity coefficient by allowing different $\ell$ values in the two hierarchies, and always choosing the pairs of aggregated sets with the maximal relative overlap. Therefore, we actually have two similarity functions,
$$J_f(\ell_f) = \max_{\ell_n} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \qquad (2)$$
$$J_n(\ell_n) = \max_{\ell_f} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|}, \qquad (3)$$
where $S_f(\ell_f)$ and $S_n(\ell_n)$ denote the set of aggregated journals from the root to level $\ell_f$ in the flow hierarchy and to level $\ell_n$ in the nested hierarchy, respectively. In Fig.4. we show the obtained result for $J_f(\ell_f)$ as a function of the level depth $\ell_f$ in the flow hierarchy (while the corresponding $J_n(\ell_n)$ plot for the nested hierarchy is given in Fig.S4. in the Supplementary Information S3.). Besides $J_f(\ell_f)$, in Fig.4. we also plotted the expected similarity between the aggregated sets of journals and a random set of journals of the same size. Since the roots of the hierarchies are different, the curves start from zero at $\ell_f = 1$, and naturally, as we reach the maximal level depth, the similarity approaches one, since all journals are included in the final aggregate. However, at the top levels below the root, a prominent increase can be observed in $J_f(\ell_f)$, while the similarity between random sets of journals increases only very slowly in this region. Thus, the flow hierarchy and the nested hierarchy revealed by our methods show a significant similarity also from the quantitative point of view.
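The level-wise comparison can be sketched as follows, under the assumption that each hierarchy is supplied as a list of levels ordered from the root downwards, and that the similarity at a given depth is the maximum of the Jaccard coefficient over all depths of the other hierarchy. The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)

def cumulative_sets(levels):
    """Aggregate the levels from the root down: entry k holds all journals on the top k+1 levels."""
    aggregated, seen = [], set()
    for level in levels:
        seen = seen | set(level)
        aggregated.append(seen)
    return aggregated

def level_similarity(levels_a, levels_b):
    """For each depth in hierarchy A, the maximal Jaccard similarity over all depths in B."""
    sets_a = cumulative_sets(levels_a)
    sets_b = cumulative_sets(levels_b)
    return [max(jaccard(sa, sb) for sb in sets_b) for sa in sets_a]

# Toy hierarchies (levels listed from the root downwards)
flow_levels = [["S"], ["N"], ["P", "L"]]
nested_levels = [["N"], ["S", "P"], ["L"]]
print(level_similarity(flow_levels, nested_levels))   # similarity per flow level
print(level_similarity(nested_levels, flow_levels))   # similarity per nested level
```

Calling the function with the arguments swapped gives the second similarity function, so the same routine covers both hierarchies.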
This is also supported by the remarkably small τ = 0.16 generalised Kendall-tau distance obtained by treating the two hierarchies as partial orders, and applying a natural extension of the standard distance measure between total orders. The definition of the distance measure and the details of the calculation are given in the Supplementary Information S4.

Discussion
Providing an objective ranking of scientific journals and mapping them into different knowledge domains are very complex problems of high relevance, with a number of different approaches (and criticisms) [51,52,53,54,55,56,57,58,59,60]. In this light, our results offer an informative overview of the intricate relations between journals, where instead of, e.g., simply ranking them according to a one dimensional parameter, we organise them into multiple hierarchies. On the one hand, the flow hierarchy based on the m-reaching centrality in the citation network can help in pinpointing the most influential information spreaders in the network. In this respect Science turns out to be the root, followed by Nature and PNAS, and the top dozen levels of the hierarchy are dominated by multidisciplinary, biological, biochemical and medical journals. On the other hand, the nested hierarchy obtained by generalising a recent tag hierarchy extraction algorithm organises the journals into branches according to the major scientific fields, with a clear separation between unrelated fields, and relatively strong mixing and overlap between closely related fields. In this case Nature is at the top, with Science, PNAS, the Astrophysics Journal and New Scientist forming the 2nd level, and the field dependent branches starting at the 3rd level. The organisation of these branches roughly highlights the local hierarchy of the given field, with usually the most prominent journal in the field serving as the local root, and more specialised journals positioned at the bottom.
The comparison between the two types of hierarchy reveals a strong similarity accompanied by significant differences. Basically Science, Nature and PNAS provide the top 3 journals in both cases, and also, the top few hundred nodes in the hierarchy have a far larger overlap than expected at random. However, a closer level by level inspection showed that a very high position in e.g., the flow hierarchy does not guarantee a similarly outstanding ranking in the nested hierarchy, and vice versa. Thus, the two hierarchies together are providing a compound view of the ranking of scientific journals that cannot be accessed by simply taking either only the flow, or only the nested hierarchy approach.

Methods
The nested hierarchy extraction algorithm
Our algorithm corresponds to a generalised version of "Algorithm B" presented in [29]. The main differences are that here we force the algorithm to produce a directed acyclic graph consisting of a single connected component, and we allow the presence of multiple direct ancestors. In contrast, in its original form "Algorithm B" can provide disconnected components, and each component in the output corresponds to a directed tree. A further technical improvement we introduce concerns the calculation of the node centralities. Thus, the outline of the method used here is the following: first we carry out "Algorithm B" given in [29] with modified centrality evaluation, obtaining a directed tree between the journals. This is followed by a second iteration where we "enrich" the hierarchy by occasionally assigning further direct ancestors to the nodes.
Since "Algorithm B" is presented in full detail in [29], here we provide only a brief overview. The input of the algorithm is a weighted directed network between the journals based on the z-score for the citation links. After throwing away unimportant connections by using a weight threshold, the node centralities are evaluated in the remaining network. Here we used a centrality based on random walks on the citation network between journals with occasional teleportation steps, in a similar fashion to PageRank. We adopted the method proposed in [61], calculating the dominant right eigenvector of the matrix $M_{ij} = (1 - \alpha) w_{ij} + \alpha s_i^{\mathrm{in}}$, where $w_{ij}$ is the link weight (z-score), $s_i^{\mathrm{in}}$ denotes the in-strength of journal i (in number of citations), and $\alpha$ corresponds to the teleportation probability. We have chosen the widely used $\alpha = 0.15$ parameter value; however, the ordering of the journals according to the centralities was quite robust with respect to changes in $\alpha$.
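A minimal sketch of the centrality evaluation, assuming that the dominant right eigenvector is obtained by plain power iteration and that the vector is rescaled to unit sum after every step. Neither the numerical scheme nor any normalisation of the weights and in-strengths is specified above, so these are illustrative choices; the weights are assumed to be non-negative after thresholding, and the function name is hypothetical.

```python
def dominant_right_eigenvector(w, s_in, alpha=0.15, max_iter=200, tol=1e-12):
    """Power iteration for the matrix M[i][j] = (1 - alpha) * w[i][j] + alpha * s_in[i].

    Assumptions: `w` is a square list of lists with non-negative link weights,
    `s_in` holds positive in-strengths, and the result is scaled to sum to one.
    """
    n = len(w)
    matrix = [[(1 - alpha) * w[i][j] + alpha * s_in[i] for j in range(n)]
              for i in range(n)]
    v = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        new = [x / total for x in new]          # keep the vector normalised
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v

# Toy example with three journals
w = [[0, 2, 1],
     [1, 0, 0],
     [0, 1, 0]]
s_in = [3, 1, 1]
print(dominant_right_eigenvector(w, s_in))
```

Only the ordering of the resulting centralities enters the hierarchy construction, so the overall scale of the eigenvector is irrelevant.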
Based on the centralities, a directed tree representing the backbone of the hierarchy is built from the bottom up as described in "Algorithm B" in [29]. In case we cannot find a suitable "parent" for node i according to the original rules, here we choose the node with the highest accumulated z-score from all journals having a higher centrality than i (where the accumulation runs over the already found descendants of the given node). This ensures the emergence of a single connected component, since a single direct ancestor is assigned to every node (except for the root of the tree). This is followed by a final iteration over the nodes where we examine whether further "parents" have to be assigned or not. The criteria for accepting a node as the 2nd, 3rd, etc. direct ancestor of journal i are that it must have a higher centrality compared to i, and also that the z-score has to be larger than the z-score between i and its first direct ancestor. (Note that the first parent is chosen based on aggregated z-score instead of the simple pairwise z-score, as explained in [29]).

Additional information
Supplementary information accompanies this paper. The authors declare no competing financial interest.
Supplementary Information
S1 Setting the parameter m in the flow hierarchy analysis
The maximal allowed path length in the calculation of the m-reaching centrality, denoted by m, is an important parameter of our approach for analysing the flow hierarchy between scientific journals. Since the citation network is random and rather dense, the small world effect takes place: the average distance is low between pairs of papers that can be reached from one to the other following the citations. Thus, the number of reachable articles from a given paper or a given journal saturates rather fast as a function of m. This effect is shown in Fig.S1. The two main reasons for the saturation effect are the exponential increase of the number of reachable papers as a function of m, and the finite system size. Since the saturation effect takes place at different m values for the different journals, in order to provide a fair comparison between their influence based on the "information spreading ability", we should take an m value below the saturation of all journals, that is, m < 4. When m ≥ 4, the already saturated journals have a disadvantage, as their m-reach is already starting to be affected by the finite system size, while the not yet saturated journals do not suffer from this problem.
Keeping m smaller than 4 is also consistent with the general intuition about the spread of information on the citation network: a direct citation usually corresponds to a strong interrelation between the two papers, which are likely to be focused on the same field. However, as we increase the distance between the papers in the citation network, the relatedness between them usually drops, e.g., a pair of papers 4 citation steps away from each other can very easily belong to completely different fields.
Based on the above, we have chosen to set m = 3 in the flow hierarchy analysis outlined in the main paper. According to Fig.S1., on the one hand this way we avoid the saturation effect present at m ≥ 4 values. On the other hand, we also allow multiple steps in the information spread over the system, with a limited path length where we can still assume at least a weak relatedness between the papers at the opposite ends of the citation path. A further advantage of this choice is that the variance of the $C_m(J)$ values is significantly larger at m = 3 compared to, e.g., m = 5; thus, a ranking of the journals based on $C_m(J)$ is much more robust at m = 3.

S2 Aggregated citation network
An alternative option for analysing the hierarchical relations of scientific journals based on publication data is to first construct a citation network between journals instead of individual papers, and in the next step apply the hierarchy related methods on the level of this aggregated network. The weights of the directed links between the journals in this framework correspond to the accumulated number of papers appearing in the "target" journal citing at least one paper appearing in the "source" journal. The advantage of this approach is that journals are represented by single nodes in the obtained network instead of groups of nodes as in the case of the citation network between papers. However, a considerable drawback is that scientific citation networks have a memory [32]: e.g., a paper citing mostly biological articles is likely to be cited mainly by biological papers as well.
When calculating, e.g., the reaching centrality of a journal in the aggregated network we neglect this memory effect, and thus the result can show large deviations compared to the value obtained in the original citation network between papers. This effect is also very closely related to the distortions that can be caused by time aggregation in temporal networks pointed out in Ref. [33]. Nevertheless, it is still worth analysing the hierarchical properties of the aggregated citation network between journals for comparison with the results shown in Fig.2. in the main paper, bearing in mind that the reaching centralities obtained here are somewhat distorted.
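The link weights of the aggregated network defined above can be computed along the following lines. The input format (a list of (citing, cited) pairs and a paper-to-journal map) and the exclusion of citations within the same journal are assumptions made for the illustration, and the function name is hypothetical.

```python
from collections import defaultdict

def aggregate_to_journals(citations, journal_of):
    """Weight of the link source -> target: the number of papers in the target
    journal citing at least one paper of the source journal."""
    citing_papers = defaultdict(set)
    for citing, cited in citations:
        source, target = journal_of[cited], journal_of[citing]
        if source == target:
            continue                 # assumption: self-links are left out
        # A set ensures that each citing paper is counted only once per journal pair.
        citing_papers[(source, target)].add(citing)
    return {pair: len(papers) for pair, papers in citing_papers.items()}

# Toy example
citations = [("p2", "p1"), ("p2", "p3"), ("p4", "p1"), ("p5", "p2")]
journal_of = {"p1": "A", "p3": "A", "p2": "B", "p4": "B", "p5": "A"}
print(aggregate_to_journals(citations, journal_of))
```

In the toy example paper p2 cites two papers of journal A but contributes only once to the weight of the link from A to B, which reflects the "at least one paper" rule.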
In order to concentrate only on the highly significant connections between the journals, we applied a weight threshold, $w^*$, taking into account only the links with a weight $w > w^*$. The weights of the links are distributed according to a power-law, so no plausible threshold can be inferred by simply studying their distribution. Therefore, the final threshold was chosen so that the extent of hierarchy in the resulting network is maximal. A natural measure for the hierarchy is given by the Global Reaching Centrality (GRC) [22], reflecting the inhomogeneity of the reach of the individual nodes. The mathematical definition of the GRC is given by
$$\mathrm{GRC}_m = \frac{\sum_{i} \left[ \max_j C_R^{(m)}(j) - C_R^{(m)}(i) \right]}{N - 1}, \qquad (S1)$$
where $\max_j C_R^{(m)}(j)$ is the largest centrality in the network and N is the number of nodes.
Based on the above, we rejected links with a weight lower than $w^* = K_w \langle w \rangle$, where $\langle w \rangle$ denotes the average weight, allowing for different values of $K_w$. Afterwards, the centralisation of the m-reach centrality was calculated according to (S1). In Fig.S2. we show the obtained $\mathrm{GRC}_m$ for m = 3 as a function of $K_w$, with a clear global maximum at $K_w = 10$. Thus, we applied this value in the investigation of the flow hierarchy at the level of the aggregated network between journals.
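The threshold selection described above can be sketched as a scan over candidate values of the multiplier, evaluating the GRC at each step. The sketch assumes that the aggregated network is given as a dictionary of link weights keyed by (source, target) pairs, that nodes left without any link are dropped, and that the local reach is normalised by N − 1. The function names are hypothetical.

```python
from collections import deque

def filter_links(weights, k_w):
    """Keep only the links whose weight exceeds k_w times the average link weight."""
    mean_w = sum(weights.values()) / len(weights)
    return {link: w for link, w in weights.items() if w > k_w * mean_w}

def reach_count(adj, source, m):
    """Number of nodes reachable from `source` in at most m directed steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == m:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return len(dist) - 1

def grc_m(links, m):
    """Global reaching centrality of the network spanned by `links` (at least two nodes)."""
    adj = {}
    for src, dst in links:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])      # nodes without out-links still count
    n = len(adj)
    reach = [reach_count(adj, node, m) / (n - 1) for node in adj]
    top = max(reach)
    return sum(top - c for c in reach) / (n - 1)

# Toy example: scan a few threshold multipliers
weights = {("hub", "a"): 5, ("hub", "b"): 5, ("hub", "c"): 5, ("a", "b"): 1}
for k_w in (0, 1):
    print(k_w, grc_m(filter_links(weights, k_w), m=3))
```

In practice the scan would run over a finer grid of multipliers, and the value giving the largest GRC would be selected.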
In Fig.S3. we show the top journals according to the reaching centrality within m = 3 steps. Similarly to Fig.2. in the main paper, the hierarchy levels are obtained by aggregating the journals into subsets with a standard deviation of $C_m$ smaller than $0.013 \cdot \sigma(C_m)$, where $\sigma(C_m)$ denotes the standard deviation of $C_m$ over all journals. According to Fig.S3., Science is the most influential journal according to the flow hierarchy, followed by Nature, with PNAS and Physical Review Letters forming the 3rd level. Interestingly, physical journals dominate the next few levels, with Physical Review A on the 4th level, Journal of Applied Physics on the 5th level and Physical Review B and Physical Review E providing the 6th level, followed by Applied Physics Letters on the 7th level.
Figure S2: The m-reaching centrality in networks obtained by different edge-weight cutoffs. After the filtering of the edges, only those with a weight larger than $K_w \langle w \rangle$ are kept, together with the corresponding nodes. The inset shows the number of nodes and edges as a function of the cutoff threshold.
This tendency is rather different from the results obtained from the citation network between individual papers (Fig.2. in the main paper), where medical, biological and biochemical journals occupied the top of the hierarchy. A plausible explanation is that when collapsing all the papers appearing in a given journal into a single node, we lose the information about the number of publications appearing in the journal. Since medical, biological and biochemical papers tend to cite mainly within these three fields, the reach of related journals is strongly reduced when switching from the network on the level of publications to the network between journals: the very high publication rate of these journals provides a high reach in the original network between papers, while the collapse of the vast number of papers appearing in these journals onto a single node in the aggregated network cancels out this effect. In contrast, papers appearing in physical journals have a somewhat larger likelihood of citing publications from other fields; thus, the aggregation of the papers into nodes representing journals does not have such a drastic effect on the reach.

S3 Jaccard similarity
In the main paper we compare the flow and the nested hierarchies based on the Jaccard similarity between the sets of aggregated journals from the root down to a certain level $\ell$. However, since the number of levels differs between the flow and the nested hierarchy, we need to introduce a separate similarity measure for each hierarchy, as given in Eqs. (2) and (3) in the main paper. In Fig. 4 in the main paper we displayed $J_f(\ell_f)$ for the flow hierarchy; here, in Fig. S4, we show the corresponding $J_n(\ell_n)$ for the nested hierarchy. Its behaviour is very similar to that of $J_f(\ell_f)$.
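The building block of both measures, the Jaccard similarity between cumulative journal sets, can be sketched as follows. The names `cumulative_set` and `jaccard`, and the representation of a hierarchy as a list of levels ordered from the root downwards, are assumptions for illustration. The way a level of one hierarchy is paired with a level of the other is defined by Eqs. (2) and (3) of the main paper and is not reproduced here, so the sketch simply takes the two depths as input.

```python
def cumulative_set(levels, depth):
    """Union of all journals from the root (first level) down to `depth` levels."""
    return set().union(*levels[:depth])

def jaccard(a, b):
    """Jaccard similarity |A intersect B| / |A union B| of two sets."""
    union = a | b
    # Two empty sets are treated as identical (a convention of this sketch).
    return len(a & b) / len(union) if union else 1.0
```

For toy hierarchies `flow = [["A"], ["B", "C"]]` and `nested = [["A", "B"], ["D"]]`, the call `jaccard(cumulative_set(flow, 2), cumulative_set(nested, 1))` compares {A, B, C} with {A, B} and gives 2/3.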

S4 Comparing the hierarchies by the Kendall-tau distance
The flow and the nested hierarchies can also be compared using partial order distance measures. The basic idea is to first map the given hierarchy onto a partial order, given by a domain of candidates $C$ and a relation $\kappa$ obeying the following conditions:
• $\kappa$ is irreflexive, i.e., $\forall x \in C:\ x \not\prec_\kappa x$,
• $\kappa$ is asymmetric, i.e., $x \prec_\kappa y \Rightarrow y \not\prec_\kappa x$,
• $\kappa$ is transitive, i.e., $x \prec_\kappa y \wedge y \prec_\kappa z \Rightarrow x \prec_\kappa z$.
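As an illustration of these three conditions, the following minimal sketch checks whether a relation is a valid strict partial order. Representing the relation as a set of ordered pairs `(x, y)`, meaning that x is ranked before y, and the function name `is_strict_partial_order` are assumptions of the sketch.

```python
from itertools import product

def is_strict_partial_order(candidates, precedes):
    """Check irreflexivity, asymmetry and transitivity of a relation.

    candidates: iterable of items.
    precedes:   set of ordered pairs (x, y), read as "x is ranked before y".
    """
    irreflexive = all((x, x) not in precedes for x in candidates)
    asymmetric = all((y, x) not in precedes for (x, y) in precedes)
    transitive = all(
        (x, z) in precedes
        for (x, y), (w, z) in product(precedes, repeat=2)
        if y == w
    )
    return irreflexive and asymmetric and transitive
```

For example, the relation {(a, b), (b, c), (a, c)} passes the check, while dropping the pair (a, c) violates transitivity.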
The intuitive interpretation of the relation $x \prec_\kappa y$ is that x is ranked before y, or x is preferred over y. A pair of candidates is unrelated (incomparable) if $(x \not\prec_\kappa y) \wedge (y \not\prec_\kappa x)$. Naturally, the journals correspond to the candidates, and a given journal is ranked before all of its descendants in the hierarchy. However, in the case of the flow hierarchy only the levels of the hierarchy are given; the ancestor-descendant relations between the journals are not specified. Thus, the flow hierarchy actually corresponds to a bucket order, where the "buckets" are given by the hierarchy levels, and we assume that a journal in a given bucket precedes all journals in lower buckets, while journals in the same bucket are all equal to each other.
The Kendall-tau distance measure was originally defined for total orders, where all pairs of candidates are comparable. In this case the distance corresponds to the number of inversions needed to convert one total order into the other. It can be normalised by dividing by the total number of pairs, resulting in a value between 0 and 1. In contrast to similarity measures, the Kendall-tau distance is 0 for an identical pair of total orders, while it is 1 for maximally different total orders. Here we adapt this concept to the problem of comparing a bucket order (the flow hierarchy) and a partial order (the nested hierarchy).
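For reference, the original total-order version of the normalised Kendall-tau distance can be sketched as below. The function name `kendall_tau_distance` and the representation of a total order as a list from first to last are assumptions for illustration; both lists are assumed to contain the same candidates.

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Normalised Kendall-tau distance between two total orders.

    Counts the pairs of candidates that appear in opposite relative order
    in the two lists and divides by the total number of pairs.
    """
    pos_a = {x: i for i, x in enumerate(order_a)}
    pos_b = {x: i for i, x in enumerate(order_b)}
    pairs = list(combinations(order_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)
```

Identical orders give 0, a fully reversed order gives 1, and swapping a single adjacent pair among three candidates gives 1/3.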
The basic idea is to iterate over all possible pairs of journals and compare their ordering in the bucket order and in the partial order. Whenever we observe a mismatch between the two orderings, we increase the distance score D by one. The detailed rules for updating D are the following:
• D is left unchanged if
  - x ≺ y according to both the bucket order and the partial order,
  - x and y are unrelated according to the partial order (i.e., they are on different branches).
• D is increased by one in all other cases, that is,
  - if x ≺ y according to the bucket order and y ≺ x according to the partial order,
  - if x ≡ y according to the bucket order, while x ≺ y or y ≺ x according to the partial order.
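The update rules above can be sketched as follows. The data representation is an assumption of the sketch: `bucket` maps each journal to its level index in the flow hierarchy (a smaller index means closer to the top), and `precedes` is a transitively closed set of ordered pairs `(x, y)` stating that x is an ancestor of y in the nested hierarchy. The function name `bucket_partial_distance` is likewise hypothetical. The result is divided by the number of comparable pairs in the partial order, which is the maximal possible number of mismatches.

```python
from itertools import combinations

def bucket_partial_distance(bucket, precedes):
    """Normalised mismatch count between a bucket order and a partial order.

    bucket:   dict journal -> level index (smaller index = higher level).
    precedes: set of pairs (x, y), x ranked before y in the partial order
              (assumed to be transitively closed).
    """
    mismatches = comparable = 0
    for x, y in combinations(bucket, 2):
        if (y, x) in precedes:
            x, y = y, x              # relabel so that x precedes y
        elif (x, y) not in precedes:
            continue                 # unrelated pair: D is left unchanged
        comparable += 1
        # Mismatch if the bucket order ties the pair or reverses it.
        if bucket[x] >= bucket[y]:
            mismatches += 1
    return mismatches / comparable if comparable else 0.0
```

For a toy nested hierarchy with root a, children b and c, and d as a child of b, the bucket assignment `{"a": 0, "b": 1, "c": 0, "d": 2}` produces a single mismatch (a and c are tied in the bucket order) out of four comparable pairs, i.e. a distance of 0.25.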
In order to normalise D, we divide the obtained result by the maximal number of possible mismatches, which is given by the total number of comparable pairs in the partial order. (That is, if a given pair is unrelated in the partial order, D is left unchanged irrespective of its ordering in the bucket order.) By applying the above comparison method, we obtain a normalised Kendall-tau distance of D = 0.1594 between the flow hierarchy and the nested hierarchy. For comparison, we also calculated the mean distance between randomised hierarchies. The randomisation was carried out by simply swapping pairs of journals in a given hierarchy at random, while keeping the structure of the hierarchy (number of levels, number of nodes in a level, etc.) fixed. The resulting average distance and standard deviation were D_rand = 0.8021 ± 0.0169. Thus, the two examined hierarchies are significantly closer to each other than expected at random; the z-score of the distance is −38.03626.
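The randomisation and the z-score can be illustrated with a short sketch for the bucket-order case. The names `swap_randomise` and `z_score`, the number of swaps, and the use of the sample standard deviation are assumptions for illustration; the text does not specify these details.

```python
import random
import statistics

def swap_randomise(bucket, n_swaps, rng):
    """Swap the positions of randomly chosen journal pairs.

    bucket: dict journal -> level index. The number of journals on each
    level is preserved, since only the labels are exchanged.
    """
    shuffled = dict(bucket)
    names = list(shuffled)
    for _ in range(n_swaps):
        x, y = rng.sample(names, 2)
        shuffled[x], shuffled[y] = shuffled[y], shuffled[x]
    return shuffled

def z_score(observed, random_values):
    """Distance of the observed value from the random mean, in units of
    the sample standard deviation of the random values."""
    mean = statistics.mean(random_values)
    std = statistics.stdev(random_values)
    return (observed - mean) / std
```

With the rounded values quoted above, (0.1594 − 0.8021) / 0.0169 ≈ −38.03, in line with the reported z-score.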