Introduction

The h-index, proposed by Hirsch1 for evaluating the academic impact of individual researchers, has received wide attention in recent years. The citations received by all papers of a given researcher can be characterized by a citation distribution function, where the y-axis corresponds to the citations received by a paper and the x-axis represents the paper rank, arranged in descending order of citations (Fig. 1). The distribution of citations versus paper rank is called the citation distribution function or curve, denoted by $c(r)$, in which the $r$-th paper receives $c(r)$ citations. Following Hirsch1, the h-index is the largest rank $h$ for which $c(h) \geq h$. The area under the citation distribution curve is divided by the h-index into two parts: the h-core2 and the h-tail3. The former is further divided into two parts: excess citations4 and h-squared citations1. As a consequence, the total citations are divided into three parts: h-squared, excess and h-tail citations (Fig. 1). The h-index thus carries no information about the excess and h-tail citations; it keeps only the citations related to the h-index itself ($h^2$). In principle, the h-index properly reflects the academic performance of the scientist under study only when $h^2$ is dominant among the three parts; otherwise, it leads to a biased evaluation. Whether the h-index dominates the citations depends on the shape of the citation distribution function.

Figure 1

The citation distribution curve.

The y-axis corresponds to the citations received by a paper, whereas the x-axis represents the paper rank arranged in descending order of citations. The area under the citation distribution curve is divided by the h-index into three parts: $h^2$ (h-squared), $e^2$ (excess) and $t^2$ (h-tail).

As pointed out by Bornmann and co-workers5, the citation distribution functions of an isohindex group (scientists having the same h-index) may display quite different shapes. Therefore, to apply the h-index fairly, it is necessary to study the shape of the citation distribution function, and the current study addresses this question by using a triangle mapping technique. One advantage of the citation triangle method is that different shapes of citation distribution functions can be compared intuitively: by viewing the distribution of mapping points within the triangle, the shapes can be studied in a perceivable manner. Based on this method, we are able to study the degree to which the h-index is properly applicable. It is hoped that the technique presented here will be useful for using the h-index to evaluate academic performance in a less biased way.

Results

Here we propose a novel triangle mapping technique to study the relations among h-squared, excess and h-tail citations. For a regular triangle, the sum of the distances from any interior point to the three sides is a constant, equal to the height of the triangle. Note that the sum of the percentages for $h^2$, $e^2$ and $t^2$ is also a constant, equal to 1. Based on this property, the percentages for these three kinds of citations are mapped onto a point in a regular triangle of height 1 (Fig. 2A). See the Methods section for details.
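The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `citation_parts`, the `CitationParts` container and the sample citation list are hypothetical, and the h-index is computed with the standard definition (the largest rank whose paper has at least that many citations).

```python
from typing import NamedTuple, Sequence


class CitationParts(NamedTuple):
    """Hypothetical container for the three-way split of total citations."""
    h: int     # h-index
    h2: int    # h-squared citations
    e2: int    # excess citations in the h-core
    t2: int    # h-tail citations
    H: float   # h2 / total
    E: float   # e2 / total
    T: float   # t2 / total


def citation_parts(citations: Sequence[int]) -> CitationParts:
    """Split a researcher's citations into h-squared, excess and h-tail parts."""
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("at least one citation is needed to form fractions")
    # The condition c >= rank holds exactly for the first h ranks,
    # because citations decrease while the rank increases.
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    core = sum(ranked[:h])      # all citations of the h-core papers
    h2 = h * h
    e2 = core - h2              # excess citations beyond h per core paper
    t2 = total - core           # citations of the papers outside the h-core
    return CitationParts(h, h2, e2, t2, h2 / total, e2 / total, t2 / total)


if __name__ == "__main__":
    # Made-up citation counts, for illustration only.
    print(citation_parts([10, 8, 5, 4, 3, 0]))
```

For the made-up list `[10, 8, 5, 4, 3, 0]` the split is h = 4, $h^2$ = 16, $e^2$ = 11 and $t^2$ = 3, and the three fractions sum to 1, which is the property the triangle mapping relies on.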

Figure 2

The citation triangle method in studying h-index based citations.

(A) A regular triangle ABC with height equal to 1 and center at O. A Cartesian coordinate system is set up with its origin at O. The three sides of the triangle are denoted by h, e and t, and the distances from an interior point $P(x, y)$ to them are equal to H, E and T, respectively. Therefore, the point $P(x, y)$ is the mapping point for the three real numbers H, E and T. (B) The regular triangle is divided into nine smaller regular triangles. The intervals of H, E and T for each of the nine smaller triangles are shown in Table 1.

First, let us consider two concrete examples. According to the citation information provided by Dodson6, , (), () and we find , and . Therefore, the mapping point corresponding to Dodson lies in region No. 4, where the h-index is applicable (Fig. 2B). The second example is the chemist Berni Alder, where (), () and 4 and we find , and . Therefore, the mapping point corresponding to Alder lies in region No. 5, where the e-index is absolutely dominant (Fig. 2B). This example shows that Alder's h-index severely underestimates his academic impact; in this case, the e-index should be used together with the h-index for a fair evaluation4.

In what follows, we apply the citation triangle method to the citations of the 100 most prolific economists7. The data used to derive the corresponding h-index, e-index and t-index were kindly provided by Dr. Tol. From these data, we calculated the coordinates $x$ and $y$ of the mapping point corresponding to each economist. The distribution of the 100 points is shown in Fig. 3A. Only two points lie in region No. 3, i.e., an h-index dominant region. Moreover, only 11 points (11%) lie above the horizontal line $y = 0$, or $H = 1/3$, where the h-index is properly applicable. Accordingly, for the remaining cases (89%), where $y < 0$ or $H < 1/3$, the h-index should be used jointly with the e-index, and even the t-index. The average h-index and e-index over the 100 points are 19 and 28.14, respectively, corresponding to average H and E of 0.26 and 0.48 (). Overall, for a fair and accurate evaluation, the h-index should be used together with the e-index, and even the t-index, for most of the 100 most prolific economists.

Figure 3

Distributions of mapping points in the citation triangle.

(A) The distribution of the 100 mapping points, one for each of the 100 most prolific economists. Note that only 11 points (11%) lie in the regions where the h-index is applicable ($H > 1/3$), indicating that the h-index should be used jointly with the e-index, and even the t-index, for the remaining 89 economists. (B) An example demonstrating that the power parameter is one of the key factors determining the position of the mapping point. Given and starting from the region No. 3 (the h-index dominated region) with , the mapping point moves to the region No. 6 (the e-index dominated region) with . Interestingly, the track of the mapping points forms a clockwise rotating curve.

The h-index captures the information in the citation function only partially. The distribution of the 100 mapping points within the triangle, however, provides more information about the shapes of the corresponding citation functions. For example, mapping points within the small triangle No. 5 indicate citation distribution functions that are peaked at the beginning. In contrast, mapping points within the small triangle No. 9 indicate citation distribution functions that are flat with a long tail. In both cases, the h-index does not seem appropriate for capturing the main information of the citation function. To complement the h-index, Bornmann and co-workers5 introduced three parameters, the upper, center and lower, which correspond to E, H and T, respectively, in this paper. In other words, the triangle mapping technique provides an intuitive representation of the upper, center and lower. Bornmann and co-workers5 studied the shapes of the citation distribution functions of three scientists, A, B and C, belonging to an isohindex group with h = 14. For scientist A, E = 0.82, H = 0.15 and T = 0.03, corresponding to , . The mapping point lies in the small triangle No. 5, a region where the e-index is absolutely dominant. According to Bornmann et al.5 and Cole and Cole8, scientist A is called a perfectionist-type scientist, who has rather few but very highly cited publications. For scientist B, E = 0.39, H = 0.48 and T = 0.13, corresponding to , . The mapping point lies in the small triangle No. 2, a boundary region between the h-index and e-index dominated regions. According to refs 5,8, scientist B is called a prolific-type scientist, who publishes a large number of high-impact papers. For scientist C, E = 0.10, H = 0.33 and T = 0.57, corresponding to , . The mapping point lies in the small triangle No. 8, a t-index dominated region. Scientist C is called a mass producer5,8, who publishes a large number of papers that receive few citations. The above analysis shows that the locations of the mapping points carry information about the types of scientists. Therefore, the triangle mapping technique is particularly useful when the academic impact of a large number of scientists is studied. In that case, clustering analysis can be performed on the mapping point locations, and scientists can therefore be classified according to their academic performance.

Recently, Baum introduced a new parameter, called Excess-Tail Ratio9, denoted by, where . Baum found that for most cases he studied, , even Only for few cases, . The shapes of citation distribution functions for are peaked, whereas for the shapes of the citation functions are flat with a long tail. Therefore, the Excess-Tail ratio is an appropriate parameter to capture the overall shapes of the citation functions. According to eq. (12), or corresponds to , or , respectively.

Discussion

In what follows, we want to explore the key factors that determine the shape of the citation distribution function. As previously, we assume a simple mathematical model for the citation distribution curve

The total citations received by N papers, , is

Based on eq. (1), it was shown that4,10

However, we should have which leads to

Meanwhile, we have4

Using eqs. (1)–(5), we find

Therefore, the condition under which the h-index can be dominant should satisfy , or

To have an intuitive picture, we consider some numerical examples as follows. Taking , and letting respectively, we calculate the values of H and E for each case. Using eq. (12), we find 12 mapping points in the triangle, as shown in Fig. 3B. It is interesting to see that with the increase of the value, the track of the mapping points forms a clockwise rotating curve. This example shows that the power parameter is one of the key factors to determine the shape of the citation function. Given and there is a threshold of , when is less than this threshold, the h-index can no longer be properly applicable. In fact, , .

The main contribution of this paper is the citation triangle method, by which the shapes of citation distribution functions can be studied in a perceivable form. Based on the distribution of mapping points, the applicability and limitations of the h-index can be studied. Generally, the h-index is not properly applicable in the e-index or t-index dominated regions. In those cases, the h-index should be applied jointly with the e-index or t-index. The proposed mapping technique also provides a platform for studying the academic impact of a group of scientists: mathematical methods such as clustering analysis can be applied to the distribution of mapping points, so that the academic impact of these scientists can be classified and compared.

Methods

The h-index was proposed by Hirsch1 in 2005. The set of h papers of a scientist is called the h-core2, in which each of the h papers has received at least h citations. The e-index, proposed by Zhang4, is defined as the square root of the excess citations in the h-core beyond the $h^2$ citations used for calculating the h-index. Therefore, the total citations received by the papers in the h-core are equal to $h^2 + e^2$. The h-index divides the total citations of a scientist into two parts: those of the h-core and those of the h-tail3. For convenience, we define the square root of the citations received by all papers in the h-tail as the t-index. Therefore, the total number of citations received by all papers of a scientist, denoted here by $C$, is composed of three parts, $h^2$, $e^2$ and $t^2$, i.e.,

$$C = h^2 + e^2 + t^2, \tag{9}$$

where h, e and t are the h-, e- and t-index, respectively. Letting

$$H = \frac{h^2}{C}, \qquad E = \frac{e^2}{C}, \qquad T = \frac{t^2}{C}, \tag{10}$$

we have

$$H + E + T = 1. \tag{11}$$

For any regular triangle, the sum of the distances from any interior point to the three sides is equal to the height of the triangle. Consider a regular triangle ABC with height equal to 1 (Fig. 2A). Let the center of the triangle be denoted by O, and set up an x − y coordinate system with its origin at O, as shown in Fig. 2A. Based on eq. (11) and this property of the regular triangle, the three real numbers H, E and T are mapped onto a point P(x, y) within the triangle, as shown in Fig. 2A. Simple calculation shows that

The triangle can be divided into 9 smaller regular triangles (regions), as shown in Fig. 2B, which we denote by No. 1 through No. 9. Each region is characterized by a specific interval for each of the three real numbers H, E and T. For example, in region No. 1, $H > 2/3$, $E < 1/3$ and $T < 1/3$, indicating that $h^2$ is absolutely dominant in this region as compared with $e^2$ and $t^2$. Similarly, in region No. 5, $E > 2/3$, $H < 1/3$ and $T < 1/3$, indicating that $e^2$ is absolutely dominant as compared with $h^2$ and $t^2$. In region No. 9, $T > 2/3$, $H < 1/3$ and $E < 1/3$, indicating that $t^2$ is absolutely dominant as compared with $h^2$ and $e^2$. Furthermore, in region No. 3, $1/3 < H < 2/3$, $E < 1/3$ and $T < 1/3$, so it is called an h-index dominant region; in region No. 6, $1/3 < E < 2/3$, $H < 1/3$ and $T < 1/3$, so it is called an e-index dominant region; and in region No. 8, $1/3 < T < 2/3$, $H < 1/3$ and $E < 1/3$, so it is called a t-index dominant region. Finally, region No. 2 is the boundary region between the h-index and e-index dominant regions, region No. 4 is the boundary region between the h-index and t-index dominant regions, and region No. 7 is the boundary region between the e-index and t-index dominant regions. This description reflects the symmetry of the regular triangle. The full description is summarized in Table 1.
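The mapping of (H, E, T) onto a point P(x, y) can be sketched as follows. This is an illustration under an explicit assumption about orientation, not a reproduction of the paper's formula: side h is taken as the horizontal base, side e as the left side and side t as the right side, so that E > T gives x > 0. The figure in the source may use the mirror-image convention, in which case the sign of x flips. The function names `to_point` and `to_distances` are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)


def to_point(H: float, E: float, T: float) -> tuple[float, float]:
    """Map fractions (H, E, T), with H + E + T = 1, to a point (x, y).

    Assumed layout: regular triangle of height 1 centred at the origin,
    side h as the base (y = -1/3), side e on the left, side t on the right.
    """
    if abs(H + E + T - 1.0) > 1e-9:
        raise ValueError("H, E and T must sum to 1")
    x = (E - T) / SQRT3     # assumed sign convention: E > T lies to the right
    y = H - 1.0 / 3.0       # the centre O is at distance 1/3 from every side
    return x, y


def to_distances(x: float, y: float) -> tuple[float, float, float]:
    """Inverse map: distances from (x, y) to sides h, e and t."""
    H = y + 1.0 / 3.0
    E = (2.0 / 3.0 + SQRT3 * x - y) / 2.0   # distance to the left side
    T = (2.0 / 3.0 - SQRT3 * x - y) / 2.0   # distance to the right side
    return H, E, T


if __name__ == "__main__":
    # Illustrative fractions, not taken from any data set.
    x, y = to_point(0.5, 0.3, 0.2)
    print(x, y, to_distances(x, y))
```

The round trip through `to_distances` recovers the three fractions, which is a quick way to check that the point really lies at distances H, E and T from the three sides. Under any orientation with side h as the base, the vertical coordinate is y = H − 1/3, so the condition H > 1/3 corresponds to points above the horizontal line through O.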

Table 1 Intervals of H, E and T for each of the 9 regions (small triangles) within the citation triangle

The three real numbers H, E and T are the fractions of citations associated with the h-, e- and t-index, respectively. In general, the h-index is properly applicable where H is greater than 1/3 (or $y > 0$); otherwise, if $H < 1/3$ (or $y < 0$), the h-index under-evaluates the academic impact of the researcher concerned. Therefore, the four regions No. 1, No. 2, No. 3 and No. 4 are the regions where the h-index can be properly applied ($H > 1/3$). The regions No. 2, No. 5, No. 6 and No. 7 are the regions where the e-index can be properly applied ($E > 1/3$), whereas the regions No. 4, No. 7, No. 8 and No. 9 are those where the t-index can be properly applied ($T > 1/3$). In summary, the h-index can only be properly applied in the regions No. 1, No. 2, No. 3 and No. 4 ($H > 1/3$ or $y > 0$), and it should be applied jointly with the e-index or t-index in the remaining regions No. 5 through No. 9 ($H < 1/3$ or $y < 0$).
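The region assignment can be expressed directly in terms of H, E and T, without computing coordinates. The sketch below is an assumption-laden illustration: the thresholds 1/3 and 2/3 follow from dividing the triangle into nine equal regular triangles, the region numbers follow the verbal description above, and the treatment of points lying exactly on a dividing line (returned as `None` only for the exact centre) is a choice made here, not something the source specifies. The function name `region` is hypothetical.

```python
from typing import Optional


def region(H: float, E: float, T: float) -> Optional[int]:
    """Return the region number (1-9) for fractions H, E, T summing to 1."""
    if abs(H + E + T - 1.0) > 1e-6:
        raise ValueError("H, E and T must sum to 1")
    third, two_thirds = 1.0 / 3.0, 2.0 / 3.0
    # Corner triangles: one part is absolutely dominant.
    if H > two_thirds:
        return 1
    if E > two_thirds:
        return 5
    if T > two_thirds:
        return 9
    # Remaining triangles: decided by which fractions exceed 1/3.
    above = (H > third, E > third, T > third)
    table = {
        (True, True, False): 2,    # boundary between h- and e-dominant
        (True, False, False): 3,   # h-index dominant
        (True, False, True): 4,    # boundary between h- and t-dominant
        (False, True, False): 6,   # e-index dominant
        (False, True, True): 7,    # boundary between e- and t-dominant
        (False, False, True): 8,   # t-index dominant
    }
    return table.get(above)        # None only at the exact centre H = E = T


def h_index_applicable(H: float, E: float, T: float) -> bool:
    """The h-index is taken as properly applicable in regions 1-4 (H > 1/3)."""
    return region(H, E, T) in (1, 2, 3, 4)


if __name__ == "__main__":
    # Scientists A, B and C of the isohindex group quoted in the Results.
    for name, (H, E, T) in {"A": (0.15, 0.82, 0.03),
                            "B": (0.48, 0.39, 0.13),
                            "C": (0.33, 0.10, 0.57)}.items():
        print(name, region(H, E, T), h_index_applicable(H, E, T))
```

Applied to the (H, E, T) values quoted for scientists A, B and C in the Results, this rule yields regions No. 5, No. 2 and No. 8, matching the assignments given there.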