Introduction

A fundamental feature of human society is that its individuals possess all kinds of interests, which are the driving force of many human behaviors. Some interests may last for a lifetime while others fade away in a short time, and our interests also change from time to time. In the modern society that we live in, all kinds of attractions and temptations emerge and disappear on a daily basis. Does this mean that the evolution of our interests is mostly random? Or are there intrinsic dynamical rules that govern how human interests evolve with time? Answering these questions was deemed to be extremely difficult, due to the lack of appropriate means to characterize the human mind and to measure quantitatively how it changes with time. Yet the questions are fundamental in science and any revelation of the dynamics of human interest may have significant applications in commerce, medical sciences and even defense. In particular, in commerce, adequate knowledge of customer interests and how they change with time is key to the success of many businesses, as such knowledge can be of tremendous value to advertisement design and product promotion. In psychiatry, a good understanding of patients' interests may help generate accurate diagnoses and devise effective therapeutic approaches. In defense, timely and reliable assessment of the interests of certain groups or individuals and their time evolution can help predict possible future behaviors and actions. All of these applications rely on human-interest dynamics not being completely random.

There have been efforts in modeling and understanding human behaviors that are essential to many social and economic phenomena, with significant applications in areas ranging from resource allocation and transportation control to epidemic prediction and personal recommendation1,2,3,4. The pursuit has been facilitated greatly by the advances in information technology, especially by the availability of massive Internet data and resources5. However, probing into human-interest dynamics is more challenging, due to the difficulty in characterizing human interests and the traditional lack of data sets from which the underlying dynamical processes may be deduced. In recent years “Big Data” sets, such as those from e-commerce or mobile-phone communications, have become commonly available, making it possible to quantify human interests and to infer their intrinsic dynamics. As a branch of the science of “Big Data”, the field of human-interest dynamics is in its infancy.

A viable approach to probing into human-interest dynamics is to use data analysis as a gateway to uncover various phenomena and possible scaling laws. Guided by this principle, in this paper we explore two e-commerce data sets (Douban, Taobao) and one communication data set [Mobile-Phone Reading (MPR)] and focus on three issues: the statistical distribution of the time that an interest lasts, the distribution of the return time for revisiting a particular interest, and interest ranking and transition. Considering the large number of factors that can affect human interest, such as the specific activity contents and distractions of the individual's attention, it may seem plausible that the underlying dynamics are completely random6,7,8. Indeed, a widely used assumption is that of the Markovian type of dynamics for individuals' online behaviors, in which an online user's next action depends not on his/her history of interests but on the current interest only9,10,11. However, there is recent evidence12,13 of deviations from the Markovian dynamics. Our systematic analysis of the three data sets reveals an unequivocal signature of the fat-tailed scaling behavior characteristic of non-equilibrium complex systems and, consequently, indicates the existence of intrinsic dynamical rules governing the human-interest dynamics. Based on the empirical analysis, we identify three basic ingredients underlying the dynamics: preferential return, inertial effect and exploration. A mathematical model incorporating these ingredients is then developed to account for the observed fat-tailed scaling behaviors. Our study represents the first systematic attempt to probe into the dynamics of human interest and we expect our findings and model to have broad applications.

We note that, in the study of human behaviors, heavy-tailed statistical features, e.g., those in the inter-event time distributions14,15,16,17,18,19, have been uncovered recently. Such a non-Poisson type of distribution implies, e.g., that bursts of rapidly occurring events are typically separated by long periods of inactivity. Various mechanisms have been proposed to explain the heavy-tailed inter-event statistics, such as the highest-priority-first queue model14,20, Poisson probability model21,22, varying interest23, memory effects24 and human interactions19,25,26. Non-Poisson, heavy-tailed statistics also arise in human mobility trajectories27,28,29 and mathematical models have been proposed to account for the non-Markovian type of dynamics underlying human mobility, such as those based on exploration and preferential return30, hierarchy of traffic systems31 and regular mobility32. Variances in the statistical behaviors of human mobility were also reported33,34,35. The distinct feature of our work is its focus on human-interest dynamics.

Results

We analyze three massive data sets: two from e-commerce, namely, Douban and Taobao, and one from mobile communication, i.e., MPR. We focus on the scaling of three quantities: (1) the time interval l that an individual stays within the same interest, defined as the length of a sequence of clicks within the same interest category (defined in Methods), (2) the time interval τ after which an individual returns to the same interest category, defined as the length of the sequence of clicks between two visits to the same interest and representing a kind of memory effect in the dynamics of interest, and (3) the frequencies of visit of an individual to different interests, which can be used to rank this individual's particular interests.
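As a minimal sketch of how the first two quantities can be extracted from a click sequence (this is an illustration, not the code used in this study), consider the following. The function names are hypothetical, and measuring τ as the difference of click indices between leaving an interest and re-entering it is an assumption made here for concreteness.

```python
from itertools import groupby


def dwell_lengths(clicks):
    """Return (interest, l) for every maximal run of identical interests."""
    return [(interest, sum(1 for _ in run)) for interest, run in groupby(clicks)]


def return_times(clicks):
    """Return the tau values of a click sequence.

    Assumption: tau is the click-index difference between the last click
    of one visit to an interest and the first click of the next visit.
    Consecutive clicks on the same interest are dwelling, not returning.
    """
    last_seen = {}  # interest -> index of its most recent click
    taus = []
    for idx, interest in enumerate(clicks):
        prev = last_seen.get(interest)
        if prev is not None and idx - prev > 1:
            taus.append(idx - prev)
        last_seen[interest] = idx
    return taus


# Toy click sequence of interest labels (illustrative only).
clicks = [1, 1, 2, 2, 2, 1, 3]
runs = dwell_lengths(clicks)
taus = return_times(clicks)
```

For the toy sequence, interest 1 is left after the second click and revisited at the sixth, which yields a single return interval.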

Fat-tailed distribution of interest interval l

A number of approaches have been proposed to characterize an individual's interests, such as the interest profile36, contextual information37, distinct visited subpages38 and service items39. Taking advantage of the nature of our large data sets, we use categories to characterize an individual's interests, which can be, for example, music, books and movies on Douban, clothing, footwear and toys on Taobao, love stories and science fiction on MPR and so on. Figure 1(a) shows, for a typical individual on Douban, the distribution P(l) of l visiting different interest categories, which exhibits a fat-tailed distribution: P(l) ~ l^{−α}. The long tail associated with the scaling indicates that the individual tends to spend an abnormally long time visiting certain interests during browsing. Similar scaling behaviors have been found for users on Taobao and MPR, as shown in Figs. 1(b) and 1(c), respectively. A typical sequence in which the values of l corresponding to an identical interest appear is shown in Fig. 1(d). From Fig. 1(d), we observe a highly non-uniform behavior in the values of l, which gives rise to the fat-tailed distribution in Fig. 1(a). We have examined many individuals from the three data sets and found similar behaviors. In fact, the distribution of l for all users from any particular data set exhibits a robust fat-tailed distribution (Fig. S1 in Supplementary Information). The scaling observed for all cases implies substantial deviation of the human-interest dynamics from that of the Markovian process (associated with the transition probability matrix for interests), for which the distribution of l would be exponential40.

Figure 1

Distribution of interest-dwelling time.

(a–c) Probability distributions P(l) of the time interval l of consecutive visits to the same interest for three representative individuals, each from one of the three data sets (Douban, Taobao and MPR), where the numbers of interests are 3, 24 and 44, respectively. The numbers of clicks (Na) for the three cases are 18396, 106571 and 4398, respectively. The three distributions can be fitted as P(l) ~ l^{−α}, with exponents α ≈ 1.16, 4.02 and 3.35, respectively (the values of the exponent α are estimated using the maximum-likelihood criterion63). Panel (d) shows the various values of l as they appear with time, where n is the event index (an integer variable).
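The caption states that α is estimated by maximum likelihood but does not give the estimator. As a hedged illustration, the sketch below uses a commonly used closed-form approximation for discrete power-law data, α ≈ 1 + n / Σ ln[x_i / (x_min − 1/2)]. Whether this particular variant, and which lower cutoff x_min, were used in this study is not stated, so both are assumptions, and the function name is hypothetical.

```python
import math


def powerlaw_mle(values, x_min=1):
    """Approximate maximum-likelihood exponent for P(x) ~ x^(-alpha).

    Assumption: discrete data with lower cutoff x_min, using the
    closed-form approximation alpha = 1 + n / sum(ln(x / (x_min - 0.5))).
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Toy dwell lengths l (illustrative only, not taken from the data sets).
sample = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
alpha_hat = powerlaw_mle(sample)
```

The approximation is most reliable for larger x_min, so for small cutoffs the estimate should be read as indicative only.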

Memory effect in human-interest dynamics

Memory, as one of the key attributes of human beings, has been widely studied in the past23,24,35,41,42,43,44. We observe from our data sets that an individual often tends to return to specific interests that he/she has recently visited with relatively higher probabilities than to those visited long ago. For example, even when an interest had been visited many times in the past, if the most recent visit dates back one year or longer, the probability of revisiting is lower than that associated with another interest that was visited merely a week ago. But would the probability that an interest is revisited after a very long time be exponentially small? To answer this question, we calculate the distribution of the return time44 τ, the time interval after which an individual revisits the same interest following the last visit. Typical distributions from three individuals, one from each data set, are shown in Figs. 2(a–c), which can again be well fitted by fat-tailed distributions: P(τ) ~ τ^{−β}, with the exponent β. While P(τ) is higher for small values of τ, the probability of the occurrence of very large values of τ is, surprisingly, not exponentially small, indicating that such events can indeed occur. An important implication is that both short-term and long-term memories can shape the human-interest dynamics. Similar results are obtained for many other users (Fig. S5 in Supplementary Information). Additionally, the distribution of τ for all users from any particular data set exhibits a fat-tailed distribution (Fig. S1 in Supplementary Information).

Figure 2

Memory effect of human interest dynamics.

(a–c) For the data sets in Figs. 1(a–c), respectively, fat-tailed distributions (P(τ) ~ τ^{−β}) of the time τ taken to revisit the same interest. The values of the fitted exponent β are approximately 1.58, 2.04 and 1.41 for (a–c), respectively.

Interest ranking and transition among interests

An individual can possess a number of interests, which can be ranked in terms of the respective frequencies of visit. In a given (large) time interval, an individual can focus on different interests, giving rise to a kind of “transition” among the interests. The interest ranking and transition are important not only for the study of human dynamics14,30 and decision-making45,46, but also for applications such as behavior prediction and search-algorithm design.

A convenient way to assess the interest-transition pattern for an individual is to use a network representation, where nodes denote different interests with sizes determined by their ranks, links correspond to the observed transitions among the interests and the dwelling time in any particular interest is represented by a self loop. Similar network representations have also been used in other contexts such as transportation dynamics47, citations48 and human-mobility behaviors49. Figures 3(a–c) show examples of the transition networks of one typical individual from each of the three data sets, respectively. Setting the most frequently visited interest to have rank r = 1 and the successively less frequently visited interests to have ranks r = 2, 3 and so on, we can generate a distribution of the interest rank for each individual, examples of which are shown in Figs. 3(d–f). In all cases, such a rank distribution can be approximately fitted by the following exponentially truncated fat-tailed distribution: f_r = r^{−γ} exp(−r/S), where S is the number of distinct interests that the individual has selected. Note that this truncated fat-tailed distribution is with respect to an individual. When the collective behavior of a large number of individuals is considered, the signature of the exponential truncation diminishes and the scaling of f_r can be better fitted by a fat-tailed distribution (see Fig. S1 in Supplementary Information). This is similar to the fat-tailed ranking distribution observed in the collective human-mobility patterns28,30,50 where the distribution is with respect to the actual locations that the individual visits physically.
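The ranking procedure described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and counting visits per click (rather than per uninterrupted visit) is an assumption. The second function simply evaluates the truncated form r^{−γ} exp(−r/S) quoted in the text for comparison with the empirical frequencies.

```python
import math
from collections import Counter


def rank_frequencies(clicks):
    """Return visit frequencies ordered by rank (r = 1 is most visited).

    Assumption: every click counts as one visit to its interest.
    """
    counts = Counter(clicks)
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def truncated_power_law(r, gamma, S):
    """Evaluate r^(-gamma) * exp(-r / S), without a normalization constant."""
    return r ** (-gamma) * math.exp(-r / S)


# Toy click sequence (illustrative only).
clicks = [1, 1, 2, 2, 2, 1, 3]
f_r = rank_frequencies(clicks)
S = len(f_r)  # number of distinct interests selected by this individual
fitted_shape = [truncated_power_law(r, 0.89, S) for r in range(1, S + 1)]
```

Because the truncated form is unnormalized here, only its shape across ranks, not its absolute values, should be compared with the measured frequencies.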

Figure 3

Interest-transition network and transition probabilities.

(a–c) For the three individuals represented in Figs. 1(a–c), the respective transition networks, where nodes correspond to distinct interests, a self loop represents the dwelling time in the same interest category and the weighted links characterize the interest transitions. A few highly frequently visited interests are marked. (d–f) Truncated fat-tailed rank distribution: f_r ~ r^{−γ} exp(−r/S), where the fitted values of the exponent γ and the numbers of interests are (γ, S) = (0.89, 24) (panel (e), Taobao) and (γ, S) = (1.39, 44) (panel (f), MPR). The dashed line in Fig. 3(d) is a guide to the eye. (g–i) Two-dimensional representation of the interest-transition probabilities for the three networks in (a–c), respectively. The probabilities are represented on a logarithmic scale; see side bars.

Model of human-interest dynamics

To gain insights into the development of a quantitative model describing the dynamics of human interest, we study the transition pattern of any individual among interests, which can be characterized by the probability p(i, j) for transitional events to take place between interests i and j, computed from n(i, j), the number of switchings from interest i to j. Examples of the transition probabilities, those corresponding to the respective transition networks in Figs. 3(a–c), are shown in Figs. 3(g–i) in the two-dimensional representation of i and j. We observe two key features: (i) p(i, j) exhibits relatively large values for transitions among the highly ranked interests (note that r = 1 corresponds to the highest ranked interest) and (ii) the diagonal elements p(i, i) have relatively large values as well. The first feature suggests a kind of preferential selection12,30,51,52,53 of interests: individuals tend to return to highly ranked interests with relatively larger probabilities and stay in these interests. The second feature indicates an inertial effect: an individual tends to stay in the interest that he/she has already been exploring. These two ingredients, preferential return and inertia, plus an individual's desire to explore new interests, are the basic ingredients underlying the human-interest dynamics, based on which a phenomenological model can be developed.
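A minimal sketch of how such transition probabilities might be tabulated from a click sequence is given below. The normalization is an assumption: here each count n(i, j) is divided by the total number of consecutive click pairs, so that p(i, j) sums to one over all (i, j) and the diagonal entries p(i, i) capture dwelling. A row-wise normalization would be an equally plausible reading. The function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(clicks):
    """Return {(i, j): p(i, j)} from consecutive click pairs.

    Assumption: p(i, j) = n(i, j) / (total number of consecutive pairs),
    where self-transitions (i == j) are kept as diagonal elements.
    """
    pair_counts = Counter(zip(clicks, clicks[1:]))
    total = sum(pair_counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in pair_counts.items()}


# Toy click sequence (illustrative only).
clicks = [1, 1, 2, 2, 2, 1, 3]
p = transition_probabilities(clicks)
diagonal_weight = sum(prob for (i, j), prob in p.items() if i == j)
```

The quantity diagonal_weight gives a quick summary of how much of the probability mass sits on the diagonal, which is the signature of the inertial effect discussed above.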

A schematic illustration of our model is shown in Fig. 4(a). To initiate the dynamical evolution of interest, an individual has two options: exploration of a new interest or return to one of the previously visited interests, with probability ρn^{−λ} and 1 − ρn^{−λ}, respectively, where 0 < ρ ≤ 1 and λ > 0 are parameters30,44 and n denotes the number of hopping events among different interests, which is obtained by merging consecutive clicks on the same interest in the click-event series into one event. For example, the click-event series 1, 1, 2, 2, 2, 1, 3 with 7 actions can be transformed into the following hopping-event series: 1, 2, 1, 3, where n = 4. In the exploration state, the individual visits a new interest and continuously browses the same interest, due to the effect of inertia. At a “microscopic” level, inertial browsing can be regarded as an excited random-walk process (ERW)54. If the individual returns to the set of previously visited interests, he/she preferentially selects an interest category to browse according to the prior probability of visit to the same interest. Once a particular interest is chosen, the inertial effect sets in and the individual has the tendency to stay in the same interest category. The microscopic browsing behavior again can be modeled by an excited random-walk process. A detailed mathematical analysis of the model in Fig. 4(a) can be found in Supplementary Information. Examples of the predicted scaling relations are illustrated in Figs. 4(b–d) (with more examples in Supplementary Information), which are consistent with those uncovered from real data as exemplified in Figs. 1–3.
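The hopping-level part of the model can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation code of this study: the inertial excited-random-walk dwelling is omitted entirely, the currently occupied interest is excluded from preferential return so that every step is a genuine hop, and exploration is forced whenever no other visited interest exists. All function names are hypothetical; the default parameter values ρ = 0.6 and λ = 0.4 follow those quoted for Fig. 4.

```python
import random
from itertools import groupby


def to_hopping_series(clicks):
    """Merge consecutive clicks on the same interest into one hopping event."""
    return [interest for interest, _ in groupby(clicks)]


def simulate_hops(steps, rho=0.6, lam=0.4, seed=0):
    """Simulate a hopping-event series with exploration and preferential return.

    At hop n, a new interest is explored with probability rho * n**(-lam);
    otherwise a previously visited interest is chosen with probability
    proportional to its past visit count (assumption: the current interest
    is excluded, and exploration is forced if no other interest exists).
    """
    rng = random.Random(seed)
    visits = {}  # interest label -> number of past visits
    series = []
    current = None
    for n in range(1, steps + 1):
        candidates = [i for i in visits if i != current]
        explore = not candidates or rng.random() < rho * n ** (-lam)
        if explore:
            choice = len(visits)  # next unused integer label
        else:
            weights = [visits[i] for i in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
        visits[choice] = visits.get(choice, 0) + 1
        series.append(choice)
        current = choice
    return series


hops = to_hopping_series([1, 1, 2, 2, 2, 1, 3])
series = simulate_hops(200)
```

Adding the dwelling dynamics would require the excited-random-walk rules given in Supplementary Information, which are not reproduced here.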

Figure 4

Proposed model of human-interest dynamics and predicted scaling relations.

(a) Schematic illustration of the model, where an individual can enter one of the two dynamically complementary states at each hopping step: exploring new interests with the probability ρn^{−λ} (the state of “Exploration”, the white circles representing available new interests) or returning preferentially to a previously explored interest with the probability 1 − ρn^{−λ} (the state of “Preferential return”, the circles of different colors illustrating the visited interests, with the size corresponding to the frequency with which they are visited). Regardless of which state takes place, once an interest is selected, an inertial effect is triggered, which can be modeled as an excited random walk (ERW)54. (b, c) Fat-tailed distribution of P(l) and P(τ), respectively. (d, e) Predicted interest-ranking distribution and transition-probability pattern, respectively. These results are obtained from model simulations where the number of agents in each case is 1000, for the parameter setting of λ = 0.4 and ρ = 0.6. For P(l), an analytic result can be derived: P(l) ~ l^{−(2−ζ)}, where ζ and 1 − ζ are the probabilities of moving towards the “right” or the “left”, respectively. In (b–d), three values of ζ are used: ζ = 0.4, ζ = 0.5 and ζ = 0.6. In (e), the value of ζ is 0.5.

Discussion

Despite recent efforts in human-mobility dynamics14,15,16,17,18,19, little is known about human-interest dynamics. We aim to explore the fundamental mechanisms underpinning the human-interest dynamics through a completely data-driven approach. In particular, we have analyzed three large-scale data sets, two from e-commerce and one from mobile communication, and uncovered the emergence of fat-tailed behaviors in a number of fundamental quantities. These are the interval l to stay in an interest, the time interval τ to return to a previously visited interest and the interest-ranking distribution. A detailed analysis of the patterns of the transition probabilities among different interests suggests preferential return, inertia and exploration as the three basic dynamical ingredients underlying the human-interest dynamics, enabling us to construct a phenomenological, random-walk-based model. The model captures the essential features of the human-interest dynamics in that it is constructed based on generic ingredients extracted from real data and it is capable of reproducing the scaling laws observed from data. The model, however, may still be idealized as it cannot predict the scaling exponents. To develop a more predictive model, additional effects must be included, such as an individual's memory effect12,24,35, cognitive activities45,53 and the specific web categories. Nonetheless, the current model provides a phenomenological framework where the basic properties and scaling behaviors associated with human-interest dynamics can be explained.

The fat-tailed distributions uncovered from data and the dynamical model developed accordingly can be applied to addressing significant problems ranging from human-behavior prediction and the design of search algorithms30,55 to controlling spreading dynamics56,57. As a demonstration, we have quantified the degree of predictability of user-behavior patterns underlying the three data sets by using the statistical measures of entropy and Fano inequality30, with the result that such patterns are in fact quite predictable, despite the apparent randomness in the human-interest dynamics (see Supplementary Information).
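To illustrate how an entropy measure and Fano's inequality combine into a predictability bound, the sketch below solves S = H(Π) + (1 − Π) log2(N − 1) for the maximum predictability Π, where H is the binary entropy and N is the number of distinct interests. This is one common formulation and is shown under assumptions: the actual procedure of this study is given in Supplementary Information and is not reproduced here, the entropy used is the simple Shannon entropy of visit frequencies (which ignores temporal order), and all function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (bits) of the visit frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fano_rhs(p, n_states):
    """Binary entropy of p plus (1 - p) * log2(n_states - 1)."""
    h = 0.0
    if 0.0 < p < 1.0:
        h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return h + (1.0 - p) * math.log2(n_states - 1)


def max_predictability(entropy, n_states, tol=1e-10):
    """Solve entropy = fano_rhs(p, n_states) for p in [1/n_states, 1].

    fano_rhs decreases monotonically on this interval, so bisection applies.
    """
    if n_states < 2:
        return 1.0
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n_states) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy click sequence (illustrative only).
clicks = [1, 1, 2, 2, 2, 1, 3]
S_bits = shannon_entropy(clicks)
pi_max = max_predictability(S_bits, len(set(clicks)))
```

An entropy estimate that accounts for the order of visits would generally be lower than the frequency-based value used here and would therefore give a higher predictability bound.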

Methods

Data collection

The massive data sets used in this article are from large-scale real e-commerce and communication systems: Douban, Taobao and MPR. For fair comparison, in each data set we focus on users who performed at least 100 actions. Data description and basic statistical properties are listed in Table 1.

Table 1 Basic parameters of the three massive data sets studied in this paper

(i) Douban

The experimental data set is randomly sampled from Douban, a major e-commerce company in China. It is similar to Social Networking Services (SNS) that allow registered users to record information and create content related to movies, books, music, etc., yet it can also make personalized recommendations for the registered users. In this data set, we select 21,148 individuals, each executing at least 100 rating actions, from which we can find historical information about the users, such as user ID, item ID, rating, timestamps and item types (considered as interest types). The sampling time resolution is one second.

(ii) Taobao

The Chinese web site Taobao is one of the world's largest electronic marketplaces. The browsing behaviors of users on Taobao are recorded and any user can browse and trade with any other users. Our data is composed of all browsing behaviors of 34,330 users, each browsing more than 100 items in the time span between September 1 and October 28, 2011. For each user, information is available such as the user ID, item ID, item classes (regarded as interest types), timestamps, etc. The sampling time resolution is one second.

(iii) MPR

MPR is a widely used electronic reading tool. The usage of such a mobile service reflects customers' interests well. We collected the reading records of 19,067 users, each performing more than 100 reading tasks between October 1 and October 31, 2011. The categories of books that each reader chose are regarded as interests. The sampling time resolution is one day.

Definition of length of interest interval l

Previous studies defined a session as a sequence of Web pages viewed by a user within a given time window, a definition that has been widely used in modeling and tracking individuals' navigation behaviors52,58,59,60,61. However, for characterizing human interest, this definition of a session has two deficiencies: (1) the difficulty of splitting an individual's click sequence into sessions60, due to the continuous nature of the user's online activities30,62, and (2) the limitation of the data sets, due to the time resolution of MPR (one day). Thus, we define the interest duration l as the length of a sequence of clicks within the same interest category.