Uncovering and Predicting the Dynamic Process of Collective Attention with Survival Theory

Collective attention is at the center of this era of information explosion. It is thus of great interest to understand the fundamental mechanism underlying attention in large populations within a complex evolving system. Moreover, the ability to predict the dynamic process of collective attention for individual items has important implications in an array of areas. In this report, we propose a generative probabilistic model that uses a self-excited Hawkes process with survival theory to model and predict the process through which individual items gain attention. The model explicitly captures three key ingredients: the intrinsic attractiveness of an item, characterizing its inherent competitiveness against other items; a reinforcement mechanism based on the sum of the triggers from each previous attention; and a power-law temporal relaxation function, corresponding to the aging of an item's ability to attract new attention. Experiments on two population-scale datasets demonstrate that this model consistently outperforms the state-of-the-art methods.


Datasets
The APS dataset used in this report comprises the papers published in all journals of the American Physical Society (APS) from 1893 to 2009, and consists of 245,365 authors, 463,344 papers, and 4,692,026 citations. For each paper, the dataset includes the title, DOI, author names, institutes, printed time, received time, references, PACS codes and so on. Basic statistics of the APS dataset are reported in Table 1.
The WEIBO dataset is a benchmark dataset that was released as a task of the 13th International Conference on Web Information Systems Engineering (WISE 2012 Challenge). It contains crawled users and forwarding behaviors from the Chinese social media website Sina Weibo between Aug 24, 2009 and Dec 31, 2011. In this report, we select messages that were originally posted to Sina Weibo between July 1, 2011 and July 31, 2011. We cleaned the data by removing inactive users and unpopular messages, as well as spam users who abnormally forward a single message hundreds of times. To alleviate the effect of users' daily activity patterns, we only consider messages posted between 10am and 10pm each day, which is the active period of the Sina Weibo system. There are 2.6 million messages. For each message, we collect its forwarding information between July 1, 2011 and August 31, 2011. Basic statistics of the WEIBO dataset are reported in Table 2.
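The selection of WEIBO messages by posting date and active period can be illustrated with a minimal sketch. The record layout, the message identifiers and the treatment of the 10pm boundary as exclusive are assumptions made here for illustration; the text does not specify them.

```python
from datetime import datetime, time

# Hypothetical records: (message_id, posting timestamp). The real field
# names and timestamp format of the WEIBO dataset are not given in the text.
messages = [
    ("m1", datetime(2011, 7, 1, 9, 59)),
    ("m2", datetime(2011, 7, 1, 10, 0)),
    ("m3", datetime(2011, 7, 15, 21, 30)),
    ("m4", datetime(2011, 7, 31, 22, 0)),
    ("m5", datetime(2011, 8, 1, 12, 0)),
]

# Messages posted in July 2011 (end of the range is exclusive).
START, END = datetime(2011, 7, 1), datetime(2011, 8, 1)
# Active period of each day; excluding 22:00 itself is an assumption.
ACTIVE_FROM, ACTIVE_TO = time(10, 0), time(22, 0)


def in_study_window(posted_at):
    """Return True if a message was posted in July 2011 during active hours."""
    in_month = START <= posted_at < END
    in_active_hours = ACTIVE_FROM <= posted_at.time() < ACTIVE_TO
    return in_month and in_active_hours


selected = [msg_id for msg_id, posted_at in messages if in_study_window(posted_at)]
```

Removing inactive users, unpopular messages and spam users would be separate filtering steps; their thresholds are not stated in the text, so they are not sketched here.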

Maximum likelihood estimation for model parameters
The log-likelihood of the dynamics {t_k} observed up to time T is given in equation (1). For the parameters µ and γ, the optimal values can be found by maximizing this log-likelihood with the gradient ascent method. The gradient of the log-likelihood is computed with respect to each parameter and, following the standard gradient ascent method, at the n-th iteration each parameter is moved along its gradient by a step proportional to its learning rate. Here η_1 and η_2 are the learning rates used in the update rules. The algorithm stops when the change in an iteration is small enough.
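The update loop and stopping rule described above can be sketched as follows. This is a generic gradient ascent with one learning rate per parameter, not the authors' implementation. The pairing of eta1 with µ and eta2 with γ, the tolerance, the iteration cap and the toy objective are all assumptions for illustration. In particular, `toy_grad` is the gradient of a simple concave stand-in function, because the model's log-likelihood and its gradients are not reproduced here.

```python
def gradient_ascent(grad, mu, gamma, eta1=0.1, eta2=0.1, tol=1e-8, max_iter=10_000):
    """Maximize an objective over (mu, gamma), given a function returning its gradient.

    eta1 and eta2 are the learning rates for mu and gamma (assumed pairing).
    The loop stops when the largest parameter change in an iteration is below tol.
    """
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        g_mu, g_gamma = grad(mu, gamma)
        new_mu = mu + eta1 * g_mu            # step along the gradient for mu
        new_gamma = gamma + eta2 * g_gamma   # step along the gradient for gamma
        change = max(abs(new_mu - mu), abs(new_gamma - gamma))
        mu, gamma = new_mu, new_gamma
        if change < tol:
            break
    return mu, gamma, n_iter


def toy_grad(mu, gamma):
    # Gradient of the stand-in objective -(mu - 2)^2 - 0.5 * (gamma - 1)^2,
    # which has its maximum at mu = 2, gamma = 1. NOT the model's log-likelihood.
    return -2.0 * (mu - 2.0), -(gamma - 1.0)


mu_hat, gamma_hat, n_iter = gradient_ascent(toy_grad, mu=0.5, gamma=0.5)
```

To fit the actual model, `toy_grad` would be replaced by a function returning the partial derivatives of the log-likelihood in equation (1) with respect to µ and γ.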

Comparative models
The following two baseline models are implemented for comparison.
• The WSB model proposed in [1]. Wang et al. employed reinforced Poisson processes, modeling three phenomena: the fitness of an item, a log-normal temporal relaxation function and a reinforcement mechanism.
• The SEISMIC model proposed in [2]. Zhao et al. employed a doubly stochastic process, one process accounting for infectiousness and the other for the arrival time of events. It is the current state of the art in predicting the dynamics of popularity.

Incorporating exogenous information
The proposed model is flexible and can incorporate exogenous information, such as structural features, to improve its accuracy. To show this, we consider the inhomogeneous influence between individuals. Hence the rate function is modified so that each attention contributes with its own triggering strength. In the modified rate function, µ is the intrinsic attractiveness of the item, φ(τ) is the relaxation function that characterizes the temporal inhomogeneity due to the aging effect, and a_j is the triggering strength of each subsequent attention, capturing the influence of individuals. Note that we use the PageRank score as the influence of a paper in the APS dataset, and the logarithm of the number of a user's followers in the follower network as the user's influence in the WEIBO dataset.
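A minimal sketch of an influence-weighted rate function is given below. The exact equation is not reproduced in the text, so both the multiplicative form µ · Σ_j a_j φ(t − t_j) and the power-law kernel (τ + c)^(−(1 + γ)), including the offset c, are assumptions chosen to match the verbal description. The forwarding times and follower counts are hypothetical.

```python
import math


def power_law_kernel(tau, gamma, c=1.0):
    # Assumed relaxation function phi(tau): power-law decay in the elapsed time.
    # The offset c avoids a singularity at tau = 0 and is an assumption.
    return (tau + c) ** (-(1.0 + gamma))


def rate(t, event_times, strengths, mu, gamma):
    # Assumed form: mu * sum over attentions j before t of a_j * phi(t - t_j).
    return mu * sum(
        a_j * power_law_kernel(t - t_j, gamma)
        for t_j, a_j in zip(event_times, strengths)
        if t_j < t
    )


# Hypothetical WEIBO-style input: forwarding times and follower counts.
event_times = [0.0, 1.0, 3.0]
followers = [10, 1000, 100]
# Influence as the logarithm of the follower count, as described in the text.
strengths = [math.log(n) for n in followers]

uniform = rate(4.0, event_times, [1.0, 1.0, 1.0], mu=0.5, gamma=1.0)
weighted = rate(4.0, event_times, strengths, mu=0.5, gamma=1.0)
```

Setting every a_j to 1 recovers a homogeneous version in which all attentions trigger equally. A user with zero followers would need special handling, since the logarithm is undefined there; the text does not say how such cases are treated.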