# Popularity and Novelty Dynamics in Evolving Networks

## Abstract

Network science plays a major role in representing real-world phenomena such as the user-item bipartite networks found on e-commerce and social media platforms. It provides researchers with tools and techniques for solving complex real-world problems. Identifying and predicting the future popularity and importance of items on e-commerce or social media platforms is a challenging task. Some items gain popularity repeatedly over time, while others become popular and novel only once. This work aims to identify two key factors: popularity and novelty. To do so, we consider two types of novelty prediction: items appearing in the popular ranking list for the first time; and items that were not in the popular list in the past time window but might have been popular before it. Identifying popular items requires careful macro-level analysis. In this work we propose a model that exploits item-level information over a span of time to rank the importance of each item. We consider the ageing (decay) effect along with the recent link gain of the items. We test the proposed model on four real-world datasets using four information-retrieval-based metrics.

## Introduction

Online social networking sites and social media platforms enable people to communicate and share different forms of content or items such as texts, web links, photos and videos. These create a huge amount of data on the interaction between online items and users. Understanding the behaviour of a user in a friendship network on Facebook, of a user in a following-follower relationship on Twitter, or of a movie on a movie rating platform such as Netflix is important in marketing and recommendation systems. Network science theories and graph-theoretic frameworks have been successful in solving many real-world problems in industry and in the social, natural and medical sciences, such as information overload problems1,2. The approaches of network science can be used to hypothesize about and analyze the relationship ‘among the users or items’ (mono-partite network) and the relationship ‘between users and items’ (bipartite network). These network representations are useful in predicting and modelling link formation and network dynamics, which outline how social media items (e.g., news, blogs, posts, videos, application downloads, topics in discussion forums and product reviews) are adopted and influenced by their creators.

Even with detailed information about the items and the users who share them, it can still be very challenging to predict which items will become popular among users in the future3,4. Item popularity has been found to depend on four kinds of features: ‘structural’, ‘content’, ‘early adopter’ and ‘temporal’ features. Whether ‘content features’ are useful for predicting item popularity is debated: some researchers4,5 have found that they are not, while others have found that they are6. Furthermore, alongside item features, the ‘structural features’ of the underlying network, such as the number of followers of seed users on Twitter6,7 and Facebook5, have been found useful in predicting popularity. The popularity of online items has also been shown to exhibit temporal dynamics8,9,10. Among all the features, ‘temporal features’ are considered among the best for popularity prediction4,11. For example, ‘temporal features’ of the early adoption of news articles on Digg (e.g., the number of likes a news article received during its first hour) have been shown to play an important role in predicting the future popularity of online news articles12. ‘Temporal features’ are easy to obtain and are independent of item-level and network-level features, so models based on them are applicable in a wider range of settings. Models based solely on temporal features have been applied widely, for example to Twitter7,13, citation counts14,15 and the occurrence of earthquakes16. Because such models are generic and avoid the cost of feature engineering, they are also applied in investigating the diffusion of items.

Because items compete and differ in fitness, not all of them become popular, and only some retain their popularity. In the presence of the information overload problem, identifying popular and novel items matters in many areas of daily life, such as which item to consume, the outcome of elections, political discourse, community formation and many more. People now use the Web to propagate information for their social, informational and consumer needs through vast social networks that extend far beyond personal relationships or even geography. Social networks therefore also play an important role in the dissemination of ideas, purchases and reputations. Since people are strongly affected by their own social networks, research on novelty as well as popularity in social networks is an important task. Few people would view or consume stale information. This is why most news aggregators, Twitter and Facebook order content according to the newness (novelty) of the item. A very important factor in the allocation of user attention is the finite number of items that a user can attend to in a recommendation list. As a consequence, only the top popular items are consumed even though there are potential novel items at the bottom of the list, which leads to a skewed popularity distribution17,18. This research presents a model which identifies these potential novel items without a cost to the prediction of already popular items.

## Results

During parameter learning, we accept the values of the parameters λ and γ in Eq. 9 that maximize precision over 3000 iterations. Only in the case of the re-tweet data is the learned parameter different for every individual re-tweet. In the other cases, we took an average of the parameter values over all the items, as the nature of the data does not support learning for individual items. Furthermore, in this study we compare the performance of the proposed model with three well-known models (Popularity Based Predictor19 (PBP), Degree (Preferential Attachment) and the Reinforced Poisson Process Model (RPPM)) by analyzing the sensitivity of the models. Since RPPM learns its parameters from the initial adoption history of items, the re-tweet data are used to test its performance.

### Results for varying top k list size

To obtain and compare accuracy results for varying sizes of the top k list of popular items (shown in Fig. 1), we use the following four information retrieval metrics: (1) Novelty (Q): quantifies the objects that enter the top popularity list for the first time (an absolute novelty); (2) Temporal Novelty (TN): reflects the ability to predict objects that did not gain popularity in the past time window but appear in the top popular list in the future; (3) Precision (P): the fraction of correctly predicted objects using the top 100 popular objects; and (4) Area Under the receiver operating characteristic Curve (AUC): gives the comparative ranking ability of the predictor. The TN and Q metrics are very sensitive, as they depend on the exact identification of items that were not present in the past or recent past time window. Considering temporal novelty (TN, Eq. 16) as an accuracy metric with respect to the top k list size, the proposed model performs better on Netflix (see the Data and Metrics section for a detailed data description) than on the rest. For precision (P, Eq. 12), the accuracy increases at different rates for different datasets, most likely due to the different nature of the datasets. Therefore, it is better to use a larger k (30%+ for Facebook, 50%+ for Netflix, and 70%+ for Movielens) to get 100% precision. For novelty (Q, Eq. 13), the accuracy remains constant as the list size increases. For AUC, performance decreases with the size of the list, with a similar trend on all datasets.

### Varying both past and future time windows with equal value (varying $$T_P = T_F$$)

To test the model’s ability to make correct predictions, it is compared with the benchmark models for varying past and future time windows ($$T_P$$ and $$T_F$$) of equal length, using the four information-retrieval-based indices and considering only the top 100 items of the popular list (k = 100). Based on the results depicted in Fig. 2, on average the proposed model, Recent Behaviour with Aging Effect (RBAE), performs better than the other two benchmark models, as each of them predicts well in only one case, such as Temporal Novelty (TN). On the Novelty (Q) index, RBAE is outperformed for the first few days of prediction, but after a few days RBAE outperforms all. As shown in Fig. 2 for the top 100 popular items, Temporal Novelty ($$TN_{100}$$) values increase with the past and future time windows once these exceed 100 days, for all the datasets. Overall, the RBAE model outperforms both benchmark models as the time windows increase. Considering Precision ($$P_{100}$$), the RBAE model outperforms the other two models on Netflix and Facebook and performs similarly to PBP on the Movielens dataset, despite a slight decreasing trend as the time window increases. Novelty ($$Q_{100}$$), or absolute novelty (Eq. 13), results show that our model outperforms the other two models on Movielens, and after around 75 days on the other two datasets. Considering $$AUC_{100}$$, as shown in Fig. 2, the RBAE model always performs better than (or equal to) PBP on all the datasets and for all the time windows.

### Varying future time window ($$T_F$$)

Figure 3 depicts the performance of the proposed predictor against the benchmark predictors for different values of the future time window, up to 300 days. Following Zeng et al.19, the past time window length is set to $$T_P = 60$$ days. For the proposed predictor (RBAE), the parameters are learned as described in the Methods section. For PBP, the parameter values are iterated to two decimal places and the value giving the best precision is chosen. As the results for the four performance indicators presented in Fig. 3 show, on average RBAE outperforms the benchmark models. For example, degree is best at predicting temporal novelty ($$TN_{100}$$) while it shows zero performance in the case of absolute novelty ($$Q_{100}$$). PBP performs better than degree, but RBAE performs more consistently across the cases. The results for Temporal Novelty ($$TN_{100}$$) show that our proposed model, RBAE, always performs better than PBP; degree performs better than RBAE on the Movielens and Netflix datasets, while on the Facebook data RBAE outperforms both benchmark models. The Precision ($$P_{100}$$) results show that RBAE performs better than degree in all cases and has almost the same accuracy as PBP on the Movielens and Netflix datasets, while on Facebook PBP outperforms RBAE. The results of the novelty ($$Q_{100}$$) analysis show that RBAE performs better than both benchmark models in all the cases. It is also important to note that novelty is affected by the size of the future time window.

### Predicting the absolute popularity

In this section, we compare the proposed model, RBAE, with the Reinforced Poisson Process Model (RPPM), which predicts the absolute amount of popularity gain, in addition to the other two benchmark models (Degree and PBP), considering the total number of link gains up to a future time window. Twitter re-tweet data are used. To make predictions, the model is trained on 20 minutes of data with a recent past time window of 10 minutes ($$T_P = 600$$ seconds). At every future time step, the total number of re-shares is counted and the tweets are ranked accordingly; the results are shown in Fig. 4. In the cases of temporal novelty (TN) and novelty (Q), RBAE outperforms the other models, while in the other cases its performance is not good.

## Discussion

This study attempts to solve the problem of predicting the popularity of potential items18, which are generally suppressed by already popular items. We approach this problem using a user-item bipartite interaction network and a ranking approach. We emphasize two kinds of novelty prediction: ‘absolute novelty’ and ‘temporal novelty’. From Fig. 1, we find that as the ranking list size increases, precision also increases and AUC decreases, while novelty and temporal novelty are only slightly affected. This result shows that our model performs well only for ranking top popular items. It also suggests that discovering novel items comes at the cost of accuracy in predicting lower-ranked items. A similar result is found in Fig. 4, where RBAE outperforms the other models in predicting novelty and temporal novelty but not on the other two metrics. From Fig. 3, we can say that long-term prediction performance increases with the recent past time window size. This suggests that our model is sensitive to the choice of recent past window size on all the datasets. In Fig. 3 we also see the effect of a fixed recent past time window for a varying future time window: RBAE is outperformed on the Movielens and Netflix datasets, but in the other cases its performance is equal or better. This analysis suggests that the recent past time window matters most when identifying items that did not gain popularity during that window. Further, the proposed predictor does not perform well on the precision metric for the Facebook system compared with PBP when the past time window is fixed (see Fig. 3), but it makes good predictions when the past time window also varies (see Fig. 2 for the same Facebook system). Thus we can say that RBAE is an optimal predictor in the sense that it helps in predicting and ranking novel items. From Fig. 4, a limitation of our proposed model is that it does not perform well for ranking on the basis of total popularity gain (see problem definition 2), for which AUC and precision are the vital metrics. Nevertheless, RBAE outperforms the other models in predicting both novel and temporally novel items. The proposed predictor is based purely on temporal features, which have also been found to generalize well4. We have performed extensive experiments on four distinct data sets, which represent four distinct systems. Our model can also be applied to other evolving systems. In future work, we will consider the temporal features along with other driving factors such as preferential attachment, aging, freshness of item, community, non-linear preferential attachment, and sentiment analysis.

## Methods

We first describe three benchmark models and then introduce our proposed model. The benchmark models are as follows.

### Degree

The Matthew effect, or preferential attachment, is a well-known phenomenon seen in almost every evolving network. It states that the rate of a node’s future link gain (e.g., movies receiving new ratings in the case of Movielens, friends receiving new likes or comments in the case of Facebook wall post activities) is proportional to the number of links it currently has. In other words, the current degree of an item, $$k_o(t)$$, is a good predictor of its future popularity.

### Popularity-based predictor

PBP, proposed by Zeng et al.19, extends the degree (or preferential attachment) model by adding a new parameter, the ‘recent time window’, as a proxy for items’ recent popularity. The prediction score of an item at time t is given by:

$${s}_{o}(t,{T}_{P})={k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}),$$
(1)

where $$s_o(t,T_P)$$ is the predicted score (rating/links) considering the recent (past) time window $$T_P$$ before t, and $$k_o(t)$$ is the total link gain up to time t. The parameter $$\lambda \in [0,1]$$: $$\lambda = 0$$ gives the total popularity (i.e., the total number of links for an item) and $$\lambda = 1$$ gives the recent popularity (i.e., the number of links gained in the recent time window $$T_P$$).
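As an illustration of Eq. 1, the following minimal Python sketch computes the degree and PBP scores from per-item lists of link timestamps. The function names, the toy item identifiers and the assumption that each timestamp list is sorted in ascending order are ours, not part of the original method description.

```python
from bisect import bisect_right


def degree_at(link_times, t):
    """k_o(t): number of links received up to and including time t.

    Assumes link_times is sorted in ascending order.
    """
    return bisect_right(link_times, t)


def pbp_score(link_times, t, T_P, lam):
    """Eq. 1: s_o(t, T_P) = k_o(t) - lam * k_o(t - T_P)."""
    return degree_at(link_times, t) - lam * degree_at(link_times, t - T_P)


# Hypothetical toy data: link timestamps in days for two items.
links = {
    "old_hit": [1, 2, 3, 4, 5, 6, 7, 8],   # many links, none recent
    "rising": [55, 56, 57, 58, 59],        # fewer links, all recent
}
t, T_P = 60, 10

# lam = 0 reduces to the Degree predictor (total popularity).
ranking_total = sorted(
    links, key=lambda o: pbp_score(links[o], t, T_P, 0.0), reverse=True
)
# lam = 1 ranks by links gained within the last T_P days only.
ranking_recent = sorted(
    links, key=lambda o: pbp_score(links[o], t, T_P, 1.0), reverse=True
)
```

With this toy data the two extremes of λ rank the items in opposite orders, which is the trade-off the parameter is meant to tune.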

### Reinforced Poisson Process Model

RPPM was proposed13,14,15 for predicting the popularity dynamics of evolving systems. For a given message m, the popularity (re-tweet) dynamics $$\{{t}_{k}^{m}\}$$ up to time $$T_i$$ can be modelled as a time-dependent reinforced Poisson process with intensity $$\lambda_m(t,k)$$, given by

$${\lambda }_{m}(t,k)={c}_{m}{f}_{m}(t){r}_{m}(k),$$
(2)

where $$c_m$$ is the intrinsic attractiveness, $${f}_{m}({t}_{k})={t}_{k}^{\gamma }$$ is the time relaxation function, which characterizes the aging effect, and $$r_m(k)$$ is the reinforcement function depicting the “rich-gets-richer” effect. The reinforcement mechanism is modelled as follows:

$${r}_{m}(k)={\varepsilon }+\frac{(1-{e}^{-\alpha (k+1)})}{(1-{e}^{\alpha })},$$
(3)

where $$r_m(k)$$ is the reinforcement mechanism and k is the cumulative number of re-tweets at time t. The model parameters $$\{c_m, \alpha_m, \gamma_m\}$$ are estimated by maximizing the likelihood function13. The cumulative re-tweet count at any future time t can be estimated from the expectation of the Poisson process,

$$\frac{dR}{dt}=\lambda (t),$$
(4)

which can be solved exactly, with boundary condition $$R(T_i) = n$$, to give

$$R(t)=\frac{\ln (1+{e}^{Y})-Y-\ln {\varepsilon }-{\alpha }^{\ast }}{{\alpha }^{\ast }},$$
(5)

where,

$$Y={\varepsilon }{c}^{\ast }{\alpha }^{\ast }\frac{({T}_{i}^{1-{\gamma }^{\ast }}-{t}^{1-{\gamma }^{\ast }})}{(1-{\gamma }^{\ast })(1-{e}^{-{\alpha }^{\ast }})}-(n+1){\alpha }^{\ast }-\ln ({\varepsilon }-{e}^{-{\alpha }^{\ast }(n+1)}),$$
(6)

where $$\{c^{\ast}, \alpha^{\ast}, \gamma^{\ast}\}$$ are the parameters estimated by likelihood maximization, and $$\varepsilon = 1 + (1 - e^{\alpha})$$.

### Our proposed model: Considering aging factor with recent popularity

The popularity of a node in a complex system is driven by four factors: its degree, newness20, recent popularity gain21 and aging effect15,22,23,24. When the number of nodes in a system is very large, we assume that the attention attracted by newness is negligible. To consider recent popularity and degree together, we use a parametric linear model that combines total popularity and recent popularity. Recent popularity has also been used in previous research19,21. In an ideal rich-gets-richer system the oldest node is the most popular one, and therefore recent popularity gain should also be a good predictor. However, since Web systems are driven by many intrinsic as well as extrinsic phenomena25,26,27,28, we have kept the model parametric. The aging phenomenon is omnipresent in complex systems, including Web systems: for example, microblogs on social media platforms lose their popularity13, pathogens lose their infectiousness due to ageing24 and networks change structure over time due to the ageing factor29. How aging is modelled depends on the system; the decay may be exponential22,23,30, power-law7,13,31 or lognormal14,15. In our study we consider an exponential decay effect. Taking all these facts together, we arrive at the intuition that combining an aging factor with recent popularity will help in detecting “potential items” (those going to be popular). If $$s_o(t,T_P)$$ is the prediction score at time t given the past time window $$T_P$$, we can write

$${s}_{o}(t,{T}_{p}) \sim \,\frac{({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))}{\sum _{O}({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))}$$
(7)

The above equation states that the score of an object follows its recent popularity gain. λ is a tunable parameter that balances recentness against total popularity; it takes values in the interval [0, 1]. Since ageing or decay is present everywhere, we can also formulate the prediction score as follows

$${s}_{o}(t,{T}_{p}) \sim \,\frac{\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}{\sum _{O}\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}$$
(8)

where $$T_{uo}$$ denotes the time at which user u consumed the object o and γ is a free parameter. Since recent popularity will be a good predictor if the decay rate is constant, we combine the two terms and obtain

$${s}_{o}(t,{T}_{p}) \sim \,\frac{\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}{\sum _{O}\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}\bullet \frac{({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))}{\sum _{O}({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))}$$
(9)

In the above model, in the case of monopartite networks, the users u are the set of other nodes from which the node (object) o has received links. For ease of reference, we name the model Recent Behaviour with Aging Effect (RBAE).
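To make Eq. 9 concrete, here is a minimal Python sketch that computes RBAE scores for all items at once. It is an illustration under our own assumptions: link timestamps are stored per item in ascending order, the score is taken to be exactly the product in Eq. 9 (the equation only states proportionality), and a zero score is returned for every item when either normalising sum is zero. The function name `rbae_scores` and the toy data are hypothetical.

```python
import math
from bisect import bisect_right


def rbae_scores(link_times, t, T_P, lam, gamma):
    """Eq. 9: normalised aging term multiplied by normalised recent-popularity term.

    link_times maps each item o to an ascending list of times T_uo
    at which some user u linked to it.
    """
    aging, recent = {}, {}
    for item, times in link_times.items():
        seen = times[:bisect_right(times, t)]       # links received up to time t
        k_t = len(seen)                             # k_o(t)
        k_past = bisect_right(times, t - T_P)       # k_o(t - T_P)
        # Aging term of Eq. 8: every link decays exponentially with its age.
        aging[item] = sum(math.exp(gamma * (T_uo - t)) for T_uo in seen)
        # Recent-popularity term of Eq. 7.
        recent[item] = k_t - lam * k_past

    aging_total = sum(aging.values())
    recent_total = sum(recent.values())
    if aging_total == 0 or recent_total == 0:
        # Assumption: with nothing to normalise by, every item scores zero.
        return {item: 0.0 for item in link_times}
    return {
        item: (aging[item] / aging_total) * (recent[item] / recent_total)
        for item in link_times
    }


# Hypothetical toy data: link timestamps in days.
links = {
    "old_hit": [1, 2, 3, 4, 5, 6, 7, 8],
    "rising": [55, 56, 57, 58, 59],
}
scores = rbae_scores(links, t=60, T_P=10, lam=0.5, gamma=0.1)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With a positive γ, each exponent γ(T_uo − t) is non-positive, so older links contribute less. Setting γ = 0 and λ = 0 reduces the score to the squared share of total degree.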

### Parameter learning using gradient descent

To optimise the model parameters we use the gradient descent method and apply the following two cost minimization approaches:

• Ordinal ranking minimization, in which we first rank the predicted and real values and then learn the parameters.

• Normalised score minimization, in which we normalise both the predicted and the real scores between 0 and 1 and then learn the parameters. Further, we weight the cost by $$1 - P_n$$ and $$1 - Q_n$$.

To learn the parameters of our proposed model (Eq. 9) by gradient descent, we calculate the gradients as

$$\begin{array}{rcl}\frac{\partial ({s}_{o}(t,{T}_{P}))}{\partial \lambda } & = & \frac{[(({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P})).(\sum _{o}({k}_{o}(t-{T}_{P}))))-(({k}_{o}(t-{T}_{P}))(\sum _{o}({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))))]}{{(\sum _{o}({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P})))}^{2}}\\ & & .(\frac{\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}{\sum _{o}\sum _{u}{e}^{\gamma ({T}_{uo}-t)}}),\end{array}$$
(10)
$$\begin{array}{rcl}\frac{\partial ({s}_{o}(t,{T}_{P}))}{\partial \gamma } & = & (\frac{({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))}{\sum _{o}({k}_{o}(t)-\lambda {k}_{o}(t-{T}_{P}))})\\ & & .(\frac{[(\sum _{u}{e}^{\gamma ({T}_{uo}-t)}.({T}_{uo}-t)).(\sum _{o}\sum _{u}{e}^{\gamma ({T}_{uo}-t)})]-[(\sum _{u}{e}^{\gamma ({T}_{uo}-t)}).(\sum _{o}\sum _{u}{e}^{\gamma ({T}_{uo}-t)}({T}_{uo}-t))]}{{(\sum _{o}\sum _{u}{e}^{\gamma ({T}_{uo}-t)})}^{2}}),\end{array}$$
(11)

The parameters are then updated as follows:

$$\begin{array}{rcl}{\lambda }_{i} & = & {\lambda }_{i}-\alpha \mathrm{.(}{\rm{\Delta }}e\mathrm{).}(\frac{\partial {s}_{o}(t,{T}_{P})}{\partial {\lambda }_{i}}),\\ {\gamma }_{i} & = & {\gamma }_{i}-\alpha \mathrm{.(}{\rm{\Delta }}e\mathrm{).}(\frac{\partial {s}_{o}(t,{T}_{P})}{\partial {\gamma }_{i}}),\end{array}$$

where the parameters λ and γ are the same as in Eq. 9, α is the learning rate (step size), and Δe is the error magnitude, which can be calculated under different scenarios such as ordinal-ranking-based and normalised score error minimization. Since we want to maximize accuracy while learning, we weight the normalised-score error by $$1 - P_n$$ in the results reported here. We also tested the normalised score minimization approach and found that it also works well; we accepted the parameters that gave the best accuracy. During parameter estimation, we set the past and future time windows to 45 days for Movielens, Netflix and Facebook. For Twitter, we learn the parameters from the initial 20 minutes of re-sharing data and use a past time window of 10 minutes.
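The gradients in Eqs. 10 and 11 and the update rule can be sketched in Python as follows. This is an illustration only: the function names, the toy inputs, the default step size and the clamping of λ to [0, 1] after each step are our assumptions, and Δe is passed in as a plain number because its exact computation depends on the chosen cost.

```python
import math


def score_and_grads(k_t, k_past, ages, item, lam, gamma):
    """Return s_o (Eq. 9) and its partial derivatives (Eqs. 10 and 11).

    k_t[o]    : k_o(t), links received by item o up to time t
    k_past[o] : k_o(t - T_P)
    ages[o]   : list of (T_uo - t) values, one per link of item o
    """
    n = k_t[item] - lam * k_past[item]                  # numerator of the recent term
    D = sum(k_t[o] - lam * k_past[o] for o in k_t)      # its normaliser
    sum_k_past = sum(k_past.values())

    a = sum(math.exp(gamma * d) for d in ages[item])    # numerator of the aging term
    A = sum(math.exp(gamma * d) for o in ages for d in ages[o])
    a_prime = sum(d * math.exp(gamma * d) for d in ages[item])
    A_prime = sum(d * math.exp(gamma * d) for o in ages for d in ages[o])

    s = (a / A) * (n / D)
    ds_dlam = ((n * sum_k_past - k_past[item] * D) / D**2) * (a / A)      # Eq. 10
    ds_dgamma = (n / D) * ((a_prime * A - a * A_prime) / A**2)            # Eq. 11
    return s, ds_dlam, ds_dgamma


def gradient_step(lam, gamma, ds_dlam, ds_dgamma, delta_e, alpha=0.01):
    """One update of lam and gamma following the rule given after Eq. 11."""
    new_lam = lam - alpha * delta_e * ds_dlam
    new_gamma = gamma - alpha * delta_e * ds_dgamma
    # Assumption: keep lam inside the interval [0, 1] stated for Eq. 7.
    return min(1.0, max(0.0, new_lam)), new_gamma


# Hypothetical toy inputs for two items at some time t with T_P = 10 days.
k_t = {"a": 3, "b": 2}
k_past = {"a": 2, "b": 0}
ages = {"a": [-30.0, -20.0, -5.0], "b": [-3.0, -1.0]}

s, ds_dlam, ds_dgamma = score_and_grads(k_t, k_past, ages, "a", lam=0.5, gamma=0.05)
new_lam, new_gamma = gradient_step(0.5, 0.05, ds_dlam, ds_dgamma, delta_e=1.0)
```

A useful sanity check for any such implementation is to compare the analytic gradients with central finite differences of the score.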

## Data and Metrics

To test the performance and robustness of our model, we consider the following datasets and evaluation metrics:

### Data

To test the predictor’s accuracy we use four data sets: MovieLens, Netflix, Facebook wall posts and re-tweet data from Twitter.

• Netflix: This data set contains movie ratings from the Netflix platform. The original dataset has 480,189 users, 17,770 items and 100,480,507 ratings made between 1 January 2000 and 31 December 2005. Ratings range from 1 to 5, where 1 is the worst and 5 is the best. We randomly selected users who have rated at least 10 movies above 2.

• Movielens 10M: This dataset contains a record of movie ratings by users from 1 January 2002 to 1 January 2005. MovieLens is provided by the GroupLens project at the University of Minnesota and contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens32. Ratings range from 1 to 5, where 1 is the worst and 5 is the best. We only consider positive ratings: a link exists between a user and a movie if the user has rated the movie higher than 2. We randomly sampled 7,000 unique users and all the movies rated by them. Further, we used the day as the unit of time rather than the detailed time.

• Facebook wall post: This dataset contains users’ wall post activity from 14 October 2004 to 21 January 2009. It contains 46,951 users and their wall post activity33,34. We ignored self-influence, i.e. records where a user has acted on his or her own wall. Further, we converted the data into a bipartite network in which there is a link between a user and a Facebook wall when the user posts content to another user’s wall.

• Twitter re-tweet data: This dataset contains tweet and re-tweet information7 from Twitter. The original data contain 3.2 billion tweets and re-tweets posted on Twitter from 7 October to 7 November 2011. In our study, we randomly sampled 5000 tweets and all the information about their re-tweet activity. The re-tweet time is taken as relative, which is the main difference between this data set and the others used in this study: every tweet is assigned time 0 seconds at the moment it was first shared, and time is measured in seconds.

The datasets after cleaning are described in Table 1. In the table, the number of users for the re-tweet data is a placeholder: since user details are not available in that data, we assume that every re-tweet or like comes from a different user, so the figure in the table is the maximum possible number of users for the re-tweet data set.
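The rating-based preprocessing described above (keep ratings higher than 2, use the day as the unit of time) can be sketched as follows. The record layout `(user, item, rating, timestamp_in_seconds)`, the function name and the toy records are assumptions for illustration and do not reflect the actual file formats of the datasets.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def build_link_times(ratings, min_rating=2, t0=0):
    """Turn rating records into per-item link times measured in whole days.

    A user-item link is kept only if the rating is strictly above min_rating.
    t0 is the timestamp (in seconds) treated as day 0.
    """
    link_days = defaultdict(list)
    for user, item, rating, timestamp in ratings:
        if rating > min_rating:
            link_days[item].append((timestamp - t0) // SECONDS_PER_DAY)
    # Sort so that counting links up to a given day is a simple bisection.
    return {item: sorted(days) for item, days in link_days.items()}


# Hypothetical toy records: (user, item, rating, timestamp in seconds).
ratings = [
    ("u1", "m1", 5, 0),
    ("u2", "m1", 2, 90_000),     # rating not above 2: no link is created
    ("u3", "m1", 4, 180_000),
    ("u1", "m2", 3, 86_400),
]
link_times = build_link_times(ratings)
```

The resulting dictionary of sorted day indices is a convenient input for degree-style counts such as the number of links an item has received up to a given day.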

### Evaluation metrics

The following evaluation metrics are adopted to measure the accuracy of the proposed model: Precision ($$P_k$$), Novelty ($$Q_k$$), Temporal Novelty ($$TN_k$$) and Area Under the receiver operating characteristic Curve ($$AUC_k$$), also referred to as ROC35.

• Precision is defined as the fraction of objects that appear in the top k of both the predicted and the real ranking lists36,

$${P}_{k}=\frac{{D}_{k}}{k},$$
(12)

where $$D_k$$ is the number of common objects in the top k of both the predicted and the real ranking lists, and $$P_k \in [0, 1]$$. The higher the value of $$P_k$$, the better the precision of the prediction.
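A minimal Python sketch of Eq. 12 follows, assuming both rankings are lists of unique item identifiers ordered from most to least popular. The function name and the toy lists are illustrative.

```python
def precision_at_k(predicted, real, k):
    """Eq. 12: P_k = D_k / k, where D_k is the overlap of the two top-k lists."""
    common = set(predicted[:k]) & set(real[:k])
    return len(common) / k


# Hypothetical toy rankings, best item first.
predicted = ["a", "b", "c", "d"]
real = ["b", "a", "e", "c"]

# Top 3 predicted is {a, b, c} and top 3 real is {a, b, e}: two items in common.
p3 = precision_at_k(predicted, real, 3)
```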

• Novelty ($$Q_k$$) measures the ability of a predictor to rank in the top k a ‘new object’ that was not in the top k in the past. Let $$R_k$$ denote the number of new objects (those that were not in the top rank before) in the top k of the real list, and let $$E_k$$ denote the number of new objects correctly predicted by our model in the top k ranking list. Then the novelty score is given by

$${Q}_{k}=\frac{{E}_{k}}{{R}_{k}},$$
(13)

• AUC measures how well the relative positions of the top k objects agree between the predicted and the real ranked lists. It takes the top k objects of the real list as a benchmark and compares their rank scores with those in the top k predicted list. Let $$s_p \in L_p$$ and $$s_r \in L_r$$ be the scores of an object in the predicted and real lists, respectively. Then AUC is given by

$$AUC=\frac{\sum _{{s}_{p}\,\in \,{L}_{p}}\sum _{{s}_{r}\,\in \,{L}_{r}}I({s}_{p},{s}_{r})}{|{L}_{p}||{L}_{r}|}$$
(14)

where,

$$\begin{array}{rcl}I({s}_{p},{s}_{r}) & = & \{\begin{array}{ccc}0, & \,{\rm{if}} & {s}_{p} > {s}_{r},\\ 0.5, & \,{\rm{if}} & {s}_{p}={s}_{r},\\ 1, & \,{\rm{if}} & {s}_{p} < {s}_{r}.\end{array}\end{array}$$
(15)

• Temporal Novelty ($$TN_k$$) measures the ability of a predictor to rank in the top k a ‘new object’ that was not in the top k during the recent past time window but gained popularity during the future time window $$T_F$$. Let $${R}_{k}^{{\rm{\Delta }}t}$$ denote the number of new objects (those that were not in the top rank by popularity gain during the recent time window $$T_P$$) in the top k of the real list, and let $${E}_{k}^{{\rm{\Delta }}t}$$ denote the number of new objects correctly predicted by our model in the top k ranking list. Then the temporal novelty ($$TN_k$$) score is given by

$$T{N}_{k}=\frac{{E}_{k}^{{\rm{\Delta }}t}}{{R}_{k}^{{\rm{\Delta }}t}},$$
(16)
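The remaining metrics can be sketched in the same style. Eqs. 13 and 16 share the form E/R and differ only in which earlier top list defines a ‘new object’, so a single function covers both here. The AUC function implements Eqs. 14 and 15 literally as printed. Function names, toy inputs and the convention of returning 0.0 when there are no new objects in the real list are our assumptions.

```python
def novelty_at_k(predicted, real, previously_top, k):
    """Eqs. 13 and 16: E_k / R_k.

    previously_top holds the items that count as 'not new':
    - for Q_k, items that were in the top k at any earlier time;
    - for TN_k, items in the top k by popularity gain during T_P.
    """
    new_real = set(real[:k]) - set(previously_top)       # R_k objects
    if not new_real:
        return 0.0                                       # assumption: undefined ratio mapped to 0
    hits = new_real & set(predicted[:k])                 # E_k objects
    return len(hits) / len(new_real)


def auc(pred_scores, real_scores):
    """Eqs. 14 and 15, taken literally: average of I(s_p, s_r) over all pairs."""
    total = 0.0
    for s_p in pred_scores:
        for s_r in real_scores:
            if s_p < s_r:
                total += 1.0
            elif s_p == s_r:
                total += 0.5
    return total / (len(pred_scores) * len(real_scores))


# Hypothetical toy rankings, best item first.
predicted = ["a", "b", "c", "d"]
real = ["b", "a", "e", "c"]

# "a" was already in the top list, so the new objects in the real top 3 are b and e.
q3 = novelty_at_k(predicted, real, previously_top=["a"], k=3)
auc_value = auc([1, 2], [2, 3])
```

In the toy example only one of the two new objects (b) is found by the predicted top 3.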

## Ethics declarations

### Competing Interests

The authors declare no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Minkov, E., Kahanov, K. & Kuflik, T. Graph-based recommendation integrating rating history and domain knowledge: Application to the on-site guidance of museum visitors. Journal of the Association for Information Science and Technology (2017).

2. Liao, H., Mariani, M. S., Medo, M., Zhang, Y.-C. & Zhou, M.-Y. Ranking in evolving complex networks. Physics Reports (2017).

3. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring Limits to Prediction in Complex Social Systems. In Proceedings of the fourth ACM international conference on Web search and data mining, 65–74 (ACM, 2011).

4. Shulman, B., Sharma, A. & Cosley, D. Predictability of Popularity: Gaps between Prediction and Understanding. In ICWSM, 348–357 (2016).

5. Cheng, J. et al. Can cascades be predicted? In Proceedings of the 23rd international conference on World wide web, 925–936 (ACM, 2014).

6. Tsur, O. & Rappoport, A. What’s in a hashtag?: content based prediction of the spread of ideas in microblogging communities. In Proceedings of the fifth ACM international conference on Web search and data mining, 643–652 (ACM, 2012).

7. Zhao, Q., Erdogdu, M. A., He, H. Y., Rajaraman, A. & Leskovec, J. Seismic: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1513–1522 (ACM, 2015).

8. Ahmed, M., Spagna, S., Huici, F. & Niccolini, S. A peek into the future: Predicting the evolution of popularity of user-generated content. In Proceedings of the sixth ACM international conference on Web search and data mining, 607–616 (ACM, 2013).

9. Bauckhage, C., Kersting, K. & Hadiji, F. Mathematical models of fads explain the temporal dynamics of internet memes. In ICWSM (2013).

10. Cheng, J., Adamic, L. A., Kleinberg, J. M. & Leskovec, J. Do cascades recur? In Proceedings of the 25th International Conference on World Wide Web, 671–681 (International World Wide Web Conferences Steering Committee, 2016).

11. Abbas, K., Shang, M., Luo, X. & Abbasi, A. Emerging trends in evolving networks: Recent behaviour dominant and non-dominant model. Physica A: Statistical Mechanics and its Applications 484, 506–515 (2017).

12. Szabo, G. & Huberman, B. A. Predicting the popularity of online content. Communications of the ACM 53, 80–88 (2010).

13. Gao, S., Ma, J. & Chen, Z. Modeling and predicting retweeting dynamics on microblogging platforms. In Cheng, X., Li, H., Gabrilovich, E. & Tang, J. (eds) Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, Shanghai, China, February 2–6, 2015, 107–116 (ACM, 2015).

14. Wang, D., Song, C. & Barabási, A.-L. Quantifying long-term scientific impact. Science 342, 127–132 (2013).

15. Shen, H.-W., Wang, D., Song, C. & Barabási, A.-L. Modeling and predicting popularity dynamics via reinforced Poisson processes. In AAAI, 14, 291–297 (2014).

16. Ogata, Y., Katsura, K. & Tanemura, M. Modelling heterogeneous space–time occurrences of earthquakes and its residual analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 52, 499–509 (2003).

17. Kim, M. et al. Event diffusion patterns in social media. In ICWSM (2012).

18. Huberman, B. A. Big Data and the Attention Economy: Big Data (Ubiquity symposium). ACM, 2:1–2:7 (2017).

19. Zeng, A., Gualdi, S., Medo, M. & Zhang, Y.-C. Trend prediction in temporal bipartite networks: The case of Movielens, Netflix, and Digg. Advances in Complex Systems 16, 1350024 (2013).

20. Wu, F. & Huberman, B. A. Novelty and collective attention. Proceedings of the National Academy of Sciences of the United States of America 104, e0120735 (2007).

21. Gleeson, J. P., Cellai, D., Onnela, J.-P., Porter, M. A. & Reed-Tsochas, F. A simple generative model of collective online behavior. Proceedings of the National Academy of Sciences 111, 10411–10415 (2014).

22. Iwata, T., Shah, A. & Ghahramani, Z. Discovering latent influence in online social activities via shared cascade Poisson processes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 266–274 (ACM, 2013).

23. Shahzamal, M. et al. Airborne disease propagation on large scale social contact networks. In Proceedings of the 2nd International Workshop on Social Sensing, 35–40 (ACM, 2017).

24. Zelner, J. L., Lopman, B. A., Hall, A. J., Ballesteros, S. & Grenfell, B. T. Linking time-varying symptomatology and intensity of infectiousness to patterns of norovirus transmission. PloS one 8, e68413 (2013).

25. Mariani, M. S., Medo, M. & Zhang, Y.-C. Identification of milestone papers through time-balanced network centrality. Journal of Informetrics 10, 1207–1223 (2016).

26. Taylor, D., Myers, S. A., Clauset, A., Porter, M. A. & Mucha, P. J. Eigenvector-based centrality measures for temporal networks. Multiscale Modeling & Simulation 15, 537–574 (2017).

27. Oestreicher-Singer, G. & Sundararajan, A. Recommendation networks and the long tail of electronic commerce (2010).

28. Zangerle, E., Gassler, W. & Specht, G. On the impact of text similarity functions on hashtag recommendations in microblogging environments. Social network analysis and mining 3, 889–898 (2013).

29. Zhu, H., Wang, X. & Zhu, J.-Y. Effect of aging on network structure. Physical Review E 68 (2003).

30. Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P. & Tita, G. E. Self-exciting point process modeling of crime. Journal of the American Statistical Association 106, 100–108 (2011).

31. Parolo, P. D. B. et al. Attention decay in science. Journal of Informetrics 9, 734–745 (2015).

32. Harper, F. M. & Konstan, J. A. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 19 (2016).

33.

34. Viswanath, B., Mislove, A., Cha, M. & Gummadi, K. P. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, 37–42 (2009).

35. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143, 29–36 (1982).

36. Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22, 5–53 (2004).

## Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 91646114) and Chongqing research program of technology innovation and application under grant cstc2017rgzn-zdyfX0020.

## Author information

K.A., A.A. and M.S. designed the research. K.A. performed the experiment. K.A., A.A. and J.J.X. wrote the paper. K.A., M.S., A.A., X.L. and Y.Z. analysed the results. All authors have reviewed the paper.

Correspondence to Khushnood Abbas or Mingsheng Shang or Alireza Abbasi.
