Introduction

Humans are strongly drawn to new experiences that eventually become pleasant daily routines. Playing a musical instrument, speaking a foreign language, and practicing sports or hobbies are all activities whose full enjoyment requires an investment of time and training that eventually pays off. One interesting question is how humans get engaged and come to love these activities, which offer both a challenge and an intrinsic reward. What is the training or learning process, and how does it affect their level of enjoyment? How can we measure and quantify fun?

Before the era of modern technology, answering this type of question relied on knowledge accumulated from qualitative observations of single individuals, made in different conditions by different observers. This makes it very difficult to extract general laws of human behavior. The widespread use of the Internet and the worldwide connectivity that it provides are changing this picture radically and fast. For the first time in human history, it is possible to monitor human actions on an unprecedentedly large scale, allowing us to uncover precise and quantitative laws of human behavior1,2,3. Nowadays, we can measure, with impressive precision, our mobility patterns4,5, our musical tastes6,7, or the way in which ideas spread and crystallize across populations8,9, providing us with a very accurate picture of some of the key aspects of human behavior at the large scale10,11,12,13.

Fostered by the spread of smartphones and tablets, casual video games are among the most popular current amusements. These are games with simple rules and game dynamics that can be played in brief bursts in a casual way, e.g. during breaks or daily commuting. Some of these games, like Candy Crush Saga (the flagship game of King Digital Entertainment), have reached outstanding popularity. As of the fourth quarter of 2018, King’s games were played by 268 million monthly active players, with millions of players playing many millions of levels every day in Candy Crush Saga alone14. Hence, they are an ideal platform for studying how humans become engaged in a rewarding activity.

There is a vast literature on measuring video game engagement and enjoyment. However, most of these studies are based on (1) surveys with a moderate number of individuals15,16,17,18, (2) measurements of physiological metrics on players while they are playing19, or (3) studies of psychological motivations18,20,21,22. In this paper, it is not our intention to enter into the psychological, motivational, behavioral, or social aspects of video game playing, nor to criticize the standard psychometric, behavioral, or physiological metrics, or the questionnaire-based evaluation of engagement performed on a limited number of individuals (typically aware of being the subject of study) over a short time span. Our work is radically different, as it approaches the problem from a data-driven point of view by analyzing the real behavior of a large population of individuals as they play the game. In some of the games we have analyzed, we follow the individual behavior of a cohort of 10 million players during a period of two years. This amount of data allows us to quantify empirically users’ engagement vs progression in a way that was not possible before the big data era. Moreover, our analysis reveals a scaling law that is universal (across many different games, player segmentations, and countries), with profound theoretical implications.

Specifically, we show that the progression, engagement, and quitting of players in casual games can be analyzed and simulated using a simple stochastic model. The level of enjoyment and engagement of a fun activity, like a video game, can be measured and shows a common scaling behavior described by a power law as a function of the progression in the game. This result suggests that enjoyment, like popularity, wealth, and many other phenomena, is a multiplicative process23,24,25,26: the more you are into it, the more engaged you become. Our empirical findings have interesting implications not only for casual games but also for generic engagement dynamics in a variety of different activities, reflecting a global trend of human behavior.

Results

Casual games

Many typical casual games, like Candy Crush Saga, pose a linear sequence of levels that a player can access one by one as the previous level is successfully completed (see Fig. 1). Players start the game at level 1 and progress level by level in increasing order. At each level, the player must achieve a predefined goal to pass it (e.g. collect a specific number of candies or reach a certain score) using a limited number of moves, resources, or time. Each attempt to pass a level can be successful, meaning that the player passes that level and can play the next one, or unsuccessful. Alternatively, the player can become tired or frustrated at some point and decide to quit the game. Each level always involves randomness, either in the initial configuration or in the dynamics. This makes it natural to model game dynamics as a stochastic process27,28.

Figure 1

(left) Map of the linear sequence of levels of a saga game. Players start at level 1 and take a different number of attempts to pass each level, progressing until eventually they decide to abandon the game. (right) Typical trace of the progression of a player measured as the highest level achieved after a total number of accumulated attempts.

To model player progression and experience in the game, we use two general indicators: one to quantify the total time spent in the game and another one to measure the progression within the game. In casual games, the real-time activity (i.e. how often and for how long the person plays the game) is not a good measure of the actual time spent in the game. This is because these games are very often played in short breaks or free time, which is unpredictable and not controlled by the player. Instead, we use the accumulated number of attempts as an activity-independent measure of the total “time” spent in the game. The maximum level achieved after a given number of attempts is an indicator of game progression, i.e., of how far a player is in the game (see Fig. 1b). With this strategy, we monitor the actual progression of players in the game decoupled from their real-world activity.

For these games, the dynamics of game progression can be modeled in a very simple way using Continuous Time Random Walks (CTRW)29,30, as described in detail in the Supplementary Information (SI). In our model, we assume that all players can be considered as identical and independent. When a player reaches a new level, there are two competing random processes taking place simultaneously: (1) the random number of attempts required to pass that level, τp, and (2) the random time, measured in number of attempts, that the player takes to get bored or frustrated and decides to abandon the game, τa. For a given level, the final fate of the player depends on which of these random times is shorter. If τp < τa, the player passes the level and jumps to the next one; otherwise the player quits the game. These two times are assumed to be statistically independent random variables with probability density functions \({\psi }_{n}^{p}(t)\) and \({\psi }_{n}^{a}(t)\). In short, \({\psi }_{n}^{p}(t)\) controls the time that the player would take to pass level n if he/she were not allowed to abandon the game. Similarly, \({\psi }_{n}^{a}(t)\) defines the time the player would take to abandon the game if level n were impossible to pass (without the player knowing it). In the simplest version of the model, pass and abandon times at level n are taken to be Poisson point processes and, therefore, their probability density functions are31

$${\psi }_{n}^{p}(t)=\frac{1}{{\bar{t}}_{p}(n)}{e}^{-t/{\bar{t}}_{p}(n)},\ \ {\psi }_{n}^{a}(t)=\frac{1}{{\bar{t}}_{a}(n)}{e}^{-t/{\bar{t}}_{a}(n)},$$
(1)

where \({\bar{t}}_{p}(n)\) and \({\bar{t}}_{a}(n)\) are the average time to pass or abandon at level n, respectively. In the SI file, we show that although this assumption is not totally correct, deviations from the exponential hypothesis only occur in around 0.1% of the observed abandon and pass times. With this choice, \({\bar{t}}_{p}(n)\) and \({\bar{t}}_{a}(n)\) are the main ingredients of the model. Specifically, \({\bar{t}}_{p}(n)\) is a measure of the relative difficulty (or relative cost) of that particular level, whereas \({\bar{t}}_{a}(n)\) is a measure of the engagement of a player at that particular level. Both times can be easily measured for an arbitrary dataset as (see Methods)

$${\bar{t}}_{p}(n)=\frac{{\bar{t}}_{p}^{emp}(n)}{1-{p}_{c}(n)}\quad \text{and}\quad {\bar{t}}_{a}(n)=\frac{{\bar{t}}_{p}^{emp}(n)}{{p}_{c}(n)},$$
(2)

where \({{\bar{t}}_{p}}^{emp}(n)\) is the empirical mean time to pass level n, and pc(n) is the probability to churn at that level. The empirical time to pass, \({{\bar{t}}_{p}}^{emp}(n)\), is just the average number of attempts that players who passed level n needed to pass it. The churn probability pc(n) is the total number of players that abandoned at level n divided by the total number of players that played level n. We consider that a player has abandoned the game when the player shows no activity during the remainder of the observation time window. Consequently, the estimation of pc, and so of \({\bar{t}}_{a}\), depends on the observation time window of the dataset. In the SI file, we report estimations of \({\bar{t}}_{a}\) for the same cohort of players when the total observation window is increased from 60 to 600 days. The abandon time nicely collapses into a clean power law as we increase the size of the window.
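As an illustration, the estimation in Eq. (2) can be sketched in a few lines of Python. This is only a minimal sketch: the function name `estimate_times` and the input layout (one record per player who played the level, with the number of attempts spent there and a flag telling whether the level was passed) are assumptions made for the example, not the format of the datasets analyzed here.

```python
import numpy as np

def estimate_times(attempts, passed):
    """Estimate (t_p, t_a) at a single level from per-player records.

    attempts: attempts each player spent at this level (hypothetical layout)
    passed:   True if the player passed the level, False if the player churned there
    Assumes that at least one player passed the level.
    """
    attempts = np.asarray(attempts, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    p_c = 1.0 - passed.mean()            # churn probability p_c(n)
    t_emp = attempts[passed].mean()      # empirical mean pass time, passers only
    t_p = t_emp / (1.0 - p_c)            # Eq. (2), pass time
    # With no observed churn the abandon time cannot be bounded from the data
    t_a = t_emp / p_c if p_c > 0 else np.inf
    return t_p, t_a

# Toy example: five players, three of whom passed the level
t_p, t_a = estimate_times([2, 4, 6, 3, 5], [True, True, True, False, False])
```

Note that the attempts of players who churned enter the estimate only through pc(n), in line with the definition of the empirical pass time as an average over players who passed.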

Measuring engagement

Figure 2 shows an example of the average abandon and pass times of the different levels of a game, along with the behavior of the survival probability of players in the game as a function of the level and attempts, respectively. The data correspond to a one-week cohort of 11,836,502 Candy Crush Saga players on the Facebook platform who started playing in 2014 and were followed for 2 years.

Figure 2

(a) Average abandon (in blue) and pass (in orange) times measured at each level of the game Candy Crush Saga from a one-week cohort of 11,836,502 players with install dates in the first week of 2014, playing on the Facebook platform and followed for 2 years. The data have been binned, plotted in double logarithmic scale, and fitted to power laws \({\bar{t}}_{a}(n) \sim {n}^{\alpha }\) and \({\bar{t}}_{p}(n) \sim {n}^{\beta }\), obtaining the values of the exponents α and β indicated in the legend. (b) Simulations of the model (orange) reproduce the actual survival of players (blue in levels, green in accumulated attempts) within the statistical uncertainty.

The empirical data reveal a very interesting behavior for the abandon time, \({\bar{t}}_{a}(n)\). After an initial number of levels, typically 10–20, where the player is discovering the game (or the activity) and deciding whether he/she likes it or not, the engagement follows a power-law behavior of the form \({\bar{t}}_{a}(n) \sim {n}^{\alpha }\), with an exponent α around 1.1. (The value of the exponent for most analyzed games is in the range [1.0, 1.5]. In short datasets there is a small plateau at high levels, a consequence of the finite observation time window and of the end-of-content effect; see Fig. S5.) As a consequence of such a fast growth rate, players behave very differently depending on their progression throughout the game, suggesting a “happy-get-happier” mechanism as a final explanation. The average pass time, on the other hand, is an indicator of the relative difficulty of the level as perceived by a player that has reached level n by his/her own means. Therefore, \({\bar{t}}_{p}(n)\) is a combination of the intrinsic difficulty of the level and the learning curve of players32 and, in general, we expect it to show a concave dependency on the progression level n. Consider, for instance, the case of learning a musical instrument. It is clear that the Minuet in G (BWV 114) from the Notebook of Anna Magdalena Bach is objectively simpler than the Bach-Brahms Chaconne in D minor BWV 1004 (for the left hand alone). Yet, the effort to learn the former (and so to advance in the progression) is perceived by a first-year piano student as higher than the effort to learn the latter as perceived by, for instance, the great piano player Daniil Trifonov. We thus expect \({\bar{t}}_{p}(n)\) to grow with n in a concave way, that is, more slowly than linearly. Our empirical analysis indicates that, indeed, this is the case. As a matter of fact, in the studied datasets, the average pass time after the first 10–20 tutorial levels can be reasonably fitted by a power law \({\bar{t}}_{p}(n) \sim {n}^{\beta }\), with an exponent β in the range [0.1, 0.5].
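A power-law exponent of this kind can be estimated with a straight-line fit in double logarithmic scale. The sketch below is one simple way to do it, not necessarily the fitting procedure used for the figures: it skips the binning step, and both the function name `fit_power_law` and the cutoff `n_min` (used to drop the initial tutorial levels) are hypothetical choices made for the example.

```python
import numpy as np

def fit_power_law(levels, times, n_min=20):
    """Least-squares fit of times ~ prefactor * levels**exponent in log-log space.

    n_min is a hypothetical cutoff that discards the initial tutorial levels.
    Returns (exponent, prefactor).
    """
    levels = np.asarray(levels, dtype=float)
    times = np.asarray(times, dtype=float)
    mask = levels >= n_min
    # A power law is a straight line in log-log scale: log t = exponent * log n + log prefactor
    slope, intercept = np.polyfit(np.log(levels[mask]), np.log(times[mask]), 1)
    return slope, np.exp(intercept)

# Synthetic check on noiseless data with a known exponent
n = np.arange(1, 501)
alpha, a = fit_power_law(n, 1.5 * n**1.1)
```

On real data, the fitted exponent will depend on the chosen cutoff and on how the noisy high-level tail is binned, so both should be reported alongside the result.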

These scaling laws have important consequences for the global dynamics of the game. Indeed, as we show in the SI, there is an (infinite order) phase transition as a function of the parameters α and β between a standard phase, where all players eventually quit the game, and an “enthusiastic” phase, where a finite fraction of players never abandon the game. For α − β < 1, the probability of a player quitting the game at level n or higher follows a Weibull distribution of parameter β − α + 1, that is, \(S(n)\approx {e}^{-\mu ({n}^{\beta -\alpha +1}-1)}\). In this standard phase, the probability that a player never abandons the game is zero. Instead, when α − β > 1 there is a finite probability that players never abandon the game, provided that the game has infinite content. This probability can be computed as \({S}_{\infty }\approx {e}^{-{\sum }_{n=1}^{\infty }{\bar{t}}_{p}(n)/{\bar{t}}_{a}(n)}\) (see SI for a formal proof). In this “enthusiastic” phase, the survival probability for those players that eventually do abandon the game follows a power law of the form \({S}_{f}(n) \sim {n}^{1+\beta -\alpha }\). This implies that the higher the value of α − β, the faster \({S}_{f}(n)\) decays, so that either players abandon the game at the beginning of the progression, or they keep playing forever. Interestingly, all the analyzed casual games seem to be below but very close to the critical point α = 1 + β, so that the survival probability is well described by a Weibull distribution. Figure 3a shows simulation results of this phase transition as compared to the theoretical approximation for \({\bar{t}}_{p}(n)=b{n}^{\beta }\), \({\bar{t}}_{a}(n)=a{n}^{\alpha }\) with β = 0.4, a = 1.5, and b = 1. The critical point αc = 1.4 and the behavior of \({S}_{\infty }\) close to the critical point are both very well reproduced by the theoretical approximation. Figure 3b shows the survival probability for finite realizations \({S}_{f}(n)\) below, at, and above the critical point αc. The agreement with the theoretical predictions is remarkable.
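The two phases can also be seen by evaluating the survival probability directly. In the model, a player reaches level n only by passing every previous level, and each level k is passed with probability 1 − pc(k), with pc(k) given by Eq. (5) in the Methods. The sketch below evaluates that product numerically; it is a direct evaluation under the model assumptions, not the analytic approximation derived in the SI, and the function name `survival` and the truncation `n_max` are choices made only for this example.

```python
import numpy as np

def survival(alpha, beta, a=1.5, b=1.0, n_max=10**5):
    """Probability of reaching level n, for n = 1..n_max, with power-law mean times.

    Uses t_p(k) = b * k**beta and t_a(k) = a * k**alpha. Entry i of the
    returned array corresponds to level i + 1.
    """
    k = np.arange(1, n_max, dtype=float)
    t_p = b * k**beta
    t_a = a * k**alpha
    # log(1 - p_c(k)) with p_c(k) = t_p / (t_p + t_a), as in Eq. (5)
    log_stay = np.log(t_a) - np.log(t_p + t_a)
    # S(1) = 1; S(n) is the product of the pass probabilities of levels 1..n-1
    return np.exp(np.concatenate(([0.0], np.cumsum(log_stay))))

S_below = survival(alpha=1.0, beta=0.4)   # alpha - beta < 1: decays toward zero
S_above = survival(alpha=2.4, beta=0.4)   # alpha - beta > 1: levels off at a finite value
```

Working with the cumulative sum of logarithms rather than a running product keeps the evaluation stable when the survival probability becomes very small.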

Figure 3

Numerical simulations of the phase transition. Left, red squares are results of numerical simulations of the probability of a realization to never end as a function of α for a fixed value of β. We use the algorithm described in the Methods section with \({\bar{t}}_{p}(n)=b{n}^{\beta }\), \({\bar{t}}_{a}(n)=a{n}^{\alpha }\) with β = 0.4, a = 1.5, and b = 1. Solid line is the approximate analytic solution derived in the SI. Notice the smooth approach to the critical point coming from the right, as a consequence of the transition being of infinite order. Right, survival probability of finite realizations below (α = 1), above (α = 3.4), and at the critical point (α = 1.4). Finite realizations are defined as those ending at \(n < {n}_{max}\) with \({n}_{max}={10}^{7}\). Other values of \({n}_{max}\) do not change the results significantly. Dashed lines are the analytic predictions.

Mimicking player progression by simulation

In our model, we make three main assumptions: (i) the pass and abandon times are independent; (ii) both times are exponentially distributed; and (iii) all players can be considered as statistically identical. To verify the validity of these assumptions, (i) we performed a detrended fluctuation analysis that verifies that both times are truly independent (see Fig. S2); (ii) we verified that the distributions of abandon and pass times of all levels are exponential to a very good approximation (see Fig. S3); and (iii) we show that considering all players as identical reproduces their progression and survival accurately. To test the validity of this last assumption and of the model, we simulated the progression and churn of a cohort of identical players using the simple stochastic algorithm described in the Methods section, with the abandon and pass times measured for the real dataset as input. Figure 2b compares the real data with the results of the simulations for the survival probability in levels and attempts (i.e. the fraction of initial players still active after playing a given number of attempts or levels). The simulations nicely reproduce the real survival (except for the small finite-size effects in the tail), supporting the validity of the model and of the assumption that all players can be considered as identical. Accordingly, the abandon time is indeed an intrinsic, difficulty-independent measure of the average engagement of players at that level. Hence, a remarkable aspect of the model is that it can quantitatively measure human engagement and how it evolves as players progress in the game.

Universal behavior

The data for the abandon time shown in Fig. 2 for a specific game (Candy Crush Saga) clearly show that the engagement increases as a power law as the player gets more into the game. We repeated the analysis for different Saga games: Farm Heroes, Papa Pear, Candy Crush Soda and Pyramid Solitaire (see Fig. 4). These games are very different in terms of genre (e.g. Candy Crush is a match-three swapping tile game; Papa Pear is a physics-based bouncing game; Pyramid is a card solitaire), targeted audience, graphics, mechanics and design. Remarkably, all of them exhibit a common power-law behavior of engagement, suggesting that this evolution of the engagement in a fun activity may be universal. The same happens when we analyze data corresponding to players from different continents, platforms and periods of time (see Fig. S4).

Figure 4

Comparison of mean abandon times measured in attempts of different popular Saga games from King: Candy Crush, Candy Crush Soda, Farm Heroes, Papa Pear, and Pyramid Solitaire. In all of them, the abandon time, and thus the engagement, increases as a power law after the initial 10–20 levels, once players have learnt the dynamics of the game. All data correspond to one-week cohorts of installs followed for 2 years. Data from Candy Crush and Pyramid Solitaire are from players on Facebook in the first week of 2014 and the week from 11-10-2013 to 17-10-2013, respectively; the remaining games are from all platforms and the first week of installs in 2017.

Discussion

We have seen that it is possible to quantify and model progression and churn of a playful activity or habit, like a videogame, as a competition between two ingredients: relative difficulty and engagement. Our big data analysis of the system allowed us to find a very precise measure of engagement, which shows a power-law trend indicative of a happy-get-happier mechanism. In this work, we have focused on the particular case of engagement in videogames since, to the best of our knowledge, it is the only system where the amount of available data allows us to elucidate sound statistical laws. However, we believe the process can be generalized to describe engagement in other activities: difficulty is a measure of the training cost and engagement is a measure of the reward or tolerance. Our model shows that a delicate balance between these two ingredients is needed to avoid early churn and that a very difficult or traumatic experience at the initial stages would lead to massive churn. In addition, there is an interesting phase transition, controlled by the relative growth of difficulty and engagement along the progression, that leads to a finite probability that the person never abandons the activity. An interesting example is learning to play a musical instrument and, in general, any rewarding intellectual activity, like doing scientific research or artistic creation. Our model predicts a phase where the probability that individuals never abandon the activity is non-zero. This may seem obvious in these cases. Indeed, after many years of intense training, it is very unlikely that a person who has reached an advanced level would stop playing the piano or doing research33. Certainly, the amount of content in such disciplines is, basically, unlimited and the intellectual reward of continuing to do them is so high that it would be highly improbable that anyone at an advanced level would quit the activity. The importance of our framework lies precisely in its ability to explain when this behavior is possible and under what precise conditions. The model could be helpful to perform a similar analysis in other fields, to quantify tolerance and enjoyment, and to design smooth learning procedures that facilitate, for instance, healthy habits (like sports) or that minimize early school leaving.

Methods

Empirical estimation of average abandon and pass times of individual levels

In our model, we assume that pass and abandon times are statistically independent random variables exponentially distributed according to Eq. (1). For mathematical tractability we take t as a continuous variable. This assumption does not affect any of the conclusions of this work. The corresponding survival probabilities, representing the probability that the time required to pass or abandon at level n is larger than t are:

$${\Psi }_{n}^{p}(t)={\int }_{t}^{\infty }{\psi }_{n}^{p}(\tau )d\tau ={e}^{-t/{\bar{t}}_{p}(n)}$$
(3)

and

$${\Psi }_{n}^{a}(t)={\int }_{t}^{\infty }{\psi }_{n}^{a}(\tau )d\tau ={e}^{-t/{\bar{t}}_{a}(n)}.$$
(4)

The average abandon, \({\bar{t}}_{a}(n)\), and pass, \({\bar{t}}_{p}(n)\), times cannot be measured directly from the data. The reason is that abandon and pass times are unconditioned random processes; that is, \({\psi }_{n}^{p}(t)\), for instance, accounts for the distribution of pass times at level n if players were not allowed to quit the game, which is a condition that is not met in a real dataset. Instead, the empirical observables are the churn probability at level n, pc(n), defined as the number of players that churned at level n divided by the total number of players that reached that level, and the empirical pass time, \({{\bar{t}}_{p}}^{emp}(n)\), defined as the average time to pass level n for those players that actually passed the level (and, therefore, did not churn).

In the model, the churn probability can be evaluated as the probability that the time to abandon level n (whatever value it takes) is smaller than the time to pass it. In mathematical terms, this is simply expressed as

$${p}_{c}(n)={\int }_{0}^{\infty }{\psi }_{n}^{a}(\tau ){\Psi }_{n}^{p}(\tau )d\tau =\frac{{\bar{t}}_{p}(n)}{{\bar{t}}_{p}(n)+{\bar{t}}_{a}(n)}.$$
(5)

Similarly, \({{\bar{t}}_{p}}^{emp}(n)\) can be evaluated mathematically in the model as

$${\bar{t}}_{p}^{emp}(n)=\frac{{\int }_{0}^{\infty }\tau {\psi }_{n}^{p}(\tau ){\Psi }_{n}^{a}(\tau )d\tau }{{\int }_{0}^{\infty }{\psi }_{n}^{p}(\tau ){\Psi }_{n}^{a}(\tau )d\tau }=\frac{{\bar{t}}_{p}(n){\bar{t}}_{a}(n)}{{\bar{t}}_{p}(n)+{\bar{t}}_{a}(n)}.$$
(6)

By inverting the last two equations, we obtain

$${\bar{t}}_{p}(n)=\frac{{\bar{t}}_{p}^{emp}(n)}{1-{p}_{c}(n)}\quad \text{and}\quad {\bar{t}}_{a}(n)=\frac{{\bar{t}}_{p}^{emp}(n)}{{p}_{c}(n)},$$
(7)

relating the parameters of the model \({\bar{t}}_{p}(n)\) and \({\bar{t}}_{a}(n)\) with two quantities that can be directly measured in empirical datasets, namely \({{\bar{t}}_{p}}^{emp}(n)\) and pc(n).
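For completeness, the intermediate steps of this calculation can be written out explicitly. The following is a sketch under the exponential forms of Eqs. (1), (3), and (4); the shorthand \({\lambda }_{n}=1/{\bar{t}}_{p}(n)+1/{\bar{t}}_{a}(n)\) is a symbol introduced only for this derivation. With it, both integrals become elementary:

$${p}_{c}(n)=\frac{1}{{\bar{t}}_{a}(n)}{\int }_{0}^{\infty }{e}^{-{\lambda }_{n}\tau }d\tau =\frac{1}{{\bar{t}}_{a}(n){\lambda }_{n}},\qquad {\bar{t}}_{p}^{emp}(n)=\frac{{\int }_{0}^{\infty }\tau {e}^{-{\lambda }_{n}\tau }d\tau }{{\int }_{0}^{\infty }{e}^{-{\lambda }_{n}\tau }d\tau }=\frac{1}{{\lambda }_{n}}.$$

Dividing the second result by the first gives the abandon time, and using \({\bar{t}}_{a}(n){\lambda }_{n}-1={\bar{t}}_{a}(n)/{\bar{t}}_{p}(n)\) gives the pass time:

$$\frac{{\bar{t}}_{p}^{emp}(n)}{{p}_{c}(n)}={\bar{t}}_{a}(n),\qquad 1-{p}_{c}(n)=\frac{1}{{\bar{t}}_{p}(n){\lambda }_{n}}\ \Rightarrow \ \frac{{\bar{t}}_{p}^{emp}(n)}{1-{p}_{c}(n)}={\bar{t}}_{p}(n).$$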

Stochastic simulations of player progression and churn

To simulate the model, we only need as input information about \({\bar{t}}_{p}(n)\) and \({\bar{t}}_{a}(n)\), i.e. the average time to pass or abandon at level n, respectively. In the simulations, for each player starting at t = 1 at level n = 1, we perform the following steps34:

  1. Being at level n at time t, generate two random numbers r1 and r2, uniformly distributed in (0, 1).

  2. Use these random numbers to calculate the time to pass that level as \({\tau }_{p}=-{\bar{t}}_{p}(n)\ln {r}_{1}\) and the time to abandon that level as \({\tau }_{a}=-{\bar{t}}_{a}(n)\ln {r}_{2}\).

  3. If τp ≤ τa, the player jumps to level n + 1, time is advanced to t + τp, and the procedure returns to step 1.

  4. If τp > τa, the player churns at time t + τa at level n.

The whole procedure is then repeated for another player up to a total of N1 players that are used to evaluate the survival curves. The survival curves are calculated as the fraction of the initial number of players that survived up to a given total number of attempts or levels. The validation of the model was performed using the average abandon and pass times measured from the real dataset and represented in Fig. 2a. An excellent agreement was also obtained using the power-law fit as input for the abandon times.
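The procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the code used for the figures: the function names, the array-based inputs, and the decision to stop a player who passes the last available level (finite content) are assumptions of the sketch.

```python
import numpy as np

def simulate_players(t_p, t_a, n_players, seed=0):
    """Simulate progression and churn for a cohort of identical players.

    t_p[i], t_a[i]: mean pass and abandon times at level i + 1.
    Returns, per player, the last level reached and the accumulated time.
    Content is finite here (an assumption of this sketch): a player who
    passes the last level stops at level len(t_p) + 1.
    """
    rng = np.random.default_rng(seed)
    t_p = np.asarray(t_p, dtype=float)
    t_a = np.asarray(t_a, dtype=float)
    last_level = np.empty(n_players, dtype=int)
    total_time = np.empty(n_players)
    for i in range(n_players):
        n, t = 1, 1.0                      # start at level 1 and t = 1, as in the text
        while n <= len(t_p):
            # 1 - random() lies in (0, 1], so the logarithm is always finite
            r1, r2 = 1.0 - rng.random(2)
            tau_p = -t_p[n - 1] * np.log(r1)   # exponential time to pass
            tau_a = -t_a[n - 1] * np.log(r2)   # exponential time to abandon
            if tau_p <= tau_a:
                n, t = n + 1, t + tau_p        # pass: move to the next level
            else:
                t += tau_a                     # churn at level n
                break
        last_level[i], total_time[i] = n, t
    return last_level, total_time

def survival_in_levels(last_level, n_levels):
    """Fraction of the initial players that reached each level 1..n_levels."""
    return np.array([(last_level >= n).mean() for n in range(1, n_levels + 1)])

# Example input: power-law mean times over 100 levels (illustrative values)
levels = np.arange(1, 101)
reached, elapsed = simulate_players(levels**0.4, 1.5 * levels**1.1, n_players=2000)
S = survival_in_levels(reached, 100)
```

The survival in accumulated attempts can be obtained in the same way from the second returned array, by counting the fraction of players whose final time exceeds a given value.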