Morningness–eveningness assessment from mobile phone communication analysis

Human behaviour follows a 24-h rhythm and is known to be governed by individual chronotypes. Owing to the widespread use of technology in our daily lives, it is possible to record the activities of individuals through their digital traces. In the present study we utilise a large mobile phone communication dataset containing time stamps of calls and text messages to study the circadian rhythms of anonymous users in a European country. After removing the effect of the synchronization of the calling activity with the East-West progression of the sun, we use two closely related approaches to heuristically compute the chronotypes of the individuals in the dataset and to identify them as morning persons (“larks”) or evening persons (“owls”). Using the computed chronotypes, we show that the chronotype depends largely on age, with younger cohorts more likely to be owls than older cohorts. Moreover, our analysis shows that, on average, females have distinctly different chronotypes from males: younger females are more larkish than males, while older females are more owlish. Finally, we study the period of low calling activity of each user, which is considered a marker of their sleep period during the night. We find that while “extreme larks” tend to sleep more than “extreme owls” on weekends, there is little difference between them on weekdays. In addition, we observe that females tend to sleep less than males on weekdays, while there is little difference between them on weekends.


S1. PRINCIPAL COMPONENT ANALYSIS OF THE MID-SLEEP TIMES
We have performed a principal component analysis (PCA) on the mid-sleep times in the vector $\mathbf{T}_{\mathrm{mid}} = \{T^{\mathrm{Weekday}}_{\mathrm{mid}}, T^{\mathrm{Friday}}_{\mathrm{mid}}, T^{\mathrm{Saturday}}_{\mathrm{mid}}, T^{\mathrm{Sunday}}_{\mathrm{mid}}\}$, as discussed in the main text. The loadings of the $T^{d}_{\mathrm{mid}}$ are summarised in Table S1. Since all the loadings are negative, we adopt −PC1 as a convention for studying the chronotypes derived from $\mathbf{T}_{\mathrm{mid}}$. Reversing the sign does not affect the PCA results, because a principal component is defined only up to its sign: an eigenvector and its negative span the same principal axis. Moreover, we found that the correlation between PC1 and the g chronotype (discussed in detail in later sections) is −0.73. We therefore reversed the sign of the non-regressed values of $T^{d}_{\mathrm{mid}}$ so that all chronotypes follow the same convention.
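The sign convention above can be illustrated with a minimal NumPy sketch. It assumes a hypothetical array `T_mid` of shape (n_users, 4) holding the Weekday, Friday, Saturday and Sunday mid-sleep times, and it runs the PCA on the correlation matrix, which is our assumption rather than a detail stated in the text. The helper `align_sign` (also hypothetical) shows one way to enforce a shared convention: flip a score whose correlation with a reference score, such as the g chronotype, is negative.

```python
import numpy as np


def first_pc_with_sign_convention(T_mid):
    """Loadings and scores of the first principal component of T_mid.

    T_mid: hypothetical array of shape (n_users, 4) with mid-sleep times for
    Weekday, Friday, Saturday and Sunday.
    """
    X = np.asarray(T_mid, dtype=float)
    # Standardise columns so the PCA uses the correlation matrix (assumption).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    v1 = eigvecs[:, -1]
    loadings = v1 * np.sqrt(eigvals[-1])
    scores = Z @ v1
    # An eigenvector is defined only up to sign; if every loading is
    # negative, report -PC1 instead, following the convention described above.
    if np.all(loadings < 0):
        loadings, scores = -loadings, -scores
    return loadings, scores


def align_sign(scores, reference):
    """Hypothetical helper: flip `scores` if they correlate negatively with `reference`."""
    r = np.corrcoef(scores, reference)[0, 1]
    return -scores if r < 0 else scores


# Usage on synthetic data (illustrative only, not the study's data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(1000, 1))  # common per-user timing tendency
T_demo = 4.0 + shared + 0.5 * rng.normal(size=(1000, 4))
loadings, pc_scores = first_pc_with_sign_convention(T_demo)
print(np.round(loadings, 2))
```

Because the flip only multiplies the component by −1, explained variance and the magnitudes of the loadings are unchanged; only the direction in which scores are read changes.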

S4. EXPLORATORY BIFACTOR ANALYSIS
We perform an exploratory bifactor analysis to compute the g chronotype described in the main text. The loadings of all the factors, namely the g chronotype, F1* and F2*, are summarised in … is used to identify their chronotypes. F1* and F2* are group factors that load separately onto the morning and the evening activities.
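To make the scoring step concrete, the sketch below computes regression (Thurstone) factor scores from a bifactor loading matrix with columns g, F1* and F2*. The text does not state which scoring method was used, so the regression method is our assumption, and the loadings (g loading positively on four hypothetical morning activities and negatively on four evening activities, with F1* and F2* each on one block) are placeholders rather than the values in the paper. Fitting the bifactor model itself (extraction plus bifactor rotation) is not shown.

```python
import numpy as np


def regression_factor_scores(Z, loadings, uniquenesses):
    """Regression (Thurstone) scores for orthogonal factors.

    Z: (n_users, p) standardised activities.
    loadings: (p, k) loading matrix, here with columns [g, F1*, F2*].
    uniquenesses: (p,) unique variances.
    """
    L = np.asarray(loadings, dtype=float)
    # Model-implied correlation matrix: Sigma = L L^T + diag(psi).
    sigma = L @ L.T + np.diag(uniquenesses)
    # Score weights Sigma^{-1} L, via a linear solve rather than an explicit inverse.
    weights = np.linalg.solve(sigma, L)
    return Z @ weights


# Placeholder bifactor loadings: 4 morning (MA) then 4 evening (EA) variables.
g = np.r_[np.full(4, 0.6), np.full(4, -0.6)]  # general factor on all activities
f1 = np.r_[np.full(4, 0.4), np.zeros(4)]      # group factor on the morning block
f2 = np.r_[np.zeros(4), np.full(4, 0.4)]      # group factor on the evening block
L = np.column_stack([g, f1, f2])
psi = 1.0 - (L ** 2).sum(axis=1)              # unit-variance indicators

# Simulate users from this model (synthetic, illustrative only).
rng = np.random.default_rng(1)
F_true = rng.normal(size=(5000, 3))
X = F_true @ L.T + rng.normal(size=(5000, 8)) * np.sqrt(psi)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = regression_factor_scores(Z, L, psi)
g_scores = scores[:, 0]  # estimated g chronotype per user
```

The first column of `scores` plays the role of a per-user g chronotype; the other two columns are the group-factor scores, which absorb variation specific to the morning or the evening block.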

S5. REGRESSION OF THE CALLING ACTIVITIES
We again carry out a regression on the morning ($\{MA^{d}\}$) and evening ($\{EA^{d}\}$) activities of the individuals for each day d of the week, using longitude and latitude as independent variables, to remove the geographical effect. The results of this regression are reported in Table S5.
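A minimal sketch of this geographical regression, assuming ordinary least squares with an intercept fitted separately to each day's activity column (the text does not specify the estimator). The hypothetical function `residualise_on_location` returns the coefficients and the residuals on which the subsequent factor analyses operate; the data in the usage lines are synthetic.

```python
import numpy as np


def residualise_on_location(activity, longitude, latitude):
    """OLS of each activity column on longitude and latitude (with intercept).

    activity: (n_users, n_days) array, e.g. MA^d or EA^d for each day d.
    Returns the (3, n_days) coefficients and the (n_users, n_days) residuals.
    """
    Y = np.asarray(activity, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    # Design matrix: intercept, longitude, latitude.
    X = np.column_stack([np.ones(len(Y)), longitude, latitude])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ coef
    return coef, residuals


# Synthetic example: activity drifts with longitude (illustrative only).
rng = np.random.default_rng(2)
n = 2000
lon = rng.uniform(0.0, 20.0, n)
lat = rng.uniform(40.0, 50.0, n)
MA = 0.3 * lon[:, None] + rng.normal(size=(n, 4))
coef, MA_res = residualise_on_location(MA, lon, lat)
```

By construction, OLS residuals have zero mean and are uncorrelated with longitude and latitude, which is why factor scores computed from them should no longer track the East-West position of the users.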

S6. FACTOR SCORES OF THE RESIDUALS COMPUTED AFTER REGRESSION
After carrying out the regression on the MAs and EAs, we perform an exploratory factor analysis (EFA) and an exploratory bifactor analysis (EBA) on the resulting residuals. Figure S1 shows the variation of the factor scores of the morning and evening behaviour, the general g chronotype and the m chronotype (see section S7) across the 5 regions (longitudinal bands). Because the plot is computed from the EFA and EBA of the residuals, it serves to show that the effect of the East-West progression has been removed from the data. The factor scores are summarised as box-plots: each box spans the 25th to 75th percentiles, the horizontal line inside it marks the median score of the region, and the ends of the whiskers mark the maximum and minimum scores excluding outliers.
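The box-plot summary can also be computed numerically. The text does not define how outliers are identified, so this sketch assumes Tukey's rule (points more than 1.5 × IQR beyond the box, which is also matplotlib's default `whis=1.5`); `box_summary` and `summarise_by_region` are hypothetical names, and `region` stands for a per-user label of the five longitudinal bands.

```python
import numpy as np


def box_summary(scores, whis=1.5):
    """Quartiles, median and whisker ends for one region's factor scores.

    Whiskers reach the most extreme scores within `whis` * IQR of the box;
    anything beyond is treated as an outlier (Tukey's rule, an assumption).
    """
    x = np.asarray(scores, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    inside = x[(x >= q1 - whis * iqr) & (x <= q3 + whis * iqr)]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "n_outliers": int(x.size - inside.size),
    }


def summarise_by_region(scores, region):
    """One box summary per region label (e.g. five longitudinal bands)."""
    scores = np.asarray(scores, dtype=float)
    region = np.asarray(region)
    return {r: box_summary(scores[region == r]) for r in np.unique(region)}


# Small worked example: 100 lies far outside the box and is excluded.
demo = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
summary = box_summary(demo)
```

For `demo`, the quartiles are 3.25 and 7.75 (NumPy's default linear interpolation), the median is 5.5, and the upper whisker stops at 9 because 100 exceeds the upper fence of 14.5.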

S7. THE M CHRONOTYPE AND ITS VARIATION WITH AGE AND GENDER
Finally, we consider the $T^{d}_{\mathrm{mid}}$ values for all days d of the week. This model has a unidimensionality score of 0.97, which indicates that there is only one underlying factor. An EFA carried out on this model likewise yields a single latent factor, which can be used as an indicator of an individual's chronotype (the m chronotype), as shown in Figure S2. The g and m chronotypes are strongly correlated (0.78), and either can be used to determine the morningness or eveningness of a user. In Figure S2, blue denotes males and red denotes females.
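As a rough illustration of checking for a single factor, the sketch below runs a one-factor principal axis factoring on the correlation matrix of a hypothetical (n_users, 7) array of daily mid-sleep times and reports the share of variance carried by the first eigenvalue. That share is a simple proxy substituted for illustration; it is not the unidimensionality score of 0.97 quoted above, and the extraction method used in the paper may differ. The per-user scores, analogous to the m chronotype, are computed with the regression method, which is also an assumption.

```python
import numpy as np


def one_factor_paf(X, n_iter=200, tol=1e-8):
    """One-factor principal axis factoring on the correlation matrix of X.

    X: hypothetical (n_users, 7) array of T^d_mid values, one column per day.
    Returns the loadings, the first-eigenvalue share of R, and factor scores.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eig_R = np.linalg.eigvalsh(R)  # ascending order
    share = eig_R[-1] / eig_R.sum()
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)  # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_red)
        loadings = vecs[:, -1] * np.sqrt(vals[-1])
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if loadings.sum() < 0:  # the sign of a factor is arbitrary
        loadings = -loadings
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Regression (Thurstone) scores for a single factor: Z R^{-1} lambda.
    scores = Z @ np.linalg.solve(R, loadings)
    return loadings, share, scores


# Synthetic one-factor data (illustrative only): loading 0.8 on all 7 days.
rng = np.random.default_rng(3)
f = rng.normal(size=(5000, 1))
T_week = 0.8 * f + 0.6 * rng.normal(size=(5000, 7))
m_loadings, first_share, m_scores = one_factor_paf(T_week)
```

Given g chronotype scores for the same users, `np.corrcoef(g_scores, m_scores)[0, 1]` yields the kind of g and m comparison reported above.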