Happiness and the Patterns of Life: A Study of Geolocated Tweets

The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies have had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.

Figure S1: The distributions of gyradius (km) for four cities appear to be approximately lognormal. The mode distance (binned) is larger for Los Angeles and San Francisco than for Chicago and New York City. We note that these distributions were calculated for all individuals whose expected location fell within the latitude and longitude bounds of main text Fig. 2, and thus reflect a different set of individuals from those identified with cities in Fig. S3.

Figure S2: The distance from expected location, calculated for each individual, is shown for each tweet authored in four example cities in 2011. Spatial clustering is observed (Table S2).

Table S1: Evidence for clustering is observed in both Geary's C and Moran's I spatial autocorrelation for tweet location colored by gyradius (Figure 2).

According to the Pew Internet & American Life Project, roughly 15% of adults in the U.S. were actively using Twitter at the end of the year during which we collected data [42]. While this fraction represents a substantial group of Americans, we have no data to quantify the demographic group represented by the subset of this 15% who specifically choose to geotag a large percentage of their messages. Nevertheless, since we threshold the sample to include individuals who geolocated more than approximately 300 of their messages in 2011, we suspect that the large majority of individuals represented in our study regularly do so as a matter of daily life, as opposed to geolocating messages only when encountering a novel experience such as a vacation.
Regarding word usage as a proxy for happiness, accessing the internal emotional state of individuals is beyond the scope of our instrument. We do believe, however, that when aggregated, the words used by large groups of individuals reflect their culture in ways not captured by surveys or self-report. Indeed, we see the hedonometer as complementing more traditional methods for characterizing economic and societal health, such as the Gross Domestic Product or Consumer Confidence Index. Using the same collection of geolocated messages explored here, the hedonometer was recently employed by Mitchell et al. [17] to characterize trends in word usage for cities. Expressed happiness was shown to correlate with hundreds of demographic, socio-economic, and health measures, with interactive evidence available in the article's online Appendix (footnote 1).

Results
In Figure 2, we investigate the geographical distribution of movement in four urban areas by plotting a dot for each tweet, colored by the gyradius of its author. Clockwise from the top left, cities are displayed in order of their apparent aggregate gyradius, with New York City seemingly exhibiting a smaller radius than the San Francisco Bay Area. Reflecting the pattern of urban life, we find messages authored by large-radius individuals to be more likely to appear in the main downtown area of each city, while messages authored by small-radius individuals tend to appear in less densely populated areas. For example, in Chicago, many individuals writing from downtown exhibit a radius an order of magnitude greater than that of individuals posting in areas outside of the city.
In the greater Los Angeles area, we see several clusters of individuals with larger radius in downtown Los Angeles, as well as Long Beach, Santa Monica, and Disneyland in Anaheim, while less densely populated areas are seen [...]

1 http://www.uvm.edu/storylab/share/papers/mitchell2013a

Figure S3: The mean gyradius of individuals whose expected location falls within each city is plotted against the city's population (A) and land area (B). Shown are cities containing at least 50 individuals with a nonzero gyradius, each individual having authored at least 30 geolocated tweets. City boundaries are defined by [2], which encompasses a smaller area for the four cities illustrated in main text Fig. 2. Generally, gyradius increases with city population and land area, with no large cities exhibiting a small mean radius. Pearson correlations: Population ρ = 0.10, p = 0.03; Land Area ρ = 0.24, p = 2 × 10^-7.

Figure S4: A random 10% of individuals (30 out of 300) are removed from Figure 5A, and the slope of the probability fit (red curve) is recalculated. Repeating the procedure 100 times, we find the above distribution of slopes. The mean of this distribution agrees well with that reported in Figure 5A. Additionally, fitting the power law model to the leading 10 locales, using only individuals who have at least 10 locales, we also get a slope of roughly −1.3 (not shown).

Figure S5: For New York City, Los Angeles, Chicago, and the San Francisco Bay Area, we group messages into equally sized bins by the distance from expected location of their author, and measure the average word happiness of each group. These plots exhibit trends similar to that observed in main text Fig. 6A, with the exception of New York City.

Figure S7: (A) We compare the 6.55 km gyradius group with the 292.03 km gyradius group. We find that the 292.03 km group makes relatively frequent use of the words 'car' and 'weekend', suggesting that this group travels on the weekends, perhaps to a vacation home, as suggested by use of the word 'home'. (B) We compare the 13.26 km gyradius group with the 292.03 km gyradius group. We find that the 292.03 km group uses the word 'car' more frequently than the 13.26 km group which, interestingly, uses the word 'traffic' more frequently. Again, the increased relative usage of these words seems fitting for groups with these patterns of movement.

González et al. [1] found this distribution to be well modeled by a truncated power law with an exponential tail.

Normalizing Human Trajectory
To compare the shape of trajectories of individuals traveling in different directions and over different distances, we use the methods introduced by González et al. [1]. We will examine the normalization steps for two individuals we will call user A and user B. We have 768 geolocated tweets for user A and 1,882 geolocated tweets for user B. User A has gyradius r_A = 463.61 km and user B has gyradius r_B = 54.28 km. Fig. S10 represents the geospatial tweet locations for user A and user B, but we have shifted their coordinate system to maintain their anonymity. We have also allowed for a slight spatial separation between the locations for user A and the locations of user B for clarity.
In Fig. S11, we apply the linear transformation shifting each location for the user to the distance in kilometers from their center of mass, i.e. the expected location of the user. The difference in gyradius between user A and user B is still very apparent in the axis ranges for this plot. Notice that the directional relationships between the tweet locations for each user have still been preserved. We can see that user A travels predominantly in a southwest direction, while user B travels primarily in a northwest direction.
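The shift to kilometers from the center of mass can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the usual definition of gyradius as the root-mean-square distance of an individual's tweet locations from their center of mass, and it assumes a flat-Earth (equirectangular) conversion from latitude and longitude to kilometers. The function names `to_local_km` and `gyradius` and the constant `EARTH_RADIUS_KM` are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def to_local_km(lat_deg, lon_deg):
    """Shift tweet locations to km offsets from their center of mass.

    Assumption: an equirectangular (flat-Earth) approximation around the
    mean latitude. It is only reasonable for trajectories that cover a
    small part of the globe and do not cross the antimeridian.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0, lon0 = lat.mean(), lon.mean()  # expected location (center of mass)
    x = EARTH_RADIUS_KM * (lon - lon0) * np.cos(lat0)  # east-west offset
    y = EARTH_RADIUS_KM * (lat - lat0)                 # north-south offset
    return x, y


def gyradius(x, y):
    """Root-mean-square distance of the points from their center of mass."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sqrt(np.mean(dx**2 + dy**2)))
```

For an individual with a gyradius of several hundred kilometers, such as user A, the flat-Earth approximation becomes rough, and a great-circle distance would likely be a better choice.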
To normalize for direction of travel, let the set of tweet locations for user i be represented by the set of equally weighted masses at each of the tweet locations {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. We then calculate the tensor of inertia I for each set of weighted (x, y)-points. The eigenvector of I corresponding to the largest eigenvalue of I represents the direction along which most of user i's trajectory occurs; we call this the principal axis for user i (see Fig. S12).

Figure S12: The tweet locations for User A and User B along with a line representing the principal axis for that user.

Now we can determine the angle necessary to rotate the set of points for user i so that the resulting principal axis is the x-axis. Fig. S13 shows the results of this step. We see that the principal axis for user A and user B is now the x-axis. The final step is to normalize for individuals with different gyradii. We accomplish this by dividing the x-coordinate of each rotated tweet location for user i by σ_x, where σ_x is the standard deviation of the x-coordinates of the rotated tweet locations for user i, and similarly dividing the y-coordinates by σ_y. The final result is shown in Fig. S14.

Figure S14: The rotated tweet locations of User A and User B after normalizing for gyradius. The origin represents the center of mass of the respective individuals' trajectory, namely p_a from equation (2).
As a result, we can compare the shape of the trajectories for User A and User B having normalized for direction and gyradius. We can see that both User A and User B have most of their normalized tweet locations in two main clusters.
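The rotation and scaling steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In place of the tensor of inertia it uses the second-moment matrix of the centered points, a substitute chosen because its largest-eigenvalue eigenvector points along the direction of greatest spread, which matches the description of the principal axis above. The name `normalize_trajectory` is hypothetical, and the population standard deviation (NumPy's default) is assumed for σ_x and σ_y.

```python
import numpy as np


def normalize_trajectory(x, y):
    """Rotate points onto their principal axis, then scale each axis to unit std.

    x, y: coordinates in km (any origin). Assumes the points are not all
    collinear, so that both standard deviations are nonzero.
    Returns an (n, 2) array of normalized coordinates.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center on the expected location (center of mass).
    pts = np.column_stack([x - x.mean(), y - y.mean()])

    # Second-moment matrix of the centered points (assumed stand-in for
    # the tensor of inertia; see the lead-in).
    moments = pts.T @ pts

    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(moments)
    principal = eigvecs[:, -1]

    # Rotate by minus the principal-axis angle so that axis becomes the x-axis.
    # The eigenvector's sign is arbitrary, so the result may be flipped by 180°.
    angle = np.arctan2(principal[1], principal[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])
    rotated = pts @ rotation.T

    # Divide x by sigma_x and y by sigma_y (population standard deviations).
    sigma = rotated.std(axis=0)
    return rotated / sigma
```

After this normalization every individual has unit standard deviation along both axes, so differences between users such as A and B reflect only the shape of their trajectories.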