Counterfactual mobility network embedding reveals prevalent accessibility gaps in U.S. cities

Living in cities affords expanded access to various resources, infrastructures, and services at reduced travel costs, which improves social life and promotes systemic gains. However, recent research shows that urban dwellers also experience inequality in accessing urban facilities, which manifests in distinct travel and visitation patterns for residents with different demographic backgrounds. Here, we go beyond simple ﬂ awed correlation analysis and reveal prevalent accessibility gaps by quantifying the causal effects of resident demographics on mobility patterns extracted from U.S. residents ’ detailed interactions with millions of urban venues. Moreover, to ef ﬁ ciently reveal micro neighborhood-level accessibility gaps, we design a novel Counterfactual RANdom-walks-based Embedding (CRANE) method to learn continuous embedding vectors on urban mobility networks with confounding effects disen-tangled. Our analysis reveals signi ﬁ cant income and racial gaps in mobility frequency and visitation rates to sports and education venues. Besides, bachelor ’ s degree holders experience greater mobility reduction during the COVID-19 crisis. With extensive experiments on neighborhood-level accessibility prediction and visualizing accessibility gaps with embed-dings vectors, we demonstrate that the counterfactual mobility network embeddings can improve the explanatory capacity and robustness of revealed accessibility gaps by extending them from aggregate statistics to individual neighborhoods and allowing for cross-city knowledge transfer. As such, urban mobility networks can reveal consistent accessibility gaps in the U.S., calling for urgent urban design policies to ﬁ ll in the gaps.


U
rban dwellers live together in a compact environment but increasingly experience inequality in accessing urban venues and taking advantage of urban opportunities, which stems from the severe mobility gaps across gender, socioeconomic, and ethnic groups (Barbosa et al., 2021;Gauvin et al., 2020;Moro et al., 2021;Wang et al., 2018).Although urbanization improves venue accessibility and stimulates economic growth (Bettencourt, 2021), growing inequalities in urban mobility threaten the fabric that holds urban society together, affecting the livelihood and well-being of each resident (Glaeser et al., 2008).Inequality in urban mobility is entangled with multiple factors, including cultural experience and expectations (Wang et al., 2018), financial circumstance (Moro et al., 2021), and physical wellness (Althoff et al., 2017).As a result, measuring inequalities in urban mobility and dissecting its underlying factors represents a challenging task, but one that must be completed to meet UN sustainable development goals (SDGs) in reducing inequality and building resilient cities (Griggs et al., 2013).
There has been an increasing interest in uncovering mobility inequalities in urban spaces concerning various factors (Barbosa et al., 2021;Carra et al., 2016;Dueñas et al., 2021;Frias-Martinez et al., 2012;Gauvin et al., 2020;Graells-Garrido et al., 2020;Law, 1999;Lenormand et al., 2015;Macedo et al., 2022;Moro et al., 2021;Ng and Acker, 2018;Ryan et al., 2015;Sallis et al., 2018;Uteng, 2012).An important study (Law, 1999) defines the topic of "gender and daily mobility" and reviews studies on this from the early 70s.From a gendered perspective, a study in Chile finds that women not only visit fewer unique locations than men but also distribute their duration less equally (Gauvin et al., 2020).Mobility related to specific travel behaviors such as travel distance and the number of trips has been examined by gender in eight cities in multiple countries (Ng and Acker, 2018), with findings that women tend to travel shorter distances than men.Much research has investigated the relationship between other demographic or socioeconomic features and urban mobility.Both higher socioeconomic status (Frias-Martinez et al., 2012) and younger age (Lenormand et al. 2015) strongly correlate with a larger mobility range.A study in South America finds that middle-class travelers exhibit the most diverse mobility patterns, while the lower classes manifest limited spatial exploration (Macedo et al., 2022).Regarding the response to the COVID-19 pandemic, poorer populations show lower reductions in mobility level (Dueñas et al., 2021).Moreover, planned spatial factors play an important role in urban mobility, with previous studies revealing that improved access to public transportation infrastructure (Ryan et al., 2015) and well-designed neighborhood walkability (Sallis et al., 2018) can enhance mobility within urban areas.
Existing studies providing evidence on inequality in urban mobility, however, remain woefully inadequate due to limitations in datasets.Many efforts rely on small-scale survey data that is expensive to collect and difficult to scale up (Sharkey and Elwert, 2011).Some research attempts large-scale estimation via hypothetical models defined on spatial proximity (Hasthanasombat and Mascolo, 2019;Saxon, 2021), but this inevitably leads to bias in analytic results.Recent burgeoning precise mobility data provide the chance to compensate for the above deficiencies (Kadar et al., 2020;Moro et al., 2021).Here we measure accessibility gaps based on large-scale mobility data provided by SafeGraph company.SafeGraph curates mobile device users' traces to obtain fine-grained visitations to urban facilities represented as Pointsof-Interest (POIs).Covering over 35 million residents' visits to more than 4 million POI in the U.S. from 2019 to 2020, the largescale and accurate mobility dataset can sufficiently measure gaps in various mobility patterns that depict a resident's capability to exploit urban opportunities.Beyond data limitations, previous works also suffer from research methodologies.Most of them rely on correlation analysis between mobility patterns and a few specific demographic factors (Barbosa et al., 2021;Gauvin et al., 2020;Wang et al., 2018), which cannot control for confounding effects in urban mobility and may lead to inaccurate conclusions.Findings from correlation analysis are often based on aggregate statistics and provide limited insights into the micro-level causes of accessibility gaps.Therefore, it is important to tease apart potential confounding factors and probe micro-level gaps.
In this study, We present a novel framework to quantify accessibility gaps in six metropolitan areas of the US.We employ a propensity score matching (PSM) method that estimates the causal effects of demographic features on mobility patterns.Specifically, we look at the causal effect of gender, race, income, physical disability, and education background on mobility frequency and urban facility accessibility across different neighborhoods, which quantify the number of movements per person and the likelihood of accessing different urban facilities, respectively.Furthermore, we design a novel Counterfactual RANdom-walksbased Embedding (CRANE) to learn representations for microlevel accessibility gaps on urban mobility networks.Specifically, we use random-walk sampling to efficiently assess the empirically observed association between demographic and urban facility accessibility and the alternative outcomes in counterfactual scenarios where the demographic does not have a causal impact.Drawing on the difference between the observed and alternative outcomes, our CRANE method can efficiently approximate the causal inference result by treating them as positive and negative samples in representation learning, respectively.The representations for neighborhoods and POIs allow us to go beyond aggregate statistics and extend our analysis to their micro-level behaviors, enriching our understanding of the causes underlying accessibility inequality.
We observe interesting findings regarding mobility inequality in the US.Neighborhoods with higher average income and a higher portion of the white population consistently present higher mobility frequency across different cities, which suggests these sub-populations have superior access to urban facilities.Neighborhoods with a higher portion of bachelor's degree holders consistently manifest greater mobility reductions in response to the COVID-19 pandemic, which implies that these neighborhoods can afford fewer outdoor activities, likely from elasticity regarding their work format.Sports and education venues are more likely visited by white and higher-income populations.Extensive experiments demonstrate that neighborhood and POI embeddings learned from counterfactual random walks on urban mobility networks can improve the performance of accessibility prediction compared to traditional correlation-based methods.A case study on the POI level illustrates that our method successfully disentangles confounding effects between neighborhood demographics.

Methods
Demographic dataset.We collect demographic data from the 2019 U.S. Census Bureau's American Community Survey 5-year Estimates (ACS) as the independent variable of mobility patterns.In this dataset, features are reported at the level of census block groups (CBGs), the smallest geographical unit for publicly available census data.In this study, we use CBGs as a proxy for neighborhoods.We focus on the six most populated metropolitan statistical areas (MSAs) in the U.S.: New York-Newark-Jersey City, NY-NJ-PA (hereby referred to as the "New York" MSA, "NY" for short), Los Angeles-Long Beach-Anaheim, CA ("Los Angeles", "LA"), Chicago-Naperville-Elgin, IL-IN-WI ("Chicago", "Chi"), Dallas-Fort Worth-Arlington, TX ("Dallas", "Dal"), Houston-The Woodlands-Sugar Land, TX ("Houston", "Hou"), Washington-Arlington-Alexandria, DC-VA-MD-WV ("Washington DC", "DC").The numbers of neighborhoods in each MSA are listed as N CBG in Table 1.
Table 1 gives an overview of the average population N P and the number of households N H of neighborhoods in each MSA.We extract the following demographic features as potential contributors to urban accessibility gaps: (1) Female ratio is the proportion of female residents in each neighborhood, which is used to analyze the potential gender gap in urban mobility (Gauvin et al., 2020).(2) White ratio is the neighborhood's proportion of white population, which corresponds to the previous finding that the c racial minorities experience social isolation when traveling in the city (Wang et al., 2018).(3) Bachelor ratio is the proportion of people over 25 years that have a bachelor's degree or higher, reflecting the real-world observation that people with different educational backgrounds are likely to exhibit different mobility patterns.(4) Average income is the household average income in the past 12 months, corresponding to diverse mobility patterns associated with economic status (Macedo et al., 2022;Šćepanović et al., 2015).( 5) Disability ratio is the proportion of households with at least one disabled resident whose mobility capability is constrained by the quality of urban infrastructure (Saha et al., 2021).The average values of the above neighborhood feature in each MSA are reported in Table 1.
Mobility dataset.To reflect the heterogeneity in mobility patterns across the city, we utilize SafeGraph's Patterns dataset (https:// docs.safegraph.com/docs/monthly-patterns) and Core Places dataset (https://docs.safegraph.com/docs/core-places).By tracking GPS-equipped mobile devices under consent, the Patterns dataset aggregates visit counts from CBGs to POIs each month, providing fine-grained mobility records of urban dwellers and detailed information about POIs.It is worth mentioning that workers at POIs are excluded from the visit counts, reflecting residents' subjective capability of accessing urban venues instead of objective requirements.Across the six MSAs, the dataset records 155 million visits paid to 758 thousand POIs every month on average.The Core Places dataset identifies each POI's category under the North American Industry Classification Systems (NAICS).
Combining the Patterns and Core Places datasets, we can quantify mobility patterns associated with each neighborhood.In this study, we adopt Amartya Sen's capability analysis framework (Sen, 1980), where equality is defined as a person's basic capability of being able to do certain basic things in the environment, e.g.move or be clothed.From the perspective of mobility, the capability of accessing and visiting more urban venues is often linked with higher social status (Barbosa et al., 2021;Chang et al., 2021;Chen et al., 2022;Lenormand et al., 2015;Wang et al., 2011;Xu et al., 2018).Thus, we are devoted to revealing the gap in two measures of urban mobility capability, namely mobility frequency (N M ) and urban facility accessibility.
First, to depict aggregate mobility capability for each neighborhood, we calculate its mobility frequency (N M ) as follows.We sum the neighborhood's visits over POIs located in the MSA in three months (from July to September) to obtain the total number of visits made by its residents and then normalize it by the neighborhood's population to obtain the average number of visits per 100 residents.To study the mobility patterns influenced by the COVID-19 pandemic, we further calculate the mobility reduction Δ M by comparing N M in 2019 and 2020.The average mobility frequencies N M and mobility reductions Δ M for  1.We can observe a consistent reduction in all analyzed MSAs, which reflects the overall impact of social distancing and stay-at-home orders during the pandemic.Second, we calculate each neighborhood's urban facility accessibility, which is the proportion of its visits to a specific category of POIs to its total visits N M .This pattern reflects urban dwellers' access to urban facilities.Specifically, we focus on four representative categories, i.e., Art and Recreation, Sports, Education, and Health.The average accessibility to four categories in each MSA in 2019 is listed in Table 1 as "Art%", "Sports%", "Edu%", and "Health%".
Preliminary analysis.We calculate the mobility patterns of each neighborhood in the six most populated US metropolitan statistical areas by the SafeGraph monthly patterns datasets covering July through September in the years of 2019 and 2020.A simple approach to measuring mobility inequality in an urban area is to calculate the correlation between a neighborhood's demographic features and its mobility patterns (See Supplementary Fig. S1 for the spatial distribution of mobility patterns and demographic features in New York).Table 2 shows correlations between mobility patterns and demographic features and their corresponding significance levels (two-sided p-values) in New York MSA as an example (See correlation tables of other MSAs in Supplementary Table S1−5).Significant correlations indicate mobility patterns vary substantially with the neighborhood's demographic features, indicating potential mobility inequalities.For example, neighborhoods with a higher bachelor ratio are correlated with higher mobility frequency.However, as observed in Supplementary Fig. S1, the bachelor ratio is also positively correlated with average income, which raises the concern of confounding: the correlation between bachelor ratio and mobility frequency might be sustained by their common correlations with average income.To address this concern, our subsequent analysis employs PSM to mitigate confounding bias and assess the impact of each feature on mobility patterns independently.
Propensity score matching.We adopt the PSM method, a matching method widely used in the causal inference literature (Rosenbaum and Rubin, 1983), to estimate the causal effect of demographic features on heterogeneous urban mobility patterns.The procedure of this method is illustrated in Fig. 1.
The core concept is to account for potential confounding factors denoted as covariates X by matching each neighborhood with neighborhoods that share similar X values but differ in a specific relevant demographic feature, referred to as the treatmentT (Fig. 1A).The PSM procedure disentangles the influence of confounding effects by creating a treatment group comprising neighborhoods with comparable covariates X and attributes the variations in mobility pattern within this group to differences in the treatment level, i.e., differences in the relevant demographic feature.However, exact matching based on high dimensional covariates can be computationally expensive and often leads to sparsity issues.To address this, we employ a propensity score function b(X) to map covariates into a scalar propensity score.Specifically, we first discretize the treatmentT into five equally-sized bins, defined as treatment levels L(T).Subsequently, we fit an ordinal regression model (McCullagh, 1980) to estimate propensity scores, where L * represents the treatment level, and θ L Ã along with w are learnable model parameters.The ordinal regression model satisfies the condition that treatment level L(T) and covariates X are independent given b(X) = w T X.Therefore, We calculate ŵT X as the estimated propensity score for each neighborhood, with ŵ being the model's fitted parameter.The distribution of X conditioned on b(X) tends to be similar across different values of L(T).As a result, the estimated propensity scores can be employed as matching criteria.To perform matching, we define the distance between two neighborhoods as the disparity in estimated propensity scores divided by the difference in L(T).
Each neighborhood is then matched with its nearest neighbor using this distance metric.
Once we have established pairs of matched neighborhoods denoted as M = {(i, j)}, we estimate the average treatment effect of T on a specific mobility behavior Y.This estimation is based on the expected change in Y (which can represent mobility frequency or urban facility accessibility) when the treatment level L(T) is increased by one unit of level, A critical prerequisite for estimating unbiased treatment effects of demographic features on mobility patterns is the selection of covariates.In the PSM procedure, for each demographic feature as the treatment, we choose other demographic features along with the ratio of residents younger than 20 years and older than 60 years as covariates for the following reasons.First, all demographic features can potentially influence urban mobility.Previous studies have revealed disparities in urban mobility based on gender (Gauvin et al., 2020), race (Wang et al., 2018), economic status (Macedo et al., 2022), and disability (Saha et al., 2021).Additionally, differences in mobility patterns among different age groups are commonly observed in urban environments.Therefore, we consider all demographic features and age distribution as potential covariates.Second, an important characteristic of the urban environment is circular causality (Bettencourt, 2021).Demographic features at the CBG level also exhibit this characteristic.For example, CBGs with a higher percentage of bachelor's degree holders may attract residents with higher-paying careers, resulting in a higher average income for that CBG.Conversely, CBGs with better economic conditions may be more appealing to highly educated residents, leading to an increased bachelor ratio.Therefore, for each treatment variable T, we include the other demographic features along with the age distribution within the CBG as covariates represented by vector X.In each MSA, we employ PSM to estimate the average treatment effects of five demographic features on mobility patterns, including mobility frequency in 2019, its reduction in 2020, and urban facility accessibility to four POI categories.
Counterfactual random walk on urban mobility network.
While PSM estimation is effective, it has limitations as it can only produce aggregate statistics at the MSA level, i.e., average treatment effects.These statistics provide limited insights into the micro-level mechanisms of mobility inequality and are not easily applicable to downstream applications or analyses.Drawing inspiration from recent studies that uncover gender disparities using word embeddings (Garg et al., 2018), we develop a novel representation learning algorithm to capture micro-level disparities in facility accessibility using continuous embedding vectors.Our approach involves conducting counterfactual random walks on an urban mobility network to assess the empirical association between a specific demographic feature and the accessibility of a particular category of POI.We compare these associations with alternative outcomes that would occur if the demographic feature did not causally affect access to the POI category.This process allows us to update corresponding embeddings that preserve real-world gaps in accessibility data.Specifically, we first construct an urban mobility network that consists of four types of heterogeneous nodes, representing POI categories, POIs, neighborhoods (CBGs), and demographic features, shown in Fig. 2.There are three types of edges in the constructed network, i.e., POI-category edges and neighborhooddemographic edges that connect each POI and neighborhood to its category and demographic feature, respectively, and POIneighborhood edges weighted by visitation frequency in mobility data.In each random walk, we first select a POI category Q and sample a specific POI P i based on its total visitation frequency.Then, we sample a neighborhood C o with probability proportional to the edge weights w io between P i and C o , normalized by the out-degree of C o , i.e., Prob(C o |P i ) ~wio /∑ k w ko .It captures the likelihood for neighborhood C o to visit that POI, i.e., the accessibility of that urban facilities.Finally, we look up the value of the interested demographic feature of C o as the observed outcomeo.Following this procedure, we can sample a Q → P i → C o → o path that iteratively connects the POI category, POI, neighborhood, and its demographic feature.It serves as a sample of the empirically observed accessibility from the interested demographic feature to a POI category.
Aside from the observed outcome, we sample a neighborhood C a as the alternative outcome in the counterfactual scenario that the interested demographic feature does not have a causal effect on the access to that POI category.Specifically, we mimic the  matching process in PSM to sample a neighborhood C a with the same stratified covariates as C o and take its demographic feature as the alternative outcomea.For instance, as shown in Fig. 2, we randomly sample a neighborhood from the pool that has the same stratified covariates as C o and take its female ratio as the alternative outcome.Intuitively, the alternative outcome assesses the counterfactual scenario that the demographic feature has no causal impact on the accessibility to specific urban facilities, and hence its distribution is solely conditioned on the covariates.Therefore, if the observed outcome and alternative outcome follow significantly different distributions, it indicates the counterfactual scenario is unlikely and the demographic feature probably has a causal effect, which provides a signal to update the node embeddings.
Network embedding for mobility inequality.On top of the sampled random walks, we seek to preserve the difference between observed and alternative outcomes in the embedding space to capture the gaps in urban facility accessibility.Specifically, the alternative outcomes serve as the negative samples that the examined demographic feature does not have a causal impact.Therefore, in each random walk, we maximize the embedding similarity for the co-occurrences of observed outcome, observed neighborhood, POI, and POI category, and minimize the embedding similarity for the co-occurrences of alternative outcome, alternative neighborhood, POI and POI category.
In each MSA, we learn an embedding vector for each of the following entities: four POI categories, all POIs, all neighborhoods, and all treatment levels of five neighborhood demographic features.Observed outcomes and alternative outcomes are represented as the vector of its corresponding treatment level L(T).We train embedding vectors to minimize the following loss function: Regularization terms are added to the loss function to achieve spatial smoothness of neighborhood embeddings and continuity in demographic feature embeddings.According to the First Law of Geography (Miller, 2004), nearby things are more related to one another.Two nearby neighborhoods should also be adjacent to our embedding space.The spatial regularization term is the weighted sum of squared distances between adjacent neighborhoods in the embedding space.Weights are defined as s Þ for each pair of neighborhoods with distance d less than a threshold σ s .The demographic regularization term is the sum of squared distances between adjacent L(T)'s in the embedding space, e.g. the lowest and second-lowest level of average income.We use the Adam optimizer (Kingma and Ba, 2015) to learn 64-dimensional embedding vectors for each category, POI, neighborhood, and L(T) to minimize the loss function (3).The spatial threshold σ s is set to 2.5 kilometers, and the strength of regularization is 0.0001 and 0.01 for spatial and demographic feature continuity.See Supplementary Note S1.2 for implementation details and complexity analysis of the algorithm.

Consistent gaps in urban accessibility
Mobility frequency.The first row in Fig. 3 shows the estimated treatment effects of demographic features on mobility frequency in each MSA.Each panel represents the effects of one demographic feature on N M in six MSAs.Positive and negative effects are indicated by blue and red columns, respectively.Whiskers represent the 95% confidence interval of the treatment effect.Significant gaps are indicated by dark columns with a confidence interval on one side of the x-axis.According to Amartya Sen's capability analysis framework, increased mobility frequency signifies an enhanced capability to access and visit urban facilities.Negative treatment effects on mobility frequency imply reduced capability as the demographic feature increases within the neighborhood.
From our analysis of treatment effects, we summarize three key observations of mobility frequency gaps.First, neighborhoods with higher average income have higher mobility frequency.In all MSAs, treatment effects are over 30, implying that a neighborhood with one unit higher L(income) will have approximately 30 more visits per 100 residents in three months.This is consistent with previous findings that low-income residents have lower mobility rates (Pucher and Renne, 2003), revealing an unignorable cost of moving around the city.Second, in terms of education, neighborhoods with a higher proportion of bachelor's degree holders demonstrate significantly lower mobility frequencies in most MSAs.This negative effect is opposite to the positive correlation reported in Table 2, underscoring the necessity of controlling for confounding demographic features.This difference implies that among individuals with similar income levels and other demographic characteristics, highly educated individuals access urban facilities less frequently.Third, neighborhoods with a higher white ratio have significantly higher mobility frequencies in most MSAs, which means that the white population is clearly advantaged in accessing urban facilities.This observation reveals the superior access to urban essential facilities of the white population in these MSAs.
Mobility reduction during the COVID-19 pandemic.Since the COVID-19 outbreak, authorities have issued mobility control measures such as social distancing and stay-at-home orders to request that urban dwellers reduce unnecessary travel.Although we observe a consistent reduction in the overall mobility frequency, such reduction is not experienced equally by all.To measure this gap, we show the treatment effect of neighborhood demographics on the reduction of mobility frequency in 2020 in the second row of Fig. 3, where blue columns indicate that a higher L(T) of the demographic feature results in a larger mobility reduction.
We summarize two key observations as follows.First, we find that neighborhoods with a higher bachelor ratio have greater mobility reduction in all six MSAs.One explanation is that highly-educated residents can more easily transition to telework due to the nature of their occupations, e.g., working on personal computers (Lund et al., 2020).In contrast, people with inferior educational backgrounds are more likely to be essential workers whose work cannot be performed remotely, such as building cleaning and equipment maintenance.As a result, when public health crises force people to reduce mobility, inequalities faced by people of different educational backgrounds become exacerbated.
Second, we find that residents with different ethnicities are unequally affected by the pandemic.Specifically, neighborhoods with a higher white ratio have less mobility reduction in all 6 MSAs, which widens the pre-existing accessibility gap among ethnic groups.This is likely because white populations have superior access to urban facilities and opportunities, making them more resilient to pandemic shocks.
Urban facility accessibility.The treatment effects of neighborhood demographics on residents' accessibility to four venue categories -Art & Recreation, Sports, Education, and Health in the six most populated MSAs are shown in Fig. 4.
Art and recreational venues are more accessed by neighborhoods with a higher bachelor ratio.Treatment effects are over 0.2% in New York and Washington DC.This reveals a difference in utilizing urban facilities of art and culture, with highly educated residents more likely to visit these venues.Sports facilities are more accessed by neighborhoods with higher average income, bachelor ratio, white ratio, and lower disability rates.This suggests that disabled residents have very limited access to sports facilities.
Meanwhile, uneven access is also prominently associated with income, race, and educational background.Educational POIs are more accessed by neighborhoods with higher average income, disability ratio, white ratio, and lower bachelor ratio.This indicates that wealthy and white populations are most advantaged in accessing educational facilities.Health services are more accessed by neighborhoods with higher disability rates and female ratios.This is reasonable as people with disabilities seek health services and rehabilitation more frequently and females carry more healthcare burdens.Surprisingly, neighborhoods with lower average income also pay a higher ratio of visits to health services in Chicago, Dallas, Houston, and Washington DC.It is likely because wealthier people resort more often to doctor home visits for care.This also reflects discrepancies in health status and healthcare needs, as wealthier people are also more likely to maintain better health with reduced onsite healthcare needs.
Besides identifying notable inequalities in mobility frequency and urban facility accessibility, we find interesting results showing how the PSM procedure alleviates confounding effects by balancing covariates.For instance, bachelor ratio is negatively correlated with art & recreation choice in the New York MSA.After controlling for white ratio, the treatment effect of bachelor ratio on art choice is positive.Cases where the treatment effect reverses correlation direction are marked as bold in Table 2.These reversals of direction demonstrate the importance of PSM to identify inequalities from a causal perspective.

Counterfactual network embeddings
Justification of CRANE.We justify the effectiveness of CRANE by examining two stages, the counterfactual random walk stage and the network embedding learning stage, respectively.
First, we justify our proposed counterfactual random walk by examining its consistency with PSM.In the design of the counterfactual random walk, observed outcomes should characterize the characteristics of neighborhoods with higher urban facility accessibility.Therefore, for demographic features with significant positive treatment effects on urban facility accessibility given by PSM, observed outcomes are expected to be larger than the alternative outcomes.We count the total number of pairs among 77 demographic feature-POI category pairs that violate this law on different sample sizes and draw their relationship in Fig. 5A.We find that when we sample more random walks, the violation percentage decreases rapidly and drops below 10% when the sample size reaches 100,000.Thus, with a sufficiently large sample size, the counterfactual random walk serves as an effective approximation of the traditional PSM procedure.In the following experiments, we sample 200,000 random walks for each category, ensuring the credibility of our analysis.
Second, we evaluate the effectiveness of our embedding algorithm by comparing each feature's treatment effect on urban resource access and the regression coefficient between L(T) and its proximity with POI categories in the embedding space.We measure proximities with the inner product between L(T) and POI category's embedding vectors.Regression coefficients are listed in Table 3 (See Supplementary Table S6 for significance levels).Insignificant treatment effects of demographic features on urban facility accessibility are marked in white cells.Among 77 significant demographic-category pairs (p < 0.05), 92.2% (71) pairs have identical directions in treatment effect with regression coefficient marked by green cells.6 pairs that have the opposite decreases with the number of sampled random walks on the urban mobility network and is reduced under 10% when the sample size is over 200,000.B 2D visualization of learned embeddings.Four category embeddings are fixed to be projected onto (0,1), (1,0), (0,-1), and (-1,0).Dashed arrows represent the directions pointing to the highest L(T) from the lowest L(T)'s embedding in the projected 2D space of each demographic feature.
Table 3 Regression coefficients between each demographic feature and its proximity with POI categories in the embedding space.
Significant inequalities in PSM analysis are highlighted, where the green (red) color indicates consistent (contradictory) directions.direction are marked in red cells.This result confirms the consistency between the embedding algorithm and matching analysis.
We further illustrate the proximity between the POI category and demographic embeddings with the highest and lowest L(T) in Chicago in Fig. 5B.Embedding vectors are projected to a 2D plane where proximities are preserved.To better demonstrate gaps in urban facility accessibility, we mark the direction from lowest L(T) to highest L(T) for each demographic feature in the plane.The visualization demonstrates (1) the association between high L(bachelor%) and high accessibility to sports and educational facilities, and (2) the association between high L(white%) and high accessibility to art and sports facilities.Based on the above observations, the counterfactual network representation learning method effectively captures MSA-level accessibility gaps across neighborhoods of different demographic backgrounds.
Predictive analysis.Given that the learned embedding vectors can successfully capture macro MSA-level treatment effects, we proceed to employ these vectors for predicting facility accessibility at the micro neighborhood-level, thus confirming their ability to capture causal relations in fine-grained accessibility gaps.Specifically, we first conduct a within-city prediction task.In each MSA, we randomly select 80% of the neighborhoods as the training set, on which we perform counterfactual random walks and the network embedding algorithm.We combine the raw demographic features of each neighborhood with the dot products between their corresponding L(T)'s embedding vectors and all category embedding vectors as the input for Multilayer Perceptron (MLP) regression models to fit the neighborhood's accessibility to four categories from the training set.Then we test the fitted models on the remaining 20% neighborhoods and estimate performance as the average explained variance on five train/test random splits (see Supplementary Notes S1.3 for implementation details).
We compare CRANE to four baselines with different model input combinations.The first baseline only utilizes the neighborhood's raw demographic features.The second baseline utilizes both raw demographic features and aggregate statistics of urban facility accessibility from the training set.For instance, if a neighborhood has the highest L(white%), the average urban facility accessibilities of all neighborhoods with the highest The values show the relative improvement in explained variance compared to using raw demographic features as model input.Bold text indicates the most improved method for prediction performance.The values show the relative improvement in explained variance compared to using raw demographic features as model input.Bold text indicates the most improved method for prediction performance.
L(white%) in the training set are concatenated with this neighborhood's raw features as model input.The third and fourth baselines utilize embedding vectors learned by two graph embedding methods, namely LINE (Tang et al., 2015) and node2vec (Grover and Leskovec, 2016).The two embedding methods are correlation-based and do not incorporate specific designs to account for causal relationships.We construct a category-neighborhood-feature network with urban facility accessibility as edge weights and use unsupervised versions of these methods to learn embedding vectors that retain proximity.Dot products between treatment level vectors and all category vectors are combined with the neighborhood's demographic features as the input of regression models.The relative improvements of the three methods compared with raw features are listed in Table 4.Among all the methods considered, CRANE outperforms others in 20 out of the 24 prediction tasks, particularly in the New York, Chicago, and Dallas datasets.On average, CRANE improves the explained variance in these tasks by 12.57%.This underscores the benefit of employing a causal matching strategy to learn embedding vectors for out-of-sample prediction tasks.Unlike correlation-based methods which are susceptible to confounding effects and distribution shifts between training and test sets, CRANE's counterfactual random walk strategy offers greater stability in prediction performance by capturing embedded micro-level causal relationships.
In addition to within-city prediction tasks, we further test CRANE's ability to transfer knowledge of micro-level accessibility gaps across MSAs.Specifically, we utilize embedding vectors and aggregate statistics learned from other MSA mobility networks to predict urban facility accessibility in Chicago MSA (see Supplementary Notes S1.3 for implementation details).Improvements in explained variance compared with raw features as input are listed in Table 5.Each column represents the performance of transferring knowledge from another MSA to Chicago.Our embedding methods achieve the best performance in 16 out of 20 tasks, especially in predicting art and sports choices.Consequently, CRANE effectively distills knowledge pertaining to micro-level accessibility gaps that are universal across MSAs, thereby enhancing its capacity to generate insights with limited data availability.
Visualizing maps of urban accessibility gap.Having demonstrated CRANE's effectiveness in capturing MSA-and neighborhoodlevel accessibility gaps, we now assess its capacity to provide micro POI-level insights through a case study in Houston.Suppose we want to query the top 10 sports POIs preferred by highincome neighborhoods.The first method is to directly look at the mobility statistics and retrieve the top 10 sports POIs with the highest accessibility from neighborhoods with the highest L(income).The second method, i.e., our CRANE method, is to retrieve the top 10 sports POIs whose embedding vectors have the highest inner product with the embedding vector of the highest L(income).As a comparison, we plot the spatial distribution of queried POIs in Fig. 6.We use black, red, and blue colors to denote POIs that are queried (1) by both methods, (2) only by mobility statistics, and (3) only by CRANE embeddings.Although all queried POIs are located within rich regions of the city, the two methods make largely different choices.The reason lies in a confounding variable, white ratio.In Houston, average income and white ratio are positively correlated and they are both positively correlated with accessibility to sports POIs.After distinguishing neighborhoods in Houston by the difference in their L(income) and L(white%), we observe that most sports POIs queried by mobility statistics (red pins) are located within regions where L(income) and L(white%) are both high (white color).In contrast, most sports POIs queried by embedding (blue pins) are located within regions where L(income) is greater than L(white%) (green color).This observation suggests that the advantage of high L(income) in sports accessibility among low L(white%) regions is captured and emphasized by CRANE, while unobserved in mobility statistics.Therefore, our CRANE framework can successfully untangle strong correlations between demographic features to generate POI-level causal representations.

Discussion
Traveling without limitation in the urban space is an essential capability for urban dwellers to sufficiently exploit urban opportunities.Our analysis of detailed mobility data covering two years reveals significant gaps in mobility behaviors from various demographic perspectives.Clear implications for urban planners and researchers can be drawn from the findings.
We first quantify the neighborhood-level treatment effects of demographic features on the mobility frequency before the COVID-19 pandemic by PSM.Significant treatment effects with identical direction among the majority of studied MSAs indicate serious inequality in mobility frequency.Neighborhoods with lower income have lower mobility frequencies.This is in line with long-standing social issues in class inequality across the U.S., which impairs equal sharing in urban opportunity.Such a gap might result from the interplay of the low-income group's segregation from the mainstream communities (Moro et al., 2021) and the high cost of urban transportation (Barbosa et al., 2021).In addition, the racial gap that neighborhoods with a more white population have higher mobility frequency is observed in all six MSAs but the capital city.Such class and racial inequalities in mobility frequency raise the concern on equality of citizens sharing opportunities in the urban space, calling for the design of urban policies that mitigate them.When we control race, income, and other confounding covariates, the bachelor ratio presents a Fig. 6 The spatial distribution of queried POIs in Houston.We query sports POIs by two methods -retrieving POIs with the highest accessibility from high-income neighborhoods and POIs with the highest inner product with the embedding of the highest L(income).Red, blue, and black pointers represent POIs retrieved by the statistic method, embedding method, and both methods.Compared with red pointers, blue pointers locate in regions where the income level is greater than the white ratio level, capturing the advantage of income on accessing sports POIs in low white ratio neighborhoods.
negative treatment effect in five MSAs.This evidence reflects the small commuting burden for highly-educated citizens under similar socioeconomic status.As opposed to the widely observed gender inequality in urban mobility in other countries (Gauvin et al., 2020;Ng and Acker, 2018), women's mobility frequency is not different from men's significantly in US cities.Additionally, we observe an insignificant treatment effect related to the disability household ratio, suggesting that urban spaces in the U.S. offer relatively equal opportunities for all genders and are accommodating for disabled individuals.
During the COVID-19 pandemic, mobility frequencies in large U.S. cities drop around 25% due to lockdown policies and POI shutdowns.By analyzing demographic features' effects on the reduction in mobility frequency in 2020, we reveal gaps in residents' capability of responding to an abrupt outbreak of infectious disease.Neighborhoods with a higher bachelor ratio reduce their mobility frequency more drastically in all six MSAs.Similarly, high-income neighborhoods also possess significant large mobility reductions in three MSAs.The above evidence reveals the gap in the capability to positively respond to mandate lockdown policies.Highly-educated people and people with better socioeconomic conditions have the advantage of accomplishing work remotely and living with their savings.By contrast, the nature of work for people with lower educational backgrounds requires them to travel in cities despite the social distancing and lockdown policies.The white population demonstrates better resilience during the pandemic shock.Neighborhoods with a higher white ratio have less mobility reduction in all six MSAs.This evidence suggests that the mobility gap between the white population and the minorities is exacerbated during the pandemic.The minorities are gradually pushed to the fringes of society, losing their rights and capability to exploit urban resources.Local government could assign economic support to the population in need and improve their remote working condition.
Furthermore, we analyze differences in residents' accessibility to different types of POIs.Both PSM and network representations demonstrate differences in accessibility to various urban venues.Sports venues are highly accessed by neighborhoods with higher average income, higher bachelor ratios, and higher white population ratios.Similarly, neighborhoods with higher average income and higher white population ratios have a higher proportion of visits to educational POIs.The class and racial gaps in education and sports accessibility urgently require policies striving for low-cost and quality opportunities for being educated and exercising for minorities and low-income populations.Neighborhoods with higher disability ratio and female ratio have higher urban facility accessibility to health POIs in New York, Chicago, Dallas, and Houston MSAs, indicating a high level of healthcare burden for women and people with disabilities.
Aside from the PSM method, we propose a counterfactual random walk-based representation learning algorithm called CRANE to capture the micro-mechanism of urban mobility inequality in an embedding space.Within-and cross-city prediction tasks confirm the effectiveness of the embedding vectors performing better and more stable in the prediction of urban facility accessibility.By querying proximity representation vectors in the embedding space, we can visualize the micro-level causal mechanism of gaps in urban facility accessibility.The effort of capturing accessibility gaps by representation learning approaches opens the door to understanding the micro-level mechanism of inequality and applying representation vectors for more urban tasks.For example, we can analyze POIs in which locations possess the harshest racial gaps.POI recommendations and site selection problems should also consider gaps in residents' mobility patterns in the embedding space to promote equal access to urban facilities.
It is possible that factors beyond our consideration may also contribute to gaps in mobility patterns, including spatial factors such as road network structures and the availability of public transportation.Our current focus is on assessing accessibility gaps across various demographic backgrounds.Given that demographic attributes largely determine residential choices and living contexts (Clark, 1992;Wilson, 2006), we have not included spatial factors as potential covariates that could confound demographic features.In future research, natural experiments could be conducted to analyze the impact of spatial factors on accessibility gaps.Here, by combining large-scale mobility data and leveraging causal analysis to balance key demographic covariates, we dramatically reduce confounding effects, providing an in-depth understanding of inequality in urban mobility.In addition, with the reopening policies and cancellation of mask mandates in years after 2021, consumption, production, and life are back on track.Future research can assess the inequality in the recovery of urban mobility in the post-pandemic era, revealing gaps in citizens' mobility resilience and how the pandemic shapes their mobility.
Overall, our study suggests that in urban planning and governance, more attention should be paid to ensuring equal accessibility to urban venues so that a city can truly become a sustainable place that "provides something to everybody" (Jacobs, 1961).

Fig
Fig. 1 A schematic representation of the propensity score matching method.A Treatment and covariates of urban CBGs.Treatment is stratified into 5 levels, denoted as L(T).B Estimation of propensity scores by fitting an ordinal regression on L(T) by covariates.C Matching two closest CBGs according to their distance determined as the difference in propensity scores divided by the difference in L(T).D Calculating average treatment effect from matched CBG pairs M.

Fig. 2
Fig.2Illustration of counterfactual random walks on urban mobility network.Starting from a POI category, a POI is sampled based on its total visitation frequency and an observed neighborhood is sampled based on its normalized visitation frequency to the sampled POI.An alternative neighborhood is sampled from all neighborhoods that have covariates identical to the observed neighborhood.
where {(c, P, C o , C a , o, a)} is the set containing the sampled category, POI, neighborhood, alternative neighborhood, observed outcome, and alternative outcome walks, v c , v P , v C o are the POI category, POI, and neighborhood embedding vectors, σ( ⋅ ) is the sigmoid function, v o and v a are embedding vectors of observed outcome and alternative outcome's treatment levels, respectively.

Fig. 3
Fig. 3 The treatment effects on mobility frequency in 2019 and mobility reduction during COVID-19 pandemic.Each panel denotes the treatment effects of one demographic feature on a mobility pattern in six MSAs.Blue and red bars indicate positive and negative effects, respectively.Whiskers correspond to the 95% confidence intervals.

Fig. 4
Fig. 4 The treatment effects on the accessibility to four POI categories.Each panel denotes the treatment effects of one demographic feature on the accessibility to a POI category in six MSAs.Blue and red bars indicate positive and negative effects, respectively.Whiskers correspond to the 95% confidence intervals.

Fig. 5
Fig. 5 Demonstration of the convergence and effectiveness of CRANE.A Convergence with sample size.The percentage of difference with PSM resultsdecreases with the number of sampled random walks on the urban mobility network and is reduced under 10% when the sample size is over 200,000.B 2D visualization of learned embeddings.Four category embeddings are fixed to be projected onto (0,1), (1,0), (0,-1), and (-1,0).Dashed arrows represent the directions pointing to the highest L(T) from the lowest L(T)'s embedding in the projected 2D space of each demographic feature.

Table 1
Basic statistics of ACS and SafeGraph datasets in 6 most populated US MSAs.

Table 2
The correlation between demographic feature and mobility behavior in New York metropolitan statistical area.

Table 4
Predicting the urban facility accessibility of out-of-sample neighborhoods.

Table 5
Predicting the urban facility accessibility of the neighborhoods in Chicago MSA by transferring the knowledge from other MSAs.