Cross-sectional and longitudinal analyses of outdoor air pollution exposure and cognitive function in UK Biobank

Observational studies have shown consistently increased likelihood of dementia or mild cognitive impairment diagnoses in people with higher air pollution exposure history, but evidence has been less consistent for associations with cognitive test performance. We estimated the association between baseline neighbourhood-level exposure to airborne pollutants (particulate matter and nitrogen oxides) and (1) cognitive test performance at baseline and (2) cognitive score change between baseline and 2.8-year follow-up, in 86,759 middle- to older-aged adults from the UK Biobank general population cohort. Unadjusted regression analyses indicated small but consistent negative associations between air pollutant exposure and baseline cognitive performance. Following adjustment for a range of key confounders, associations were inconsistent in direction and of very small magnitude. The largest of these indicated that 1 interquartile range higher air pollutant exposure was associated on average with 0.35% slower reaction time (95% CI: 0.13, 0.57), a 2.92% higher error rate on a visuospatial memory test (95% CI: 1.24, 4.62), and numeric memory scores that were 0.58 points lower (95% CI: −0.96, −0.19). Follow-up analyses of cognitive change scores did not show evidence of associations. The findings indicate that in this sample, which is five-fold larger than any previous cross-sectional study, the association between air pollution exposure and cognitive performance was weak. Ongoing follow-up of the UK Biobank cohort will allow investigation of longer-term associations into old age, including longitudinal tracking of cognitive performance and incident dementia outcomes.

small or null effects have also been shown in studies of cognitive score change over intervals of two to five years [29][30][31] . A smaller number of studies have examined associations with clinical cognitive disorders. In cross-sectional and case-control studies, air pollutant exposure has been linked with greater odds of global and amnestic mild cognitive impairment (MCI) 32 and Alzheimer's disease or vascular dementia 33 . Cohort studies investigating dementia incidence over follow-up periods of up to 15 years have reported greater hazard ratios associated with air pollutant exposure [34][35][36] and proximity to major roads 37 . Heterogeneity of study populations, measures and statistical adjustment methods means that none of the five systematic reviews published to date has included a meta-analysis of effect size estimates. Considerable uncertainty therefore remains regarding the likely magnitude of any association.
Large study samples are needed for the reliable detection of associations of small effect size. Cross-sectional studies investigating the relationship between air pollution exposure and performance on cognitive tests have used samples ranging from n = 399 22 to n = 15,973 26 , and longitudinal studies of cognitive score change have ranged from n = 2,867 30 to n = 20,150 29 . The inconsistent findings and imprecision of the effect size estimates from these studies may indicate the need for samples which are larger still. Previous studies have also varied considerably in the air pollutant exposures that have been measured-most frequently particulate matter, with gas pollutants studied less often-and in the outcomes investigated, encompassing a range of different cognitive assessments or dementia diagnoses. Furthermore, the type and number of possible confounding variables adjusted in statistical analyses have varied widely: most studies have adjusted for sociodemographic factors (including age, gender, ethnicity, education and socioeconomic status), and some have taken account of lifestyle factors such as smoking and physical activity. No study to date has examined the influence of time spent outdoors, despite the fact that the pollution measures predominantly capture neighbourhood air quality rather than individual-level exposure; it might be expected that the magnitude of the relationship between neighbourhood pollution measures and cognitive outcomes would be greater among those who spend more time outdoors.
The UK Biobank resource 38 offers an opportunity to address some of these limitations. More than half a million adults in middle to early old age were included in the UK Biobank cohort at baseline. Importantly, although the air pollution measures were taken at the neighbourhood level, participants also provided individual information regarding length of time typically spent outdoors. Moreover, a subset of the cohort returned for repeat cognitive assessment between two and seven years post-baseline, thus permitting both cross-sectional and longitudinal analyses. This is the largest study to date of air pollution and cognitive test performance, with standardised measurement of multiple pollutant exposures, cognitive outcomes and potential confounders, in a single general population cohort.
The aims of this study were to estimate the association between baseline neighbourhood-level exposure to airborne pollutants and (1) cognitive test performance at baseline and (2) cognitive score change between baseline and follow-up, taking account of confounding factors and the potential moderating influence of time spent outdoors.

Results
Characteristics of the study population. Of the full cohort (n = 502,623), 88,277 had their baseline assessment on or after 01 January 2010, of whom 86,759 (98.3%) had data on at least one air pollution measure and cognitive test. Of these, 2,913 (3.4%) attended the follow-up visit, which took place a mean of 2.8 years (standard deviation [SD] 0.2) after baseline, and had repeat data on at least one cognitive test. Table 1 summarises the baseline characteristics of the participants included in the cross-sectional and follow-up analyses. Descriptive information indicated that the sub-group of participants included in the follow-up sample differed in their baseline characteristics from the cross-sectional sample: the follow-up sub-group was somewhat older at baseline, resided in less deprived and less polluted neighbourhoods, scored better on the baseline cognitive tests, and had relatively higher proportions of men, white participants, never-smokers, participants with a degree, and non-urban dwellers. The follow-up sample also had lower proportions of missing data on most measures, although missingness was generally low across all baseline measures with the exception of physical activity (7.5% missing) and time spent outdoors (6.8% missing).
The median change score on the reasoning test was 0 (interquartile range [IQR] = 2; n = 2,878), and was also 0 on the pairs matching test (IQR = 4; n = 2,913). The median change on the reaction time test was −5 ms (indicating faster performance at follow-up; IQR = 109; n = 2,896). Of n = 2,910 with follow-up data on the prospective memory test, 121 (4.2%) showed decline.
Association between air pollution exposure and baseline cognitive function. Table 2 shows the results of the separate regression models for each air pollution exposure and each cognitive test at baseline. In the unadjusted models, associations were evident between all five air pollutant measures and four of the five cognitive scores, with higher levels of pollutants being associated with worse cognitive performance. Point estimates for the numeric memory test followed the same pattern as for the other cognitive tests, but the sample sizes on this test (≤1,458) were considerably smaller than for the other tests (≥83,238) and the confidence intervals included the null.
In the adjusted models, the point estimates attenuated towards the null and in some models they reversed direction; the confidence intervals remained narrow but included the null in the majority of models. Five of 25 results had false discovery rate (FDR)-adjusted p values below 0.05, but two of these indicated a positive relationship, such that higher pollutant exposure was associated with better performance on the reasoning test. Of the three inverse associations with cognitive performance, two were on the reaction time test; Table 2 Table S1a). Three estimates with an FDR-adjusted p value slightly above alpha in the main models now had adjusted p values that were slightly below alpha, although the point estimates and CIs were essentially unchanged compared with the adjusted models that did not include noise pollution. These estimates indicated that 1 IQR difference in PM 2.5 to 10 was associated with a numeric memory score that was 0.16 points lower (95% CI: −0.29, −0.03; possible score range 2 to 12). One IQR difference in NO x was associated with a reasoning score that was 0.022 points lower (95% CI: −0.041, −0.004; possible score range 0 to 13) and with a numeric memory score that was 0.12 points lower (95% CI: −0.21, −0.02).
Association between air pollution exposure and change in cognitive function. Table 3 shows the results of the separate regression models for each air pollution exposure and score change on each cognitive test between baseline and follow-up. Sample sizes for these models were much smaller than for the cross-sectional models (n = 2,875 to 2,910 unadjusted and n = 2,590 to 2,605 with adjustment), and confidence intervals were consequently wider. All of the FDR-adjusted p values were above 0.05, and the direction of association indicated by the point estimates was inconsistent. There was no evidence of interaction between the air pollutant measures and time outdoors in any of the models. The results were very similar when noise pollution was added to the adjusted models (Supplementary Table S1b).
Sensitivity analyses. Impact of prevalent neurological disorders. At baseline, 3,642 (4.2%) of the study population (i.e. those assessed on or after 01 January 2010) self-reported a condition that may affect brain function. When these participants were excluded from the analyses, the results of the cross-sectional models were virtually unchanged (Supplementary Table S2a) compared with the main models ( Table 2). The results of the follow-up models excluding these participants (Supplementary Table S2b) followed the same overall pattern as the main models (Table 3), although with minor variation in point estimates and confidence interval limits.
Missing covariate data. Across all models, up to 12,249 (14.2%) of the participants did not have complete covariate data for the adjusted cross-sectional analyses, and up to 293 (10.1%) did not have complete covariate data for the follow-up analyses. When the unadjusted cross-sectional models were re-run using only participants with complete covariate data (Supplementary Table S3a), all the point estimates attenuated slightly towards the null, but this did not account for the differences seen between the unadjusted and adjusted results in the main models ( Table 2). The results of the unadjusted follow-up models in participants with complete covariate data (Supplementary Table S3b) showed variation in the point estimates compared with the main models, both towards and away from the null, but this did not alter the interpretation of the adjusted main model results presented in Table 3.
Air pollution data source. Supplementary Table S4a shows that the median pollutant levels, as measured by Defra and mapped to each participant's address in the relevant year, were approximately 12-25% lower than those recorded in the central UK Biobank dataset using the baseline address. An exception was the PM 2.5 measure, which was 23% higher in the Defra dataset. As with the centrally-recorded data, the median pollutant levels recorded by Defra were lower in the follow-up sample than in the baseline sample.
The results were similar when the main cross-sectional analyses were repeated using the Defra data as the independent variables (Table 4), with the unadjusted analyses again indicating small associations between higher pollutant levels and worse performance on most cognitive tests. Eight of the 25 adjusted models had FDR-adjusted p values below 0.05, all of which indicated an inverse relationship between pollutant levels and cognitive performance. The largest effect sizes were seen for NO x exposure, for which 1 IQR difference was associated with: 2.92% (95% CI: 1.24, 4.62) higher rate of errors on the pairs matching memory test; 0.58 (95% CI: −0.96, −0.19) lower score on the numeric memory test (possible score range 2 to 12); and 0.21% (95% CI: 0.00, 0.41) slower reaction time. One IQR difference in PM 2.5 to 10 was associated with 0.04 (95% CI: 0.02, 0.06) lower score on the reasoning test (possible score range 0 to 13). Three of these models showed evidence of interaction, as detailed in Table 4: for each, the inverse relationship between pollutant level and cognitive performance was strongest in participants in the middle quintile of time spent outdoors.
The follow-up models were also repeated using the Defra data (Supplementary Table S4b). As with the analyses using the centrally-linked pollutant data, all of the FDR-adjusted p values were above 0.05, and there were no interactions between the air pollutant measures and time outdoors in any of the models.

Discussion
In this large sample of adults from the UK Biobank general population cohort, cross-sectional associations between air pollutant exposure and cognitive performance attenuated towards the null after adjustment for important confounders. Following adjustment, estimated associations were inconsistent in direction and of very small magnitude. The estimates with FDR-adjusted p values below 0.05 indicated that 1 IQR higher air pollutant exposure was associated on average with 0.35% slower reaction time (95% CI: 0.13, 0.57), a 2.92% higher error rate on the pairs matching visuospatial memory test (95% CI: 1.24, 4.62), and numeric memory scores that were 0.58 points lower (95% CI: −0.96, −0.19). There was little evidence that the results varied substantially by self-reported time spent outdoors. Sensitivity analyses showed slightly stronger associations when linkage using an alternative source of air pollutant data took account of participants' address history. Follow-up analyses of cognitive change scores did not detect any association between pollutant exposure and magnitude of score change; the precision of these results was reduced as a consequence of the smaller sample size, thus decreasing the power to detect small associations reliably. The results of both cross-sectional and follow-up analyses were robust to the potential influence of prevalent neurological disorders and missing covariate data. Overall, this study indicated that the association between air pollution exposure and cognitive performance was at most very weak in this population, particularly when confounding factors were taken into account.
The principal strength of this study was the sample size, which for the cross-sectional analyses was at least five-fold larger than similar studies in the literature [19][20][21][22][23][24][25][26][27][28] , thus allowing associations to be estimated with much greater precision than before. The study also benefited from the availability of multiple pollutant measures and multiple cognitive tests, measured in the same manner for all participants. Important covariates were also measured consistently, and adjusted analyses were planned in a principled way based on graphical analysis of assumed inter-relationships between variables. It was possible to investigate the potentially moderating relationship between neighbourhood pollutant measures and individual-level data regarding time spent outdoors, which had not been addressed in the previous literature.
A number of limitations must be considered. In common with most previous studies in this field, individual-level pollutant measures were not available, but efforts were made to mitigate this by analysing data regarding time spent outdoors (albeit that this measure was self-reported and did not necessarily represent time spent outdoors in the neighbourhood of residence from which the pollution measures were taken). The air pollution measures were also taken from different years, up to five years prior to cognitive assessment. The cognitive tests were brief and were subject to measurement error; the previously-noted low reliability of some of the tests (e.g. pairs matching test-retest correlation <0.2) 39 is likely to have biased the change score analyses towards the null, particularly in the relatively short follow-up duration studied here. It is possible to conduct joint analyses of the cognitive outcome measures using latent factors or multivariate modelling (e.g. canonical correlation analysis), which would address the problem of low reliability of individual measures, but this would also reduce the ability to detect any domain-specific relationships that may exist, e.g. affecting memory tasks only. The temporal order of some of the measures was unclear: despite the restrictions imposed on the timing of the pollution measures relative to the date of the baseline cognitive assessment, it was not possible to tell when participants' cognitive performance reached the level at which it was measured at baseline, because no premorbid cognitive estimates were available. It is therefore possible that cognitive decline had occurred prior to the study period, which in turn may or may not have been causally linked to air pollution exposure at an earlier age. Limitations regarding temporal relationships also apply to the graphical model used to inform the model adjustments (see Supplementary  Methods). 
It should be noted that the UK Biobank cohort had a low opt-in rate and it is not representative of the general UK population in some respects 40  and outcome have together influenced participation 41 . Non-representativeness was further amplified among the sub-group that returned for the follow-up visit, as shown in Table 1 here. Furthermore, UK Biobank recruited adults aged 40 to 70 years, and so the results of this study may not be generalisable beyond this age range.
Previous studies had reported mixed evidence regarding associations between air pollutants and cognitive test performance, with estimates varying in magnitude and/or predicated on very large differences in pollutant measures. For example, Ailshire and Clarke 19 reported an adjusted error rate ratio of 1.53 (95% CI: 1.02, 2.30) in association with higher levels of PM 2.5 on tests of orientation and working memory, but this was per 10 μg/m 3 increment in PM 2.5 , which in their sample equated to more than 3 SD (mean 13.8, SD 3.1). The largest previous cross-sectional study, by Zeng, et al. 26 , reported that a 1-point increment on an ordinal air pollution score (range 1 = least pollution to 7 = most) was associated with a modest adjusted OR for impaired Mini-Mental State Examination performance (score <18) of 1.09 (95% CI: 1.01, 1.18). Cross-sectional results in the present study indicated very small cognitive performance differences associated with meaningful (1 IQR) sample differences in pollutant exposure levels. In two instances these associations were in a protective direction: analyses using the centrally linked pollutant data (Table 2) showed a very small positive association between reasoning performance and both PM 10 and NO 2 . Given that these associations were not evident to the same degree in the Defra data analyses (Table 4), and that there is no previous evidence to suggest a protective relationship, these unexpected results may be spurious false positives or may reflect selection bias in the cohort. Overall, the discrepancy between the present findings and some previously reported cross-sectional results may reflect different sample sizes and approaches to covariate adjustment, as noted earlier.
Sample age ranges have also varied across previous studies (mean 37 years 27 to mean 86 years 24,26 ), although there does not appear to be a clear link between previous results and participant age, with cross-sectional studies in both early-middle 27 and older adulthood 28 showing similar null findings to ours. There is some evidence of adverse neurodevelopmental and cognitive outcomes in children exposed to high levels of air pollutants 2 , and it would be important to investigate this further in younger cohorts. Most studies have been conducted in the USA [19][20][21]25,27,28 , with a small number from other countries (Germany 22,23 and China 24,26 ), and it remains unclear to what extent findings from research in countries with stricter emissions regulations and relatively low average pollution levels (such as the UK) are generalisable to other settings.
Fewer studies have examined the relationship between air pollution exposure and change in cognitive test performance over time [29][30][31] . The largest of these 29 had a sample size of 20,150 and a follow-up period of 4-5 years, and found no reliable association between fine particulate matter (PM 2.5 ) and performance decline on the brief 'Six Item Screener' cognitive test: the adjusted OR was 0.98 (95% CI: 0.72, 1.34) per 10 µg/m 3 increment in PM 2.5 (equivalent to 4 IQR in their sample). The studies by Tonne, et al. 30 and Weuve, et al. 31 reported a mixed picture, with detrimental associations evident in only some analyses, in cohorts followed for five and two years respectively. Evidence from the present study, with follow-up for 2.8 years (SD 0.2) and a sample size of ~2,600 in adjusted models, was in keeping with previous null results.
The evidence from this and previous studies of cognitive test performance is somewhat at odds with other reports showing consistently increased likelihood of MCI or dementia diagnosis in those with higher air pollution exposure history or closer proximity to major roads [32][33][34][35][36][37] . A possible causal link between air pollution and dementia was further suggested in a recent study of magnetite nanoparticles in post-mortem brains of a small sample of patients with Alzheimer's disease 12 . It may be that selection bias is operating in cognitive assessment research, such that participants who are able and willing to enrol and provide data are less at risk of cognitive impairment; such a bias would not affect studies of dementia outcomes that are conducted using routine health records.
Further follow-up of the UK Biobank cohort will provide future opportunities to investigate brain imaging measures, as well as cognitive performance over longer periods and incident dementia outcomes. Ongoing monitoring of participants' address history and detailed modelling of the built environment 42 will add valuable information to enhance our understanding of the influence of the local environment on cognitive and other health outcomes.

Methods
Participants. Adults aged 40 to 69 years who were registered with the National Health Service (NHS) and living within 25 miles of a study assessment centre were invited by mail to participate in UK Biobank 38 . No exclusion criteria were applied during recruitment. Twenty-two assessment centres were in operation across England, Scotland and Wales at different times between 2006 and 2010. Approximately nine million invitations were issued to achieve the cohort size of ~502,000, indicating an overall response rate of approximately 5.6% 43 . Invitations for a follow-up visit in 2012-2013 were sent by email to 103,514 cohort participants living near the UK Biobank coordinating centre in northwest England, of whom 20,345 (19.7%) attended. All participants gave written informed consent. This study was conducted under generic approval from the NHS National Research Ethics Service (Ref. 11/NW/0382).
The present study population included all participants who attended for baseline assessment on or after 01 January 2010, and who had data on at least one air pollution exposure measure and at least one cognitive test score. This baseline cut-off date was chosen so that the cognitive measures would be from the same or a later year than the physical environment measures (see below). From this study population, those participants who also attended the follow-up visit in 2012-2013 were included in the change score analyses.
Table footnote i: Interaction between PM 2.5 to 10 and time outdoors: estimates stratified by quintile of time outdoors ranged between 0.9940 (0.9887, 0.9992) in quintile 4 and 1.0036 (0.9986, 1.0086) in quintile 5.
Materials and procedure. Baseline and follow-up assessment visits lasted approximately 2-3 hours, incorporating consent processes, computerised touchscreen questionnaire (including cognitive tests), nurse interview and physical measurements. All assessments were administered in a standardised order, according to a standard operating procedure. Administration and scoring of cognitive tests and questionnaires was automated.
Air pollution measures. UK Biobank air pollution data: Air pollution and local environment measures were provided by the Small Area Health Statistics Unit (http://www.sahsu.org/) as part of the BioSHaRE-EU Environmental Determinants of Health Project (http://www.bioshare.eu/), and were linked centrally to the assessment data by UK Biobank analysts (http://biobank.ctsu.ox.ac.uk/crystal/docs/EnviroExposEst.pdf). These measures were modelled at participants' baseline residential addresses. A total of 7,221 addresses (approximately 1.4%) could not be geo-coded and therefore have missing data. Particulate matter of up to 10 μm diameter (PM 10 , PM 2.5 to 10 and PM 2.5 ), nitrogen dioxide (NO 2 ) and total nitrogen oxides (NO x ) were measured as annual average values in μg/m 3 . Estimates for the years 2005 to 2007 were derived from European Union (EU)-wide air pollution maps (resolution 100 m × 100 m). The X, Y coordinates of participants' baseline addresses were overlaid on these maps (projected to the British National Grid) and the corresponding air pollution concentration of the 100 m × 100 m grid cell was assigned to the coordinate. These data were from a land use regression (LUR) model for western Europe based on >1500 EuroAirnet monitoring sites, which also included satellite-derived air pollution estimates to improve the model performance 44 . Estimates for the year 2010 were modelled for each address using a LUR model developed as part of the European Study of Cohorts for Air Pollution Effects (ESCAPE; http://www.escapeproject.eu/) 45,46 . ESCAPE estimates for PM in 2010 are valid up to 400 km from the monitoring area (Greater London), but the accuracy of estimates beyond this range (n = 33,935 addresses) was unknown and so these were coded as missing within the central UK Biobank dataset.
Where a pollutant measure was available for more than one year, data for the earliest available year were analysed, to minimise uncertainty about the temporal order of exposure and outcome. The PM 10 measure was for 2007; PM 2.5 to 10 and PM 2.5 were for 2010; NO 2 was for 2005; and NO x was for 2010.
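The year-selection rule can be illustrated with a minimal sketch. The availability table below is an assumption made for illustration only: the text states which year was analysed for each pollutant, but not the full list of years available for each, and the names `AVAILABLE_YEARS` and `select_earliest_year` are hypothetical.

```python
# Hypothetical availability table (pollutant -> years with modelled estimates).
# Only the selected years are stated in the text; the other years listed here
# are illustrative assumptions.
AVAILABLE_YEARS = {
    "PM10": [2007, 2010],
    "PM2.5-10": [2010],
    "PM2.5": [2010],
    "NO2": [2005, 2006, 2007, 2010],
    "NOx": [2010],
}


def select_earliest_year(available_years):
    """Return, for each pollutant, the earliest year with an estimate."""
    return {pollutant: min(years) for pollutant, years in available_years.items()}


selected = select_earliest_year(AVAILABLE_YEARS)
for pollutant, year in selected.items():
    print(f"{pollutant}: {year}")
```

Under these assumed inputs, the rule reproduces the years reported above (2007 for PM 10, 2005 for NO 2, and 2010 for the remaining measures).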
Defra air pollution data: The UK Biobank air pollution data described above were mapped to participants' baseline addresses, without taking account of address history at the time the pollutant measure was recorded. Separate data were made available subsequently by UK Biobank regarding participants' past address history (east and north coordinates rounded to 1 km), and the date the participant was first recorded at each location. These were used in the present study to map air pollution data provided by the UK Government Department for Environment, Food and Rural Affairs (Defra) (https://uk-air.defra.gov.uk/data/pcm-data). Modelled data 47 for the same pollutants in the same years as above (PM 10 for 2007; PM 2.5 to 10 and PM 2.5 for 2010; NO 2 for 2005; and NO x for 2010) were mapped to participants' addresses in the relevant year. These data were used only in sensitivity analyses, described below.
Other local environment measures. Population density (urban/rural) was classified categorically by UK Biobank, by combining each participant's baseline residential postcode with data generated from the 2001 census, using the GeoConvert tool provided by the UK Data Service Census Support (http://geoconvert.mimas.ac.uk/). Road traffic measures were provided for the year 2008 from the Road Traffic Statistics Branch at the Department for Transport attached to the local road network; traffic data for unmonitored links were estimates based on surrounding monitored links. A major road was defined as a road with traffic intensity >5000 motor vehicles per 24 hours. Traffic intensity on the nearest major road was measured as the average total number of motor vehicles per 24 hours. Proximity to the nearest major road was calculated as the inverse distance (1/m) from the residential location. Data were also available regarding noise pollution, and these were included in supplementary analyses to address potential residual confounding (see Data analysis section below). Noise estimates for the year 2009 were modelled using a version of the Common NOise aSSessment methOdS (CNOSSOS-EU) noise model 48,49 ; average level of noise pollution in decibels was calculated as a weighted level measured over a 24-hour period, with a 10 decibel penalty added between 23:00 h and 07:00 h.
Sociodemographic and lifestyle measures. Age was recorded in whole years. Gender was self-reported as male or female. Self-reported ethnic background was grouped categorically as white, Asian/Asian British, black/black British, Chinese, or mixed/other ethnic group. Neighbourhood-level socioeconomic status was measured using the Townsend index of material deprivation 50 . This was calculated by UK Biobank immediately before the baseline date, based on census data regarding unemployment, car ownership, home ownership and household overcrowding; each participant was assigned a score corresponding to the census output area in which their residential postcode was located, with higher values indicating greater relative deprivation (see Supplementary Methods). Educational qualifications were self-reported, and for the present study were dichotomised according to whether or not participants held a university/college degree. Self-reported smoking status data were used by UK Biobank to categorise participants as current, former or never smokers; these were dichotomised for the present study as 'ever smoker' (current or former) versus 'never smoker'. Physical activity in a typical week was recorded using self-reported items from the International Physical Activity Questionnaire short form 51 , and was converted into a single measure of total physical activity in metabolic equivalent of task (MET) hours per week, weighted by intensity (walking, moderate or vigorous). Participants were asked to estimate how many hours they spent outdoors in a typical day in summer, and in a typical day in winter. Responses were given in whole hours from 0 to 24, and participants whose responses exceeded 10 were asked to check and confirm this. An additional option of 'Less than an hour a day' was available, and for the present study was assigned a value of 0.5. Overall time outdoors was calculated as the mean of the hours per day in summer and winter together.
Cognitive assessment. The format and psychometric properties of the cognitive tests used in UK Biobank have been described by us previously 39,52 . All tests were administered visually via touchscreen, and scoring was automated. The tasks assessed reasoning (total correct of 13 items), reaction time (mean time in milliseconds to press a button in response to matching cards), numeric memory (longest numeric string recalled in reverse), visuospatial memory ('pairs matching' test: total errors when recalling positions of matching cards) and prospective memory (successfully carrying out an instruction after a filled delay). Higher values indicate better performance on the reasoning and numeric memory tests, and worse performance on the reaction time and pairs matching tests. Prospective memory test performance was categorised dichotomously as 1 for a correct response on the first attempt and 0 otherwise. Details of all tests are provided in the Supplementary Methods. The numeric memory test was removed from the UK Biobank baseline assessment battery part-way through recruitment, for reasons of time, resulting in lower sample sizes on this test than on the other four tests.
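As an aside on the derived covariates, the time-outdoors variable described under the sociodemographic and lifestyle measures can be sketched as follows. This is a minimal illustration, not the study's own code (the analyses were run in Stata). The function names are hypothetical, and returning a missing value when either season is missing is an assumption; the text states only that 'Do not know' and 'Prefer not to answer' responses were treated as missing.

```python
MISSING_RESPONSES = ("Do not know", "Prefer not to answer")


def code_hours(response):
    """Convert one raw questionnaire response to hours per day (None if missing)."""
    if response in MISSING_RESPONSES:
        return None
    if response == "Less than an hour a day":
        return 0.5  # value assigned in the present study
    return float(response)  # whole hours from 0 to 24


def overall_time_outdoors(summer, winter):
    """Mean of summer and winter hours per day.

    Assumption: the result is missing if either season is missing.
    """
    s, w = code_hours(summer), code_hours(winter)
    if s is None or w is None:
        return None
    return (s + w) / 2


print(overall_time_outdoors(4, "Less than an hour a day"))
```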
Change on the cognitive tests was measured by subtracting the baseline score from the follow-up score. Raw scores were used in these calculations, without any replacement of outlying values. Negative change score values on the reasoning test indicate worse performance at follow-up; positive change score values on the reaction time test indicate slower performance at follow-up; positive change score values on the pairs matching test indicate more errors at follow-up. Change on the prospective memory test was dichotomised as 'worse at follow-up' = 1, versus 'same or better at follow-up' = 0 (i.e., 'worse at follow-up' means the participant gave the correct response at baseline and an incorrect response at follow-up). The numeric memory test was not administered at follow-up, so no change scores were available.
Data analysis. All analyses were performed using Stata version 13 53. Data were summarised descriptively to characterise the baseline and follow-up samples. Quintiles were derived for the measures of major road proximity, traffic intensity, noise pollution, physical activity and time outdoors, based on all available data in the whole UK Biobank cohort at baseline. Regression models were used to estimate the association between air pollution exposures (independent variable) and cognitive performance (dependent variable), with and without adjustment for other covariates. The pollutant data provided centrally by UK Biobank were used in all primary analyses, and sensitivity analyses were conducted using the Defra data (see below). Multicollinearity between the air pollution measures and the covariates was within acceptable limits (variance inflation factor values 1.77 to 1.96). For all models, normal-approximation 95% confidence intervals (CI) were generated from bootstrapped standard errors (5000 replicates) 54.
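As an illustration of the normal-approximation bootstrap interval described above, the following is a minimal Python sketch. It is not the authors' Stata code: the function names, the toy data and the single-predictor model are assumptions made for illustration, and 500 replicates are used here only to keep the example quick (the analyses used 5000).

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def bootstrap_normal_ci(x, y, replicates=5000, seed=0, z=1.96):
    """Return (estimate, lower, upper) using estimate +/- z * bootstrap SE."""
    rng = random.Random(seed)
    n = len(x)
    estimate = ols_slope(x, y)
    slopes = []
    for _ in range(replicates):
        # Resample participants (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # skip a degenerate resample with no variation in x
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    # The bootstrap standard error is the SD of the replicate estimates.
    se = statistics.stdev(slopes)
    return estimate, estimate - z * se, estimate + z * se


# Toy data (invented): outcome falls by 0.1 per unit of exposure, plus noise.
_rng = random.Random(42)
exposure = [float(i) for i in range(50)]
outcome = [2.0 - 0.1 * xi + _rng.gauss(0, 0.5) for xi in exposure]
estimate, lower, upper = bootstrap_normal_ci(exposure, outcome, replicates=500, seed=1)
print(f"slope {estimate:.3f} (95% CI {lower:.3f}, {upper:.3f})")
```

The interval is symmetric around the point estimate by construction, which distinguishes it from percentile-based bootstrap intervals.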
Since there were 25 cross-sectional and 20 follow-up regression analyses, p values (two-tailed) for the pollutant coefficients were adjusted using the Simes-Benjamini-Hochberg false discovery rate (FDR) method 55; both unadjusted and FDR-adjusted p values are reported. The significance threshold (alpha) was 0.05, applied to the FDR-adjusted p values. 'Do not know' and 'Prefer not to answer' responses were treated as missing. Missing data were not imputed.
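The Benjamini-Hochberg adjustment can be sketched in a few lines of Python. This is an illustrative implementation of the standard step-up procedure, not the Stata routine used in the study; the function name and the four example p values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: four raw p values from one family of tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

Each adjusted value is compared directly with alpha = 0.05, so a coefficient is flagged only if its FDR-adjusted p value falls below that threshold.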
For the cross-sectional analyses, each baseline cognitive measure was regressed separately on each air pollution measure (continuous), firstly in unadjusted models and then with adjustment for baseline age, gender, ethnic group, Townsend score (continuous), education, smoking status, physical activity (quintiles), time outdoors (quintiles), major road proximity (quintiles), traffic intensity (quintiles), and population density category. The Supplementary Methods include a directed acyclic graph of the assumptions underpinning the analytical model: the aim of covariate adjustment was to minimise confounding influences on the association between air pollution exposure and cognitive performance, rather than to construct a multivariable risk prediction model for cognitive outcome. Linear regression was used for the reasoning and numeric memory scores, which were approximately normally distributed, and unstandardised coefficients are reported. Positive skew in the reaction time distribution was addressed using a natural log transformation, and linear regression was then used; exponentiated results are reported as rate ratios (RR). Outlying pairs matching error count values (>30; n = 49, 0.06%) were replaced with a value of 30, and only participants who finished the task (achieved all six pairs) were included in the analyses. The pairs matching error count distribution remained overdispersed; a negative binomial model was used and coefficients are reported as RR. A logistic regression model was used for the prospective memory score, and results are reported as odds ratios (OR).
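Two of the data-handling steps above lend themselves to a short worked example: reading an exponentiated coefficient from a log-outcome model as a percentage difference, and capping outlying pairs matching error counts at 30. The Python sketch below uses hypothetical function names and is not taken from the study's analysis code.

```python
import math


def percent_difference(log_coefficient):
    """Convert a coefficient from a log-outcome model to a percentage difference.

    exp(b) is the multiplicative change in the outcome per unit of exposure,
    so (exp(b) - 1) * 100 is the corresponding percentage change.
    """
    return (math.exp(log_coefficient) - 1.0) * 100.0


def cap_errors(error_count, ceiling=30):
    """Replace pairs matching error counts above the ceiling with the ceiling."""
    return min(error_count, ceiling)


# A ratio of 1.0035 corresponds to a 0.35% higher (slower) reaction time.
slower_by = percent_difference(math.log(1.0035))

# Invented error counts: only the value above 30 is altered.
capped = [cap_errors(e) for e in [0, 7, 30, 45]]
```

On this reading, a ratio above 1 for reaction time or pairs matching errors indicates worse performance with higher exposure.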
For the follow-up analyses, each cognitive change measure was regressed separately on each air pollution measure (continuous), firstly in unadjusted models and then with adjustment for duration between baseline and follow-up (continuous) as well as baseline age, gender, ethnic group, Townsend score (continuous), education, smoking status, physical activity (quintiles), time outdoors (quintiles), major road proximity (quintiles), traffic intensity (quintiles), and population density category. Change scores for reasoning, reaction time and pairs matching followed an approximately normal distribution, and were analysed with linear regression; unstandardised coefficients are reported. Change on the prospective memory test was analysed using logistic regression, with results reported as OR. For the pairs matching analyses, only participants who finished the task (achieved all six pairs) at both time points were included.
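The change measures entering these follow-up models follow the definitions given earlier in this section (follow-up minus baseline for continuous scores, and a 'worse at follow-up' indicator for prospective memory). A minimal Python sketch of those definitions, with hypothetical function names, makes the sign conventions explicit:

```python
def change_score(baseline, follow_up):
    """Raw change score: follow-up score minus baseline score."""
    return follow_up - baseline


def prospective_memory_worse(baseline_correct, follow_up_correct):
    """Return 1 if correct at baseline (1) but incorrect at follow-up (0), else 0."""
    return 1 if (baseline_correct == 1 and follow_up_correct == 0) else 0


# Invented values for one participant.
reasoning_change = change_score(baseline=8, follow_up=6)          # negative: worse reasoning
reaction_time_change = change_score(baseline=520, follow_up=560)  # positive: slower responses
memory_worse = prospective_memory_worse(1, 0)                     # correct, then incorrect
```

Participants who were incorrect at both time points fall into the 'same or better' category (coded 0) under this definition.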
To test whether the association between neighbourhood air pollution exposure and cognitive outcome varied according to the amount of time typically spent outdoors, all cross-sectional and follow-up models were run with and without a product term (air pollutant measure * time outdoors [quintiles]), and the fit of the model with and without the product term was compared using the likelihood ratio test. It was predicted that associations would be stronger among those who spent the most time outdoors, either because spending time outdoors exposes individuals to more pollution, or because the neighbourhood-level measure would more accurately reflect actual pollutant exposure in those who spent more time outdoors. When the p value of the likelihood ratio test was <0.05, separate models were run for each quintile of time outdoors.
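The likelihood ratio comparison can be illustrated with a short Python sketch. It assumes that the product of a continuous pollutant measure with time outdoors in quintiles adds four parameters to the model (one per non-reference quintile), giving 4 degrees of freedom; the log-likelihood values are invented, and the closed-form chi-square tail used here holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return (statistic, p value) comparing nested models by log-likelihood."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf_even_df(statistic, df)


# Invented log-likelihoods for models without and with the product term.
lr_statistic, lr_p = likelihood_ratio_test(-1000.0, -995.0, df=4)
run_stratified_models = lr_p < 0.05
```

With these invented values the statistic is 10 on 4 degrees of freedom, which falls just below the 0.05 threshold and would therefore trigger the quintile-stratified models.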
In a supplementary analysis, all adjusted models were repeated with noise pollution (quintiles) as an additional covariate. The purpose was to adjust for possible residual confounding from antecedents of both air pollution and noise pollution (see Supplementary Methods).
Sensitivity analyses.
Impact of prevalent neurological disorders. To address the possibility that associations between pollution exposure and cognitive function might be driven by participants with prevalent neurological disorders, the main analyses (unadjusted and adjusted) were repeated after excluding participants who had self-reported conditions that affect brain function at baseline (listed in the Supplementary Methods).
Missing covariate data. The unadjusted regression models were based on all available data, which meant that differences between unadjusted and adjusted estimates may have been partly due to the inclusion of different participants (all available in the unadjusted model, versus only those with complete covariate data in the adjusted). The unadjusted models were therefore repeated using only participants who had complete covariate data.
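The complete-case restriction can be sketched as a simple filter. In the Python example below, the records, variable names and use of None for missing values are all invented for illustration; the point is only that the unadjusted model is refitted on the same participants who contribute to the adjusted model.

```python
def complete_cases(rows, variables):
    """Keep only records with a non-missing value for every listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


# Invented participant records; None marks a missing value.
participants = [
    {"id": 1, "pollutant": 9.8, "reasoning": 7, "education": 1, "smoking": 0},
    {"id": 2, "pollutant": 10.4, "reasoning": 5, "education": None, "smoking": 1},
    {"id": 3, "pollutant": 11.1, "reasoning": 6, "education": 0, "smoking": None},
    {"id": 4, "pollutant": 9.2, "reasoning": 8, "education": 1, "smoking": 1},
]

covariates = ["education", "smoking"]
model_variables = ["pollutant", "reasoning"] + covariates

# Sample for both the unadjusted and adjusted models in the sensitivity analysis.
analysis_sample = complete_cases(participants, model_variables)
sample_ids = [row["id"] for row in analysis_sample]
```

Any remaining difference between unadjusted and adjusted estimates in this restricted sample can then be attributed to covariate adjustment rather than to a change in who is included.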
Air pollution data source. The air pollution exposure data provided by UK Biobank were mapped centrally to participants' baseline addresses, without taking account of address history in the year in which the pollutant measure was recorded. The Defra data described above were linked to participants' addresses in the same year as the pollutant was measured, thus potentially reducing measurement error in the exposure. The main cross-sectional and follow-up analyses were repeated with the Defra data as the independent variable, for comparison.