Birth data accessibility via primary care health records to classify health status in a multi-ethnic population of children: an observational study

Background: Access to reliable birth data (birthweight (BW) and gestational age (GA)) is essential for the identification of individuals who are at subsequent health risk. Aims: This study aimed to explore the feasibility of retrospectively collecting birth data for schoolchildren from parental questionnaires (PQ) and general practitioners (GPs) in primary care clinics, in inner city neighbourhoods with high density of ethnic minority and disadvantaged populations. Methods: Attempts were made to obtain birth data from parents and GPs for 2,171 London primary schoolchildren (34% White, 29% Black African origin, 25% South Asians, 12% Other) as part of a larger study of respiratory health. Results: Information on BW and/or GA was obtained from parents for 2,052 (95%) children. Almost all parents (2,045) gave consent to access their children’s health records held by GPs. On the basis of parental information, GPs of 1,785 children were successfully contacted, and GPs of 1,202 children responded. Birth data were retrieved for only 482 children (22% of the 2,171 children studied). Missing birth data from GPs were associated with non-white ethnicity, non-UK born, English not the dominant language at home or socioeconomic disadvantage. Paired data were available in 376 children for BW and in 407 children for GA. No significant difference in BW or GA was observed between PQ and GP data, with <5% difference between sources regardless of normal or low birth weight, or term or preterm status. Conclusions: Parental recall of birth data for primary schoolchildren yields high quality and rapid return of data, and it should be considered as a viable alternative where there is limited access to birth records. It provides the potential to include children with an increased risk of health problems within epidemiological studies.


INTRODUCTION
Despite increasing evidence that pre-natal and early post-natal insults to the developing lung may affect later respiratory health, [1][2][3][4] remarkably little emphasis has been placed on the need for rapid access to reliable birth data such as birth weight (BW) and gestational age (GA). 5 Without access to such data, it is difficult to identify individuals who are at subsequent risk, to design intervention studies to prevent or minimise the impact of such insults, or to select a population free of such risks when designing epidemiological studies. Population-based studies of lung function in children often exclude those born preterm (i.e., GA < 37 weeks) or those with a low birth weight (LBW, BW < 2.5 kg), because of the known long-term influence on lung growth. 6,7
The most accurate means of obtaining BW and GA data should be via the child's medical record. 8 The administrative process to obtain these data from hospitals or primary care centres can, however, be lengthy and complicated. The collection of birth data for many community-based longitudinal epidemiologic studies in the UK is via parental recall. 9,10 Evidence regarding the precision and reliability of parental recall of BW and GA is discrepant. Although some studies have shown maternal recall to be reliable, [10][11][12][13] others suggest a bias, with poorer recall from mothers with more than one child or who are not of White European origin. 14,15
This study provided a unique opportunity to examine the feasibility of collecting essential information relating to birth status from both parents and GPs using data collected from the Size and Lung function In Children (SLIC) study, which is the largest study of lung function undertaken in a multi-ethnic population of London primary schoolchildren to date.
The aims of this study were to (1) determine the feasibility of collecting BW and GA from parents and general practitioners (GP) in primary care surgeries, where all children are registered for health care; (2) assess the agreement of BW and GA data between GPs and parental recall; and (3) estimate the extent to which reliance on parental data may bias identification of full-term (i.e., ⩾ 37 weeks GA) and appropriately grown (i.e., ⩾ 2.5 kg BW) children for epidemiological studies, on the basis of data collected as part of the SLIC study. 16

MATERIALS AND METHODS
As part of the SLIC study 16 (www.ucl.ac.uk/slic), an epidemiological study designed to explore ethnic differences in lung function and body size in a multi-ethnic population of London children, anthropometry and spirometry were undertaken in primary schoolchildren between December 2010 and July 2013. Primary schools in the London area with a high ethnic mix of pupils were identified and ranked by education performance within boroughs. Schools were sampled from each stratum of rankings to ensure a wide range of socioeconomic circumstances. In this study, an all-inclusive strategy was adopted to ensure that no child would feel excluded from a study that was being undertaken in the school. Thus, all children with written parental consent (n = 2,291) were eligible to participate in the study. The study was approved by the London-Hampstead Research Ethics Committee (REC: 10/H0720/53). Parents were requested to complete a study questionnaire that was sent home with the children. The information requested included relevant health information such as birth data and respiratory and medical history, ethnicity and socio-economic circumstances. A member of the study team was available to help parents complete the questionnaire, in person or over the phone, where assistance was needed. Birth weight was reported in kilograms and grams or pounds and ounces (the latter being subsequently converted into kilograms and grams). 16 Children were classified into four main ethnic groups on the basis of the child's ethnicity information from parental questionnaires (PQs): White (European descent), Black African origin (Black African or Black Caribbean descent), South Asian (Indian subcontinent) and Other/mixed ethnicities.
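The unit conversion mentioned above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function names are hypothetical, and the only facts used are the standard definitions of the pound and ounce and the 2.5 kg low birth weight cutoff given in the text.

```python
# Standard avoirdupois definitions (exact by international agreement).
KG_PER_POUND = 0.45359237
OUNCES_PER_POUND = 16


def pounds_ounces_to_kg(pounds: float, ounces: float = 0.0) -> float:
    """Convert a birth weight reported in pounds and ounces to kilograms."""
    if pounds < 0 or ounces < 0:
        raise ValueError("pounds and ounces must be non-negative")
    total_pounds = pounds + ounces / OUNCES_PER_POUND
    return total_pounds * KG_PER_POUND


def is_low_birth_weight(bw_kg: float, cutoff_kg: float = 2.5) -> bool:
    """Apply the low birth weight definition used in the text (BW < 2.5 kg)."""
    return bw_kg < cutoff_kg


if __name__ == "__main__":
    # Hypothetical questionnaire entry: 7 lb 8 oz.
    bw = pounds_ounces_to_kg(7, 8)
    print(f"{bw:.3f} kg, low birth weight: {is_low_birth_weight(bw)}")
```

Converting before any comparison keeps both reporting formats on the same scale, so the 2.5 kg cutoff and the 0.10 kg discrepancy threshold can be applied uniformly.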
Socioeconomic circumstances (SEC) were assessed at the area level using the English Index of Multiple Deprivation (IMD) [17][18][19] and at the individual level using the Family Affluence Scale (FAS). 19,20 Each child was assigned an IMD using area postcodes for both their registered GP surgery and their home address to examine potential associations of any bias between GP or parental data according to locality. The FAS, commonly used for collecting socioeconomic data from children, included information such as the number of cars and computers owned by the family, whether the child shared a bedroom 17,19 and the dominant language spoken within the household.
In cases for which parental consent was obtained for access to the child's and maternal GP records, GP surgeries were requested either to extract the relevant birth data from medical records or to permit a designated researcher to extract such data. Approval from the relevant Primary Care Trusts was obtained to access GP records, with supplementary funding for service support costs being provided by the Local Comprehensive Research Networks to enable remuneration to be offered to GP surgeries.

Statistical analysis
For the purposes of this study, GP data were used as baseline, with discrepancies between PQ and GP exceeding 0.10 kg for BW or 2 weeks for GA being considered to be of potential clinical or physiological significance. 10,11 These thresholds were used to estimate the degree of potential underestimation and overestimation from the PQ, if the GP report was assumed to be correct. Children were also classified as being of LBW (< 2.5 kg) and/or preterm (< 37 weeks' GA) according to both GPs and parental report. The Mann-Whitney U-test and binary logistic regression models were used to assess whether the child's test age or socioeconomic circumstance distribution varied between GPs who (a) did or did not respond to requests for data and (b) could or could not provide the relevant birth data. Agreement between BW and GA reported by parents and GPs was assessed using the Bland and Altman method. 21 Agreement between LBW and preterm classification according to PQ and GP was measured using the Kappa statistic. Multinomial logistic regression models were used to evaluate the extent and nature of any apparent parental misreport of birth information. The significance level was set at 0.05, and SPSS and R were used for the analyses. 22,23 Data were stored in a dedicated research database (Re-Base software, Re-Base Ltd).
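The agreement measures described above can be illustrated with a short sketch. It is not the authors' analysis code (the study used SPSS and R): the function names and the five paired values are hypothetical, the limits of agreement use the conventional bias ± 1.96 SD form, and differences are rounded to the nearest gram before thresholding to avoid floating-point artefacts.

```python
from statistics import mean, stdev


def bland_altman(pq, gp):
    """Return (bias, lower LoA, upper LoA) for PQ minus GP differences."""
    diffs = [p - g for p, g in zip(pq, gp)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def classify_difference(pq_value, gp_value, threshold):
    """Label a parental report relative to the GP value (GP taken as baseline)."""
    diff = round(pq_value - gp_value, 3)
    if diff < -threshold:
        return "under"
    if diff > threshold:
        return "over"
    return "agree"


def cohen_kappa(a, b):
    """Cohen's kappa for two paired sequences of booleans."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical paired birth weights in kg (illustration only, not study data).
pq_bw = [3.20, 2.40, 3.90, 2.60, 3.50]
gp_bw = [3.30, 2.45, 3.60, 2.40, 3.50]

bias, lower, upper = bland_altman(pq_bw, gp_bw)
labels = [classify_difference(p, g, 0.10) for p, g in zip(pq_bw, gp_bw)]
kappa = cohen_kappa([p < 2.5 for p in pq_bw], [g < 2.5 for g in gp_bw])
print(f"bias={bias:.2f} kg, LoA=({lower:.2f}, {upper:.2f}) kg")
print(labels, f"kappa={kappa:.2f}")
```

The same functions apply to gestational age by passing weeks instead of kilograms, a threshold of 2 and a cutoff of 37.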

RESULTS
Out of the 2,291 children with parental consent, 2,171 children (median age 8.1 (range 5.2-12.0) years; 47% boys) participated in the SLIC study. 16 Of these, parental reports for BW and/or GA were available for 2,052 (95%) children, with 2,045 (94%) parents giving consent to access GP records. Of those with parental consent, 260 (13%) contact details for GPs were missing, and therefore GPs for 1,785 children were approached. Although some GP information regarding past medical history was available for 1,202 children (67% of requested), birth data were only available for 482 children (27% of those with parental consent and GP details, and 22% of the total study sample). Paired data (parent and GP) were available from 376 records for BW and 407 for GA, representing only 18 and 20% of those with parental consent to access GP data. Data availability from both sources is summarised in Figure 1.
Figure 1. Study participation and birth data retrieval from parental recall and general practitioner. In all, 376 children had paired information (i.e., PQ and GP) for BW; 407 children had paired information for GA; and 322 children had paired information for BW and GA. BW, birth weight; GA, gestational age; GP, general practitioner; NHS, National Health Service; PQ, parental questionnaire.
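Because several different denominators are used in this paragraph, the percentages are easy to misread. The short check below recomputes each one from the counts quoted in the text; the variable names are illustrative only.

```python
# Counts quoted in the Results text.
participants = 2171
parental_birth_data = 2052
consent_gp_access = 2045
gp_details_missing = 260
gp_responded = 1202
gp_birth_data = 482
paired_bw = 376
paired_ga = 407


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# GPs could only be approached where consent and contact details existed.
gp_approached = consent_gp_access - gp_details_missing

flow = {
    "parental birth data / participants": pct(parental_birth_data, participants),
    "consent to GP access / participants": pct(consent_gp_access, participants),
    "GP details missing / consented": pct(gp_details_missing, consent_gp_access),
    "GP responded / approached": pct(gp_responded, gp_approached),
    "GP birth data / approached": pct(gp_birth_data, gp_approached),
    "GP birth data / participants": pct(gp_birth_data, participants),
    "paired BW / consented": pct(paired_bw, consent_gp_access),
    "paired GA / consented": pct(paired_ga, consent_gp_access),
}

for label, value in flow.items():
    print(f"{label}: {value}%")
```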
The sex and age distributions were similar for children with or without GP data (median age (95% confidence interval (CI)): 8.1 (8.0; 8.2) vs. 8.3 (8.1; 8.6) years, respectively). GP data were more likely to be missing for children of ethnicities other than White, who were not born in the UK, where English was not the dominant language at home or who lived in the most deprived areas or were in low FAS households (Table 1). The distribution of area-level or individual-level SEC characteristics in terms of data availability was similar for all indices (Supplementary Table S1).
Among the 40% of GPs who responded, details regarding BW or GA were more frequently missing for older children (8.2 (95% CI: 8.1; 8.4) vs. 7.8 (7.7; 8.0) years, P < 0.0001) and those from low FAS households (mean (95% CI)% missing data: 78 (69; 87)% for low FAS vs. 42 (36; 48)% for high FAS, P < 0.0001; Table 2). The proportion of missing data was independently associated with age, country of birth, dominant language, IMD and FAS, with the adjusted odds for not obtaining data being higher for older children and those not born in UK, without English as the dominant home language or who were living in more deprived areas. Low FAS was related to increased odds of missing GP birth data after accounting for the variables mentioned above (Table 2).
There was no significant bias between PQ and GP reports of either birth weight (bias (95% CI): −0.04 (−0.07; −0.01) kg) or gestational age (0.17 (0.04; 0.30) weeks), but the relatively wide limits of agreement (95% LoA (95% CI): −0.63 (−0.68; −0.58) to 0.55 (0.50; 0.60) kg for birth weight; −2.4 (−2.6; −2.2) to 2.8 (2.5; 3.0) weeks for gestational age) indicate that individual differences may exist (Figure 2). Although there was a trend for parents to underestimate BW or GA for heavier and full-term children when compared with GP data (−0.4 ⩽ r ⩽ −0.2, P ⩽ 0.1 for all cases), the agreement regarding BW or GA was consistent across ethnicities, indicating that no ethnic bias was observed when estimating either BW or GA from PQ as compared with GP data (Figure 3). Differences in BW or GA were also found to be constant across the age range, although they were somewhat larger for children of LBW as compared with those of normal BW. Nevertheless, no trend towards over- or under-reporting of data from PQ was evident (rho < 0.2, P > 0.1 for both BW and GA). When the analysis was repeated after including the five extreme data points, results remained very similar, although the limits of agreement were slightly wider.
Parental 'underestimation' of BW by at least 0.1 kg occurred in 19% (95% CI: 15; 23%) of children, whereas 'overestimation' occurred in 12% (9; 16%). By contrast, parents under- or over-reported GA by at least 2 weeks in 4% (3; 7%) and 3% (1; 4%) of children, respectively. The odds of parents underestimating BW were ~2.5 (95% CI: 1.1; 5.5) times higher in Black African-origin children when compared with White children. In addition, lower FAS and increasing IMD quintile were both associated with increased misclassification of birth weight status (Table 3). No significant associations were observed between socioeconomic circumstances and the likelihood of parents mis-estimating GA (Supplementary Table S2).
Nine percent (95% CI: 8; 11%) of children were classified as LBW and 6% (5; 8%) were classified as preterm by parents compared with 6% (4; 9%) and 9% (7; 12%), respectively, when classified by GPs. Among children with paired data, there was good agreement with respect to whether or not the child was of LBW (95.5%) or born preterm (97%) (Supplementary Table S3). When the analysis was repeated including the five extreme data points, no difference was seen in the proportion of misclassification, and the agreement remained at the same level as in the initial analysis. A significant association was found between parental mis-estimation of birth weight and socioeconomic circumstances as indicated by IMD and FAS, with those from the most deprived areas or with lower FAS having higher odds of mis-estimating the child's birth weight. No significant associations were found with GA, and both of these results are in line with the original data in the main manuscript.
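The agreement percentages quoted above can be reproduced from the discordant counts reported in the Figure 2 legend (6 and 11 children for birth weight status among 376 pairs; 8 and 4 for preterm status among 407 pairs). The sketch below shows the arithmetic only; the function name is illustrative, and a Kappa statistic cannot be recomputed here because the full 2 × 2 tables are not given in the text.

```python
def percent_agreement(n_paired: int, n_discordant: int) -> float:
    """Share of paired records on which PQ and GP give the same class."""
    return 100 * (n_paired - n_discordant) / n_paired


# Discordant counts taken from the Figure 2 legend.
bw_agreement = percent_agreement(376, 6 + 11)  # low vs normal birth weight
ga_agreement = percent_agreement(407, 8 + 4)   # preterm vs full term

print(f"LBW status agreement: {bw_agreement:.1f}%")      # 95.5%
print(f"Preterm status agreement: {ga_agreement:.1f}%")  # 97.1%
```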

DISCUSSION
Main findings
These results demonstrate that it is currently not feasible to obtain essential birth data from GP records. Although birth data from reliable health records would be undeniably preferable, our findings suggest that parental reports have the potential to yield high-quality data, and quicker access to those data, for both gestational age and birth weight.

Feasibility of data retrieval
In contrast to the relative ease with which parental data were collected, obtaining birth data from GPs was difficult and information was less likely to be available if the child was older, born outside the UK or where English was not the dominant language at home. This raises a number of issues. First, owing to the overall low response from GPs, currently, this does not appear to be a feasible method for obtaining data for epidemiological studies. Further, the response rate was especially low for children from more deprived areas, thereby risking collection bias towards those with higher SEC. Second, the lower GP response rate for children not born in the UK or without English as their dominant home language may have an impact on the provision of health care for migrant children. Increasing awareness of the potential long-term influence of early-life events, including preterm birth and intrauterine growth restriction, highlights the need for GPs to try to obtain this information as accurately as possible at the time of registration. Reassuringly, the barriers to obtaining information from the GP did not arise from parents. Not only was it feasible to collect birth data from virtually all parents via the questionnaire, but the vast majority gave consent for access to GP records, regardless of ethnicity or socioeconomic circumstance.

Interpretation of findings in relation to previously published work
Comparison of paired PQ and GP data, where available, showed good agreement on average across all ethnic groups and socioeconomic circumstances. Previous studies found contrasting results. In the Millennium Cohort Study, although there was 94% agreement in the reporting of GA within 1 week between parent and medical record, disagreement was associated with low SEC. 24 Similarly, significant underestimation of GA and less accurate BW reporting was found from mothers of non-white children. 11 However, in both these studies and the current study, disagreement only resulted in minimal misclassification of birth status, suggesting that parental reporting of BW and GA is accurate enough to provide appropriate classification of birth status, especially within the context of large-scale epidemiological studies. It should be noted that although GP data were used as the baseline for the purpose of analysis, with parental 'underestimation' or 'overestimation' based on the assumption that GP data should be the most accurate, this assumption was not necessarily correct, as shown by several physiologically impossible birth data or 'outliers' provided by GPs (see Figure 2).
Table footnotes. Abbreviations: FAS, Family Affluence Scale; GP, general practitioner; IMD, index of multiple deprivation; OR (95% CI), odds ratio (95% confidence interval).
a GPs who responded could not provide any data on BW or GA for 720/1,202 (60%) children (see Figure 1).
b P values derived through univariable or multivariable logistic regression models to evaluate the factors related to the likelihood of missing birth data. Multivariable model was adjusted for age, sex, ethnicity, country of birth, language, family's IMD domain and FAS.
c Detailed information regarding the IMD distribution of income and GPs domain and the individual components for FAS is presented in Supplementary Table S1.
d FAS was grouped in three categories owing to the small sample size in the lower scores.
Birth data accessibility: GP versus parental recall R Bonner et al

Strengths and limitations of this study
The major limitation of this study was the relatively low sample size in certain sub-categories, mainly owing to the high proportion of missing GP data for children from more deprived households, meaning that potentially important differences pertaining to deprivation could not be discounted. In addition, although the best approach for collecting birth data would be via hospital birth records, this would only have been feasible for children born in England. Given the nature of the multi-ethnic SLIC study, many of the children had been born outside London or indeed outside the UK, thereby precluding this approach. GP data were used as a baseline reference for comparisons with PQ, rather than as a gold standard. Nevertheless, the SLIC study is the largest study of lung function undertaken in a multi-ethnic population of London primary schoolchildren to date, providing a unique opportunity to examine the feasibility of collecting essential information relating to birth status from both parents and GPs.

Implications for future research, policy and practice
A major current focus of the National Health Service in the UK is the development of administrative and informatics networks to develop linked electronic health records for both clinical (http://www.england.nhs.uk/ourwork/tsd/sst/) and public health research (http://www.cprd.com/intro.asp) purposes. The intention is that data held in electronic health records be available to appropriate individuals via remote access, thus facilitating record/data access. However, contrary to time projections at the inception of the SLIC study, the software enabling this functionality has yet to be rolled out to the majority of GP surgeries in London. It was therefore impossible to access health records without the cooperation of the GP surgeries. Unfortunately, among the GPs contacted, ~1/3 failed to respond at all to requests for children's birth data. This was despite providing written parental consent for access to specific medical records, monetary incentives for participation and the option for a researcher to extract the necessary information.
Figure 2. Difference in (a) birthweight and (b) gestational age between PQ and GP data versus GP data. For clarity, GP data were used as baseline and are thus plotted on the x axis, rather than the mean of GP and PQ data. Solid horizontal line represents the bias (i.e., mean difference) of the two methods, whereas dotted lines represent the 95% limits of agreement (LoA) between the two methods. Bold solid vertical lines indicate critical cutoffs of < 2.5 kg and < 37 weeks, which were used to categorise children having low birth weight or born preterm, respectively, according to GPs. ▲ symbols indicate children who would have been 'misclassified' as having normal birth weight (n = 6) or born full term (n = 8) if based on PQs rather than GP records. Δ symbols indicate children who would be potentially 'misclassified' as having low birth weight (n = 11) or born preterm (n = 4) if based on PQs rather than GP data. The outliers indicated by *, which obviously indicate misreporting by either PQ or GPs, have been excluded from the analyses. GP, general practitioner; PQ, parental questionnaire.
Figure 3. Differences between PQ and GP data with respect to (a) birth weight and (b) gestational age according to ethnicity. Solid horizontal lines represent the bias (i.e., mean difference) between the parental and GP data. Dashed lines represent the 95% limits of agreement between the two methods for the overall population. Bold solid vertical line indicates critical cutoffs of < 2.5 kg for BW and < 37 weeks for GA, which were used to categorise children who were having low birth weight or born preterm according to GPs. Points indicating extreme misclassification were excluded from this plot and analyses. GA, gestational age; GP, general practitioner; PQ, parental questionnaire.
The reason for such a high rate of non-response is unclear, but it may reflect either lack of resources or failure to appreciate the relevance of requests for such data. Inner-city surgeries in deprived areas may find it difficult to engage with activities perceived to be research-based, and the quality of data collected by GP practice nurses may be compromised either by competing clinical priorities or by a lack of research-active practitioners. 25 There is a shift to electronic health records in the UK; however, even when these are fully functional, the issue of missing data for those not born in the UK will remain.

Conclusions
Detailed BW and GA data were difficult to retrieve from GP records. The proportion of missing birth data from GP records emphasises the need for more accurate and systematic recording of these data. Furthermore, it is essential that electronic health records be established within health care systems and that the information be made readily available through the primary care practitioners to the appropriate personnel. Parental report of birth data, at least for primary schoolchildren, is an appropriate alternative to health records for use in obtaining high-quality data for epidemiological studies.
Table footnotes.
a Parental 'misclassification' was defined as a difference in child's BW of more than 0.10 kg compared with the GP records. For the purpose of this analysis, it was presumed that GP data would be the more accurate, but as mentioned in the discussion this assumption was not necessarily always correct.
b Modelling was based on 376 cases for which paired data were available.
c The middle category (i.e., those neither underestimated nor overestimated by PQ) was used as the baseline against which the other two were compared.
d The 1st and 2nd quintile of IMD were grouped together owing to the small sample size in the 1st quintile.
e FAS was grouped in three categories owing to the small sample size in the lower scores.