Asthma, allergic rhinitis (AR), and eczema are common childhood inflammatory diseases with multiple risk factors. The epidemiology and etiology of asthma are complex1. The risk of developing asthma is multifactorial, involving genetic, behavioral, and environmental factors (e.g., indoor and outdoor air pollution exposure)2,3,4. For example, a recent study using latent class analysis (LCA) identified five clusters of children from the Danish National Birth Cohort who shared similar patterns of exposure to indoor pollutant sources3. Although the study identified few associated factors, it found that adolescents who grew up in homes with mold during mid-childhood might have increased odds of current asthma at age 18.

Despite research suggesting multifactorial interrelationships between maternal, newborn, socioeconomic status (SES), neighbourhood, and environmental determinants of childhood asthma, AR, and eczema, a knowledge gap remains in the Canadian population5,6. In Canada, clustered profiles of mother–child pairs derived using LCA that incorporate multifactorial determinants and comprehensive linked health care data have not been constructed, nor used to study the risks and outcomes of common childhood inflammatory diseases and their health services utilization (HSU). This knowledge gap limits our ability to quantify disease and morbidity risks, and thus to prevent poor health outcomes, in groups of mother–child pairs who share similar characteristics. Specifically, clustered profiles of multifactorial determinants may better identify which groups of mother–child pairs are at higher risk of adverse health outcomes.

This study aims to use LCA to incorporate combinations (clusters) of multiple indicators such as maternal, newborn, and SES data collected in three Canadian pediatric cohorts, as well as neighbourhood factors and environmental exposures, to identify distinct profiles and to evaluate their relationships with children’s health outcomes and HSU from health administrative databases (HAD). Identifying clusters of individuals may be useful for public health interventions aimed at preventing the development of asthma and/or allergic diseases in early childhood and adolescence.


Study design and population

We used a longitudinal birth cohort called “FActors of Mothers and Infants in Longitudinal Years” (FAMILY). FAMILY merged children from three pediatric cohorts with their mothers and siblings, who were identified through HAD housed at ICES (formerly the Institute for Clinical Evaluative Sciences). The pediatric cohorts were: the Toronto site of the Canadian Healthy Infant Longitudinal Development (CHILD) study7; The Applied Research Group for Kids (TARGet Kids!)8; and the Toronto Child Health Evaluation Questionnaire (T-CHEQ) study9. The CHILD cohort, a national general population-based birth cohort, was established in 2008 to improve understanding of how environmental and genetic factors interact in the development of asthma, allergy, and potentially other common chronic diseases. Starting in 2008, CHILD enrolled 3624 pregnant mothers (aged > 18 years) from the general population in four major Canadian cities (Vancouver, Edmonton, Winnipeg, and Toronto). The T-CHEQ study was a multistage, cross-sectional study designed to collect population-based data on the prevalence of asthma, other allergic diseases, and possible associated risk factors among Toronto school children in grades 1 and 2. A total of 5619 Toronto school children in grades 1 and 2 (aged 5–9 years) were recruited in 2006. TARGet Kids! is an ongoing open longitudinal cohort study that enrolls healthy children from birth to 5 years of age and follows them into adolescence. The TARGet Kids! cohort aims to link early-life exposures to health problems including obesity, micronutrient deficiencies, and developmental problems. Children are enrolled during regularly scheduled well-child visits. Once FAMILY was assembled, individuals were linked to HAD to examine childhood disease status and HSU.

There were three inclusion criteria for this study: children of mothers recruited into CHILD during pregnancy, children aged 6–9 years recruited in grades 1 and 2 into T-CHEQ, and children under 6 years recruited into TARGet Kids! during routine health visits with their primary care physician. Exclusion criteria were children and mothers (1) without a valid Ontario health card number for data linkage, (2) without an Ontario residence code, (3) who moved away from Ontario during the pregnancy and delivery period, (4) who were missing from the Registered Persons Database (RPDB), (5) with missing data for any covariate, or (6) who were missing from the MOMBABY Database. Children were also excluded if they (7) died < 28 days after birth, (8) were born after April 2018, or (9) were part of a multiple birth. See Supplementary Figure S1 for the cohort selection flowchart.

Data sources

Participants from CHILD, TARGet Kids!, and T-CHEQ were linked to eight Ontario HAD at ICES from the earliest date of birth (1993) to the latest data available (2019). See Supplementary Table S1 for detailed descriptions of the HAD used in this study.

This study also used the latest available environmental exposure data: air pollution, including ozone (O3; 2002–2015)10,11, nitrogen dioxide (NO2; 1984–2016)12, and fine particulate matter (PM2.5; 2000–2016)11,13, and urban environment data such as greenness, measured using the Normalized Difference Vegetation Index (NDVI) and available for 1996–2011 and 2013–2015. These data were obtained from the Canadian Urban Environmental Health Research Consortium (CANUE) and accessed at ICES. Air pollution and greenness levels were assigned based on the postal code at the time of the child’s birth, using the closest year of available data.
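The closest-year rule for assigning exposures can be sketched as follows. This is a minimal Python illustration, not the study’s code: it assumes the postal-code lookup has already produced a mapping from data year to exposure value, the function name and NDVI values are hypothetical, and breaking ties toward the earlier year is an assumption (the text does not say how equidistant years were handled).

```python
def closest_year_value(series, birth_year):
    """Return the exposure value for the available year closest to birth_year.

    series maps data year -> exposure value for one (hypothetical) postal code.
    Ties between two equally distant years go to the earlier year; this
    tie-break rule is an assumption, not something the study reports.
    """
    if not series:
        raise ValueError("no exposure data available for this postal code")
    # Sort key: distance from the birth year first, then the year itself,
    # so the earlier year wins a tie.
    best_year = min(series, key=lambda year: (abs(year - birth_year), year))
    return series[best_year]


# Hypothetical NDVI values; the real series covers 1996-2011 and 2013-2015.
ndvi_by_year = {year: 0.40 for year in range(1996, 2012)}
ndvi_by_year.update({2013: 0.45, 2014: 0.46, 2015: 0.47})

print(closest_year_value(ndvi_by_year, 1993))  # 0.4 (born before 1996, uses 1996)
print(closest_year_value(ndvi_by_year, 2012))  # 0.4 (2011 and 2013 tie, 2011 chosen)
print(closest_year_value(ndvi_by_year, 2014))  # 0.46 (exact year available)
```

The same function applies unchanged to the O3, NO2, and PM2.5 series, each with its own range of available years.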

Covariates and outcomes

We separated covariates into two groups: concomitant and exposure variables. Concomitant variables were used in the LCA to determine class membership, while exposure variables that are known risk factors were included in regression analyses for risk prediction. Concomitant variables were selected based on their availability in the three study cohorts and health administrative databases and on suggestions from the literature.

For concomitant variables, the LCA included 16 factors available from the cohorts and/or HAD. Maternal factors (5): age at delivery, highest education attained, immigration status (non-immigrant, landed immigrant, or refugee), pregnancy complications (including hypertension, diabetes, and pre-eclampsia), and no prenatal care visits. Newborn factors (2): whether the child was ever breastfed and the number of siblings in the household at the time of the newborn’s birth. SES and neighbourhood characteristics (4): the four domains of the Ontario Marginalization Index (ON-Marg), measured as neighbourhood-level quintiles and used as an SES proxy: material deprivation, dependency, ethnic concentration, and residential instability. In this study, those living in areas in deprivation quintiles 4 or 5, 3, and 1 or 2 were considered to have “lower”, “average”, and “higher” SES, respectively. Environmental exposures (5): air pollution (NO2, O3, PM2.5) and greenness (NDVI) levels in the birth year, and indoor environmental tobacco smoke (ETS) exposure.
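As a minimal sketch of how the concomitant variables and the SES grouping fit together, the Python below lists the 16 variables under hypothetical column names and maps an ON-Marg deprivation quintile (1 = least deprived, consistent with the two lowest quintiles being described as less deprived) to the study’s SES labels. It is illustrative only and does not reproduce the study’s variable coding.

```python
# Hypothetical column names for the 16 concomitant variables, grouped as in the text.
CONCOMITANT_VARIABLES = {
    "maternal": ["age_at_delivery", "education", "immigration_status",
                 "pregnancy_complications", "no_prenatal_care"],
    "newborn": ["ever_breastfed", "n_siblings_at_birth"],
    "ses_neighbourhood": ["onmarg_material_deprivation", "onmarg_dependency",
                          "onmarg_ethnic_concentration",
                          "onmarg_residential_instability"],
    "environment": ["no2", "o3", "pm25", "ndvi", "ets"],
}


def ses_category(quintile):
    """Map an ON-Marg deprivation quintile (1 = least deprived) to an SES label."""
    if quintile in (1, 2):
        return "higher"
    if quintile == 3:
        return "average"
    if quintile in (4, 5):
        return "lower"
    raise ValueError(f"quintile must be between 1 and 5, got {quintile!r}")


print(sum(len(names) for names in CONCOMITANT_VARIABLES.values()))  # 16
print([ses_category(q) for q in range(1, 6)])
# ['higher', 'higher', 'average', 'lower', 'lower']
```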

Seven exposure variables were incorporated in regression analyses to measure associations with disease incidence and HSU: “Class” variable generated from the LCA; maternal history of asthma; C-section delivery; birth year; child’s sex; admission to neonatal intensive care unit (NICU) at birth; and low birthweight (< 2500 g). See Fig. 1 for timelines on exposure assignment.

Figure 1

FAMILY cohort recruitment, birth, follow-up, and exposure timelines throughout the study period from 1993 to 2019.

This study examined several outcomes. Incident cases of childhood asthma, AR, and eczema were identified from encounters in the Canadian Institute for Health Information Discharge Abstract Database (CIHI-DAD) for hospitalizations, the National Ambulatory Care Reporting System (NACRS) for emergency department (ED) visits, and the Ontario Health Insurance Plan (OHIP) database for physician office visits, using validated health administrative data definitions based on International Classification of Diseases (ICD) codes. Children were classified as having incident asthma (ICD-9: 493; ICD-10: J45, J46) if they had ≥ 2 asthma-related ambulatory care claims within two years or ≥ 1 asthma-related hospitalization. This validated definition has demonstrated 89% sensitivity and 72% specificity in children (0–17 years)14. AR (ICD-9: 477; ICD-10: J30.1–J30.4) and eczema (ICD-9: 691.8; ICD-10: L20) were identified by any physician health services claim for these conditions. All-cause, asthma-specific, and respiratory-related HSU, including hospitalizations, ED visits, and physician visits, were captured from birth to 2019.
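The asthma case definition can be expressed as a small rule over a child’s encounters. The Python sketch below is hypothetical: it assumes each encounter is a (date, setting, diagnostic code) tuple, treats physician office and ED visits together as “ambulatory” claims, reads “within two years” as at most 730 days between claims, and matches codes by prefix (ICD-9 493; ICD-10 J45, J46). The AR and eczema definitions are simpler, requiring any single physician claim with a matching code.

```python
from datetime import date, timedelta

# Assumption: "within two years" means at most 730 days between two claims.
TWO_YEARS = timedelta(days=730)
# Code prefixes from the text: ICD-9 493; ICD-10 J45, J46 (matched without dots).
ASTHMA_PREFIXES = ("493", "J45", "J46")


def is_asthma_code(code):
    """True if a diagnostic code falls under the asthma code prefixes."""
    return code.replace(".", "").upper().startswith(ASTHMA_PREFIXES)


def meets_asthma_definition(encounters):
    """Apply the asthma case definition to one child's encounters.

    encounters: iterable of (date, setting, code) tuples, where setting is the
    hypothetical label "hospitalization" or "ambulatory" (physician or ED visit).
    """
    asthma = [(day, setting) for day, setting, code in encounters
              if is_asthma_code(code)]
    # Rule 1: one or more asthma-related hospitalizations.
    if any(setting == "hospitalization" for _, setting in asthma):
        return True
    # Rule 2: two or more asthma-related ambulatory claims within two years.
    # Once the dates are sorted, the closest pair is always adjacent,
    # so checking consecutive pairs is enough.
    days = sorted(day for day, setting in asthma if setting == "ambulatory")
    return any(later - earlier <= TWO_YEARS
               for earlier, later in zip(days, days[1:]))


claims = [
    (date(2010, 3, 1), "ambulatory", "J45.9"),
    (date(2011, 9, 1), "ambulatory", "J459"),
]
print(meets_asthma_definition(claims))  # True: two claims about 18 months apart
```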

Statistical analysis

First, we used LCA, a finite mixture model, to identify distinct groups of mothers (latent classes) with similar response patterns across the concomitant variables of interest15. Sixteen variables were included in the LCA: five maternal factors, two newborn factors, four SES and neighbourhood characteristics, and five environmental factors. Each individual was assigned to the latent class with the highest model-based membership probability16. We evaluated models with one to four latent classes. Model fit and selection were based on the Bayesian Information Criterion (BIC), Akaike Information Criterion (AIC), likelihood-ratio chi-square statistic (G2), entropy (cut-off > 0.80), log likelihood, the Lo–Mendell–Rubin (LMR) test, and the model-based predicted probabilities of class membership (Supplementary Table S2).
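Several of these selection criteria can be computed directly from a fitted model. The Python below is a sketch using standard textbook definitions, not output from poLCA: it assumes a matrix of posterior class-membership probabilities (one row per person) and implements modal (highest-probability) class assignment, the normalized entropy statistic compared against the 0.80 cut-off, the parameter count of a basic LCA with categorical indicators and no covariates, and AIC/BIC. All function names are hypothetical.

```python
import math


def modal_assignment(posterior):
    """Assign each person to the class with the highest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in posterior]


def relative_entropy(posterior):
    """Normalized entropy in [0, 1]; values near 1 indicate well-separated classes."""
    n, k = len(posterior), len(posterior[0])
    if k < 2:
        raise ValueError("entropy is undefined for a one-class model")
    # Terms with p = 0 contribute nothing (p * log p -> 0 as p -> 0).
    total = sum(-p * math.log(p) for row in posterior for p in row if p > 0)
    return 1 - total / (n * math.log(k))


def n_parameters(n_classes, categories_per_item):
    """Free parameters of a basic LCA with categorical items and no covariates:
    (K - 1) class-size parameters plus K * sum(R_j - 1) response probabilities."""
    return (n_classes - 1) + n_classes * sum(r - 1 for r in categories_per_item)


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC); lower values indicate better fit."""
    aic = -2 * log_likelihood + 2 * n_params
    bic = -2 * log_likelihood + math.log(n_obs) * n_params
    return aic, bic


print(modal_assignment([[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]))  # [0, 1, 0]
```

In this form, comparing the one- to four-class solutions amounts to computing these quantities for each fitted model; lower AIC and BIC and entropy closer to 1 favour a solution.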

Next, the latent classes were included in a marginal Cox proportional hazards (PH) regression model to estimate latent class-specific hazard ratios (HRs) for the risk of asthma, AR, and eczema in children. The six other exposure variables were also included in the marginal Cox PH model as covariates. All HRs and 95% confidence intervals (CIs) were estimated using the marginal Cox PH model, which accounted for clustering of offspring within mothers.
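For reference, a marginal Cox model of this type is commonly written as below, assuming i indexes families (mothers), j indexes children within family i, and x_ij holds indicators for Classes 2–4 (Class 1 as reference) plus the six covariates. The robust sandwich variance of Lin and Wei is one standard way to account for within-family clustering; the text does not state which variance estimator was used, so this is a sketch of the usual formulation rather than the study’s exact specification.

```latex
\begin{aligned}
h_{ij}(t \mid \mathbf{x}_{ij}) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_{ij}\right),
  \qquad \mathrm{HR}_k = \exp(\beta_k), \\
\widehat{\operatorname{Var}}_{\text{robust}}(\hat{\boldsymbol{\beta}})
  &= \mathcal{I}(\hat{\boldsymbol{\beta}})^{-1}
     \Big(\sum_{i} \mathbf{U}_i \mathbf{U}_i^{\top}\Big)
     \mathcal{I}(\hat{\boldsymbol{\beta}})^{-1}, \\
95\%\ \mathrm{CI}(\mathrm{HR}_k) &= \exp\!\big(\hat{\beta}_k \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_k)\big).
\end{aligned}
```

Here h_0(t) is an unspecified baseline hazard, I is the observed information matrix, and U_i is the sum of score residuals over the children in family i.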

The latent classes were also included in Poisson regression models with generalized estimating equations (GEE) to estimate latent class-specific rate ratios (RRs) of all-cause, asthma-specific, and respiratory-related HSU. GEE account for the correlation of responses within families and, with robust standard errors, yield valid inference even if the working correlation structure is misspecified. Poisson regressions were stratified by the three childhood disease groups, and all RRs are presented with 95% CIs. Both the Cox PH and Poisson regression models were adjusted for additional potential confounders: maternal history of asthma, C-section delivery, birth year, male sex, NICU admission at birth, and low birthweight. We explored possible interactions between covariates and latent classes, but none were statistically significant after adjustment for multiple comparisons.
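Both models report effects on the log scale, so ratios and their confidence intervals come from exponentiating the coefficient. The Python below is a minimal sketch assuming a log-link coefficient and its standard error (for GEE, the robust sandwich standard error) are already available; the function names and test inputs are hypothetical, and the Wald interval with z = 1.96 is the usual choice rather than something the text specifies.

```python
import math


def ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient (Poisson or Cox) into a ratio
    with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_difference(ratio):
    """Express a rate or hazard ratio as a percent difference from the reference class."""
    return (ratio - 1) * 100


print(round(percent_difference(0.65), 1))  # -35.0
```

For example, an RR of 0.65 corresponds to a 35% lower rate, and an HR of 1.39 to a 39% higher hazard, relative to the reference class.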

The LCA was conducted in R (R Foundation for Statistical Computing) using the poLCA package15, run through RStudio version 0.98.1091. All regression analyses were conducted using SAS Enterprise Guide 7.1 (SAS Institute Inc., Cary, NC, USA).

Research ethics

Ethics approval was obtained from the Hospital for Sick Children Research Ethics Board (Toronto, Ontario, Canada). Informed consent to use cohort participants’ health card numbers to link the study data to health administrative databases was obtained from parents at recruitment in all pediatric cohort studies. ICES is a prescribed entity under section 45 of Ontario’s Personal Health Information Protection Act. Section 45 is the provision that enables ICES to conduct analysis related to the management, evaluation, and monitoring of the health system. Section 45 authorizes health information custodians—like physicians, hospitals, and long-term care homes—to disclose personal health information to a prescribed entity, like ICES, without consent for such purposes.


Population characteristics

FAMILY included 15,724 children and their mothers. All children in FAMILY were followed in HAD from birth until 2019, an average of 23 years of follow-up. Table 1 shows the distribution of child, maternal, SES, and environmental factors and disease incidence. The incidence of asthma, AR, and eczema was 22.2% (3492), 25.0% (3932), and 63.1% (9923), respectively. Mothers’ median age at delivery was 33 years (interquartile range [IQR]: 30–36); 22.1% (3476) were landed immigrants and 2.9% (456) were refugees. Overall, 59.0% (9282) of mothers had a university education or higher and 93.6% (14,713) reported breastfeeding their babies; 52.8% (8298) of children had no siblings, and 39.4% (6189) resided in less deprived neighborhoods (i.e., the two lowest quintiles of ON-Marg’s deprivation index) with average levels of air quality and greenspace. As shown in Table 1, the distribution of birth years differed across latent classes, possibly because of the timing of participants’ cohort recruitment; we therefore adjusted for birth year in all regression models to account for potential birth cohort and period effects.

Table 1 Percent distributions of characteristics and outcomes of participants by latent class.

Latent Class Identification

Sixteen concomitant variables listed in Table 1 were included in the analysis to determine latent class membership. Supplementary Table S2 shows goodness-of-fit statistics of the latent class models. We selected the 4-class model because it had the best fit statistics (lowest AIC, BIC, and G2, and highest entropy). Each of the four classes was labeled according to its key characteristics, described below for a typical member of each class. Class 1: mothers in their 30s–40s with university or higher education, non-immigrants, who likely had one or two children and lived in high-SES neighborhoods with good air quality and greenspace. Class 2: mothers over 30 with university or higher education, non-immigrants, who likely had a single child and lived in high-SES neighborhoods, but with relatively poorer air quality and less greenspace. Class 3: mothers in their 30s with university or higher education, likely landed immigrants (or refugees), with one or more children, who lived in average-SES neighborhoods with relatively good air quality and a good amount of greenspace. Class 4: mothers in their 20s with high school to college education, likely landed immigrants (or refugees) with a single child, who lived in lower-SES neighborhoods with high traffic-related air pollution and less greenspace. Table 1 and Fig. 2 show the distribution of covariates and the geographic distribution of the four latent classes, respectively. Supplementary Table S3 and Supplementary Figure S2 show the distribution of conditional probabilities by covariate and latent class.

Figure 2

Geographical distribution of latent classes in the Greater Toronto Area in Ontario, Canada.

Association of disease incidence risk–Cox proportional hazard (PH) regression

Results from the multivariable Cox PH regression analyses (Table 2) showed that children of mothers with a history of asthma had significantly increased risks of asthma, AR, and eczema. Children who were born by C-section (HR 1.10, 95% CI 1.02–1.19), had low birthweight (HR 1.31, 95% CI 1.14–1.51), were male (HR 1.42, 95% CI 1.33–1.52), or whose mother had a history of asthma (HR 1.64, 95% CI 1.50–1.79) had significantly higher risks of asthma. Children admitted to the NICU at birth also had a significantly increased risk of asthma (HR 1.35, 95% CI 1.22–1.50), but not of AR or eczema. Overall, similar findings were seen for AR and eczema, although the risks were lower and the confidence intervals wider. After adjusting for all exposure variables, children in Classes 3 and 4 had higher risks of asthma than children in Class 1 (HR 1.24, 95% CI 1.11–1.37 and HR 1.39, 95% CI 1.22–1.59, respectively). Children in Classes 3 and 4 also had higher risks of AR (HR 1.26, 95% CI 1.14–1.41 and HR 1.39, 95% CI 1.23–1.57, respectively) and eczema (HR 1.12, 95% CI 1.06–1.19 and HR 1.15, 95% CI 1.06–1.24, respectively) than those in Class 1.

Table 2 Hazard ratios from Cox PH regression analyses adjusted for latent classes and exposure variables for asthma, allergic rhinitis, and eczema.

Association of health services utilization (HSU)–Poisson regression

Results from the Poisson regression analyses (Table 3) showed that, amongst children with asthma, maternal history of asthma was associated with higher rates of asthma-specific and all-cause HSU. Compared to children in Class 1, children with asthma in Class 2 had significantly lower asthma ED visit rates (RR = 0.65, 95% CI 0.47–0.89; i.e., 35% lower) and lower all-cause ED visit rates (RR = 0.82, 95% CI 0.70–0.95). Compared to Class 1, children with AR in Class 3 had a significantly higher all-cause physician visit rate (RR = 1.20, 95% CI 1.10–1.30). Similarly, children with eczema in Class 3 had significantly higher all-cause ED visit rates (RR = 1.18, 95% CI 1.09–1.28) and all-cause physician visit rates (RR = 1.14, 95% CI 1.09–1.19).

Table 3 Rate ratios of health services utilization from Poisson regression analyses adjusted for latent classes and exposure variables for asthma, allergic rhinitis, and eczema.


Our study included a large cohort of mother–child pairs with extensive data from multiple sources over a long follow-up period. We used LCA, a person-centred technique, to identify and elucidate the clustering of risk and atopy-related health outcomes10 in offspring. The interrelationships between SES, maternal and newborn factors, and environmental and neighbourhood factors highlight the multifactorial development of common childhood diseases. Compared to children in Class 1, children born to younger mothers in Class 4 (including children from lower-SES families residing in more deprived neighbourhoods with poorer air quality and less greenspace) had a clinically meaningful 39% higher risk of both asthma and AR and a 15% higher risk of eczema. Moreover, children with asthma in Class 2 had significantly lower asthma ED visit rates than those in Class 1. This may be related to lower O3 exposure in Class 2, despite a larger proportion of individuals living in neighbourhoods with higher levels of other traffic-related pollutants (e.g., NO2 and PM2.5). This interpretation is consistent with studies reporting an association between increased O3 exposure and increased asthma ED visits17,18,19,20,21,22.

Our findings are consistent with those of previous studies using LCA. Hardie et al. used U.S. data from the 2007 and 2008 National Health Interview Surveys (N = 7361) to evaluate the joint implications of maternal health and SES disadvantage for youth, and found that children in the low-SES class (like our Class 3) had 43% higher odds of asthma than those in the low-risk class (like our Class 1)23. Kozyrskyj et al. used data from the Western Australian Pregnancy Cohort (Raine) to study associations between changes in family SES and childhood asthma24. Their LCA indicated a twofold increased risk of asthma at age 14 years amongst children who had lived in a low-income family since birth, especially girls. Compared with children in chronically low-income families, children in households with increasing incomes had a 60% lower risk of asthma. Grunwell et al. applied LCA to 513 children aged 6–17 years at risk of asthma exacerbation and identified four classes based on demographics, sensitization, type 2 inflammatory markers, and lung function25. They found that the class with multiple sensitization and airflow limitation had a higher percentage of parents with high school education or less and had exposure to indoor smoking. Abuabara et al. used Pediatric Eczema Elective Registry data and LCA to identify patterns of disease control and found that an income < $50,000/year was strongly associated with eczema persistence (odds ratio [OR] = 1.69, 95% CI 1.37–2.08)26. Shakerkhatibi et al. used LCA to study air pollution-related asthma profiles among children and adolescents in Iran27. They observed a higher probability of severe asthma in a case village located in a polluted industrial area (6.8%) than in control communities with no potential exposure to urban air pollution (2.6% and 1.8%).
Additionally, adjusted odds of asthma were lower in the control communities than in the case community for both the moderate and severe asthma classes, with significant ORs ranging from 0.14 to 0.70 and from 0.32 to 0.53, respectively. Sbihi et al. used HAD of medical visits from British Columbia, Canada, to define the occurrence and recurrence of asthma over a 10-year follow-up period28. Instead of LCA, the authors used group-based trajectory modeling to identify asthma trajectories. They found that an interquartile range increase in NO2 exposure increased the risk of membership in the early- and late-onset chronic asthma trajectories, and concluded that traffic-related air pollution increased the probability of a chronic asthma trajectory. Our study also found that Class 3 and 4 mother–child pairs (the higher-risk groups) were more likely to cluster in more deprived areas in the northwest region of the Greater Toronto Area (Fig. 2).

Consistent with literature reporting declining asthma incidence, we also observed that children born in the 2000s had lower HRs for asthma and other allergic diseases than those born in the 1990s29,30. The reasons for this decrease in high-income countries (e.g., the US, Canada, the UK, and Australia) remain unclear; however, others have hypothesized that improved air quality, better primary care, higher breastfeeding rates, and lower antibiotic use in infants, among other factors, may contribute29,31,32,33,34,35.

Few longitudinal studies have examined how maternal characteristics, SES, neighbourhood, and environmental determinants influence child health outcomes. This study’s greatest strength is that it addresses this gap with a large longitudinal birth cohort, FAMILY. FAMILY merged three Ontario pediatric cohorts and then used HAD to efficiently identify cohort participants, their mothers (forming the mother–child pairs used to derive the latent classes), and participants’ siblings. We included siblings to expand the FAMILY cohort so that child health outcomes could be studied among children beyond the original cohort participants, and we included the number of siblings in the household at the time of the newborn’s birth as a concomitant variable in the LCA. Another strength is the use of HAD spanning decades, which enabled efficient longitudinal analyses with minimal challenges. This study also has limitations. First, the FAMILY cohort analyses lacked self-reported, clinical, and biomarker factors, a consequence of consolidating varying, albeit rich, individual-level information from pediatric cohorts that used different survey instruments. Second, we lacked access to reliable HAD information on maternal lifestyle factors, namely smoking or medication use during pregnancy, as well as linked data on fathers. Finally, because longitudinal measurements were not available for all variables, we did not use latent transition analysis, which could examine changes in class membership over time using repeated measures.

In conclusion, using LCA to identify clusters of mother–child pairs, we found that environmental exposures and neighborhood factors were significantly associated with asthma and allergic disease outcomes and HSU, while adjusting for multifactorial interrelationships. These findings can be shared with respiratory therapists/educators and primary health care providers to support education on disease self-management, and with policy makers to help families fund medication use. This is particularly important for children with asthma who have high acute care demands, which may be a “marker” of poor access to primary care or poor disease self-management. Our findings also highlight the health risks of environmental exposures (air pollution and lack of greenspace), especially for children of Class 3 and 4 mothers, who have a higher risk of asthma and allergic diseases, and may help stakeholders develop strategies or programs to reduce these exposures. Future studies with LCA and linked HAD may include medication and laboratory test utilization (e.g., blood eosinophil tests) to gain insight into the use of these factors as predictors of child health outcomes.