Using latent class cluster analysis to screen high risk clusters of birth defects between 2009 and 2013 in Northwest China

In the study, we aimed to explore the synergistic effects of multiple risk factors on birth defects, and examine temporal trend of the synergistic effects over time. Two cross-sectional surveys conducted in 2009 and 2013 were merged and then latent class cluster analysis and generalized linear Poisson model were used. A total of 9085 and 29094 young children born within the last three years and their mothers were enrolled in 2009 and 2013 respectively. Three latent maternal exposure clusters were determined: a high-risk, a moderate-risk, and a low-risk cluster (88.97%, 1.49%, 9.54% in 2009 and 82.42%, 3.39%, 14.19% in 2013). The synthetic effects of maternal exposure to multiple risk factors could increase the risk of overall birth defects and cardiovascular system malformation among live births, and this risk is significantly higher in high-risk cluster than that in low-risk cluster. After adjusting for confounding factors using a generalized linear Poisson model, in high-risk cluster the prevalence of nervous system malformation decreased by approximately 2.71%, and the proportion of cardiovascular system malformation rose by 0.92% from 2009 to 2013. The Chinese government should make great efforts to provide primary prevention for those on high-risk cluster as a priority target population.

In Northwest China, to the best of knowledge, how to evaluate the synthetic effects of the risk factors related to birth defects and whether the synthetic effects of these factors are characterized by increasing, decreasing, or unchanging trend over time, are not known but are important for prevention and control of birth defects. Therefore, to address these aforementioned research questions, the aims of our present study were twofold: 1) to extract different latent maternal exposure clusters using latent class cluster analysis (LCCA) based on overall risk factors in 2009 and 2013 in Shaanxi province, and to explore the association between latent clusters and birth defects; 2) to evaluate the change in the prevalence of birth defects among latent clusters in Shaanxi province for the 5-year period during 2009-2013.

Methods
Study design and participants. In 2009 and 2013, two large cross-sectional population-based epidemiological surveys of birth defects among young children was conducted respectively in Shaanxi province, Northwest China. The target population of the two study were young children born within the last three years and their mothers. Considering the urban-rural disparity in social density and fertility rates of population in the whole province, a stratified multi-stage sampling method was adopted to determine the sampling units. Within China's rural areas, the 3-level administrative structure consists of county, township, and village. Independent of rural areas, districts, streets and communities in urban areas are another different three-level administrative structure. In 2009, six counties in rural areas and six districts in urban areas were randomly determined throughout the province, while the 20 counties and 10 districts were randomly selected in 2013. In 2009, 6 villages/communities each from 6 townships/streets were sampled randomly from each selected counties. Similarly in the rural areas in 2013, from each selected counties, 6 villages each from 6 townships were sampled randomly. However, in the urban areas in 2013, from each chosen districts, 6 communities in each of 3 streets were selected randomly for sampling. Thirty babies born within the last three years and their mothers in each sampled village in 2009, and, 60 in each chosen community in 2013 were selected using a random sampling method. Finally, we expect to recruit a total of 16200 and 32400 participants respectively in 2009 and 2013. However, 2927 and 2374 subjects of the randomly sampled population declined to participate in the 2009 and 2013 study (response rate: 81.9% and 92.7% respectively). Therefore, approximately 13273 and 30026 participants were eventually obtained in 2009 and 2013 survey.
Data collection. Data were collected at the local village clinics and community health service centers in 2009 and 2013 respectively. Ten investigation teams were established for field surveys and data collection. Each team consisted of 10 to 12 investigators and a supervisor from Xi'an Jiaotong University Health Science Center and a pediatric doctor from a local maternal and child health hospital. All fieldworkers were trained to standardize questionnaire administration, including lectures and practice in the field at least one month before commencement of the survey. During the survey, all fieldworkers were closely monitored by their supervisors and randomly examined. Once any errors and/or missing values were detected, subjects were required to be re-interviewed. Most important of all, our work was strongly supported by the local hospitals and health administrative departments as well as the Ministry of Health in Shaanxi province.
Only after obtaining written informed consent from all participants or their legal guardians, a face-to-face interview was performed by the field investigators using a precoded structured questionnaire, including birth defects, socio-demographic characteristics, reproductive history, family history, environmental risk factors, maternal illness during pregnancy and maternal drug use during pregnancy. Medical records at local hospitals including prenatal diagnostic test results, clinic diagnostic results, findings of physical examination, ultrasound imaging reports and medical history were used as final diagnosis references for birth defects. This study was reviewed and approved by the ethics committee of Xi'an Jiaotong University Health Science Center. All methods in the study were performed in accordance with the relevant guidelines and regulations. Study variables. According to the International Classification of Diseases, 10th Revision (ICD-10), birth defects were classified into eleven different categories, of which nervous system (codes Q00-Q07) and cardiovascular system (codes Q20-Q28) malformation were chosen as the main outcome variables in the study. The numbers of the whole birth defects, nervous system malformation and cardiovascular system malformation were the main outcome variables in generalized linear Poisson model. Socio-demographic characteristics included one continuous variable (parity) and six categorical variables (gender, maternal age, area, maternal education level, household economic level and residence during the pregnancy). The Demographic and Health Survey household wealth index (HWI) 11 was used to assess the household economic level of the participants. After an HWI was calculated based on the principal component analysis of 5 variables representing family economic level (housing conditions, type of vehicle, income resources, and type and number of household appliances), the SES of the participants was divided into thirds: low, medium, and high (poor, middle-income, rich, respectively). For data dimensionality reduction, in our study, all possible risk factors for birth defects were clustered into these seven different indicator variables, including no folic acid supplementation, genetic factor, maternal illness, adverse pregnancy outcomes, drugs use, environmental risk factors, and unhealthy lifestyle during pregnancy. The values ranging from 0 to N were obtained by summing the number of risk factors in each indicator variable as total risk factors score (Table 1).

Statistical analysis.
In this study, we merged the two 2009 and 2013 databases into a large one for pooled analysis. First, latent class cluster analysis was performed for the seven risk factor variables all of which had been standardized before analysis. LCCA refers to a collection of statistical approaches in which individuals are classified into unobserved subpopulations represented by a categorical latent variable which is not observed and must be inferred from the data. Indicator variables used to determine latent classes can be continuous, censored, SCiEntifiC REPORts | 7: 6873 | DOI:10.1038/s41598-017-07076-0 binary, ordered/unordered categorical counts, or combinations of these variable types 12 . This study was undertaken on continuous indicators, assuming multivariate normal distribution within latent classes. The model could be indicated as following: The equation assumes that x i is the observed response to variable i, µ ij and σ ij 2 are the mean and variance for variable i for an individual from class j. The proportion of persons in each of the classes is denoted by η j , and every individual probabilities of the membership in all of the latent classes could be estimated (when summed they equal 1), reflecting the varying degrees of certainty and precision of classification. In this study, means (µ ij ) and proportion of persons in each of the classes (η j ) would be estimated. According to Bayes' theorem, in the meanwhile, the posterior probability of assigning respondents to the j class, could be obtained and expressed as follows: The models with different numbers of classes are compared using information criteria (IC)-based fit statistics such as Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), Sample-size-adjusted (SSA)-BIC. Lower values on these fit statistics indicate better model fit. Entropy is a type of statistic that assesses the accuracy with which models classify individuals into their most likely class, and can range from 0 to 1, with  Table 1. The definition of latent profile analysis indicators*. * All risk factors were transformed into 0/1 variables, and then were summed as a total indicator variable.
higher scores representing greater classification accuracy. Studies confirmed that entropy ≥0.8 is related to at least 90% correct assignment, and considered adequate for the model 13 . Second, after controlling for socio-demographic characteristics, a generalized linear Poisson model was used to explore the association between exposure clusters and birth defects. In Poisson regression, the expression relating these quantities is Here, µ is the count of birth defects, and the regression coefficients β β β … , , are unknown parameters that need be estimated from the study data. Prevalence rate ratio (PRR) was used to indicate the magnitude of change, which would not be negative and <1 for a decrease and >1 for an increase. Then, the prevalence differences (PDs) in birth defects across different latent clusters were calculated. To evaluate the changing trends of differences in birth defects between different latent clusters over time, after adjusting for all possible confounding factors (ie, sociodemographic characteristics), the PDs across different latent clusters during 2009 to 2013 were obtained using a generalized linear model. Data was entered into

Results
Baseline characteristics of participants. In the study, only children who were alive at the time of sampling, were evaluated. Moreover, some participants were excluded from the study due to missing risk factors. As a result, a total of 9085 and 29094 young children and their mothers were enrolled in 2009 and 2013 respectively. The average age of children was 9.03 ± 5.03 months in 2009 and 16.88 ± 11.29 months in 2013 respectively (P < 0.001). Of the children, the proportion of boys was higher in 2009 than that in 2013 (55.96% and 54.35%, P < 0.05). The mean age of mothers was 25.88 ± 4.10 years and 28.03 ± 4.86 years respectively in 2009 and 2013 (P < 0.001). From 2009 to 2013, the percentage of mothers with educated beyond senior high school had increased from 25.52% to 38.25%. Based on tertiles of household wealth index, there was a different distribution of households in the poor, medium and rich category between 2009 and 2013. We also observed a higher proportion of participants from middle Shaanxi province in 2009 survey compared to 2013 survey. Amongst the mothers, the average parity differed significantly between 2009 and 2013. Any difference in residence during the pregnancy could not be observed in these two surveys. More detailed description of socio-demographic characteristics of participants was provided in Table 2. It was found that the rate of folic acid supplement was dramatically increased from 26.40% to 67.37% between 2009 and 2013. The proportion of exposure to risk factors among mothers including maternal illness, drug use, and adverse pregnancy outcomes consistently rose from 2009 to 2013, expect environmental risk factors and unhealthy lifestyles (Table 3).

Latent class cluster analysis.
To determine the most optimal number of classes for our study, we began by reviewing the IC indices presented in Table 4. From 1-class model to 4-class model, we observed that BIC, AIC and SSA-BIC first rose in 2-class model and then sharply decreased in 3-class model and slightly increased in four-class model ( Table 4). The various indices each consistently suggested the three-class model provides a significantly better fit, with the lowest BIC, AIC, and SSA-BIC. On review of the entropy values from one-class to four-class, all were more than 0.90, suggesting great classification accuracy. Although a four-class model might provide higher classification accuracy of the data compared to the three-class model, this additional class was composed of a relatively small number (proportionally, <1.0%) of participants. This, coupled with a preference for the AIC, BIC, SSA-BIC values, and parsimony, led us to choose a three-class model as the optimal model for further analysis in the study. Figure 1 presented a clear depict of standardized means of seven indicators across three latent clusters. Cluster 3 had a 13.09% of participants (n = 4996) and the largest standardized means of most indicators, including not folic acid supplement, genetic factor, adverse pregnancy outcomes, and maternal illness during pregnancy, drugs use during pregnancy, and maternal unhealthy lifestyles during pregnancy. Therefore, the cluster 3 could be considered as high-risk cluster in which children were more likely to suffer from birth defects. A total of 32063 participants in cluster 1 accounted for the vast majority of the whole population (83.98%) in the study. Cluster 1 had the lowest standardized means of almost all seven indicators, and could be interpreted as low-risk cluster. Cluster 2 consisted of 1120 participants with a proportion of 2.93% in total, but comprised of the highest level of exposures to environmental risk factors during pregnancy and the second highest level of not folic acid supplement, genetic factor, adverse pregnancy outcomes, maternal illness during pregnancy and maternal unhealthy lifestyles during pregnancy, so was considered as moderate-risk cluster.
Relationship between birth defects and three latent clusters. According to latent class cluster analysis based on the pooled data, the proportions from cluster-1 to cluster-3 in 2009 was 88.97%, 1.49%, 9.54% respectively, and similarly in 2013 were 82.42%, 3.39%, 14.19% respectively, between which we found a significant difference in the distribution of the three latent clusters (P < 0.001). To explore the impact of identified latent clusters on the birth defects in 2009 and 2013, a generalized linear Poisson model was employed to control for socio-demographic characteristics. Table 5 showed that with cluster-1 as the reference group, the prevalence ratio rate (PRR) of birth defects in cluster 3 was 1.93 (95%CI: 1.26, 2.95) in 2009 and 2.28 (95%CI: 1.91, 2.73) in 2013, but PRR in cluster 2 was not significant in both years. It was obvious in 2009 rather than 2013 that the higher the education level of mothers was, the PRR was lower. In 2013, higher economic level of mothers was associated with the lower risk for birth defects (PRR: 0.80, 95%CI: 0.67, 0.96), but the permanent residents were more likely to suffer from birth defects (PRR: 1.56, 95%CI: 1.26, 1.93). Moreover, the regional differences in the prevalence of birth defects were also found with higher prevalence in Middle and South Shaanxi than that in North Shaanxi in both 2009 and 2013.
Further, we also investigated the relationship between cardiovascular system malformation and latent clusters. Compared to cluster 1, in these two surveys, the higher PRR of cardiovascular system malformation in cluster 3 was observed. The permanent residents from South Shaanxi were also important contributor of cardiovascular system malformation. Higher education level of mothers was correlated with the decreasing risk of cardiovascular system malformation in 2009. Nervous system malformation was also entered into the study, but any significant differences were not found among these socio-demographic characteristics except for survey areas.

Prevalence of birth defects across identified latent clusters between 2009 and 2013.
In our survey population, the prevalence of birth defects among young children had increased from 1.72% in 2009 to 2.16% in 2013, an increment rate of nearly 0.44%. With regard to nervous system malformation, the prevalence was decreasing from 0.14% to 0.09%, a descent rate of 0.05%. Especially for cardiovascular system malformation with a relatively high growth rate of 0.38%, the prevalence also had rose from 0.39% to 0.77% over the same time.
Further, we examined the differences in the prevalence of birth defects, cardiovascular system malformation, and nervous system malformation among three latent clusters between 2009 and 2013 ( Table 6). It was clear that the rate of any birth defects and cardiovascular system malformation was higher in cluster 3 than that in cluster 1 in both two years. After adjusting for confounding factors using a generalized linear Poisson model, the results showed that in high-risk cluster the prevalence of nervous system malformation decreased by approximately 2.71%, and the occurrence rate of cardiovascular system malformation rose by 0.92% from 2009 to 2013. In high-risk cluster, however, the whole prevalence of birth defects had not been increased in the same period. In the meanwhile, the Prevalence differences (PDs) of birth defects and cardiovascular system malformation between cluster 3 and cluster 1 were increased by 0.60% and 0.85% respectively over the studied period. On the contrary, the prevalence difference in nervous system malformation between latent cluster 3 and cluster 1 reduced approximately 1.00% from 2009 to 2013, but it was not statistically significant.     In high risk cluster, in particular, the prevalence of cardiovascular system malformation progressively increased over time, and the rate of nervous system malformation reduced in the same period. First, our results confirmed that the prevalence of the whole birth defects, cardiovascular system malformation and nervous system malformation in high risk cluster were the highest among the three latent clusters. Compared with the low-risk cluster, it was found that women in the high-risk cluster were nearly twice more likely to have offspring with birth defects after adjusting for maternal socio-demographic variables. In the meanwhile, the likelihood of cardiovascular system malformation in the high-risk cluster was three times than that of low-risk cluster. This could be due to the fact that the high risk cluster comprised of the highest value of almost all risk factors, such as not folic acid supplement, genetic factor, adverse pregnancy outcomes, and maternal illness during pregnancy, drugs use during pregnancy, and maternal unhealthy lifestyles during pregnancy. It suggested that the strongest synthetic effects of all possible risk factors for birth defects had been observed among high-risk cluster. To screen high-risk groups of expectant mothers who live in high-risk areas using LCCA will be useful for future government-led, integrated interventions for birth defects.
Second, the changes in the prevalence of cardiovascular system malformation and nervous system malformation in high risk cluster in Shaanxi province during 2009-2013 were obvious after adjusting for socio-demographic factors using the generalized linear Poisson model. In the high risk cluster, the prevalence  Table 5. The association of birth defects with baseline characteristics between 2009 and 2013 * . * Prevalence Rate Ratio (PRR) and 95% confidence interval are given to indicate the magnitude of change.
of cardiovascular system malformation was increased by approximately 0.92% from 2009 to 2013, but the rate of nervous system malformation was reduced by around 2.71% in the same period. The bulk of epidemiological evidence suggested that have long been linked to folic acid supplementation [14][15][16] . In our study, a small proportion of mothers (28.05%) received periconceptual folate in high-risk cluster in 2009, but this percentage increased to 70.89% in the same cluster in 2013. This can largely contribute to the reduction of nervous system malformation among children in high risk cluster. However, we also observe the rate of exposure to other risk factors such as maternal medical history, drug use, previous adverse pregnancy outcomes, and unhealthy lifestyles among mothers, have risen during 2009-2013, which could explain in part the increment of cardiovascular system malformation for a 5-year period. Further studies will be required to corroborate these findings and, if confirmed, to elucidate possible reasons for these changes. Third, after controlling for socio-demographic characteristics using generalized linear Poisson model, the disparities in the rates of birth defects and cardiovascular system malformation between low risk cluster and high risk cluster were increasingly expanded during 2009-2013. This results showed the birth defects might more concentrated in the high-risk cluster over time rather than low risk or medium risk cluster. It could conclude that compared to low-risk cluster, the synthetic effects of all possible risk factors on birth defects were more strengthened among high-risk cluster in surveyed areas over time. Therefore, it was necessary to pay more attentions to those in the high risk cluster in terms of implementing interventions targeting behavior changes and environmental risk factors control.
Furthermore, socio-demographic factors have been found to play an important role in the presentation of birth defects. Maternal education has also been implicated in the incidence of birth defects 17,18 . In our study, similarly, a lower percentage of birth defects or cardiovascular system malformation was found among women with high education level compared to those without education. This could be one of the reasons that highly educated mothers are more likely to have more knowledge of prenatal care, higher awareness regarding periconceptual supplementation with folate, and abstention from certain environmental risk factors that plays a major role in the incidence. Subjects with the highest household SES index had the lowest risks of birth defects, which is consistent with the previous studies [19][20][21] . In this study, higher maternal parity was also associated with birth defects and cardiovascular system malformation. Due to reproductive organ damage and dysfunction caused by an excessive number of parturitions 22 , it was inferred that it may increase the risk of birth defects. For example, a slightly larger rate of birth defects among infants born to mothers with three or more prior births than those born to first-time mothers was showed in Texas study between 1999 and 2003 23 . There was a significantly lower rate of birth defects and cardiovascular system malformation in floating populations than that in permanent populations, which was  consistent with our previous study 24 . In this study, we also observe a significant regional difference in the rate of birth defects. For example, a higher prevalence of birth defects and cardiovascular system malformation in south and central Shaanxi in contrast with that in south Shaanxi. The benefit of this LCCA is that it gives us the ability to assess the impact of groups of maternal exposures not in isolation but in patterns or groups, as they commonly existed. Based on LCCA, however, previous researches more focused on the association between dietary patterns and birth defects, including CHDs, spina bifida, cleft lip, and hypospadias [25][26][27][28][29][30] . To date, there are only two published researches that used LCCA to explore the synergistic effects of multiple risk factors and to screen high-risk groups in a high-risk birth defects area. In a cross-sectional study, Cao et al. employed the LCCA to identify maternal exposure clusters, and explore the association between clusters of risk factors and birth defects 31 . Similarly in another cross-sectional study, Zhu et al. also introduced the latent class model to examine differences in how factor profiles and single factors are related to birth defects 32 . However, our study has combined the two surveys for pooled data and extended the LCCA to evaluate the trend in the prevalence of birth defects among latent clusters over time.
Further, another strengths of LCCA is that allows adjustment for covariates, quantification of the uncertainty of class membership, and assessment of goodness of fit. The use of maternal exposure factors as continuous variables enable the study to avoid some loss of information of categorizing the data in previous study 32 . Finally, associations between latent clusters and birth defects were adjusted for socio-demographic covariates, which were possibly associated with latent clusters. However, we have to acknowledge some limitations needed to address in the study. Frist, due to the cross-sectional design of the study, the causal relationships between environmental risk factors and birth defects were not confirmed, and the results should be explained cautiously. Second, the identified latent clusters may be subjected to unavailable information on mothers' diet, which might influence the generalizability of our study. Third, the research lack of data on related genes that was highly associated with the increasing risk of birth defects in the context of genotype-phenotype associations. Despite these limitations, this study has still provided important information on the synergistic effects of multiple risk factors on the birth defects in Northwest China, and filled a gap in temporal trend of the synergistic effects of birth defects among different risk clusters.
In conclusion, our study introduced LCCA to evaluate the synthetic effects of the risk factors related to birth defects and investigate the temporal trend of the synthetic effects of these factors from 2009 to 2013. According to latent class cluster analysis based on the pooled data, the population was classified into three different latent clusters, including high-risk cluster, medium-risk cluster and low-risk cluster. The synthetic effects of maternal exposure to multiple risk factors could increase the risk of overall birth defects and cardiovascular system malformation among live births, and this risk is significantly higher in high-risk cluster than that in low-risk cluster. In high risk cluster, the prevalence of cardiovascular system malformation progressively increased over time, but the rate of nervous system malformation reduced in the same period. The primary prevention of birth defects targeting combined behavioral change and environmental risk control, should be provided for women of reproductive age in study area, particularly for high-risk cluster with the greatest needs.