Introduction

Spirometry is a pivotal screening test for the diagnosis of patients with obstructive lung disease. The main spirometry values include forced vital capacity (FVC), and forced expiratory volume (FEV1)1,2. These measurements are generally compared with the percentage predicted values. The predicted data are acquired from a healthy non-smoker standard population3,4. However, the predicted normal data change widely from different sources leading to biased results5,6,7. This bias can be avoided by using the sex, age, height, and ethnicity-specific Z-score8.

In 2012, the Global Lung Initiative (GLI-2012) released spirometric norms derived from data collected from 72031 healthy individuals aged 3–95 years8. The GLI-2012 equations provided sex, age, height, and ethnic-specific reference equations as well as the lower limit of normal (LLN) values for spirometry.

In pulmonary function testing, the fifth percentile of all normal values (a Z-score of − 1.64) is defined as the lower limit of normal (LLN). Spirometry indices at the LLN would be observed in only 1 in 20 (5%) normal populations9.

The fit of the GLI-2012 norms has been tested and some countries approved them for their use to interpret the spirometry results, for example, in the Australasian10, Norwegian1, German11, and French12 populations. On the other hand, the GLI-2012 norms seem unsuitable for clinical use in the Swedish13, Finnish14, Brazilian15, Malaysian16, and Chinese populations17. Some countries, including Iran, have not standardized the GLI-2012 equations8, although external validation of the GLI-2012 norms is recommended7,8. Moreover, the applicability of the norms should be evaluated in other parts of the world to verify their suitability in these regions. No study has evaluated the applicability of the GLI-2012 norms in the Iranian population.

The present study aimed to evaluate whether the GLI reference values apply to the Iranian population.

Method

Design

A cross-sectional study was performed in Tehran, the capital of Iran between July 16 and August 27, 2019. This article was the result of a research project approved by the National Institute for Medical Research Development (NIMAD) (code: 978931, 2019/05/28) and the Ethics Committee (code: IR.NIMAD.REC.1398.257). All methods were performed in accordance with the relevant guidelines and regulations. Written consent was obtained from all participants.

Source population and sampling

The source population of this study was recruited from Tehran population presenting to health houses affiliated with Tehran Municipality. The sampling method was the randomized clustering method, and at least two health houses in each municipality district were selected (44 centers). The individuals who presented to health centers received an explanation about the study; then, they were asked to refer family members (3–95 years) if they were willing to join. In the second step, a short meeting was held with the household members or the household head in relation to the purpose of the study. Then, all population who wished to participate in the study were interviewed and screened. Written and informed consent was obtained from all participants.

Satisfied healthy non-smokers (current or past smokers defined as those who had smoked at least 100 cigarettes in life and/or had a previous lifetime exposure of > one pack-year of smoking) and participants without a history of any current airway or lung disease (breathlessness, cough, wheeze, ischemic heart disease, and rheumatic disorders) were included in the study. The subjects who were not eligible for a baseline spirometry test and those who reported respiratory symptoms including a cough, sputum, rhinorrhea, etc. within seven days prior to the examination were excluded from the study.

Sex (female, male), age (one decimal point), height (with a precision of 0.5 cm without shoes using an accurate stadiometer), weight (with a precision of 0.5 kg measured without a jacket, bag, veil (in women), wristwatch, and with empty pockets) were recorded.

Spirometric measurements

The advanced Spirobank II device (MIR, Rome, Italy) used in this study, and FEV1, FVC, FEV1/FVC, and FEF25–75% (forced expiratory flow averaged over the middle portion of FVC) were measured.

The spirometers were calibrated every morning and a minimum of three and a maximum of eight measurements were performed per subject. The measurements were made without the use of bronchodilators according to the American Thoracic Society/European Respiratory Society (ATS/ERS) recommendations3. The repeatability criterion was < 5% deviation from the second-highest value. From the three selected large values that were within 150 ml of each other, the largest measurement was chosen as the best.

Quality control

The spirometry software provided feedback on the acceptability of the technique and repeatability. For spirometry according to the HUNT3/YoungHUNT3, curves were graded as A–F partly in line with a recent study by Hankinson et al.18. All curves graded as A–C were included in the study, i.e., at least two acceptable blows with a less than 150 mL difference. The inter-and intra-observer agreements showed excellent results.

Sample size

According to the ERS/Global Lung Function Initiative (GLFI), representative samples of at least 300 subjects can be used for validation in groups not covered by the GLI equations8. Six hundred females and 300 males in different age groups (4–82 years old) were selected. The sample size for any center was calculated based on the proportion of population per regional municipality. Finally, by removing samples with grades D and F 622 subjects (204 males and 418 females) were fully eligible to enter the study.

Analysis

Using the Excel macro for GLI8, reference values, lower limits of normal (LLN), Z-scores, and percentiles for FEV1, FVC, FEV1/FVC, and FEF25–75% were calculated for each subject in the population. If the agreement between the observed values in the reference population and the GLI reference values is perfect, the mean Z-scores should ideally be zero, and the standard deviation (SD) should be one. According to the agreement reached by the GLI team and other studies validating these spirometric reference equations (SRE), a mean Z-score outside the range of ± 0.5 is considered clinically significant, corresponding to at least 5–6% difference in the specified lung function measurement8,13,16,19,20.

The mean values and standard deviations were calculated, and Z-score curve plots were drawn. Possible relationships between Z-scores and age, height, weight, and sex were examined using multiple linear regression models. If the GLI reference values are applicable, no such relationships exist.

LLN was defined as the lower fifth percentile in the distribution from which the GLI reference values are derived, as calculated by the GLI Excel macro, if not explicitly stated otherwise. The 90% limits of normality, which are expected to include 90% of the observations if the agreement is perfect, were defined as observations with GLI Z-scores within the − 1.645 to + 1.645 range8.

Ethical approval and consent to participate

This article was the result of a research project approved by the National Institute for Medical Research Development (NIMAD) (code: 978931, 2019/05/28) and the Ethics Committee (code: IR.NIMAD.REC.1398.257). Written consent was obtained from all participants.

Consent for publication

No personal information of the participants in the article was reported.

Results

Between 16 July and 27 August 2019, 900 participants completed spirometry measurements. After exclusions, 622 participants (204 males and 418 females) aged 4–82 years met the selection criteria for the reference sample.

The mean (range) age of men and women was 38.34 (4–82) and 44.55 (4–80) years, respectively. In this sample, the mean height was 1.72 (0.08) m in males and 1.58 (0.08) m in females aged over 21 years. Thirty-nine (19.2%) men and 131 (31.4%) women had a BMI ≥ 30 kg/m2. The demographic and clinical characteristics of the participants are described in Table 1.

Table 1 Demographical and clinical characteristics of the reference population.

The Caucasian GLI-2012 was applied to our population. Overall, the mean Z-scores of FEV1, FVC, and the FEV1/FVC ratio for males and females in the various age groups were higher than the Caucasian predicted values (range: 0.01 to 1.05) except for the FEV1/FVC in the age group under 21 years (range: − 1.11 to − 0.09).

The normal distribution curves of FEV1, FVC, and FEV1/FVC based on observed GLI Z-score mean and standard deviation values in men and women are presented in Fig. 1a–f.

Figure 1
figure 1

(af) Normal distribution curves of FEV1, FVC and FEV1/FVC based on observed mean GLI Z-score values in men and women.

The distribution of the Z-scores of FEV1, FVC, FEV1/FVC, and FEF25–75% stratified by sex and age in the reference sample of healthy individuals is shown in Table 2.

Table 2 Mean GLI Z-scores for FEV1, FVC, the FEV1 /FVC ratio and FEF25–75% by age group and sex (assumption test: compare means with zero).

The FEV1 Z-score was smaller than 0.5 in men and women aged 10–21, 22–29, and 30–39 years; however, its standard deviation was often above 1. Moreover, this value was below 0.5 in girls under 10 years old and men 40–49 years old and over 70. The FEV1 Z-score was not different from zero (by one-sample t-test analysis) in the age groups 10–29 and over 70 years in both genders (P > 0.05) (Table 2).

The FVC Z-score exceeded the predicted values (0.5) across age groups < 10, 30–39, 40–49, and 50–59 years in both genders, but it was below 0.5 in individual aged 10–21 (males), 22–29 (both gender), 60–69 (males), and > 70 (both gender) years old (Table 2).

The Z-score of FEV1/FVC was below 0.5 in all age groups except for the age group under 10 (both genders) and 60–69 (males) years (Table 2).

The mean Z-score of FEF25–75% was between 0 and − 0.5 in males in all age groups; the same finding was found in women aged 10–21, 40–49, 50–59, and 70–84 years (Table 2).

In the age group over 21 years, the Z-score of FEV1/FVC was below the LLN in seven men (3.43%) and eight women (2.01%). However, these values were significantly higher in six boys (46.2%) and eight girls (40.0%) under 21 years old.

Z-score of FEV1/FVC less than LLN was zero in women over the age of 60 and in men aged 22–39 and 60–69 years. It was also less than 5% in women aged 50–59 and men aged 30–59 years (1–3.4%).

Totally, the FEV1/FVC Z-score was above the upper limit of normal in 20 (9.8%) men and 69 (16.5%) women (ULN > 1.64).

The mean Z-score of FEV1/FVC above the upper limit of normal (ULN, > 1.64) ranged between 0% (age group 22–29 years) and 25% (over 70 years) in men and between 0% (age group 10–21 and over 70 years) and 28.3% (age group 50–59 years) in women.

The FEV1 Z-score was within the 90% limits of normality (− 1.64 to + 1.64) in 81.3% of the observations (83.7% in males and 80.4% in females). The corresponding figure was 78.9% for FVC (84.5% in males and 76.1% in females) and 93.3% (90.2% in males and 94.8% in females) for the FEV1/FVC ratio in the age group over 21 years.

The Z-scores of FEV1, FVC, FEV1/FVC, and FEF25–75% were analyzed according to age, height, weight, and gender using a linear regression model.

Age, weight, and height (but not gender) had an impact on the FEV1/FVC Z-score in univariate regression (P < 0.05). In multiple linear regression (in the presence of height, weight and age as variables with P < 0.2 in univariate linear regression) height and age remained associated (B-coefficient = 0.012 and 0.007; P = 0.008, P = 0.001 respectively) with the FEV1/FVC Z-score (Fig. 2a). There was a significant association between age and FEV1 Z-score (B-coefficient = 0.007; P = 0.011). In the multiple linear regression (in the presence of age, gender and height), age remained statistically significant (B-coefficient = 0.008; P = 0.006) (Fig. 2b).

Figure 2
figure 2

(ad) Association of GLI2012 mean Z-score values of FEV1, FVC, FEV1/FVC, and FEF25–75% with gender, age, and anthropometric variables according to multiple linear regression.

There was a significant association between gender and FVC Z-score (B-coefficient = 0.389; P < 0.001) in univariate linear regression; this association was maintained in the multiple regression model too (B-coefficients = 0.346; P = 0.004) (Fig. 2c).

Height and gender had an effect on the FEF25–75% Z-score (B-coefficients =  − 0.371 and 0.006; P < 0.001 and P = 0.021, respectively) in univariate linear regression. In the multiple linear regression model, gender had a significant relationship with the FEF25–75% Z-score (B-coefficient =  − 0.358; P = 0.001) (Fig. 2d).

The prevalence of COPD defined by spirometry based on the fixed ratio (FR) criterion increased with age from 0.8% in the age group 30–39 years to 16.7% in the age group > 70 years (Pfor linear by linear association < 0.001). The prevalence of COPD according to the LLN criterion did not follow a specific trend (Pfor linear by linear association = 0.749). There was a 98.13% agreement between FR and LLN method (Fleiss' kappa coefficient = 0.58, P < 0.001). The prevalence of COPD based on FR and LLN according to age and sex is presented in Table 3.

Table 3 Prevalence of COPD based on FR (FEV1/FVC < 0.70) and LLN (FEV1/FVC < LLN) criteria by age and sex (n (%)).

Discussion

This study was the first study performed on 622 healthy non-smoker Iranian children and adults to evaluate the use of the GLI-2012 reference values to interpret FEV1, FVC, FEV1/FVC and FEF25–75%.

When applying the GLI reference values to the reference population, the Z-scores were always closer to zero in men compared to women. The mean Z-score (SD) values of FEV1/FVC and FEF25–75% (in both sexes) were reasonable, although not perfectly, normally distributed but not centered on 0 (0.11 (1.03) and 0.17 (0.88) for FEV1/FVC, and − 0.11 (0.99) and − 0.49 (0.96) for FEF25–75% in males and females, respectively. Although the Z-scores of FEV1 and FVC were below 0.5 in men, they were between 0.5 and 1 in women. In addition, the SD of all measurements was below 1.5. The Z-scores of spirometry indices of some countries1,6,10,11,12,13,16,17 are shown in Table 4.

Table 4 Z-scores for spirometry indices of some countries (source populations: never-smokers without respiratory symptoms).

A number of studies have found that the use of the GLI equation was ideal in their population, including Spain where the Z-score (mean) of each parameter was close to 0 with a maximum variance of ± 0.521. Other populations in which the use of GLI-2012 equation has been approved include Norway, Australia, Germany, and America (Asian–Americans)1,10,11,17. Although a review of the Z-score of spirometry values in Sweden showed that these values were relatively appropriate, the authors emphasized the lack of equation matching13. Moreover, although it was reported that the GLI equation could be used in the French population aged 40–65 years, the standard Z values appeared be relatively large in women (FEV1 = 0.51, FVC = 1.3). Therefore, it seems that in approving or rejecting the fit of the equation with the source population, the authors’ opinion also matters such that some studies were strict while some were not (Table 4). In general, the GLI equation did not fit the source population in studies conducted in Tunisia, China, Malesia, India, and Sweden.

Comparison of the obtained values (FEV1, FVC, FEV1/FVC, and FEF25–75% Z-scores) with the predicted values according to the age group confirmed the fit of lung function parameters (except FEV/FVC for men) for subjects aged 10–21, 22–29 and 70–84 years.

The FEV1/FVC Z-score was always under 0.5 in all age groups except males and females under 10 years and men aged 60–84 years.

A few studies tried to standardize the spirometric measurements in certain age groups including children and adolescents. Some of these studies confirmed the fit of GLI-2012 reference values in their community. However, some other studies emphasized that reference equations did not match the spirometric data in their children. In a study (2013) conducted in white, black, and South-Asian schoolchildren aged 5–11 years in London, GLI-2012 reference equations properly fitted spirometric data in white and black races. These values were fit for South Asian children based on the Southeast Asian equation22. Among 712 healthy urban-dwelling 7–13 year-old Zimbabwean schoolchildren, the mean GLI2012 Z-scores of FVC, FEV1, FEV1/FVC, and maximal-mid expiratory flow (MMEF) were measured using different ethnic GLI 2012 modules. The mean African-American GLI 2012 Z-scores were within 0.5 Z-scores from zero for all the spirometry variables; however, the Z-score SD for the FEV1/FVC ratio was ≥ 1, indicating more variability than the reference population, thus affecting the performance of the African-American GLI2012 LLN in this population23.

Chang et al. studied the age group 5–18 years in Taiwan in 2019 and provided evidence that the GLI-2012 reference equations did not properly match the spirometric data in the current Taiwanese pediatric population, indicating an urgent need for an update of the GLI reference values by the inclusion of more data of non-Caucasian descent24. Moreover, in the study by Jones et al. in 2020, the equations currently in use in Brazil seemed to underestimate the lung function of Brazilian children aged 3–12 years25. The GLI-2012 reference values for spirometry were appropriate for healthy, well-nourished African school children in Angola, DR Congo, and Madagascar, but the lower limit of normal needed adjustment20.

In this study, lung function tests in 33 children (13 boys and 20 girls) aged 4–10 years were performed with acceptable quality of grades A to C. The FVC, FEV1, FEV1/FVC, and FEF25–75% values were measured, and almost all values (except for FEV1 in girls and FEF25–75% in boys) were above or below the expected values (> + 0.5 or <  − 0.5); however, the standard deviation was estimated to be less than one in all cases except for FEV1/FVC. Z-scores, FEV1/FVC and FEF25–75%.

The percentage of the population with the FEV1/FVC value below LLN in the age group over 21 years (in both genders) was appropriate (3.43% in men and 2.1% in women) and was less than 5%. In the age group 10–21 years, the percentage of population with the FEV1/FVC value below LLN was relatively appropriate in men (5.03%) but was higher than expected in women (13.3%). These values were extremely high (42.4% in boys and 40% in girls) in the under 10 age group.

The Z-scores derived from the GLI-2012 SRE showed significant association with gender, age, and anthropometric variables. In multiple linear regression, height and age had positive associations with FEV1/FVC (B-coefficient = 0.01 and B-coefficient = 0.007), age had a positive association with FEV1 (B-coefficient = 0.008), and gender had a relatively strong relationship with FVC (B-coefficient = 0.346) and FEF25–75% (B-coefficient = 0.358).

Limitation of the study

The number of sample sizes in age groups under 10 years old (13 cases) in boys and over 70 years old (8 cases) in women was under 15 cases.

Conclusion

The GLI reference values are not perfect for the Iranian population, especially in children under 10 years old and females. However, the Z-score of FEV1/FVC matched the predicted value in almost all age groups (except children below 10 years of age). GLI reference values were appropriate for subjects over 21 years, but their use would overestimate the prevalence of airway obstruction in the Iranian population under 21 years. The results of linear regression models showed that age, height and, gender were crucial for establishing prediction equations of four spirometric measurements, i.e., FEV1/FVC, FVC, FEV1, FEF25–75%.