Introduction

IgA nephropathy (IgAN) is the most prevalent primary chronic glomerular disease worldwide. The clinical manifestation and progression of IgAN vary. The 20-year predicted survival without the need for dialysis was 96% among patients with no risk factors versus 36% among those with three risk factors: urinary protein excretion of more than 1 g per day, hypertension (>140/90 mm Hg) and severe histological lesions at the time of renal biopsy1,2,3. Thus, risk prediction is vital for disease prevention and refining prediction strategies remains important for targeting treatment recommendations4,5,6,7,8,9. One area of potential improvement has been the discovery of genetic markers for IgAN, as well as for intermediate phenotypes, such as proteinuria and blood pressure.

Genetic factors undoubtedly influence the pathogenesis of IgAN, with an estimated heritability of 40%–50%10,11. Recent efforts using genome-wide association studies (GWASs) have identified genetic markers associated with IgAN12,13,14,15,16. In a study using a standardized seven–SNP genetic risk score (GRS), disease risk increased sharply with Eastward and Northward distance from Africa, which correlated with differences in disease prevalence among world populations. In addition, the score explained 4.7% of overall IgAN risk and a one standard deviation increase in the score was associated with a nearly 50% increase in the odds of disease5. These findings strongly suggested that a multi-locus genetic risk score might be promising for predicting disease susceptibility. As genetic backgrounds are stable, their presence may act over the entire life course9. However, it remains unknown whether the cumulative effects of variants identified by GWASs could benefit prediction of disease progression and treatment decisions17,18,19.

No best GRS model was recommended in the recent GRIPS Statement (recommendations for the reporting of Genetic RIsk Prediction Studies)20. It has been widely observed that, for most diseases, the count method (risk allele counts, the total number of risk alleles an individual carries, or unweighted GRS) shows similar discriminative accuracy to the log odds procedure (sums of the natural logarithm of the allelic odds ratio for each risk allele within and across loci, or weighted GRS), while avoiding the complication of weighting21,22,23. Therefore, based on data from GWASs, we constructed both weighted and unweighted genetic scores24,25,26. We first aimed to construct models that were easy to interpret but valid for risk prediction. Notably, a comparison with the pre-established seven–SNP GRS (a weighted score) was also conducted. The scores were then tested to assess their predictive ability in both disease/intermediate phenotype susceptibility and disease progression, using a Chinese Northern Han population. As the strongest association observed was with a subset of alleles encoding the class II Human Leukocyte Antigens (HLA), whereas several non-HLA loci also demonstrated genetic associations, both HLA allele scores and non-HLA allele scores were constructed to evaluate their respective roles in specific sub-phenotypes.

Results

Association between single SNP selected and susceptibility to IgAN

As can be seen from Table 1, all the SNPs selected for further GRS analysis were associated with susceptibility to IgAN. The top seven associated IgAN alleles were also the SNPs reported in the previous GWAS conducted in our cohort, as well as the seven SNPs selected in previous seven-SNP GRS among different populations in geospatial risk analysis5,27. The association between the two novel SNPs selected from Southern Chinese Han GWAS and IgAN could also be replicated12. Although they showed less significant p values for disease association in the current study, they conferred similar risk effect compared with a previous report from a Southern Chinese population12. Odds ratio (OR) for rs2738048C and rs3803800G were 0.81 and 0.87 in our cohort, compared with 0.79 and 0.83 respectively in the previous report. Thus, the data implied that the associations between nine SNPs and IgAN were real and our current cohort could be a representative population for further risk stratification.

Table 1 SNPs used in the GRSs and their association with IgAN

Linkage disequilibrium (LD) analysis indicated that HLA variants rs9275224, rs2856717 and rs9275596 were in partial LD, with r2 ranging from 0.33 to 0.75; however, they were not in LD with two other HLA variants, rs9357155 and rs1883414 (r2 < 0.1). When the nine SNPs were included in a logistic model, they all showed significant associations with susceptibility to IgAN. Concordant with previous reports5, conditional analysis indicated that all nine SNPs were independently associated SNPs. However, no gene-gene interactions were observed among the nine SNPs, including the interaction between the CFHR3/R1(rs6677604) and the HORMAD2 loci (rs2412971) (p = 0.41), reported in the previous seven-SNP genetic score.

Individual association between single SNPs and clinical parameters of IgAN

The individual associations between the nine susceptibility SNPs and clinical phenotypes of IgAN were assessed in our cohort. We observed that the risk allele A of rs3803800 was associated with an increased serum IgA level (P = 3.91 × 10−3), which was concordant with previous reports from the Southern Chinese Han GWAS12,28. The serum IgA concentrations (g/L, mean ± standard deviation) were 3.15 ± 1.21, 3.18 ± 1.19 and 3.55 ± 1.33 for rs3803800 GG, AG and AA, respectively (Figure 1). We also observed associations of rs2412971 with serum IgA and IgA1 levels, and of rs1883414 with gross hematuria and hypertension (Table 2). Risk genotypes seemed to be associated with higher serum IgA or IgA1 level, higher frequency of gross hematuria or higher frequency of hypertension (Table 2). However, the effect size conferred by the risk genotype was only moderate and none of the associations survived the multiple-testing correction.

Table 2 Correlation of the SNPs and GRS with clinical phenotype in IgAN patients at renal biopsy
Figure 1

Associations between genotypes of rs3803800 with serum IgA level.

Observed GRS in IgAN and controls

We constructed four different genetic scores involving different combinations of IgAN alleles, including GRS5 (five reported HLA alleles), GRS7 (five reported HLA alleles and two non-HLA alleles, which were the same as reported standardized GRS), GRS9 (five reported HLA alleles and four non-HLA alleles) and GRS4 (four non-HLA alleles). Every score could be weighted or un-weighted. For comparison, we also directly adopted standardized GRS as reported previously.

The distributions of unweighted GRSs (uwGRSs) in IgAN cases and controls were significantly different (Figure 2). The frequency of a higher uwGRS (more risk alleles) was higher in IgAN than in controls. With every 1-unit increase in the uwGRS, that is, one additional copy of a risk allele, the disease risk increased by about 20% to 30% (Table 3). Using the difference value (the difference in uwGRS between IgAN and controls, difference value = uwGRSIgAN − uwGRScontrol) as a risk function, the difference value of uwGRS5, uwGRS7, or uwGRS9 was much further from zero than that of uwGRS4 (non-HLA risk score). This might suggest that IgAN cases had one more copy of a risk allele than the controls, which was mainly from the HLA alleles.

Table 3 Risk of susceptibility to IgAN based on uwGRS
Figure 2

Distribution of unweighted genetic risk score (uwGRS) between IgAN and controls.

(A) uwGRS5, (B) uwGRS7, (C) uwGRS9, (D) uwGRS4. The p values indicate comparison of cases and controls using a chi-squared test.

The data from the weighted GRS (wGRS) models (the risk score equations are shown in Table 4) were concordant with those from the unweighted models. With a one standard deviation increase in the score, disease risk increased by about 40% to 60%. The ORs per standard deviation increase were 1.47 for wGRS5 (95% CI: 1.34–1.61, P = 8.83 × 10−16), 1.60 for wGRS7 (95% CI: 1.45–1.76, P = 7.36 × 10−22), 1.63 for wGRS9 (95% CI: 1.48–1.80, P = 5.66 × 10−24), 1.42 for wGRS4 (95% CI: 1.30–1.56, P = 9.58 × 10−14) and 1.68 for the standardized GRS (95% CI: 1.53–1.84, P = 9.42 × 10−27). Examination of wGRS quartiles also suggested a pattern of increasing disease risk with each wGRS quartile. Using group 1 (lowest level of risk) as a reference group, quartile 4 had the highest odds of IgAN, with ORs of 2.37, 3.17, 3.34, 2.28 and 3.67 for wGRS5, wGRS7, wGRS9, wGRS4 and standardized GRS25, respectively. The trends across all categories were highly significant regardless of the wGRS adopted (Table 5).

Table 4 Risk score equations for weighted genetic risk scores (wGRS) in the current study
Table 5 Risk of susceptibility to IgAN based on quartiles of wGRS

Observed GRS and clinical parameters of IgAN

We assessed the associations between the cumulative genetic effects of the GWAS-identified SNPs and the clinical parameters of IgAN at the time of renal biopsy, including proteinuria, hematuria, eGFR, hypertension, hyperlipidemia, hyperuricemia, CKD stage and Haas grade (Table 2). However, no clear associations were observed, except a marginally significant association between GRS4 and gross hematuria (p < 0.05). Consistent with data from the individual associations between single SNPs and clinical parameters of IgAN, significant associations between IgA and IgA1 levels and GRS were observed; the serum IgA level increased with increasing uwGRS or wGRS. The associations were more prominent for GRSs that included non-HLA alleles (GRS4, GRS7 and GRS9 rather than GRS5), suggesting that the effect was mainly driven by non-HLA alleles. However, the associations became non-significant after multiple-testing correction.

Association between genetic information and prognosis of IgAN

rs3803800 and GRS4 were marginally associated with indicators for prognosis, including natural log-transformed time averaged mean arterial pressure and eGFR slope in linear regression (Table 6). By univariate Cox regression analysis, GRS5, GRS7 and GRS9 were associated with disease progression to end stage renal disease (ESRD), in which uwGRS showed a minimal increase of sensitivity for association. Although it seemed that the relative risks were similar, uwGRS9 showed the most significant association with progression to ESRD.

Table 6 Correlation of the SNPs and GRS with prognosis of IgAN in follow-up

Consistent with previous reports, the statistics confirmed good discrimination between IgAN and controls regarding GRSs (AUC was about 0.6, p < 0.001)5 (Table 7), in which GRS9 and standardized GRSs showed the better fit in model prediction. Using the Kaplan-Meier survival method with the optimal derived cut-off value (16 for uwGRS9, with a sensitivity of 0.96 and specificity of 0.93) identified by a receiver operating characteristic (ROC) curve, we observed a worse renal prognosis in IgAN patients with uwGRS9 ≥ 16: the 10-year ESRD rate was 26.3%, compared with 12.1% in patients with uwGRS9 < 16 (Figure 3, p = 7.91 × 10−3). When covariates of ACEI/ARB use and steroid use (yes or no) were introduced into multivariate Cox regression analysis, uwGRS9 ≥ 16 was still an independent predictor for ESRD in IgAN. The relative risks for uwGRS9 ≥ 16, ACEI/ARB use and steroid use were 2.52 (95% CI, 1.29–4.91, p = 6.68 × 10−3), 0.09 (95% CI, 0.04–0.23, p = 2.85 × 10−7) and 3.75 (95% CI, 1.90–7.41, p = 1.42 × 10−4), respectively. Regarding clinical parameters at the time of disease onset, including blood pressure, hematuria, proteinuria and renal pathology, the uwGRS9 ≥ 16 group showed no significant difference compared with the uwGRS9 < 16 group (p > 0.05). Similarly, using standardized GRS ≥ mean + SD as the cut-off value, a marginally significant difference in the 10-year ESRD rate was observed between IgAN patients with standardized GRS ≥ mean + SD and those with standardized GRS < mean + SD (21.3% vs. 12.1%, p = 0.06).

Table 7 Comparison of different genetic risk scores in disease prediction
Figure 3

Kaplan-Meier survival curves without ESRD/dialysis/death event, with time zero set at kidney biopsy and uwGRS9 ≤ 2 in IgAN patients.

Using the Kaplan-Meier survival method with the optimal derived cut-off value, we observed a worse renal prognosis in IgAN patients with uwGRS9 ≥ 16: the 10-year ESRD rate was 26.3%, compared with 12.1% in patients with uwGRS9 < 16.
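The Kaplan-Meier comparison described in this legend can be illustrated with a minimal product-limit estimator. This is only a sketch under stated assumptions: the function name `kaplan_meier` and the follow-up data are hypothetical and are not taken from the study, which ran its analyses in SPSS.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of event-free survival.

    times  : follow-up time for each patient (e.g. years since biopsy)
    events : 1 if the endpoint (ESRD or death) occurred at that time,
             0 if the patient was censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    # Number of endpoint events observed at each distinct time
    n_events = Counter(t for t, e in zip(times, events) if e)
    survival, curve = 1.0, []
    for t in sorted(n_events):
        # Patients still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - n_events[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data for one risk group (not study data)
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"t = {t}: event-free survival = {s:.2f}")
```

Running the estimator separately on the uwGRS9 ≥ 16 and uwGRS9 < 16 groups would give the two curves being compared; a log-rank test or Cox model is still needed to assess the difference.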

Discussion

We tested previously established SNPs associated with IgAN in a large collection of Chinese patients. Although the two SNPs from a southern China GWAS were marginally associated with IgAN in our cohort, we validated that all nine SNPs could be replicated as associated with IgAN, suggesting their real genetic effect12,27.

To determine their cumulative effect, we constructed two genetic risk scores24, uwGRS and wGRS, from different sets of SNPs (HLA alleles, non-HLA alleles, or both combined), and tested their correlation with ESRD events and their potential for disease prediction. Compared with single SNPs, GRSs were more significantly associated with susceptibility to IgAN. All GRSs were associated with IgAN, and the prediction power increased with the number of SNPs included. However, IgAN cases could have one more copy of a risk allele compared with controls, mainly HLA alleles. Although a non-HLA uwGRS model (uwGRS4) could also discriminate IgAN from controls, the difference in uwGRS count between IgAN and controls was smaller than that of the HLA GRS model (uwGRS5). Compared with the HLA GRS model (GRS5), adding non-HLA alleles (GRS7 and GRS9) increased the prediction power only slightly. This suggested that HLA allele-based GRSs might have larger power in disease prediction than non-HLA allele-based GRSs. The other issue was the GRS calculation method24,25,26,27. Similar to previous reports21,22,23, we did not observe a highly significant difference between the two methods (uwGRS and wGRS) in disease prediction power, as shown by slightly different AUC (difference < 0.05) and Nagelkerke's pseudo R2 (difference < 0.01) values using the same number of risk alleles. The data were consistent with previous reports21,22,23: it mattered little in terms of discriminative accuracy whether genetic scores were constructed using the count method or the log odds procedure for most complex diseases in which the ORs for disease risk alleles were similar and close to 1. uwGRS showed slightly lower p values compared with wGRS, suggesting that uwGRS might be chosen for risk stratification in IgAN. However, this may not be true for other diseases. The ORs for IgAN risk alleles were similar and close to 1 (<1.5); therefore, weighting seemed to have only a marginal effect. If risk alleles with larger effects, and hence greater discrepancies among the ORs, were identified, the strategy might need to change5,24,25,26,27. However, the seven–SNP genetic risk score showed a better fit in disease prediction than GRS7 and, occasionally, GRS9. There are several possible explanations: it was based on more samples (5 times larger than ours); it included a second kind of genetic interaction information; and the model was constructed using a stepwise logistic regression algorithm. Future evaluation of the seven–SNP genetic risk score in sub-phenotypes and disease prognosis in more widespread populations is still warranted.

For clinical parameter or sub-phenotype associations, we observed highly significant associations between GRS and serum IgA/IgA1 levels. The risk genetic group was consistently associated with increased IgA1 level. The data also validated the association between rs3803800 and serum IgA level noted in a previous GWAS of IgA level conducted in a Chinese population12,28. Our data supported the notion that genetically deregulated IgA plays a key role in the pathogenesis of IgAN4,6,10,11,29. However, associations of single or cumulative gene effects with other sub-phenotypes of IgAN were less concordant or less significant.

When the weak effects of the individual SNPs were considered together, we observed a strong and consistent effect of the GRS on ESRD. The effect was independent of therapy with ACEI/ARB and corticosteroids. Consistent with a recent GRS study conducted in hypertension, which suggested that a blood pressure genetic risk score could be a significant predictor of incident cardiovascular events, the current data may further support the prospects for genetic risk prediction in clinical practice17,18,19,24. We speculated that the genetic variants have cumulative effects on IgA deregulation involved in disease susceptibility and progression. Although power analysis indicated that we had a power of about 0.6–0.8 to detect a two-fold increased risk considering clinical parameters and disease progression, assuming an α-level of 0.05 and allele frequency of 10%–30% (http://biostat.mc.vanderbilt.edu/PowerSampleSize), the effect sizes identified were far smaller than two and none of the associations survived multiple-testing correction. Thus, the data require further widespread replication and functional investigation.

The strengths of the current study include the large sample size, the availability of complete genetic information and relevant covariates, the comparatively long follow-up period with a certain number of ESRD outcomes available for prospective analyses and the adoption of different GRS methods. Limitations include its single-center design and the inability to generalize to Southern Chinese Han and non-Chinese ancestry groups. The current GRS modeling was mainly based on genotyping data from a previous GWAS cohort; therefore, we cannot rule out the possibility of bias in the GRS from over-fitted associations, which requires future evaluation of the GRS in more widespread cohorts and in prospective studies. We lacked the ability to adjust for time-varying clinical factors in disease progression. The proportion of variation explained by the SNPs remained low and the level of prediction for events was also relatively small. We lacked power to demonstrate associations with moderate genetic effects reported in a previous Southern Chinese Han GWAS.

In conclusion, we observed that GRSs comprising nine SNPs identified in a GWAS of IgAN were strongly associated with susceptibility to IgAN, in which HLA alleles contributed more than non-HLA alleles and uwGRS calculation was simpler than wGRS for prediction. The high risk GRS9 group (uwGRS9 ≥ 16) had a high risk of ESRD in follow-up, suggesting a need for early and positive intervention.

Methods

Study population

The case-control cohort analyzed in this study was the same as the previous Chinese Han cohort included in the GWAS27: 1,194 IgAN cases and 902 healthy controls recruited in the renal division of Peking University First Hospital. Quality control was performed as described27. All cases carried a biopsy diagnosis of IgAN defined by typical light microscopy features and predominant IgA staining on kidney tissue immunofluorescence, in the absence of liver disease, vasculitis, Henoch–Schoenlein purpura, or other autoimmune diseases. This investigation was conducted according to the Declaration of Helsinki. All subjects provided informed consent to participate in genetic studies and the ethics review committee of Peking University First Hospital approved the study protocol.

Baseline and follow-up clinical phenotypes

Detailed phenotypic data from the patients, including degree of renal dysfunction, hematuria and proteinuria at presentation, total serum IgA and detailed biopsy findings (Haas staging), were collected at the time of renal biopsy at enrollment. Among the patients involved in the GWAS, 297 patients were followed for a mean of 5 years (range 1 to 15 years). Serum IgA and IgA1 were quantified by enzyme-linked immunosorbent assay6. All patients received the same therapy regimen, including optimal blood pressure control with a target of less than 130/80 mmHg, RAS inhibition and steroids or other immunosuppressive agents for patients with persistent proteinuria. Blood pressure and proteinuria control were expressed as time-averaged mean arterial pressure and time-averaged proteinuria, respectively. The endpoint in this study was defined by diagnosis of ESRD or death. ESRD was defined as eGFR < 15 ml/min/1.73 m2 or need for renal replacement therapy (hemodialysis, peritoneal dialysis or renal transplantation). The eGFR was calculated using the Modification of Diet in Renal Disease (MDRD) formula30,31.

SNP selection

We first selected seven SNPs (Table 1), including five HLA SNPs and two non-HLA SNPs at five independent loci, which were independently associated with IgAN in the GWAS and were included in the GRS calculated in the previous report5,27. Another large GWAS conducted in a different Chinese Han population from Southern China identified additional IgAN associated non-HLA alleles; therefore, they were also selected for the current study. The additional non-HLA SNPs were rs2738048 (8p23), rs3803800 (17p13), rs4227 (17p13) and rs12537 (22q12)12. As the D′ between rs3803800 and rs4227 was 0.92 and that between rs3803800 and rs4227 was 0.91, indicating high linkage disequilibrium and possibly non-independent genetic effects, a seven-SNP model at the five independent loci and a nine-SNP model at the seven independent loci were constructed. The nine-SNP model included the novel IgAN associated variant rs2738048 and the missense variant rs3803800.
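The linkage disequilibrium measures referred to here (D′ in this section, r2 in the Results) follow from standard definitions. The sketch below assumes that the haplotype frequency is already known, for example from phased data or an EM estimate; the function name `ld_stats` and the example frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is scaled by its maximum attainable magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical frequencies: D' reaches 1 while r2 stays modest,
# because the two alleles differ in frequency.
d_prime, r2 = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.25)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

A high D′ alone therefore does not imply that two SNPs carry the same information, which is one reason both measures are commonly inspected when deciding which variants to keep in a score.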

Genetic risk score

Two GRSs were constructed on an a priori basis. The first GRS using an unweighted approach (uwGRS) was the simple counts of the total number of risk alleles rather than weighting by the effect of each SNP, as the current data available may be insufficient to provide stable estimates for each effect of small magnitude24. The second GRS was the weighted-GRS (wGRS) that utilized the allelic odds ratios (OR) to account for the strength of the genetic association within each allele, because different IgAN alleles may have different odds ratios. The wGRS was the weighted sum of risk allele counts, where the weight for each SNP was the natural log of the OR25,26. Different ORs may be observed in different populations for the same allele; therefore, we adopted ORs observed in our current dataset. For comparison and cross-validation, we also directly calculated standardized genetic risk based on the seven SNPs associated with IgA nephropathy in the previous analysis of 10,755 individuals from 12 international case-control cohorts. A coded allele is an allele coded 0, 1, or 2 according to the number of copies of the target allele, as reported5,27. Individuals with 100% non-missing genotypes across all the scored loci were analyzed. Ultimately, 1190 cases and 899 controls were included in the current study.
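The two scoring schemes described above can be sketched as follows. This is an illustration, not the study's code: the SNP labels, odds ratios and genotypes are hypothetical, and the function names are introduced only for this example. Each genotype is coded as the number of risk alleles (0, 1 or 2), and individuals with a missing genotype are rejected, mirroring the requirement for 100% non-missing genotypes.

```python
import math

def unweighted_grs(genotypes):
    """Count method: total number of risk alleles carried."""
    if any(g is None for g in genotypes.values()):
        raise ValueError("missing genotype: individual is excluded")
    return sum(genotypes.values())

def weighted_grs(genotypes, odds_ratios):
    """Log odds method: risk-allele counts weighted by ln(OR) of each SNP."""
    if any(g is None for g in genotypes.values()):
        raise ValueError("missing genotype: individual is excluded")
    return sum(n * math.log(odds_ratios[snp]) for snp, n in genotypes.items())

def standardize(scores):
    """Rescale scores to mean 0 and SD 1, so that an odds ratio can be
    reported per one standard deviation increase."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
    return [(s - mean) / sd for s in scores]

# Hypothetical per-allele odds ratios for the risk alleles (not study values)
example_ors = {"snp_a": 1.30, "snp_b": 1.20, "snp_c": 1.25}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

print(unweighted_grs(person))                       # 3 risk alleles
print(round(weighted_grs(person, example_ors), 4))  # 2*ln(1.30) + ln(1.20)
```

When an allele is reported as protective (OR < 1), the score is built on the opposite allele, whose odds ratio is the reciprocal, so that every weight is positive.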

Statistical analysis

We used logistic regression to study the association of each allele with the risk of IgAN, according to an additive log-odds model. We calculated a GRS5 that included five HLA alleles, a GRS4 that included four non-HLA alleles, a GRS7 that included five HLA alleles and two non-HLA risk alleles and a GRS9 that included five HLA alleles and four non-HLA risk alleles.

The difference in the distribution of uwGRSs between IgAN cases and controls was tested using the chi-squared test. To explore the observed patterns in more detail, we also divided the subjects into quartiles based on the GRS of controls and computed the proportions of cases and controls in each quartile. To assess whether risk was significantly different according to quartile, we performed logistic regressions that modeled the risk of disease as a function of each GRS quartile compared with the reference quartile. Finally, we calculated the odds for the top group (group 4) compared with the bottom group (group 1) as the referent group25,26.
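The quartile comparison can be sketched in a few lines. The example below is an assumption-laden illustration: cut points are taken from the control scores with Python's `statistics.quantiles` (its default interpolation may differ from the one used in SPSS), the scores are hypothetical, and the odds ratio is the unadjusted cross-product ratio, which is what a logistic regression with quartile indicators and no other covariates would return.

```python
import bisect
import statistics

def quartile_group(score, cuts):
    """Assign a score to group 1-4 given three ascending cut points."""
    return bisect.bisect_left(cuts, score) + 1

def odds_ratio_top_vs_bottom(case_scores, control_scores):
    """Unadjusted OR of disease for group 4 versus group 1 (reference)."""
    # Quartile boundaries are defined on the controls only
    cuts = statistics.quantiles(control_scores, n=4)
    case_groups = [quartile_group(s, cuts) for s in case_scores]
    control_groups = [quartile_group(s, cuts) for s in control_scores]
    cases_top, controls_top = case_groups.count(4), control_groups.count(4)
    cases_ref, controls_ref = case_groups.count(1), control_groups.count(1)
    # Raises ZeroDivisionError if a required cell is empty
    return (cases_top * controls_ref) / (controls_top * cases_ref)

# Hypothetical scores (not study data)
controls = [1, 2, 3, 4, 5, 6, 7, 8]
cases = [1, 5, 7, 7, 8, 8, 8, 8]
print(odds_ratio_top_vs_bottom(cases, controls))
```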

Linear regression was applied for correlation analysis of natural log-transformed serum IgA levels, natural log-transformed proteinuria and natural log-transformed eGFR. Binary logistic regression was carried out for the correlation analysis of history of gross hematuria. Ordinal logistic regression was performed for the correlation analysis of clinical subtype, microscopic hematuria, CKD stage at the time of biopsy and Haas biopsy grade12,27.

To set the cut-off values between patients and controls and within the cohort of IgAN patients grouped as progressive cases versus non-progressive cases, we used ROC curve analyses to find the best compromise value between sensitivity and specificity; we also generated ROC curves by plotting the sensitivity of the GRS score against 1-specificity and calculated the area under the curve (AUC). As reported, the percentage of the total variance in the disease state explained by the risk score was estimated by Nagelkerke's pseudo R2 from the logistic regression model, with the risk score as a quantitative predictor and disease state as an outcome. The C-statistic was estimated as an area under the receiver operating characteristic curve provided by the above logistic model. The AUC statistics were compared using a non-parametric approach, as described previously4. The Kaplan-Meier survival method and Cox proportional hazards models were used to generate estimates of predicted risk of ESRD.
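The discrimination statistics named above can be illustrated as follows. Two points are assumptions rather than details given in the text: the "best compromise" between sensitivity and specificity is taken here to be the maximum of Youden's J (sensitivity + specificity − 1), and the AUC is computed through its rank interpretation, the probability that a randomly chosen case scores higher than a randomly chosen control. Function names and data are hypothetical.

```python
import math

def auc(case_scores, control_scores):
    """AUC (C-statistic) as the fraction of case-control pairs ranked
    correctly, counting ties as one half."""
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

def youden_cutoff(case_scores, control_scores):
    """Threshold t maximizing sensitivity + specificity - 1, where a
    score >= t is classified as positive."""
    best_t, best_j = None, -1.0
    for t in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= t for s in case_scores) / len(case_scores)
        spec = sum(s < t for s in control_scores) / len(control_scores)
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's pseudo R2 from the log-likelihoods of the
    intercept-only and fitted logistic models on n subjects."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell

# Hypothetical scores (not study data)
cases, controls = [4, 5, 6], [1, 2, 3, 4]
print(round(auc(cases, controls), 3), youden_cutoff(cases, controls))
```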

Descriptive statistics included mean (SD) and median (with range values). These analyses were carried out with SPSS Statistics version 16.0.