Complement component C4 structural variation and quantitative traits contribute to sex-biased vulnerability in systemic sclerosis

Copy number (CN) polymorphisms of complement C4 play distinct roles in many conditions, including immune-mediated diseases. We investigated the association of C4 CN with systemic sclerosis (SSc) risk. Imputed total C4, C4A, C4B, and HERV-K CN were analyzed in 26,633 individuals and validated in an independent cohort. Our results showed that higher C4 CN confers protection against SSc, and that deviations from CN parity of C4A and C4B augmented risk. The protection contributed per copy of C4A and C4B differed by sex: stronger protection was afforded by C4A in men and by C4B in women. C4 CN correlated well with its gene expression and serum protein levels, and both were lower in SSc patients. Conditional analysis suggests that C4 genetics strongly contributes to the SSc association within the major histocompatibility complex locus and highlights classical alleles and amino acid variants of HLA-DRB1 and HLA-DPB1 as C4-independent signals.


Supplementary Figure 4 MHC region conditional association with Systemic Sclerosis
Association is calculated using logistic regression in the first dataset (N=26,633) with cohort, genetic background and sex as covariates and depicted as position (GRCh38) by significance (Manhattan plot) in grey if no additional covariates were used. The arrow marks the position of C4.
A: Manhattan plot with additional conditioning on 10 independent C4 eQTLs, obtained by forward selection in the first dataset, depicted in blue.
B: Manhattan plot with additional conditioning on 13 independent C4A exclusive eQTLs, obtained by forward selection in the first dataset, depicted in red.
C: Manhattan plot with additional conditioning on 12 independent C4B exclusive eQTLs, obtained by forward selection in the first dataset, depicted in orange.
D: Manhattan plot with additional conditioning on 10 independent C4 eQTLs (obtained by forward selection in the first dataset) plus 4 AAs of HLA-DPB1 (obtained by forward selection in the first dataset conditioning on 10 independent C4 eQTLs), depicted in blue.
E: Manhattan plot with additional conditioning on 10 independent C4 eQTLs (obtained by forward selection in the first dataset) plus 4 classical alleles of HLA-DPB1 and 1 classical allele of HLA-DRB1 (obtained by forward selection in the first dataset conditioning on 10 independent C4 eQTLs), depicted in blue.
The second row depicts comparisons of the conditional p-values against each other.

Supplementary Figure 5 MHC region conditional association with Systemic Sclerosis
Association is calculated using logistic regression in the first dataset (N=26,633) with cohort, genetic background and sex as covariates and depicted as position (GRCh38) by significance (Manhattan plot) in grey if no additional covariates were used.
A: Manhattan plot with additional conditioning on 16 independent C4 eQTLs (obtained by forward selection in the second dataset) and 8 independent amino acids of DRB1 and DPB1 (obtained by forward selection in the first dataset conditioning on 16 independent eQTLs), depicted in blue.
B: Manhattan plot with additional conditioning on 16 independent C4 eQTLs (obtained by forward selection in the first dataset) and 9 independent classical alleles of DRB1 and DPB1 (obtained by forward selection in the first dataset conditioning on 16 independent eQTLs), depicted in blue.
The second row depicts comparisons of the conditional p-values against each other.

* = percentages are estimated due to missing data. Cohort 1 has 9% missing data with respect to limited and diffuse SSc and 11% missing data on antibodies. Cohort 2 has 50% missing data with respect to limited and diffuse SSc and 25% missing data on antibodies.

Depicted are seven models and their expression variance explained in the second dataset (N=857). Copy numbers (CN) were calculated from the imputed C4 alleles per individual as dosages.
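
As an illustration of how copy numbers can be derived from imputed allele dosages, the following minimal Python sketch assumes that C4 structural alleles are named by their gene segments (for example "AL-BL" for one long C4A copy and one long C4B copy, with "L" marking a copy that carries the HERV-K insertion). The function names and the example dosages are hypothetical and are not taken from the study's pipeline.

```python
def allele_copy_numbers(allele):
    """Count gene copies encoded in a structural allele name such as 'AL-BS'.

    Assumption: each hyphen-separated segment is one C4 gene copy, whose
    first letter is the isotype (A or B) and second letter the length
    (L = long, carries HERV-K; S = short).
    """
    segments = allele.split("-")
    return {
        "C4": len(segments),
        "C4A": sum(seg[0] == "A" for seg in segments),
        "C4B": sum(seg[0] == "B" for seg in segments),
        "HERV": sum(seg[1] == "L" for seg in segments),
    }


def dosage_copy_numbers(allele_dosages):
    """Sum copies per allele weighted by the imputed dosage of that allele."""
    totals = {"C4": 0.0, "C4A": 0.0, "C4B": 0.0, "HERV": 0.0}
    for allele, dosage in allele_dosages.items():
        for key, copies in allele_copy_numbers(allele).items():
            totals[key] += dosage * copies
    return totals


# Hypothetical individual: imputed dosages over both haplotypes sum to 2.
example_dosages = {"AL-BL": 1.2, "AL-BS": 0.5, "BS": 0.3}
example_cn = dosage_copy_numbers(example_dosages)
```

Because dosages are fractional, the resulting copy numbers are also fractional and reflect imputation uncertainty rather than a hard genotype call.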

Supplementary
Model 4 is the final model with only copy number information as predictors. Model 5 adds eQTLs by forward selection until no more eQTLs are found with p<0.01. Model 6 is an "eQTL only" model (forward selection until no more eQTLs are found with p<0.01). Model 7 is like model 6, but only eQTLs with suggestive association to SSc (pGWAS < 10⁻⁵) are forward selected. All SNP positions refer to GRCh38. Allele frequencies are noted for both types of alleles, and correlations greater than 0.4 are highlighted.
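
The forward selection with a p<0.01 stopping rule described for models 5 to 7 can be sketched as a greedy loop. This is a minimal illustration, not the study's implementation: the `p_value` callable stands in for whatever regression test is refitted at each step, and the toy variants, p-values and LD blocks are invented for the example.

```python
def forward_select(candidates, p_value, threshold=0.01):
    """Greedy forward selection.

    At each step, every remaining candidate is scored conditional on the
    predictors already selected; the most significant one is added, and
    the loop stops once no candidate reaches p < threshold.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        scored = [(p_value(c, selected), c) for c in remaining]
        best_p, best = min(scored, key=lambda pair: pair[0])
        if best_p >= threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in for a regression test (hypothetical values): a variant loses
# its signal once another variant from the same LD block is in the model.
toy_marginal = {"snp1": 1e-8, "snp2": 1e-6, "snp3": 0.003, "snp4": 0.2}
toy_block = {"snp1": "A", "snp2": "A", "snp3": "B", "snp4": "C"}


def toy_p_value(candidate, selected):
    if any(toy_block[s] == toy_block[candidate] for s in selected):
        return 0.5
    return toy_marginal[candidate]


chosen = forward_select(toy_marginal, toy_p_value)
```

In this toy run the loop keeps one variant per independent signal and stops when the best remaining candidate no longer passes the threshold. A restriction such as the one in model 7 would amount to filtering `candidates` before the loop starts.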

R-squared adjusted
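
For reference, adjusted R-squared penalizes the plain R-squared for the number of predictors, which matters when models with different numbers of eQTLs are compared. The sketch below uses the standard formula; the R-squared value and predictor count in the example are hypothetical, and only the sample size (N=857) comes from the legend above.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical model: R^2 = 0.5 with 10 predictors in 857 individuals.
example_adj = adjusted_r2(0.5, 857, 10)
```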
The column pGWAS(SSc) shows the significance of the association of HLA alleles with SSc, calculated by logistic regression in the first dataset (N=26,633) using 5 principal components, cohort and sex as covariates; values with p < 5×10⁻⁸ are shown in bold. Allele names and codes are taken from Kamitaki et al., Nature 2020.