Novel HLA class I associations with HIV-1 control in a unique genetically admixed population

Associations between HLA class I alleles and HIV progression in populations exhibiting Amerindian and Caucasian genetic admixture remain understudied. Using univariable and multivariable analyses we evaluated HLA associations with five HIV clinical parameters in 3,213 HIV clade B-infected, ART-naïve individuals from Mexico and Central America (MEX/CAM cohort). A Canadian cohort (HOMER, n = 1622) was used for comparison. As expected, HLA allele frequencies in MEX/CAM and HOMER differed markedly. In MEX/CAM, 13 HLA-A, 24 HLA-B, and 14 HLA-C alleles were significantly associated with at least one clinical parameter. These included previously described protective (e.g. B*27:05, B*57:01/02/03 and B*58:01) and risk (e.g. B*35:02) alleles, as well as novel ones (e.g. A*03:01, B*15:39 and B*39:02 identified as protective, and A*68:03/05, B*15:30, B*35:12/14, B*39:01/06, B*39:05~C*07:02, and B*40:01~C*03:04 identified as risk). Interestingly, both protective (e.g. B*39:02) and risk (e.g. B*39:01/05/06) subtypes were identified within the common and genetically diverse HLA-B*39 allele group, characteristic to Amerindian populations. While HLA-HIV associations identified in MEX and CAM separately were similar overall (Spearman’s rho = 0.33, p = 0.03), region-specific associations were also noted. The identification of both canonical and novel HLA/HIV associations provides a first step towards improved understanding of HIV immune control among unique and understudied Mestizo populations.

Mexican and Central American cohorts. Three thousand two hundred and thirteen HIV-1 clade B-infected, Antiretroviral Treatment (ART)-naïve individuals from Mexico and from 6 out of 7 Central American countries were enrolled by convenience sampling from 2000 to 2016 as part of an international multicenter cross-sectional study to evaluate HIV molecular epidemiology, drug resistance prevalence and HLA adaptation in Mesoamerica. Individuals were enrolled and donated a single blood sample at the time of HIV diagnosis or at follow-up visits prior to starting ART according to national guidelines. Every HIV-infected person naive to ART attending each participating clinic was offered the opportunity to participate during active recruitment periods. No additional exclusion criteria were applied. In Mexico (MEX cohort), 1679 participants were enrolled in a national collaborative network of clinics from 23 out of 32 Mexican states comprising Baja California, Campeche, Chiapas, Chihuahua, Colima, Guerrero, Hidalgo, Jalisco, Mexico City, Michoacan, Morelos, Nuevo Leon, Oaxaca, Puebla, Queretaro, Quintana Roo, Sinaloa, Sonora, State of Mexico, Tabasco, Tlaxcala, Veracruz and Yucatan. From Central America (CAM cohort), 1534 subjects were recruited, including 418 from Guatemala, 102 from Belize, 42 from El Salvador, 402 from Honduras, 254 from Nicaragua and 316 from Panama, using convenience sampling. Participating institutions in Central America included: Guatemala: Roosevelt Hospital, Guatemala City (a national referral center) 31 ; Belize: Ministry of Health, Belmopan; El Salvador: Rosales National Hospital, San Salvador; Honduras: University School Hospital, Tegucigalpa; National Cardio-Pulmonary Institute, Tegucigalpa; Mario Catarino Rivas Hospital, San Pedro Sula; Atlántida Hospital, La Ceiba; South Hospital, Choluteca (five of the largest HIV clinics across the country) 32 ; Nicaragua: Roberto Calderón Hospital, Managua (the largest reference center in the country) 33 ; Panama: Gorgas Memorial Institute for Health Studies, Panama City (a national reference center) 34 . Demographic data were obtained via questionnaire at the time of sample donation. Blood samples, completed consent forms and demographic questionnaires were shipped to the Centre for Research in Infectious Diseases (CIENI) of the National Institute of Respiratory Diseases (INER) in Mexico City, a WHO-accredited laboratory for HIV genotyping, within 72 hours of collection. HIV clinical information (baseline drug resistance test, plasma viral load [pVL] and CD4 counts) were sent back to the different Mexican states or countries in Central America for clinical follow-up of the participants.

HOMER cohort. The HAART Observational Medical Evaluation and Research (HOMER) cohort 35 from
British Columbia, Canada, a historic retrospective observational cohort comprising HIV-infected antiretroviral naive individuals initiating their first combination antiretroviral treatment regimen since 1996, was used as a reference for HLA allele frequency and HLA-pVL associations comparison. Clinical measurements from the earliest available time point before initiating ART were used. pVL in the HOMER cohort were performed with either Roche COBAS AmpliPrep/COBAS AMPLICOR HIV-1 MONITOR UltraSensitive Test, version 1.5 (1996-2008) or Roche COBAS AmpliPrep/COBAS TaqMan v1 HIV-1 Test (2008)(2009)(2010). Median pVL and CD4 counts in the HOMER cohort were 4.90 (IQR 4.33-5.26) log 10 copies/mL and 340 (IQR 170-500) cells/µL respectively. The cohort was predominantly male (86.2%), and the median age at recruitment was 37.4 (IQR 32-44) years. HOMER is primarily composed of Caucasian individuals (>60%); a minority of individuals self-identified as Hispanic (~2%) or Indigenous/Amerindian (~13%) 35 . The present analysis was restricted to subtype B-infected persons (n = 1622; 90% of total cohort 35 ). As described elsewhere 29,30 , the majority of HLA class I types were defined at subtype level resolution, with the remaining intermediate-resolution data imputed to subtype-level resolution using a machine learning algorithm trained on HLA-A, B and C subtypes from >13,000 individuals with known ethnicity 36 . HIV subtyping. HIV  (protease and reverse transcriptase) sequences, obtained as described elsewhere 31 . All non-subtype B-infected individuals were removed prior to analysis.
HIV clinical parameters. HIV pVL was determined by automated real-time polymerase chain reaction (PCR) using the m2000 system (Abbott, Abbott Park, IL, USA) with a detection limit of 40 HIV RNA copies/ mL. CD45 + , CD3 + , CD4 + and CD8 + cell counts were obtained by flow cytometry using the Trucount Kit in FACSCanto II instrument (BD Biosciences, San Jose, CA).
HLA class I typing. Peripheral blood mononuclear cells (PBMCs) were isolated by density gradient centrifugation (Ficoll-Paque Pharmacia, Uppsala, SE) from blood samples from MEX cohort, while buffy coats were isolated from CAM cohort blood samples and cryopreserved until DNA extraction. Total genomic DNA was extracted from PBMCs (∼6 million cells) or buffy coats (200 µL) using the QIAmp Blood Mini Kit (QIAGEN, Valencia, CA), according to the manufacturer's specifications. HLA-A, -B and -C types were resolved at subtype-level (e.g. second field/4-digit) resolution using a previously described protocol with some modifications 38 . Briefly, a nested polymerase chain reaction (PCR) using universal, locus-specific primers was used to amplify a ∼1000 base pair region spanning exons 2 and 3 (which encode the α1 and α2 domains of the HLA peptide binding groove) of HLA-A, -B and -C loci, using the Expand High Fidelity PCR system (Roche Applied Science, Laval, PQ) (3.5 U/µL). For HLA-A PCR reactions, 5% dimethyl sulfoxide (DMSO, Sigma-Aldrich, St Louis, MO) was added to decrease unspecific amplification and primer dimerization. HLA-B PCR products were cleaned up with ExoSAP-IT (Affimetrix, Cleveland, OH) and HLA-A and -C PCR products were diluted 10-fold and directly sequenced using a set of six sequencing primers per locus as previously described 38 on a 3730xl Genetic Analyzer (Applied Biosystems, Foster City, CA), using the BigDye Terminator v3.1 chemistry (Life Technologies, Carlsbad, CA). Sequences were trimmed with Sequencing Analysis software v5.4 (Applied Biosystems) and HLA subtypes were assigned using UType v7.1 RUO (Applied Biosystems) using the up-to-date IPD-IMGT/HLA Database (http:// www.ebi.ac.uk/ipd/imgt/hla/). Using this procedure, ambiguous HLA assignments can arise due to the presence of polymorphisms outside of the analyzed region (exons 2 and 3). For these cases, ambiguous HLA pairs were managed as G groups, including: A*74:01:01G (A*74:01 in the text, encompassing A*74:01 and A*74:02; difference in exon 1), C*18:01:01G (C*18:01 in the text, encompassing C*18:01 and C*18:02; difference in exon 5), C*17:01:01G (C*17:01 in the text, encompassing C*17:01, C*17:02 and C*17:03; difference in exons 1 and 5). Ambiguous HLA assignments may also arise due to the lack of phase resolution as a result of bulk (direct) sequencing of PCR amplicons. Considering only HLA ambiguities that affect the first (allele group-level) and second (subtype-level) HLA fields and that included either two common alleles or one common and one well-documented subtype present in the Common and Well-documented catalogue 39 , a total of 142 HLA-A, 161 HLA-B, and 104 HLA-C pairs were ambiguous due to lack of phase resolution (Supplementary Table S1). These ambiguities were resolved by assigning the most probable combination using HLA allele frequency (AF) and linkage disequilibrium (LD) data from the same cohorts. All HLA haplotypes were confirmed using the same published probabilistic method used for the HOMER cohort 36 (HLA Completion web tool, http://boson.research.microsoft.com/hla/). Only 72 (of 6392, 1.12%) missing HLA-A or HLA-C types were imputed due to PCR/sequencing failure at these loci or lack of sample availability, using the same tool. HLA pairs with an unresolved HLA-B allele (23/3213 pairs, 0.71%) were considered missing data. As previously described 40 , our HLA typing methodology was validated to be 99.9% accurate in Mesoamerican Mestizo populations by comparing assigned HLA types to those obtained via amplification of exons 1 to 8 (for HLA-A and HLA-C) and exons 1 to 7 (for HLA-B) followed by next generation sequencing (TruSight HLA v2 Sequencing Panel, Illumina, San Diego, CA) in a large independent Mexican cohort (n = 323). Raw HLA typing data are available upon direct request to the authors.
HLA linkage disequilibrium analysis. LD between pairs of HLA alleles was assessed using the Los Alamos HIV Molecular Immunology Database HLA Analysis Tools (https://www.hiv.lanl.gov), with multiple comparisons addressed via Bonferroni correction. For the MEX cohort LD analysis, 12,970 two-way comparisons were performed (p-values < 3.9E-06 were considered significant), and for the CAM cohort 14,234 two-way comparisons were performed (p-values < 3.5E-06 were considered significant). The high-dimensional visualization tool Disentangler 41 was used to graph HLA haplotype structures.
HLA allele frequency comparison. HLA AF were calculated by direct gene count (denominator 2n). All HLA subtypes with AF > 0.001 in at least one cohort were compared using two-tailed Fisher's exact test; here, multiple comparisons were addressed using q-values, the p-value analogue of the false discovery rate 42 . Results with p < 0.05 and q < 0.2 were considered statistically significant.
Univariable and multivariable analyses of HLA-HIV clinical parameter associations. HLA allele associations with five clinical parameters previously described to be predictive of HIV disease progression were investigated. These included pVL 43 and absolute CD4 count 44 , used in routine clinical monitoring of HIV infection, as well as CD4 percentage 45 , CD4/CD8 ratio 45 , and a proxy variable called Z-score that combines information from both CD4 count and pVL 46 47 , where associations with p-values < 0.05 and q-values < 0.2 were considered significant. We additionally instituted a scoring system where, for each HLA allele investigated, we summed its total number of significant protective and risk associations (each assigned + 1 and −1, respectively), such that final scores could range from +5 to −5. Alleles with no significant associations were assigned a score of 0. Two multivariable linear regression models were also constructed to account for potential confounders of HLA-HIV associations, after which the overall 5-parameter scores were adjusted accordingly. First, independent models were constructed relating each HLA allele to each HIV clinical parameter, while adjusting for gender, age, geographical origin (country/region coded as n-1 binary variables) and the effect of the most significant HLA associations for that parameter (defined as the HLA alleles with p < 0.001 in the corresponding Mann-Whitney univariable analysis). Specific HLA alleles included in each model are listed in Supplementary Table S9. Second, we constructed models adjusting for all HLA subtypes in significant (p < 0.05 and q < 0.2) linkage disequilibrium with every HLA allele associated with an HIV clinical parameter in that cohort. Statistical analyses were undertaken using Stata/MP v14.1 (StataCorp, College Station, TX) and R v3.3.3 48 .

MEX and CAM cohort characteristics.
Overall, the clinical and demographic characteristics of the Mexico (MEX; n = 1679), Central America (CAM; n = 1534) and combined (n = 3213) cohorts (Table 1) are consistent with the frequent diagnosis of HIV in advanced infection in Latin America and an concentrated epidemic mainly in men who have sex with men 49 . Median pVL in MEX was significantly higher (p < 0.00001) than that in CAM (4.72 versus 4.57 Log 10 HIV RNA copies/mL), though CD4 T cell counts and CD4/CD8 ratios did not differ significantly between the cohorts (median 315 cells/µL and ~0.28, respectively). CD4+ T cell percentages and Z-scores 46 also differed marginally between cohorts (p < 0.01).   Tables S6-S7).

Unique immunogenetic profiles of HIV-infected individuals in
Overall, these findings highlight the immunogenetic uniqueness of Mesoamerican Mestizo populations and support them as ideal for identifying novel HLA correlates of HIV control.

HLA associations with HIV pVL in MEX/CAM include both canonical and novel associations.
Given the significantly different HLA frequency distributions between MEX/CAM and HOMER, we hypothesized that HLA associations with HIV parameters would also differ markedly. We explored this by identifying HLA allele associations with the well-characterized marker of HIV disease progression pVL 43 in both cohorts, in a univariable analysis. At the predefined statistical threshold of q < 0.2, we identified thirteen HLA alleles (5 HLA-A, 5 HLA-B, and 3 HLA-C) significantly associated with lower pVL, and four alleles (2 HLA-A, and 2 HLA-B) associated with higher pVL in HOMER, compared to sixteen HLA alleles (1 HLA-A, 7 HLA-B, and 8 HLA-C) associated with lower pVL, and twelve alleles (4 in HLA-A, 5 in HLA-B, and 3 in HLA-C) associated with higher pVL in MEX/CAM (Fig. 3). A number of these associations were consistent across cohorts: in all cases these were HLA alleles previously reported to be associated with HIV progression, including the canonical protective alleles B*57:01 and B*27:05 1,9,11,12,16,20,21,[23][24][25] which were associated with significantly lower pVL in both cohorts. Additionally, as previously described, A*30:02 14 Additional novel associations between Amerindian HLA alleles and HIV risk/protection. We next extended our analyses of our Mesoamerican cohorts to identify HLA alleles associated with an expanded panel of five HIV clinical parameters (pVL, CD4 count, Z-score, %CD4, and CD4/CD8 ratio). HLA alleles associated with each of the 5 clinical parameters, stratified by cohort, are listed in Table 2. Consistent with previous reports 12 , HLA-B alleles featured prominently among these associations, with 43 HLA-B alleles (24 in MEX/ CAM, 11 in MEX, and 8 in CAM) associated with at least one clinical parameter (p < 0.05, q < 0.2) compared to 29 HLA-A alleles (13 in MEX/CAM, 5 in MEX, and 11 in CAM) and 29 HLA-C alleles (14 in MEX/CAM, 9 in MEX, and 6 in CAM). As a novel way to quantify HLA associations with HIV clinical parameters, we instituted a scoring system that summed each HLA allele's total number of significant protective and risk associations (each assigned +1 and −1, respectively), such that final scores ranged from +5 to −5 (Fig. 4). Alleles with no significant associations were assigned a score of 0. As expected, a given HLA allele's overall association score correlated significantly in a dose-dependent manner with its associated median value for all 5 clinical parameters in both MEX/CAM and individual cohorts (Supplementary Figure S1); also, HLA subtypes associated with pVL (Fig. 3) generally tended to be associated with other clinical parameters (e.g. 22 of 28, 78.5% in MEX/CAM), though some exceptions were noted (Fig. 4).
To further quantify effects of HLA alleles on HIV parameters, we repeated univariable analyses using generalized linear models (Supplementary Table S8). HLA-HIV scores derived from the GLM analysis were highly concordant with those obtained using the Mann-Whitney U test (Spearman rho >0.9 in the combined and individual cohorts, in all cases p < 0.0001, Supplementary Figure S2). We next performed a multivariable analysis to correct for gender, age, geographic location of recruitment, as well as HLA alleles with the most significant associations for each HIV clinical parameter in the univariable analysis in both pooled and individual cohorts. Resulting regression coefficients and 95% confidence intervals are shown in Fig. 5 and Supplementary Table S9   Secondary analyses using only pVL and CD4 count. Although the use of the 5-parameter scoring system enhanced sensitivity to identify associations (see discussion), HLA/HIV association studies have traditionally used pVL and CD4 count only 5,11,17,20 . To facilitate direct comparison of our results to previous studies, results based on pVL and CD4 count only are provided in Supplementary Tables S8-S9. Correlation between the results of the 5-and 2-parameter scoring systems were robust for both univariable (all rho >0.93, in all cases p < 0.0001; Supplementary Figure S3, panels A-C) and multivariable analyses (all rho > 0.88, in all cases p < 0.0001; Supplementary Figure S3, panels D-F).

Regional variation in HLA-HIV associations between MEX and CAM cohorts. Finally, we wished
to investigate the extent to which HLA-HIV associations in MEX and CAM were universal versus region-specific.
Taking the union of all HLA alleles that achieved a non-zero score in univariable analyses in either the MEX and/or CAM cohorts, we assessed the degree to which their scores correlated between cohorts using Spearman's correlation. Overall, the relationship was statistically significant but the strength of the correlation was rather modest (Spearman rho = 0.334, p = 0.032, Fig. 6D Figure S4), which in turn influences power to identify individual HLA associations below a predefined significance threshold. Nevertheless, this observation suggests that, while many HLA-HIV associations are common to both regions, others may be region-specific.

Discussion
In the present study, the largest and most comprehensive of its kind undertaken to date in the unique and highly genetically admixed Latin American Mestizo population, we characterized HLA allele frequency distributions, haplotype structures and identified both canonical and novel associations between HLA alleles and HIV control.
To accomplish the latter, we implemented a novel scoring system based on five partially interrelated clinical parameters, an approach that allowed more nuanced classification of the consistency of associations observed. Indeed, despite the cross-sectional nature of our study and the fact that we investigated these associations during chronic HIV infection, we readily detected numerous canonical associations with HIV control (e.g.    Supplementary  Table S9 for specific HLA alleles adjusted for in each model). Coefficients and 95% confidence intervals (CI) of significant (p < 0.05, q < 0.2) associations are shown. Alleles are grouped by HLA-HIV score (+5 to −5), then ordered by the coefficient of Z-score, pVL, CD4 count, %CD4 and CD4/CD8 ratio. Protective and risk alleles are shaded with progressively deeper green and orange colors, respectively. The number (n = ) of individuals expressing each HLA allele is shown. Blue vertical lines denote a coefficient equal to zero.  [68][69][70][71][72][73] , and also with spondyloarthropathies in Japan 74 , observations which mirror established links between HLA alleles canonically associated with HIV protection (e.g. B*27:05) and autoimmune conditions [75][76][77] .
In contrast, the common Amerindian B*39:05 subtype (the 3rd most frequent HLA-B allele in MEX/CAM [AF = 0.057]) consistently ranked among the strongest risk alleles (score -5 MEX/CAM, -4 MEX and -4 CAM) even after multivariable correction (though it is important to note that this allele is in strong LD with C*07:02). Notably, B*39:05 was not identified as mounting strong immune pressures in Gag p24, but rather in Pol 40 (at codons 37, 134 and 296). Similarly, no significant Gag codons have been identified as being under strong immune A number of caveats and considerations merit mention. First, it is important to note the interrelated nature of the five clinical parameters evaluated (e.g. in the MEX/CAM cohort, Spearman rho between pVL and CD4, %CD4 and CD4/CD8 was -0.54 in all cases; Spearman rho between pVL and Z-score was −0.87; Spearman rho for CD4 and Z-score, %CD4 and CD4/CD8 was > 0.79 in all cases). Despite these relatively strong relationships however, pVL and CD4 are nevertheless well-established as independent predictors of HIV progression; as such it is customary to include both in HLA association studies (e.g. 17,19 ). Similarly, inclusion of all 5 variables in our scoring system afforded us increased sensitivity to detect both previously-described HLA-HIV associations (e.g. in MEX/CAM univariable: B*57:02 17 5,12,17,20 [n = 16, score -1]). Similarly, the observation that canonical protective associations including B*57:01, B*57:03 and B*27:05 all achieved the highest protective score (+5) also increases confidence in our identification of B*39:05 and B*35:12, which both achieved the lowest scores (−5), as novel risk alleles. Moreover, the observation that some HLA alleles showed associations with different HIV clinical parameters raises the intriguing possibility that certain HLA subtypes influence HIV progression through different mechanisms (though insufficient power to detect relatively weak associations, combined with the use of a predefined significance threshold, cannot be ruled out). As with all HLA association studies, strong linkage disequilibrium, particularly between HLA-B and HLA-C loci located in close proximity within the MHC's beta block gene group [79][80][81] makes it difficult to tease apart individual allele effects in some cases. Associations between certain HLA-C alleles and HIV clinical parameters may thus be partially or completely attributable to LD with HLA-B alleles (e.g. B*27:05 for C*02:02 and B*35:12 for C*04:01). Moreover, for three HLA-B~C combinations (B*14:02~C*08:02, B*39:05~C*07:02 and B*40:01~C*03:04), the allele responsible for the observed association could not be resolved due to strong LD. Similarly, we cannot rule out the possibility of additive and/or synergistic effects between HLA alleles. Finally, the risk of spurious associations is an ever-present concern in HLA association studies, particularly when reporting novel associations with relatively rare alleles (e.g. the protective B*15:39 or the detrimental A*01:02 alleles, both observed in 8 individuals only). Validation of these novel associations in other cohorts, along with elucidation of possible mechanisms, are therefore warranted. In particular, the evaluation of HIV-specific CTL responses in Latin American individuals expressing protective or risk HLA alleles should provide important mechanistic insights. Despite these caveats, our study further confirms that some HLA allele associations (e.g. B*27:05 and B*57:01) transcend the boundaries of race and HIV subtype, whereas others are likely to be particular to the unique immunogenetic background of the population under study.
Results of our study may also be relevant to the ultimate pursuit of effective HIV vaccines, whether these be prophylactic or therapeutic, global or geographically-tailored. In particular, detailed characterization (and continuous monitoring 82 ) of HLA associations with HIV clinical parameters in ethnically diverse human populations hardest hit by the HIV epidemic may help inform the design of CTL epitope-based vaccine constructs 83 , predict the relative population coverage of such constructs, and ultimately aid in the interpretation of results from future HIV vaccine trials in a population-specific context.