Introduction

Individuals vary in their response to infectious pathogens, including in disease outcome after exposure. This variation is likely due in part to underlying host genetic differences that influence the ability of pathogens to attach to, invade, and replicate in the human host (or that regulate behaviors altering exposure). There is evidence that host genetic factors influence antibody responses both to vaccines (eg, varicella and rubella)1, 2 and to naturally occurring infections such as malaria, Chagas disease, and Epstein–Barr virus.3, 4, 5 We previously demonstrated that differences in antibody titer, which reflect infection history, are significantly heritable for a number of common infectious pathogens.6 Here we attempt to localize the genetic factors influencing serological phenotypes for these pathogens using genome-wide statistical gene-mapping approaches. We selected 12 pathogens for examination because of reports linking them to risk of cardiovascular disease: two bacterial pathogens, Chlamydia pneumoniae and Helicobacter pylori; one protozoan, Toxoplasma gondii; six herpes viruses, cytomegalovirus (CMV), herpes simplex virus type 1 (HSV-1), herpes simplex virus type 2 (HSV-2), human herpesvirus 6 (HHV-6), human herpesvirus 8 (HHV-8), and varicella zoster virus (VZV); and three other viruses, hepatitis A virus (HAV), influenza A virus, and influenza B virus. We also examined three measures of pathogen burden (ie, the number of pathogens to which an individual has been exposed), for all herpes viruses, all viruses, and all pathogens, in order to determine whether genetic variants are shared among these traits. To our knowledge, published genome-wide investigations are not currently available for most of the antibody traits examined here, with the exception of anti-CMV antibodies.7

Materials and methods

Sample collection

Participants were 1932 members of extended Mexican–American families in the San Antonio Family Study (SAFS), including 771 men and 1161 women, most of whom were initially recruited during 1991–1995. On average, each phenotyped individual had 40 other phenotyped family members in this study, with up to six generations represented and the largest pedigree consisting of 133 phenotyped members. Participant ages ranged from 16 to 94 years, with a mean age of 41 years. The SAFS combines the San Antonio Family Heart Study (SAFHS)8 and the San Antonio Family Diabetes Gallbladder Study (SAFDGS).9, 10 The combined study now focuses on identifying genetic risk factors for many common diseases and their quantitative correlates, beyond the initial foci of the component studies on cardiovascular disease and diabetes, respectively. Individuals in the SAFHS were randomly ascertained (without regard to disease status), whereas SAFDGS participants were initially recruited on the basis of a single diabetic proband in each pedigree. Although the SAFDGS is weakly enriched for diabetes, the rates of diabetes (and of other major diseases, including heart disease and obesity) are similar in the two component studies, indicating that the overall study may be viewed (and analyzed) as a randomly ascertained sample. The study and protocols were approved by the Institutional Review Board of the University of Texas Health Science Center at San Antonio, and all study participants signed informed consent statements.

Serology

Fasting blood samples were collected in EDTA vacutainers from study participants during the initial recruitment (1991–1995). Plasma was aliquoted and frozen as previously described,11 and stored at −80 °C. Samples were thawed immediately before IgG antibody levels were determined with commercially available ELISA assays for the following infectious agents: C. pneumoniae (Bioclone Australia, Marrickville, NSW, Australia); H. pylori and CMV (Inverness Medical Professional Diagnostics, Palatine, IL, USA); T. gondii, VZV, and influenza A and B (IBL America, Minneapolis, MN, USA); HSV-1 and HSV-2 (Focus Diagnostics Inc, Philadelphia, PA, USA); HAV (Bio-Rad Laboratories, Redmond, WA, USA); and HHV-6 (Advanced Biotechnologies, Rockville, MD, USA). These antibody titer data, along with information on age and sex, are available in Supplementary Table 1.

Statistical genetic analysis

Quantitative IgG antibody levels were analyzed separately for each pathogen. Statistical analyses were performed with a variance components (VC) method implemented in the SOLAR software package (Texas Biomedical Research Institute, San Antonio, TX, USA).12 Because VC analyses are sensitive to extreme values (outliers or ‘thick tails’ generating high kurtosis), the quantitative antibody level traits (optical density values) were inverse-normalized (by rank) before analysis. In addition, the number of seropositive reactions to the pathogens examined here was used as a measure of pathogen burden for (1) all herpes viruses, (2) all viruses, and (3) all pathogens. We have previously published gene-mapping results for antibodies against the herpes virus Epstein–Barr virus (EBV),13 and these results are not repeated here; however, EBV antibodies were included when calculating the pathogen burden traits. The influence of shared environmental factors was modeled with a ‘household’ variance component accounting for shared residence at the time of the blood draw.14
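A minimal Python sketch of the two trait constructions described above follows. The rank-based inverse normal transform shown here uses the Blom offset (c = 3/8) and assigns tied values their average rank; both choices are assumptions, since the exact variant applied before the SOLAR analyses is not stated. The burden function counts seropositive calls over a chosen pathogen subset; the function names, the dictionary of serostatus calls, and the demo values are hypothetical, and the assay-specific cutoffs used to call seropositivity are not reproduced.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transform (Blom offset c = 3/8 by default).

    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run over values tied with values[order[start]].
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based average rank of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    std_normal = NormalDist()
    # Map each rank to a standard normal quantile: Phi^-1((r - c) / (n - 2c + 1)).
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def pathogen_burden(serostatus, pathogens):
    """Count seropositive calls among the listed pathogens.

    Every listed pathogen must have a call in `serostatus` (a KeyError is
    raised otherwise), so missing data are never silently treated as negative.
    """
    return sum(1 for name in pathogens if serostatus[name])


if __name__ == "__main__":
    # Hypothetical optical-density values with one tie and one extreme value.
    od = [0.12, 0.45, 0.45, 2.90, 0.08]
    print([round(z, 3) for z in rank_inverse_normal(od)])

    # All-herpes-virus burden: the six herpes viruses plus EBV.
    herpes = ["CMV", "HSV-1", "HSV-2", "HHV-6", "HHV-8", "VZV", "EBV"]
    person = {"CMV": True, "HSV-1": True, "HSV-2": False, "HHV-6": True,
              "HHV-8": False, "VZV": True, "EBV": True}
    print(pathogen_burden(person, herpes))
```

The rank transform caps the influence of the extreme value (2.90 in the demo) at the largest normal quantile for the sample size, which is the motivation the text gives for transforming before VC analysis.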

Given that the SAFS contains extended families, several genome-wide analyses (including linkage and association) were performed using SOLAR12 to take full advantage of the information available in this sample. To identify genomic regions containing variants that may influence the antibody traits, genome-wide linkage analysis was performed using multipoint identity-by-descent estimates calculated with LOKI15, 16 from 28,388 SNPs in very low linkage disequilibrium with one another. In addition, genome-wide joint linkage and association analysis was performed using 944,565 SNPs by including both a random-effects linkage component (a QTL effect) and a fixed-effect allelic component (an additive allele dosage effect). SNP genotypes were generated using several versions of Illumina’s BeadChip SNP genotyping microarrays (HumanHap550v3, HumanExon510Sv1, Human1Mv1, and Human1M-Duov3; Illumina, Inc., San Diego, CA, USA) and underwent rigorous quality control before analysis, as described previously.13 The linkage component was modeled as a VC-based random effect, and the association component was implemented as an additive measured genotype model. For our sample, we used the effective number of SNPs (calculated from linkage disequilibrium) to estimate that P≤1.3 × 10−7 is the genome-wide significance threshold corresponding to α=0.05. To account for population substructure, which may produce spurious association results,17 we ran principal components analysis on the SNP genotypes using R princomp (R development team) to identify differential contributions of ancestral populations to study participants. This analysis was restricted to pedigree founders; each offspring was then assigned the average of its parents' values for each principal component, to avoid the risk that principal components would tag potentially important genetic differences between pedigrees per se.
The top five principal components were included in all statistical analyses as additional covariates. Variant locations were based on the hg19 reference sequence.
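The pedigree step of the ancestry adjustment can be sketched as follows, assuming principal component scores have already been computed for founders (in the study, with R princomp on the SNP genotypes; that step is not reproduced here). Each non-founder receives the mean of its two parents' scores, resolved recursively so that later generations inherit from parents who were themselves assigned averaged scores. The function name, the dictionary-based pedigree format, and the toy two-component scores are hypothetical; in the study, five components were carried forward as covariates.

```python
def propagate_pc_scores(founder_scores, parents):
    """Give each non-founder the mean of its two parents' principal component scores.

    founder_scores: {individual_id: [PC1, PC2, ...]} from a founders-only PCA.
    parents: {individual_id: (father_id, mother_id)} for every non-founder.
    Parents may themselves be non-founders; they are resolved recursively.
    """
    scores = {ind: list(pcs) for ind, pcs in founder_scores.items()}

    def resolve(ind, path=()):
        if ind in scores:
            return scores[ind]  # founder, or a non-founder already resolved
        if ind in path:
            raise ValueError(f"pedigree loop involving {ind!r}")
        if ind not in parents:
            raise KeyError(f"{ind!r} is neither a founder nor listed with parents")
        father, mother = parents[ind]
        pcs_father = resolve(father, path + (ind,))
        pcs_mother = resolve(mother, path + (ind,))
        # Component-wise average of the two parental score vectors.
        scores[ind] = [(a + b) / 2 for a, b in zip(pcs_father, pcs_mother)]
        return scores[ind]

    for ind in parents:
        resolve(ind)
    return scores


if __name__ == "__main__":
    founders = {"F1": [0.8, -0.2], "M1": [0.2, 0.4], "M2": [-0.6, 0.0]}
    # G1 is listed before its parent C1; recursion handles the ordering.
    pedigree = {"G1": ("C1", "M2"), "C1": ("F1", "M1")}
    for ind, pcs in propagate_pc_scores(founders, pedigree).items():
        print(ind, pcs)
```

Averaging parental scores keeps offspring values determined entirely by founder ancestry, so the components cannot pick up additional variation specific to particular sibships or pedigrees, which is the risk the text describes.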

Results

Antibody levels and heritability estimates

We measured IgG antibody levels against 12 infectious pathogens in Mexican–American families from San Antonio, TX, USA. These pathogens have previously been suggested to have a role in cardiovascular disease risk. The seroprevalence rates for these pathogens ranged from 10% for T. gondii to 92% for VZV (Table 1), and are described in greater detail in the study by Rubicz et al.18 Heritability estimates were significant for all pathogens, ranging from 0.08 for HSV-2 to 0.37 for HHV-8 (Table 1 and Rubicz et al.).6 Three measures of pathogen burden (ie, the number of seropositive reactions to the investigated pathogens) were generated – namely, for all herpes viruses, all viruses, and all pathogens. These three burden traits were also significantly heritable, with heritability estimates between 0.16 and 0.24.
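For reference, the heritability estimates above come from a VC model; a standard polygenic form of the kind fitted in SOLAR, including the household term described in Materials and methods, is sketched below. This is an illustrative parameterization, not the exact specification used here: y is the vector of inverse-normalized antibody levels, X holds covariates (such as the principal components), Φ is the kinship matrix, H is a household matrix with entries of 1 for pairs sharing a residence at the blood draw, and these symbols are introduced only for this sketch.

$$
\operatorname{E}[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta}, \qquad
\operatorname{Cov}(\mathbf{y}) = 2\boldsymbol{\Phi}\,\sigma^2_g + \mathbf{H}\,\sigma^2_h + \mathbf{I}\,\sigma^2_e, \qquad
h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_h + \sigma^2_e}
$$

Under this parameterization, an estimate of 0.37 for HHV-8 would mean that about 37% of the covariate-adjusted variance in the transformed antibody level is attributed to additive genetic effects, with shared household environment absorbed by σ²_h rather than inflating h².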

Table 1 Estimated rates of seroprevalence and heritability for pathogens examined in this study

Genome-wide linkage results

Genome-wide linkage analysis, using identity-by-descent allele-sharing probabilities computed from >28 000 SNPs in low linkage disequilibrium, was conducted to locate any major loci influencing the IgG antibody traits. The genome-wide linkage plots are provided in Supplementary Figure 1. Significant LOD scores (LOD≥3.0) were obtained for HHV-6 on chromosome 7q36.3 (LOD 3.62), for HHV-8 at 6q14.1 (LOD 3.80), which lies outside the human leukocyte antigen region, and for HAV at 13q34 (LOD 3.03) (Table 2), but not for any of the other pathogens investigated. Suggestive LOD scores (≥2.0) were additionally obtained for eight pathogens and for all three measures of pathogen burden.
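To put these LOD scores on a more familiar scale, the sketch below converts a VC linkage LOD score to a pointwise P-value. It assumes the usual asymptotic null for a single QTL variance tested on the boundary of its parameter space: the likelihood-ratio statistic 2 ln(10) × LOD follows a 50:50 mixture of χ²₀ and χ²₁. The resulting P-values are pointwise and are not corrected for the genome-wide search. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Pointwise P-value for a variance-components linkage LOD score.

    Assumes the likelihood-ratio statistic 2*ln(10)*LOD follows a 50:50
    mixture of chi-square(0) and chi-square(1) under the null, because the
    QTL variance is tested on the boundary of its parameter space.
    """
    if lod <= 0:
        return 1.0  # the chi-square(0) point mass gives P(T >= 0) = 1
    x = 2 * math.log(10) * lod                  # likelihood-ratio chi-square statistic
    tail_chi2_1 = math.erfc(math.sqrt(x / 2))   # P(chi-square(1) > x)
    return 0.5 * tail_chi2_1                    # half of the chi-square(1) upper tail


if __name__ == "__main__":
    # Peak LOD scores reported in this section (Table 2).
    for trait, lod in {"HHV-6": 3.62, "HHV-8": 3.80, "HAV": 3.03}.items():
        print(f"{trait}: LOD {lod:.2f} -> pointwise P ~ {lod_to_pointwise_p(lod):.1e}")
```

Under these assumptions, a LOD of 3.0 corresponds to a pointwise P-value of roughly 10−4.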

Table 2 Significant (LOD≥3.0) and suggestive (LOD≥2.0) multipoint linkage peaks for IgG antibody level traits

Genome-wide joint linkage and association results

We next performed genome-wide joint linkage and association analysis using nearly 1 million SNPs to potentially increase power (in particular for common variants) and to refine our search for responsible loci (Manhattan-style genome-wide association plots are given in Supplementary Figure 2). Note that the joint linkage and association test involves two degrees of freedom (the linkage variance component and the measured genotype covariate); hence, although these analyses use more of the available information, joint analysis is not guaranteed to increase power. For our sample of large multigenerational families, we estimated P≤1.3 × 10−7 to be the genome-wide significance threshold (at α=0.05) (see Materials and methods). A genome-wide significant association (P=5.34 × 10−8) was obtained for C. pneumoniae antibody level with SNP rs4812712 (Chr.20.hg19:g.42104939 A>C) on chromosome 20 (Table 3). A quantile–quantile plot of observed versus expected P-values for the genome-wide association results for C. pneumoniae is provided in Supplementary Figure 3. The genomic inflation factor19 for this trait (λ=1.016) does not deviate greatly from 1, indicating no obvious inflation of significance levels. No other pathogen yielded genome-wide significant results. Suggestive results (P≤1.3 × 10−6) were obtained for C. pneumoniae on chromosome 11, for CMV on chromosome 14, for HHV-8 on chromosome 6, for influenza A on chromosomes 15 and 19, and for the all-herpes-virus burden trait on chromosome 11.
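Two of the summary quantities in this paragraph can be checked with a short Python sketch. If the 1.3 × 10−7 threshold was obtained by dividing α = 0.05 by the effective number of SNPs (a Bonferroni-style assumption; the text says only that the effective number was calculated from linkage disequilibrium), it corresponds to about 385,000 effective independent tests. The genomic inflation factor λ is computed in its usual form, as the median observed 1-df χ² statistic divided by the median of the χ²₁ distribution (about 0.455). This assumes 1-df association P-values as input; how λ was derived for the 2-df joint test is not stated. Function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of the chi-square(1) distribution: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def implied_effective_tests(alpha, threshold):
    """Effective number of tests implied by a Bonferroni-style threshold alpha / M_eff."""
    return alpha / threshold


def pvalue_to_chi2_1df(p):
    """Convert a two-sided P-value in (0, 1] to its 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(1 - p / 2)
    return z * z


def genomic_inflation(pvalues):
    """Genomic inflation factor: median observed chi-square over the chi-square(1) median."""
    return median(pvalue_to_chi2_1df(p) for p in pvalues) / CHI2_1_MEDIAN


if __name__ == "__main__":
    m_eff = implied_effective_tests(0.05, 1.3e-7)
    print(f"implied effective tests: {m_eff:,.0f}")
    # Under the null, P-values are uniform, so lambda should be close to 1.
    n = 10001
    null_p = [(i + 0.5) / n for i in range(n)]
    print(f"lambda under a uniform null: {genomic_inflation(null_p):.3f}")
```

Because λ uses the median rather than the tail of the P-value distribution, a handful of strongly associated SNPs barely moves it, which is why a value near 1 (such as 1.016) is read as little genome-wide inflation rather than as evidence against the top hit.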

Table 3 Significant and suggestive results of genome-wide joint linkage and association analysis for IgG antibody measurements against infectious pathogens

Discussion

The heterogeneity observed among individuals and populations in infection status for various pathogens, and in subsequent disease progression, may be influenced by social and environmental factors, including population density, hygiene, nutritional status, and stress, as well as by the genetic composition of both pathogen and host. Here we conducted a genome-wide investigation of host genetic variation influencing IgG antibody responses to 12 common infectious pathogens in extended Mexican-American families. Most of the pathogens examined in this study are common, and many are thought to be transmitted relatively easily, primarily through person-to-person contact or respiratory secretions. Individuals are therefore likely to be exposed to many of these pathogens multiple times during their lifetime. The exceptions are HSV-2, which is sexually transmitted, and T. gondii, which is acquired mainly by eating undercooked meat or through contact with cat feces. This likely explains in part why these two pathogens have the lowest seroprevalences among the examined infectious agents (24% and 10%, respectively). If a given individual is indeed likely to have been exposed to most of these microbes, then the absence of antibodies against them should also be informative about host genetics (ie, the individual was likely exposed but either did not produce antibodies or produced only low levels of antibodies). We therefore included both seropositive and seronegative individuals in this investigation.
Our contention that seronegative individuals provide host genetic information is supported by our previous demonstration that these antibody level traits are substantially heritable when seronegative individuals are included in the analysis. The smallest heritability estimate was obtained for the sexually transmitted HSV-2, for which seronegative individuals may well never have been exposed and thus would provide no information on host genetics.6 It should be noted that antibody level does not necessarily correlate with resistance to a particular pathogen. Furthermore, antibody levels are likely also influenced by the timing of exposure (how long since the last exposure), the frequency of exposure, the pathogen dose, and the efficiency of the immune response to the pathogen. Such factors are expected to add noise to antibody levels, which, as with most complex traits, would reduce the power to detect genetic loci influencing these traits and require large samples to achieve genome-wide significance. This agrees with our observation that it is difficult, but not impossible, to identify genomic regions harboring variants that influence these serological traits. The IgG antibody traits examined here are, for the most part, naturally occurring, because vaccines against most of these pathogens either were not available at the time of blood sample collection (1991–1995) or, where available, were not in common use in the study population. For transcripts significantly correlated with anti-C. pneumoniae antibody level and other antibody level traits, see Supplementary Appendix 1.

Our top association finding in the genome-wide analyses was for the C. pneumoniae antibody level trait with SNP rs4812712:A>C on chromosome 20. It should be noted that there is no evidence that IgG antibodies against C. pneumoniae confer resistance to infection and/or disease. The genes nearest to rs4812712:A>C are SRSF6 and L3MBTL1. SRSF6 encodes a protein involved in mRNA splicing, and its overexpression has been suggested to contribute to the development of lung and colon cancers.20 C. pneumoniae seropositivity may be a risk factor for lung cancer, with higher anti-C. pneumoniae antibody titers associated with greater cancer risk.21 Although the mechanism by which C. pneumoniae infection may increase lung cancer risk is not known, one possible explanation is that it may induce aberrant apoptosis in tissues.22 The human SRSF6 protein has also been linked to HIV-1 infection, reportedly regulating HIV-1 mRNA processing and possibly contributing to the nuclear export of spliced mRNAs.23, 24 L3MBTL1, by contrast, is a tumor suppressor gene that appears to repress genes and microRNAs that are reactivated during tumor growth; its expression is associated with a decreased risk of death from breast cancer.25

We found suggestive evidence of association for anti-CMV antibody level on chromosome 14, with our top SNP located nearest to the pseudogene LOC728667. However, the nearby DHRS4 gene may be relevant to this trait because it is involved in retinol metabolism. Retinol is known to influence both immune function and ocular health, and CMV infection has a role in the development of certain ocular diseases, including retinitis.26 We compared our association results with those of Kuparinen et al.,7 who also conducted a genome-wide association analysis of anti-CMV IgG antibody level. Although they did not find evidence for a major genetic locus, they identified AGBL1 as potentially influencing this trait, on the basis of suggestive evidence of association for five SNPs in this gene. AGBL1 may influence CMV infection through its potential role in tubulin processing: CMV capsids are reported to use the microtubule network for nuclear targeting in epithelial cells,27 and CMV may evade lysosomal fusion during replication in macrophages by disaggregating the microtubule network, allowing the virus to persist in these cells.28 Our results did not, however, provide further support for this finding, possibly in part because of population differences between the two samples (Mexican Americans in ours versus Finnish individuals in theirs).

We investigated whether it is possible to localize genetic factors influencing the pathogen burden traits for all herpes viruses, all viruses, and all pathogens. These traits were of interest because the number of seropositive reactions against infectious pathogens has previously been identified as a risk factor for chronic diseases, including atherosclerosis.29, 30, 31 Although no major loci were significantly associated with any of these pathogen burden measures, there was suggestive evidence for association and/or linkage on chromosomes 8 and 11, including a variant in the oncogene MYEOV for the all-herpes-virus burden trait (several herpes viruses are known to contribute to the development of cancers, eg, EBV to Burkitt's lymphoma and Hodgkin's lymphoma).32, 33

In summary, this study identified a variant on chromosome 20 that is significantly associated with anti-C. pneumoniae IgG level. No other antibody level trait reached genome-wide significance. This may be due in part to the heterogeneity of factors that contribute to variation in antibody levels, including differences in exposure, its frequency, and its timing relative to the blood draw, which would reduce power to detect the underlying genetic loci influencing these traits. Further investigation may offer additional insight into the genetic factors and biological pathways involved in antibody titer levels, and possibly into the relationship of antibody levels to resistance or susceptibility to infection and disease progression, potentially identifying novel methods of prevention and/or treatment. We hope that our data will be useful for larger-scale meta-analyses of association for many of these antibody traits.