Introduction

Variation in susceptibility to HIV-1 infection depends on numerous factors, and host genetic variation is well established as an important component. Variants in HIV-1 coreceptor genes and human leukocyte antigen (HLA) genes are among the best-described host factors associated with resistance or susceptibility to HIV-1 infection (Anzala et al. 1998; Dunand et al. 1997; MacDonald et al. 2000). Little is known about the role of polymorphisms in immune regulatory genes in susceptibility to HIV-1 infection. Interferon regulatory factor 1 (IRF-1), the first-identified member of the IRF family, was originally defined as a transcriptional regulator of IFN-β (Fujita et al. 1989, 1988). Later studies showed that IRF-1 function is not restricted to the IFN pathway. By binding the interferon-stimulated response element (ISRE) present in the promoters of IFN-stimulated genes (ISGs), IRF-1 regulates the expression of multiple immune genes and is broadly involved in both innate and adaptive immunity, particularly in the regulation of Th1/Th2 differentiation, which may affect resistance to HIV-1 infection (Kroger et al. 2002; Lohoff et al. 1997; Mamane et al. 1999; Taki et al. 1997; Taniguchi et al. 2001; Trivedi et al. 2001). The human IRF-1 gene, spanning 7.72 kb, and its 495 bp immediate upstream promoter have been assigned to 5q31.1 (Harada et al. 1994; Itoh et al. 1991). A BLAT search of the most recent assembly of the human genome sequence (2003; UCSC Genome Bioinformatics) mapped IRF-1 and its promoter to nt 131902681–131894935 and 131903176–131902682 on chromosome 5, respectively. Both lie within the IL-4 (Th2) gene cluster, a group of critical immune genes that includes IL-3, IL-4, IL-5, IL-13, GM-CSF, and IRF-1.
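The chromosome 5 ranges quoted above run from high to low, indicating that IRF-1 lies on the reverse strand of the assembly. The Python sketch below converts the local numbering used in this paper (L05072 nt 1 onward for the gene, −495 to −1 for the X53095 promoter) into chromosome 5 coordinates using a fixed offset. The function and constant names are hypothetical. The sketch assumes the GenBank and assembly sequences align base-for-base without indels. Because sequence discrepancies between GenBank and the observed sequence are reported later in this paper, treat results far from nt 1 as approximate.

```python
# Hypothetical helpers: map IRF-1 local numbering onto chr5 (2003 assembly).
# Assumption: gap-free alignment; the gene is read on the reverse strand,
# so chromosome coordinates DECREASE as local positions increase.

GENE_FIRST_NT = 131_902_681   # chr5 coordinate of L05072 nt 1 (from the text)
GENE_LAST_NT = 131_894_935    # chr5 coordinate of the last mapped gene nt
PROMOTER_LEN = 495            # promoter numbered -495 .. -1 (X53095)


def l05072_to_chr5(pos: int) -> int:
    """Map a 1-based L05072 position to chr5, ignoring any indels."""
    if pos < 1:
        raise ValueError("L05072 numbering starts at 1")
    chrom = GENE_FIRST_NT - (pos - 1)
    if chrom < GENE_LAST_NT:
        raise ValueError("position lies beyond the mapped gene span")
    return chrom


def promoter_to_chr5(rel: int) -> int:
    """Map a promoter position (-495 .. -1, upstream of nt 1) to chr5."""
    if not -PROMOTER_LEN <= rel <= -1:
        raise ValueError("promoter positions run from -495 to -1")
    # -1 is the base just upstream of nt 1, i.e. one coordinate HIGHER
    return GENE_FIRST_NT - rel


print(promoter_to_chr5(-300))  # -300G/A promoter SNP -> 131902981
print(l05072_to_chr5(4816))    # intron 7 SNP; approximate under the assumption
```

Both end points of each stated range are reproduced exactly by these offsets, which is a useful sanity check before applying them to individual variants.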

Besides its importance in regulating antiviral immune responses, IRF-1 can also directly regulate HIV-1 transcription and replication. A region downstream of the HIV-1 5′-LTR, spanning nt +200 to +217, is homologous to the ISRE and has been suggested to be important for HIV-1 transcription and replication (Battistini et al. 2002; Sgarbanti et al. 2002). IRF-1 can activate HIV-1 LTR transcription in the absence of Tat, a pivotal HIV-1 protein believed to be the key initiator and regulator of HIV-1 transcription and replication (Frankel and Young 1998). Because IRF-1 can be induced early after viral entry, before Tat is expressed, this suggests that IRF-1 plays a role in establishing primary HIV-1 infection. IRF-1 can also cooperate with Tat to amplify HIV-1 gene transcription and replication, suggesting that it is likely important in HIV-1/AIDS disease progression as well.

IRF-1 genetic polymorphism has been studied in a variety of disease contexts. Besides the well-studied polymorphic “GT” dinucleotide repeat in intron 7 of IRF-1, 31 single nucleotide polymorphisms (SNPs) in IRF-1 have been identified in human subjects or cell lines by other researchers (Database1; Database2; Donn et al. 2001; Nakao et al. 2001; Noguchi et al. 2000; Saito et al. 2001, 2002). Because of the potential role of IRF-1 in HIV-1 replication and its importance in the antiviral immune response, it is important to identify genetic polymorphisms in IRF-1 and to determine whether they result in coding changes in the expressed IRF-1 protein. Once identified, these polymorphisms can be examined for their relevance to susceptibility to HIV-1 infection.

Materials and methods

Subjects

Two hundred seventy-seven subjects were enrolled in this study. All are from a well-characterized commercial female sex worker cohort established in the Pumwani area of Nairobi, Kenya (Fowke et al. 2000, 1996). The cohort members come from diverse regions of Kenya and represent a variety of ethnic groups. Eighteen healthy donors without African genetic background were used as controls. Studies involving these cohorts were approved by the ethics review panels of both the University of Manitoba and the University of Nairobi.

Extraction of DNA

Peripheral blood samples were obtained from these individuals, and DNA was extracted either from peripheral blood mononuclear cells (PBMCs) or directly from whole blood using a Qiagen DNA extraction kit (Qiagen, Inc., Mississauga, ON, Canada) according to the manufacturer’s instructions.

Complete gene sequencing of IRF-1 and its promoter

Comparative genomic sequencing of the IRF-1 gene was conducted on all DNA samples. In total, 17 segments were PCR amplified using primers designed to cover the entire IRF-1 gene and its promoter region. All primers were designed from the GenBank sequences for IRF-1 and its immediate upstream promoter region (accession numbers L05072 and X53095) (Table 1). Each 50 μl PCR reaction mixture contained 60 mM Tris–HCl (pH 9.0), 1.5 mM MgCl2, 15 mM (NH4)2SO4, 100 μM dNTP mixture, 25 pmol of each primer, 1.25 U Taq DNA polymerase (Invitrogen Life Technologies, Burlington, ON, Canada), and 100–200 ng of DNA template. PCR was performed on a GeneAmp PCR System 9600 thermocycler (Applied Biosystems, Foster City, CA, USA) with the following program: (1) initial denaturation for 3 min at 94°C; (2) 35 cycles of 30 s at 94°C, 30 s at the annealing temperature (Tm) given in Table 1, and 2 min elongation at 72°C; and (3) a final extension step of 10 min at 72°C. All PCR amplicons were purified with Millipore’s Amicon Microcon-PCR centrifugal filter devices (Millipore, Bedford, MA, USA) before sequencing.

Sequencing was conducted using the ABI PRISM BigDye Terminator Version 3.0 Cycle Sequencing system (Applied Biosystems). Each reaction contained 2 μl of BigDye (Version 3 or 3.1), 1.5 μl of 10 mM primer (forward or reverse), and 2 μl of purified PCR amplicon template. Cycle sequencing was performed as follows: (1) initial denaturation for 3 min at 96°C; (2) 80 cycles, each consisting of 30 s at 96°C, 30 s at the specified Tm (Table 1), and 4 min at 60°C. All sequencing reaction products were ethanol precipitated, reconstituted in 20 μl formamide (Applied Biosystems), and resolved on an ABI Prism 3100 Genetic Analyzer (Hitachi, Japan).

SNPs in 16 selected individuals were reconfirmed using the Expand High Fidelity Plus PCR system (Roche Diagnostics GmbH, Mannheim, Germany) to rule out errors introduced during PCR amplification, since Taq polymerase lacks proofreading activity.
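The PCR recipe above specifies only final concentrations. As a hedged illustration of how such a recipe translates into pipetting volumes, the Python sketch below applies C1·V1 = C2·V2 to each component and fills the remainder with water. Every stock concentration and the 2 μl template volume are hypothetical values chosen for the example, not taken from this study. The 5x buffer is assumed to supply the Tris–HCl and (NH4)2SO4, with MgCl2 added separately.

```python
# Sketch: per-reaction pipetting volumes for the 50 ul PCR described above.
# FINAL concentrations follow the text; all STOCK values are hypothetical.

REACTION_UL = 50.0

# name: (stock concentration, final concentration), same units within a row
COMPONENTS = {
    "buffer_5x": (5.0, 1.0),   # x; assumed to give 60 mM Tris-HCl, 15 mM (NH4)2SO4
    "MgCl2": (50.0, 1.5),      # mM
    "dNTPs": (10.0, 0.1),      # mM (100 uM final)
    "primer_F": (10.0, 0.5),   # uM; 25 pmol in 50 ul = 0.5 uM
    "primer_R": (10.0, 0.5),   # uM
    "Taq": (5.0, 0.025),       # U/ul; 1.25 U in 50 ul
}
TEMPLATE_UL = 2.0  # hypothetical: 2 ul of DNA at 50-100 ng/ul gives 100-200 ng


def per_reaction_volumes(reaction_ul: float = REACTION_UL) -> dict[str, float]:
    """Volume (ul) of each stock via C1*V1 = C2*V2, plus water to fill."""
    vols = {name: final * reaction_ul / stock
            for name, (stock, final) in COMPONENTS.items()}
    vols["template"] = TEMPLATE_UL
    # the sum is taken before "water" exists, so it covers everything else
    vols["water"] = reaction_ul - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("stocks are too dilute for this reaction volume")
    return vols


for name, ul in per_reaction_volumes().items():
    print(f"{name:10s} {ul:6.2f} ul")
```

For a master mix covering many amplicons or subjects, the non-template volumes would simply be multiplied by the number of reactions plus a small overage.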

Table 1 Oligonucleotide primers used for IRF-1 sequencing. Tm, annealing temperature

Sequence resolution and polymorphism identification

Sequence data were analyzed with Sequencher (Version 4.0.5, Gene Codes Corporation, USA), and variants were identified by alignment with the published GenBank sequences (accession numbers L05072 or X53095). All polymorphisms were confirmed by sequencing in both directions. Genotype and allele frequencies in the study population were then calculated, and all SNPs were tested for Hardy–Weinberg equilibrium. Pairwise linkage disequilibrium (LD) between SNPs was analyzed using PyPop (Lancaster et al. 2003).
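The methods do not state which Hardy–Weinberg test was applied. A common choice, and the one assumed in this Python sketch, is the Pearson chi-square goodness-of-fit test with one degree of freedom for a biallelic SNP. For rare alleles with small expected genotype counts, an exact test is generally preferred. The function names are illustrative only.

```python
import math


def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped subjects")
    return (2 * n_aa + n_ab) / (2 * n)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value); for 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # skip empty expected classes (monomorphic SNP) to avoid dividing by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


print(hwe_chi_square(50, 100, 50))  # perfect HWE proportions
print(hwe_chi_square(90, 20, 90))   # strong heterozygote deficit
```

With the hypothetical counts 90/20/90 (not data from Table 2), the allele frequency is 0.5, the expected counts are 50/100/50, and chi-square is 128, far beyond the 1 df critical value. Genotype frequencies are simply each count divided by the number of subjects.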

Results and discussion

The Kenyan population displayed extensive diversity throughout the IRF-1 locus. Fifty-three SNPs were identified in the IRF-1 gene and its promoter region (Table 2, Fig. 1) (Cha et al. 1992). The genotype and allele frequencies of each SNP are given in Table 2. Twenty-seven of these SNPs have been reported previously, either in publications or by the International Human Genome Sequencing Consortium through the NCBI SNP database (Database1; Database2; Donn et al. 2001; Nakao et al. 2001; Noguchi et al. 2000; Saito et al. 2001, 2002). Several previously reported SNPs were not observed in the subjects of this study, including 5551T/G, 4950A/G, 5558T/C, 5636G/A, 7662C/– and 6355G/A, suggesting that the pattern of IRF-1 genetic polymorphism differs among ethnic populations. Besides the 26 novel SNPs, two novel insertion polymorphisms were identified: one in the second intron of IRF-1 [a “CA” insertion between nt 2592 and 2593 in L05072, or nt 131900079 and 131900078 on chromosome 5 (TG→TCAG)] and one in the 3′ UTR [an “A or G” insertion between nt 7607 and 7608, or nt 131895046–131895044 on chromosome 5 (AG→AA/GG)], with genotype frequencies of 16.33% and 93.33%, respectively. A 16 bp deletion was identified in intron 7 (Table 3) in 28.30% of subjects tested. This deletion is in complete linkage with the “T” allele of the SNP at position 4816. Notably, the deletion lies close to the “GT” dinucleotide repeat (4907→28 in L05072), a putative site for the formation of Z-DNA (Cha et al. 1992). Taken together, these findings indicate a high degree of polymorphism in this population and suggest that distinct IRF-1 genetic variants may exist within the local Kenyan population.

Table 2 Polymorphisms in the IRF-1 gene and its promoter region detected in a Kenyan population
Fig. 1

IRF-1 genomic structure and distribution of polymorphic sites. Nucleotide numbers indicating loci follow the GenBank sequence for IRF-1 (accession number L05072) and correspond to nt 131902681–131894935 of human chromosome 5. The human IRF-1 gene, spanning 7.72 kb, and its immediate upstream promoter sequence have been mapped to chromosome 5q31.1. The gene is composed of 10 exons (exon 1 is untranslated), nine introns, and one 3′ UTR. Forty-two SNPs were distributed in untranslated exon 1, introns 1, 2, 3, 6, 7, 8, and 9, and the 3′ UTR. Two insertion polymorphisms were identified, in intron 2 and in the 3′ UTR. One 16 bp deletion polymorphism was discovered just upstream of the dinucleotide “GT” repeat in intron 7. Two silent mutations were also identified in exon 7 (4396A/G and 4420C/T).

Table 3 Deletion polymorphism identified in intron 7 of the IRF-1 gene

All of the above variations, except for two silent mutations at 4396A/G and 4420C/T in exon 7, were located in noncoding regions (Fig. 1) (Cha et al. 1992). Of the 53 SNPs identified, 11 were located in the promoter region (−415A/C, −410G/A, −388T/C, −386C/T, −300G/A, −298G/A, −281G/A, −280G/C, −203G/A, −90G/A, and −65G/A), four in exon 1, the 5′ untranslated region (UTR) (53C/G, 142C/T, 154C/T, and 197G/A), and seven in the 3′ UTR (6936C/T, 7175C/T, 7238C/T, 7303T/C, 7311G/A, 7447C/A, and 7489C/T). Another previously reported silent mutation, 6355G/A, was not found in any subject (Noguchi et al. 2000). The remaining variations were distributed throughout introns 1, 2, 3, and 6–9. Except for the SNP at position 4816 (P=0.0025), all SNPs tested were in Hardy–Weinberg equilibrium. The 4816 SNP is also linked to the 16 bp deletion polymorphism in intron 7, suggesting either that this region is undergoing some selective pressure or that the study population is biased in some way.

Polymorphisms in the promoter region may affect the transcriptional activation of IRF-1 and hence its expression. Supporting evidence comes from a study showing that a particular promoter polymorphism (−300G/A) is correlated with cellular immunity to HIV infection (Saito et al. 2002). Our data show that promoter polymorphism is common in Kenyan IRF-1 gene sequences. We identified 11 promoter polymorphisms in the target population, seven of which are novel, and three of which lie within putative Sp1 binding sites (Fig. 2). The functional significance of these IRF-1 promoter variants remains unknown. Introns and other noncoding regions of the genome have often been regarded as “junk DNA” with little or no apparent function. However, growing evidence indicates that noncoding regions have important functions in gene expression, so this “junk” is no longer considered junk (Herbert 1996; Mattick 1994; Moore 1996). Genetic variations in noncoding regions of cytokine genes have been correlated with multiple disease conditions (Haukim et al. 2002). A SNP at position 7311 in the IRF-1 3′ UTR is associated with the occurrence of juvenile idiopathic arthritis (Donn et al. 2001), and the allele distribution of the “GT” repeat in intron 7 of IRF-1 has been associated with the age of onset of childhood atopic asthma (Nakao et al. 2001). Further studies of the epidemiological and functional consequences of these IRF-1 polymorphisms for the HIV-1 epidemic in Kenya are underway.

Fig. 2

IRF-1 promoter sequence and distribution of polymorphic sites. Nucleotide numbers indicating loci follow the GenBank sequence for the IRF-1 promoter (accession number X53095) and correspond to nt 131903176–131902682 of human chromosome 5. The immediate upstream region of the IRF-1 promoter is 495 bp long and maps to chromosome 5q31.1. Putative recognition sites for known transcription factors are indicated; GAS, IFN-γ-activated sequence. The 11 SNPs in the IRF-1 promoter region are marked with *. SNPs at −386, −203, and −65 lie within three different Sp1 binding sites. No SNPs were found in the GAS or NF-κB recognition sites. This figure was adapted from Harada et al. (1994).
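The annotation of putative Sp1 sites is attributed to Harada et al. (1994) and is not otherwise described. As a deliberately simplified illustration, this Python sketch scans a sequence on both strands for exact matches to the GC-box core GGGCGG, a commonly cited Sp1 recognition core. It then reports which promoter SNPs fall inside a match. It assumes the promoter string is numbered −N..−1 from its first to its last base, as in the −495..−1 numbering above. The function names are hypothetical, and the example sequence is a made-up toy, not the IRF-1 promoter. Real site annotation would typically use position weight matrices rather than exact matching.

```python
import re

CORE = "GGGCGG"      # Sp1 GC-box core, forward strand
CORE_RC = "CCGCCC"   # reverse complement: the same site on the other strand


def gc_box_intervals(seq: str) -> list[tuple[int, int]]:
    """0-based half-open intervals of GC-box cores on either strand."""
    seq = seq.upper()
    hits = []
    for motif in (CORE, CORE_RC):
        # zero-width lookahead so overlapping matches are all reported
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start(), m.start() + len(motif)))
    return sorted(hits)


def snps_in_sites(seq: str, snp_positions: list[int]) -> list[int]:
    """Return promoter SNP positions (-len(seq)..-1) lying inside a GC box.

    Assumes seq[0] is position -len(seq) and seq[-1] is position -1.
    """
    length = len(seq)
    sites = gc_box_intervals(seq)
    result = []
    for pos in snp_positions:
        idx = pos + length  # -length -> index 0, -1 -> index length - 1
        if any(start <= idx < end for start, end in sites):
            result.append(pos)
    return result


toy = "AAAAGGGCGGAAAACCGCCCAAAA"  # hypothetical 24 bp sequence
print(gc_box_intervals(toy))
print(snps_in_sites(toy, [-20, -14, -10, -1]))
```

In the toy sequence, one core lies on each strand, so the SNPs at −20 and −10 are flagged while −14 (just past the first site) and −1 are not.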

LD between SNPs was observed throughout the IRF-1 gene. Pairwise LD analysis showed that LD was extensive and present to some degree between most SNPs (Tables 4, 5), even between SNPs separated by long intervals and located in different functional regions of the gene. As shown in Table 4, complete linkage was observed among the SNPs at positions 4227, 4318, 4379, 4396, 5203, 5261, 6460, 6467, and 7311 (the “4227 cluster”). Partial LD was also observed between the “GT” dinucleotide repeat in intron 7, the “4227 cluster,” and other identified SNPs (data not shown). This suggests that haplotype analysis may be necessary to fully evaluate the effect of IRF-1 gene variation in its functional context or in association studies with disease conditions. We were unable to confirm a previous report of nearly complete LD between the SNPs at −300 and 4396 (Noguchi et al. 2000).
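For two biallelic SNPs with allele frequencies p_A and p_B and haplotype frequency p_AB, the standard pairwise LD measures are D = p_AB − p_A·p_B, D′ = D/D_max, and r² = D²/(p_A·q_A·p_B·q_B). The Python sketch below computes these from already known haplotype frequencies. It does not reproduce PyPop’s procedure, which works from unphased genotype data and estimates haplotype frequencies before computing LD statistics; that estimation step is omitted here. The function name is hypothetical. Complete linkage of the kind seen in the “4227 cluster” corresponds to |D′| = 1, and to r² = 1 when the linked alleles have equal frequencies.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> dict[str, float]:
    """Pairwise LD between two biallelic loci from known frequencies.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of A and B. Returns D, signed D', and r^2.
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization)
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * q_a * p_b * q_b
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


print(ld_stats(0.3, 0.3, 0.3))   # A always with B: complete LD
print(ld_stats(0.06, 0.3, 0.2))  # p_AB = p_A * p_B: no LD
```

D′ measures how close a pair is to the maximum LD its allele frequencies allow, whereas r² also penalizes unequal allele frequencies. This is why the two measures can disagree for SNPs that are in complete linkage but have different minor allele frequencies.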

Table 4 Complete linkage between 4227 and other SNPs within the IRF-1 gene. This linkage was observed in all tested subjects
Table 5 Pairwise LD estimates for all SNPs with minor allele frequency higher than 0.1. Pairwise LD analysis was conducted with PyPop for all identified SNPs with minor allele frequency higher than 0.1, and results are shown as P values. The “4227 cluster” is represented by the 4227 SNP itself. Nucleotide numbers indicating loci follow the GenBank sequence for IRF-1 (accession number L05072)

Notably, 35 consistent discrepancies were identified between our sequence data and the GenBank sequences for IRF-1 and its promoter. A consistent discrepancy is defined here as a “variation” found in all subjects that differs from the GenBank submissions. All 35 discrepancies are listed in Table 6. They were reconfirmed in a control population of 18 male and female individuals of different genetic backgrounds, including Asian, European, and North American, suggesting that the discrepancies are consistent across the ethnic groups tested. We also noted differences between the two GenBank sequences (X53095 and L05072) in their overlapping region (nt 496–669 of X53095 and 1–173 of L05072), suggesting that these discrepancies might be caused by inaccuracies in the available IRF-1 sequence data. This study therefore provides more complete data on the sequence of the human IRF-1 gene.
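The definition of a consistent discrepancy given above is operational and can be expressed directly in code. This Python sketch, with a hypothetical function name, reports reference positions at which every aligned subject sequence carries the same base and that base differs from the GenBank reference. It assumes gap-free sequences that are already aligned to the reference and numbered like it, and it treats N as an unreadable call. Insertions and deletions relative to the reference would need a proper alignment and are not handled.

```python
def consistent_discrepancies(reference: str,
                             samples: list[str]) -> list[tuple[int, str, str]]:
    """Positions where all samples agree on a base that differs from reference.

    Returns (1-based position, reference base, observed base) tuples.
    Assumes gap-free sequences aligned to the reference; 'N' = no call.
    """
    if any(len(s) != len(reference) for s in samples):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i, ref_base in enumerate(reference.upper()):
        calls = {s[i].upper() for s in samples}
        if len(calls) == 1:          # every subject shows the same base
            (base,) = calls
            if base != ref_base and base != "N":
                hits.append((i + 1, ref_base, base))
    return hits


ref = "ACGTACGT"                                 # toy reference
subjects = ["ACGTTCGA", "ACGTTCGA", "ACGTTCGT"]  # toy aligned reads
print(consistent_discrepancies(ref, subjects))
```

In the toy data, position 5 differs from the reference in every subject and is reported. Position 8 differs in only two of three subjects, so it would count as an ordinary polymorphism rather than a consistent discrepancy.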

Table 6 Consistent discrepancies between standard and detected sequences