Introduction

Genetic investigation of inherited diseases has progressed from the mapping of Huntington's disease in the early 1980s to the prospect of characterizing many thousands of genetic variants underlying both simple and complex diseases. Central to the success of this genomic era have been coordinated efforts that produced, first, the complete sequence of the human genome, made publicly available to researchers worldwide, and, second, high-throughput technology platforms suited to dense marker maps in large study populations. Nevertheless, the choice of study population is a primary concern in any gene-mapping study. Genetic isolates are a potentially powerful sample population for mapping genes underlying complex multifactorial traits. This potential arises from the combined effects of geographical isolation, limited variation in environmental influences and purportedly higher levels of linkage disequilibrium (LD). Such populations also generally arise from a small number of founders, who may come from quite diverse cultural backgrounds and thereby introduce genetic admixture. Many population isolates are currently under study, and these populations are as diverse as the genetic disorders they are used to investigate. European researchers have focused on historically and culturally distinct populations from the Nordic countries, including Finland (Peltonen et al., 1999; Varilo et al., 2000, 2003; Wessman et al., 2002) and Iceland (Gulcher and Stefansson, 1998; Helgason et al., 2003). They have also studied Mediterranean populations, including Italy, especially Sardinia (Eaves et al., 2000; Zavattari et al., 2000; Angius et al., 2001, 2002a, 2002b; Pugliatti et al., 2003; Falchi et al., 2004; Tenesa et al., 2004), and Corsica, France (Latini et al., 2004).
North American isolate studies have mainly involved large extended pedigrees, for example from the Hutterites (Abney et al., 2000, 2001, 2002; Ober et al., 2001; Newman et al., 2003, 2004; Weiss et al., 2006). Several remote Polynesian populations are also currently involved in disease gene-mapping studies (Redd et al., 1995; Murray-McIntosh et al., 1998; Shmulewitz et al., 2001; Han et al., 2002; Kayser et al., 2003; Wijsman et al., 2003; Tsai et al., 2004; Bonnen et al., 2006). These studies indicate that the population isolate approach provides an important framework for identifying genes involved in complex multifactorial diseases.

This study focused on the genetic isolate of Norfolk Island. Norfolk Island lies 1700 km northeast of Sydney, on the Norfolk Ridge, which runs from New Zealand to New Caledonia. It served as a British penal colony until the last convicts were transferred to Tasmania in the 1850s (Hoare, 1999). At that time, the descendants of the Bounty mutineers and Tahitian women, who had been living on Pitcairn Island, relocated to Norfolk. This group numbered 194 people in total (40 men, 47 women, 54 boys and 53 girls). This small population originated from nine paternal (‘Bounty’ mutineer) and twelve maternal (Tahitian) lineages. However, owing to the violent dynamics of the Pitcairn colony, only one of the mutineers, John Adams, survived its early years (Edgecombe, 1999; Hoare, 1999). Pedigree searches in the genealogical program Brother's Keeper (Version 6.0, Rockford, MI, USA) revealed that 80% of the male participants in the present Xq13.3 LD study are directly related to John Adams. Four of these men have an unbroken patrilineage to Adams.

Norfolk's history is particularly well documented, especially because anthropologists from the Island have maintained an exhaustive genealogical history. This record takes the form of a single large family pedigree of 6500 individuals who have contributed to the present-day population. Of the 1200 current permanent residents, up to 80% can trace their heritage back to the Island's initial founders. In addition, strict immigration and quarantine legislation restricts the arrival of new founders. Together with its isolation from other populations, this makes Norfolk a potentially valuable resource for mapping genes involved in the pathogenesis of complex yet common disorders. Such disorders include hypertension, diabetes and obesity, which are known to be prevalent in the admixed Polynesian populations of the South Pacific (Abbott et al., 2001). Norfolk Island also has an unusual health care system. Inhabitants are not covered by the Australian or New Zealand health systems; instead, health care is administered by the Norfolk Island Government and is currently directed toward emergencies, with minimal public health or preventive care. Because there has been virtually no public health screening of the community to date, our recent health study on Norfolk Island was the first to reveal the severity and extent of cardiovascular disease (CVD) risk factors in this isolated population (Bellis et al., 2005). Heritability estimates and power calculations based on the complex Norfolk Island pedigree further suggested that this population has characteristics that could facilitate the identification of genes involved in complex multifactorial diseases such as CVD.

Materials and methods

Population samples

Collection and initial analysis of the Norfolk Island sample have been described in detail previously (Bellis et al., 2005). Briefly, residents of Norfolk Island over the age of 18 were recruited through local radio and newspaper announcements. Participants were included in the study after providing signed informed consent. Ethical approval for the health study was granted by the Griffith University Human Research Ethics Committee before sample collection or phenotyping of participants.

Laboratory procedures

Genomic DNA was extracted from whole blood using a standard salting out method (Miller et al., 1988).

X chromosomal DNA analysis

Levels of LD in the Xq13.3 region were assessed in a study group that initially included 86 men. Because the Norfolk Island community forms a single large, complex pedigree, the number of unrelated individuals available for LD analysis was limited. Relatedness calculations for the Norfolk Island population gave a mean inbreeding coefficient of 0.0044. The maximum individual inbreeding coefficient was 0.0684, approximately that of the offspring of first cousins (F = 0.0625). Among related pairs, most pair-wise coefficients of relationship were below that of third-degree relatives (2φ = 0.125) (Bellis et al., 2005). Nevertheless, to avoid upwardly biasing the LD results, first- and second-degree relatives were identified and removed from the analysis. This left a final sample of 56 men, comparable to the sample sizes used in other LD studies (Angius et al., 2001, 2002a; Marroni et al., 2006). A further aim of this investigation was to study the descendants of the initial European men and their Tahitian wives, and to capture LD patterns accurately within the complex Norfolk Island pedigree. Pedigree analysis identified 113 unrelated individuals (first- and second-degree relatives excluded), including 59 men; results are based on 56 men from this group.
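The reference values quoted above (F = 0.0625 for the offspring of first cousins; 2φ = 0.125 for third-degree relatives) follow from Wright's path-counting formula. That formula gives F = Σ (1/2)^(n1+n2+1)(1 + F_A), summed over the paths that link an individual's two parents through each common ancestor A. The sketch below is a minimal illustration only. The function name is hypothetical, and common ancestors are assumed to be non-inbred unless stated. It is not presented as the method used to obtain the Norfolk Island estimates (Bellis et al., 2005).

```python
from fractions import Fraction
from typing import Iterable, Tuple, Union


def inbreeding_coefficient(
    paths: Iterable[Tuple[int, int, Union[int, Fraction]]]
) -> Fraction:
    """Wright's path-counting formula (hypothetical helper for illustration).

    Each path is (n1, n2, f_a): n1 and n2 are the numbers of generations
    separating each parent from a common ancestor A, and f_a is A's own
    inbreeding coefficient.
    """
    total = Fraction(0)
    for n1, n2, f_a in paths:
        total += Fraction(1, 2) ** (n1 + n2 + 1) * (1 + Fraction(f_a))
    return total


# Parents who are first cousins share two grandparents; each path runs two
# generations up from each parent (non-inbred ancestors assumed).
f_offspring = inbreeding_coefficient([(2, 2, 0), (2, 2, 0)])

# The offspring's F equals the parents' kinship coefficient phi; for
# non-inbred parents the coefficient of relationship is 2 * phi.
relationship = 2 * f_offspring

print(float(f_offspring), float(relationship))  # 0.0625 0.125
```

Exact fractions are used so that the textbook values (1/16 and 1/8) are reproduced without rounding error.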

To evaluate the extent of LD within the Norfolk isolate, we chose six microsatellite markers (DXS983, DXS8092, DXS8082, DXS1225, DXS8037 and DXS986) located in the Xq13.3 region. This region has been studied regularly for LD estimation in genetic isolates, including the Finnish, Saami, Sardinian and Brazilian populations (Laan and Paabo, 1997; Kaessmann et al., 1999; Zavattari et al., 2000; Angius et al., 2001, 2002a; Pereira and Pena, 2006). Primer sequences were obtained from the Genome Database, and PCR cycling conditions have been described elsewhere (Angius et al., 2001). PCR product sizes were determined on an ABI Prism 310 DNA Analyzer (PE Biosystems), and data were processed using GeneScan v3.1 and Genotyper v2.5 software.

For the six microsatellite markers, the normalized disequilibrium, D′, between each pair of marker loci was calculated using a multiallelic extension of Lewontin's standardized measure of disequilibrium (Lewontin, 1988). Pair-wise LD was tested using a test analogous to Fisher's exact test but extended to contingency tables of arbitrary size, as implemented in Arlequin (Schneider et al., 2000). Pair-wise disequilibrium was plotted with the GOLD program (Abecasis and Cookson, 2000).
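Arlequin and GOLD performed these calculations in the study. The sketch below illustrates what a multiallelic extension of Lewontin's D′ involves. It uses the widely used weighted average D′ = Σ_i Σ_j p_i q_j |D_ij / Dmax_ij|. The source does not state which extension was applied, so this choice is an assumption, and the function name is hypothetical. Because men carry a single X chromosome, each man contributes one directly observed Xq13.3 haplotype, so haplotype frequencies can be obtained by simple counting.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def multiallelic_d_prime(haplotypes: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Weighted-average D' between two multiallelic loci (illustrative sketch).

    haplotypes: one (allele_at_locus_A, allele_at_locus_B) pair per X chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(h[0] for h in haplotypes).items()}
    q = {b: c / n for b, c in Counter(h[1] for h in haplotypes).items()}

    d_prime = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            d = hap_freq.get((a, b), 0.0) - pa * qb  # D_ij for this allele pair
            # Lewontin's Dmax depends on the sign of D_ij.
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:  # a monomorphic locus gives d_max == 0 and D_ij == 0
                d_prime += pa * qb * abs(d / d_max)
    return d_prime


# Hypothetical haplotypes for two STR loci (allele labels are repeat sizes).
example = [(12, 7)] * 6 + [(13, 8)] * 5 + [(12, 8)] * 2
print(multiallelic_d_prime(example))
```

Weighting each allele pair by p_i q_j keeps rare alleles, whose single-pair D′ values are unstable, from dominating the summary.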

To reduce the risk of type I error associated with multiple pair-wise LD tests, raw P-values were adjusted using the Holm–Sidak step-down procedure. The adjusted P-value is derived as follows:

$$P'_s = 1 - (1 - P)^{m}$$

where m is the number of P-values greater than or equal to the one being corrected, P is the raw P-value from the statistical test and P′s is the Holm–Sidak corrected P-value. Ludbrook (1998) regards this as a more elegant approach than Bonferroni-based procedures, which can produce corrected P-values that exceed 1.
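The adjustment above can be sketched in a few lines. The function name is hypothetical. The running maximum is an assumption beyond the formula as written, although it is standard practice for step-down procedures: it ensures an adjusted P-value is never smaller than that of a more significant test. As a cross-check, statsmodels' `multipletests(pvals, method='holm-sidak')` should give the same adjusted values.

```python
from typing import List, Sequence


def holm_sidak(p_values: Sequence[float]) -> List[float]:
    """Holm-Sidak step-down adjustment: P's = 1 - (1 - P)**m, where m is the
    number of raw P-values greater than or equal to the one being adjusted."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # ascending raw P
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, idx in enumerate(order):
        m = k - rank  # this P-value plus all larger ones
        adj = 1.0 - (1.0 - p_values[idx]) ** m
        # Enforce monotonicity across the ordered tests (assumed convention).
        running_max = max(running_max, adj)
        adjusted[idx] = running_max
    return adjusted


print(holm_sidak([0.01, 0.04, 0.03]))  # approximately [0.0297, 0.0591, 0.0591]
```

Because 1 − (1 − P)^m can never exceed 1 for P in [0, 1], no capping step is needed, unlike Bonferroni's m × P.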

Expected gene diversity values and their sampling variances were estimated for the six short tandem repeat (STR) markers on Xq13.3 in the Norfolk Island population. These values were compared with those of the Sardinian isolates of Ogliastra and Talana (Angius et al., 2002a). Allele frequencies were determined by gene counting. Gene diversity is equivalent to heterozygosity for diploid data and is defined as the probability that two alleles chosen at random from the sample are different. An unbiased estimate of gene diversity (ĥ) was calculated with Genetic Data Analysis (Lewis and Zaykin, 2002) as follows:

$$\hat{h} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right)$$

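Because ĥ increases with both the number of alleles and the evenness of their frequencies, large values of ĥ indicate highly diverse samples. The sampling variance of this measure, V(ĥ), was given by Nei (1987) as:

$$V(\hat{h}) = \frac{2}{n(n-1)}\left\{2(n-2)\left[\sum_{i=1}^{k} p_i^{3} - \left(\sum_{i=1}^{k} p_i^{2}\right)^{2}\right] + \sum_{i=1}^{k} p_i^{2} - \left(\sum_{i=1}^{k} p_i^{2}\right)^{2}\right\}$$
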
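where n is the number of gene copies, k is the number of alleles and p_i is the frequency of the ith allele at the locus considered (Nei and Roychoudhury, 1974; Nei, 1987). The average gene diversity, avg(ĥ), is estimated by sampling r loci from the genome:

$$\operatorname{avg}(\hat{h}) = \frac{1}{r}\sum_{l=1}^{r}\hat{h}_l = \frac{1}{r}\sum_{l=1}^{r}\frac{n_l}{n_l-1}\left(1 - \sum_{i=1}^{k_l} p_{li}^{2}\right)$$

where r is the number of sampled loci, n_l is the sample size (number of gene copies sampled) at locus l, k_l is the number of alleles at locus l, p_li is the frequency of the ith allele at locus l and ĥ_l is the gene diversity at locus l.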

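The sampling variance of this estimator is given by:

$$V(\operatorname{avg}(\hat{h})) = \frac{1}{r(r-1)}\sum_{l=1}^{r}\left(\hat{h}_l - \operatorname{avg}(\hat{h})\right)^{2}$$

where ĥ_l is the gene diversity at locus l. This sampling variance, V(avg(ĥ)), is composed of the intralocus variance, V_I(h), and the interlocus variance, V_S(h), such that:

$$V(\operatorname{avg}(\hat{h})) = V_I(h) + V_S(h)$$

Interlocus variance is generally much larger than intralocus variance (Nei, 1987).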
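The single-locus and multilocus formulae translate directly into code. This is an illustrative sketch only. The function names are hypothetical, and the study's estimates were obtained with Genetic Data Analysis rather than with this code. The allele counts in the usage lines are invented for illustration and are not study data. For X-linked markers typed in men, each man contributes one gene copy, so n equals the number of men typed at that locus. The multilocus variance returned here is the total; it is not split into its intralocus and interlocus components.

```python
from typing import Sequence, Tuple


def gene_diversity(allele_counts: Sequence[int]) -> Tuple[float, float]:
    """Unbiased gene diversity h-hat and its sampling variance V(h-hat)
    (Nei, 1987) for one locus, computed from allele counts."""
    n = sum(allele_counts)  # number of gene copies at this locus
    if n < 2:
        raise ValueError("need at least two gene copies")
    p = [c / n for c in allele_counts]
    s2 = sum(x ** 2 for x in p)  # sum of p_i^2
    s3 = sum(x ** 3 for x in p)  # sum of p_i^3
    h = n / (n - 1) * (1.0 - s2)
    v = 2.0 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h, v


def average_gene_diversity(h_per_locus: Sequence[float]) -> Tuple[float, float]:
    """Mean gene diversity over r loci and its total sampling variance,
    sum((h_l - mean)^2) / (r * (r - 1))."""
    r = len(h_per_locus)
    if r < 2:
        raise ValueError("need at least two loci")
    mean = sum(h_per_locus) / r
    var = sum((h - mean) ** 2 for h in h_per_locus) / (r * (r - 1))
    return mean, var


# Hypothetical allele counts at two X-linked loci in 56 men (not study data).
loci = [[20, 15, 12, 9], [30, 14, 8, 4]]
h_values = [gene_diversity(counts)[0] for counts in loci]
print(average_gene_diversity(h_values))
```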

Estimates of gene diversity and its variance were calculated using the formulae described above. Independent-samples t-tests were used to determine whether gene diversity differed significantly, first, between Norfolk Island and Talana and, second, between Norfolk Island and Ogliastra.

NOS2A genotyping and LD analysis

We also investigated three polymorphic sites within the NOS2A gene, located on chromosome 17q11.2–q12. These three single nucleotide polymorphisms (SNPs) were genotyped in a subset of the Norfolk Island sample population (n=227). PCR primer sequences for the three SNPs are presented in Table 1. Each SNP was amplified in the following simplex reaction: 1.75 mmol l−1 MgCl2, 1 unit of Taq polymerase, 200 μmol l−1 deoxyribonucleotide triphosphates, 1 × standard PCR buffer, 0.2 μmol l−1 each of forward and reverse primer and 40 ng of genomic DNA, made up to a final volume of 25 μl with sterile distilled water. Thermal cycling conditions were 1 cycle at 94 °C for 4 min; 35 cycles of 94 °C for 1 min and 60 °C for 1 min; and 1 cycle of 72 °C for 2 min.

Table 1 Primer sequence data for NOS2A SNPs (−1026G/T, −1659C/T and −2447C/G) including expected restriction digest fragment sizes

Pair-wise LD between the three SNPs (−1026/−1659/−2447) was calculated as:

$$D = p_{11} - p_1 q_1$$

where p11 is the observed frequency of the haplotype carrying allele 1 at both loci, and p1 and q1 are the frequencies of allele 1 at the first and second locus, respectively. D′ is the normalized value of D (D divided by its maximum possible value given the allele frequencies) and lies between −1 and +1. A value of 0 indicates that the markers are in linkage equilibrium. A value of +1 indicates that LD is at its theoretical maximum, and a negative value indicates that the rare alleles are in the repulsion phase. The statistical significance of D′ was assessed using a χ2 statistic with one degree of freedom:

$$\chi^{2} = N r^{2}$$

where

$$r^{2} = \frac{D^{2}}{p_1 p_2 q_1 q_2}, \qquad p_2 = 1 - p_1, \quad q_2 = 1 - q_1$$

and N is the number of chromosomes observed (Hartl and Clark, 1997).
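A minimal sketch of these calculations for one SNP pair follows. It assumes that the haplotype frequency p11 is already available. For unphased autosomal genotypes such as the NOS2A SNPs, p11 must be estimated (for example, by an expectation-maximization algorithm), and the source does not say how this was done. The choice of Dmax follows Lewontin's usual sign-dependent definition. This is an assumption, since the text describes D′ only as the normalized D. The function name is hypothetical. The upper-tail probability of a χ2 variable with one degree of freedom is computed exactly as erfc(√(x/2)).

```python
import math
from typing import Tuple


def two_locus_ld(p11: float, p1: float, q1: float,
                 n_chrom: int) -> Tuple[float, float, float, float]:
    """Return (D, D', chi-square, P) for two biallelic loci.

    p11: frequency of the haplotype carrying allele 1 at both loci;
    p1, q1: allele-1 frequencies at the two loci;
    n_chrom: number of chromosomes observed (N).
    """
    if not (0.0 < p1 < 1.0 and 0.0 < q1 < 1.0):
        raise ValueError("both loci must be polymorphic")
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = p11 - p1 * q1
    # Lewontin's Dmax depends on the sign of D.
    d_max = min(p1 * q2, p2 * q1) if d >= 0 else min(p1 * q1, p2 * q2)
    d_prime = d / d_max  # signed, so it lies between -1 and +1
    r2 = d ** 2 / (p1 * p2 * q1 * q2)
    chi2 = n_chrom * r2
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return d, d_prime, chi2, p_value


# Hypothetical frequencies for illustration (not study data).
print(two_locus_ld(p11=0.12, p1=0.15, q1=0.20, n_chrom=454))
```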

Results

LD analysis of Xq13.3 STR markers

Six microsatellite markers were used to assess the level of LD in the genetically isolated population of Norfolk Island; together they spanned up to 11.5 Mb. Table 2 shows the extent and strength of LD over this region in a sample of male Norfolk Islanders. With the exception of two marker pairs (one of which is the most widely spaced pair, DXS983 and DXS986, 11.5 Mb apart), all marker pairs were in significant LD (P<0.05) both before and after correction.

Table 2 Pair-wise LD results between Xq13.3 markers

Average gene diversity estimates for this subset of Norfolk Islanders indicated a level of genetic homogeneity similar to that of other genetic isolates (Angius et al., 2002a). For comparison, gene diversity values reported for Saami and Finnish samples were also included (Laan and Paabo, 1997). However, sample variances were not available for these additional samples, which prevented calculation of confidence intervals (Table 3).

Table 3 Gene diversity data comparisons for six microsatellite markers on Xq13.3 in known genetically isolated populations

For haploid data, such as X-chromosome genotypes in men, heterozygosity cannot be observed directly and is instead estimated by calculating gene diversity. Heterozygosity is informative in population comparisons because each heterozygote carries two different alleles and therefore reflects the genetic variation present.

Comparison of Norfolk Island, Talana and Ogliastra showed that Talana had the lowest average gene diversity (0.77, 0.67 and 0.75, respectively) as well as the lowest single-locus diversity (0.617 at DXS8082). This difference was also evident in the total variance of gene diversity. The difference between the Norfolk Island and Talana populations was significant (P=0.01). However, the Norfolk Island sample showed higher interlocus variance (Table 3). Following Kalinowski (2002), this variance could be reduced by typing more markers in the Norfolk population.

The decay of D′ across the Xq13.3 region in the sampled Norfolk Islanders is plotted against the corresponding data from the genetic isolate of Talana (Angius et al., 2002a). Figure 1 shows that, although gene diversity is higher in the Norfolk Island population than in Talana, D′ values are higher in Norfolk Island and LD decays with distance at a similar rate in the two populations.

Figure 1

Strength of linkage disequilibrium (LD), evaluated as multiallelic D′ values, in the Norfolk Island and Talana populations, stratified by physical distance, for the chromosome Xq13.3 region.

LD analysis of NOS2A SNPs

This study investigated three SNPs within the NOS2A gene on chromosome 17q11.2–q12 that were previously identified and analyzed in Gambian and UK Caucasian populations (Burgner et al., 2003). The SNPs (−1026G/T, −1659C/T and −2447C/G) are clustered closely together in the proximal promoter region of the NOS2A gene. They have been shown to be in complete and near-complete LD in the Gambian and UK samples, respectively (Burgner et al., 2003). Our initial objective was to determine the frequencies of these SNP alleles, and the LD between them, in the isolated population of Norfolk Island. Table 4 compares minor allele frequencies (MAF) across study populations.

Table 4 Comparison of minor allele frequencies (MAF) for proximal NOS2A promoter region SNPs in Caucasian, Gambian and Norfolk Islander samples

Genotype frequencies for all markers conformed to Hardy–Weinberg equilibrium (P>0.05) in both the Australian Caucasian control group and the Norfolk Island population. No significant differences were detected between the Norfolk Island population and the Australian Caucasian control group at any of the three NOS2A SNPs (P>0.05).
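The source does not state which Hardy–Weinberg test was applied. Assuming a Pearson χ2 goodness-of-fit test with one degree of freedom, a check for a single biallelic SNP can be sketched as below; for small samples or rare alleles, an exact test is often preferred. The function name and the genotype counts in the usage line are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg proportions for one
    biallelic SNP. Returns (chi-square, P) with 1 degree of freedom."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic marker: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts (not study data).
print(hwe_chi_square(150, 68, 9))
```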

As Table 4 shows, minor allele frequencies at these SNPs differ substantially between the Gambian and UK samples, and the difference is significant for SNPs −1026G/T and −1659C/T (Burgner et al., 2003). The MAFs for all three SNPs were lower in the Norfolk population. Although these differences were not statistically significant, they are consistent with reduced genetic heterogeneity relative to outbred populations.

Highly significant LD was found between alleles of all three SNPs in the Norfolk Island population (P<0.000001). This is consistent with previous work (Bugeja et al., 2005). It is also expected, given the close proximity of the SNPs, the relatedness of the Norfolk population and the frequency of the minor alleles (>10%). With the exception of the −1026/−1659 pair, all SNP pairs in the Norfolk population were in complete LD.

Haplotype analysis of NOS2A SNPs

Haplotype analysis suggested that the Norfolk Island population has reduced heterogeneity at the three SNPs investigated in the NOS2A proximal promoter region. The same common haplotype is shared with other Caucasian study groups, but its frequency is higher in the Norfolk Island population than in the other populations presented (Table 5).

Table 5 NOS2A SNP haplotype frequencies in Norfolk Islander, Caucasian and Gambian samples

Discussion

The primary aim of the current study was to determine the extent of LD across the Xq13.3 region in the genetically isolated population of Norfolk Island. This region has been used routinely for LD estimation in numerous populations (Kaessmann et al., 1999; Zavattari et al., 2000; Angius et al., 2001, 2002a; Pereira and Pena, 2006), which allows comparison between different isolates. In the Norfolk Island population, LD extended up to 9.5–11 Mb in this region, with 13 of the 15 marker pairs in LD. This is comparable to, or exceeds, that found in other genetic isolates, such as the secluded population of Talana on the island of Sardinia (Angius et al., 2002a). In addition, gene diversity calculations based on the six microsatellite markers in the same Xq13.3 region suggest that the sampled Norfolk Island population has reduced expected heterozygosity, again similar to other isolated populations (Angius et al., 2002a).

Analysis of three SNPs within the NOS2A gene also revealed features of Norfolk Island's genetic architecture. All three SNPs had reduced MAFs compared with outbred Australian, UK and Gambian populations, and the most common haplotype was more frequent. Both findings indicate increased genetic homogeneity, as expected given Norfolk's limited number of founders and stable population growth in isolation from other populations. No significant differences in MAF or haplotype frequency were observed between Norfolk and an outbred Australian population. However, the three SNPs lie within a 2 kb region, so they provide very few data points for comparison and were expected to show substantial LD. Future studies with denser SNP coverage over a greater genetic distance would provide considerably more points for comparison. Phenotype association analysis may also be worthwhile, particularly for markers within the proximal promoter region of this gene (Hobbs et al., 2002). In this regard, genome-wide linkage analysis of a number of CVD-related quantitative traits in the Norfolk Island population is currently in progress. The present study provides preliminary comparisons between marker types in the Norfolk population, which should be useful for follow-up analysis of the STR genome scan data.

Founder population isolates offer several significant advantages over mainstream populations for disease gene mapping. First, the limited number of ancestors reduces genetic heterogeneity, so fewer susceptibility genes, each with a greater overall effect, are expected. Second, in geographically and culturally isolated populations, environmental ‘noise’ is reduced, minimizing the confounding effects of nongenetic variables. These characteristics have been exploited in the study of rare genetic disorders caused by single defective genes (Puffenberger et al., 1994; Nikali et al., 1995; Newman et al., 2003). They can also be advantageous in mapping genes for complex diseases (Sheffield et al., 1998; Bourgain et al., 2000; Peltonen et al., 2000; Shifman and Darvasi, 2001). Third, the kinship (relatedness) coefficient is much higher in isolated founder populations. This matters because very large extended pedigrees, suitable for powerful linkage analysis of complex traits, can often be identified (Bourgain et al., 2000). A recent study of genome scan data in a Hutterite isolate stressed the importance of good genealogical information (Newman et al., 2001). That study noted two risks. Failing to take full pedigree information into account can reduce the power to detect linkage or inflate LOD scores. Failing to account for relatedness can also affect association studies. Good genealogical records with well-defined pedigree structure are therefore important, because they allow the analysis of large extended pedigrees and greatly increase the power and accuracy of a gene-mapping study. In addition, a good understanding of local population history is important for evaluating factors such as the number of founders, population size, consanguinity, immigration, population expansion rate and genetic drift.
The best candidate populations for detecting associations with common genetic variants are believed to be isolates with a small effective number of unrelated founders (10–100). Such populations are expected to carry fewer disease susceptibility variants than outbred populations (Sheffield et al., 1998; Bourgain et al., 2000; Peltonen et al., 2000).