On methods of spatial analysis for genotyped individuals

Shimatani, K; Takahashi, M

doi:10.1038/sj.hdy.6800295

Original Article
Published: 28 July 2003

K Shimatani¹ &
M Takahashi²

Heredity volume 91, pages 173–180 (2003)

Abstract

Spatial autocorrelation methods have commonly been applied to individual-based spatial genetic studies, although their properties and the relations among the statistics have not been carefully examined. This paper first introduces a reformulation of widely used spatial statistics using point processes. When Moran's I statistics are applied to allele frequencies within an individual, the frequencies are no longer continuous variables but take only three discrete values; Moran's I statistics and the number of alleles in common (NAC) can then be expressed as weighted sums of join-count statistics, which gives them specific interpretations. The distributions of minor genotypes are amplified in Moran's I depending on the allele frequency in the population, while NAC uses a constant weighting system. Under the point process framework, spatial analysis can be conducted on a common theoretical base, from individual locations to genetic distributions at different levels (for example, genotype and allele). The methodology is demonstrated by application to field data for molecular ecological studies of Fagus crenata population dynamics.

Introduction

Spatial autocorrelation analysis is currently used as a standard statistical technique for analysing individual-based spatial genetic structure from mapped data of genotyped individuals. The commonly used methods can be classified into two categories depending on the treatment of the genotypic data (Heywood, 1991). One method considers genotypes as nominal data and applies join-count statistics (ie, the standard normal deviate, SND). The other first transforms genotypic data into allele frequencies and applies Moran's I statistics to the resulting interval data. With this method, each individual is considered as a population; thus, its allele frequency is 1 if it is homozygous for a specific allele, 0.5 if heterozygous, and 0 otherwise. Some studies have used the latter method (eg, Xie and Knowles, 1991; Geburek and Tripp-Knowles, 1994; Streiff et al, 1998; Ueno et al, 2000), others have used both (eg, Leonardi and Menozzi, 1996; Chung and Epperson, 2000), and some have calculated additional statistics, such as the number of alleles in common (NAC; Berg and Hamrick, 1995; Takahashi et al, 2000) and coancestry (Loiselle et al, 1995). In each case, pairs of individuals are classified into distance classes and statistics are calculated for every distance class.

Join-count statistics for short distance classes directly indicate whether a specific genotype is clustered and whether two genotypes are attracting or repulsing. In contrast, large positive Moran's I values for short distance classes are generally interpreted as a tendency for neighbouring individuals to have a ‘similar’ allele frequency. When Sokal and Oden (1978) introduced spatial autocorrelation methods, Moran's I statistics were calculated for the allele frequencies of populations investigated. In this case, the allele frequencies are continuous variables, thus a general interpretation of correlations is feasible; positive correlations are present when the two variables tend to show similar values. However, at the individual level, the frequency can no longer be continuous but takes three discrete values.

In previous studies, when both Moran's I and join-count (and other spatial statistics) were calculated, the two statistics were not interpreted simultaneously. Although Epperson (1995) pointed out that Moran's I is a weighted sum of join-count statistics, no study has directly applied this relation to field data. In addition, some studies analysed the spatial distribution of individuals, although separately from the genotypic distribution (eg, Berg and Hamrick, 1995; Ueno et al, 2000).

This paper introduces a methodology that uses join-count statistics, Moran's I statistics for within-individual frequencies, and other spatial statistics together with the spatial distribution of individuals. Beginning with a brief review of conventional spatial autocorrelation methods in genetics, the first part explains point processes, which have been commonly applied in individual-based spatial ecology and play an important role in simultaneous analysis of the spatial distribution of individuals and genotypes. The next part reformulates conventional spatial statistics using the language of point processes (Shimatani, 2002). The reformulated Moran's I for within-individual frequencies and NAC can be expressed as the weighted sum of the reformulated join-count statistics, thus providing an insight into the interpretation of autocorrelations between the three discrete values and clarifying the relation between the measures. The methodology is demonstrated by application to field data of a Fagus crenata population taken from Takahashi et al (2000). The final part discusses the biological implications of the analysis and the utility of the methods for population genetics and molecular ecology.

Spatial statistics

Conventional statistics for genetics

Suppose that the mapped data of genotyped individuals are given. Let {X_i} (i=1, 2,…, n) be the x–y coordinates of individual i. Conventionally, spatial autocorrelation techniques are applied as follows. First, Euclidean distances are calculated between the individuals and divided into distance classes of width 2Δ as (0, 2Δ], (2Δ, 4Δ], (4Δ, 6Δ], …. For each pair of individuals (i, j), weight W_i,j[r] is given for discrete distances of r=Δ, 3Δ, 5Δ, … depending on their interdistance ∣∣X_i−X_j∣∣ as

$$W_{i,j}[r] = \begin{cases} 1 & \text{if } r-\Delta < \|X_i - X_j\| \le r+\Delta, \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

(1) Moran's I statistics for within-individual frequencies: Fix one locus and one allele, named A. Let a(i) be the allele frequency within individual i, namely, a(i)=1 if individual i is homozygous for allele A, a(i)=0.5 if heterozygous for A, and a(i)=0 otherwise. Let ā be the (estimated) frequency of allele A in the given population. For each distance class (r−Δ, r+Δ], Moran's I statistics (Cliff and Ord, 1981; Sokal and Oden, 1978) are applied to the frequency of allele A within an individual as

$$I_A[r] = \frac{\sum_{i \ne j} W_{i,j}[r]\,(a(i)-\bar{a})(a(j)-\bar{a})}{V \sum_{i \ne j} W_{i,j}[r]} \qquad (2)$$

where V is the variance of a(i):

$$V = \frac{1}{n}\sum_{i=1}^{n} (a(i)-\bar{a})^{2}.$$

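(2) Coancestry: Using the same notation as above, some recent studies used the following equation, called coancestry (with respect to allele A) (Loiselle et al, 1995):

$$\rho_A[r] = \frac{\sum_{i \ne j} W_{i,j}[r]\,(a(i)-\bar{a})(a(j)-\bar{a})}{\bar{a}(1-\bar{a}) \sum_{i \ne j} W_{i,j}[r]} \qquad (3)$$

If a population is in Hardy–Weinberg equilibrium,

$$V = \frac{\bar{a}(1-\bar{a})}{2},$$

then ρ_A[r]=I_A[r]/2.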

(3) NAC: Let nac(i, j) be the average number of alleles in common over the loci considered between individuals i and j (Surles et al, 1990). This genetic similarity index can be extended to spatial statistics as (Berg and Hamrick, 1995)

$$\mathrm{NAC}[r] = \frac{\sum_{i \ne j} W_{i,j}[r]\,\mathrm{nac}(i,j)}{\sum_{i \ne j} W_{i,j}[r]} \qquad (4)$$

(4) Join-count statistics: Fix one locus and let m(i) be the genotype of individual i. Classify individuals by their genotypes at that locus, such as AA, AB, BB, AC, BC, … and define join-count statistics, for example, with respect to AA-AA and AA-AB, as

$$J_{AA\text{-}AA}[r] = \sum_{i<j,\ m(i)=m(j)=AA} W_{i,j}[r], \qquad J_{AA\text{-}AB}[r] = \sum_{i<j,\ \{m(i),m(j)\}=\{AA,AB\}} W_{i,j}[r]. \qquad (5)$$

These are equal to the observed number of joins with specific genotype(s) (in biallelic cases, there are an additional four statistics denoted as J_AA-BB[r], J_AB-AB[r], J_AB-BB[r], J_BB-BB[r]). Subtracting the expected number of joins under the random distribution and dividing the difference by the square root of the variance, we obtain the SND, which approximately follows the standard normal distribution.

For any of these statistics, plotting the values at r=Δ, 3Δ, 5Δ, … produces a correlogram, illustrating fine-scale spatial genetic structure.
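The conventional procedure can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' code: the function names (`allele_freq`, `morans_i_class`), the tuple encoding of diploid genotypes and the toy coordinates are all hypothetical. The sketch evaluates Moran's I for within-individual frequencies in a single distance class (r−Δ, r+Δ], using 0/1 weights and the variance V of a(i) with divisor n.

```python
import math


def allele_freq(genotype, allele):
    """Within-individual frequency a(i): 1, 0.5 or 0 for a diploid genotype."""
    return genotype.count(allele) / 2.0


def morans_i_class(coords, genotypes, allele, r, delta):
    """Moran's I for the distance class (r - delta, r + delta]."""
    a = [allele_freq(g, allele) for g in genotypes]
    n = len(a)
    a_bar = sum(a) / n
    var = sum((x - a_bar) ** 2 for x in a) / n  # V, variance of a(i)
    num = 0.0
    n_pairs = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if r - delta < d <= r + delta:  # weight W_ij[r] = 1
                num += (a[i] - a_bar) * (a[j] - a_bar)
                n_pairs += 1
    if n_pairs == 0 or var == 0:
        return float("nan")  # empty class or monomorphic sample
    return num / (var * n_pairs)


# Toy data (hypothetical): two AA trees close together, two BB trees far away.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
genotypes = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B")]
i_short = morans_i_class(coords, genotypes, "A", r=1.0, delta=1.0)
i_long = morans_i_class(coords, genotypes, "A", r=10.0, delta=1.0)
```

Calling the function for r=Δ, 3Δ, 5Δ, … and plotting the results would give the correlogram described above; in the toy data, identical genotypes sit side by side, so the short class is positive and the long class negative.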

Point processes

A point process is a stochastic system that places points in the plane. If points are classified into several types, the system is called a multivariate point process. If each point has a mark (generally, a real number or a set of real numbers), the system is called a marked point process. In this paper, a point corresponds to an individual (tree), a type to the genotype, and a mark to its allele frequency within an individual or multilocus genotype. The details and brief introduction to terminology below are taken from Stoyan and Stoyan (1994), and Stoyan and Penttinen (2000), respectively.

Let λ be the density, that is, the mean number of individuals per unit area. The product density, J(r), is the probability density that there are individuals at two arbitrarily chosen points with interdistance r. If individuals are randomly distributed, the probability that an individual exists at each point is independently equal to λ; thus, J(r)=λ². The normalised product density, g(r)=J(r)/λ², is called the pair correlation function. g(r)>1 (or J(r)>λ²) for relatively small r means that the interdistance r is more frequent than in a random point pattern; thus, there is clustering of individuals.

When individuals are classified into K types, K(K+1)/2 product densities {J_k,l(r)} (1≤k≤l≤K) can be considered; they express the probability density that there are type k and type l individuals at two arbitrarily chosen points of interdistance r (J_k,l(r) does not specify which type exists at which point). Let their normalised versions be denoted as g_k,k(r)=J_k,k(r)/λ_k² and g_k,l(r)=J_k,l(r)/2λ_kλ_l, where λ_k refers to the density of type k individuals.

If each individual has a mark, let M denote the set of marks, and m(i) the mark of individual i. Let f(m₁, m₂) be a function on M × M. For two arbitrarily chosen points with interdistance r, define a random variable that vanishes if there is no individual at one of the points and is equal to f(m(i), m(j)) if individuals i and j exist. Let J_f(r) be the expected value of this random variable and define

g_f(r) = J_f(r)/J(r) (6)

g_f(r) can be interpreted as the conditional mean of f(m(i), m(j)) given that ∣∣X_i−X_j∣∣=r.

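Suppose that we have a complete map of individuals with types or marks in a rectangular sampling plot with side lengths a and b (a<b). Denote the data by {X_i, m(i)} (i=1, 2, …, n), where m(i) refers to the type or mark. g(r), g_k,k(r), g_k,l(r), and g_f(r) can be estimated as follows (Penttinen et al, 1992; Stoyan and Stoyan, 1994, pp 284–293).

$$\hat{g}(r) = \frac{1}{\hat{\lambda}^{2}} \sum_{i \ne j} \frac{w(r-\|X_i-X_j\|)}{2\pi r\, s(r)} \qquad (7)$$

$$\hat{g}_{k,k}(r) = \frac{1}{\hat{\lambda}_k^{2}} \sum_{i \ne j,\ m(i)=m(j)=k} \frac{w(r-\|X_i-X_j\|)}{2\pi r\, s(r)} \qquad (8)$$

$$\hat{g}_{k,l}(r) = \frac{1}{2\hat{\lambda}_k\hat{\lambda}_l} \sum_{i \ne j,\ \{m(i),m(j)\}=\{k,l\}} \frac{w(r-\|X_i-X_j\|)}{2\pi r\, s(r)} \qquad (9)$$

$$\hat{g}_f(r) = \frac{\sum_{i \ne j} f(m(i),m(j))\, w(r-\|X_i-X_j\|)}{\sum_{i \ne j} w(r-\|X_i-X_j\|)} \qquad (10)$$

Here, r<a, λ̂ = n/ab is the estimated density, λ̂_k is the estimated density of type k individuals,

$$w(x) = \begin{cases} \dfrac{3}{4\delta}\left(1-\dfrac{x^{2}}{\delta^{2}}\right) & \text{if } |x| \le \delta, \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

is the Epanechnikov kernel in which δ is an arbitrarily fixed constant, and s(r) = ab − r(2a + 2b − r)/π is the edge correction factor. Generally, an estimator includes an edge correction. An exception is g_f(r) (equation (10)), in which the edge corrections in the numerator and the denominator cancel each other. The majority of ecological studies use Ripley's edge correction (Haase, 1995), but with that correction the edge correction factors do not cancel and ĝ_f(r) must be calculated by a more complicated equation.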
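As a rough sketch of how the kernel estimator of the pair correlation function might be computed, the Python below combines the Epanechnikov kernel with the rectangular edge correction factor s(r) quoted in the text. The estimator form (kernel-weighted sum over ordered pairs divided by 2πr·s(r)·λ̂²) follows the standard Stoyan and Stoyan (1994) construction and is an assumption about the exact equation used in the paper; the function names and the toy points are hypothetical.

```python
import math


def epanechnikov(x, delta):
    """Epanechnikov kernel with bandwidth delta (integrates to 1)."""
    if abs(x) > delta:
        return 0.0
    return 0.75 / delta * (1.0 - (x / delta) ** 2)


def edge_factor(r, a, b):
    """s(r) = ab - r(2a + 2b - r)/pi for an a-by-b rectangle, valid for r < a <= b."""
    return a * b - r * (2 * a + 2 * b - r) / math.pi


def pair_correlation(coords, r, delta, a, b):
    """Kernel estimate of g(r); assumes 0 < r < a and at least two points."""
    n = len(coords)
    lam = n / (a * b)  # estimated density
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                total += epanechnikov(r - d, delta)
    return total / (2 * math.pi * r * edge_factor(r, a, b) * lam ** 2)


# Toy usage (hypothetical points in a 10 m x 20 m plot).
pts = [(1.0, 1.0), (2.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
g1 = pair_correlation(pts, r=1.0, delta=0.5, a=10.0, b=20.0)
```

Values of the estimate above 1 at small r would indicate clustering of individuals, as described for g(r); the bandwidth δ only controls how smooth the resulting curve is.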

Reformulation of statistics

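Shimatani (2002) reformulated the spatial statistics (equations (2), (3), (4) and (5)) in the language of the point process. Using a(i) as a mark and f(m₁, m₂)=(m₁−ā)(m₂−ā)/V and f(m₁, m₂)=(m₁−ā)(m₂−ā)/ā(1−ā) as a function in equation (6), we obtain the reformulated Moran's I statistics for within-individual frequencies and the coancestry, respectively. Substituting these functions into (10), we obtain their estimators as

$$\hat{I}_A(r) = \frac{\sum_{i \ne j} (a(i)-\bar{a})(a(j)-\bar{a})\, w(r-\|X_i-X_j\|)}{\hat{V} \sum_{i \ne j} w(r-\|X_i-X_j\|)} \qquad (12)$$

$$\hat{\rho}_A(r) = \frac{\sum_{i \ne j} (a(i)-\bar{a})(a(j)-\bar{a})\, w(r-\|X_i-X_j\|)}{\bar{a}(1-\bar{a}) \sum_{i \ne j} w(r-\|X_i-X_j\|)} \qquad (13)$$

where $\hat{V} = \frac{1}{n-1}\sum_{i=1}^{n}(a(i)-\bar{a})^{2}$ is the unbiased estimator of the variance of a(i).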

In the same way, let the mark set be (multilocus) genotypes and let the function be nac(i, j). This induces the reformulated NAC, and its estimator is given as

$$\widehat{\mathrm{NAC}}(r) = \frac{\sum_{i \ne j} \mathrm{nac}(i,j)\, w(r-\|X_i-X_j\|)}{\sum_{i \ne j} w(r-\|X_i-X_j\|)} \qquad (14)$$

Equations (12), (13) and (14) express the expected values of (a(i) − ā)(a(j) − ā)/V, (a(i) − ā)(a(j) − ā)/ā(1 − ā), and nac(i, j), respectively, given that ∣∣X_i−X_j∣∣=r.
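The conditional-mean reading of these estimators can be made concrete with a short sketch. The Python below is an illustration under assumptions, not the published implementation: `g_f_hat` is a hypothetical name for a generic kernel-weighted mean of f(m(i), m(j)) over pairs at distance about r, and `moran_hat` plugs in the Moran's I mark function with the unbiased variance (divisor n−1).

```python
import math


def epanechnikov(x, delta):
    """Epanechnikov kernel with bandwidth delta."""
    if abs(x) > delta:
        return 0.0
    return 0.75 / delta * (1.0 - (x / delta) ** 2)


def g_f_hat(coords, marks, f, r, delta):
    """Kernel-weighted mean of f(m(i), m(j)) over ordered pairs near distance r."""
    num = 0.0
    den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = epanechnikov(r - math.dist(coords[i], coords[j]), delta)
            num += f(marks[i], marks[j]) * w
            den += w
    return num / den if den > 0 else float("nan")


def moran_hat(coords, a, r, delta):
    """Reformulated Moran's I for within-individual frequencies a(i)."""
    n = len(a)
    a_bar = sum(a) / n
    v_hat = sum((x - a_bar) ** 2 for x in a) / (n - 1)  # unbiased variance
    return g_f_hat(
        coords, a, lambda m1, m2: (m1 - a_bar) * (m2 - a_bar) / v_hat, r, delta
    )


# Toy usage (hypothetical data): a(i) = 1 for AA trees, 0 for trees without A.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
a = [1.0, 1.0, 0.0, 0.0]
i_near = moran_hat(coords, a, r=1.0, delta=0.5)
i_far = moran_hat(coords, a, r=10.0, delta=0.5)
```

Because the same kernel weights appear in numerator and denominator, no edge correction is needed here, and the function can be evaluated at any r rather than only at class midpoints. Swapping the lambda for nac(i, j) or for the coancestry function would give the other two estimators.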

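Classifying individuals by the single-locus genotype, the estimators of product densities, Ĵ_k,l(r), which correspond to join-count statistics, are given as

$$\hat{J}_{AA\text{-}AA}(r) = \sum_{i \ne j,\ m(i)=m(j)=AA} \frac{w(r-\|X_i-X_j\|)}{2\pi r\, s(r)}, \qquad \hat{J}_{AA\text{-}AB}(r) = \sum_{i \ne j,\ \{m(i),m(j)\}=\{AA,AB\}} \frac{w(r-\|X_i-X_j\|)}{2\pi r\, s(r)}, \ \ldots \qquad (15)$$

which express the probability density that there are individuals of genotype AA-AA, AA-AB, … at two points of interdistance r, respectively.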

Conventional spatial autocorrelation methods and marked point processes have different mathematical backgrounds. When Sokal and Oden (1978) first introduced the former into population genetics, each population was considered as a lattice point, a set of populations formed an irregular lattice, and Moran's I statistics were calculated for allele frequencies of the populations. Later, the method was modified to be applicable to individual-based studies, in which an individual is treated as a population. Using this approach, individual locations are fixed and statistical analysis is primarily conducted to test whether the spatial genetic distribution on the given locations significantly differs from the random pattern. On the other hand, marked point processes investigate the spatial distribution of marked points. This approach assumes that genotyped individuals are distributed throughout a plane according to some stochastic system and equations (7), (8), (9) and (10) estimate functions associated with the hidden process from samples [thus, the unbiased estimator should be used in the variance term in equation (12)]. Hence, the point process contains potential for constructing stochastic models that can simultaneously explain individual locations and their genotypes (Shimatani, 2002).

Despite the differences in their mathematical background, in practice, the reformulated expressions (equations (12), (13), (14) and (15)) can be derived simply by replacing W_ij[r] with w(r−∣∣X_i−X_j∣∣) in equations (2), (3) and (4). Hence, the conventional and the reformulated statistics produce similar curves, a correlogram and a smooth graph, respectively (Shimatani, 2002). Figures 1a and b compare I_A[r] and Î_A(r) for a Fagus crenata population (data from Takahashi et al (2000), Figure 2). The conventional statistics use a weight of either 1 or 0, depending on whether the interdistance falls within (r−Δ, r+Δ], while in the point process formulation the weights decrease gradually as the interdistance departs from r and are set to 0 once it falls outside r±δ. The latter approach enables us to calculate the values for any distance and draw a smooth curve illustrating spatial genetic patterns, more elegantly than the broken line of a correlogram. In addition, it is no longer necessary to fix arbitrarily the width 2Δ of the distance class; occasionally, a slight change of Δ dramatically affects the statistics, especially for the first distance class. In contrast, although the point process formulation requires arbitrary fixation of the width δ of the kernel function (equation (11)), its effect is at most to smooth the curve (Stoyan and Stoyan, 1994, pp 284–290, see also Figure 1a).

Weighted-sum expressions

Because mark a(i) takes only three discrete values, Î_A(r) and ρ̂_A(r) can be reformulated under multivariate point processes. Fix a locus and an allele, pack all the alleles other than the fixed A into one group, and denote it by *. Index genotypes AA, A*, ** as 1, 2, 3. Î_A(r) (and ρ̂_A(r)) can be expressed as the weighted sum of {Ĵ_{K − L}(r)/Ĵ(r)} (1≤K≤L≤3) as in Shimatani (2002):

$$\hat{I}_A(r) = \sum_{1 \le K \le L \le 3} S_{KL}\, \frac{\hat{J}_{K-L}(r)}{\hat{J}(r)} \qquad (16)$$

where S_KL denotes the K-L component of the matrix

$$S = \frac{1}{\hat{V}} \begin{pmatrix} (1-\bar{a})^{2} & (1-\bar{a})(\tfrac{1}{2}-\bar{a}) & -\bar{a}(1-\bar{a}) \\ & (\tfrac{1}{2}-\bar{a})^{2} & -\bar{a}(\tfrac{1}{2}-\bar{a}) \\ & & \bar{a}^{2} \end{pmatrix} \qquad (17)$$

This relation was first pointed out in Epperson (1995) for the conventional Moran's I for within-individual frequencies (equation (2)) and join-count statistics (equation (5)). Replacing V̂ with ā(1 − ā) yields the weighted-sum expression of ρ̂_A(r). Note that Ĵ_{K − L}(r)/Ĵ(r) does not contain the edge correction factors, because they cancel in the ratio.
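The weighted-sum relation can be checked numerically. The sketch below is a hedged illustration rather than the authors' procedure: `decompose` is a hypothetical function that accumulates kernel weights separately for the six genotype pair classes (genotypes indexed 1, 2, 3 for AA, A*, **), forms the proportions Ĵ_{K−L}(r)/Ĵ(r), applies weights of the form (a_K − ā)(a_L − ā)/V̂, and compares the sum with Moran's I computed directly from the marks. The toy coordinates and genotypes are made up.

```python
import math

MARK = {1: 1.0, 2: 0.5, 3: 0.0}  # genotype index -> a(i) for AA, A*, **


def kernel(x, delta):
    """Epanechnikov kernel with bandwidth delta."""
    if abs(x) > delta:
        return 0.0
    return 0.75 / delta * (1.0 - (x / delta) ** 2)


def decompose(coords, types, r, delta):
    """Return pair proportions, weights, their weighted sum and the direct Moran's I."""
    n = len(types)
    a = [MARK[t] for t in types]
    a_bar = sum(a) / n
    v_hat = sum((x - a_bar) ** 2 for x in a) / (n - 1)
    sums = {(k, l): 0.0 for k in (1, 2, 3) for l in (1, 2, 3) if k <= l}
    total = 0.0
    direct = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = kernel(r - math.dist(coords[i], coords[j]), delta)
            key = (min(types[i], types[j]), max(types[i], types[j]))
            sums[key] += w  # unordered class K-L collects both orderings
            total += w
            direct += w * (a[i] - a_bar) * (a[j] - a_bar) / v_hat
    if total == 0:
        raise ValueError("no pairs within the kernel window at this r")
    ratios = {key: s / total for key, s in sums.items()}
    weights = {
        (k, l): (MARK[k] - a_bar) * (MARK[l] - a_bar) / v_hat for (k, l) in sums
    }
    weighted = sum(weights[key] * ratios[key] for key in sums)
    return ratios, weights, weighted, direct / total


# Toy usage (hypothetical map of five genotyped trees).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
types = [1, 2, 3, 1, 2]
ratios, weights, weighted, direct = decompose(coords, types, r=1.0, delta=0.5)
```

The six entries of `ratios` sum to one, and `weighted` agrees with `direct` up to rounding, which is the content of the decomposition; inspecting the six terms `weights[key] * ratios[key]` separately shows which genotype pairs drive the statistic.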

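If one biallelic locus is considered, NÂC(r) can be written in the same form:

$$\widehat{\mathrm{NAC}}(r) = \sum_{1 \le K \le L \le 3} T_{KL}\, \frac{\hat{J}_{K-L}(r)}{\hat{J}(r)} \qquad (18)$$

where

$$T = \begin{pmatrix} 2 & 1 & 0 \\ & 2 & 1 \\ & & 2 \end{pmatrix} \qquad (19)$$

Even when the locus has more than two alleles, if the minor alleles have sufficiently small frequencies, NÂC(r) of that locus can be approximated by this biallelic form. Although Î_A(r) and NÂC(r) have the same form except for the weighting system, the weight matrix S of Î_A(r) is a function of the allele frequency ā in the population, whereas NÂC(r) uses a constant matrix. If the population is at Hardy–Weinberg equilibrium, V = ā(1−ā)/2, thus

$$S(\bar{a}) = \begin{pmatrix} \dfrac{2(1-\bar{a})}{\bar{a}} & \dfrac{1-2\bar{a}}{\bar{a}} & -2 \\ & \dfrac{(1-2\bar{a})^{2}}{2\bar{a}(1-\bar{a})} & -\dfrac{1-2\bar{a}}{1-\bar{a}} \\ & & \dfrac{2\bar{a}}{1-\bar{a}} \end{pmatrix}$$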

Hence, S=S(ā) varies depending on the allele frequency ā, for example, as:

Hence, for the spatial autocorrelation of the three discrete variables, positive values are obtained when: (1) both genotypes (frequencies) are identical or (2) one is a heterozygote (A*) and the other is a homozygote (AA) if ā<0.5 or **-type if ā>0.5. Unlike the general cases in which positive correlations appear when two variables tend to have similar values, the autocorrelation of the three discrete variables involves a more concrete and specific interpretation.

As the allele frequency deviates from 0.5, more variation appears between the six weights, for instance

$$S(0.95) \approx \begin{pmatrix} 0.11 & -0.95 & -2 \\ & 8.53 & 18 \\ & & 38 \end{pmatrix}, \qquad S(0.99) \approx \begin{pmatrix} 0.02 & -0.99 & -2 \\ & 48.51 & 98 \\ & & 198 \end{pmatrix}$$

Î_A(r) (and coancestry ρ̂_A(r)) provides 38/0.11=345 times greater weight with **–** joins than AA–AA joins if ā = 95%, and 198/0.0202=9801 times greater if ā = 99%, meaning that Î_A(r) and ρ̂_A(r) intensify the information on minor genotype(s). This contrasts with NAC, which always considers AA–AA as similar as AB–AB and BB–BB.
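The dependence of the weights on ā is easy to tabulate. The following sketch assumes Hardy–Weinberg proportions, so that V = ā(1−ā)/2, and uses a hypothetical helper `hw_weights`; the genotype labels are only dictionary keys. It reproduces the weights 38 and 198 quoted above. Note that without rounding the ratio of the **–** weight to the AA–AA weight is exactly (ā/(1−ā))², that is 361 at ā = 0.95 and 9801 at ā = 0.99; the figure 345 arises from rounding the AA–AA weight to 0.11.

```python
def hw_weights(a_bar):
    """Six Moran's I weights S_KL under Hardy-Weinberg, keyed by genotype pair."""
    marks = {"AA": 1.0, "A*": 0.5, "**": 0.0}  # within-individual frequencies
    v = a_bar * (1.0 - a_bar) / 2.0  # variance of a(i) under Hardy-Weinberg
    names = list(marks)
    return {
        (k, l): (marks[k] - a_bar) * (marks[l] - a_bar) / v
        for idx, k in enumerate(names)
        for l in names[idx:]
    }


for a_bar in (0.5, 0.8, 0.95, 0.99):
    s = hw_weights(a_bar)
    ratio = s[("**", "**")] / s[("AA", "AA")]
    print(a_bar, {pair: round(val, 2) for pair, val in s.items()}, round(ratio, 1))
```

At ā = 0.5 the three diagonal weights are 2, 0 and 2 and the heterozygote terms vanish, so the amplification of minor genotypes appears only as ā moves away from 0.5.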

Instead of f(m₁, m₂)=(m₁−ā)(m₂−ā)/V, if the product f(m₁, m₂)=m₁m₂ is used, the resulting function $g_{m_{1} m_{2}} (r)$ normalised by the square of the mean of marks is called the mark correlation function (Stoyan and Stoyan, 1994, pp 291–293; Stoyan and Penttinen, 2000). This function has frequently been used in ecology where the mark refers to the sizes of trees (eg, Penttinen et al, 1992). If the discrete mark a(i) above is used, $g_{m_{1} m_{2}} (r) / \bar{a}^{2}$ can be estimated in the form:

$$\frac{\hat{g}_{m_{1} m_{2}}(r)}{\bar{a}^{2}} = \sum_{1 \le K \le L \le 3} U_{KL}\, \frac{\hat{J}_{K-L}(r)}{\hat{J}(r)} \qquad (20)$$

where

$$U = \frac{1}{\bar{a}^{2}} \begin{pmatrix} 1 & \tfrac{1}{2} & 0 \\ & \tfrac{1}{4} & 0 \\ & & 0 \end{pmatrix} \qquad (21)$$

This function is easier to interpret than Î_A(r), and directly indicates whether allele A is clustering. Namely, $g_{m_{1} m_{2}} (r) / \bar{a}^{2} \equiv 1$ if allele A is randomly distributed. $\hat{g}_{m_{1} m_{2}} (r) / \bar{a}^{2} > 1$ for small r suggests that the neighbouring individuals tend to be AA homozygotes (or A* heterozygotes if ā<0.5), while $\hat{g}_{m_{1} m_{2}} (r) / \bar{a}^{2} < 1$ indicates that the neighbouring individuals tend not to share this allele. In contrast, Moran's I takes a product of (a(i) − ā)(a(j) − ā); thus, Î_A(r) > 0 suggests either or both, and cannot distinguish between the two cases. However, matrix U (equation (21)) includes three zero components, meaning that this measure ignores half of the spatial information.
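To see how the three weighting systems treat the same pair proportions, the sketch below applies Moran's I weights, NAC weights and mark-correlation weights to the proportions expected under a random genotype arrangement with Hardy–Weinberg frequencies. Everything here is illustrative: the helper names are hypothetical, and the NAC weights are an assumption based on the text's description (entries in {0, 1, 2}, with AA–AA, AB–AB and BB–BB treated as equally similar).

```python
MARKS = (1.0, 0.5, 0.0)  # a(i) for genotypes AA, A*, ** (indices 0, 1, 2)
PAIRS = [(k, l) for k in range(3) for l in range(k, 3)]

# Assumed biallelic NAC weights: alleles shared by each genotype pair.
NAC_WEIGHTS = {(0, 0): 2, (0, 1): 1, (0, 2): 0, (1, 1): 2, (1, 2): 1, (2, 2): 2}


def moran_weights(a_bar, v):
    """Weights (a_K - a_bar)(a_L - a_bar)/v for the six pair classes."""
    return {(k, l): (MARKS[k] - a_bar) * (MARKS[l] - a_bar) / v for k, l in PAIRS}


def mark_corr_weights(a_bar):
    """Weights a_K * a_L / a_bar^2 for the mark correlation function."""
    return {(k, l): MARKS[k] * MARKS[l] / a_bar ** 2 for k, l in PAIRS}


def random_proportions(a_bar):
    """Pair-class proportions when genotypes are independent of location."""
    geno = (a_bar ** 2, 2 * a_bar * (1 - a_bar), (1 - a_bar) ** 2)
    return {(k, l): geno[k] * geno[l] * (1 if k == l else 2) for k, l in PAIRS}


def weighted_sum(weights, proportions):
    return sum(weights[p] * proportions[p] for p in PAIRS)


a_bar = 0.8
props = random_proportions(a_bar)
moran = weighted_sum(moran_weights(a_bar, a_bar * (1 - a_bar) / 2), props)
mark_corr = weighted_sum(mark_corr_weights(a_bar), props)
nac = weighted_sum(NAC_WEIGHTS, props)
```

Under this random baseline Moran's I is 0 and the normalised mark correlation is 1, as stated in the text, while three of the six mark-correlation weights are zero. Replacing `props` with observed proportions Ĵ_{K−L}(r)/Ĵ(r) would show how each measure departs from its baseline.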

The effects of the different weighting systems are demonstrated below.

Field data

The above methodology is demonstrated by field data of Fagus crenata (population AK) from Takahashi et al (2000). Fagus crenata is widely distributed in cool temperate forest in Japan, especially in areas with abundant snowfall. This species is wind-pollinated, self-incompatible and has limited seed dispersal. The study site was once old-growth forest dominated by F. crenata, harvested approximately 80 years ago preserving some seed trees, then, naturally regenerated. The stand is currently covered with secondary F. crenata forest. Takahashi et al (2000) genotyped 486 individuals in the 0.77 ha plot for nine isozyme loci. This paper primarily uses the genetic data of two loci; Mdh-3 (EC 1.1.1.37) and Aap-1 (EC 3.4.11.2) (Figure 2). The details of the data are described in Takahashi et al (2000).

Application to field data

Figure 1 illustrates the reformulated Moran's I statistics for within-individual frequencies (Î(r), equations (12), (16) and (17)) with respect to allele b (frequency=0.80) of Mdh-3 and allele b (frequency=0.94) of Aap-1, and (single-locus) NÂC(r) of each locus for the Fagus crenata population. The biallelic approximation (equations (18) and (19)) was used for NÂC(r) because the third alleles have frequencies of only 0.2–0.3%.

For the Mdh-3 locus, the monotonically decreasing Î_b(r) suggests spatial genetic structure whereas NÂC(r) shows no clear tendency. For Aap-1, both Î_b(r) and NÂC(r) indicate a decreasing trend up to 10 m, but Î_b(r) takes its maximum at 4 m. Applying equations (16), (17), (18), (19), (20) and (21) allows us to examine what caused the differences between the two functions and between the two loci, and reveals details of the spatial genetic patterns.

Figure 3 illustrates the six product density functions {Ĵ_{K − L}(r)} (1≤K≤L≤3) (equation (15)). Dividing Ĵ_{K − L}(r) by their sum (=Ĵ(r)), the conditional probabilities that a randomly selected pair have genotypes K and L, given that their interdistance is r, are obtained (Figure 4). Multiplying Ĵ_{K − L}(r)/Ĵ(r) by the weights S_KL of equation (17), evaluated for each locus (matrices S_Mdh-3 and S_Aap-1), yields six curves whose sum is equal to Î_b(r) (Figure 5). Changing the weight matrix to T (equation (19)), another six curves are obtained whose sum is NÂC(r) (Figure 6).

Mdh-3

There appear to be a great number of bb–bb and bb–b* joins for short distances but not as many for long distances (Figure 3a). This is largely because of the clustering of trees themselves rather than the clustering of genotypes on the trees. In fact, compared with the long interdistances, the ratios Ĵ_{bb − bb}(r)/Ĵ(r) and Ĵ_{bb − b*}(r)/Ĵ(r) are smaller for short distances (Figure 4a); instead, ratios for b*–b*, b*–**, and **–** are greater for short distances than for long distances. This means that arbitrarily chosen neighbouring trees are expected to be bb–bb or bb–b* because allele b is in the majority, that the probabilities of selecting b*–b*, b*–**, or **–** are small because these genotypes are in the minority, and that minor genotypes are chosen more frequently for neighbouring pairs than for separated pairs.

When the six percentage functions Ĵ_{K − L}(r)/Ĵ(r) are multiplied by matrix S_Mdh-3, components b*–b*, b*–**, and **–** are amplified, 1.10/0.48–7.81/0.48 ≈2–16 times more than bb–bb, and their tendencies of monotonic decrease become apparent, resulting in the monotonically decreasing Î_b(r) (Figure 5a). On the other hand, NÂC(r) uses the matrix consisting of only {0, 1, 2} (equation (19)). All the curves stay almost constant, and NÂC(r) reveals no clear spatial structure for Mdh-3 (Figure 6). The mark correlation function (Figure 7) also suggests that minor allele c is clustering while major allele b is not. Î_b(r) intensifies the former characteristic, visualizing the spatial structure in the Mdh-3 locus, while NÂC(r) with no amplification does not express this pattern.

Aap-1

The allele frequency is highly biased to the major allele b. This results in large Ĵ_{bb − bb}(r) and Ĵ_{bb − b*}(r), while the other four nearly overlap the horizontal axis (Figure 3b). Taking their percentages, Figure 4b shows that up to 10 m, Ĵ_{bb − bb}(r)/Ĵ(r) monotonically decreases while Ĵ_{bb − b*}(r)/Ĵ(r) monotonically increases. NÂC(r) reflects only the two major curves, resulting in the clear spatial structure up to 10 m (Figure 1c). In contrast, Î_b(r) (Figure 1b) indicates the strongest spatial structure at 4 m. This contrasting result was caused by the spatial pattern of **-trees; although there is a sharp peak at 4 m, it does not become visible until the weight matrix S_Aap-1 intensifies **–** joins 26.29/0.1=263 times more than bb–bb joins (Figure 5b).

In summary, NÂC(r) and Î_b(r) produce similar decreasing graphs, especially if calculated by 5 m distance classes, because averaging over 0–5 m removes the peak at 4 m (Figure 1b). However, the former is mostly because of the spatial pattern of the major allele whereas in the latter, the minor alleles make a greater contribution.

Discussion

Individual-based spatial genetic studies have applied Moran's I statistics for interval data to three discrete variables, which can be written as a weighted sum of the six join-count statistics (Epperson, 1995). There have been two evaluations of the utility of this statistic and two directions in developing spatial statistics. Epperson (1995) suggested classifying individuals by their genotypes and applying full join-count statistics to ensure the high resolution of spatial data. In contrast, Smouse and Peakall (1999) aimed to develop a statistic that summarizes spatial patterns and has as much statistical power as possible for testing the randomness of genetic patterns. Because full join-count statistics reflect the original genotypic information, they can reveal various characteristics of spatial pattern, whereas information loss is inevitable for summarized statistics. On the other hand, even the summarized autocorrelations tend to be sensitive to stochastic variance (Slatkin and Arter, 1991), and join-count statistics, which must be calculated from smaller sample sizes than their weighted sum, should be more vulnerable to stochastic effects. The two approaches have contrasting advantages and disadvantages. However, they can complement each other if summarised statistics can be decomposed into components. In fact, the statistics introduced in Smouse and Peakall (1999) are defined for multilocus genotypes as well as decomposable into loci and alleles; therefore, one can check which alleles/loci largely contribute to the summarised statistics and which have little influence. This paper has extended this approach and shown that by changing the values of the weighting matrix, both Moran's I for within-individual frequencies and NAC are decomposable into join-count statistics.
Hence, under the spatial analysis proposed here, join-count statistics (simplified by packing the other alleles into type *) are used to examine the roles of each genotype in the summarized statistics, and thus play relatively complementary roles, which differs from the method suggested in Epperson (1995) in which many join-count statistics should play a central role. Instead, we may cumulatively analyse spatial genetic patterns from the genotypic level to the allele level, and possibly to the single-locus and multilocus level, and examine their relations in the hierarchy.

Some previous studies have separately analysed individual distribution and genes by Ripley's K-function of point processes (Berg and Hamrick, 1995) or Morishita's index of dispersion (I_δ) (Ueno et al, 2000), and by spatial autocorrelation based on the lattice theory, respectively. The introduction of point processes involves fundamental conceptual changes. First, all of the spatial analyses are established on a common theoretical base, from the spatial distribution of individual trees to the spatial distribution of genetic variation, at different levels indicated above. More importantly, the lattice theory fixes individual locations whereas the point process treats them as random variables, and the goal is no longer to test the spatial randomness but includes the construction of stochastic models that can simultaneously explain individual distributions and their genotypes (Shimatani, 2002). It is currently quite common for ecologists to use genetic markers as tools for ecological studies, called molecular ecology. The marked point process provides appropriate analytical tools when we examine populations for ecological purposes such as forest dynamics, by means of genetic markers, which may involve new insights into population genetics.

Takahashi et al (2000) suggested that the presence of the spatial genetic structure represented by Moran's I statistics is a result of regeneration from limited seed trees because then offspring surrounding the mother tend to share alleles inherited from the mother. Moreover, applying point process models in which the genetic structure was represented by the average of the reformulated Moran's I statistics, Shimatani (2002) quantitatively estimated the number of seed trees as moderately limited (eg, 35 trees/ha) rather than very limited (eg, 10 trees/ha), suggesting advance reproduction of harvested adults. Takahashi et al (2000) also illustrated a map of genotypic distribution for the Pgi-1 locus in which a preserved tree with a minor allele d is surrounded by young trees with this allele (this observation can be quantitatively assessed by the application of the decomposition analysis introduced here to the Pgi-1 locus, which actually indicated the clustering of heterozygotes c*, mostly cd). This paper also shows the clustering of minor alleles for Mdh-3 and Aap-1 (in fact, seven cc-trees are clumped into two patches, see Figure 2).

Spatial patterns of minor alleles reflect the founder effect: regeneration from a limited number of seed trees. For the above three loci, Î_A(r)'s property of intensifying the minor alleles' information worked appropriately. In contrast, the Dia-1 locus has four alleles, and two minor alleles, a and c, are separately distributed in the plot. In such cases, because Î_A(r) specifies one allele and packs all the others together, Î_A(r) for any allele cannot effectively illustrate the spatial pattern at the locus while NÂC(r) may. The advantages and disadvantages of each function should be extensively examined to characterize spatial patterns more adequately and to construct stochastic models.

In conclusion, spatial analysis for mapped, genotyped individuals should not rely on one statistic, and the simultaneous use of several point process functions is recommended. If the locus is close to biallelic, the above demonstration suggests fixing an allele with sufficient frequency, packing all the other alleles into one group, calculating the six product density functions, providing weights depending upon the function, drawing their curves, and then examining the spatial genetic pattern. The comprehensive application of point process functions provides analytical tools for spatial data of genotyped individuals in population genetics and molecular ecology.

References

Berg EE, Hamrick JL (1995). Fine-scale genetic structure of a Turkey oak forest. Evolution 49: 110–120.
Chung MG, Epperson BK (2000). Clonal and spatial genetic structure in Eurya emarginata (Theaceae). Heredity 84: 170–177.
Cliff AD, Ord JK (1981). Spatial Processes: Models and Applications. Pion Ltd: London.
Epperson B (1995). Fine-scale spatial structure: correlations for individual genotypes differ from those for local gene frequencies. Evolution 49: 1022–1026.
Geburek T, Tripp-Knowles P (1994). Genetic architecture in bur oak, Quercus macrocarpa (Fagaceae), inferred by means of spatial autocorrelation analysis. Plant Syst Evol 189: 63–74.
Haase P (1995). Spatial pattern analysis in ecology based on Ripley's K-function: Introduction and methods of edge correction. J Veg Sci 6: 575–582.
Heywood JS (1991). Spatial analysis of genetic variation in plant populations. Annu Rev Ecol Syst 22: 335–355.
Leonardi S, Menozzi P (1996). Spatial structure of genetic variability in natural stands of Fagus sylvatica L. (beech) in Italy. Heredity 77: 359–368.
Loiselle BA, Sork VL, Nason J, Graham C (1995). Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot 82: 1420–1425.
Penttinen A, Stoyan D, Henttonen HM (1992). Marked point processes in forest statistics. For Sci 38: 806–824.
Shimatani K (2002). Point processes for fine-scale spatial genetics and molecular ecology. Biom J 44: 325–352.
Slatkin M, Arter HE (1991). Spatial autocorrelation methods in population genetics. Am Nat 138: 499–517.
Smouse PE, Peakall R (1999). Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity 82: 561–573.
Sokal RR, Oden NL (1978). Spatial autocorrelation in biology. 1. Methodology. Biol J Linn Soc 10: 199–228.
Stoyan D, Stoyan H (1994). Fractals, Random Shapes and Point Fields. John Wiley and Sons: Chichester.
Stoyan D, Penttinen A (2000). Recent applications of point process methods in forestry statistics. Statist Sci 15: 61–78.
Streiff R, Labbe T, Bacilieri R, Steinkellner H, Glössl J, Kremer A (1998). Within-population genetic structure in Quercus robur L. and Quercus petraea (Matt.) Liebl. assessed with isozymes and microsatellites. Mol Ecol 7: 317–328.
Surles SE, Arnold J, Schnabel A, Hamrick JL, Bongarten BC (1990). Genetic relatedness in open-pollinated families of two leguminous tree species, Robinia pseudoacacia L. and Gleditsia triacanthos L. Theor Appl Genet 80: 49–56.
Takahashi M, Mukouda M, Koono K (2000). Differences in genetic structure between two Japanese beech (Fagus crenata Blume) stands. Heredity 84: 103–115.
Ueno S, Tomaru N, Yoshimaru H, Manabe T, Yamamoto S (2000). Genetic structure of Camellia japonica L. in an old-growth evergreen forest, Tsushima, Japan. Mol Ecol 9: 647–656.
Xie CY, Knowles P (1991). Spatial genetic substructure within natural populations of jack pine (Pinus banksiana). Can J Bot 69: 547–551.

Acknowledgements

We thank two anonymous referees for critically commenting on the manuscript.

Author information

Authors and Affiliations

The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato, Tokyo, 106-8569, Japan
K Shimatani
Forest Tree Breeding Center, 3809-1, Ishi, Juo, Taga, Ibaraki, 319-1301, Japan
M Takahashi

Corresponding author

Correspondence to K Shimatani.

About this article

Cite this article

Shimatani, K., Takahashi, M. On methods of spatial analysis for genotyped individuals. Heredity 91, 173–180 (2003). https://doi.org/10.1038/sj.hdy.6800295

Received: 31 May 2002
Accepted: 25 February 2003
Published: 28 July 2003
Issue Date: 01 August 2003
DOI: https://doi.org/10.1038/sj.hdy.6800295
