Abstract
An increasing number of field studies have shown that the phenotype of an individual plant depends not only on its genotype but also on those of neighboring plants; however, this fact is not taken into consideration in genome-wide association studies (GWAS). Based on the Ising model of ferromagnetism, we incorporated neighbor genotypic identity into a regression model, named “Neighbor GWAS”. Our simulations showed that the effective range of neighbor effects could be estimated using an observed phenotype when the proportion of phenotypic variation explained (PVE) by neighbor effects peaked. The spatial scale of the first nearest neighbors gave the maximum power to detect the causal variants responsible for neighbor effects, unless their effective range was too broad. However, if the effective range of the neighbor effects was broad and minor allele frequencies were low, there was collinearity between the self and neighbor effects. To suppress the false positive detection of neighbor effects, the fixed effect and variance components involved in the neighbor effects should be tested in comparison with a standard GWAS model. We applied neighbor GWAS to field herbivory data from 199 accessions of Arabidopsis thaliana and found that neighbor effects explained 8% more of the PVE of the observed damage than standard GWAS. The neighbor GWAS method provides a novel tool that could facilitate the analysis of complex traits in spatially structured environments and is available as an R package at CRAN (https://cran.rproject.org/package=rNeighborGWAS).
Similar content being viewed by others
Introduction
Plants are immobile and thus cannot escape their neighbors. In natural and agricultural systems, individual phenotypes depend not only on the plants’ own genotype but also on the genotypes of other neighboring plants (Tahvanainen and Root 1972; Barbosa et al. 2009; Underwood et al. 2014). This phenomenon has been termed neighbor effects or associational effects in plant ecology (Barbosa et al. 2009; Underwood et al. 2014; Sato 2018). Such neighbor effects were initially reported as a form of interspecific interaction among different plant species (Tahvanainen and Root 1972), but many studies have illustrated that neighbor effects occur among different genotypes within a plant species with respect to: (i) herbivory (Schuman et al. 2015; Sato 2018; Ida et al. 2018), (ii) pathogen infections (Mundt 2002; Zeller et al. 2012), and (iii) pollinator visitations (Underwood et al. 2020). Although neighbor effects are of considerable interest in plant science (Dicke and Baldwin 2010; Erb 2018) and agriculture (Zeller et al. 2012; Dahlin et al. 2018), they are often not considered in quantitative genetic analyses of field-grown plants.
Complex mechanisms underlie neighbor effects through direct competition (Weiner 1990), herbivore and pollinator movement (Bergvall et al. 2006; Verschut et al. 2016; Underwood et al. 2020), and volatile communication among plants (Schuman et al. 2015; Dahlin et al. 2018). For example, lipoxygenase (LOX) genes govern jasmonate-mediated volatile emissions in wild tobacco (Nicotiana attenuata) that induce defenses of neighboring plants (Schuman et al. 2015). Even if direct plant–plant communications are absent, herbivores can mediate indirect interactions between plant genotypes (Sato and Kudoh 2017; Ida et al. 2018). For example, the GLABRA1 gene is known to determine hairy or glabrous phenotypes in Arabidopsis plants (Hauser et al. 2001), and the flightless leaf beetle (Phaedon brassicae) is known to prefer glabrous plants to hairy ones (Sato et al. 2017). Consequently, hairy plants escape herbivory when surrounded by glabrous plants (Sato and Kudoh 2017). Yet, there are few hypothesis-free approaches currently available for the identification of the key genetic variants responsible for plant neighborhood effects.
Genome-wide association studies (GWAS) have been increasingly adopted to resolve the genetic architecture of complex traits in the model plant, Arabidopsis thaliana (Atwell et al. 2010; Seren et al. 2017; Togninalli et al. 2018), and crop species (Hamblin et al. 2011). The interactions of plants with herbivores (Brachi et al. 2015; Nallu et al. 2018), microbes (Horton et al. 2014; Wang et al. 2018), and other plant species (Frachon et al. 2019) are examples of the complex traits that are investigated through the lens of GWAS. To distinguish causal variants from the genome structure, GWAS often employs a linear mixed model with kinship considered as a random effect (Kang et al. 2008; Korte and Farlow 2013). However, because of combinatorial explosion, it is generally impossible to test the full set of inter-genomic locus-by-locus interactions (Gondro et al. 2013); thus, some feasible and reasonable approach should be developed for the GWAS of neighbor effects.
To incorporate neighbor effects into GWAS, we have focused on a theoretical model of neighbor effects in magnetic fields, known as the Ising model (Ising 1925; McCoy and Maillard 2012), which has been applied to forest gap dynamics (Kizaki and Katori 1999; Schlicht and Iwasa 2004) and community assembly (Azaele et al. 2010) in plant ecology. Using the Ising analogy, we compare individual plants to a magnet: the two alleles at each locus correspond to the north and south dipoles, and genome-wide multiple testing across all loci is analogous to a number of parallel two-dimensional layers. The Ising model has a clear advantage in its interpretability, such that: (i) the optimization problem for a population sum of trait values can be regarded as an inverse problem of a simple linear model, (ii) the sign of neighbor effects determines the model’s trend with regard to the generation of a clustered or checkered spatial pattern of the two states, and (iii) the self-genotypic effect determines the general tendency to favor one allele over another (Fig. 1).
In this study, we proposed a new methodology integrating GWAS and the Ising model, named “neighbor GWAS.” The method was applied to simulated phenotypes and actual data of field herbivory on A. thaliana. We addressed two specific questions: (i) what spatial and genetic factors influenced the power to detect causal variants? and (ii) were neighbor effects significant sources of leaf damage variation in field-grown A. thaliana? Based on the simulation and application, we determined the feasibility of our approach to detect neighbor effects in field-grown plants.
Materials and methods
Neighbor GWAS
Basic model
We analyzed neighbor effects in GWAS as an inverse problem of the two-dimensional Ising model, named “neighbor GWAS” hereafter (Fig. 1). We considered a situation where a plant accession has one of two alleles at each locus, and a number of accessions occupied a finite set of field sites, in a two-dimensional lattice. The allelic status at each locus was represented by x, and so the allelic status at each locus of the ith focal plant and the jth neighboring plants was designated as xi(j)∈{−1, +1}. Based on a two-dimensional Ising model, we defined a phenotype value for the ith focal individual plant yias:
where β1 and β2 denoted self-genotype and neighbor effects, respectively. If two neighboring plants shared the same allele at a given locus, the product xixj turned into (−1) × (−1) = +1 or (+1) × (+1) = +1. If two neighbors had different alleles, the product xixj became (−1) × (+1) = −1 or (+1) × (−1) = −1. Accordingly, the effects of neighbor genotypic identity on a particular phenotype depended on the coefficient β2 and the number of the two alleles in a neighborhood. If the numbers of identical and different alleles were the same near a focal plant, these neighbors offset the sum of the products between the focal plant i and all j neighbors \(\mathop {\sum }\nolimits_{ < i,j > } x_ix_j\) and exerted no effects on a phenotype. When we summed up the phenotype values for the total number of plants n and replaced it as E = −β2, H = −β1 and εI = Σyi, Eq. 1 could be transformed into \({\it{\epsilon }}_I = - {E}\mathop {\sum }\nolimits_{ < i,j > } x_ix_j - {H}\mathop {\sum }x_i\), which defined the interaction energy of a two-dimensional ferromagnetic Ising model (McCoy and Maillard 2012). The neighbor effect β2 and self-genotype effect β1 were interpreted as the energy coefficient E and external magnetic effects H, respectively. An individual plant represented a spin and the two allelic states of each locus corresponded to a north or south dipole. The positive or negative value of Σxixj indicated a ferromagnetism or paramagnetism, respectively. In this study, we did not consider the effects of allele dominance because this model was applied to inbred plants. However, heterozygotes could be processed if the neighbor covariate xixj was weighted by an estimated degree of dominance in the self-genotypic effects on a phenotype.
Association tests
For association mapping, we needed to determine β1 and β2 from the observed phenotypes and considered a confounding sample structure as advocated by previous GWAS (e.g., Kang et al. 2008; Korte and Farlow 2013). Extending the basic model (Eq. (1)), we described a linear mixed model at an individual level as:
where β0 indicated the intercept; the term β1xi represented fixed self-genotype effects as tested in standard GWAS; and β2 was the coefficient of fixed neighbor effects. The neighbor covariate \(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\) indicated a sum of products for all combinations between the ith focal plant and the jth neighbor at the sth spatial scale from the focal plant i, and was scaled by the number of neighboring plants, L. The number of neighboring plants L was dependent on the spatial scale s to be referred. Variance components due to the sample structure of self and neighbor effects were modeled by a random effect \(u_i \in {\boldsymbol{u}}\) and \({\boldsymbol{u}}\sim {\mathrm{Norm}}(\textbf{0},\sigma _1^2{\boldsymbol{K}}_1 + \sigma _2^2{\boldsymbol{K}}_2)\). The residual was expressed as \(e_i \in {\boldsymbol{e}}\) and \({\boldsymbol{e}}\sim {\mathrm{Norm}}(\textbf{0},\sigma _e^2{\boldsymbol{I}})\), where I represented an identity matrix.
Variation partitioning
To estimate the proportion of phenotypic variation explained (PVE) by the self and neighbor effects, we utilized variance component parameters in linear mixed models. The n × n variance-covariance matrices represented the similarity in self-genotypes (i.e., kinship) and neighbor covariates among n individual plants as \({\boldsymbol{K}}_1 = \frac{1}{{q - 1}}{\boldsymbol{X}}_1^{\mathrm{T}}{\boldsymbol{X}}_1\) and \({\boldsymbol{K}}_2 = \frac{1}{{q - 1}}{\boldsymbol{X}}_2^{\mathrm{T}}{\boldsymbol{X}}_2\), where the elements of n plants × q markers matrix X1 and X2 consisted of explanatory variables for the self and neighbor effects as X1 = (xi) and \({\boldsymbol{X}}_2 = (\frac{{\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}}}{L})\). As we defined \(x_{i(j)} \in\){+1, −1}, the elements of the kinship matrix K1 were scaled to represent the proportion of marker loci shared among n × n plants such that \({\boldsymbol{K}}_1 = \left( {\frac{{k_{ij}\, + \,1}}{2}} \right)\);\(\sigma _1^2\)and \(\sigma _2^2\) indicated variance component parameters for the self and neighbor effects.
The individual-level formula Eq. (2) could also be converted into a conventional matrix form as:
where y was an n × 1 vector of the phenotypes; X was a matrix of fixed effects, including a unit vector, self-genotype xi, neighbor covariate \(({\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}})/{L}\), and other confounding covariates for n plants; β was a vector that represents the coefficients of the fixed effects; Z was a design matrix allocating individuals to a genotype, and became an identity matrix if all plants were different accessions; u was the random effect with Var(u) =\(\sigma _1^2{\boldsymbol{K}}_1 + \sigma _2^2{\boldsymbol{K}}_2\); and e was residual as Var(e) = \(\sigma _e^2{\boldsymbol{I}}\).
Because our objective was to test for neighbor effects, we needed to avoid the detection of false positive neighbor effects. The self-genotype value xi and neighbor genotypic identity \(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\) would become correlated explanatory variables in a single regression model (sensu colinear) due to the minor allele frequency (MAF) and the spatial scale of s. When MAF is low, neighbors \(x_j^{(s)}\) are unlikely to vary in space and most plants will have similar values for neighbor identity \(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\). Furthermore, if the neighbor effects range was broad enough to encompass an entire field (i.e., s→∞), the neighbor covariate and self-genotype xiwould become colinear according to the equation: \(\left(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\right)/L = x_i\left( {\mathop {\sum }\nolimits_{j = 1}^L x_j^{\left( s \right)}} \right)/L = x_i\bar x_j\), where \(\bar x_j\) indicates a population-mean of neighbor genotypes and corresponds to a population-mean of self-genotype values \(\bar x_i\), if s→∞. The standard GWAS is a subset of the neighbor GWAS and these two models become equivalent at s = 0 and \(\sigma _2^2\) = 0. When testing the self-genotype effect β1, we recommend that the neighbor effects and its variance component \(\sigma _2^2\) should be excluded; otherwise, the standard GWAS fails to correct a sample structure because of the additional variance component at \(\sigma _2^2\, \ne \,0\). To obtain a conservative conclusion, the significance of β2 and \(\sigma _2^2\) should be compared using the standard GWAS model based on self-effects alone.
Given the potential collinearity between the self and neighbor effects, we defined different metrics for the proportion of phenotypic variation explained (PVE) based on self or neighbor effects. Using a single-random effect model, we calculated PVE for either the self or neighbor effects as follows:
‘single’ PVEself = \(\sigma _1^2/(\sigma _1^2 + \sigma _e^2)\)when s and \(\sigma _2^2\) were set at 0, or
‘single’ PVEnei = \(\sigma _2^2/(\sigma _2^2 + \sigma _e^2)\)when \(\sigma _1^2\) was set at 0.
Using a two-random effect model, we could focus on one variable while considering relationships between two variables (sensu partial out) for either of the two variance components. We defined such a partial PVE as:
‘partial’ PVEself = \(\sigma _1^2/(\sigma _1^2 + \sigma _2^2 + \sigma _e^2)\) and
‘partial’ PVEnei = \(\sigma _2^2/(\sigma _1^2 + \sigma _2^2 + \sigma _e^2)\).
As the partial PVEself was equivalent to the single PVEself when s was set at 0, the net contribution of neighbor effects at s ≠ 0 was given as
‘net’ PVEnei = (partial PVEself + partial PVEnei) − single PVEself,
which indicated the proportion of phenotypic variation that could be explained by neighbor effects, but not by the self-genotype effects.
Simulation
To examine the model performance, we applied the neighbor GWAS to simulated phenotypes. Phenotypes were simulated using a subset of the actual A. thaliana genotypes. To evaluate the performance of the simple linear model, we assumed a complex ecological form of neighbor effects with multiple variance components controlled. The model performance was evaluated in terms of the causal variant detection and accuracy of estimates. All analyses were performed using R version 3.6.0 (R Core Team 2019).
Genotype data
To consider a realistic genetic structure in the simulation, we used part of the A. thaliana RegMap panel (Horton et al. 2012). The genotype data for 1307 accessions were downloaded from the Joy Bergelson laboratory website (http://bergelson.uchicago.edu/?page_id=790 accessed on February 9, 2017). We extracted data for chromosomes 1 and 2 with MAF at >0.1, yielding a matrix of 1307 plants with 65,226 single nucleotide polymorphisms (SNPs). Pairwise linkage disequilibrium (LD) among the loci was r2 = 0.003 [0.00–0.06: median with upper and lower 95 percentiles]. Before generating a phenotype, genotype values at each locus were standardized to a mean of zero and a variance of 1. Subsequently, we randomly selected 1,296 accessions (= 36 × 36 accessions) without any replacements for each iteration and placed them in a 36 × 72 checkered space, following the Arabidopsis experimental settings (see Fig. S1).
Phenotype simulation
To address ecological issues specific to plant neighborhood effects, we considered two extensions, namely asymmetric neighbor effects and spatial scales. Previous studies have shown that plant–plant interactions between accessions are sometimes asymmetric under herbivory (e.g., Bergvall et al. 2006; Verschut et al. 2016; Sato and Kudoh 2017) and height competition (Weiner 1990); where one focal genotype is influenced by neighboring genotypes, while another receives no neighbor effects. Such asymmetric neighbor effects can be tested by statistical interaction terms in a linear model (Bergvall et al. 2006; Sato and Kudoh 2017). Several studies have also shown that the strength of neighbor effects depends on spatial scales (Hambäck et al. 2014), and that the scale of neighbors to be analyzed relies on the dispersal ability of the causative organisms (see Hambäck et al. 2009; Sato and Kudoh 2015; Verschut et al. 2016; Ida et al. 2018 for insect and mammal herbivores; Rieux et al. 2014 for pathogen dispersal) or the size of the competing plants (Weiner 1990). We assumed the distance decay at the sth sites from a focal individual i with the decay coefficient α as \(w\left( {s,\alpha } \right) = {\mathrm{e}}^{ - \alpha (s - 1)}\), since such an exponential distance decay has been widely adopted in empirical studies (Devaux et al. 2007; Carrasco et al. 2010; Rieux et al. 2014; Ida et al. 2018). Therefore, we assumed a more complex model for simulated phenotypes than the model for neighbor GWAS as follows:
where β12 was the coefficient for asymmetry in neighbor effects. By incorporating an asymmetry coefficient, the model (Eq. (4)) can deal with cases where neighbor effects are one-sided or occur irrespective of a focal genotype (Fig. 2). Total variance components resulting from three background effects (i.e., the self, neighbor, and self-by-neighbor effects) were defined as \(u_i \in {\boldsymbol{u}}\) and \({\boldsymbol{u}}\sim {\mathrm{Norm}}(\textbf{0},\sigma _1^2{\boldsymbol{K}}_1 + \sigma _2^2{\boldsymbol{K}}_2 + \sigma _{12}^2{\boldsymbol{K}}_{12})\). The three variance component parameters \(\sigma _1^2\), \(\sigma _2^2\), and \(\sigma _{12}^2\), determined the relative importance of the self-genotype, neighbor, and asymmetric neighbor effects in ui. Given the elements of n plants × q marker explanatory matrix with \({\boldsymbol{X}}_{12} = (\frac{{x_i}}{L}\mathop {\sum }\nolimits_{ < i,j > }^L w(s,\alpha )x_ix_j^{(s)})\), the similarity in asymmetric neighbor effects was calculated as \({\boldsymbol{K}}_{12} = \frac{1}{q-1}{\boldsymbol{X}}_{12}^{\mathrm{T}}{\boldsymbol{X}}_{12}\). To control phenotypic variations, we further partitioned the proportion of phenotypic variation into those explained by the major-effect genes and variance components PVEβ + PVEu, major-effect genes alone PVEβ, and residual error PVEe, where PVEβ + PVEu + PVEe = 1. The optimize function in R was used to adjust the simulated phenotypes to the given amounts of PVE.
Parameter setting
Ten phenotypes were simulated with varying combination of the following parameters, including the distance decay coefficient α, the proportion of phenotypic variation explained by the major-effect genes PVEβ, the proportion of phenotypic variation explained by major-effect genes and variance components PVEβ + PVEu, and the relative contributions of self, symmetric neighbor, and asymmetric neighbor effects, i.e., PVEself:PVEnei:PVEs×n. We ran the simulation with different combinations, including α = 0.01, 1.0, or 3.0; PVEself:PVEnei:PVEs×n = 8:1:1, 5:4:1, or 1:8:1; and PVEβ and PVEβ + PVEu = 0.1 and 0.4, 0.3 and 0.4, 0.3 and 0.8, or 0.6 and 0.8. The maximum reference scale was fixed at s = 3. The line of simulations was repeated for 10, 50, or 300 causal SNPs to examine cases of oligogenic and polygenic control of a trait. The non-zero coefficients (i.e., signals) for the causal SNPs were randomly sampled from −1 or 1 digit and then assigned, as some causal SNPs were responsible for both the self and neighbor effects. Of the total number of causal SNPs, 15% had self, neighbor, and asymmetric neighbor effects (i.e.,β1 ≠ 0 and β2 ≠ 0 and β12 ≠ 0); another 15% had both the self and neighbor effects, but no asymmetry in the neighbor effects (β1 ≠ 0 and β2 ≠ 0 and β12 ≠ 0); another 35% had self-genotypic effects only (β1 ≠ 0); and the remaining 35% had neighbor effects alone (β2 ≠ 0). Given its biological significance, we assumed that some loci having neighbor signals possessed asymmetric interactions between the neighbors (β2 ≠ 0 and β12 ≠ 0), while the others had symmetric interactions (β2 ≠ 0 and β12 ≠ 0). Therefore, the number of causal SNPs in β12 was smaller than that in the main neighbor effects β2. According to this assumption, the variance component \(\sigma _{12}^2\) was also assumed to be smaller than \(\sigma _2^2\). To examine extreme conditions and strong asymmetry in neighbor effects, we additionally analyzed the cases with PVEself:PVEnei:PVEs×n = 1:0:0, 0:1:0, or 1:1:8.
Summary statistics
The simulated phenotypes were fitted by Eq. (2) to test the significance of coefficients β1 and β2, and to estimate single or partial PVEself and PVEnei. To deal with potential collinearity between xi and neighbor genotypic identity \(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\), we performed likelihood ratio tests between the self-genotype effect model and the model with both self and neighbor effects, which resulted in conservative tests of significance for β2 and \(\sigma _2^2\). The simulated phenotype values were standardized to have a mean of zero and a variance of 1, where true β was expected to match the estimated coefficients \(\hat \beta\) when multiplied by the standard deviation of non-standardized phenotype values. The likelihood ratio was calculated as the difference in deviance, i.e., −2 × log-likelihood, which is asymptotically χ2 distributed with one degree of freedom. The variance components, \(\sigma _1^2\) and \(\sigma _2^2\), were estimated using a linear mixed model without any fixed effects. To solve the mixed model with the two random effects, we used the average information restricted maximum likelihood (AI-REML) algorithm implemented in the lmm.aireml function in the gaston package of R (Perdry and Dandine-Roulland 2018). Subsequently, we replaced the two variance parameters \(\sigma _1^2\) and \(\sigma _2^2\) in Eq. (2) with their estimates \(\hat \sigma _1^2\) and \(\hat \sigma _2^2\) from the AI-REML, and performed association tests by solving a linear mixed model with a fast approximation, using eigenvalue decomposition (implemented in the lmm.diago function: Perdry and Dandine-Roulland 2018). The model likelihood was computed using the lmm.diago.profile.likelihood function. We evaluated the self and neighbor effects for association mapping based on the forward selection of the two fixed effects, β1 and β2, as described below:
-
1.
Computed the null likelihood with \(\sigma _1^2\, \ne \,0\) and \(\sigma _2^2 = 0\) in Eq. (2).
-
2.
Tested the self-effect, β1, by comparing with the null likelihood.
-
3.
Computed the self-likelihood with \(\hat \sigma _1^2\), \(\hat \sigma _2^2\), and β1 using Eq. (2).
-
4.
Tested the neighbor effects, β2, by comparing with the self-likelihood.
We also calculated PVE using the mixed model (Eq. (3)) without β1 and β2 as follows:
-
1.
Calculated single PVEself or single PVEnei by setting either \(\sigma _1^2\) or \(\sigma _2^2\) at 0.
-
2.
Tested the single PVEself or single PVEnei using the likelihood ratio between the null and one-random effect model.
-
3.
Calculated the partial PVEself and partial PVEnei by estimating \(\sigma _1^2\) and \(\sigma _2^2\) simultaneously.
-
4.
Tested the partial PVEself and partial PVEnei using the likelihood ratio between the two- and one-random effect model.
We inspected the model performance based on causal variant detection, PVE estimates, and effect size estimates. The true or false positive rates between the causal and non-causal SNPs were evaluated using ROC curves and area under the ROC curves (AUC) (Gage et al. 2018). An AUC of 0.5 would indicate that the GWAS has no power to detect true signals, while an AUC of 1.0 would indicate that all the top signals predicted by the GWAS agree with the true signals. In addition, the sensitivity to distinguish self or neighbor signals (i.e., either β1 ≠ 0 or β2 ≠ 0) was evaluated using the true positive rate of the ROC curves (i.e., y-axis of the ROC curve) at a stringent specificity level, where the false positive rate (x-axis of the ROC curve) = 0.05. The roc function in the pROC package (Robin et al. 2011) was used to calculate the ROC and AUC from −log10(p value). Factors affecting the AUC or sensitivity were tested by analysis-of-variance (ANOVA) for the self or neighbor effects (AUCself or AUCnei; self or neighbor sensitivity). The AUC and PVE were calculated from s = 1 (the first nearest neighbors) to s = 3 (up to the third nearest neighbors) cases. The AUC was also calculated using standard linear models without any random effects, to examine whether the linear mixed models were superior to the linear models. We also tested the neighbor GWAS model incorporating the neighbor phenotype \(y_j^{(s)}\) instead of \(x_j^{(s)}\). The accuracy of the total PVE estimates was defined as PVE accuracy = (estimated total PVE − true total PVE) / true total PVE. The accuracy of the effect size estimates was evaluated using mean absolute errors (MAE) between the true and estimated β1 or β2 for the self and neighbor effects (MAEself and MAEnei). Factors affecting the accuracy of PVE and effect size estimates were also tested using ANOVA. Misclassifications between self and neighbor signals were further evaluated by comparing p value scores between zero and non-zero coefficients. If −log10(p value) scores of zero β are the same or larger than non-zero β, it infers a risk of misspecification of the true signals.
Arabidopsis herbivory data
We applied the neighbor GWAS to field data of Arabidopsis herbivory. The procedure for this field experiment followed that of our previous experiment (Sato et al. 2019). We selected 199 worldwide accessions (Table S1) from 2029 accessions sequenced by the RegMap (Horton et al. 2012) and 1001 Genomes project (Alonso-Blanco et al. 2016). Of the 199 accessions, most were overlapped with a previous GWAS of biotic interactions (Horton et al. 2014) and half were included by a GWAS of glucosinolates (Chan et al. 2010). Eight replicates of each of the 199 accessions were first prepared in a laboratory and then transferred to the outdoor garden at the Center for Ecological Research, Kyoto University, Japan (Otsu, Japan: 35°06′N, 134°56′E, alt. ca. 200 m: Fig. S1a). Seeds were sown on Jiffy-seven pots (33-mm diameter) and stratified at a temperature of 4 °C for a week. Seedlings were cultivated for 1.5 months under a short-day condition (8 h light: 16 h dark, 20 °C). Plants were then separately potted in plastic pots (6 cm in diameter) filled with mixed soil of agricultural compost (Metro-mix 350, SunGro Co., USA) and perlite at a 3:1 ratio. Potted plants were set in plastic trays (10 × 40 cells) in a checkered pattern (Fig. S1b). In the field setting, a set of 199 accessions and an additional Col-0 accession were randomly assigned to each block without replacement (Fig. S1c). Eight replicates of these blocks were set >2 m apart from each other (Fig. S1c). Potted plants were exposed to the field environment for 3 weeks in June 2017. At the end of the experiment, the percentage of foliage eaten was scored as: 0 for no visible damage, 1 for ≤10%, 2 for >10% and ≤25%, 3 for >25% and ≤50%, 4 for >50% and ≤75%, and 5 for >75%. All plants were scored by a single person to avoid observer bias. The most predominant herbivore in this field trial was the diamond back moth (Plutella xylostella), followed by the small white butterfly (Pieris rapae). We also recorded the initial plant size and the presence of inflorescence to incorporate them as covariates. Initial plant size was evaluated by the length of the largest rosette leaf (mm) at the beginning of the field experiment and the presence of inflorescence was recorded 2 weeks after transplanting.
We estimated the variance components and performed the association tests for the leaf damage score with the neighbor covariate at s = 1 and 2. These two scales corresponded to L = 4 (the nearest four neighbors) and L = 12 (up to the second nearest neighbors), respectively, in the Arabidopsis dataset. The variation partitioning and association tests were performed using the gaston package, as mentioned above. To determine the significance of the variance component parameters, we compared the likelihood between mixed models with one or two random effects. For the genotype data, we used an imputed SNP matrix of the 2029 accessions studied by the RegMap (Horton et al. 2012) and 1001 Genomes project (Alonso-Blanco et al. 2016). Missing genotypes were imputed using BEAGLE (Browning and Browning 2009), as described by Togninalli et al. (2018) and updated on the AraGWAS Catalog (https://aragwas.1001genomes.org). Of the 10,709,466 SNPs from the full imputed matrix, we used 1,242,128 SNPs with MAF at >0.05 and LD of adjacent SNPs at r2 < 0.8. We considered the initial plant size, presence of inflorescence, experimental blocks, and the edge or center within a block as fixed covariates; these factors explained 12.5% of the leaf damage variation (1.2% by initial plant size, Wald test, Z = 3.53, p value < 0.001; 2.4% by the presence of inflorescence, Z = −5.69, p value < 10−8; 8.3% by the experimental blocks, likelihood ratio test, χ2 = 152.8, df = 7, p value < 10−28; 0.5% by the edge or center, Z = 3.11, p value = 0.002). After the association mapping, we searched candidate genes within ~10 kb around the target SNPs, based on the Araport11 gene model with the latest annotation of The Arabidopsis Information Resource (TAIR) (accessed on 7 September 2019). Gene-set enrichment analysis was performed using the Gowinda algorithm that enables unbiased analysis of the GWAS results (Kofler and Schlotterer 2012). We tested the SNPs with the top 0.1% −log10(p value) scores, with the option “--gene-definition undownstream10000,” “--min-genes 20,” and “--mode gene.” The GO.db package (Carlson 2018) and the latest TAIR AGI code annotation were used to build input files for gene ontologies (GOs). The R source codes, accession list, and phenotype data are available at the GitHub repository (https://github.com/naganolab/NeighborGWAS).
R package, “rNeighborGWAS”
To increase the availability of the new method, we have developed the neighbor GWAS into an R package, which is referred to as “rNeighborGWAS”. In addition to the genotype and phenotype data, the package requires a spatial map indicating the positions of individuals across a space. In this package, we generalized the discrete space example into a continuous two-dimensional space, allowing it to handle any spatial distribution along the x- and y-axes. Based on the three input files, the rNeighborGWAS package estimates the effective range of neighbor effects by calculating partial PVEnei and performs association mapping of the neighbor effects using the linear mixed models described earlier. Details and usage are described in the help files and vignette of the rNeighborGWAS package available via CRAN at https://cran.r-project.org/package=rNeighborGWAS.
To assess its implementation, we performed standard GWAS using GEMMA version 0.98 (Zhou and Stephens 2012) and the rNeighborGWAS. The test phenotype data were the leaf damage scores for the 199 accessions described above or flowering time under long-day conditions for 1057 accessions (“FT16” phenotype collected by Atwell et al. 2010 and Alonso-Blanco et al. 2016). The flowering time phenotype was downloaded from the AraPheno database (https://arapheno.1001genomes.org/: Seren et al. 2017). The full imputed genotype data were compiled for 1057 accessions, whose genotypes and flowering time phenotype were both available. The cut-off value of the MAF was set at 5%, yielding 1,814,755 SNPs for the 1057 accessions. The same kinship matrix defined by K1 above was prepared as an input file. We calculated p values using likelihood ratio tests in the GEMMA program, because the rNeighborGWAS adopted likelihood ratio tests.
Results
Simulation
We conducted simulations to test the capability of the neighbor GWAS to estimate PVE and marker-effects. As expected by the model and data structure, correlation was detected between the self-genotypic variable xi and the neighbor variable Σxixj/L in the simulated genotypes (Fig. S2). The level of Pearson’s correlation coefficient r varied from a slight correlation to complete correlation as the MAF became smaller, from 0.5 to 0.1 (Fig. S2). The correlation was also more severe as the scale of s was increased. For example, even at s = 2, we could cut off the MAF at >0.4 to keep |r | below 0.6 for all SNPs. The element-wise correlation between K1 and K2 indicated that at least 60% of the variation was overlapping between the two genome-wide variance-covariance matrices in the partial genotype data used for this simulation (R2 = 0.62 at s = 1; R2 = 0.79 at s = 2; R2 = 0.84 at s = 3).
A set of phenotypes were then simulated from the real genotype data following a complex model (Eq. (4), and then fitted using a simplified model (Eq. (2). The accuracy of the total PVE estimation was the most significantly affected by the spatial scales of s (Table 1). The total PVE was explained relatively well by the single PVEself that represented the additive polygenic effects of the self-genotypes (Fig. 3 left). Inclusion of partial PVEnei accounted for the rest of the true total PVE, which was considered the net contribution of neighbor effects to phenotypic variation (Fig. 3 right). The net PVEnei was largest when the effective range of the neighbor effects was narrow (i.e., strong distance decay at α = 3) and the contribution of the partial PVEnei was much larger than that of PVEself (the case of α = 3 in Fig. 3 right). However, the sum of the single PVEself (=partial PVEself at s = 0) and the partial PVEnei did not match the true total PVE (Fig. 3 left), as expected by the correlation between the self and neighbor effects (Fig. S2). Due to such correlation, the single PVEself or single PVEnei mostly overrepresented the actual amounts of PVEself or PVEnei, respectively (Fig. S3). The overrepresentation of the single PVEself and single PVEnei was observed even when either the self or neighbor effects were absent in the simulation (Fig. S4). These results indicate that (i) single PVEnei should not be used, (ii) partial PVEnei suffered from its collinearity with the single PVEself, and (iii) net PVEnei provides a conservative estimate for the genome-wide contribution of neighbor effects to phenotypic variation.
Although the partial PVEnei could not be used to quantify the net contribution of the neighbor effects, this metric inferred spatial scales at which neighbor effects remained effective. If the distance decay was weak (small value of decay coefficient α) and thus the effective range of the neighbor effects was broad, partial PVEnei increased linearly as the reference spatial scale was broadened (the case of α = 0.01 in Fig. 3 left). On the other hand, if the distance decay was strong (large value of decay coefficient α) and thus the effective scale of the neighbor effects was narrow, partial PVEnei decreased as the reference spatial scale was broadened or saturated at the scale of the first nearest neighbors (the case of α = 3 in Fig. 3 left). Considering the spatial dependency of the partial PVEnei, we could estimate the effective spatial scales by ΔPVEnei = partial PVEnei,s+1 − partial PVEnei,s and by the scale that resulted in the maximum ΔPVEnei as s = arg max ΔPVEnei (Fig. S5).
The spatial scale that yielded the maximum AUC for neighbor signals (AUCnei) coincided with the patterns of the partial PVEnei across the range of s. If the distance decay was weak (α = 0.01) and thus the effective range of the neighbor effects was broad, the AUCnei increased linearly as the reference spatial scale was broadened (the case of α = 0.01 in Fig. 4). If the distance decay was strong (large value of decay coefficient α) and thus the effective scale of the neighbor effects was narrow, the AUCnei did not increase even when the reference spatial scale was broadened (the case of α = 3 in Fig. 4). Thus, the first nearest scale was enough to detect neighbor signals, unless the distance decay was very weak.
In terms of the AUC, we also found that the number of causal SNPs, the amount of PVE by neighbor effects (controlled by the total PVE = PVEβ + PVEu; and ratio of PVEself:PVEnei), and the distance decay coefficient α were significant factors affecting the power to detect neighbor signals (AUCnei: Table 1). The power to detect self-genotype signals (AUCself) depended on the number of causal SNPs and PVEβ but was not significantly influenced by the distance decay coefficient of the neighbor effects (α) (Table 1). The power to detect self-genotype signals changed from strong (AUCself > 0.9) to weak (AUCself < 0.6), depending on the number of causal SNPs, the PVE by the major-effect genes, and as the relative contribution from the PVEself increased (Fig. S6 upper). Compared to the self-genotype effects, it was relatively difficult to detect neighbor signals (Fig. 4 right; Fig. S6 lower), ranging from strong (AUCnei > 0.9) to little (AUCnei near to 0.5) power. When the number of causal SNPs = 10, the power to detect neighbor signals decreased from high (AUCnei > 0.9) to moderate (AUCnei > 0.7) with the decreasing PVEβ and the distance decay coefficient (Fig. S6 lower). There was almost no power to detect neighbor signals (AUCnei near to 0.5) when the number of causal SNPs = 50 and PVEnei had low relative contributions (Fig. S6 lower). The result of the simulations indicated that strong neighbor effects were detectable when a target trait was governed by several major genes and the range of neighbor effects was spatially limited. Additionally, linear mixed models outperformed standard linear models as there were 8.8% and 1.4% increases in their power to detect self and neighbor signals, respectively (AUCself, paired t-test, mean of the difference = 0.088, p value < 10−16; AUCnei, mean of the difference = 0.014, p value < 10−16). When the neighbor phenotype \(y_j^{(s)}\) was incorporated instead of the genotype \(x_j^{(s)}\), the power to detect neighbor effects was very weak, such that the AUCnei decreased to almost 0.5.
To examine misclassifications between the self and neighbor signals, we compared the sensitivity, effect size estimates, and p value scores among causal SNPs having non-zero coefficients of the true β1 and β2. The sensitivity to distinguish the self or neighbor signals was largely affected by the number of causal SNPs, the amount of PVE by the major-effect genes PVEβ, and the relative contribution of the self and neighbor effects (controlled by PVEself:PVEnei) (Table 1; Fig. S7). The mean absolute errors of the self-effect estimates for \(\hat \beta _1\) (MAEself) largely depended on the number of causal SNPs and the relative contribution of the variance components (Table 1; Fig. S8 upper), while those of the neighbor effect estimates for \(\hat \beta _2\) (MAEnei) were dependent on the relative contribution of the variance components and the spatial scales to be referred (Table 1; Fig. S8 lower). Given that the self and neighbor signals were sufficiently detected when the number of causal SNPs was 50 (Fig. S6 middle column), p values under this condition were compared between the causal and non-causal SNPs. We observed that strong self-signals (β1 ≠ 0) were unlikely to be detected as neighbor effects (Fig. 5 lower). Causal SNPs responsible for both the self and neighbor effects (β1 ≠ 0 and β2 ≠ 0) were better detected than the non-causal SNPs (β1 = 0 and β2 = 0) (Fig. 5 right column). The sensitivity to distinguish neighbor signals from self-signals was large when the true contribution of the neighbor effects was as large as PVEself:PVEnei = 1:8, but decreased when the contribution of the self-effects was as large as PVEself:PVEnei = 8:1 (Fig. S7 lower). In contrast, if the contribution of the neighbor effects was relatively large (PVEself:PVEnei = 1:8), the SNPs responsible for the neighbor effects alone (β1 = 0 and β2 = 0) could also be detected as self-effects (Fig. 5). As expected by the level of correlation (Fig. S2), the false positive detection of the self-signals was more likely when the distance decay coefficient was small and thus the effective range of the neighbor effects was broad (the case of α = 0.01 and β1 = 0 and β2 ≠ 0 in Fig. 5). This coincided with the strength of the correlation (Fig. S2), as the false positive detection of self-signals and false negative detection of neighbor signals are more likely if the MAF was small (Fig. S9). Consistent with the false positive detection, the sensitivity to distinguish self-signals from neighbor signals remained large, even when the contribution of the neighbor effects was far larger (the case of PVEself:PVEnei = 1:8 in Fig. S7 upper). Strong self-effects (p value < 10−5 for \(\hat \beta _1\)) and slight neighbor effects (p value < 0.05 for \(\hat \beta _2\) at s = 1 and α = 3) remained when asymmetric neighbor effects were strong (β1 ≠ 0 and β2 ≠ 0 and β12 ≠ 0 and PVEself:PVEnei:PVEsxn = 1:1:8; Fig. S10). These results indicate that (i) the collinearity may lead to the false positive detection of self-effects, yet is unlikely to result in the false positive detection of neighbor effects, and that (ii) smaller MAFs are more likely to cause the false positive detection of self-effects and decrease the power to detect true neighbor effects.
Arabidopsis herbivory data
To estimate PVEself and PVEnei, we applied a linear mixed model (Eq. (3)) to the leaf damage score data for the field-grown A. thaliana. The leaf damage variation was significantly explained by the single PVEself that represented additive genetic variation (single PVEself = 0.173, \(\chi _1^2\) = 10.1, p value = 0.005: Fig. 6a; Fig. S11 right). Variation partitioning showed a significant contribution of neighbor effects to the phenotypic variation in the leaf damage at the nearest scale (partial PVEnei = 0.214, \(\chi _1^2\) = 7.23, p value = 0.004 at s = 1: Fig. S11 left). The proportion of phenotypic variation explained by the neighbor effects did not increase when the neighbor scale was referred up to the nearest and second nearest individuals (partial PVEnei = 0.14, \(\chi _1^2\) = 1.41, p value = 0.166 at s = 2: Fig. S11 left); therefore, the effective scale of the neighbor effects was estimated at s = 1 and variation partitioning was stopped at s = 2. These results indicated that the effective scale of the neighbor effects on the leaf damage was narrow (s = 1) and the net PVEnei at s = 1 explained an additional 8% of the PVE compared to the additive genetic variation attributable to the single PVEself (Fig. 6a). The genotype data had moderate to strong element-wise correlation between K1 and K2 in these analyses (r = 0.60 and 0.78 at s = 1 and 2 among 199 accessions with eight replicates). We additionally incorporated the neighbor phenotype \(y_j^{(s)}\) instead of the neighbor genotype \(x_j^{(s)}\) in Eq. (2), but the partial PVEnei did not increase (partial PVEnei = 0.066 and 0.068 at s = 1 and 2, respectively).
The standard GWAS of the self-genotype effects on the leaf damage detected the SNPs with the second and third largest −log10(p values) scores, on the first chromosome (chr1), though they were not above the threshold for Bonferroni correction (Fig. 6b; Table S2). The second SNP at chr1-21694386 was located within ~10 kb of the three loci encoding a disease resistance protein (CC-NBS-LRR class) family. The third SNP at chr1-23149476 was located within ~10 kb of the AT1G62540 locus that encodes a flavin-monooxygenase glucosinolate S-oxygenase 2 (FMO GS-OX2). No GOs were significantly enriched for the self-effects on herbivory (false discovery rate > 0.08). A QQ-plot did not exhibit an inflation of p values for the self-genotype effects (Fig. S12a).
Regarding the neighbor effects on leaf damage, we found non-significant but weak peaks on the second and third chromosomes (Fig. 6c; Table S2). The second chromosomal region had higher association scores than those predicted by the QQ-plot (Fig. S12b). A locus encoding FAD-binding Berberine family protein (AT2G34810 named BBE16), which is known to be up-regulated by methyl jasmonate (Devoto et al. 2005), was located within the ~10 kb window near SNPs with the top eleven −log10(p values) scores on the second chromosome. Three transposable elements and a pseudogene of lysyl-tRNA synthetase 1 were located near the most significant SNP on the third chromosome. No GOs were significantly enriched for the self-effects on herbivory (false discovery rate > 0.9). We additionally tested the asymmetric neighbor effects of β12 in the real dataset on field herbivory, but the top 0.1% of the SNPs for the neighbor effects for β2, did not overlap with those of the asymmetric neighbor effects β12 (Table S2).
Based on the estimated coefficients \(\hat \beta _1\) and \(\hat \beta _2\), we ran a post hoc simulation to infer a spatial arrangement that minimizes a population sum of the leaf damage \({\sum }y_i = \beta _1 {\sum }x_i + \beta _2\mathop {\sum }\nolimits_{ < i,j > } x_ix_j\). The constant intercept β0, the variance component ui, and residual ei were not considered because they were not involved in the deterministic dynamics of the model. Figure 7 shows three representatives and a neutral expectation. For example, a mixture of a dimorphism was expected to decrease the total leaf damage for an SNP at chr2-14679190 near the BBE16 locus (\(\hat \beta _2 > 0\): Fig. 7a). On the other hand, a clustered distribution of a dimorphism was expected to decrease the total damage for an SNP at chr2-9422409 near the AT2G22170 locus encoding a lipase/lipooxygenase PLAT/LH2 family protein (\(\hat \beta _1 \approx 0,\hat \beta _2\, < \,0\): Fig. 7b). Furthermore, a near monomorphism was expected to decrease the leaf damage for an SNP at chr5-19121831 near the AT5G47075 and AT5G47077 loci encoding low-molecular cysteine-rich proteins, LCR20 and LCR6 (\({\hat \beta _1}\, > \,0,{\hat \beta _2}\, < \,0\): Fig. 7c). If the self and neighbor coefficients had no effects, we would observe a random distribution and no mitigation of damage i.e., \(\mathop {\sum }y_i \approx 0\) (Fig. 7d). These post hoc simulations suggested a potential application for neighbor GWAS for the optimization of the spatial arrangements in field cultivation.
Comparing self p values between the neighbor GWAS and GEMMA
To ascertain whether the self-genotype effects in the neighbor GWAS agree with those of a standard GWAS, we compared the p value scores between the rNeighborGWAS package and the commonly used GEMMA program (Fig. S13). For the leaf damage score, the neighbor GWAS yielded almost the same −log10(p values) scores for the self-effects as the GEMMA program (r = 0.9999 among all the 1,242,128 SNPs: Fig. S13a). The standard GWAS, using the flowering time phenotype, also yielded the consistent −log10(p values) scores between the neighbor GWAS and GEMMA (r = 0.9999 among all the 1,814,755 SNPs: Fig. S13b). Both the flowering time GWAS using the neighbor GWAS and GEMMA found two significant SNPs above the genome-wide Bonferroni threshold on chromosome 5 (chr5-18590741 and chr5-18590743, MAF = 0.49 and 0.49, −log10(p value) = 7.797 and 7.797 for the neighbor GWAS; chr5-18590741 and chr5-18590743, MAF = 0.49 and 0.49, −log10(p value) = 7.798 and 7.798 for GEMMA), which were located within the Delay of Germination 1 (DOG1) locus, that was reported previously by Alonso-Blanco et al. (2016). Another significant SNP was observed at the top of chromosome 4 (chr4-317979, MAF = 0.12, −log10(p value) = 7.787 and 7.933 for the neighbor GWAS and GEMMA), which was previously identified as a quantitative trait locus underlying flowering time in long-day conditions (Aranzana et al. 2005).
Discussion
Spatial and genetic factors underlying simulated phenotypes
Benchmark tests using simulated phenotypes revealed that appropriate spatial scales could be estimated using the partial PVEnei of the observed phenotypes. When the scale of the neighbor effects was narrow or moderate (α = 1.0 or 3.0), the scale of the first nearest neighbors would be optimum for increasing the AUC to detect neighbor signals. In terms of the neighbor effects in the context of plant defense, mobile animals (e.g., mammalian browsers and flying insects) often select a cluster of plant individuals (e.g., Bergvall et al. 2006; Hambäck et al. 2009; Sato and Kudoh 2015; Verschut et al. 2016). In this case, the neighbor effects could not be observed among individual plants within a cluster (Sato and Kudoh 2015). The exponential distance decay at α = 0.01 represented situations in which the effective range of the neighbor effects was too broad to be detected; only in such situations should more than the nearest neighbors be referred to, to gain the power to detect neighbor effects. We also considered the asymmetric neighbor effects where the neighbor genotype similarity had significant effects on one genotype, but not on another genotype. In this situation, strong self-effects could be observed when the symmetric neighbor effects were weakened. This additional result suggests that asymmetric neighbor effects should be tested if strong self-effects and weak symmetric neighbor effects are both detected at a single locus.
Neighbor effects are more likely to contribute to phenotypic variation when its effective range becomes narrow due to a strong distance decay (α = 3), as suggested by the net PVEnei. However, the total phenotypic variation was explained relatively well by the single PVEself that represented additive polygenic effects. Previous studies showed that genetic interactions could lead to an overrepresentation of narrow-sense heritability in GWAS (e.g., Zuk et al. 2012; Young and Durbin 2014). This occurs because the SNP heritability is represented by the genetic similarity between individuals, and thereby covariance of the kinship matrix helps to fit the phenotypic variance attributable to gene-by-gene interactions (Young and Durbin 2014; Schrauf et al. 2020). This problem is also observed in the neighbor GWAS that models pairwise interactions at a focal locus among neighboring individuals. Given the difficulty in distinguishing the kinship and genetic interactions, we conclude that the non-independence of the self and neighbor effects is an intrinsic feature of the neighbor GWAS, and that the difference of the PVE between a standard and neighbor GWAS i.e., net PVEnei should be used as a conservative estimate of PVEnei.
Neighbor GWAS of the field herbivory on Arabidopsis
Our genetic analysis of the neighbor effects is of ecological interest, as the question of how host plant genotypes shape variations in plant–herbivore interactions, is a long-standing question in population ecology (e.g., Karban 1992; Underwood and Rausher 2000; Utsumi et al. 2011). Despite the low PVE and several confounding factors under field conditions, the present study illustrated the significant contribution of neighbor genotypic identity to the spatial variation of the herbivory on A. thaliana. Although the additional fraction explained by the neighbor effects was 8%, this amount was plausible in the GWAS of complex traits. For example, the variance components of epistasis explained 10–14% PVE on average for 46 traits in yeast (Young and Durbin 2014). Even when heritability is high, the significant variants have often explained a small fraction of PVE, which is known as the missing heritability problem in plants and animals (Brachi et al. 2011; López-Cortegano and Caballero 2019).
Regarding the self-genotype effects, we detected GS-OX2 near the third top-scoring SNP on the first chromosome. GS-OX2 catalyzes the conversion of methylthioalkyl to methylsulfinylalkyl glucosinolates (Li et al. 2008) and is up-regulated in response to feeding by the larvae of the large white butterfly (Pieris brassicae) (Geiselhardt et al. 2013). On the other hand, the second top-scoring SNP of the neighbor effects was located near the BBE16 locus, responsive to methyl jasmonate, a volatile organic chemical that is emitted from damaged tissue and elicits the defense responses of other plants (Reymond and Farmer 1998; van Poecke 2007). However, because none of the associations were significant above a genome-wide Bonferroni threshold, they should be interpreted cautiously. Nearby genes should only be considered candidates, and further work is necessary to confirm that they exert any neighbor effects on herbivory.
Potential limitation
Despite many improvements, it is more difficult for GWAS to capture rare causal variants than common ones (Lee et al. 2014; Auer, Lettre 2015; Bomba et al. 2017). This problem is more severe in neighbor GWAS, because smaller MAFs result in stronger collinearity between the self-genotype effects xi and the neighbor genotypic identity \(\mathop {\sum }\nolimits_{ < i,j > }^L x_ix_j^{(s)}\). Our simulations showed that the rare variants responsible for the neighbor effects might be misclassified as self-effects, though the opposite was not found, i.e., the misclassification of self-signals into neighbor effects could be suppressed. In GWAS, genotype data usually contain minor alleles and possess kinship structures to some extent, making collinearity unavoidable. To anticipate false positive detection of neighbor effects, the significance of variance components and marker effects involving neighbor effects should always be compared using the standard GWAS model.
The present neighbor GWAS focused on single-locus effects and did not incorporate locus-by-locus interactions. Although it is challenging to integrate all the association tests for epistasis into GWAS (Gondro et al. 2013; Young and Durbin 2014), it is possible that multiple combinations among different variants govern neighbor effects. For example, neighbor effects on insect herbivory may occur due to the joint action of volatile-mediated signaling and the accumulation of secondary metabolites (Dicke and Baldwin 2010; Erb 2018). The linear mixed model could be extended as exemplified by the asymmetric neighbor effects; however, we need to reconcile multiple criteria including the collinearity of explanatory variables, inflation of p values, and computational costs. Further customization is warranted when analyzing more complex forms of neighbor effects.
Conclusion
Based on the newly proposed methodology, we suggest that neighbor effects are an overlooked source of phenotypic variation in field-grown plants. GWAS have often been applied to crop plants (Jannink et al. 2010; Hamblin et al. 2011), where genotypes are known, and individuals are evenly transplanted in space. Considering this outlook for agriculture, we provided an example of neighbor GWAS across a lattice space in this study. However, wild plant populations sometimes exhibit more complex spatial patterns than those expected by the Ising model (e.g., Kizaki and Katori 1999; Schlicht and Iwasa 2004). In the rNeighborGWAS package, we allowed neighbor GWAS for a continuous two-dimensional space. While its application has now been limited to experimental populations, neighbor GWAS has the potential for compatibility with the emerging discipline of landscape genomics (Bragg et al. 2015). In this context, the additional R package could help future studies to test self and neighbor effects using a wide variety of plant species.
Neighbor GWAS may also have the potential to help determine optimal spatial arrangements for plant cultivation, as suggested by the post hoc simulation. Genome-wide polymorphism data are useful not only for identifying causal variants in GWAS, but also for predicting the breeding values of crop plants for genomic selection (e.g., Jannink et al. 2010; Hamblin et al. 2011; Yamamoto et al. 2017). Given that the neighbor GWAS consists of a marker-based regression, this methodology could also be expanded as a genomic selection tool to help predict population-level phenotypes in spatially structured environments.
Data availability
The leaf damage data on A. thaliana are included in the supporting information (Table S1). The simulation code and R script used in this study are available at the GitHub repository (https://github.com/naganolab/NeighborGWAS). R package version of the neighbor GWAS method is available at CRAN (https://cran.r-project.org/package=rNeighborGWAS).
References
Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt KM et al. (2016) 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166:481–491. https://doi.org/10.1016/j.cell.2016.05.063
Aranzana MJ, Kim S, Zhao K, Bakker E, Horton M, Jakob K et al. (2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet 1:e60. https://doi.org/10.1371/journal.pgen.0010060
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y et al. (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–631. https://doi.org/10.1038/nature08800
Auer PL, Lettre G (2015) Rare variant association studies: considerations, challenges and opportunities. Genome Med 7:16. https://doi.org/10.1186/s13073-015-0138-2
Azaele S, Muneepeerakul R, Rinaldo A, Rodriguez-Iturbe I (2010) Inferring plant ecosystem organization from species occurrences. J Theor Biol 262:323–329. https://doi.org/10.1016/j.jtbi.2009.09.026
Barbosa P, Hines J, Kaplan I, Martinson H, Szczepaniec A, Szendrei Z (2009) Associational resistance and associational susceptibility: having right or wrong neighbors. Ann Rev Ecol Evol Sys 40:1–20. https://doi.org/10.1146/annurev.ecolsys.110308.120242
Bergvall UA, Rautio P, Kesti K, Tuomi J, Leimar O (2006) Associational effects of plant defences in relation to within- and between-patch food choice by a mammalian herbivore: neighbour contrast susceptibility and defence. Oecologia 147:253–260. https://doi.org/10.1007/s00442-005-0260-8
Bomba L, Walter K, Soranzo N (2017) The impact of rare and low-frequency genetic variants in common disease. Genome Biol 18:77. https://doi.org/10.1186/s13059-017-1212-4
Brachi B, Meyer CG, Villoutreix R, Platt A, Morton TC, Roux F et al. (2015) Coselected genes determine adaptive variation in herbivore resistance throughout the native range of Arabidopsis thaliana. Proc Natl Acad Sci USA 112:4032–4037. https://doi.org/10.1073/pnas.1421416112
Brachi B, Morris GP, Borevitz JO (2011) Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol 12:232. https://doi.org/10.1186/gb-2011-12-10-232
Bragg JG, Supple MA, Andrew RL, Borevitz JO (2015) Genomic variation across landscapes: insights and applications. N. Phytol 207:953–967
Browning BL, Browning SR (2009) A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84:210–223. https://doi.org/10.1016/j.ajhg.2009.01.005
Carlson M (2018) GO.db: A set of annotation maps describing the entire Gene Ontology. R package version 3.7.0. https://doi.org/10.18129/B9.bioc.GO.db
Carrasco LR, Harwood TD, Toepfer S, MacLeod A, Levay N, Kiss J et al. (2010) Dispersal kernels of the invasive alien western corn rootworm and the effectiveness of buffer zones in eradication programmes in Europe. Ann Appl Biol 156:63–77. https://doi.org/10.1111/j.1744-7348.2009.00363.x
Chan EKF, Rowe HC, Kliebenstein DJ (2010) Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics 185:991–1007. https://doi.org/10.1534/genetics.109.108522
Dahlin I, Rubene D, Glinwood R, Ninkovic V (2018) Pest suppression in cultivar mixtures is influenced by neighbor-specific plant-plant communication. Ecol Appl 28:2187–2196. https://doi.org/10.1002/eap.1807
Devaux C, Lavigne C, Austerlitz F, Klein EK (2007) Modelling and estimating pollen movement in oilseed rape (Brassica napus) at the landscape scale using genetic markers. Mol Ecol 16:487–499. https://doi.org/10.1111/j.1365-294X.2006.03155.x
Devoto A, Ellis C, Magusin A, Chang HS, Chilcott C, Zhu T et al. (2005) Expression profiling reveals COI1 to be a key regulator of genes involved in wound- and methyl jasmonate-induced secondary metabolism, defence, and hormone interactions. Plant Mol Biol 58:497–513. https://doi.org/10.1007/s11103-005-7306-5
Dicke M, Baldwin IT (2010) The evolutionary context for herbivore-induced plant volatiles: beyond the ‘cry for help’. Trends Plant Sci 15:167–175. https://doi.org/10.1016/j.tplants.2009.12.002
Erb M (2018) Volatiles as inducers and suppressors of plant defense and immunity - origins, specificity, perception and signaling. Curr Opin Plant Biol 44:117–121. https://doi.org/10.1016/j.pbi.2018.03.008
Frachon L, Mayjonade B, Bartoli C, Hautekèete NC, Roux F (2019) Adaptation to plant communities across the genome of Arabidopsis thaliana. Mol Biol Evol 36:1442–1456. https://doi.org/10.1093/molbev/msz078
Gage JL, De Leon N, Clayton MK (2018) Comparing genome-wide association study results from different measurements of an underlying phenotype. G3: Genes|Genomes|Genet 8:3715–3722. https://doi.org/10.1534/g3.118.200700
Geiselhardt S, Yoneya K, Blenn B, Drechsler N, Gershenzon J, Kunze R et al. (2013) Egg laying of cabbage white butterfly (Pieris brassicae) on Arabidopsis thaliana affects subsequent performance of the larvae. PLoS ONE 8:e59661. https://doi.org/10.1371/journal.pone.0059661
Gondro C, van der Werf J, Hayes B (eds) (2013) Genome-wide association studies and genomic prediction. Methods in Molecular Biology. Humana Press, New York. https://doi.org/10.1007/978-1-62703-447-0
Hamblin MT, Buckler ES, Jannink JL (2011) Population genetics of genomics-based crop improvement methods. Trends Genet 27:98–106. https://doi.org/10.1016/j.tig.2010.12.003
Hambäck PA, Björkman M, Rämert B, Hopkins RJ (2009) Scale-dependent responses in cabbage herbivores affect attack rates in spatially heterogeneous systems. Basic Appl Ecol 10:228–236. https://doi.org/10.1016/j.baae.2008.06.004
Hambäck PA, Inouye BD, Andersson P, Underwood N (2014) Effects of plant neighborhoods on plant-herbivore interactions: resource dilution and associational effects. Ecology 95:1370–1383. https://doi.org/10.1890/13-0793.1
Hauser MT, Harr B, Schlötterer C (2001) Trichome distribution in Arabidopsis thaliana and its close relative Arabidopsis lyrata: molecular analysis of the candidate gene GLABROUS1. Mol Biol Evol 18:1754–1763. https://doi.org/10.1093/oxfordjournals.molbev.a003963
Horton MW, Bodenhausen N, Beilsmith K, Meng D, Muegge BD, Subramanian S et al. (2014) Genome-wide association study of Arabidopsis thaliana leaf microbial community. Nat Commun 5:5320. https://doi.org/10.1038/ncomms6320
Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A et al. (2012) Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet 44:212–216. https://doi.org/10.1038/ng.1042
Ida TY, Takanashi K, Tamura M, Ozawa R, Nakashima Y, Ohgushi T (2018) Defensive chemicals of neighboring plants limit visits of herbivorous insects: Associational resistance within a plant population. Ecol Evol 8:12981–12990. https://doi.org/10.1002/ece3.4750
Ising E (1925) Beitrag zur theorie des ferromagnetismus. Z für Phys 31:253–258
Jannink JL, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genom 9:166–177. https://doi.org/10.1093/bfgp/elq001
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ et al. (2008) Efficient control of population structure in model organism association mapping. Genetics 178:1709–1723. https://doi.org/10.1534/genetics.107.080101
Karban R (1992) Plant variation: its effects on populations of herbivorous insects. In: Fritz RS, Simms EL (eds) Plant resistance to herbivores and pathogens: ecology, evolution, and genetics, University of Chicago Press, Chicago, pp. 195–215
Kizaki S, Katori M (1999) Analysis of canopy-gap structures of forests by Ising-Gibbs states-equilibrium and scaling property of real forests. J Phys Soc Jpn 68:2553–2560. https://doi.org/10.1143/JPSJ.68.2553
Kofler R, Schlotterer C (2012) Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 28:2084–2085. https://doi.org/10.1093/bioinformatics/bts315
Korte A, Farlow A (2013) The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9:29. https://doi.org/10.1186/1746-4811-9-29
Lee S, Abecasis GR, Boehnke M, Lin X (2014) Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 95:5–23. https://doi.org/10.1016/j.ajhg.2014.06.009
Li J, Hansen BG, Ober JA, Kliebenstein DJ, Halkier BA (2008) Subclade of flavin-monooxygenases involved in aliphatic glucosinolate biosynthesis. Plant Physiol 148:1721–1733. https://doi.org/10.1104/pp.108.125757
López-Cortegano E, Caballero A (2019) Inferring the nature of missing heritability in human traits using data from the GWAS catalog. Genetics 212:891–904. https://doi.org/10.1534/genetics.119.302077
McCoy BM, Maillard JM (2012) The importance of the Ising model. Prog Theor Phys 127:791–817. https://doi.org/10.1143/PTP.127.791
Mundt CC (2002) Use of multiline cultivars and cultivar mixtures for disease management. Ann Rev Phytopathol 40:381–410. https://doi.org/10.1146/annurev.phyto.40.011402.113723
Nallu S, Hill JA, Don K, Sahagun C, Zhang W, Meslin C et al. (2018) The molecular genetic basis of herbivory between butterflies and their host plants. Nat Ecol Evol 2:1418–1427. https://doi.org/10.1038/s41559-018-0629-9
Perdry H, Dandine-Roulland C (2018) gaston: Genetic Data Handling (QC, GRM, LD, PCA) & Linear Mixed Models. R package version 1.5.4. https://CRAN.R-project.org/package=gaston
R Core Team (2019) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/
Reymond P, Farmer EE (1998) Jasmonate and salicylate as global signals for defense gene expression. Curr Opin Plant Biol 1:404–411. https://doi.org/10.1016/s1369-5266(98)80264-1
Rieux A, Soubeyrand S, Bonnot F, Klein EK, Ngando JE, Mehl A et al. (2014) Long-distance wind-dispersal of spores in a fungal plant pathogen: estimation of anisotropic dispersal kernels from an extensive field experiment. PLoS One 9:e103225. https://doi.org/10.1371/journal.pone.0103225
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC et al. (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma 12:77. https://doi.org/10.1186/1471-2105-12-77
Sato Y (2018) Associational effects and the maintenance of polymorphism in plant defense against herbivores: review and evidence. Plant Species Biol 33:91–108. https://doi.org/10.1111/1442-1984.12201
Sato Y, Ito K, Kudoh H (2017) Optimal foraging by herbivores maintains polymorphism in defence in a natural plant population. Funct Ecol 31:2233–2243. https://doi.org/10.1111/1365-2435.12937
Sato Y, Kudoh H (2015) Tests of associational defence provided by hairy plants for glabrous plants of Arabidopsis halleri subsp. gemmifera against insect herbivores. Ecol Entomol 40:269–279. https://doi.org/10.1111/een.12179
Sato Y, Kudoh H (2017) Herbivore-mediated interaction promotes the maintenance of trichome dimorphism through negative frequency-dependent selection. Am Nat 190:E67–E77. https://doi.org/10.1086/692603
Sato Y, Shimizu-Inatsugi R, Yamazaki M, Shimizu KK, Nagano AJ (2019) Plant trichomes and a single gene GLABRA1 contribute to insect community composition on field-grown Arabidopsis thaliana. BMC Plant Biol 19:163. https://doi.org/10.1186/s12870-019-1705-2
Schlicht R, Iwasa Y (2004) Forest gap dynamics and the Ising model. J Theor Biol 230:65–75. https://doi.org/10.1016/j.jtbi.2004.04.027
Schrauf MF, Martini JWR, Simianer H, De los Campos G, Cantet R, Freudenthal J et al. (2020) Phantom epistasis in genomic selection: on the predictive ability of epistatic models. G3: Genes|Genomes|Genet 10:3137–3145. https://doi.org/10.1534/g3.120.401300
Schuman MC, Allmann S, Baldwin IT (2015) Plant defense phenotypes determine the consequences of volatile emission for individuals and neighbors. eLife 4:e04490. https://doi.org/10.7554/eLife.04490
Seren Ü, Grimm D, Fitz J, Weigel D, Nordborg M, Borgwardt K et al. (2017) AraPheno: a public database for Arabidopsis thaliana phenotypes. Nucleic Acids Res 45:D1054–D1059. https://doi.org/10.1093/nar/gkw986
Tahvanainen JO, Root RB (1972) The influence of vegetational diversity on the population ecology of a specialized herbivore, Phyllotreta cruciferae (Coleoptera: Chrysomelidae). Oecologia 10:321–346. https://doi.org/10.1007/BF00345736
Togninalli M, Seren Ü, Meng D, Fitz J, Nordborg M, Weigel D et al. (2018) The AraGWAS Catalog: a curated and standardized Arabidopsis thaliana GWAS catalog. Nucleic Acids Res 46:D1150–D1156. https://doi.org/10.1093/nar/gkx954
Underwood N, Hambäck PA, Inouye BD (2020) Pollinators, herbivores, and plant neighborhood effects. Quart Rev Biol 95:37–57. https://doi.org/10.1086/707863
Underwood N, Inouye BD, Hambäck PA (2014) A conceptual framework for associational effects: when do neighbors matter and how would we know? Quart Rev Biol 89:1–19. https://doi.org/10.1086/674991
Underwood N, Rausher MD (2000) The effects of host-plant genotype on herbivore population dynamics. Ecology 81:1565–1576. https://doi.org/10.1890/0012-9658(2000)081[1565:TEOHPG]2.0.CO;2
Utsumi S, Ando Y, Craig TP, Ohgushi T (2011) Plant genotypic diversity increases population size of a herbivorous insect. Proc R Soc B 278:3108–3115. https://doi.org/10.1098/rspb.2011.0239
van Poecke RM (2007) Arabidopsis-insect interactions. Arabidopsis Book 5:e0107. https://doi.org/10.1199/tab.0107
Verschut TA, Becher PG, Anderson P, Hambäck PA (2016) Disentangling associational effects: both resource density and resource frequency affect search behaviour in complex environments. Funct Ecol 30:1826–1833. https://doi.org/10.1111/1365-2435.12670
Wang M, Roux F, Bartoli C, Huard-Chauveau C, Meyer C, Lee H et al. (2018) Two-way mixed-effects methods for joint association analysis using both host and pathogen genomes. Proc Natl Acad Sci USA 115:E5440–E5449. https://doi.org/10.1073/pnas.1710980115
Weiner J (1990) Asymmetric competition in plant populations. Trends Ecol Evol 5:360–364. https://doi.org/10.1016/0169-5347(90)90095-U
Yamamoto E, Matsunaga H, Onogi A, Ohyama A, Miyatake K, Yamaguchi H et al. (2017) Efficiency of genomic selection for breeding population design and phenotype prediction in tomato. Heredity 118:202–209. https://doi.org/10.1038/hdy.2016.84
Young AI, Durbin R (2014) Estimation of epistatic variance components and heritability in founder populations and crosses. Genetics 198:1405–1416. https://doi.org/10.1534/genetics.114.170795
Zeller SL, Kalinina O, Flynn DFB, Schmid B (2012) Mixtures of genetically modified wheat lines outperform monocultures. Ecol Appl 22:1817–1826. https://doi.org/10.1890/11-0876.1
Zhou X, Stephens M (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44:821–824
Zuk O, Hechter E, Sunyaev SR, Lander ES (2012) The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci USA 109:1193–1198. https://doi.org/10.1073/pnas.1119675109
Acknowledgements
The authors would like to thank Ü. Seren, A. Korte, and M. Nordborg for kindly providing the full imputed SNP data prior to it being publicly available; T. Tsuchimatsu, K. Iwayama, J. Bascompte, and anonymous reviewers for discussions and comments; and Dynacom Co., Ltd. for technical assistance with the R package development. This study was supported by the Japan Science and Technology Agency (JST) PRESTO (Grant number, JPMJPR17Q4 and JPMJPR16Q9) to YS and EY; Japan Society for the Promotion of Science (JSPS) Postdoctoral Fellowship (16J30005) to YS; MEXT KAKENHI (18H04785) and the Swiss National Science Foundation (31003A_182318) to KKS; URPP Global Change and Biodiversity of the University of Zurich to YS and KSS; and JST CREST (JPMJCR15O2 and JPMJCR16O3) to AJN and KKS. The field experiment was supported by the Joint Usage/Research Grant of Center for Ecological Research, Kyoto University, Japan.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor Jinliang Wang
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sato, Y., Yamamoto, E., Shimizu, K.K. et al. Neighbor GWAS: incorporating neighbor genotypic identity into genome-wide association studies of field herbivory. Heredity 126, 597–614 (2021). https://doi.org/10.1038/s41437-020-00401-w
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41437-020-00401-w
This article is cited by
-
Single-gene resolution of diversity-driven overyielding in plant genotype mixtures
Nature Communications (2023)