We thank the Editor for the opportunity to respond to the letter from Belgard et al.1 In their letter, the authors suggest that ethnic population stratification may have confounded the findings in our original manuscript.2 We agree that population stratification is an important issue that needs to be accounted for in such analyses.

We wrote to Dr Belgard, who kindly provided the 19 single-nucleotide polymorphisms (SNPs) used in their analysis.1 These 19 SNPs were derived from the 30 SNPs provided in our original article. Among these 19 SNPs, those with positive weights outnumbered those with negative weights (the latter including the second most negatively weighted SNP, rs12317962, on KCNMB4), which would bias the classifier score. Our original analyses included a total of 237 SNPs. To address the issue of ethnic population stratification, we downloaded data from the 1000 Genomes cohort,3 including Central European (CEU), Finnish (FIN), Great British (GBR) and Iberian Spanish (IBS) populations.

In their analysis using 19 SNPs, Belgard et al. indicated that our classifier had a higher chance of classifying Finnish (non-autism spectrum disorder (ASD)) individuals as ASD than CEU (non-ASD) individuals. They concluded that our classifier might be better at separating European subpopulations than at separating cases from controls. To examine this in detail, we tested the performance of our classifier in correctly identifying control individuals from the CEU, FIN, GBR and IBS control populations. As not all SNPs were available across all data sets, we retrained the classifier on our training set using only the SNPs common to all data sets, and then applied it to unseen validation data from the FIN, GBR and IBS control cohorts. Comparing these European subpopulations, we found that the largest differences in classifier score occurred when only part of the classifier was used (a difference as high as 25% was observed between the FIN and GBR groups). Using the full classifier, however, ethnic population accounted for <6% of the total difference in classifier score. We also provide the full list of 237 SNPs relevant to our classifier (Table 1). The full code used to generate the classifier has been made available on the Autism Genetic Resource Exchange (AGRE) website (http://agre.org), together with tests of the classifier on other ASD data sets.

Table 1. List of all 237 SNPs for the ASD classifier in the CEU cohort,2 organised from highest to lowest median weighting

Using our SNPs, we then examined their predictive accuracy in classifying control individuals from the FIN and GBR (non-ASD) populations, as well as SFARI (Simons Foundation Autism Research Initiative) ASD probands (the independent validation sample in our paper). We plotted the percentage of individuals classified as ASD against the number of SNPs used in the classifier, with SNPs ordered by the absolute magnitude of their weightings. As can be seen in Figure 1, population stratification may influence differences in classifier accuracy between populations when few SNPs are used, but this effect diminishes as more SNPs are included. The separation in percentage classified as ASD between the SFARI/ASD and the FIN/GBR groups widened with increasing gradient between 50 and 100 SNPs, whereas at >150 SNPs the separation between these groups plateaued. This is to be expected, as the SNPs added beyond this point have the smallest weightings within the classifier. Therefore, in keeping with the analysis of Belgard et al., we show that at low SNP numbers population effects may influence classification accuracy, but these effects become secondary to the ASD signal as the number of SNPs increases.
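The curve described above can be illustrated with a short sketch. It is not the analysis code: it assumes, purely for illustration, that the classifier score is a weighted sum of genotype codes compared against a threshold of zero, and the function name, weights and genotypes below are all hypothetical toy values.

```python
def fraction_classified_positive(genotypes, weights, k, threshold=0.0):
    """Fraction of individuals scored above `threshold` when only the k SNPs
    with the largest absolute weights contribute to the score.

    Assumption (hypothetical): score = sum of weight * genotype code.
    """
    # Rank SNP indices by decreasing absolute weight, then keep the top k.
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    top = order[:k]
    n_positive = 0
    for person in genotypes:
        score = sum(weights[j] * person[j] for j in top)
        if score > threshold:
            n_positive += 1
    return n_positive / len(genotypes)


# Toy data: four SNP weights and four individuals (rows) with 0/1 genotype codes.
weights = [0.9, -0.7, 0.4, -0.2]
cohort = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

# One point per classifier size, from 1 SNP up to all 4.
curve = [fraction_classified_positive(cohort, weights, k) for k in range(1, 5)]
print(curve)  # [0.5, 0.5, 0.75, 0.75]
```

Computing one such curve per cohort and plotting the curves together gives the kind of comparison shown in Figure 1, where the gap between cohorts can be read off at each classifier size.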

Figure 1

Percentage of individuals classified as ASD as a function of the number of single-nucleotide polymorphisms (SNPs), ordered by decreasing absolute magnitude of weighting. Significant variance was observed at smaller numbers of SNPs (not plotted). Note the gradient differential between SFARI cases versus FIN and GBR between SNPs 80 and 150. ASD, autism spectrum disorder; SNPs, single-nucleotide polymorphisms; SFARI-CASES, Simons Foundation Autism Research Initiative ASD probands. Population samples from the 1000 Genomes cohort:3 GBR, Great British; FIN, Finnish.


Using the classifier described above, we tested its accuracy in correctly classifying controls (non-ASD) within individual European cohorts. We achieved accuracies (that is, correct classification as non-ASD) of 82% for the FIN, 78% for the GBR and 67% for the IBS cohorts. In addition, to determine confidence intervals for classifier performance, we performed a bootstrap analysis (1000 permutations; in each, 80% of the data was used to train a classifier that predicted the remaining 20%) across all available white non-Hispanic populations (that is, SFARI and Autism Genetic Resource Exchange probands, and WTBC, CEU, FIN, GBR and IBS controls). Diagnostic accuracy for ASD was 66.0% (90% CI: 61.5–71.9), with a sensitivity of 63.4% (90% CI: 54.3–75.9) and a specificity of 67.2% (90% CI: 59.5–74.3). This equates to a positive likelihood ratio of 1.9 (90% CI: 1.3–3.0).
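The point estimate of the positive likelihood ratio follows directly from the reported sensitivity and specificity, as the sketch below shows. The function name is illustrative. Only the point estimate is reproduced; the confidence interval would have to come from the bootstrap replicates themselves and cannot be derived from the two marginal intervals.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = true-positive rate divided by false-positive rate."""
    return sensitivity / (1.0 - specificity)


# Point estimates reported above: sensitivity 63.4%, specificity 67.2%.
lr_pos = positive_likelihood_ratio(0.634, 0.672)
print(f"LR+ = {lr_pos:.1f}")  # LR+ = 1.9
```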

In our paper, we reported positive and negative predictive accuracies of 70.8% and 71.8%, respectively.2 Based on a prevalence of 1:88 cases of ASD in the US population,4 this equates to a positive predictive value (that is, precision) of 2.8% and a negative predictive value of 99.5%. This suggests that the classifier is not suitable as a general screening method; rather, it should only be considered in high-risk populations, where the base rate of ASD is high enough to produce acceptable positive and negative predictive values.
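The conversion to predictive values can be checked with Bayes' rule. The sketch below assumes that the reported 70.8% and 71.8% play the roles of sensitivity and specificity, respectively; under that assumption the stated figures are reproduced. The function name and the 20% base rate used for comparison are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# General population: prevalence of 1 in 88.
ppv, npv = predictive_values(0.708, 0.718, 1 / 88)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 2.8%, NPV = 99.5%

# Hypothetical high-risk setting with a 20% base rate (illustrative value only).
ppv_high, npv_high = predictive_values(0.708, 0.718, 0.20)
```

At the hypothetical 20% base rate the same formula gives a positive predictive value of roughly 39%, which illustrates why the base rate of the tested population matters far more than it might first appear.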

In conclusion, we demonstrate that the SNPs in our classifier show some ability to distinguish non-randomly between ASD cases and controls, and that our results are not merely explained by population stratification, as shown by our analyses in independent cohorts of individuals of European ancestry. Further work on such approaches is needed to validate these findings, for example, prospective studies of children at risk for ASD (such as those in families with an affected member).