Systemizing is genetically correlated with autism and is genetically distinct from social autistic traits

The hypersystemizing theory of autism suggests that autistic individuals, on average, have superior attention to detail, and a stronger drive to systemize. Systemizing involves identifying input-operation-output relationships. Here, we report the results of genome-wide association studies (GWAS) of systemizing measured using the Systemizing Quotient - Revised in n = 51,564 individuals. We identify three genome-wide significant loci: Two of these were significant in the non-stratified GWAS: rs4146336 on chromosome 3 (P = 2.58×10−8) and rs1559586 on chromosome 18 (P = 4.78×10−8). In addition, we also identified a significant locus in the males-only GWAS (rs8005092 on chromosome 14, P = 3.74×10−8). We find that 12%± 1.2 of the variance in systemizing is captured by SNPs (P=1.2×10−20). We identify a positive genetic correlation between autism and systemizing (rg = 0.26±0.06; P = 3.35×10−5), which is independent of genetic contribution to educational attainment. We further demonstrate that genetic risk for autism from systemizing is genetically distinct from genetic risk emerging from social autistic traits, suggesting distinct shared genetics between autism and social and non-social traits. Our results highlight the importance of considering both social and non-social autistic traits in elucidating the genetic architecture of autism.


Introduction
The hypersystemizing theory of autism suggests that autistic individuals, on average, have superior attention to detail, and a stronger drive to systemize. Systemizing involves identifying input-operation-output relationships in order to analyse and build systems and to understand the laws that govern specific systems 1 . Several lines of evidence suggest that autistic individuals have intact or superior systemizing. This idea was noted in the earliest reports describing autism. In his 1944 paper describing autism 2 , Hans Asperger noted a proclivity for patterns and order in autistic children. Of one child, he writes, "He orders his facts into a system and forms his own theories even if they are occasionally abstruse." He observes that another child had "specialised technological interests and knew an incredible amount about complex machinery," while a third child "was preoccupied by numbers." In Leo Kanner's 1943 paper, he writes that the children with autism have "precise recollection of complex patterns and sequences 3 ." These initial clinical observations have been quantified using different measures. For example, on a self-report measure of systemizing (the Systemizing Quotient -Revised, or the SQ-R) 4 , autistic adults, on average, score significantly higher than non-autistic individuals 4,5 .
The same pattern of results is seen in autistic children, using the parent-report version of the SQ 6 . Systemizing is also highly correlated with aptitude in science, technology, engineering, and mathematics (STEM) 7 . Fathers and grandfathers of children with autism are significantly overrepresented in the field of engineering 8 . The same is true of mothers 9 . This is in line with higher rates of autism in geographical regions that have higher rates of people working in fields such as information technology, like Eindhoven in the Netherlands 10 . Further, autistic individuals are more likely to enrol in STEM majors (34.31%) compared to the general population (22.8%) and other learning disabilities (18.6%) 11 . STEM professionals also score significantly higher on measures of autistic traits (mean = 21.92, SD = 8.92) compared non-STEM professionals (mean = 18.92, SD = 8.48) 12 . Finally, unpublished work from Sweden suggests that high technical IQ in fathers increases risk for autism in children. A few studies have also investigated systemizing in other psychiatric traits and conditions, including schizotypy 13 and anorexia nervosa 14 .
It is unclear if the link between autism and systemizing is due to underlying genetics or due to other non-biological factors (for example, people high in systemizing may be more aware of autism and hence more likely to seek a diagnosis). Here we directly test this using genome-wide association data from 51,564 research participants from 23andMe, Inc., a personal genetics company, and who completed the SQ-R. This study has the following aims: 1. To investigate the genetic architecture of systemizing, using the SQ-R; 2. To identify the genetic correlation between the SQ-R and psychiatric conditions (including autism) and psychological traits; 3. To identify enrichment in tissues, gene groups, and biological pathways.
We identified three genome-wide significant loci associated with the SQ-R, and identified potential candidate genes for these loci using chromatin interactions and eQTLs in neural tissues. We identify a significant, and positive genetic correlation between the SQ-R and autism. This genetic correlation is independent of the genetic correlates of educational attainment, which is also genetically correlated with autism. We further demonstrate that social phenotypes which are genetically correlated with autism are not genetically correlated with the SQ-R, suggesting shared genetics between autism and two distinct phenotypic domains: social traits and non-social traits. In addition, the SQ-R is also significantly correlated with schizophrenia and measures of cognition. We also find that systemizing has a modest heritability, is enriched in evolutionary conserved sites and fetal DNAse hypersensitivity sites, and has a high genetic correlation between males and females.

Participants:
Research participants in the GWAS of the SQ-R were from 23andMe and are described in detail elsewhere 15,16 . All participants provided informed consent and answered surveys online according to a human subjects research protocol, which was reviewed and approved by Ethical & Independent Review Services, an external AAHRPPaccredited private institutional review board (http://www.eandireview.com). All participants completed the online version of the questionnaire on the 23andMe participant portal. Only participants who were primarily of European ancestry (97% European Ancestry) were selected for the analysis using existing methods 17 . Unrelated individuals were selected using a segmental identity-by-descent algorithm 18 . A total of 51,564 participants completed the SQ-R (males = 26,063, and females = 25,501).
Phenotype: The SQ-R is self-report measure of systemizing drive, or interest in rulebased patterns 4 . There are 75 items on the SQ-R, with a maximum score of 150 and a minimum score of 0. Scores on the test are normally distributed. The SQ-R has good crosscultural stability and good psychometric properties with Cronbach's alpha ranging from 0.79 to 0.94 in different studies 19 . Test-retest reliability available in a Dutch sample indicated high reliability of 0.79 (Pearson correlation) 19 . This was supported by another study in 4,058 individuals which identified high internal cohesion 20 . Exploratory followed by confirmatory factor analysis using Rasch modelling suggests that the SQ-R is unidimensional 20 . A sex difference has been observed in multiple studies with males, on average, scoring significantly higher than females. Criterion validity shows that the SQ-R has a modest but significant correlation with the Mental Rotation Test (r =.25, P =.013), as well as its subscales 21 . Autistic individuals, on average, score higher on the SQ-R in multiple different studies 22,23 . Further, the SQ-R also predicts autistic traits, with a combination of the SQ-R and the Empathy Quotient predicting as much as 75% of the variance on the Autism Spectrum Quotient, a measure of autistic traits 4  Genotyping, imputation, and quality control: Details of genotyping, imputation and quality control are given elsewhere 24 . Briefly, unrelated participants were included if they had a call rate of greater than 98.5%, and were of primarily European ancestry (97% European ancestry). A total of 1,030,430 SNPs (including InDels) were genotyped. SNPs were excluded if: they failed the Hardy-Weinberg Equilibrium Test at P < 10 -20 ; had a genotype rate of less than 90%; they failed the parent-offspring transmission test using trio data in the larger 23andMe research participant database; or if allele frequencies were significantly different from the European 1000 Genomes reference data (chi-square test, P < 10 -20 ).
This was followed by imputation against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 26 . Genetic association analyses were restricted to SNPs with a minor allele frequency > 1%.
Genetic association: Our primary analysis was an additive model of genetic effects and was conducted using a linear regression with age, sex, and the first five ancestry principle components included as covariates. In addition, given the modest sex difference, we also conducted sex-stratified analyses. SNPs were considered significant at a genome-wide threshold of P <5x10 -8 . Leading SNPs were identified after LD-pruning using Plink (r 2 > 0.8).
Winner's curse correction was conducted using an FDR based shrinking 27 .
We calculate variance explained by first standardizing the regression estimates and then squaring the estimates. This is equivalent to: Where R 2 is the proportion of variance explained for SNP j.

‫ܤ‬ ଶ
is the non-standardized regression coefficient, MAF is the minor allele frequency for SNP j, and is the variance of SQ. Further details of this formula are provided in the Supplementary Note.

GWAS of the SCDC: The SCDC (The Social and Communication Disorders
Checklist) 28 scores were calculated from children of the 90s (ALSPAC cohort) 29 , in children aged 8. In total, SCDC scores were available on n = 7,825 children. From this, we removed individuals for whom complete SCDC scores were not available. After excluding related individuals and individuals with no genetic data, data was available on a total of n = 5421 unrelated individuals. The SCDC score was not normally distributed so we log-transformed the scores and ran regression analyses using the first 2 genetic principal components and sex as the covariates using Plink 2.0 30 . Dosage data from BGEN files were converted using hardcalls, with calls with uncertainty > 0.1 treated as missing data. We excluded SNPs that deviated from Hardy-Weinberg Equilibrium (P < 1x10 -6 ), with minor allele frequency < 0.01 and missing call rates > 2%. We further excluded individuals with genotype missing rates > 5%. Further details are provided in the Supplementary Note.
Genomic inflation factor, heritability, and functional enrichment: LDSR 31,32 was used to calculate for inflation in test statistics due to unaccounted population stratification.
Heritability and genetic correlation for all the traits was calculated using LDSR using the north-west European LD scores. For all phenotypes we performed genetic correlation without constraining the intercept. We identified significant genetic correlations using a Bonferroni adjusted P-value < 0.05. For autism, we validated the genetic correlation using a second, larger dataset: the PGC-iPSYCH meta-analysis 33 autism GWAS dataset (Supplementary Note). Details of the phenotypes, the sample sizes and the references are provided in Supplementary Table 3.
Heritability and genetic correlation was performed using extended methods in LDSR 32 . Difference in heritability between males and females was quantified using: where Z is the Z score for the difference in heritability for a trait, (h 2 males -h 2 females ) is the difference SNP heritability estimate in males and females, and SE is the standard errors for heritability. Two-tailed P-values were calculated and reported as significant if P < 0.05.
Functional annotations: For the primary GWAS (non-stratified analyses), we conducted functional annotation using FUMA 34 . We restricted our analyses to the nonstratified analyses due to the high genetic correlation between the sexes and the low statistical power of the sex-stratified GWAS. We conducted gene-based association analyses using MAGMA 35 within FUMA and report significant genes after using a stringent Bonferroni corrected P < 0.05. In addition, we conducted enrichment for tissue specific expression and pathway analyses within FUMA. For the significant SNPs, we investigated enrichment for eQTLs using brain tissues in the BRAINEAC and GTEx 36 database within FUMA. In addition, we also investigated chromatin interactions from the Dorsolateral Prefrontal Cortex, the hippocampus, and neural progenitor cells. Significant results were identified using a Benjamini-Hochberg FDR corrected P < 0.05. We further conducted partitioned heritability using extended methods in LDSR 37 . We conducted partitioned heritability for baseline categories.

GWIS:
To investigate if the SQ-R is genetically correlated with autism independent of the genetic effects of cognition, we constructed a unique SQ-R phenotype after conditioning on the genetic effects of educational attainment using GWIS 38 . GWIS takes into account the genetic covariance between the two phenotypes to calculate the unique component of the phenotypes as a function of the genetic covariance and the SNP heritability. Prior to performing GWIS, we standardized the beta coefficients for the SQ-R GWAS by using the following formula: completed the AQ and the SQ-R. We derived the variables "attention to detail" and "rigidity" based on the scoring criteria provided by Bralten and colleagues 39 . We note that this is different from the attention to detail subscale as defined in the original paper describing the AQ 40 . Linear regression analyses were conducted using "attention to detail" and "rigidity" scores as the dependent variable, total SQ score as the predictor variable, and sex, status, and age of completion of both the SQ-R and the AQ as covariates. Status was divided into five categories: Autistic individuals, controls, first degree relative of an autistic individual, suspected autism, and suspected autism and relative of an autistic individual.

Phenotypic distribution and genome-wide association analyses
The mean score for all participants was 71±21 out of a total of 150 on the SQ-R, with males scoring higher than females on the SQ-R (76.5±20 in males; 65.4±20.6 in females).
LDSR intercept suggested that there was minimal inflation due to population stratification ( Figure 1). Supplementary Table 1 provides a list of all independent SNPs with P < 1x10 -6 .
For two of these loci (rs1559586 and rs8005092), eQTL and chromatin interactions identified  considerable polygenicity and small per-SNP effect size. Local LD plots for the three significant loci are provided in Supplementary Figure 3.
insert Figure 1 here
Systemizing is thought to contribute to the non-social domain of autism. To identify if systemizing contributes to social autistic traits we conducted genetic correlation using the Social and Communication Disorder Checklist (SCDC), which is genetically correlated with autism 43 (Supplementary Note). Additionally, we also conducted genetic correlation between the SQ-R and family relationship satisfaction, friendship satisfaction 44 , and selfreport empathy which are negatively correlated with autism. We did not identify a significant genetic correlation between SQ-R and SCDC (r g = 0.04±0.18, P = 0.81), family relationship satisfaction (r g = -0.04 ± 0.06, P = 0.47), friendship satisfaction (r g = 0.06±0.07, P = 0.42), and 1 0 empathy (r g = 0.10±0.07, P = 0.17). Given that the sample size for the SCDC is small, we reran the genetic correlation after constraining the intercept, mirroring the analyses in the paper that reported a genetic correlation between autism and the SCDC. We did not identify significant genetic correlations even after constraining the intercept (r g = 0.04±0.13, P = 0.73)) These results suggest that the contribution of systemizing to the genetic risk of autism is distinct from the contribution of social traits.
insert Figure 2 here insert Figure 3 here A recent study 39 identified that polygenic scores for autism predicts autistic traits in specific subdomains of the Autism Spectrum Quotient (AQ) which they termed "attention to detail" and "rigidity." We queried if systemizing predicts both of these domains within the AQ. Whilst we did not have access to a cohort with both genetic data and AQ scores, we had information on n = 5,663 individuals who had completed both the AQ and the SQ-R in the Cambridge Autism Research Database. Our results show that the SQ-R significantly predicts variation in scores in the "attention to detail" and "rigidity" domains on the AQ as defined by  Table 8). Pathway based and tissue enrichment analysis did not identify any significant results (Supplementary Table 9 and Supplementary Figure 5).

Discussion
Here we present the results of a genome-wide association study of systemizing. We identify three genome-wide significant loci associated with SQ-R, two in the non-stratified GWAS and one in the males-only GWAS. These need to replicated in larger sample sizes to provide robust confirmation of the results. Chromatin interaction analyses in neural tissues prioritize five genes for further functional follow up. Three of these genes have high expression in brain tissues. One of these candidate genes is STXBP6. STXBP6 binds to STX1 in the presynaptic boutons to help inhibit Ca 2+ mediated exocytosis of neurotransmitters 41 .
Several studies have identified genes involved in synapse formation and maintenance that are disproportionately mutated in autism 45 . Further, transcriptome studies of the post-mortem brain have identified downregulated genes that are enriched for synaptic transmission 46,47 .
The idea that autistic individuals, on average, have a stronger interest in and aptitude for systems is not new. Several theories have sought to understand this link. For example, the hyper-systemizing theory suggests that this could explain the 'resistance to change' seen in many autistic individuals, in that systemizing requires varying one variable at a time whilst holding all others constant, in order to observe how manipulating one variable (e.g, the operation) changes the output 1 . In this way, resistance to change may appear as a 'symptom' but may simply reflect a different learning style in which the autistic person is systematically seeking to understand a rule-governed domain. The prediction error theory may also be relevant to this hypothesis in that a strong systemizer may be more attracted to highly lawful, more predictable domains, like mathematics or computers, the social world being the extreme opposite of this 48 . Note that versions of this theory may not be well-supported by evidence, since this theory suggests autistic people will have difficulty making predictions across the board, whilst the hyper-systemizing theory suggests this will only be true in domains that are not lawful. While a few studies have investigated the genetic architecture of social traits, or repetitive behaviour, no study to our knowledge has investigated the genetic architecture of non-social cognitive traits associated with autism.
Genetic correlation analyses identified, as predicted, a positive correlation with autism. The correlation was not trivial, and this was independent of the genetic contributors to educational attainment. To further understand the genetic architecture of autism, we tested the genetic correlation between SQ-R and four social traits that are all correlated with autism: social and communication difficulties in childhood which is positively correlated with autism, and friendship satisfaction, family relationship satisfaction, and empathy, which are all negatively genetically correlated with autism. We did not identify a significant correlation between SQ-R and the four phenotypes. These results suggest that there are at least two distinct sources of shared genetics with autism: one linked to social autistic traits and another linked to non-social autistic traits. This lends itself to the idea of multiple underlying risk factors, genetic and other, that contribute to risk for autism. In this hypothesis, the risk for autism does not stem from one source of genetic risk, but multiple different sets of genetic risks that contribute to the considerable phenotypic heterogeneity observed within the autism spectrum. Further research is needed to understand if individuals within the autism spectrum can be stratified based on these sets of risk.
Approximately 12% of variance in systemizing was explained by SNPs, and this was similar between males and females. To our knowledge, there is no study examining heritability of the SQ-R in twins. Given the modest sex-difference observed in the phenotypic scores, we conducted sex-stratified GWAS. Genetic correlation analyses suggested a near identical genetic architecture between the sexes. This was also supported by a similar additive SNP heritability between the sexes. Taken together, these results suggest there is, by and large, a similar genetic architecture between sexes. The observed phenotypic sex differences can then be attributed to differences in environment, or other biological factors (such as prenatal hormones) which we were unable to fully test in this study.
In conclusion, we conducted the first GWAS of systemizing and identify three genome-wide significant loci. Genetic correlation analyses identified significant and replicable positive correlation with autism. This is independent of educational attainment and is not correlated with social autistic traits. These results suggest that there are multiple different sources of genetic risk that contribute to autism.

Figure 3: Genetic correlations between SQ-R and SQminEdu GWIS estimates
Genetic correlation between SQ-R and three phenotypes (in Red). Genetic correlation between the GWIS SQminEdu dataset and the same three phenotypes (in blue). The bars represent 95% confidence intervals. P-values provided on top of the bars.