Recent natural selection conferred protection against schizophrenia by non-antagonistic pleiotropy

Schizophrenia is a debilitating psychiatric disorder associated with a reduced fertility and decreased life expectancy, yet common predisposing variation substantially contributes to the onset of the disorder, which poses an evolutionary paradox. Previous research has suggested balanced selection, a mechanism by which schizophrenia risk alleles could also provide advantages under certain environments, as a reliable explanation. However, recent studies have shown strong evidence against a positive selection of predisposing loci. Furthermore, evolutionary pressures on schizophrenia risk alleles could have changed throughout human history as new environments emerged. Here in this study, we used 1000 Genomes Project data to explore the relationship between schizophrenia predisposing loci and recent natural selection (RNS) signatures after the human diaspora out of Africa around 100,000 years ago on a genome-wide scale. We found evidence for significant enrichment of RNS markers in derived alleles arisen during human evolution conferring protection to schizophrenia. Moreover, both partitioned heritability and gene set enrichment analyses of mapped genes from schizophrenia predisposing loci subject to RNS revealed a lower involvement in brain and neuronal related functions compared to those not subject to RNS. Taken together, our results suggest non-antagonistic pleiotropy as a likely mechanism behind RNS that could explain the persistence of schizophrenia common predisposing variation in human populations due to its association to other non-psychiatric phenotypes.


Results
RNS signatures across the genome.We assessed the signatures of recent polygenic adaptation across the genome after the human diaspora out of Africa (about 100 kya 31 ).Pcadapt 48 was used to study RNS signals across human populations from 1000 Genomes Project data.Up to 3 principal components (PCs) were kept based on the proportion of variance explained (Supplementary Fig. 1).The first PC separates African populations from the rest, while the second and third PCs differentiate between Asian and European populations (Supplementary Fig. 2), similar to previous results described using 1000 Genomes phase 1 data 47 .
In addition, to rule out the possibility that many of the described RNS markers were false positives due to discontinuous PC space 50 , we estimated Fst statistical parameters to describe adaptation selection markers between We used sequencing data from 1000 Genomes Project phase 3 outbred and unrelated subjects (N = 1699 European (EUR), East Asian (EAS), Southern Asian (SAS) and African (AFR) individuals; American individuals were not used due to admixed ancestry) and GWAS summary statistics from schizophrenia 49 .ancestry-specific summary statistics for European and East Asian subjects were used for additional sensitivity analyses.Overlapping variants from both sources (5,554,437 SNPs) were included in the subsequent analyses.Throughout the whole process, markers of RNS were considered as the top 5% LD-independent variants with the highest probability of being markers of natural selection (RNS markers; N SNPs = 8679; LD-clumping based on SCZ).As a sensitivity analysis, we used top 5% LD-independent variants as RNS markers after LD-clumping based on RNS (LD-RNS markers; N SNPs = 6262).Enrichment of RNS was also tested in Alzheimer's disease as a methodological control.Enrichment, effect-stratified analysis (bias towards risk or protection), and biological profiling of RNS in schizophrenia were performed as described in Methods.RNS-stratified polygenic score analyses (PGS SCZ ) were performed in an independent Spanish case-control sample (CIBERSAM; N SCZ = 1927; N HC = 1561) and previous SCZ GWAS 39 as discovery sample.LDSC-SEG = LDSC applied to specifically expressed genes.
Vol:.(1234567890) Enrichment of selection signatures across schizophrenia associated loci.We evaluated the relationship between RNS and schizophrenia predisposing variation from the latest schizophrenia GWAS 49 using a comprehensive set of analyses (Fig. 1).First, we analyzed the enrichment of selection signals across schizophrenia GWAS significance thresholds.To perform well-powered analyses, we considered the 5% of LD-independent SNPs with the lowest p RNS as RNS markers (N SNPs = 173,701; LD-clumping based on p SCZ , Supplementary Data 1) and compared them with the remaining LD-independent SNPs.We observed an increasingly significant enrichment for RNS markers (N SNPs = 8679) across the most stringent schizophrenia GWAS thresholds (p SCZ threshold < 5 × 10 -8 , Fisher's exact test p = 4.40 × 10 -13 ; OR (CI 95%) = 3.45 (2.51, 4.59); Fig. 2A; Supplementary Data 3).To ensure that these   2B; Supplementary Data 3).Moreover, to ensure no methodology bias is present, enrichment analyses were repeated using GWAS data from Alzheimer's disease 51 , a related brain disorder 52 that may escape natural selection due to its late onset.LD-clumping was performed based on Alzheimer's GWAS p-values.No significant enrichment between RNS and Alzheimer's disease was observed (Supplementary Data 3; Fig. 2C) using the same procedure described above for schizophrenia.
We then assessed the partial correlation between schizophrenia and RNS across the set of LD-independent SNPs nominally associated with SCZ, while controlling for their minor allele frequency on the 1000 Genomes Project data (p SCZ < 0.05; N SNPs = 41,079; LD-clumping based on p SCZ ) and found a significant correlation (Spearman-rho = 0.034; p = 7.03 × 10 -12 ).This result was confirmed by conducting 10,000 random permutations of the GWAS results with respect to the p RNS values and testing whether the observed correlation was significantly higher than the null distribution (p perm = 1.3 × 10 -3 ; Supplementary Data 4).A similar correlation pattern between schizophrenia and RNS was observed after LD-clumping based on RNS (Spearman-rho = 0.075; p = 5.12 × 10 -13 ).We repeated this analysis using GWAS data from Alzheimer's disease and observed no significant correlation with RNS (Supplementary Data 4).
We then assessed schizophrenia polygenic score prediction (PGS SCZ ) in an independent case-control cohort (CIBERSAM; N SCZ = 1927; N HC = 1561; Supplementary Data 6).For this analysis, we used the PGS SCZ threshold with the most significant association with schizophrenia in the analysis using whole genome data (p SCZ < 0.2; N SNPs = 61,040).After dividing the resultant schizophrenia predisposing variation into 20-quantiles (N SNPs = 3052) ranked by p RNS , we observed a reduction of the explained variance in the case-control cohort when moving to higher p RNS (linear regression t = − 3.78; p = 0.0013; Fig. 3A).To account for the likely bias of the genomic properties of selection markers, we also compared PGS SCZ using variants from the first quantile against PGS SCZ using 1000 sets including the same number of matched SNPs selected from the rest of variants.Matched SNPs were selected accounting for MAF, number of SNPs in LD (LD buddies), distance to nearest gene, and gene density (see Methods).PGS SCZ prediction from the first quantile was found to be significantly higher than the distribution of 1000 predictions from matched SNP selections (p = 0.019; Fig. 3B).
Moreover, to ensure our PGS SCZ predictions using variants subject to RNS were not biased by the presence of Long-Range LD regions 53 , we repeated the PGS SCZ analysis after excluding Long-Range LD regions previously described 53 (p SCZ < 0.2; N SNPs = 60,407).Again, a similar reduction of the explained variance towards higher p RNS was found (t = − 4.64; p = 2 × 10 -3 ).Also, PGS SCZ prediction from the first quantile was found to be significantly higher than the distribution of 1000 predictions from the remaining SNPs (p = 0.033; Supplementary Data 6).
Figure 3. Schizophrenia Polygenic score predictions (PGS SCZ ) in an independent cohort stratified by the probability of the SNPs to be subject to RNS. (A) PGS SCZ predictions (% pseudo-R 2 ) on an independent schizophrenia case-control sample (CIBERSAM; N SCZ = 1927; N HC = 1561) using 20 subsets of SNPs ranked by their probability to be subject to RNS (p RNS ) are displayed.For this analysis, we used the most significant schizophrenia threshold in the PGS SCZ analysis using whole genome data (p SCZ < 0.2; N SNPs = 61,040).LD-clumping based on p SCZ (r 2 < 0.1; 500 kb) was performed for PGS estimations.(B) PGS SCZ performance (% pseudo-R 2 ) using variants from the first RNS quantile (SNPs with the highest probability to be subject to RNS; N SNPs = 3052) was compared against the distribution of PGS SCZ Naggelkerke's pseudo-R 2 obtained by selecting the same number of matched SNPs from the remaining 95%.A total of 1000 sets of matched SNPs were randomly drawn from SNPs matched to the variants from the first RNS quantile based on MAF, number of SNPs in LD, distance to nearest gene, and gene density.See Supplementary Data 6 for detailed results.= 30,761; LD-clumping based on p SCZ; Supplementary Data 1D) populations and compared them with the remaining LD-independent SNPs.We observed an increasingly significant enrichment for European-based RNS markers (N SNPs = 6983) across the most stringent schizophrenia GWAS thresholds (p SCZ threshold < 5 × 10 -8 , Fisher's exact test p = 2.30 × 10 -6 ; OR (CI 95%) = 2.76 (1,83, 4.04); Supplementary Data 3).In case of the less powered test based on East Asian populations, there is also a trend for enrichment of schizophrenia loci across East Asian-based RNS markers (N SNPs = 8983; p SCZ threshold < 5 × 10 -8 , Fisher's exact test p = 0.08; OR (CI 95%) = 1.65 (0.88, 2.86); Supplementary Data 3).
Moreover, we described a great overlap between European ancestry based and global RNS markers.Across the 46,872 common LD-independent SNPs in both European and all population RNS data, 1176 out of 2249 RNS markers were among the top 5% with highest probability to be RNS marker (Chi-square test p < 1 × 10 -16 ).
Alleles arisen during human evolution and subject to RNS are biased towards protection against schizophrenia.In order to study whether RNS markers confer risk or protection to schizophrenia, we focused on the derived alleles arisen during human evolution (i.e.derived selected alleles).We used LD-independent SNPs with available information of the ancestral allele (the allelic state of the most recent common ancestor of human and the closest primate) in the 1000 Genomes database (N SNPs = 170,609 (98.2% of total LD-independent SNPs)).We calculated GWAS derived-OR SCZ referred to the derived allele and evaluated whether RNS markers associated with schizophrenia were enriched for derived alleles conferring risk or protection to the illness.
Nine out of 13 RNS markers from schizophrenia GWS loci with derived protective alleles have many other pleiotropic effects such as body size, body mass index or systolic blood pressure (Supplementary Data 8).However, this ratio was not significantly greater than for schizophrenia GWS loci with derived risk alleles (4 out of 7 RNS markers; Fisher's exact test P = 0.47).
Additionally, as pleiotropy might limit effect size 54,55 , we evaluated PGS SCZ performance in the independent CIBERSAM case-control sample (N SCZ = 1927; N HC = 1561) for RNS markers and the rest of non-selected SNPs, by considering variants with protective or derived risk alleles separately (Fig. 4C, Supplementary Data 7B).Since there were different number of risk (N SNPs = 2967) and protective (N SNPs = 3168) SNPs subject to RNS, we then conducted PGS SCZ comparisons after using random permutations of 1000 SNPs for each subset of SNPs (Fig. 4D, Supplementary Data 7B).Across RNS markers, risk variants explained more variance in the SCZ-HC status than protective variants do (Wilcoxon test; p < 10 -16 ; median R 2 RISK = 1.10; median R 2 PROTECTIVE = 0.54).The difference in explained variance between risk and protective variants was smaller across non-RNS SNPs (Wilcoxon test; (A) Average log 10 (OR SCZ ) of RNS markers (orange) and the remaining SNPs (grey), referred to the derived alleles that emerged during human evolution, across different SCZ GWAS thresholds.95% confidence intervals of the means are displayed.(B) Proportion of SNPs within RNS markers and the remaining SNPs with a protective or risk effect of the derived allele, across different schizophrenia GWAS thresholds.(C) PGS SCZ predictions in the case-control target sample (N SCZ = 1927; N HC = 1561) stratified by RNS.Schizophrenia GWAS summary statistics were subsetted based on the relationship of the SNP to RNS and the risk or protective effect of the derived allele as follows: (i) RNS markers (top 5% of the SNPs with the highest probability of being subject to RNS) whose derived alleles arisen through human evolution confer protection to schizophrenia (derived-OR SCZ < 1; light brown), (ii) RNS markers (top 5% of the most selected SNPs) whose derived alleles arisen through human evolution confer risk to schizophrenia (derived-OR SCZ > 1; dark brown), (iii) SNPs not subject to RNS (from the remaining 95% of SNPs) whose derived alleles arisen through human evolution confer protection to schizophrenia (derived-OR SCZ < 1; light grey), (iv) SNPs not subject to RNS (from the remaining 95% of SNPs) whose derived alleles arisen through human evolution confer risk to schizophrenia (derived-OR SCZ > 1; dark grey).Explained variance attributable to PGS was calculated as the increase in Nagelkerke's pseudo-R 2 between a linear model with and without the PGS variable.P values were obtained from the binomial logistic regression of SCZ phenotype on PGS, accounting for LD and including sex, age, and 10 MDS ancestry components as covariates.Significant PGS predictions after FDR correction (p FDR < 0.05) are marked with an asterisk.(D) Distribution of variance explained (% pseudo-R 2 ) from stratified PGS SCZ predictions based on the abovementioned subsets, using a similar number of variants (N SNPs = 1000 from all available SNPs (P SCZ < 1)) for each analysis.Wilcoxon test was used to compare 1000 pseudo-R 2 distributions from PGS predictions.See Supplementary Data 7 for detailed results.www.nature.com/scientificreports/p = 4.5 × 10 -13 ; median R 2 RISK = 0.54; median R 2 PROTECTIVE = 0.46).We observed similar MAF across protective (MAF(CI95%) = 0.366 (0.358, 0.374)) and risk (MAF(CI95%) = 0.336 (0.329, 0.344)) RNS markers, thus ruling out that the results could be inflated by higher MAF of risk SNPs.
Functional enrichment of schizophrenia GWS loci subject to RNS.We finally aimed to describe the functional characteristics of the loci that were both subject to RNS and associated with schizophrenia.We used LDSC applied to specifically expressed genes (LDSC-SEG) to evaluate the schizophrenia heritability enrichments across RNS markers within 10 different tissues, 13 brain related tissues, and 3 brain cell-types and compared them to the LDSC-SEG enrichments across the rest of SNPs (Supplementary Data 4).While schizophrenia predisposing variation non-subject to RNS was found to be enriched at cortical brain tissue (FDR-p = 0.029) and neuronal cell-type (FDR-p = 0.017), schizophrenia predisposing variation within RNS markers was found to be enriched beyond brain-related tissues and not enriched in any brain-related cell types (Fig. 5, Supplementary Data 4).
We also characterized RNS signals by comparing genes mapped from schizophrenia GWS loci subject and not subject to RNS.Using fine mapping results from the latest SCZ GWAS 49 , we considered the broad set of 628 mapped from GWS loci with low numbers of expected causal SNPs (K < 3.5; see Methods, Supplementary Data 9A).Genes mapped from GWS loci harboring (22 genes) and not harboring (468 genes) RNS markers were tested for functional over-representation analysis (ORA) using the whole gene-set as background.Although no brain-relevant overrepresentation was found in any case, we observed an enrichment of loci subject to RNS in brain unrelated functional signatures (miRNA metabolic process p FDR = 0.006; gland development p FDR = 0.026; epidermis development p FDR = 0.026; Supplementary Data 9B).

Discussion
Here in this work, we characterized the RNS signatures that occurred after the human diaspora out of Africa across the whole genome and demonstrated a significant enrichment of RNS markers across schizophrenia predisposing loci as well as a significant enrichment in their contribution to schizophrenia heritability.Then, by analyzing the derived and ancestral alleles we observed that RNS markers were enriched for derived alleles conferring protection to schizophrenia.This tendency towards protective versus derived risk alleles was not found across variants not subject to RNS, thus suggesting a positive selection of protective schizophrenia alleles during recent evolution of human populations.The biological characterization of schizophrenia predisposing loci subject to RNS and their mapped genes showed an underrepresentation of brain and neuronal related functions as well as more pleiotropic associations relative to schizophrenia predisposing loci not subject to RNS.These results suggest that recent selection of schizophrenia alleles could be promoted by conferring selective advantage through other non-psychiatric phenotypes.Our results suggest pleiotropy as a likely mechanism behind RNS that could explain, at least in part, the recent evolutionary dynamics of schizophrenia predisposing genetic variation.
Previous studies have tried to explain why genetic predisposing variation to schizophrenia persists in the population, despite the reproductive fitness reduction in affected individuals 20,21,24 .In this sense, many authors have described the persistence of schizophrenia predisposing alleles as a result of balanced selection or because they could provide fitness advantages in certain environments 22,41,59,60 .However, in this study, we observed that schizophrenia GWS loci are enriched in variants that have rapidly expanded across modern human populations (and are therefore subject to RNS) and that the derived alleles confer protection rather than risk to schizophrenia.Thus, our results are in line with recent studies reporting negative selection of schizophrenia predisposing alleles 26,61 .Pardiñas et al. 26 have suggested a plausible explanation of the purported schizophrenia evolutionary paradox of persistent common variation and progressive removal of risk alleles through background selection 62 .By this mechanism, removal of rare haplotypes harboring deleterious mutations could reduce genetic diversity and allow common alleles with small deleterious effects to maintain a high frequency by drift 26,62 .
Apparent contradictory results regarding the positive or negative selection of alleles that predispose to schizophrenia need to be integrated using a dynamic perspective.Since our study has focused only on genetic selection initiated after African populations started their dispersion into Eurasia around 60,000-100,000 years ago [31][32][33] , a reliable explanation for the persistence of common alleles conferring risk to schizophrenia may reside on fitness advantages provided during earlier stages of the human evolution.In this sense, a previous study focusing on the Sapiens-Neanderthal divergence described a pattern of positive selection of schizophrenia predisposing variation that could have contributed to the Homo sapiens success against other human species 41 .Moreover, schizophrenia loci were also described to be enriched in genomic regions that had experienced specific evolutionary acceleration during early human evolution (HARs) in comparison with other non-human primates 63 .Although these evidences reinforce the idea of schizophrenia predisposing variation acquisition as a price to pay for human abilities such as language development 24 or even more adaptive immunological profiles 64 , the evolutionary pressures to retain schizophrenia risk alleles as an evolutionary advantage might have stopped acting before the dispersion of the Homo sapiens from Africa.In line with our results, studies exploring RNS patterns have not found a pattern of positive selection of schizophrenia predisposing alleles 26,27,29 .A recent study has proposed an evolutionary framework by which schizophrenia risk alleles increased their frequency with the development of the social brain and high-order cognitive functions, but after a turning point the trend was reversed and a negative pressure against schizophrenia alleles became the standard 61 .Interestingly, this turning point from positive to negative selection of predisposing alleles in schizophrenia could be shared by other psychiatric traits.In fact, a recent work observed a similar scenario for attention-deficit/hyperactivity disorder (ADHD): while introgressed Neanderthal alleles were enriched in ADHD risk variants, modern environments triggered the progressive elimination of ADHD-predisposing alleles 65 .This temporal perspective should also prevent readers from reaching conclusions from current evolutionary dynamics.While our approach explored www.nature.com/scientificreports/RNS from around 100,000 years ago, recent methods have been used to assess modern selection from up to 100 generations (around 2000 years ago) 66,67 .In these recent studies, schizophrenia was also affected by even more recent negative selection 67 .Given the evidence here described about the relationship between predisposing variation to schizophrenia and RNS throughout human populations, we assessed whether these evolutionary dynamics were also present at the Asian or European intrapopulation level.While the association between RNS and the genetic predisposition to schizophrenia is replicated in European populations, this association appears to be absent among East Asian populations.This finding suggests that the observed relationship between schizophrenia and recent natural selection may be driven by selective pressures resulting from other traits, acting through specific environments Figure 5. Comparative enrichments of tissue and cell-type signatures across RNS markers and the remaining non-selected SNPs.Tissue and cell-type sub-annotations were created from the intersection between the described annotations (RNS markers and the rest of SNPs) and gene expression data from 10 whole tissues 56 , 13 brain-related tissues (Brain GTEx 57 ), and 3 brain cell-type annotation files (neurons, astrocytes, and oligodendrocytes 57,58 ).See Methods for further details.(A) Heritability enrichment of schizophrenia across RNS markers and the rest of non-selected SNPs within genes expressed in 10 whole tissues 56 .P values are derived from one-sided t-tests evaluating whether the cell-type enrichment within a particular tissue subannotation is higher than the control background annotations.We considered all RNS markers and the rest of the SNPs from schizophrenia GWAS control annotations since they represent the genomic background in which tissue sub-annotations are embedded.(B) Heritability enrichment of schizophrenia across RNS markers and the rest of the non-selected SNPs within genes expressed in 13 brain-related tissues (Brain GTEx 57 ).To control for brain expression bias, enrichments across brain-tissue sub-annotations were compared to "antitarget tissue" sub-annotations.The latter were created as the intersection between the RNS or the rest of SNPs' annotations and the remaining 12 brain-related tissue annotations.P values are from one-sided t-tests evaluating whether the enrichment within a particular brain tissue sub-annotation is higher than the anti-target brain tissue sub-annotation.(C) Heritability enrichment of schizophrenia across RNS markers and the rest of the non-selected SNPs within genes expressed in 3 brain cell-type annotation files (neurons, astrocytes, and oligodendrocytes 57,58 ).To control for brain expression bias, enrichments across cell-type sub-annotations were compared to "anti-target cell-type" sub-annotations.The latter were created as the intersection between the RNS or the rest of SNPs' annotations and the remaining 2 cell-type annotations.One-sided t-tests were performed to evaluate whether the enrichment within a particular brain cell-type sub-annotation was higher than the anti-target brain cell-type sub-annotation.Dashed lines represent the nominal significance threshold (p < 0.05).* Significant results after Benjamini-Hochberg FDR correction.ACC: Anterior cingulate cortex; BG: Basal ganglia; CNS: central nervous system.that are unique to certain populations.However, the considerably smaller sample size of the Asian schizophrenia cohorts compared to the European samples could be the main cause of the lack of association between recent natural selection and schizophrenia predisposing loci.
Upon assessing the whole SNP-based heritability enrichment across tissue and cell-type genome annotations, stratified by RNS markers, our results demonstrate a depletion of brain and neuronal-related functions in loci subject to RNS, as compared to those not subject to RNS.While common schizophrenia predisposing variation not subject to RNS exhibit clear patterns of heritability enrichment in brain-related tissues and neuronal cell types, aligning with expectations from prior studies, schizophrenia predisposing variation under RNS display lower enrichments in neuronal and brain-related functions, along with significant heritability enrichments in other tissues not directly linked to the central nervous system, such as the kidney or liver (Fig. 5).Similarly, functional enrichment analyses of fine-mapped genes derived from RNS-affected SNPs reveal enrichment in biological functions unrelated to the brain.
In fact, our results pointing to pleiotropic effects of schizophrenia loci under selection are in line with previous findings on specific schizophrenia related genes with marked evolutionary patterns.For instance, the GWS schizophrenia risk variant rs13107325 (C/T) within the SLC39A8 gene has been described to be under positive selection of its derived risk T allele in Europeans 36,39 .This evolutionary event, however, has been reported to be driven by the migration of modern humans out of Africa and the consequent adaptation to colder climates by reducing blood pressure and the risk for hypertension 22,40 .Similarly, survival of the derived risk T allele of variant rs1150711 (C/T) of the ZNF323 gene could be the result from the ameliorated lung function provided by this allele to European populations 59 .These and similar examples of antagonistic pleiotropy, in which the allele that predisposes to schizophrenia confers adaptive advantage or protects against another condition, have been used to explain the evolutionary paradox of schizophrenia.However, our study suggests that, on a genome-wide scale, non-antagonistic pleiotropy could be contributing to the RNS enrichment across schizophrenia common variation.In this sense, recent positive selection of schizophrenia derived protective alleles could also be a likely by-product of the pleiotropic effect of some genetic variants favoring adaptation to the environment during modern evolution of human populations 68,69 .
Differing genetic risks for schizophrenia described across distinct populations 29 could therefore be the result of distinct evolutionary pressures related to other phenotypes, with predisposing variants having also an impact on schizophrenia risk.Although most of schizophrenia alleles have been described to exert similar effects across populations 70 , allele frequencies could vary across populations, thus leading to differences in individual genetic risk profiles.In this sense, greater polygenic scores for schizophrenia have been described in African populations 29,71 .However, reportedly greater polygenic scores and our finding of positive selection of schizophrenia derived protective alleles do not imply that African populations are at higher genetic risk for schizophrenia.Since GWAS have been performed mainly in cohorts of European ancestry, there is a limited portability for the estimation of polygenic risk scores across African cohorts using the available GWAS data 65,71 , and these inferences should be avoided.
By evaluating the PGS predictions on schizophrenia within RNS with derived protective and risk alleles separately, we observed greater variance explained across similar numbers of risk than protective RNS markers (Fig. 4D).These findings suggest that risk variants under RNS are more likely to increase schizophrenia risk by acting in a cumulative way.It has been suggested that strong negative selection acting against highly pleiotropic deleterious variants of large effect may give rise to common pleiotropic variants of lower effect sizes 54,55 .Consequently, the predictive performance of a set of variants will decrease as the level of pleiotropy increases.If there is an excess of pleiotropic variants among the RNS derived protective alleles compared with derived risk alleles, this might explain the higher predictive capacity of a PGS based on derived risk alleles.
This study was subject to several limitations.First, we described RNS markers using whole 1000 Genomes data, while we used association data from mainly European-ancestry schizophrenia cohorts (around 80% 49 ).In this sense, however, our sensitivity analyses considering population specific data lead to similar results in the case of European population, and therefore validating our findings, although this scenario was not replicated in East Asian populations.Therefore, the implications of our results suggesting protective selection by nonantagonistic pleiotropy may be limited, since environments shaping this evolutionary pattern could be absent in other populations.Nevertheless, this inconsistency between European and East Asian population specific analyses may be caused by a great difference in statistical power, and larger Asian cohorts should be used to rule out this possibility.Second, to have sufficient statistical power, our analysis is restricted to variants having a MAF higher than 5%.This filter could have removed many GWAS signals with remarkable RNS effects.For instance, a variant within SLC39A8, a schizophrenia-associated gene with well-described evolutionary properties, is absent from this study due to its low allele frequency.Third, along the lines of a previous related study 72 , we have used the top 5% markers with the highest probability of being subject to RNS for our main analyses in order to have enough statistical power.Finally, it is worth noting that while pcadapt has been employed in previous studies involving similar 1000 Genomes data 47 , PCA-based methods have been reported to have a susceptibility to false positives when applied to populations with pronounced stratification and a non-continuous principal component space.In this regard, the concurrence of selection outliers detected by alternative Fst methods served as validation for our identification of RNS markers using pcadapt.
These limitations notwithstanding, our results shed additional light on the relationship between RNS and schizophrenia.By taking advantage of the genome wide scans and statistical outputs provided by novel methods such as Pcadapt, we have described a clear enrichment of protective RNS markers across schizophrenia GWS loci and suggest non-antagonistic pleiotropy as a likely explanation.This novel perspective could help to integrate previous contradictory findings related to the evolutionary paradigm of schizophrenia and pave the way for further studies of the evolutionary patterns of other neuropsychiatric disorders and human behavioral traits.

Methods
RNS scans in human populations.To evaluate the selective pressures that took place after the human diaspora out of Africa (from ca.100,000 years ago to present), we performed scans detection of RNS signatures using a principle components-based approach implemented in Pcadapt v 4.3.2 47,48.Pcadapt performs PCA and computes p-values to test for the presence of selection outliers, based on the correlations between genetic variation and the selected K principal components.Briefly, for a given SNP, z-scores are obtained by regressing the SNP position on the K principal components.The test implemented is based on the multi-dimensional Mahalanobis distance from the SNP to the K components, thus describing how distant the SNP is from the mean.
To study RNS signatures we used genetic data from the 1000 Genomes phase 3 sequencing database 73 , across African, European, and Asiatic populations.American populations, who show greater genetic heterogeneity due to recent admixture events, were eliminated from the analysis, as suggested by the Pcadapt developers 47 .Moreover, given the elevated inbreeding coefficients for some of the 10,000 Genomes subpopulations 74 , only 1699 subjects that have been previously described as outbred and unrelated 74 were finally considered.We retained common genetic variation (MAF > 0.05), biallelic, and overlapping with summary data from the schizophrenia GWAS used in this study, thus yielding a total of 5,554,437 SNPs.
Since LD can affect ascertainment of population structure 75 , Pcadapt accounts for LD-genome structure (window size = 200 SNP, r 2 = 0.1) for the estimation of RNS probabilities for each of the 5,554,437 SNPs considered.The distribution of PC loadings was evaluated to ensure that selection signals correspond to regions subject to adaptation rather than to regions of lower recombination (high LD).We used Cattell's rule to select the appropriate number of PCs (K = 3).Pcadapt test was then performed and derived p-values (p RNS ) for every SNP were calculated to inform about their likelihood of being subject to RNS.
Throughout the subsequent analyses of the study, we consider the top 5% selected markers (5% of SNPs with the lower p RNS ; RNS markers) after LD-clumping based on schizophrenia GWAS p-values (PLINK v1.9 parameters-clump-r2 0.1-clump-kb 500) and refer to them as RNS markers.We prioritized the association with schizophrenia and the top 5% RNS markers were selected to have enough SNPs with which to perform the subsequent analyses.However, to ensure that the reported results were not biased by LD-clumping based on SCZ p-values, we also considered the top 5% RNS markers after LD-clumping based on p RNS and refer to them as LD-RNS markers.
We also used OutFLANK, that provides a robust estimation of the null distribution of a Fst test statistic 76 , between African and European 1000 thousand genomes populations, as an additional selection parameter to compare with Pcadapt estimates.
Samples and GWAS summary data.We used the latest summary statistics of the schizophrenia GWAS (PGC-SCZ3) conducted by the Psychiatric Genomics Consortium (PGC) 49 (data available at https:// www.med.unc.edu/ pgc/ resul ts-and-downl oads).Main analyses were performed using the whole summary statistics (67,390 schizophrenia/schizoaffective disorder cases and 94,015 controls).Moreover, both ancestry-specific summary statistics for European (53,386 cases and 77,258 controls) and East Asian subjects (14,004 cases and 16,757 controls) were used for additional sensitivity analyses to explore association between RNS signals within European or East Asian populations and ancestry-specific predisposing variation to schizophrenia.As a specificity analysis for the detection of selection signals, we also used Alzheimer's disease 51 GWAS summary statistics in independent analyses.This disorder was selected based on its adequate statistical power comparable to that of schizophrenia, its described clinical similarities, and its later age at onset, after natural selection pressure has exerted its effect.Details regarding the summary statistics used are described in Supplementary data 10.
The Major Histocompatibility Complex (MHC) region was excluded in all analyses and only biallelic SNP and those with an imputation quality score > 0.9 were considered.Overall, 5,554,437 variants from 1000 Genomes overlapping with schizophrenia summary data were converted to PLINK format and used as the input for Pcadapt in the analysis.6,693,073 variants were used in the case of Alzheimer's disease.
We used a case-control sample including 1927 schizophrenia cases (65% males) and 1561 healthy controls (HC) (55% males) from CIBERSAM (Centro de Investigación Biomédica en Red en Salud Mental, Spain) as an independent target sample for PGS predictions (SCZ_CIBERSAM).All research was performed in accordance with relevant guidelines/regulations.Informed consent was obtained from all individual participants included in the study, and ethic boards/committees from the following involved Spanish hospitals approved the protocol: Clinical Research Ethics Committee of the Hospital Sant Joan de Reus, Research Ethics Committee of Asturias, Research Ethics Committee of Cantabria, Bioethics Commission of the University of Barcelona (CBUB), Galician Regional Research Ethics Committee, Scientific Research Ethics Committee of Hospital Gregorio Marañón and Research Ethics Committee of the Valencia's clinical Hospital.All participants were genotyped as part of the Psychiatric Genomics Consortium (PGC), and passed quality control (QC filters) per PGC-SCZ2 criteria 39 .There is no overlap between PGC-SCZ2 and SCZ_CIBERSAM samples.
Polygenic score (PGS) analyses.PGS for schizophrenia (PGS SCZ ) were calculated from PGC-SZ2 39 GWAS summary statistics as discovery sample and SCZ_CIBERSAM case-control cohort as target sample.Summary data from the latest schizophrenia GWAS 49 was not used to estimate PGS because target sample was included as part the cohorts in the study.PGS SCZ were calculated using SNPs present in the 1000 genomes database for the studied populations.We included only biallelic variation, with imputation quality scores > 0.9, and excluded indels.65PLINK 1.9 was used to calculate PGS across schizophrenia patients and healthy controls weighted by the logOR in the discovery sample.Standardized PGS were calculated, and significance was evaluated by logistic regression, using case-control status as dependent variable and sex, age, and the first 10 www.nature.com/scientificreports/MDS ancestry components as covariates.We calculated explained variance attributable to PGS as the increase in Nagelkerke's pseudo-R 2 between a model with and without PGS variable.Initially, we used several P thresholds (P < 5 × 10 -8 , 5 × 10 -5 , 1 × 10 -3 , 0.01, 0.05, 0.1, 0.2, 0.5 and 1) and the whole genome variation to estimate predictions in the SCZ_CIBERSAM case-control cohort.The most significant P threshold (P < 0.2) was selected for the subsequent PGS analyses stratified by RNS.We then performed stratified PGS predictions to compare SNPs that were subject to RNS to those that were not.Summary SNP data with P SCZ < 0.2 was divided into 20 quantiles of increasing p RNS .PGS predictions were performed across the 20 SNP quantiles in the CIBERSAM case-control sample.Linear regression was performed to evaluate the change in the variance explained by PGS on case-control status in the CIBERSAM sample across the 20 quantiles.Nagelkerke's pseudo-R 2 from PGS predictions using the first quantile, corresponding to the 5% of SNPs with the highest probability of being subject to RNS, was compared against the distribution of pseudo-R 2 obtained by selecting the same number of SNPs from the remaining 95% of SNPs.On account of the likely bias in prediction due to the non-random genome features of SNPs subject to RNS, selection of SNPs from the remaining 95% was carried out by identification of sets of randomly drawn SNPs that were matched to the top 5% selected SNP based on allele frequency, number of SNPs in LD, distance to nearest gene, and gene density using default parameters from SNPsnap 77 (https:// data.broad insti tute.org/ mpg/ snpsn ap/).Since an earlier study by Price et al. suggested that recent selection signals could be partly explained as artifacts caused by long-range LD regions 53 that can lead to inflated variance explanations from variants subject to RNS, we repeat the previously explained stratified PGS predictions after removing 24 described Long-Range LD Regions as a sensitivity analysis to assess whether the results hold.
Additionally, we also calculated stratified PGS after subsetting SNP summary data into those for which the derived allele that emerged during human evolution conferred protection (derived-OR SCZ < 1) or risk (derived-OR SCZ > 1) to schizophrenia.Across variants subject (RNS markers) and non-subject to RNS, independently, we compared 1000 predictions from protective and risk variants, using a similar number of SNPs (N SNPs = 1000), on the SCZ_CIBERSAM case-control sample.Wilcoxon test was used to compare the pseudo-R 2 distributions of predictions for SNPs with a protective (derived-OR SCZ < 1) or risk (derived-OR SCZ > 1) derived allele across SNPs subject (RNS markers) and non-subject to RNS.
First, we created per SNP annotation files (one per chromosome and desired annotation).Each file consisted of a row per SNP and a column for each sub-annotation (1 = a particular SNP is part of that sub-annotation, 0 = The SNP does not belong to the annotation).Annotation files were created for: i) RNS markers (SNPs from schizophrenia GWAS (PGC-SCZ3) summary statistics 49 that belong to the top 5% with the highest likelihood (lowest p RNS ) of being a RNS marker) and the rest of the SNPs (the remaining 95% of the SNPs from schizophrenia GWAS summary data).ii) Sub-annotations from the intersection between the described annotations (RNS markers and the rest of SNPs) and gene expression data from 10 whole tissues 56 , 13 brain-related tissues (Brain GTEx 57 ), and 3 brain cell-type annotation files (neurons, astrocytes, and oligodendrocytes 57,58 ).
We used ldsc v1.0.1 15,57 , a command line tool for estimating heritability.We performed both heritability enrichment analyses across the described annotations (-h2) and one-sided t-tests to evaluate whether the celltype enrichment in schizophrenia within a particular annotation was higher than the same cell-type enrichment in schizophrenia outside the target annotation (anti-target) but within a background annotation (using -h2-cts).Control background annotations from the original study were used 56,57 .
We ran LDSC using associated data files from phase 3 of the 1000 Genomes Project 73 .LD scores were computed for each annotation file using the recommended parameters: 1-cM window (-ld-wind-cm 1), restriction to Hapmap3 SNPs, and exclusion of the MHC region due to its high gene density and exceptional LD, as recommended by the developers 15 .The '-overlap-annot' argument and 1000 Genomes phase 3-based frequency files ('1000G_Phase3_frq' files via -frqfile-chr argument) and LD weights ('weights_hm3_no_hla' files via-w-ld-chr argument) were used for LD score calculations.
Partitioned LDSC computes the proportion of SNP heritability associated with each annotation column while considering all the remaining annotations.This is performed by regression models using the estimated LD-scores jointly with other independent LD scores for baseline annotations to improve the model performance.We used the full baseline model v2.2, consistent of a full annotation column (1 per all SNPs) and 158 independent functional annotations, available at LDSC repository (https:// data.broad insti tute.org/ alkes group/ LDSCO RE/), as independent LD scores.
Characterization of schizophrenia genome-wide significant (GWS) loci with and without RNS markers.We used FINEMAP results of the schizophrenia GWS loci from the latest schizophrenia GWAS 49 , and compared the functional overrepresentation of mapped genes from GWS loci harboring RNS markers against those genes not harboring RNS markers.Again, the MHC region was not considered due to its complex LD structure.In order to have sufficient statistical power, the broad fine-map set 628 genes (435 protein-coding) that contained at least one credible SNP from 249 regions with low numbers of expected causal SNPs (K < 3.5) were used 49 .Genes mapped only from RNS markers (22 genes) were compared against genes mapped by SNPs not subject to RNS (468 genes).www.nature.com/scientificreports/Functional overrepresentation of genes from each category was assessed by over-representation analysis (ORA) with WEB-based GEne SeT AnaLysis Toolkit (WebGestalt; http:// www.webge stalt.org/) 78 .All mapped genes from schizophrenia GWS loci studied here in this study were used as a background list.Enrichment across gene sets from Gene Ontology (GO) cellular components (CC) and biological functions (BF) categories was evaluated.FDR by Benjamini-Hocheberg adjustment was used to evaluate enrichment significance across GO gene sets, and a minimum threshold of 5 genes overlapping with each gene set was considered.

Statistical analyses.
Statistical analyses were performed using R (https:// www.r-proje ct.org/).Onesample Kolmogorov-Smirnov test was performed to evaluate the overrepresentation of SNPs subject to RNS after principal components-based approach implemented in Pcadapt.Fisher's exact tests were performed to study the enrichment for the RNS markers ( top 5% RNS loci; LD-based on p SCZ ; N SNPs = 8679) or the LD-RNS markers (top 5% RNS loci; LD-based on p RNS ; N SNPs = 6262) across the different schizophrenia GWAS thresholds (p SCZ < 5 × 10 −8 , 1 × 10 −7 , 1 × 10 −6 , 1 × 10 −5 , 1 × 10 −4 , 0.001, 0.01, 0.05, 0.1, 0.2, and 0.5).We also analyzed enrichment for the RNS markers for Alzheimer's disease (N SNPs = 6263).Spearman correlation between P values from GWAS and RNS was performed (− log 10 p GWAS , vs. − log 10 p RNS ).Correlations were confirmed by conducting 10,000 random permutations of the GWAS results with respect to the p RNS values and testing whether the observed correlation coefficients were significantly higher than the ones in the null distribution of the permuted datasets.
We used Fisher's exact tests to evaluate the enrichment of schizophrenia GWS loci subject to RNS with the derived protective or risk allele having the lead association with schizophrenia among all phenotypes included in GWAS Atlas (https:// atlas.ctglab.nl/) 79 .

EASFigure 1 .
Figure1.Workflow of the analytic pipeline.We used sequencing data from 1000 Genomes Project phase 3 outbred and unrelated subjects (N = 1699 European (EUR), East Asian (EAS), Southern Asian (SAS) and African (AFR) individuals; American individuals were not used due to admixed ancestry) and GWAS summary statistics from schizophrenia49 .ancestry-specific summary statistics for European and East Asian subjects were used for additional sensitivity analyses.Overlapping variants from both sources (5,554,437 SNPs) were included in the subsequent analyses.Throughout the whole process, markers of RNS were considered as the top 5% LD-independent variants with the highest probability of being markers of natural selection (RNS markers; N SNPs = 8679; LD-clumping based on SCZ).As a sensitivity analysis, we used top 5% LD-independent variants as RNS markers after LD-clumping based on RNS (LD-RNS markers; N SNPs = 6262).Enrichment of RNS was also tested in Alzheimer's disease as a methodological control.Enrichment, effect-stratified analysis (bias towards risk or protection), and biological profiling of RNS in schizophrenia were performed as described in Methods.RNS-stratified polygenic score analyses (PGS SCZ ) were performed in an independent Spanish case-control sample (CIBERSAM; N SCZ = 1927; N HC = 1561) and previous SCZ GWAS39 as discovery sample.LDSC-SEG = LDSC applied to specifically expressed genes.

Figure 2 .
Figure 2. Enrichments for RNS signals in schizophrenia GWAS.Fisher's exact tests were performed to study the enrichment of the RNS markers (LD-clumping based on p SCZ ; top 5% of most selected SNPs; N SNPs = 8679) and LD-markers (LD-clumping based on p RNS ; top 5% of most selected SNPs; N SNPs = 6262) across schizophrenia GWAS thresholds 49 .(A) OR (95%CI) for the enrichment and proportion of 8679 RNS markers across the different schizophrenia (SCZ) GWAS thresholds.(B) OR (95%CI) for the enrichment and proportion of 6262 LD-markers markers across the different schizophrenia GWAS thresholds.(C) OR (95%CI) for the enrichment and proportion of RNS markers across the different Alzheimer's disease (ALZ) 51 GWAS thresholds.LD-independent SNPs were considered after clumping based on ALZ p-values.See Supplementary Data 3 for detailed results.

Figure 4 .
Figure 4. Directionality of RNS enrichment in schizophrenia considering the derived alleles emerged during human evolution.(A) Average log 10 (OR SCZ ) of RNS markers (orange) and the remaining SNPs (grey), referred to the derived alleles that emerged during human evolution, across different SCZ GWAS thresholds.95% confidence intervals of the means are displayed.(B) Proportion of SNPs within RNS markers and the remaining SNPs with a protective or risk effect of the derived allele, across different schizophrenia GWAS thresholds.(C) PGS SCZ predictions in the case-control target sample (N SCZ = 1927; N HC = 1561) stratified by RNS.Schizophrenia GWAS summary statistics were subsetted based on the relationship of the SNP to RNS and the risk or protective effect of the derived allele as follows: (i) RNS markers (top 5% of the SNPs with the highest probability of being subject to RNS) whose derived alleles arisen through human evolution confer protection to schizophrenia (derived-OR SCZ < 1; light brown), (ii) RNS markers (top 5% of the most selected SNPs) whose derived alleles arisen through human evolution confer risk to schizophrenia (derived-OR SCZ > 1; dark brown), (iii) SNPs not subject to RNS (from the remaining 95% of SNPs) whose derived alleles arisen through human evolution confer protection to schizophrenia (derived-OR SCZ < 1; light grey), (iv) SNPs not subject to RNS (from the remaining 95% of SNPs) whose derived alleles arisen through human evolution confer risk to schizophrenia (derived-OR SCZ > 1; dark grey).Explained variance attributable to PGS was calculated as the increase in Nagelkerke's pseudo-R 2 between a linear model with and without the PGS variable.P values were obtained from the binomial logistic regression of SCZ phenotype on PGS, accounting for LD and including sex, age, and 10 MDS ancestry components as covariates.Significant PGS predictions after FDR correction (p FDR < 0.05) are marked with an asterisk.(D) Distribution of variance explained (% pseudo-R 2 ) from stratified PGS SCZ predictions based on the abovementioned subsets, using a similar number of variants (N SNPs = 1000 from all available SNPs (P SCZ < 1)) for each analysis.Wilcoxon test was used to compare 1000 pseudo-R 2 distributions from PGS predictions.See Supplementary Data 7 for detailed results. https://doi.org/10.1038/s41598-023-42578-0 https://doi.org/10.1038/s41598-023-42578-0 https://doi.org/10.1038/s41598-023-42578-0www.nature.com/scientificreports/ https://doi.org/10.1038/s41598-023-42578-0 https://doi.org/10.1038/s41598-023-42578-0 African and European populations (Supplementary Data 6).A great consistency was observed between Fst and Pcadapt selection signals: 98% and 73% of the top 100 selection markers estimated by Fst overlapped with the top 5% and 1% of Pcadapt RNS markers described (Supplementary Data 6).