Making the MOSTest of imaging genetics

Regional brain morphology has a complex genetic architecture, consisting of many common polymorphisms with small individual effects, which has proven challenging for genome-wide association studies to date, despite its high heritability1,2. Given the distributed nature of the genetic signal across brain regions, joint analysis of regional morphology measures in a multivariate statistical framework provides a way to enhance discovery of genetic variants with current sample sizes. While several multivariate approaches to GWAS have been put forward over the past years3–5, none are optimally suited for complex, large-scale data. Here, we applied the Multivariate Omnibus Statistical Test (MOSTest), with an efficient computational design enabling rapid and reliable permutation-based inference, to 171 subcortical and cortical brain morphology measures from 26,502 participants of the UK Biobank (mean age 55.5 years, 52.0% female). At the conventional genome-wide significance threshold of α=5×10−8, MOSTest identifies 347 genetic loci associated with regional brain morphology, more than any previous study, improving upon the discovery of established GWAS approaches more than threefold. Our findings implicate more than 5% of all protein-coding genes and provide evidence for gene sets involved in neuron development and differentiation. As such, MOSTest, which we have made publicly available, enhances our understanding of the genetic determinants of regional brain morphology.

Regional brain morphology has a complex genetic architecture, consisting of many common polymorphisms with small individual effects, which has proven challenging for genome-wide association studies to date, despite its high heritability 1,2 . Given the distributed nature of the genetic signal across brain regions, joint analysis of regional morphology measures in a multivariate statistical framework provides a way to enhance discovery of genetic variants with current sample sizes. While several multivariate approaches to GWAS have been put forward over the past years 3-5 , none are optimally suited for complex, large-scale data. Here, we applied the Multivariate Omnibus Statistical Test (MOSTest), with an efficient computational design enabling rapid and reliable permutation-based inference, to 171 subcortical and cortical brain morphology measures from 26,502 participants of the UK Biobank (mean age 55.5 years, 52.0% female). At the conventional genome-wide significance threshold of a=5x10 -8 , MOSTest identifies 347 genetic loci associated with regional brain morphology, more than any previous study, improving upon the discovery of established GWAS approaches more than threefold. Our findings implicate more than 5% of all protein-coding genes and provide evidence for gene sets involved in neuron development and differentiation. As such, MOSTest, which we have made publicly available, enhances our understanding of the genetic determinants of regional brain morphology.
Regional surface area and thickness of the cerebral cortex and volume of subcortical structures are highly heritable brain morphological features with complex genetic architectures, involving many common genetic variants with small effect sizes 1,2 . The predominant strategy for identifying genomic loci associated with complex traits is through genome-wide association studies (GWAS), a mass-univariate approach whereby the association between a single outcome measure and each of millions of genetic variants, in isolation, is tested. This is accompanied by a stringent multiple comparison correction to control the family-wise error rate, necessitating very large sample sizes to identify even relatively strong effects. To date, the largest GWAS of regional brain morphological features, based on brain scans obtained from up to fifty thousand individuals, identified almost two hundred genetic variants 1 , which together explained only a fraction of the reported narrow-sense heritability. These studies primarily investigate each region of interest individually, compounding the multiple comparisons correction problem.
In addition to small effect sizes across many variants, the genetic architectures of sets of regional brain features are likely to strongly overlap. Gene expression studies of the human brain have shown widespread gradients across the cortex 6 . Thus, genetic variants probably have distributed effects across regions and morphological measures. While cortical thickness and surface area have been reported to be phenotypically and genetically only weakly correlated to each other 7 , many brain-related traits, e.g. mental disorders, share a large proportion of genetic variants, even in the absence of an overall correlation 8 . The discovery of these variants may be boosted through joint analysis of these traits, in a multivariate framework. This avoids the family-wise error rate penalty for studying multiple outcome measures, or the use of strategies that reduce phenotypic information to a single composite score, which can cause considerable loss of statistical power 9 . Importantly, a multivariate approach is much more consistent with the notion of the brain being an integrated unit, with highly interconnected and biologically similar brain regions, compared to univariate approaches that ignore the information shared across these component measures.
Here we implement a Multivariate Omnibus Statistical Test (MOSTest), designed to boost the power of imaging genetics by capitalizing on the distributed nature of genetic influences across brain regions and pleiotropy across imaging modalities. MOSTest is efficient and capable of combining large-scale genome-wide analyses of dozens of measures for tens of thousands of individuals within hours while achieving enhanced statistical power. Key steps of the MOSTest analysis include: 1) applying a rank-based inverse normal transformation to the input measures; 2) estimating the multivariate correlation structure from the GWAS on randomly permuted genotype data; 3) calculating the Mahalanobis norm, as the sum of squared de-correlated z-values across univariate GWAS summary statistics, to integrate effects across the measures into a multivariate test statistic; and 4) employing the gamma cumulative density function to fit an analytic form for the null distribution, enabling extrapolation to and beyond the 5x10 -8 significance threshold. This avoids the extensive computational burden associated with permutation-based approach which has, until now, prohibited its application in GWAS. The Supplementary Materials contains a detailed description and validation of these steps through simulations.
We compare the MOSTest with an established inferential methodology recently used by the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium 1 , referred to as the min-P approach; min-P takes the smallest p-value of each SNP across the univariate GWAS, and corrects this for the effective number of traits studied 4,10 , i.e. shared genetic architecture across traits does not contribute to statistical power. Min-P achieves its maximum power when the genetic effects across traits are independent; conversely, multivariate approaches have greater power when genetic effects are distributed across traits 3 .
Here, we applied MOSTest to sets of regional brain morphology measures, hypothesizing that it will outperform the min-P approach due to the distributed nature of genetic effects and the presence of pleiotropy across modalities. Our sample consisted of 26,502 healthy White European participants of the UK Biobank (UKB), with a mean age of 55.5 years (standard deviation (SD) 7.4 years), 52.0% female. We processed T1-weighted structural MRI scans with the FreeSurfer v5.3 standard "recon-all" processing pipeline, producing a subset of 35 subcortical volume estimates 11 , as well as subsets of surface area and cortical thickness estimates, both consisting of 68 cortical regions following the Desikan-Killiany parcellation 12 , for a total set of 171 measures. All measures were pre-residualized for age, sex, scanner site, a proxy of image quality (FreeSurfer's Euler number), the first twenty genetic principal components to control for population stratification, and a global measure specific to each set of variables: mean cortical thickness for the regional thickness measures, total surface area for the area measures, and intracranial volume for the subcortical structures. Subsequently, we performed a rank-based inverse normal transformation to the residualized measures. We made use of the UKB v3 imputed data, carrying out standard quality-checks and setting a minor allele frequency threshold of .005, leaving 7.4 million SNPs. Univariate GWAS of each measure was performed with standard tools. The resulting summary statistics were then tested for overall significance with MOSTest. Independent significant SNPs and genomic loci were identified in accordance with FUMA SNP2GENE definition 13 . Details on sample composition, processing of the data, and the analysis techniques are presented in the Online Methods section.
The multivariate GWAS identified thousands of independent SNPs reaching the genome-wide significance threshold of a=5x10 -8 across hundreds of independent loci, as shown in Figure 1A. See the Extended Data section for the list of discovered loci per feature set. Overall, MOSTest led to a threefold higher discovery than the min-P approach. The difference in performance is particularly pronounced when all features are combined, as is also evident from the Miami plots shown in Figure  1B through E. As can be seen in Figure 1F, many loci identified with MOSTest were shared across the three feature subsets. Further, the number of MOSTestdiscovered loci for all features combined (347) is larger than the summation of number of unique loci found when analyzing the feature subsets individually (301). This illustrates how MOSTest capitalizes on the distributed and non-sparse nature of genetic effects through the combination of features.
We carried out replication analyses of the GWAS results in an additional 4,884 UKB participants, whose neuroimaging data was released after we carried out our primary analyses. The loci discovered through MOSTest and min-P replicated at similar levels, with approximately 40% being significant in this smaller additional sample. Therefore, the absolute number of loci replicating is three times higher for MOSTest compared to min-P. The full results are shown in the Extended Data.
We performed extensive validation of MOSTest methodology and implementation, confirming that it has correct type-I error in simulations with synthetic data ( Figure S4) and on real data under permutations ( Figure  S2). We also carried out a formal comparison between the MOSTest and MV-PLINK 5 , finding that the two methods have similar statistical power for detecting associations ( Figure S1), while the MOSTest analysis ran 10,000 times faster. Further, we performed LD score regression and used the intercept to validate that MOSTest results are free of genomic inflation (Table S5).

Figure 1. Highly improved locus discovery through MOSTest. A. Number of independent whole-genome significant loci identified (on the y-axis and in the bubbles) for each set of features (on the x-axis), by MOSTest (in darker colored circles) and by min-P (in lighter colored squares). B -E. Miami plots, contrasting the observed -log10(p-values), shown on the yaxis, of each SNP for MOSTest (top half) with min-P (bottom half)
, for each of the feature sets. The x-axis shows the relative genomic location, grouped by chromosome, and the red dashed lines indicate the whole-genome significance threshold of 5x10 -8 . Note, y-axis is clipped at -log10(p-value)=100. F. Venn diagram depicting the number of loci, identified through MOSTest, overlapping between the three feature subsets.
Using the MiXeR tool 8,14 we fitted a Gaussian mixture model of the null and non-null effects to the GWAS summary statistics, estimating for each feature set the number of SNPs involved, i.e. their combined polygenicity, and their effect size variance, or 'discoverability'. Please see the Online Methods for more details. The results are summarized in Figure 2, depicting the estimated proportion of genetic variance explained by discovered SNPs by both approaches as a function of sample size. The horizontal shift of the curve indicates that the effective sample size of MOSTest is generally about twice as high as that of min-P, with the highest discovery for MOSTest when all features are combined, and lowest discovery for the set of cortical thickness features.
Cortical maps, depicting the morphological associations of the lead SNPs identified with MOSTest on all features with regional surface area and thickness measures, made clear that these SNPs have distributed effects, often with mixed directions, across regions and feature sets. As an example, Figure 3 shows the maps for the top two hits (rs1080066 on chromosome 15, p=1.2x10 -305 , and rs13107325 on chromosome 4, p=3.1x10 -124 ), all other maps are available in the Supplementary Material. These maps revealed anterior-posterior gradients as well as hemisphere-specific effects of some of the lead SNPs, in line with reported genetic patterns of the brain 15,16 .  Gene-level analyses, using Multi-marker Analysis of GenoMic Annotation (MAGMA) 13,17, indicated that 1034 out of all 18,775 protein coding genes (i.e. 5.5%) were significant, with a p-value below a Bonferroni corrected threshold of =.05/18,775. Figure 4A shows the number of significant genes for each set of features. Through competitive gene-set analyses we identified 136 significant Gene Ontology sets for MOSTest applied to all features, the vast majority of which related to neuronal development and differentiation, with Figure 4B listing the top fifteen.
Applying the MOSTest approach to structural brain imaging data, we discovered more loci associated with regional cortical and subcortical morphology than any previous GWAS of brain morphology, even those that had nearly double the sample size 1,2 . Further, a direct comparison with established methods in the same sample revealed a threefold increase in discovery. This improvement indicates the presence of extensive shared genetic architecture across brain regions and across morphological measures, attesting to the importance of estimating levels of polygenic overlap beyond those indicated by genetic correlations 8 , and arguing for techniques that boost discovery of genetic determinants leveraging shared signal between traits 18 . Indeed, overlapping genetic determinants are to be expected given the shared genetic control of neurodevelopment across brain regions 19 , and that similar molecular mechanisms operate across regional borders defined by gross morphological features. This is in accordance with the high levels of pleiotropy across many brain-related traits and disorders 20 . Therefore, our multivariate strategy is better tailored to the underlying neurobiological processes than conventional univariate approaches, as confirmed by our identification of highly significant links to gene sets of neuronal development and differentiation. Further, our extensive validation of the MOSTest methodology and implementation show that the MOSTest is a valid statistical test with an improved statistical power to detect associations in a multivariate context, suitable for applications to large-scale data.
With the large gain of power and consequently lower required sample sizes of MOSTest, we predict that it will be possible to uncover the majority of SNPs influencing brain morphology in the upcoming years. The UKB initiative, for instance, is set to release neuroimaging data of a 100,000 individuals by 2022 21 , which we estimate will enable MOSTest to identify SNPs that will explain about 40% of the additive genetic variance in regional brain morphology. The output of MOSTest is suited for secondary analyses and follow-up studies to investigate the relation between the set of loci discovered and individual features, with a much decreased multiplecomparisons burden. The MOSTest code is publicly available, see the Supplementary Materials. In addition to brain structure, the MOSTest approach may also be of value in uncovering the genetic determinants of brain function and other complex human phenotypes consisting of correlated component measures, such as mental, cognitive or cardiometabolic phenotypes, by taking advantage of the rich multivariate datasets now available.

Materials & Correspondence.
The data incorporated in this work were gathered from public resources. The code will be made available via  Dr. Andreassen has received speaker's honorarium from Lundbeck, and is a consultant to HealthLytix. Dr. Dale is a Founder of and holds equity in CorTechs Labs, Inc, and serves on its Scientific Advisory Board. He is a member of the Scientific Advisory Board of Human Longevity, Inc. and receives funding through research agreements with General Electric Healthcare and Medtronic, Inc. The terms of these arrangements have been reviewed and approved by UCSD in accordance with its conflict of interest policies. The other authors declare no competing financial interests.

Sample
We made use of data from participants of the UKB population cohort, obtained from the data repository under accession number 27412. The composition, set-up, and data gathering protocols of the UKB have been extensively described elsewhere 22 . For this study, we selected White Europeans that had undergone the neuroimaging protocol. For the primary analysis, making use of T1 MRI scan data released up to April 2019, we excluded 1094 individuals with a primary or secondary ICD10 diagnosis of a neurological or mental disorder, as well as 594 individuals with bad structural scan quality as indicated by an age and sex-adjusted Euler number 23 more than three standard deviations lower than the scanner site mean. Our sample size for this analysis was n=26502, with a mean age of 55.51 years (SD=7.42). 51.97% of the sample was female. For the replication analyses, we made use of an additional neuroimaging batch released in September 2019. After identical preprocessing steps as the primary sample, this sample consisted of 4,884 individuals with a mean age of 55.47 years (SD=7.37), 52.42% was female.
Data preprocessing T1-weighted scans were collected from three scanning sites throughout the United Kingdom, all on identically configured Siemens Skyra 3T scanners, with 32-channel receive head coils. The UKB core neuroimaging team has published extensive information on the applied scanning protocols and procedures, which we refer to for more details 21 . The T1 scans were obtained from the UKB data repositories and stored locally at the secure computing cluster of the University of Oslo. We applied the standard "recon-all -all" processing pipeline of Freesurfer v5.3, performing automated surface-based morphometry and subcortical segmentation 11,12 . From the output, we extracted the sets of regional subcortical and cortical morphology measures, as well as estimated intracranial volume (eICV). Table S1 contains all the regional morphology measures, per subset, included in the current study. For each of these, we included both the left and right hemisphere measure, if applicable.
We subsequently regressed out age, sex, scanner site, Euler number, and the first twenty genetic principal components from each measure. We further regressed out a global measure specific to each of the feature subsets: eICV for the subcortical volumes, mean thickness for the regional thickness measures, and total surface area for the regional surface area measures. This was done to ensure we are studying the genetic determinants of regional brain morphology rather than global effects. Following this, we applied rank-based inverse normal transformation 24 to the residuals of each measure: : y " # = Φ &' ( ) * &+ ,&-+.' /, where r " is the ordinary rank of the measure for i-th individual, N gives the sample size, Φ &' denotes the standard normal quantile, y " # is the value after transformation, and c=0.5. This leads to normally distributed input into the univariate GWAS. See the Extended Data section below for a more in-depth discussion of the importance of this normalization procedure.

Univariate GWAS procedure
We made use of the UKB v3 imputed data, which has undergone extensive quality control procedures as described by the UKB genetics team 25 . After converting the BGEN format to PLINK binary format, we additionally carried out standard quality check procedures, including filtering out individuals with more than 10% missingness, SNPs with more than 5% missingness, and SNPs failing the Hardy-Weinberg equilibrium test at p=1*10 -9 . We further set a minor allele frequency threshold of 0.005, leaving 7,428,630 SNPs.
The univariate GWAS on each of the 171 pre-residualized and normalized regional brain morphology measures were carried out using the standard additive model of linear association between genotype vector, g 2 , and phenotype vector, y. To speed up calculations we implemented the association test directly in MOSTest software. Statistical significance was assessed from Pearson's correlation coefficient r 2 = corr(y, g ⋅2 ), as implemented in MATLAB's corr function. This is equivalent to testing significance of the regression slope, β : 2 , as both β : 2 and r 2 are assumed to be t-distributed and have the same t-value: t 2 = β 2 /se?β 2 @ = r 2 /se(r 2 ) = r 2 √N − 2/F1 − r 2 -, and therefore the same p-value, equal to Student's t cumulative distribution function (cdf) with N − 2 degrees of freedom: P IJK,2 = 2 tcdf(−Nt 2 N, N − 2), where N is the sample size 14 . Further, we validated that the above procedure produces the same results as the association test implemented in the commonly used PLINK's additive model, option plink --assoc. Independent significant SNPs and genomic loci were identified in accordance with FUMA SNP2GENE definition 13 . First, we select a subset of SNPs that pass genome-wide significance threshold 5x10 -8 (calculated by minP or MOSTest), and use PLINK to perform a clumping procedure at LD r2=0.6, to identify the list of independent significant SNPs. Second, we clump the list of independent significant SNPs at LD r2=0.1 threshold to identify lead SNPs. Third, we query the reference panel for all candidate SNPs in LD r2 of 0.1 or higher with any lead SNPs. Further, for each lead SNP, it's corresponding genomic loci is defined as a contiguous region of the lead SNPs's chromosome, containing all candidate SNPs in r2=0.1 or higher LD with the lead SNP. Finally, adjacent genomic loci are merged together if they are separated by less than 250 KB. Allele LD correlations are computed from EUR population of the 1000 genomes Phase 3 data.

MOSTest procedure
Let z "2 be the value of signed test statistic (z-score) calculated from the univariate association test between j-th SNP and i-th phenotype. Let z 2 = (z '2 , … , z Q2 ) be the vector of z-scores of j-th SNP across K phenotypes. Let Z = {z "2 } be the matrix of zscores, with rows corresponding to SNPs, and columns corresponding to phenotypes. Further, let Z U = Vz W "2 X be the matrix of z-scores, calculated from association tests on a randomly permuted genotype vector of each SNP. To preserve correlation structure among phenotypes, the permutation was performed only once for each SNP, and the resulting genotype vector was used in association test across all phenotypes.
The MOSTest test statistic, X 2 -, for the j-th SNP is calculated as Mahalanobis norm X 2 -= z 2 Z R \ &' z 2 , where R \ is the KxK correlation matrix of Z U . The null hypothesis of the MOSTest is that z 2 is distributed as a multivariate normal random variable with zero mean and covariance R \ . To compute the theoretical (i.e., under null) p-value of the MOSTest test statistic, we calculated the tail probability that a Chi-square statistics exceeds X 2 -. This probability is given by chi-square distribution with N degrees of freedom, or, equivalently, a gamma distribution, Gamma(K/2,0.5) 26 . Instead of using theoretical values, we fit the two free parameters of the Gamma(a,b) distribution to the observed distribution of X 2 under permutation (shown in Table  S4). The p-value of the MOSTest test statistic is then obtained from a cumulative distribution function of the gamma distribution, p^_`Z = CDF dJeeJ(J,f) (z 2 Z R \ &' z 2 ). Controlling for covariates, such as genetic principal components, is done via pre-residualization of all phenotype vectors, i.e. we replace them with the corresponding residual after multiple linear regression of the phenotype vector on the covariates. Additionally, we perform a rank-based inverse normal transformation of the residualized phenotypes, to ensure that z-scores forming the input to MOSTest are normally distributed.
MOSTest code will be made publicly available through GitHub upon publication of this article, https://github.com/precimed/mostest.

MiXeR analysis
We applied a causal mixture model 8,14 to estimate the percentage of variance explained by genome-wide significant SNPs as a function of sample size. For each SNP, i, MiXeR models its additive genetic effect of allele substitution, β " , as a pointnormal mixture, β " = (1 − π ' )N(0,0) + π ' N(0, σ l -), where π ' represents the proportion of non-null SNPs (`polygenicity`) and σ l represents variance of effect sizes of non-null SNPs (`discoverability`). Then, for each SNP, j, MiXeR incorporates LD information and allele frequencies for 9,997,231 SNPs extracted from 1000 Genomes Phase3 data to estimate the expected probability distribution of the signed test statistic, z 2 = δ 2 + ϵ 2 = N ∑ qH " r "2 β " + ϵ 2 " , where N is sample size, H " indicates heterozygosity of i-th SNP, r "2 indicates allelic correlation between i-th and j-th SNPs, and ϵ 2 ∼ N(0, σ t -) is the residual variance. Further, the three parameters, π ' , σ l -, σ t -, are fitted by direct maximization of the likelihood function. Fitting the univariate MiXeR model does not depend on the sign of z 2 , allowing us to calculate |z 2 | from MOSTest p-values. Finally, given the estimated parameters of the model, the power curve S(N) is then calculated from the posterior distribution p?δ 2 Nz 2 , N).

Gene-set analyses
We made use of the Functional Mapping and Annotation of GWAS (FUMA) online platform (https://fuma.ctglab.nl/) to further process the output from MOSTest and min-P. Through FUMA, we carried out MAGMA-based gene analyses using default settings, which entail the application of a SNP-wide mean model and use of the 1000 Genomes Phase 3 EUR reference panel. Gene-set analyses were done in a similar manner, restricting the sets under investigation to those that are part of the Gene Ontology biological processes subset (n=4436), as listed in the Molecular Signatures Database (MsigdB) v5.2.

Extended Data Morphological features included
Please see Table S1 for an overview of all regional morphological features included in the analyses. We included all features outputted by the default Freesurfer subcortical and cortical processing streams, except for the range of global measures, CSF, surface holes, vessels, optic chiasm and hypointensities, as we did not consider these measures of regional brain morphology.  Table S2 for an overview of the number of whole-genome significant SNPs, the number of independent SNPs, and the number of independent loci (see the Online Methods for definitions), per test and per feature set. The full tables of discovered loci for each feature set, for both MOSTest and min-P are in Data S1-S8. This includes information on lead SNP, genomic location, significance, and mapped genes, as outputted by FUMA. Results from the replication Table S3 lists the results from the replication attempt of the main GWAS via MOSTest and min-P. For this, we used a new batch of neuroimaging data of 4,884 healthy White European UKB participants that was released in October 2019, after we ran our primary analyses and made the results of this available via bioRxiv. This data was processed identically to the main sample and then analysed through MOSTest and min-P. We subsequently calculated the percentage of loci discovered in the main analyses, per test and per feature set, that was nominally significant in this additional sample. As can be seen in Table S3, for each combination of test and feature set, the replication rate was approximately 40%, indicating no major difference in the replication rates. From this it follows that the absolute number of loci replicating is threefold higher for MOSTest than for min-P.

Validation and formal comparison between MOSTest and other tools
Current multivariate approaches, such as canonical correlation analysis as implemented in MV-PLINK 5 and ordinal regression as implemented in MultiPhen 4 , perform a multiple regression in an opposite direction: the genotype vector is used as an outcome variable, while each phenotype is turned into an explanatory variable. The p-value is then calculated from an F-test, which tests for an association between the genotype vector and the most predictive linear combination of phenotypes at each SNP. Figure S1 compares the -log10(p-value), calculated by MV-PLINK and MOSTest. MV-PLINK takes 10,000x longer to run (requiring approximately 250K CPU hours, instead of 24 CPU hours with MOSTest), it was therefore infeasible to run the analysis on the entire set of 7.4M SNPs. Instead we tested a set of 356 LD-independent SNPs (a subset of all genome-wide significant SNPs remaining after LD-based clumping with r2=0.6 threshold) with a p-value from min-P below the genome-wide significance threshold. The results show very high correlation (r=0.9976) between MV-PLINK and MOSTest -log10(p-values), with median of 14.16 (MOSTest) versus 14.40 (MV-PLINK). Another recently developed multivariate test, aMAT 27 , uses the same test statistic as MOSTest, but after applying regularization (spectral filtering) to the correlation matrix R. In our data regularization wasn't necessary, as the conditioning number was reasonably low, see Table  S4, leading to a well-defined matrix R -1 .
Another recently developed multivariate test, aMAT 10 , uses the same test statistic as MOSTest, but after applying regularization (spectral filtering) to the correlation matrix R. In our data regularization wasn't necessary, as the conditioning number was reasonably low, see Extended Data Table 2, leading to a well-defined matrix R -1 .

Rank-based inverse normal transformation
We carried out a rank-based inverse normal transformation of the measures, otherwise non-normally distributed measures can inflate p-values and thus elevate type-I error. Extended Data Figure 2 shows the empirical distribution of MOSTest and min-P test statistics under the null (calculated via permutations), along with p-value calculated from the test, with the rankbased INT, showing correct behaviour. Extended Data Figure 3 shows the distributions after running MOSTest without the transformation, leading to deviations, highlighting that this transformation is important for maintaining correct type-I error.