Identification of NCAN as a candidate gene for developmental dyslexia

A whole-genome linkage analysis in a Finnish pedigree of eight cases with developmental dyslexia (DD) revealed several regions shared by the affected individuals. Analysis of coding variants from two affected individuals identified rs146011974G > A (Ala1039Thr), a rare variant within the NCAN gene co-segregating with DD in the pedigree. This variant prompted us to consider this gene as a putative candidate for DD. The RNA expression pattern of the NCAN gene in human tissues was highly correlated (R > 0.8) with that of the previously suggested DD susceptibility genes KIAA0319, CTNND2, CNTNAP2 and GRIN2B. We investigated the association of common variation in NCAN with brain structure in two data sets: young adults (Brainchild study, Sweden) and infants (FinnBrain study, Finland). In young adults, we found associations between a common genetic variant in NCAN, rs1064395, and white matter volume in the left and right temporoparietal as well as the left inferior frontal brain regions. In infants, the same variant was associated with cingulate and prefrontal grey matter volumes. Our results suggest NCAN as a new candidate gene for DD and indicate that NCAN variants affect brain structure.


Material and Methods
All methods were carried out in accordance with the relevant guidelines and regulations.
DD pedigree. The DD pedigree was originally recruited for the genetic study of dyslexia by Nopola-Hemmi et al. 27,28 from the Department of Paediatric Neurology at the Hospital for Children and Adolescents, University of Helsinki, Finland. The criteria for the probands (children under 16 years of age) included in the study were a severe reading impairment, an extended family history of dyslexia (at least four affected individuals) and a pedigree suggestive of autosomal dominant transmission. All individuals were Finnish, of North European/Caucasian origin and native Finnish speakers. Participants from the nuclear families and their first- and second-degree relatives were interviewed by a pediatric neurologist or neuropsychologist and were sent a detailed questionnaire regarding their reading and spelling difficulties, school history, and remedial education.
All individuals included in the analyses (both affected and unaffected) were tested by a clinical neuropsychologist to verify the diagnosis of dyslexia. The diagnostic assessment included intelligence tests (WAIS-R, WISC-R), age- and grade-appropriate Finnish reading and spelling tests 29,30 and a neuropsychological test 31. The criteria for dyslexia included a pronounced history of reading problems, a marked lag in age-appropriate reading skills (at least two years, depending on age) and a normal performance intelligence quotient (IQ > 85). The neurocognitive type of dyslexia segregating in the DD family consisted of deficits in phonological awareness, verbal short-term memory, and rapid naming. As the diagnoses in the family were made more than 15 years ago, certain phenotypic information may have been lost or never collected. The diagnostic protocol was, however, consistent with the accepted approach at the time. For this reason, we analyse only the binary DD phenotype (affected/not affected) in the current study.
Genomic DNA from whole blood was available from 14 family members in the current study; 8 of them had confirmed DD (Fig. 1). Samples from the family were collected and analyzed according to ethical permissions Dnro 53/2006 and Dnro 4U/2016 by the ethics committee of the Central Finland Health Care District. Informed consent was obtained from all participating family members.
All available samples were genotyped on the Illumina HumanCoreExome12v1-0 genotyping chip (Illumina, San Diego, USA), and non-parametric multipoint linkage (NPL) analysis was run using Merlin version 1.12 32 with the --exp option. The genotyping and linkage analysis are described in detail in Supplementary methods.
Next generation sequencing. One hundred ng of genomic DNA from the affected individual 3935 was used for exome sequencing using an AmpliSeq library on the Ion Proton platform (Life Technologies, Carlsbad, CA) at the Uppsala Genome Center (Science for Life Laboratory, Uppsala University, Uppsala, Sweden). Variants identified using the Ion Torrent Suite software (Life Technologies) were filtered to retain only non-synonymous, splice-site or stop-codon variants. These were filtered further to retain rare variants with a frequency of < 1% in the 1000 Genomes ALL dataset (all populations) 33 and the ESP6500 dataset (all populations) that lay within the regions of linkage (NPL > 1). The exome sequencing is described in more detail in Supplementary methods.
Two μg of genomic DNA from a second affected family member (3821) were used for whole-genome sequencing at Science for Life Laboratory, Stockholm, Sweden using a TruSeq library run on Illumina HiSeqX and the GATK best-practice pipeline. The sequencing is described in more detail in the Supplementary methods. As we hypothesized a highly penetrant autosomal dominant inheritance of DD in the family and a variant of strong risk-effect, we opted to extract only exonic variants and focused on variants shared by the two sequenced individuals, 3821 and 3935. The filtering steps were as above.
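The filtering logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record layout, field names, effect labels and the toy coordinates below are all hypothetical and serve only to show the order of the filters (variant type, population frequency below 1% in both reference datasets, location within a linkage region, and sharing between the two sequenced individuals).

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative assumptions,
# not taken from the study's actual annotation output.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str        # e.g. "nonsynonymous", "splice_site", "stop", "synonymous"
    freq_1000g: float  # allele frequency in 1000 Genomes ALL (0.0 if absent)
    freq_esp: float    # allele frequency in ESP6500 (0.0 if absent)

KEPT_EFFECTS = {"nonsynonymous", "splice_site", "stop"}
MAX_FREQ = 0.01  # "rare" means a frequency below 1%

def in_linkage_region(variant, regions):
    """regions: list of (chrom, start, end) intervals with NPL > 1."""
    return any(c == variant.chrom and s <= variant.pos <= e for c, s, e in regions)

def filter_variants(variants, regions):
    """Return the keys of variants that pass every filtering step."""
    return {
        (v.chrom, v.pos, v.ref, v.alt)
        for v in variants
        if v.effect in KEPT_EFFECTS
        and v.freq_1000g < MAX_FREQ
        and v.freq_esp < MAX_FREQ
        and in_linkage_region(v, regions)
    }

def shared_candidates(variants_a, variants_b, regions):
    """Variants passing all filters in both sequenced individuals."""
    return filter_variants(variants_a, regions) & filter_variants(variants_b, regions)

# Toy data with made-up coordinates (not real variant positions).
regions = [("19", 1000, 2000)]
ind_a = [
    Variant("19", 1500, "G", "A", "nonsynonymous", 0.001, 0.0),  # passes
    Variant("19", 1600, "C", "T", "synonymous", 0.0, 0.0),       # wrong effect
    Variant("19", 1700, "A", "G", "nonsynonymous", 0.2, 0.15),   # too common
    Variant("8", 1500, "T", "C", "stop", 0.0, 0.0),              # outside regions
]
ind_b = [
    Variant("19", 1500, "G", "A", "nonsynonymous", 0.001, 0.0),  # passes, shared
    Variant("19", 1800, "G", "C", "splice_site", 0.0, 0.0),      # passes, not shared
]

print(shared_candidates(ind_a, ind_b, regions))
```

Keying each variant on chromosome, position and alleles makes the sharing step a simple set intersection.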
WES and WGS sequence data are available in the DDBJ/EMBL/GenBank databases under the accession number PRJEB12695. We used the human genome reference hg19 for all genomic positions of variants described here. Individuals 3935 and 3821 were selected for sequencing based on being affected carriers of the putative risk haplotype and due to the availability of high quality DNA.
Gene expression correlation analysis. The Functional Annotation of the Mammalian Genome 5 (FANTOM5) human/mouse tissue and cell line promoter expression database 34 was used to examine the expression of NCAN and compare it to the global expression levels of a representative set of DD candidate genes (MRPL19, KIAA0319, ROBO1, CTNND2, DYX1C1, CEP63, KIAA0319L, PCNT, GCFC2, CYP19A1, DCDC2, FOXP2, CNTNAP2 and GRIN2B) in 127 of the 152 tissue samples in the FANTOM5 dataset. We looked at the expression in all available human tissues (to get a general picture of the expression patterns), in brain tissue only (presumably yielding stronger correlation within the set of genes expressed primarily in neurological tissue), and in non-brain tissue (to further look into the expression patterns of the DD risk genes that are more ubiquitously expressed). Universal tissues, total RNA samples, and tissues with too shallow sequencing depths or low RNA quality were excluded from the analysis. The final dataset included 78 non-brain tissues and 49 brain tissues, listed in Supplementary Table 1. The 'robust' set of promoters was used for the gene-based analysis.
Spearman correlations (Rho) between the gene expressions were calculated for brain tissues, non-brain tissues, and all tissues (brain and non-brain tissues). As this non-parametric, ranking-based method gives R-values from −1 (complete negative correlation of ranks) to 1 (complete positive correlation of ranks), we here considered R-values of 0.8 or higher to indicate strong (positive) correlation. Hierarchical clustering was used to produce the correlation heatmaps.
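As an illustration of the rank-based correlation described above, the following Python sketch computes Spearman's rho as the Pearson correlation of average ranks and flags gene pairs at or above the 0.8 cut-off. The gene labels and expression values are invented toy data, not FANTOM5 values, and the hierarchical clustering step used for the heatmaps is omitted here.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy expression values across five tissues (hypothetical numbers).
expr = {
    "GENE_A": [0.1, 0.5, 2.0, 8.0, 9.5],
    "GENE_B": [0.0, 0.9, 1.5, 6.0, 20.0],
    "GENE_C": [5.0, 4.0, 3.5, 1.0, 0.2],
}

STRONG = 0.8  # cut-off for strong positive correlation used in the text
rho = {(a, b): spearman(expr[a], expr[b]) for a, b in combinations(expr, 2)}
strong_pairs = [pair for pair, r in rho.items() if r >= STRONG]
print(strong_pairs)
```

Because only ranks enter the calculation, the large difference in scale between the toy genes has no effect on rho.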
Brain imaging. As common variants in a number of previously identified DD susceptibility genes have shown significant associations with certain brain structures, we analyzed the potential association of a common variant in NCAN with structural variation in brain MRI data. We chose the polymorphism rs1064395 (NC_000019.10:g.19250926 G > A) in NCAN, a common variant previously associated with cognitive performance in healthy individuals 35 and with grey matter volumes 36. The minor allele of rs1064395 and the rare variant rs146011974 (introduced further in Results) are carried on the same rare haplotype (data not shown), so rs1064395 likely tags the rare haplotype segregating with DD in the present family.
A longitudinal dataset (Brainchild) of brain imaging in 76 healthy Swedish children and young adults (age 6 to 25 years, 41 males and 35 females) was used to test for association of the NCAN genetic variant with white and grey matter volumes. These same individuals have been studied previously 5,13,21,22,37. Informed consent was obtained from all subjects (ethical approvals 2007/241-31/3 and 2012/116-32, Stockholm, Sweden); genomic DNA was analyzed using the Affymetrix Genome-wide SNP array 6.0 (Santa Clara, CA, USA). T1-weighted brain imaging was performed three times, each two years apart, using a 3D magnetization prepared rapid gradient echo (MP-RAGE) sequence with TR = 2300 ms, TE = 2.92 ms, a field of view of 256 × 256 mm², 176 sagittal slices, and 1 mm³ voxel size. Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL) 38 was performed to segment the brain into grey matter, white matter and CSF. An 8-mm Gaussian kernel was applied to the segmented white matter images. All modulated white and grey matter images were then analyzed using a flexible factorial design in Statistical Parametric Mapping (SPM) software (www.fil.ion.ucl.ac.uk/spm). The genetic variant rs1064395 was entered as a main factor, and the model was corrected for age, sex, handedness and total white matter volume. Genotype-by-age and genotype-by-sex interactions were also added to the model. The main effect of the genetic variant on white matter structure was assessed at two thresholds (p = 0.01 and p = 0.001), and clusters were considered significant after FDR correction at the cluster level using nonstationary cluster extent correction 39. The genotype counts for rs1064395 (NCAN) were 56 (CC) and 20 (CT). The frequency of the minor allele (T) of rs1064395 was 13% in the Brainchild dataset.
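The reported minor allele frequency follows directly from the genotype counts, as the short Python check below shows. The function name is an illustrative choice; the counts are those given in the text (56 CC, 20 CT, and no TT individuals).

```python
def minor_allele_frequency(n_hom_major, n_het, n_hom_minor=0):
    """Minor allele frequency from diploid genotype counts."""
    n_individuals = n_hom_major + n_het + n_hom_minor
    # Each heterozygote carries one minor allele, each minor homozygote two.
    minor_alleles = n_het + 2 * n_hom_minor
    return minor_alleles / (2 * n_individuals)

# Brainchild: 56 CC and 20 CT individuals, i.e. 20 T alleles out of 152.
brainchild_maf = minor_allele_frequency(56, 20)
print(f"{brainchild_maf:.3f}")  # 0.132, i.e. about 13%
```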
To assess the overlap between the regions of white matter we overlaid the current NCAN findings at a threshold of p = 0.01 with the previous results from our imaging studies on DD/reading disability 5,13,21,22,37 and susceptibility genes. Significant regions were overlaid on a human brain MRI template using MRIcro software (www.mccauslandcenter.sc.edu/mricr).
A second, independent dataset was used to explore the effect of rs1064395 on early structural characteristics of the brain. The FinnBrain Cohort is a Finnish general population-based pregnancy cohort whose main aim is to delineate the associations between maternal psychological well-being during pregnancy and the future development of the children (www.finnbrain.fi, Karlsson et al., submitted). The study was approved by the Ethics Committee of the Hospital District of Southwest Finland (ETMK: 31/180/2011 nr. 210) and informed consent was obtained from all subjects. The FinnBrain sub-sample available for the current study included 26 healthy, term-born infants (14 girls and 12 boys), imaged at mean age 22.5 days (SD 7.2) from birth. All MRI scans were performed at the Turku University Hospital using a Siemens Magnetom Verio 3 T scanner (Siemens Medical Solutions, Erlangen, Germany). A 12-element Head Matrix coil allowed the use of the Generalized Autocalibrating Partially Parallel Acquisition (GRAPPA) technique to accelerate acquisitions (a PAT factor of 2 was used). A 2D dual-echo TSE (Turbo Spin Echo) sequence was used to acquire anatomical PD- and T2-weighted images. Parameters were optimized so that the "whisper" gradient mode could be used to reduce acoustic noise during the scan. Slice thickness was 1 mm in order to acquire isotropic 1.0 × 1.0 × 1.0 mm voxels. A TR of 12070 ms and effective TEs of 13 ms and 102 ms were used to produce both PD- and T2-weighted images from the same acquisition. The total number of slices was 128. Only T2-weighted images were used in the subsequent analysis. Visual quality control (QC) was performed on the acquired volumes. Images were analyzed with iBEAT software 13, which provides N3 bias correction, tissue segmentation and anatomical labeling (AAL atlas).
iBEAT segmentation produces RAVENS maps, which resemble the modulated segmentation maps used in adult voxel-based morphometry (VBM) but provide more reliable metrics on highly abnormal brains (e.g. infants). Owing to the well-known uneven intensity distribution of the infant brain 14, white matter segmentation was suboptimal: portions of subcortical grey matter structures and myelinated central white matter appeared in the grey matter segments. Consequently, white matter analysis in this data set was deemed unreliable and was not used. Grey matter RAVENS maps were smoothed with an 8 mm FWHM kernel in SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/). Smoothed segmentations were entered into statistical modeling within SPM. As for the Brainchild data set, we used a flexible factorial model in which the genetic variant rs1064395 was entered as a main factor and gestational age and sex as nuisance variables. To give an open view of the results, they were thresholded at two levels, p < 0.001 and p < 0.01, with FDR correction (p < 0.05) for multiple comparisons at the cluster level. The analysis was carried out in SPM, and the MNI coordinates (in adult space) were used to identify the brain regions by overlaying the contrast on the iBEAT AAL template in MRIcron.
In a complementary analysis, we used the individual tissue volumes that are produced in the iBEAT image processing pipelines. The volumes were calculated from the AAL template labeling and represent individual grey matter volumes (as contrasted to relative volumes/density of RAVENS maps).
The genotype counts for rs1064395 were 20 (CC) and 6 (CT). Genotyping was performed at the Estonian Genome Centre (Tartu, Estonia) on the Illumina Infinium PsychArray BeadChip. See Supplementary methods for further details on the analysis.

Results
Ten suggested regions of linkage to DD. We performed linkage analysis using single nucleotide variant genotype data. NPL plots and scores are shown in Supplementary Fig. S1 and Supplementary dataset S1, respectively. In the pedigree under study (Fig. 1), we identified ten genomic regions with NPL scores > 1 (Table 1) constituting the putative regions of interest in the subsequent exome data analysis. These regions did not overlap with the findings in Kaminen et al. 40, where only a subset of the current pedigree was included (data not shown).
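One way to turn per-marker NPL scores into candidate regions is to merge consecutive markers that exceed the threshold, as in the Python sketch below. How the region boundaries were actually delineated in the study is not stated here, so this is only one plausible approach; the marker positions and scores are invented toy values.

```python
def regions_above_threshold(markers, threshold=1.0):
    """Merge consecutive markers with NPL > threshold into regions.

    markers: list of (chrom, position, npl), sorted by chromosome and position.
    Returns a list of (chrom, start, end, max_npl) tuples.
    """
    regions = []
    current = None  # [chrom, start, end, max_npl] of the open region, if any
    for chrom, pos, npl in markers:
        if npl > threshold:
            if current is not None and current[0] == chrom:
                # Extend the open region on the same chromosome.
                current[2] = pos
                current[3] = max(current[3], npl)
            else:
                # Close a region left open on a previous chromosome, start anew.
                if current is not None:
                    regions.append(tuple(current))
                current = [chrom, pos, pos, npl]
        elif current is not None:
            # Score dropped to or below the threshold: close the open region.
            regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions

# Toy scores (hypothetical positions and NPL values).
markers = [
    ("8", 100, 0.4), ("8", 200, 1.2), ("8", 300, 1.6), ("8", 400, 0.9),
    ("19", 100, 1.1), ("19", 200, 1.3),
]
print(regions_above_threshold(markers))
```

The resulting intervals could then be passed to a positional filter on sequence variants.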

Analysis of exome variants in the linkage regions.
We extracted the coding variants in affected individuals 3935 and 3821 (Fig. 1). Table 2 shows the number of variants at each filtering step and in each region of linkage, while Supplementary datasets S2 and S3 show the full list of these variants in 3935 and 3821, respectively. We identified only one rare non-synonymous variant within the regions of linkage that was shared by both individuals; it was located on chromosome 19. Sanger sequencing confirmed that all but one DD affected individual [...]
Gene expression pattern correlations with other dyslexia candidate genes. In order to better understand the role NCAN may play in DD, we studied the correlation of NCAN expression with the expression profiles of a small subset of previously described known or candidate DD susceptibility genes across a wide selection of human tissues/organs. The aim was not to look systematically at all known or suggested DD genes, merely to compare NCAN to a subset of these and test if NCAN might make a plausible addition to this group. Figure 2 shows heat maps of these correlations. When all the tissue samples from the FANTOM5 dataset (Fig. 2A) [...] For expression in brain only (Fig. 2B), we observed a strong correlation (R = 0.84) between NCAN and GRIN2B, and a somewhat weaker one between NCAN and KIAA0319 (R = 0.75) (Supplementary Table S2). The correlations were weaker in non-brain tissues (Fig. 2C), the strongest being between NCAN and CNTNAP2 (R = 0.65).
Genetic variation and brain structure. We have previously presented genotyping and neuroimaging data indicating that common genetic variation in several DD candidate genes is significantly associated with white or grey matter volumes 5,13,21,22,37. In order to further investigate the role of NCAN in DD, we analyzed the association of a common genetic variant (rs1064395) in NCAN with structural white and grey matter variation in a longitudinal sample of 76 children and young adults (Brainchild dataset).
We found a significant association of the minor allele of rs1064395 with increased white matter volume in the right temporoparietal and frontal regions (p = 0.01; cluster-level p FWE corrected = 1.56 × 10⁻⁶) and in the left temporoparietal, frontal and occipital regions (p = 0.01; cluster-level p FWE corrected = 4.48 × 10⁻⁶) (in red in Fig. 3A and Supplementary Table S3). We repeated the analysis with a lower p threshold to have fewer false positive voxels. The same clusters were significant at this threshold level (shown in orange in Fig. 3A [...] (JHU) white matter atlas 42. Moreover, we found an association of the same allele with greater grey matter volume in the right superior temporal cortex (cluster level p uncorrected = 0.009, peak coordinate: 68,−7,−5). However, this region did not survive FDR or FWE correction at the cluster level (cluster level p FWE corrected = 0.22). Previous studies have shown that the minor alleles of variants in several known DD candidate genes are associated with larger white matter volume in certain brain regions 5,13,21,22,37. We overlaid these brain regions with the NCAN regions from the present study. The right temporoparietal region associated with rs1064395 overlapped with a region previously associated with the DD susceptibility genes KIAA0319, DYX1C1 and MRPL19 21 as well as CEP63, a more recently published DD susceptibility gene 13 (Fig. 3B and Supplementary Table S3). In addition, the left frontal region associated with rs1064395 overlapped with a region associated with variation in CTNND2, another recent candidate DD susceptibility gene 5. White matter volume in a third region, the left temporoparietal region, was associated with variation in the DYX1C1, DCDC2 and KIAA0319 genes 21 as well as in the MRPL19 gene 37. In summary, these results showed that brain regions for which white matter volume was associated with genetic variation in NCAN overlapped to a significant extent with those previously implicated for other DD susceptibility genes.
Figure 2. A, all tissues; B, brain tissues; C, non-brain tissues. Expression data were extracted from the FANTOM5 dataset, at http://fantom.gsc.riken.jp/5/data/.
To further explore the effect of the NCAN rs1064395 variant on brain structure, we examined its association with grey matter volumes in an independent set of infants (the FinnBrain study). We found that the minor allele of rs1064395 (carried by individuals with the CT genotype) was associated with increased infant grey matter volume in the left inferior parietal lobule, the right precentral gyrus and the right middle frontal gyrus. With a more lenient threshold (p < 0.01), larger grey matter volumes were associated with the minor allele in the bilateral middle cingulate cortices and superior frontal cortices (Fig. 4, Supplementary Table S5). Regression analysis on volumes yielded results in line with the VBM analysis (Supplementary Table S6).
While we were unable to identify any putative DD variants within the linkage region on chromosome 8, it contains the 5' and upstream regions of the CSMD1 (CUB and Sushi Multiple Domains 1) gene. Variants in CSMD1 have been associated with disorders and traits affecting the central nervous system, such as bipolar disorder 43 and schizophrenia 44-46, as well as with general cognitive ability and executive function in healthy human subjects 47. This gene may thus also be of interest in future studies of DD and related cognitive traits.

Discussion
We used linkage analysis to prioritize potential candidate regions for autosomal dominant DD in a multiplex pedigree, followed by evaluation of exome-sequencing data in those regions. This allowed us to highlight a new plausible candidate gene for DD, NCAN. Neither this gene nor its genomic region has previously been suggested by linkage or GWA studies as a susceptibility gene or locus for dyslexia. While our average sequencing depth is satisfactory, we cannot exclude the existence of additional variants that went unnoticed because of inexact base calling. Furthermore, even though the detected NCAN variant segregating with DD in the studied family leads to an amino-acid substitution, the true genetic risk variant(s) might conceivably be non-coding.
The rare NCAN variant identified in the current study is unlikely to explain a significant proportion of DD in the general population, and any studies of this specific variant are likely to be underpowered. This finding served, however, to highlight the NCAN gene as a potential novel DD susceptibility gene, and our subsequent work thus focused on understanding the general involvement of this gene in DD.
NCAN shares similar expression profiles with CNTNAP2, CTNND2, KIAA0319 and GRIN2B, with higher expression in several brain regions and lower expression in the rest of the body. This was reflected in the strong correlations seen in the expression correlation heatmaps. The specific functions of these genes and how their dysfunction might contribute to DD susceptibility are only partially understood. It is, however, tempting to speculate that these genes may represent a group of genes involved in shared mechanisms or pathways that are critical for the optimal development of the brain and its function in literacy skills. DD genes with different patterns of expression, e.g. the CEP63 and DCDC2 genes that are expressed in most of the tissues in the human body, are likely to contribute to DD risk through other, potentially less brain-specific, mechanisms.
NCAN is important for cell adhesion and neuronal cell migration and is a negative regulator of neurite outgrowth 48. The rare variant rs146011974 lies within a highly conserved region in an EGF-like domain near the C-terminus of the protein. While it remains unclear how an amino acid change in this domain might affect DD risk, slight variation in signaling through this domain could plausibly influence communication of neurons with each other and their shared environment.
Carriers of the rare NCAN variant were unfortunately not available for imaging analysis. Instead, we looked further at how common variation in NCAN might affect white and grey matter volumes in two independent imaging datasets. The rationale behind this approach is that if a rare high-risk variant in NCAN drives DD susceptibility in the pedigree under study, common variants in the same gene might modulate brain structure in the general population. The common variant rs1064395 within the NCAN 3' UTR here correlates with variation in white and grey matter structures and has previously been implicated in a number of psychiatric phenotypes such as bipolar disorder 49, schizophrenia 50, cortical folding in schizophrenia 51 and mania 52. This is not the first time the same genetic variants have been connected to both dyslexia and other neurological disorders 53,54.
The white matter variability linked to NCAN in the Brainchild dataset was found in the left and right temporoparietal, occipital and frontal regions. Of these regions, the left inferior frontal area, which extends into the region functionally defined as Broca's area, is known to be of importance for language processing and overlaps with the region previously associated with CTNND2 5. The other cluster, in the left temporoparietal region, also overlaps with several brain regions previously reported for DD candidate genes 22,37. Disrupted brain activation patterns have also been reported in the left temporoparietal region when comparing poor readers to normal controls 55-57. Moreover, both grey matter and white matter deviations in this region have been related to dyslexia and impaired reading 17,20,58. A recent study 35 showed association of rs1064395 with brain activation in the temporal lobe during a semantic verbal fluency task in healthy subjects. Previous studies have also reported associations of the same genetic variant with grey matter volumes of the hippocampus and amygdala 36 as well as with volumes of the occipital region and prefrontal cortex 51. We went on to look further at this common NCAN variant in an independent dataset based on infant brain imaging. We were particularly interested in testing whether the putative novel DD susceptibility gene might affect brain structure already at such a young age. We detected associations of grey matter volumes with NCAN variation in the right superior frontal cortex, inferior parietal and bilateral cingulate cortices in the infants. Interestingly, the direction of the effect was the same; the minor allele of rs1064395 was correlated with increased white matter volumes in the older imaging population and increased grey matter volumes in the infant dataset.
This suggests that NCAN may be important for efficient neuronal development, with the minor and putative DD risk allele of rs1064395 correlating with less efficient neuronal guidance. Such a defect has also been suggested previously for DYX1C1, DCDC2, ROBO1 and KIAA0319 (reviewed in ref. 8). Although the brain regions implicated are not language areas per se, variation in them may modulate later cognitive performance in general 1, and thus have intricate connections to reading, which requires a coordinated brain network function that directs attention 59. It is possible that the white matter associations relate to the implicated initial grey matter characteristics, as the maturation of the white and grey matter in frontoparietal networks is strongly co-modulated 60, but this remains to be studied with additional imaging techniques such as DTI. Of additional importance is the fact that the FinnBrain imaging study was performed soon after birth, minimizing the influence of postnatal environmental factors and thus yielding a possible insight into brain structural development at a time point as close as possible to the in utero period.
Although we were able to assess the genetic associations to brain morphology in a wide spectrum of ages (from infants to young adults), direct comparison of the findings from the two datasets is problematic and larger longitudinal datasets would be needed. Growing interest in low frequency variants also means that future studies need to be larger in order to attain an acceptable size for each genotype group.
The small size of both imaging datasets, the inherent limitations in studying only a common variant, and the questions arising recently about the validity of certain imaging studies 61 and their power 62 , call for caution in drawing conclusions. However, with these datasets, we can lend robust support for the idea that NCAN is related to brain developmental processes and structural development at the more general level. How these findings directly relate to development of DD needs to be studied further.
In conclusion, we report here a family with multiple individuals affected by DD, and highlight NCAN as a putative novel DD susceptibility gene, which shares similar RNA expression profiles across multiple human tissues with KIAA0319, CTNND2, CNTNAP2 and GRIN2B. The brain imaging data reported in the present study support the view that NCAN variants have effects on brain structure from infancy to early adulthood, but the possible associations between genotype, neuroimaging and phenotype remain to be addressed in future studies.