Common variation near ROBO2 is associated with expressive vocabulary in infancy

Twin studies suggest that expressive vocabulary at ~24 months is modestly heritable. However, the genes influencing this early linguistic phenotype are unknown. Here we conduct a genome-wide screen and follow-up study of expressive vocabulary in toddlers of European descent from up to four studies of the EArly Genetics and Lifecourse Epidemiology consortium, analysing an early (15–18 months, ‘one-word stage’, total N=8,889) and a later (24–30 months, ‘two-word stage’, total N=10,819) phase of language acquisition. For the early phase, one single-nucleotide polymorphism (rs7642482) at 3p12.3 near ROBO2, encoding a conserved axon-binding receptor, reaches the genome-wide significance level (P=1.3 × 10⁻⁸) in the combined sample. This association links language-related common genetic variation in the general population to a potential autism susceptibility locus and a linkage region for dyslexia, speech-sound disorder and reading. The contribution of common genetic influences is, although modest, supported by genome-wide complex trait analysis (meta-GCTA h² at 15–18 months=0.13, meta-GCTA h² at 24–30 months=0.14) and in concordance with additional twin analysis (5,733 pairs of European descent, h² at 24 months=0.20).

Forest plots include results from ALSPAC (Genomic-control corrected), Raine, TEDS and GenR and an inverse-variance fixed effect meta-analysis of all cohorts. Beta coefficients represent the change in rank-transformed expressive vocabulary score (adjusted for sex, age, age squared and the most significant principal components in each cohort) per effect allele from weighted linear regression of the score on allele dosage (MACH2QTL/SNPTEST). Effects are given with respect to the following effect alleles: rs7642482 (G), rs10734234 (T), rs11176749 (T) and rs1654584 (G).

Genome-wide screen of expressive vocabulary scores between 15-18 months of age. Discovery analysis was conducted in ALSPAC and independent signals (p≤1×10⁻⁴) were followed up in GenR (N=2038; Supplementary Data 1). Combined results are from inverse-variance fixed effect meta-analysis. Beta coefficients represent the change in rank-transformed score (adjusted for sex, age, age squared and the most significant principal components in each cohort) per effect allele from weighted linear regression of the score on allele dosage (MACH2QTL). Lead signals are indicated in bold. E - Effect allele, A - Alternative allele, Chr - Chromosome, Pos - Position, EAF - Effect allele frequency, Dir - Direction of the genetic effect; a - hg18, b - Genomic-control corrected
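As an illustration of the inverse-variance fixed effect meta-analysis named in these captions, the following is a minimal sketch under stated assumptions, not the software pipeline used in the study. The function name `fixed_effect_meta` and the example inputs are hypothetical. The sketch assumes that each cohort contributes a beta coefficient and standard error for the same effect allele, and that a normal approximation is adequate for the pooled test statistic.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort regression coefficients (same effect allele in every cohort)
    ses:   the corresponding standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value from the standard normal distribution.
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical inputs for two cohorts (illustrative only, not study results).
beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.10, 0.10])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.3g}")
```

With equal standard errors the pooled estimate is the simple mean of the cohort estimates. Cohorts with smaller standard errors, typically the larger ones, otherwise dominate the combined result.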

Supplementary
Genome-wide screen of expressive vocabulary scores between 24-30 months of age. Discovery analysis was conducted in ALSPAC and independent signals (p≤1×10⁻⁴) were followed up in Raine (N=981), TEDS (N=1727) and GenR (N=1812; Supplementary Data 1). Combined results are from inverse-variance fixed effect meta-analysis. Beta coefficients represent the change in rank-transformed score (adjusted for sex, age, age squared and the most significant principal components in each cohort) per effect allele from weighted linear regression of the score on allele dosage (MACH2QTL/SNPTEST). Signals based on more than one missing cohort were excluded. E - Effect allele, A - Alternative allele, Chr - Chromosome, Pos - Position, EAF - Effect allele frequency, Dir - Direction of the genetic effect in the discovery and follow-up cohort; a - hg18, b - Genomic-control corrected, c - Available in ALSPAC, Raine and GenR only (Total N=9092)

Association analysis comparing directly genotyped versus imputed SNP data (N=8,058) in the discovery cohort (ALSPAC). Beta coefficients represent the change in rank-transformed score (adjusted for sex, age, age squared and the most significant principal components in ALSPAC) per effect allele from linear regression of the score on allele dosage. I/G - Imputed (1)/directly genotyped (0) SNP, E - Effect allele, A - Alternative allele, EAF - Effect allele frequency, m - months

Adjustment of lead signals for potential covariates is shown for the discovery (ALSPAC) and follow-up (GenR) cohort, and inverse-variance fixed effect meta-analysis. Beta coefficients represent the change in rank-transformed score per effect allele from linear regression of the score on allele dosage (R/Stata/SPSS software). Covariate details are described in Supplementary Data 1. Baseline - Baseline models: Expressive vocabulary scores were adjusted for sex, age, age squared and the most significant principal components in each cohort before rank-transformation, Adj - Adjusted models: As baseline models with additional adjustment for the covariate; E - Effect allele, A - Alternative allele, EAF - Effect allele frequency, phet - Heterogeneity p-value based on Cochran's Q-test, R² - Adjusted regression R² in %

Association between potential covariates and lead signals for rank-transformed CDI expressive vocabulary scores between 15-18 months of age. Results are shown for the discovery (ALSPAC) and follow-up (GenR) cohort, and inverse-variance fixed effect meta-analysis. Beta coefficients represent the change in gestational age (weeks) per effect allele from linear regression of gestational age on allele dosage, adjusted for sex and the most significant principal components in each cohort. Odds ratios (OR) represent the odds of having lower compared to higher maternal education per effect allele from logistic regression of maternal education (low=1, high=0) on allele dosage, adjusted for the most significant principal components in each cohort. Covariate details are described in Supplementary Data 1

… 9, and rs6803202 and rs4535189 with performance on tasks of non-word repetition 10. Association was studied using locally imputed genotypes on chromosome 3 (based on 1000 Genomes) in ALSPAC, as some variants (rs331142 and rs12495133) had no proxies in HapMap 2 imputed data. Beta coefficients represent the change in rank-transformed score (adjusted for sex, age, age squared and the most significant principal components in each cohort) per effect allele from weighted linear regression of the score on allele dosage (MACH2QTL). Linkage disequilibrium with proxy SNPs is given in r². E - Effect allele, A - Alternative allele, Proxy - Proxy SNP, Chr - Chromosome, Pos - Position, EAF - Effect allele frequency; a - hg19
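The heterogeneity p-value (phet) reported with the covariate-adjusted meta-analyses above is based on Cochran's Q-test. The following is a minimal sketch of that test under stated assumptions, not the code used in the study. The function names `chi2_sf` and `cochran_q` and the example inputs are hypothetical. Q is the weighted sum of squared deviations of the cohort estimates from the inverse-variance pooled estimate, and is referred to a chi-square distribution with k-1 degrees of freedom for k cohorts.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed forms for df = 1 and df = 2 and the standard recurrence
    for the regularized upper incomplete gamma function to reach higher df.
    """
    if df < 1 or q < 0:
        raise ValueError("df must be >= 1 and q must be non-negative")
    x = q / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-x)              # df = 2
    else:
        a, sf = 0.5, math.erfc(math.sqrt(x))   # df = 1
    while a < df / 2.0:
        sf += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def cochran_q(betas, ses):
    """Return (Q, df, p_het) for per-cohort estimates and standard errors."""
    if len(betas) < 2 or len(betas) != len(ses):
        raise ValueError("need at least two cohorts with matching inputs")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical discovery and follow-up estimates (illustrative only).
q, df, p_het = cochran_q([0.00, 0.20], [0.10, 0.10])
print(f"Q={q:.2f} df={df} p_het={p_het:.3f}")
```

For a discovery plus single follow-up design, as in the ALSPAC and GenR comparison, the test has one degree of freedom. A small phet would indicate that the cohort estimates differ by more than sampling error alone would suggest.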

Avon Longitudinal Study of Parents and Children (ALSPAC)
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a population-based, longitudinal, pregnancy-ascertained birth cohort in the Bristol area of the UK. Specifically, recruitment sought to enrol all pregnant women with an estimated delivery date between 1st April 1991 and 31st December 1992 who were resident within three Health Districts of the former administrative county of Avon 11,12.
The initial cohort included 14,541 pregnancies, and additional children eligible under the original enrolment definition were recruited up to the age of 18 years, increasing the total number of pregnancies to 15,247 (4.1% non-White mothers). Information on the children from these pregnancies is available from questionnaires, clinical assessments, linkage to health and administrative records as well as biological samples including genetic and epigenetic information.
Detailed information on all available data can be obtained online (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). Ethical approval was obtained from the ALSPAC Law and Ethics Committee (IRB00003312) and the Local Research Ethics Committees, and written informed consent was provided by all parents.

Generation R (GenR)
The Generation R Study is a population-based prospective cohort from fetal life onwards in Rotterdam, the Netherlands, which has been described in detail elsewhere 13.

Northern Finland Birth Cohort 1966 (NFBC1966)
The Northern Finland Birth Cohort 1966 (NFBC1966) 14 was recruited through maternity health centres, and data were collected from women living in Finland's two northernmost provinces, Oulu and Lapland, with expected deliveries between 1st January and 31st December 1966 (n=12,055 mothers).
A total of 12,231 babies were born from the pregnancies, of which 12,058 were live births and 173 were stillbirths. Individuals born in NFBC1966 were found to be representative of all births in the area. All cohort members are Finns (white Caucasians), and less than 1% of these are Gypsies or Lapps. Birth outcomes were collected at delivery by trained medical staff and entered into the medical records. The individuals were then followed up from birth, with questionnaires at ages 1 and 14 years and a clinical examination at 31 years, covering information on health, lifestyle and socio-economic indicators. Each participant or their parents gave written informed consent for the use of the data (protocols approved by the Ethical Committee of the Northern Ostrobothnia Hospital District).

The Twins Early Development Study (TEDS)
The Twins Early Development Study (TEDS) is a large longitudinal sample of twins born in England and Wales between 1994 and 1996 15. The focus of TEDS has been on cognitive and behavioural development, including difficulties in the context of normal development. TEDS began when multiple births were identified from birth records and the families were invited to take part in the study; 16,810 pairs of twins were originally enrolled in TEDS. More than 10,000 of these twin pairs remain enrolled in the study to date. DNA has been collected for more than 7,000 pairs, and genome-wide genotyping data for two million DNA markers are available for 3,500 individuals. The TEDS families have taken part in studies when the twins were aged 2, 3, 4, 7, 8, 9, 10, 12, 14 and 16 years, and currently at 18 years of age. Ethical approval for each stage of TEDS has been obtained from the Institute of Psychiatry Ethics Committee (REC approval 05/Q0706/228), and informed consent was collected from the parents for each assessment.

Western Australian Pregnancy Cohort study (Raine)
The Western Australian Pregnancy Cohort study (Raine) 16 was started as a randomised controlled trial to evaluate the effects of repeated ultrasound in pregnant women in Perth, Western Australia. In total, 2,900 pregnant women were recruited between 1989 and 1991 prior to 18 weeks gestation at