Complex Patterns of Cannabinoid Alkyl Side-Chain Inheritance in Cannabis

The cannabinoid alkyl side-chain represents an important pharmacophore, where genetic targeting of alkyl homologs has the potential to provide enhanced forms of Cannabis for biopharmaceutical manufacture. Delta(9)-tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA) synthase genes govern dicyclic (CBDA) and tricyclic (THCA) cannabinoid composition. However, the inheritance of alkyl side-chain length has not been resolved, and few studies have investigated the contributions and interactions between cannabinoid synthesis pathway loci. To examine the inheritance of chemical phenotype (chemotype), THCAS and CBDAS genotypes were scored and alkyl cannabinoid segregation analysed in 210 F2 progeny derived from a cross between two Cannabis chemotypes divergent for alkyl and cyclic cannabinoids. Inheritance patterns of F2 progeny were non-Gaussian and deviated from Mendelian expectations. However, discrete alkyl cannabinoid segregation patterns consistent with digenic as well as epistatic modes of inheritance were observed among F2 THCAS and CBDAS genotypes. These results suggest linkage between cannabinoid pathway loci and highlight the need for further detailed characterisation of cannabinoid inheritance to facilitate metabolic engineering of chemically elite germplasm.

therapeutic cannabinoid portfolio expansion 15,31 , uncertainty over the genetic and biosynthetic regulation of alkyl cannabinoid homology hinders the development of novel recombinant cannabinoid breeding lines for biopharmaceutical exploitation.
The cannabinoid structural motif is generated from substrates originating from two independent biosynthetic pathways. Aromatic prenylation of geranyl diphosphate (GPP) and a phenolic alkylresorcinolic acid intermediate form monocyclic cannabinoids that feature a linear isoprenyl residue (e.g. cannabigerolic acid (CBGA)) 32,33 . Chain length of the alkylresorcinol fatty acid (FA) starter unit is thought to determine alkyl cannabinoid homology 34,35 . This hypothesis has been supported using a synthetic cell-free enzymatic platform which produced the propyl-cannabinoid intermediate cannabigerovarinic acid (CBGVA) from a C 3 alkylresorcinol substrate (divarinic acid) 36 . In vivo production of CBGVA and divarinic acid as well as associated end products delta( 9)-tetrahydrocannabivarinic acid (THCVA) and cannabidivarinic acid (CBDVA) have also recently been reported in engineered yeast strains fed the predicted C 3 alkyl cannabinoid intermediate butanoyl-CoA 37 . However, resolution of associated in planta biosynthetic pathways has largely focused on C 5 alkyl species 33,38 .
Cannabidiolic acid synthase (CBDAS) and delta(9)-tetrahydrocannabinolic acid synthase (THCAS) perform stereoselective oxidative cyclisation of the isoprenyl moiety, forming dicyclic and tricyclic cannabinoids. Physical and genetic mapping of THCAS and CBDAS genes has recently allowed for alignment of genetic loci to resolve the cluster of closely-linked genes. These genomic regions appeared abundant with retrotransposable elements as well as pseudogenic tandem repeats, and their positions have been assigned within a larger low recombining pericentromeric gene-poor region 39,40 . Regions also appeared non-homologous between chemotypes which suggests significant divergence between chemotypic lineages, although the reported hemizygosity for THCAS and CBDAS may be an artefact of genome assembly due to the underlying complexity of this region 39,40 . While the presence of tandem THCAS as well as CBDAS arrays would imply oligogenic inheritance, genepool representative germplasm segregate in a 1:2:1 dicyclic: tricyclic cannabinoid ratio characteristic of a single codominant locus B model 41,42 . This suggests cannabinoid synthase tandem arrays may include functionally superfluous repeats which seldom recombine, that although separated in terms of physical distance (>1 Mbp) 40 , segregate in a manner that resembles mutually exclusive B THCAS (THCAS) and B CBDAS (CBDAS) alleles.
The dioecious reproduction of Cannabis often confounds genetic analysis. Previous analysis of tricyclic chemotypes segregating for alkyl cannabinoid composition inferred a multiple locus A 1 -A 2 -… A n model, whereby alleles A pr 1−n and A pe 1−n with additive effect govern the proportion of alkyl cannabinoid homologs 31 . However, chemotypic continuity of the available progeny precluded demarcation of categories, thereby preventing chi-square analysis to resolve the inheritance model. To examine alkyl cannabinoid loci and determine their allelic assortment with cannabinoid synthase genes, we analysed a population segregating for alkyl and cyclic cannabinoid composition. Biparental reciprocal crosses between chemotypes divergent for alkyl and cyclic cannabinoids were performed, generating F 1 hybrid families. A single F 2 generation derived from an F 1 male and female cross was developed for chemotypic segregation analysis. Cannabinoid profiling of F 2 progeny along with genotypic analysis using a THCAS-and CBDAS-specific DNA sequence characterised amplified region (SCAR) marker assay was conducted to investigate interactions between cannabinoid pathway loci. Frequency distributions were determined using kernel density estimation, a statistical method of applying smoothing to a frequency histogram 43 . Kernel density was used to estimate underlying distributions and to demarcate chemotypes objectively into categories, thereby exposing modes of inheritance for alkyl side-chain length.

Results
Parental selection. Juvenile plants of three parental lines were screened for cannabinoid composition. C 3 /C 5 alkyl cannabinoid fractions (F C3 /F C5 ) associated with alkyl cannabinoid loci (A n loci) as well as di-/tri-cyclic cannabinoid fractions (F dicyclic /F tricyclic ) associated with the B locus complex were determined from the fresh weight (w/w) cannabinoid content of CBDVA, THCVA, cannabidiolic acid (CBDA) and delta(9)-tetrahydrocannabinolic acid (THCA). Eight individual plants which exhibited either [high F C3 + F tricyclic (e.g. THCVA)] or [high F C5 + F dicyclic (e.g. CBDA)] cannabinoid chemotypes were tentatively assigned homozygote status at the A and B locus complexes (Table 1). These plants from accessions EIO.MW15.P (n = 4), EIO.MW15.T (n = 2) and EIO.MW17.X (n = 2) were selected as parents to generate two biparental reciprocal crosses, forming four F 1 hybrid families (Figs 1, 2a). No consistent maternal or paternal patterns of inheritance were observed for F C3 values among the reciprocal crosses. However, discrete lineage-specific chemotypic distribution patterns were evident, with F 1 hybrid families (EIO.MW17.Y1, EIO.MW17.Y2) from EIO.MW15.T parents displaying cannabinoid composition skewed towards high F C5 as well as F dicyclic values (CBDA) (Fig. 2a,b). Individuals within hybrid families displayed transgressive segregation for a subset of cannabinoids. CBDVA and THCA proportions (%/total) were greater than parent values, with CBDVA increasing by more than 20-fold (Fig. 2b). F C3 /F C5 variance differed between the four F 1 hybrid families, with plants from EIO.MW17.Y1 showing the least variance (Table 2). This, along with the B locus homozygote genotypes of EIO.MW17.Y1 parents, was interpreted as an indication of P1 and P2 homozygosity at the A locus complex. Single male and female plants of EIO.MW17.Y1 were crossed and alkyl cannabinoid segregation assessed in the resulting F 2 generation (Fig. 1).
www.nature.com/scientificreports
Inheritance patterns of the F 2 progeny. A continuous distribution of F C3 /F C5 values was observed among the F 2 progeny (Fig. 3a,b). To minimise classification error, kernel density estimates (KDE) were used to categorise individual plants objectively prior to testing the fit of genetic models. F C3 /F C5 values for the F 2 population were non-Gaussian and instead formed discrete pentapartite distributions. F C3 /F C5 values were skewed towards low F C3 and deviated significantly from the expected 1:4:6:4:1 chemotypic segregation ratio (Fig. 3a, Table 3). KDE of F dicyclic /F tricyclic values formed a predominantly tripartite distribution quasi-compatible with incomplete dominance and a 1:2:1 segregation ratio (Fig. 3b). However, discrete distributions embedded within the intermediate F dicyclic /F tricyclic chemotypes suggested the possibility of additional F dicyclic /F tricyclic categories (Fig. 3b; Supplementary Fig. S1). For B CBDAS B CBDAS and B THCAS B CBDAS genotypes, quadripartite distributions could be discerned from F C3 /F C5 values (Fig. 4b,c). The most obvious deviation from the F 2 F C3 /F C5 distribution pattern was observed in the B THCAS B THCAS genotypes, with KDE describing a tripartite distribution (Fig. 4d). Analogous with the complete F 2 population (Fig. 3a), B THCAS B CBDAS genotypes had an F C3 /F C5 chemotype distribution resembling a composite of locus B homozygote inheritance patterns (Fig. 4b-d).
Given the high frequency of F C3 minima chemotypes among the F 2 progeny (Figs 3a, 4b-d), complete dominance at one or more A gene pair loci was considered plausible for F C3 /F C5 inheritance. Epistasis was also evaluated for locus B genotype-specific F C3 /F C5 segregation ratios due to their non-conformity with Mendelian expectations. For the B CBDAS B CBDAS -specific F C3 /F C5 quadripartite distribution pattern, a segregation ratio of 9:3:3:1 was accepted in support of a digenic model describing two independent Mendelian loci (Table 3). The B THCAS B THCAS -specific tripartite F C3 /F C5 distribution conformed to a 7:6:3 segregation ratio, and an epistatic model describing dominance at one gene pair and partial dominance at the alternative gene pair was accepted (Table 3). A 9:3:3:1 segregation ratio was not supported by the B THCAS B CBDAS -specific F C3 /F C5 values. Given the quadripartite nature of B THCAS B CBDAS -specific F C3 /F C5 distributions, a 7:6:3 segregation ratio could not be tested (Table 3). B THCAS B CBDAS -specific F C3 categories I, II and IV did, however, share similar relative frequency and F C3 /F C5 spatial distribution as the B THCAS B THCAS -specific categories (Table 3, Fig. 4c,d).
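The goodness-of-fit tests referred to above can be illustrated with a minimal sketch. The study used GenStat for Pearson's chi-squared test; the pure-Python functions below (`chi2_sf`, `goodness_of_fit`) are hypothetical stand-ins, and the category counts are invented for illustration only and are not the values in Table 3.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: start from erfc and add the half-integer gamma terms.
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def goodness_of_fit(observed, ratio):
    """Pearson chi-squared statistic, df and p-value against a segregation ratio."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical category counts (NOT the study data).
four_classes = [27, 10, 8, 3]   # tested against 9:3:3:1
three_classes = [14, 12, 6]     # tested against 7:6:3 and 1:2:1

print(goodness_of_fit(four_classes, [9, 3, 3, 1]))
print(goodness_of_fit(three_classes, [7, 6, 3]))
print(goodness_of_fit(three_classes, [1, 2, 1]))
```

A ratio is retained when the p-value exceeds the chosen significance threshold. Note that the number of KDE-derived categories fixes which ratios can be tested at all, which is why a tripartite 7:6:3 ratio cannot be applied to a quadripartite distribution.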

Discussion
The contiguous pentapartite distributions in the F 2 generation for F C3 /F C5 values were not consistent with a polygenic binomial inheritance pattern. Quantitative characters are not exclusive to polygenic modes of inheritance 44,45 . Simple Mendelian inheritance can result in phenotypic continuity when within-genotypic class variation is large and average phenotypic differences between genotypes are negligible 45 . Given that alkyl-cannabinoid loci are associated with enzymatic reactions which are several biosynthetic steps upstream of the metabolites used for chemotypic assessment 46,47 , there is potential for intracellular biophysical interactions affecting the channelling and metabolic flux of pathway intermediates. Formation of multienzyme complexes has been implicated in altering isoprenoid production in Arabidopsis thaliana due to physical interactions between geranylgeranyl diphosphate (GGPP) synthase and downstream GGPP-consuming enzymes 48 . These interactions could affect the expression of alkyl cannabinoids and contribute to the continuous variation in chemotype values observed in filial populations.
The multimodal segregation pattern within the F 2 progeny did not support a monogenic model, and so digenic inheritance was considered (Fig. 3a). In a digenic model with additive effects, a segregation ratio of 1:4:6:4:1 is expected 49 . However, F C3 /F C5 values were skewed towards the F C3 minima parent and a disproportionate number of progeny segregated in the F C3 minima category (Fig. 3a, Table 3). Unequal additive effects at different loci associated with the alkyl cannabinoid pathway, combined with aggregation of trigenic heptapartite categories, may also have contributed to an F 2 chemotype segregation skewed towards the F C3 minima parent, although the frequency of F C3 maxima progeny in category V clearly exceeds the 1/64 allowed by this model (Table 3).
The inheritance of phenotypic traits can be additive or non-additive 50 . If the inheritance of genes indicates an additive effect, the hybrid phenotype will tend to reflect the average effect of the parent genes or midparent value (MPV) 50,51 . Phenotypic traits which deviate from the MPV in hybrid progeny are assumed to be inherited in a non-additive manner 45,50 , and inheritance can be attributed to dominant or epistatic gene effects 52 . Alkyl cannabinoid proportions within F 1 family EIO.MW17.Y1, from which the F 2 generation was derived, showed a negative median deviation from the MPV (44.6% F C3 ), with hybrid progeny displaying a median F C3 value of 35.1% (±6.7 s.d.; range 24.3-56.0%) (Fig. 2a). Incomplete dominance and/or epistasis may therefore explain the deviation of EIO.MW17.Y1 chemotypes towards the F C3 minima parent (Fig. 2a). A non-additive model may also explain the higher frequency of F C3 minima progeny observed in the F 2 generation (Fig. 3a).
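The additive expectations quoted above follow from binomial coefficients, and a short sketch makes the arithmetic explicit. It assumes unlinked loci with equal additive effects; the function names are hypothetical, and the only study values used are the MPV (44.6%) and the F 1 median (35.1%) reported in the text.

```python
from math import comb

def additive_f2_ratio(n_loci):
    """F2 class ratio for n unlinked loci with equal additive allelic effects."""
    return [comb(2 * n_loci, k) for k in range(2 * n_loci + 1)]

def extreme_class_frequency(n_loci):
    """Expected F2 frequency of either parental extreme class."""
    return 1 / 4 ** n_loci

def midparent_deviation(hybrid_value, midparent_value):
    """Signed deviation of a hybrid value from the midparent value."""
    return round(hybrid_value - midparent_value, 1)

print(additive_f2_ratio(2))            # digenic: five classes
print(additive_f2_ratio(3))            # trigenic: seven classes
print(extreme_class_frequency(3))      # 1/64 for the trigenic extremes
print(midparent_deviation(35.1, 44.6)) # F1 median relative to the MPV
```

Under these assumptions a digenic cross gives five classes (1:4:6:4:1) and a trigenic cross seven (1:6:15:20:15:6:1), so an excess of F C3 maxima progeny relative to 1/64 argues against the simple trigenic additive case.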
Single seed descent F 8 recombinant inbred lines as well as doubled haploid lines can achieve more than 99.7% homozygosity 53 . The parental lines used in the present study were not inbred to this level of homozygosity and parent heterozygosity may have contributed to the non-orthodox F 1 and F 2 inheritance patterns. Whilst the F 1 family EIO.MW17.Y1 were descendants from parents displaying the largest F C3 divergence (Table 1), they also exhibited the highest level of F C3 homogeneity (Table 2), and displayed a uniform monopartite distribution largely consistent with a single category (Figs 2a, 3a). Taken together these factors suggest parental homozygosity at alkyl cannabinoid-determining loci. Given that within-plant C 3 /C 5 alkyl cannabinoid composition has been found to be stable over key developmental stages, environmental and ontogenetic effects are also likely to have contributed minimally to inheritance patterns observed in the filial generations.
Secondary metabolite gene clusters comprising two or more non-homologous biosynthetic pathway genes have been identified across a number of diverse plant taxa 54 . A common feature of these clusters is that they contain 'signature genes' in addition to other downstream pathway genes 54-56 . Signature genes are often recruited from primary metabolism and encode the first committed biosynthetic steps of the pathway 57 . For alkyl cannabinoid biosynthesis this is predicted to be the formation of alkylresorcinol fatty acid (FA) starter units 35,46,47 , which, when incorporated into the resorcinyl skeletal core 58 , directly influence the carbon number of the resulting cannabinoid alkyl side-chain 37 . While cannabinoid synthesis pathway genes appear to be randomly dispersed over five chromosomes 39 , the enzymatic basis for cannabinoid FA starter unit synthesis, as well as the genomic positioning of associated loci, have yet to be established 39,46,47,59 . Given that cannabinoid synthase loci have been localised to retrotransposon-rich genomic regions compatible with gene cluster formation 39,40 , it is conceivable that upstream alkyl cannabinoid-determining loci may be physically clustered and/or co-inherited with THCAS and CBDAS genomic intervals.
The contrasting segregation ratios identified in CBDAS (B CBDAS B CBDAS ) and THCAS (B THCAS B THCAS ) homozygote F 2 progeny suggest the possibility of linkage between alkyl and cyclic chemotype-determining loci and may explain the distortion of alkyl cannabinoid ratios from a strictly additive polygenic model (Fig. 4a-d, Table 3). Rearrangement of THCAS and CBDAS genomic regions is evident in the experimental population from the incomplete dominance and irregularity of the intermediate chemotypic distribution (Fig. 3b). Incomplete linkage between the SCAR markers and tandem cannabinoid synthase arrays may have precipitated synthetic genotype-specific inheritance patterns, although uncoupling of the marker from functionally relevant loci is questionable given that genotypes were largely congruent with chemotypic distributions (Fig. 4a). The association of the SCAR marker assay with chemotype has also been established across a range of geographically and genetically divergent Cannabis germplasm 60,61 .
In vitro feeding studies indicate that THCAS and CBDAS exhibit different catalytic efficiencies towards alkyl homologs 62 . This could be contributing to genotype-specific segregation patterns, although absence of appreciable levels of CBGA at UV 272 nm in filial F C3 maxima chemotypes would suggest otherwise (Supplementary Fig. S2). The UV profiles of F C5 plants were also dominated by CBDA and/or THCA and no comparable chromatographic peaks with a UV maximum and retention time consistent with CBGVA were observed. Whilst this would imply that cannabinoid synthases are capable of efficiently catalysing CBGA and CBGVA, it is conceivable that the affinity of alkyl homologs to THCAS and CBDAS is influencing the metabolic flux of oxidative cyclisation end-products, and hence the non-Mendelian inheritance patterns observed in filial chemotypes.
In the CBDAS homozygote (B CBDAS B CBDAS ) F 2 genotypes, the 9:3:3:1 ratio could be represented by A C5 1 and A C5 2 dominant and A c3 1 and A c3 2 recessive alleles, with double recessive genotypes A c3 1 A c3 1 A c3 2 A c3 2 resulting in F C3 maxima chemotypes. Aliphatic glucosinolate side-chain length in Brassica oleracea is also regulated in a similar manner by independent assortment of GSL-PRO and GSL-ELONG 63 . The 7:6:3 ratio identified in THCAS homozygote F 2 genotypes describes a more complex model, with dominance at one gene pair, and partial dominance at a second gene pair 64 . When homozygous recessive (A c5 1 A c5 1 ), the first gene pair is epistatic to the second gene pair 64 . Interestingly, a tripartite F C3 /F C5 alkyl cannabinoid distribution was also identified from cluster analysis of a diversity panel comprised of predominantly tricyclic cannabinoid chemotypes 15 .
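Both ratios discussed above can be recovered by enumerating the 16 gamete combinations of a double-heterozygote cross. In this sketch the letters G/g and H/h are hypothetical placeholders for the two alkyl cannabinoid gene pairs (they are not the paper's allele symbols), and the choice of which second-pair homozygote shares a category with the epistatic first-pair recessive is an assumption made only to illustrate how 7:6:3 arises.

```python
from collections import Counter
from itertools import product

def f2_genotypes():
    """Yield (pair1, pair2) genotypes from a GgHh x GgHh cross (16 combinations)."""
    gametes = list(product("Gg", "Hh"))
    for (g1, h1), (g2, h2) in product(gametes, repeat=2):
        # sorted() puts the upper-case allele first: "GG", "Gg" or "gg".
        yield "".join(sorted(g1 + g2)), "".join(sorted(h1 + h2))

def dominant_class(pair1, pair2):
    """Complete dominance at both independent gene pairs (9:3:3:1)."""
    return ("G_" if "G" in pair1 else "gg") + ("H_" if "H" in pair2 else "hh")

def epistatic_class(pair1, pair2):
    """Dominance at pair 1, partial dominance at pair 2, gg epistatic (7:6:3)."""
    if pair1 == "gg" or pair2 == "hh":
        return "I"    # 4 epistatic recessives + 3 G_hh (assumed merge)
    if pair2 == "Hh":
        return "II"   # 6 G_Hh intermediates
    return "III"      # 3 G_HH

digenic = Counter(dominant_class(*g) for g in f2_genotypes())
epistatic = Counter(epistatic_class(*g) for g in f2_genotypes())
print(sorted(digenic.values(), reverse=True))    # [9, 3, 3, 1]
print([epistatic[c] for c in ("I", "II", "III")])  # [7, 6, 3]
```

The enumeration shows that the same two segregating gene pairs can yield four or three phenotypic categories depending only on how genotypes map to chemotype.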
One speculative scenario to describe the aforementioned epistatic model is that THCAS co-inherited alkyl cannabinoid loci encode sequential interdependent enzymatic steps 65,66 (Fig. 5). De novo short-chain FA synthesis in planta is dependent on a series of enzymatic reactions involving β-ketoacyl-ACP synthase, β-ketoacyl-ACP reductase, β-hydroxyacyl-ACP dehydrase as well as enoyl-ACP reductase 67 , followed by thioesterase hydrolysis to terminate synthesis 68 . The dominant A C3 1 allele at the first gene pair may govern one of four condensing, reductase or dehydrase reactions which contribute towards FA chain length 69 , resulting in increased production of butanoyl-ACP (Fig. 5). A C3 2 at the second gene pair could encode a thioesterase with high catalytic efficiency (kcat) towards butanoyl-ACP, thereby allowing FA plastid exportation of butanoic acid for downstream cytosolic-localised alkylresorcinol synthesis 47,68 (Fig. 5). The A C3 2 modifier would act only on butanoyl-ACP and when homozygous recessive for A c5 1 , FA synthesis would be exclusive to the C 5 alkyl cannabinoid precursor hexanoic acid (Fig. 5).
Previous analysis of six S 1 to S 6 inbred lines segregating for tricyclic C 3 and C 5 alkyl cannabinoids revealed a variety of lineage-specific distribution patterns 31 [...] 100% 'pure' C 3 -alkyl cannabinoid chemotypes as well as from the mutual crossing of lineages increasing C 3 alkyl cannabinoid proportion from 85.5-95.6% 31 . In the present study, digenic inheritance patterns were adequate to explain F C3 values ranging from 0.7 to 88.0% (Table 1, Fig. 4a).
Absence of F C3 transgressive segregation or plants displaying F C3 values > 90% suggests a chemotypic plateau has been reached in the experimental population (Table 1), and that parental genes lack a complementary additive effect on F C3 values 70 . A number of enzymatic reactions occur prior to oxidative cyclisation by THCAS and CBDAS. These involve a series of steps leading to FA formation in addition to acyl activation 47 , two-step polyketide synthesis 38 and aromatic prenylation 33 (Fig. 5), of which a minimum of two catalytic steps were found to be allelic and determinant of chemotype (Table 3, Fig. 4b,d). Analysis of cannabinoid biosynthesis in engineered yeast indicates that acyl activation, polyketide synthesis and aromatic prenylation steps are catalysed by promiscuous enzymes, with recombinant pathway proteins capable of producing a variety of alkyl homologs based on the type of FA starter unit fed 37 . Assuming cannabinoid pathway loci are allelic and encode enzymes with varying levels of promiscuity 37 , gene-flow at these loci may confer an additive or epistatic effect and a polygenic alkyl cannabinoid inheritance model may be correct. However, with consideration of measurement error and environmental deviation 15,71 , the lineage-specific gene effects reported from mutual crossing may only be marginal.
Regardless of the total number of loci contributing to alkyl-cannabinoid composition, inheritance patterns reported here and elsewhere suggest the partitioning of allelic variation among lineages 31 . Inter-lineage genetic heterogeneity has the potential to confound elucidation of the genetic architecture underlying alkyl cannabinoid composition when using forward genetic approaches. Quantitative Trait Locus (QTL) mapping may only capture a subset of inter-lineage allelic diversity and associated epistasis in natural populations 72 , while in Genome Wide Association Studies (GWAS), genetic heterogeneity among lineages reduces the power to detect causal variants 73,74 . In these cases, synthetic and/or ancestral marker loci may be more predictive of phenotype when two or more gene mutations have a comparable phenotypic effect 74 . Given that lineage-specific evolutionary processes are implicated at cannabinoid pathway loci 40 , comparative genomic approaches using representative germplasm may precipitate diagnostically valuable chemotype-associated markers while also potentially delineating candidate alkyl cannabinoid loci for genome engineering 8 .
The analysis of filial chemotypes was targeted towards variation in alkyl side-chain length. Whilst this analysis improved understanding of the heritability of cannabinoid homology, much remains to be examined. In addition to variation in the topological arrangement of the isoprenoid residue 13 , prenylogous versions of cannabinoids have been identified in the form of sesquicannabigerol 75 . This degree of isoprenylation improved pharmacological potency towards CB 2 R and it is possible that other medically relevant cannabinoid prenylogues may exist 75 . Further non-targeted cannabinomic analyses, combined with forward genetic screens, may further elucidate the molecular basis for cannabinoid homology and ultimately expand the number of therapeutics which can be produced in planta.
In conclusion, the inheritance of alkyl cannabinoid composition and associated allelic assortment with THCAS and CBDAS was examined. Digenic segregation patterns observed in cannabinoid synthase genotypes suggest a complex mode of inheritance for alkyl side-chain length involving epistasis, linkage as well as dominant and lineage-specific gene effects. Linking plant secondary metabolites to underlying biosynthetic genes and associated regulatory networks remains challenging and often requires a multifaceted approach 76,77 . Comparative genomic approaches may contribute to understanding of the molecular basis for alkyl cannabinoid composition and shed light on the recruitment and evolution of pathway genes. Advances in understanding of the inheritance and biosynthesis of the alkyl pharmacophore may also allow for metabolic engineering of Cannabis to accelerate development of novel efficacious plant-derived cannabinoid homologs with augmented therapeutic activities.

Methods
Twenty seeds per accession were sown into 400 mL round pots at a depth of 1.5 cm. Each pot contained a growing medium containing one-part vermiculite, one-part perlite and one-part peat moss, as well as dolomite  (Table 1). Plants which exhibited high F C3 and F tricyclic (e.g. THCVA) or high F C5 and F dicyclic (e.g. CBDA) cannabinoid values were selected for crossing (Table 1, Fig. 1). Sex was provisionally phenotyped from visual inspection during the flower primordia developmental stage prior to male anthesis (code 2001) 78 . Plant vigour was also considered during selection. Eight chemotypically extreme male and female plants high in THCVA (EIO.MW15.P) or CBDA (EIO.MW15.T and EIO.MW17.X) served as parents for four F 1 hybrid families, which were generated from two biparental reciprocal crosses (Table 1, Fig. 1). Generation of 210 F 2 progeny was achieved by crossing a single male and female plant from the F 1 hybrid family which exhibited the highest level of F C3 chemotypic homogeneity. Biparental crosses were performed within pollen secure growth chambers. Pollination of female plants was achieved through exposure to male plants during anthesis.

LC-MS chemotyping.
Liquid chromatography-mass spectrometry (LC-MS) cannabinoid profiling and extraction of individual plants followed methodologies described by Welling et al. 15 . At the vegetative stage (fourth leaf pair, code 1008) 78 [...] liquid chromatography (HPLC) grade EtOH and mixed by agitation for 30 min. Extracts were centrifuged to remove particulate matter and 600 μL of the supernatant was transferred into a 2 mL screw cap glass vial.
LC-MS runs were performed using an Agilent 1290 Infinity analytical HPLC instrument (Agilent Technologies, Palo Alto, CA, United States), which comprised a vacuum degasser, autoinjector, binary pump and diode array detector (DAD, 1260), coupled to an Agilent 6120 Single Quadrupole Mass Selective Detector (MSD). Analytical infrastructure was controlled using ChemStation (Agilent) software (Rev. B.04.03 [54]) [...] The mobile phase consisted of a mixture of Milli-Q® water (channel A) and acetonitrile (channel B) containing 0.005% trifluoroacetic acid (TFA). The initial setting was isocratic at 66% B for 8 min, which was linearly increased to 95% B over 4 min. 95% B was maintained for 1 min and then re-equilibrated to 66% B for 2 min. Total run time including an internal needle wash was 16 min. Flow rate was 0.3 mL/min. Column temperature was set to 30 °C. Injection volume was 3 μL. The MSD was run in atmospheric pressure electrospray ionisation mode (AP-ESI). Selected-ion monitoring (SIM) was used for cannabinoid quantification, with abundant and representative signals obtained in positive mode [M + H] + 15 ; drying gas temperature, 350 °C; capillary voltage, 3000 V (positive); vaporiser temperature, 350 °C; drying gas flow, 12 L/min (N2); nebuliser pressure, 35 psi; scan mass range, 100-1200; fragmentor, 150.
Locus B DNA SCAR marker. Plant DNA was extracted using a Qiagen DNeasy® Plant Mini Kit, with tissue disruption achieved using a Qiagen TissueLyser®. DNA purity was assessed using a ThermoScientific™ NanoDrop™ 2000 UV-vis spectrophotometer. An absorbance ratio of ~1.8 at 260/280 nm and symmetric peaks at 260 nm were used to determine DNA quality.
Amplification of CBDAS (B1080) and THCAS (B1190) sequence characterised amplified region (SCAR) fragments was accomplished using a B locus-specific multiplex PCR assay comprising three primers: a primer common to CBDAS and THCAS FW: 5′ AAGAAAGTTGGCTTGCAG 3′ as well as a CBDAS-specific REV: 5′ ATCCAGTTTAGATGCTTTTCGT 3′ and a THCAS-specific REV: 5′ TTAGGACTCGCATGATTAGTTTTTC 3′ primer 60,79 . PCR parameters followed those described by Welling et al. 61 . Reactions were performed in 0.2 mL 96 well PCR plates in a total volume of 50 µL and contained 1.5 mM of MgCl 2 , 0.2 mM of dNTPs, 0.4 µM of the forward primer, 0.2 µM of the THCAS- as well as the CBDAS-specific reverse primers, and 2 U of Life Technologies Platinum® Taq DNA Polymerase. Thermocycling parameters for the DNA template were as follows: 94 °C for 2 min, followed by 25 cycles of 94 °C for 30 s, 58 °C for 30 s, 72 °C for 1 min 15 s. No final extension was required. CBDAS- and THCAS-specific fragments were then separated using electrophoresis with a 1% SeaKem® LE agarose gel stained with GelRed™. Amplicons were visualised under UV illumination with a Bio-Rad Molecular Imager® Gel Doc™ XR + system using Image Lab™ software.
Statistical analysis. CBDVA, THCVA, CBDA, THCA, CBDV, THCV, CBD and THC fresh weight (w/w) content was determined per plant. Relative proportions of these cannabinoids were used to generate C 3 alkyl (F C3 ), C 5 alkyl (F C5 ), dicyclic (F dicyclic ) and tricyclic (F tricyclic ) cannabinoid fractions within the total cannabinoid fraction. To minimise post-harvest alteration of cannabinoid composition, decarboxylated cannabinoids CBDV, THCV, CBD and THC were expressed as carboxylated acid (COOH) cannabinoids using formulae which compensate for changes in molecular weight 15 . Repeatability between LC-MS replicate extractions was calculated using the coefficient of determination (r 2 ).
Strong correlations between duplicate extraction replicates were found for the F C3 / F C5 (r 2 > 0.99) as well as for the F dicyclic /F tricyclic (r 2 > 0.99) values. Mean extraction replicate values were therefore used for statistical analysis.
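The fraction calculation described above can be sketched as follows. The exact formulae are given in ref. 15 and are not reproduced in this section, so the conversion here is an assumption: each decarboxylated cannabinoid is scaled by the ratio of acid to neutral molar mass. Function names and the sample values are hypothetical; molar masses are computed from the molecular formulae.

```python
# Molar masses in g/mol (from molecular formulae).
ACID_MW = {"CBDVA": 330.42, "THCVA": 330.42, "CBDA": 358.48, "THCA": 358.48}
NEUTRAL_MW = {"CBDV": 286.41, "THCV": 286.41, "CBD": 314.47, "THC": 314.47}
NEUTRAL_OF = {"CBDVA": "CBDV", "THCVA": "THCV", "CBDA": "CBD", "THCA": "THC"}

def acid_equivalents(content):
    """Express each neutral cannabinoid as its acid form and pool with the acid."""
    pooled = {}
    for acid, neutral in NEUTRAL_OF.items():
        factor = ACID_MW[acid] / NEUTRAL_MW[neutral]  # assumed mass correction
        pooled[acid] = content.get(acid, 0.0) + content.get(neutral, 0.0) * factor
    return pooled

def cannabinoid_fractions(content):
    """Return alkyl and cyclic fractions of the pooled cannabinoid content."""
    eq = acid_equivalents(content)
    total = sum(eq.values())
    return {
        "F_C3": (eq["CBDVA"] + eq["THCVA"]) / total,
        "F_C5": (eq["CBDA"] + eq["THCA"]) / total,
        "F_dicyclic": (eq["CBDVA"] + eq["CBDA"]) / total,
        "F_tricyclic": (eq["THCVA"] + eq["THCA"]) / total,
    }

# Hypothetical fresh-weight contents (% w/w) for one plant, not study data.
sample = {"THCVA": 3.0, "CBDA": 1.0}
print(cannabinoid_fractions(sample))
```

Because every plant is reduced to two complementary pairs of fractions, F C3 and F C5 sum to one, as do F dicyclic and F tricyclic, so a single value from each pair is sufficient for segregation analysis.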
Alkyl cannabinoid data from the F 2 generation was visualised in a graphical format used previously 31 (Supplementary Fig. S3). Analysis of F 2 chemotypic distribution patterns revealed stepwise increases in F C3 values, although accurate demarcation of data points was not possible (Supplementary Fig. S3). Histograms were then developed to establish frequency distributions for categorisation (Supplementary Fig. S3). However, the continuity of chemotype prevented formation of obvious break points in the data (Supplementary Fig. S3). The arbitrary selection of bins was also deemed inappropriate for determining distributions due to the potential for incorrect assignment of genotype (classification error). To address these issues, kernel density was used to estimate the unknown underlying distributions within the data. This constructed an estimate of the density function from observations within the data 43 , generating a fitted solid line over the F C3 value data points (Supplementary Fig. S3). The area under kernel density estimates (KDE) was then used to demarcate F C3 values and to objectively categorise plants (Supplementary Fig. S3), circumventing arbitrary categorisation and the artificial grouping of F C3 values.
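As an illustration of how a kernel density estimate can replace arbitrary histogram bins, the sketch below fits a Gaussian KDE and places category boundaries at local minima of the fitted density. This is a pure-Python stand-in rather than the GenStat procedure used in the study; the fixed bandwidth, the use of density minima as cut points, the function names and the F C3 values are all assumptions for illustration.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a density function built from Gaussian kernels centred on values."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return density

def density_minima(density, lo, hi, steps=1000):
    """Locate local minima of the density on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [density(x) for x in xs]
    return [xs[i] for i in range(1, steps)
            if ys[i] < ys[i - 1] and ys[i] <= ys[i + 1]]

def categorise(values, cut_points):
    """Assign each value a 1-based category according to the cut points it exceeds."""
    return [sum(v > c for c in cut_points) + 1 for v in values]

# Hypothetical F_C3 values (%), not study data; bandwidth chosen by hand.
f_c3 = [5, 6, 7, 8, 9, 40, 42, 44, 45, 47]
kde = gaussian_kde(f_c3, bandwidth=4.0)
cuts = density_minima(kde, min(f_c3), max(f_c3))
print(cuts)
print(categorise(f_c3, cuts))
```

The bandwidth controls how many modes survive smoothing: too small a value splits genuine categories, too large a value merges them, so the choice of bandwidth should be reported alongside any category counts.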
GenStat 64-bit Release 18.1 (VSN International Ltd.) software was used to calculate Bartlett's test for homogeneity of variances, KDE and Pearson's chi-squared (χ 2 ) goodness-of-fit. For KDE, automatic estimation of