Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits

It has been hypothesized that individually rare, hidden structural variants (SVs) could account for a significant fraction of variation in complex traits. Here we identified more than 20,000 euchromatic SVs from 14 Drosophila melanogaster genome assemblies, of which ~40% are invisible to high-specificity short-read genotyping approaches. SVs are common, with 31.5% of diploid individuals harboring an SV in genes larger than 5 kb, and 24% harboring multiple SVs in genes larger than 10 kb. SVs segregate at lower minor allele frequencies than amino acid polymorphisms, suggesting that SVs are more deleterious. We show that a number of functionally important genes harbor previously hidden structural variants likely to affect complex phenotypes. Furthermore, SVs are overrepresented in candidate genes associated with quantitative trait loci mapped using the Drosophila Synthetic Population Resource. We conclude that SVs are ubiquitous, frequently constitute a heterogeneous allelic series, and can act as rare alleles of large effect.
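As a rough illustration of how haplotype-level SV calls translate into a statement about diploid individuals, the sketch below computes the chance that a diploid carries at least one SV-bearing copy of a gene. It assumes random pairing of haplotypes drawn with replacement from the sampled assemblies. The function names and example counts are hypothetical, and this is not necessarily the procedure used to obtain the percentages reported above.

```python
from typing import Sequence


def frac_diploids_with_sv(k: int, n: int) -> float:
    """Chance that a diploid carries at least one SV in a gene.

    k: number of sampled haplotypes with an SV in the gene (hypothetical input)
    n: total number of sampled haplotypes
    Assumes the two haplotypes are drawn independently, with replacement.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    # Complement of "neither haplotype carries an SV".
    return 1.0 - (1.0 - p) ** 2


def mean_over_genes(sv_counts: Sequence[int], n: int) -> float:
    """Average the per-gene chance over a set of genes."""
    if not sv_counts:
        raise ValueError("sv_counts must not be empty")
    return sum(frac_diploids_with_sv(k, n) for k in sv_counts) / len(sv_counts)


if __name__ == "__main__":
    # Hypothetical per-gene counts of SV-bearing haplotypes among 14 assemblies.
    print(mean_over_genes([0, 1, 3, 7], 14))
```

Averaging per-gene values, as done here, gives the expected fraction for a randomly chosen gene. Restricting the input to genes above a length cutoff would mirror the size classes mentioned in the abstract.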


Supplementary Figure 1. Alignment of the founder genomes and the reference genome to the A4 (top) reference sequence to show the 2R euchromatic gap 1. The gap (shaded region) in the 2R assembly of ISO1 is spanned in all of the sequenced strains described here. The gap falls within a repetitive region and harbors SVs in several founder strains. The alignment shown here corresponds to the genomic region in ISO1 marked by dotted lines.
Supplementary Figure 2. The 2R euchromatic gap in ISO1 and 50 kb of sequence flanking it are missing in B4. The 58 kb deleted sequence harbors several functional, but presumably non-essential, genes. The deleted sequence is replaced by a ~2 kb Hobo TE fragment in B4.
Supplementary Figure 3. Alignment of de novo founder genome assemblies to the ISO1 reference genome showing the absence of the ISO1 euchromatic gap on 3L in the other assemblies. The ISO1 gap is due to duplication of a TE that is private to the ISO1 strain 1. The alignment gap is due to the absence of the TEs (pink lines) in the newly sequenced strains.
Supplementary Figure 5. Alignment dot plot of the ISO1 genomic region (3L:7667000-7696500) harboring mis-annotated SVs in A1 and A7. The dots represent alignment starts and ends, whereas a line connecting two dots represents an unbroken alignment between the corresponding sequences on the X and Y axes. a) Dot plot of ISO1 against itself; b) dot plot of the ISO1 sequence against its corresponding region in A1 (A1.3L:7601446-7636670); c) dot plot of the ISO1 sequence against its corresponding region in A7 (A7.3L:7735063-7774259). The mis-annotations designate the mutations in A7 and A1 as a tandem array size increase and non-TE insertions. As evidenced here, both A7 and A1 possess more sequence than their counterpart in ISO1. Thus, the mis-annotations identify the mutations correctly, but the inferred insertion coordinates are off by a few hundred bases.
Cyp6g1. An Accord LTR fragment 3,4 is also inserted upstream of the Cyp6g1 copy. The AB8 allele has Roo insertions in the last exons of the single-copy Cyp6g1 and Cyp6g2, presumably disrupting these two genes. B4 contains a 5 kb Accord and a 6.7 kb Gypsy insertion at the same position where A6 has the full-length Accord insertion. B4 also possesses an Accord LTR at the same position as A6. b) Cyp6g1 expression levels in female heads for the A6 and AB8 (A8 and B8) genotypes are among the highest and lowest Cyp6g1 expression levels in the RILs. This is consistent with their SV genotypes.
Supplementary Figure 8. a) Duplication alleles of Drsl5 in A3 and B2. The spacer sequence is derived from the first exon and intron of the gene Kst and likely harbors enhancer sequences 5,6. The spacer sequence in B2 also contains a 5 kb Tirant LTR retrotransposon. b) Drsl5 expression is very high in A3 but nearly absent in A4, which possesses only one Drsl5 copy.
Supplementary Figure 9. Assembly alignment of DSPR founders to the reference strain ISO1 showing the insertion of a 412 retrotransposon in the gene Dat that causes the visible mutation speck (sp1) in ISO1. The de novo assemblies described here enable discovery of such mutations simply, either by looking at the UCSC browser representation of the multiple genome alignment as displayed here 7 (gap corresponding to the pink bar), or by searching through the VCF file (displayed as a red bar in the SVMU track).
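The VCF search mentioned in the caption for Supplementary Figure 9 can be sketched as a simple interval query. This is a minimal illustration, not the authors' tooling: it assumes a tab-delimited VCF 4.x file in which SV records carry the standard `SVTYPE` and `END` keys in the INFO column, and the query coordinates in the usage example are placeholders, not the actual coordinates of Dat.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class SVRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a dict; flag-only keys map to ''."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item:
            out[item] = ""
    return out


def svs_overlapping(lines: Iterable[str], chrom: str,
                    start: int, end: int) -> Iterator[SVRecord]:
    """Yield SV records that overlap the closed interval [start, end]."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        if "SVTYPE" not in info:
            continue  # assumed marker of an SV record; SNPs lack it
        pos = int(fields[1])
        # Records without END (e.g. some insertions) are treated as points.
        sv_end = int(info["END"]) if info.get("END") else pos
        if fields[0] == chrom and pos <= end and sv_end >= start:
            yield SVRecord(fields[0], pos, sv_end, info["SVTYPE"])


if __name__ == "__main__":
    # Hypothetical records and query window, for illustration only.
    demo = [
        "##fileformat=VCFv4.2",
        "2R\t1000\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS;END=1000",
        "2R\t5000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=6800",
    ]
    for rec in svs_overlapping(demo, "2R", 900, 1600):
        print(rec)
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.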
Supplementary Figure 10. Multiple genome alignment of the de novo assemblies, viewed through the UCSC genome browser, reveals deletion of a 1.8 kb segment of the third exon of the Cinnabar gene in the reference strain ISO1 and in A4. This deletion underlies the classical mutation cn1 8 and causes bright red eye color. The assemblies provide a convenient means of identifying the molecular nature of the A4 eye color mutation.
Supplementary Figure 11. Different SV alleles at Cyp28d genes. The POGON1 element in A7 is inserted within an intron of Cyp28d2, whereas the FW element in A1 is inserted within an exon of the second Cyp28d1 copy. The second sequence copy of A1 and B4 is missing part of the first exon of Cyp28d2. As evidenced here, B1 consists of three copies of the same 15 kb segment that is duplicated in A2, along with a 7.5 kb Gypsy insertion in the second copy. Although nicotine resistance data for RILs carrying the B1 allele are not available 9, the similarity of the genomic region copied in B1 and A2 suggests that RILs with the B1 genotype at this locus could be as resistant to nicotine as RILs with the A2 genotype.