Correlations between long inverted repeat (LIR) features, deletion size and distance from breakpoint in human gross gene deletions

Long inverted repeats (LIRs) have been shown to induce genomic deletions in yeast. In this study, LIRs were investigated within ±10 kb spanning each breakpoint from 109 human gross deletions, using Inverted Repeat Finder (IRF) software. LIR number was significantly higher at the breakpoint regions than in control segments (P < 0.001). In addition, a strong correlation was found between 5′ and 3′ LIR numbers (r = 0.85, P < 0.001), suggesting a contribution to DNA sequence evolution. Features of 138 LIRs within ±3 kb of breakpoints in 89 (81%) of the 109 gross deletions were evaluated. Significant correlations were found between distance from breakpoint and loop length (r = −0.18, P < 0.05) and stem length (r = −0.18, P < 0.05), suggesting that DNA strands may break at locations closer to larger LIRs. In addition, larger loops were associated with larger deletions (r = 0.19, P < 0.05). Moreover, loop length (r = 0.29, P < 0.02) and identity between stem copies (r = 0.30, P < 0.05) of 3′ LIRs were more important in larger deletions. Consequently, DNA breaks may form via LIR-induced cruciform structures during replication. The DNA ends may later be repaired by non-homologous end-joining (NHEJ), resulting in deletion.

Long inverted repeats (LIRs) are imperfect or near-perfect repetitive DNA sequence elements that can form secondary stem-loop structures in prokaryotic and eukaryotic genomes [1][2][3]. LIRs may form stem loops by pairing of complementary repeats placed in inverted orientation, giving rise to hairpins in single-stranded DNA or cruciforms in double-stranded DNA 4,5. LIRs have been found to be involved in deletion and recombination events in the yeast Saccharomyces cerevisiae 6,7.
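The stem-loop geometry described above can be illustrated with a short Python sketch. It is a minimal illustration, not the detection method used in the study: the function names and the toy sequences are hypothetical, and the two arms are assumed to be of equal length and ungapped, which real LIRs need not be.

```python
# Minimal sketch: an inverted repeat pairs when one arm equals the
# reverse complement of the other. Names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_identity(left_arm: str, right_arm: str) -> float:
    """Percent of positions at which the two arms can base-pair.

    Assumes equal-length, ungapped arms (a simplification).
    """
    if not left_arm or len(left_arm) != len(right_arm):
        raise ValueError("arms must be non-empty and of equal length")
    target = reverse_complement(right_arm)
    matches = sum(a == b for a, b in zip(left_arm, target))
    return 100.0 * matches / len(left_arm)


# Toy example: left arm, internal spacer (loop), right arm.
left, spacer, right = "GATTACAGGC", "TTTT", "GCCTGTAATC"
print(left + spacer + right, stem_identity(left, right))
```

In this toy case the right arm is the exact reverse complement of the left arm, so the two arms can pair along their full length and the spacer would form the loop.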
Gross gene deletions are genomic rearrangements that can be observed in many types of human cancers and inherited diseases [8][9][10][11][12][13]. Deletion and duplication mutations can vary in size from thousands to hundreds of thousands of base pairs in the human genome 14. Three major mechanisms have been proposed to be responsible for genomic rearrangements, including human genome deletions 15: non-allelic homologous recombination (NAHR), non-homologous end-joining (NHEJ), and fork stalling and template switching (FoSTeS). Some genomic rearrangements are recurrent, with a common size and fixed breakpoints between low copy repeats (LCRs). Recurrent rearrangements are mostly mediated by NAHR between two LCRs 16. Conversely, non-recurrent rearrangements have different sizes and distinct breakpoints in each event, and are attributed to the NHEJ and FoSTeS models 15. Gu et al. suggested that the FoSTeS model is a replication-based rearrangement pathway that may operate over long distances (from 120 to 550 kb) through template switching 15. Alternatively, it has been proposed that palindrome or cruciform structures may stimulate the FoSTeS model 15.
Breakpoints of gross gene deletions coincide with non-B DNA conformations, including hairpin/cruciform structures 17. Hairpins are reported to form by direct repeats 18. Direct repeats range from 2 to 8 bp, and are associated with small deletion breakpoints in human genetic diseases 19. Moreover, retinoblastoma gene deletion involves direct repeats within the deletion breakpoints 20.
Short direct repeats were also detected in 15 proximal breakpoints of the dystrophin gene, which has large deletions 21. Short IRs and IR inversions were found in 83% of deletions plus small insertions, while short direct repeats were detected only in simple deletion breakpoints 22.
Two highly homologous Alu repeats in inverted orientations were found in the vicinity of gross deletion breakpoints in the von Willebrand factor (VWF) gene 23. Furthermore, LINEs, LTR repetitive elements, and SINEs (including Alus) were enriched at breakpoints of rare pathogenic microdeletions 24. Vissers et al. also suggested that microhomology levels of breakpoint junctions play an important role in replication-based mechanisms, such as FoSTeS and microhomology-mediated break-induced replication (MMBIR) 24. Zhang et al. also suggested that replication fork stalling initiates FoSTeS 25. Gordenin et al. showed that LIRs cause deletion in Saccharomyces cerevisiae 6. Lobachev et al. suggested that they form stem-like secondary structures on single-stranded DNA during replication, thereby causing deletions 26. Warburton et al. found that some IRs are capable of transforming into cruciform structures, with intrastrand double helices, termed stems, and unpaired loops forming internal spacers 1. The four-way junction of this suggested IR pattern is similar to the Holliday structure. Eichman et al. showed formation of the Holliday junction in synthetic IR DNA using X-ray crystallography 27. From this work, it was proposed that IRs may be involved in homologous recombination. Bacolla and Wells indicated that IRs may form cruciform structures, and are often found at genomic rearrangement breakpoints 18.
Genomes of many complex organisms have been investigated for larger IRs. It was determined that higher eukaryotic genomes include many imperfect and near-perfect LIRs [28][29][30][31]. In mice, a perfect LIR was shown to create a large deletion 32. Subsequently, criteria were proposed for LIRs that are likely to promote recombination in genomic rearrangements. In this regard, Wang and Leung reported that LIRs with stem length >30 bp, identity between stem copies (hereafter stem identity) >85% and internal spacer of <2 kb are recombinogenic in genomes of humans and some other organisms 2. Voineagu et al. demonstrated that Alu IRs with 100% sequence homology of stem copies trigger strong replication blockage 3. However, Alu IRs with 75% stem identity between repetitive halves caused mild replication blockage in E. coli cells 3.
Potential models referred to as replication slippage and hairpin nicking were proposed by Akgün et al. to explain the mechanism underlying LIR-induced deletions 4. These models explain many deletions formed inside palindrome stems or loops. However, alternative models are required to clarify the mechanisms of larger deletions formed in close proximity to palindromes. To understand how gross gene deletions occur in human cancers and inherited diseases, the present study investigated the significance of LIRs in breakpoint regions of human gross gene deletions.

Results
Identification of long inverted repeats in breakpoint regions of gross gene deletions. Sequences from 218 breakpoint regions of gross deletions in 63 genes were taken from references 33-89 (see Supplementary Table S1 online) listed in the HGMD 90,91 and GRaBD 92,93 (Figure 1a). LIRs with stem length >20 bp in the regions surrounding (±10 kb) each breakpoint were investigated using IRF 94,95 (Figure 1b). In total, 218 genomic regions, comprising the 5′ and 3′ breakpoints from 109 gross deletions involving 63 different genes (Table 1), were analysed. The total number of LIRs was determined within the ±10 kb regions flanking each breakpoint. In the deletion group, a total of 2723 LIRs were detected (see Supplementary Table S2 online). A total of 1345 LIRs were also identified in 20 kb segments from 220 control sequences (see Supplementary Table S2 online).
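As a rough check on these totals, the counts can be converted into an average number of LIRs per 20 kb region. The Python sketch below performs only this arithmetic; the variable names are illustrative and not taken from the study, and these averages are not the statistic that was tested (the comparison in the study used mean ranks).

```python
# Totals reported in the text; variable names are illustrative only.
deletion_lirs, deletion_regions = 2723, 218   # 20 kb windows around breakpoints
control_lirs, control_regions = 1345, 220     # 20 kb control segments

mean_deletion = deletion_lirs / deletion_regions
mean_control = control_lirs / control_regions

print(f"LIRs per breakpoint region: {mean_deletion:.2f}")
print(f"LIRs per control segment:   {mean_control:.2f}")
print(f"ratio: {mean_deletion / mean_control:.2f}")
```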
Mean ranks of LIR numbers were compared between gross deletion breakpoints and control sequences using the Mann-Whitney U test. The mean LIR number was significantly higher at the breakpoint regions of gross gene deletions than in the control group (P < 0.001).
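For readers unfamiliar with the test, the following Python sketch shows how mean ranks and the U statistic are obtained from two groups of counts. It is a minimal illustration under stated assumptions: the per-region counts are hypothetical, the function names are not from the study, and no P value is computed (the study used SPSS for that).

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b, mean_rank_a, mean_rank_b) for two samples."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b, rank_sum_a / n_a, rank_sum_b / n_b


# Hypothetical LIR counts per 20 kb region (not data from the study).
breakpoint_counts = [14, 11, 16, 12]
control_counts = [6, 7, 5, 9]
print(mann_whitney_u(breakpoint_counts, control_counts))
```

A higher mean rank for the breakpoint group indicates that its regions tend to carry more LIRs than the control segments; whether the difference is significant depends on the P value, which this sketch does not derive.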
In addition, associations between 5′ and 3′ LIR numbers within the ±10 kb regions flanking each breakpoint were determined using Pearson's correlation coefficient. A strong, significant positive association was found between LIR numbers at the 5′ and 3′ breakpoints in the 109 gross deletions (r = 0.85, P < 0.001).
Additionally, Spearman's correlation showed moderate, significant negative associations between deletion size and 5′ LIR number (rs = −0.30, P < 0.003) and 3′ LIR number (rs = −0.30, P < 0.002) in the 109 gross deletions.
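The two coefficients used in this section can be sketched in a few lines of Python. This is an illustration only: the function names and the example values are hypothetical, P values are not computed, and the study itself obtained its coefficients with SPSS. Spearman's coefficient is computed here as Pearson's coefficient applied to tie-averaged ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values (not data from the study).
five_prime_counts = [10, 14, 9, 20, 16]
three_prime_counts = [11, 13, 8, 19, 18]
print(pearson_r(five_prime_counts, three_prime_counts))
print(spearman_rs(five_prime_counts, three_prime_counts))
```

Because Spearman's coefficient depends only on rank order, it is less sensitive than Pearson's to a few very large deletion sizes, which may be why it was applied to the deletion size comparison.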
Features of LIRs selected within ±3 kb genomic regions flanking 5′ and 3′ deletion breakpoints. Next, LIRs were selected using appropriate criteria (outlined in Methods) (Figure 1c). Properties of these selected LIRs from 5′ and 3′ breakpoints were analysed. In total, 138 LIRs at a distance of 0-3 kb from breakpoints, with stem length >20 bp, internal spacer of 0-2.5 kb, and stem identity >70%, were detected (see Supplementary Table S3 online). Associations between features of these LIRs were examined using Pearson's correlation coefficient. Low to moderate significant correlations were found between certain LIR features (e.g. stem length and identity, internal spacer length and distance from breakpoint). In all 138 LIRs located in the regions including 5′ and 3′ breakpoints, negative correlations were found between stem length and stem identity (r = −0.49, P < 0.001), internal spacer length and distance from breakpoint (r = −0.18, P < 0.05), stem length and distance from breakpoint (r = −0.18, P < 0.05), and internal spacer length and stem identity (r = −0.17, P < 0.05). Conversely, a moderate positive correlation was found between internal spacer length and stem length (r = 0.27, P < 0.002). No correlation was found between stem identity and distance from breakpoint (r = −0.008, P > 0.1).
Moreover, associations between gross gene deletion size and features of the 138 LIRs were analysed by Pearson's correlation coefficient. A significant positive correlation was found between internal spacer length and deletion size (r = 0.19, P < 0.05). However, no correlations were found between deletion size and the three other LIR features, specifically stem length (r = 0.01, P > 0.1), stem identity (r = −0.06, P > 0.1), and distance from breakpoint (r = 0.08, P > 0.1).
In addition, 5′ and 3′ LIR features from the 89 gross deletions were re-examined individually. Associations between the properties of the 70 and 68 LIRs located at 5′ and 3′ breakpoints, respectively, and deletion size were analysed by Pearson's coefficient. Moderate to strong negative correlations were found between internal spacer length and stem identity (r = −0.28, P < 0.02), and stem length and stem identity (r = −0.57, P < 0.001) for LIRs within 5′ breakpoints.
www.nature.com/scientificreports SCIENTIFIC REPORTS | 5 : 8300 | DOI: 10.1038/srep08300
A moderate positive correlation was found between internal spacer length and stem length (r = 0.35, P < 0.004) for LIRs within 3′ breakpoints. In addition, moderate negative correlations were found between stem identity and stem length (r = −0.40, P < 0.002), and distance from breakpoint and stem length (r = −0.31, P < 0.02).
Furthermore, a moderate, significant positive correlation was found between internal spacer length of 3′ LIRs and deletion size (r = 0.29, P < 0.02). However, no correlation was found between internal spacer length of 5′ LIRs and deletion size (r = −0.16, P > 0.1). [...] and internal spacer lengths of 0-2,435 bp (see Supplementary Table S3 and Figure S6 online). Features of these 98 LIRs were analysed using Pearson's correlation coefficient. Low to moderate significant correlations were found between certain LIR features, including stem length, stem identity, loop length and distance from breakpoint. A positive correlation was found between internal spacer length and stem length (r = 0.23, P < 0.05). Moderate negative correlations were found between stem identity and stem length (r = −0.39, P < 0.001), and distance from breakpoint and stem length (r = −0.31, P < 0.003). However, no correlations were found between internal spacer length and stem identity (r = −0.08, P > 0.1), internal spacer length and distance from breakpoint (r = −0.13, P > 0.1), and stem identity and distance from breakpoint (r = 0.06, P > 0.1).
Furthermore, the 5′ and 3′ breakpoint regions of these 98 LIRs were examined individually. Associations between LIR features from 5′ and 3′ breakpoint locations and gross gene deletion size in 49 gross deletions were analysed by Pearson's correlation method. A moderate negative correlation was found between stem length and distance from breakpoint for LIRs in 5′ breakpoint regions (r = −0.30, P < 0.05). A moderate negative correlation was also found between stem length and distance from breakpoint for LIRs in 3′ breakpoint regions (r = −0.33, P < 0.05). A strong negative correlation was found between stem length and stem identity for 3′ LIRs (r = −0.51, P < 0.001). A moderate positive correlation was found between stem length and internal spacer length for 3′ LIRs (r = 0.36, P < 0.02).
In addition, the relationship between 5′ and 3′ LIRs was analysed. A moderate positive correlation was found between distance from breakpoint for 5′ LIRs and stem identity of 3′ LIRs, involving 49 gross gene deletion regions (r = 0.28, P < 0.05).
Associations between deletion size and 5′ and 3′ LIR features from these 49 gross deletions were also analysed by Pearson's correlation method. A moderate negative correlation was found between stem identity of 5′ LIRs and deletion size (r = −0.40, P < 0.005). A moderate positive correlation was found between stem identity of 3′ LIRs and deletion size (r = 0.30, P < 0.05). However, no correlations were found between deletion size and loop length (5′: [...] Supplementary Table S3 online).
Associations between LIR features were analysed by Pearson's correlation coefficient. Moderate to strong significant correlations were found between LIR features, including stem length and stem identity, and internal spacer length and distance from breakpoint. In 40 LIRs, a moderate positive correlation was found between internal spacer length and stem length (r = 0.34, P < 0.05). In addition, negative correlations were found between internal spacer length and stem identity (r = −0.33, P < 0.05), and stem length and stem identity (r = −0.61, P < 0.001). However, no correlations were found between distance from breakpoint and internal spacer length (r = −0.25, P > 0.1), stem length (r = −0.02, P > 0.1), or stem identity (r = −0.12, P > 0.1).
Deletion size and LIR features were also analysed by Pearson's correlation method. A moderate positive correlation was found between internal spacer length of LIRs and deletion size (r = 0.35, P < 0.05). However, no correlations were found between deletion [...] In 24 of the 40 gross deletions, new LIRs between 5- and 10-kb genomic segments from 5′ and 3′ breakpoints containing LIR or no LIR, respectively, were found (see Supplementary Table S4 online; Figure 4). From these 24 gross deletions, LIR stem identities and lengths were determined to be 70.19-86.66% and 173-1789 bp, respectively (Figure 5). In addition, these LIRs were located at a distance of 642-9,330 bp from breakpoints (Figure 5).
Features of these 24 LIRs were analysed by Spearman's correlation method. A strong, significant negative correlation was found between stem length and stem identity (rs = −0.51, P < 0.02). No correlations were found between distance from breakpoint and stem length (rs = −0.08, P > 0.1) or stem identity (rs = −0.08, P > 0.1).

Discussion
Deletion breakpoints are often associated with Alu and non-B DNA-forming elements such as short direct and inverted repeats, and inversions of inverted repeats in human genomic rearrangements 17,[19][20][21][22][23]. In this study, LIRs within ±10 kb regions flanking 218 breakpoint sequences from gross gene deletions in human cancers and inherited diseases were investigated using IRF 94,95 software. IRF uses an algorithm presented by Benson 95 and can efficiently detect two or more contiguous approximate inverted repeats in sizes up to 700 kb at the same location on DNA sequences, without the need to specify either the pattern or the pattern size. IRF therefore allowed the present study to analyse the relationship between LIR numbers and breakpoint regions in human gross gene deletions.
This work showed that the mean LIR number was significantly higher at the breakpoint regions of gross gene deletions than in the control group (P < 0.001). In addition, a strong, significant positive correlation was found between 5′ and 3′ LIR numbers from breakpoint regions (r = 0.85, P < 0.001). In this regard, increasing LIR numbers may cause or induce chromosomal rearrangements (including duplication, recombination and/or deletion) in the human genome during evolution.
Furthermore, moderate, significant negative associations were found between deletion size and 5′ and 3′ LIR numbers (rs = −0.30, P < 0.003; rs = −0.30, P < 0.002, respectively) in the 109 gross deletions. This result indicates that higher 5′ or 3′ LIR numbers at the breakpoints are associated with smaller deletions. An excessive LIR density may reduce the efficiency, strength and kinetic properties of individual inverted repeats because the LIRs compete with each other. Consequently, these findings suggest that DNA sequence evolution in the human genome may also be driven by LIRs.
In Saccharomyces cerevisiae, Saini et al. reported that IRs induce mutagenesis by break formation at distant sites (up to 8 kb) 96 . Similarly, Lobachev et al. suggested that LIRs may stimulate recombination and deletion by forming secondary structures on the single strand DNA during replication 26 . In addition, Bacolla and Wells indicated that repetitive DNA motifs may fold into non-B DNA structures including cruciforms/hairpins, leading to genomic rearrangements associated with neurodegenerative and genomic disorders 18 .
In the 138 LIRs identified in 89 gross deletions, significant associations were found between internal spacer length and distance from breakpoint (r = −0.18, P < 0.05), and between stem length and distance from breakpoint (r = −0.18, P < 0.05). These associations suggest that DNA strands may break at locations close to larger LIRs. Similarly, Lobachev et al. reported that stimulation of deletions was positively correlated with IR size 26. In addition, Lim et al. reported that IRs ≥ 800 bp are required for gene deletion effectiveness in Saccharomyces cerevisiae, showing IRs improve gene deletion efficiency up to 1.2 kb 97.
In addition, a significant positive correlation between internal spacer length and deletion size was found in the 138 LIRs (r = 0.19, P < 0.05), suggesting that LIRs with bigger loops cause larger deletions at fragile DNA sites. Weiss and Wilson reported that loops with 25-247 nucleotides (nt) were efficiently and accurately repaired during homologous recombination 98. It was suggested that bigger loops (>247 nt) cannot be accurately repaired and excised in homologous recombination, and therefore cells with these loops may be subject to either apoptosis or NHEJ. If cells cannot induce apoptosis, it was suggested that LIRs > 247 nt may break DNA, and be repaired by NHEJ.
In conclusion, larger deletions may form more efficiently through LIRs with larger loops at 5′ or 3′ breakpoints in human cancers and inherited diseases. The DNA end may gain further kinetic properties and pair with a distant breakpoint site (Figure 6).
Moreover, a correlation between distance from breakpoint and stem length (r = −0.31, P < 0.02) was observed in 3′ LIRs from the 89 gross deletions. These data suggest that the DNA strand may break at locations closer to 3′ LIRs with larger stem lengths. In addition, a moderate, significant positive correlation was found between deletion size and internal spacer length of 3′ LIRs (r = 0.29, P < 0.02), with no correlation for the internal spacer length of 5′ LIRs (r = −0.16, P > 0.1). These results suggest that 3′ LIRs with bigger loops are more important than 5′ LIRs for larger gross deletions in the human genome.
Similarly, associations between deletion size and stem identities of 5′ (r = −0.40, P < 0.005) and 3′ (r = 0.30, P < 0.05) LIRs were found in the 49 gross deletions that included an LIR at both the 5′ and 3′ breakpoints. These data suggest that 3′ LIRs with greater stem identities cause larger deletions, while similar 5′ LIRs cause smaller deletions. Furthermore, an association between distance from breakpoint of 5′ LIRs and stem identity of 3′ LIRs (r = 0.28, P < 0.05) was also found, suggesting that 3′ LIRs with greater stem identities are more likely to induce DNA breakage than 5′ LIRs.
Consequently, LIRs may induce DNA breakage at nearby locations by forming cruciform structures. Free DNA ends from distant sites may be joined by NHEJ, followed by gene deletion (Figure 7). Similarly, Varga and Aplan reported that DNA breaks produced various deletions exhibiting NHEJ features in the human monocytic cell line U937 99. They showed that aberrant double-strand break repair by NHEJ may lead to gross chromosomal rearrangements including interstitial deletions and large insertions.
In the 40 gross deletions containing a 5′ or 3′ LIR, a moderate positive correlation between internal spacer length and deletion size (r = 0.35, P < 0.05) was found, similar to the group that included 138 LIRs. In addition, in 24 of the 40 gross deletions, new LIRs between distant free ends containing LIR and no LIR were detected (Figures 4 and 5). These results suggest that LIRs with bigger loops cause larger deletions in the human genome, and that larger loops may give rise to greater stress and transition activity on the DNA strand during replication. Moreover, it was reported that bigger inverted repeats can dominate strand separation and B-Z transition, with Zhabinskaya and Benham showing that long IRs located at clinically important chromosomal breakpoints corresponded closely with translocation frequencies, probably through cruciform extrusion 100.
In conclusion, these results suggest that an LIR found at a 5′ or 3′ breakpoint may break the DNA strand via a cruciform structure and pair with homologous sequences at the other breakpoint site, resulting in a back-folded stem-loop structure during replication (Figure 6). In this way, DNA breakage may also occur at the other breakpoint location, which contains no LIR. After double-strand breaks are formed at the 5′ and 3′ breakpoints, DNA ends from distant sites may be joined by NHEJ, followed by gene deletion.
As presented in Fig. 6, this model is supported by a study carried out in Saccharomyces cerevisiae 101. In that study, IRs with internal spacer of 21 kb were placed into a Saccharomyces cerevisiae chromosome. After a double-strand break was induced, large dicentric inverted dimers were observed, leading to gross chromosomal rearrangements during anaphase. In addition, it has been suggested that p53-binding protein 1 (53BP1) brings together free DNA ends from distant sites for NHEJ 102.
A set of criteria for recombinogenic LIRs in human and other genomes has been suggested: internal spacer <2 kb, stem copy identity >85% and stem length >30 bp 2. In the present study, only 35 (25.36%) of the 138 LIRs located close to the 5′ and 3′ breakpoints from 89 gross deletions met these criteria (see Supplementary Table S3 online). However, the present findings indicate a significant relationship between LIR numbers and breakpoint regions of gross gene deletions. There is also a strong positive correlation between 5′ and 3′ LIR numbers in breakpoint regions. On the other hand, 5′ and 3′ LIRs may have opposite effects on deletion size, and an excessive LIR density at 5′ or 3′ breakpoint locations is associated with smaller deletions. In addition, this study showed that 3′ LIRs may be more active than 5′ LIRs in deletional and recombinational events. Moreover, internal spacer length affects breakage site and deletion size in the gross deletions. Therefore, the present study suggests that new criteria are needed for LIRs in breakpoint regions of gross gene deletions associated with human cancers and inherited genetic diseases.
Consequently, LIRs detected in genomic regions that include breakpoint sequences of many gross gene deletions may lead to cruciform structure formation during DNA replication and break the DNA strand. After double-strand breaks occur at the 5′ and 3′ breakpoints, gene deletions may be formed when free DNA ends are brought together by 53BP1 for NHEJ.

Methods
Gross gene deletions and breakpoint regions. In total, 109 gross gene deletions involving 63 genes were obtained from the Human Gene Mutation Database (HGMD) 90,91 (see Supplementary Table S1 online). Base sequences of 5′ and 3′ deletion breakpoints were taken from references 33-89 listed in the HGMD 90, or obtained from the Gross Rearrangement Breakpoint Database (GRaBD) 92,93 (see Supplementary Table S1 online). Sequences of genes associated with deletions were downloaded from NCBI 103. Gene accession numbers are provided (Table 1). Each deletion breakpoint sequence and corresponding gene were compared using NCBI BLAST 104, and breakpoint locations matched with related genes (Figure 1a). For each gene deletion, nucleotide positions of 5′ and 3′ breakpoints are shown (Table 1). Sequences (±10 kb) spanning 5′ and 3′ breakpoints of gross gene deletions were [...] Supplementary Table S2 online). In total, 218 breakpoint sequences from 109 gross gene deletions were examined for LIR identification (Figure 1b).
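The extraction of a flanking window around a breakpoint can be sketched as a simple slicing step. The Python function below is an assumption about how such a window might be cut from a downloaded gene sequence, not the procedure used in the study; the function name, the 1-based coordinate convention and the clipping at sequence ends are all illustrative choices.

```python
def flanking_window(gene_seq: str, breakpoint_pos: int, flank: int = 10_000):
    """Return (start, end, subsequence) for the +/- flank window.

    breakpoint_pos is 1-based (an assumption); start and end are 0-based,
    half-open slice bounds, clipped to the ends of gene_seq.
    """
    if not 1 <= breakpoint_pos <= len(gene_seq):
        raise ValueError("breakpoint lies outside the sequence")
    centre = breakpoint_pos - 1
    start = max(0, centre - flank)
    end = min(len(gene_seq), centre + flank)
    return start, end, gene_seq[start:end]


# Toy sequence and a small flank so the behaviour is easy to follow.
toy_gene = "ACGT" * 15                      # 60 nt, hypothetical
start, end, window = flanking_window(toy_gene, breakpoint_pos=30, flank=10)
print(start, end, len(window))
```

With the default flank of 10,000 the window spans 20 kb when the breakpoint is far enough from both ends of the sequence, which matches the length of the control segments described in the Methods.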
For the control group, the DNA sequences of 68 different genes were randomly selected and downloaded from NCBI 103 (see Supplementary Table S2 online). A search of the HGMD 90 site confirmed that the selected control genes were not associated with deletions. Subsequently, 20 kb segments of DNA sequence from each control gene were included in the control group. In total, 220 control sequences were examined for LIR identification.
LIR identification. Identification of LIRs was performed within genomic regions (including the 218 breakpoint sequences from 109 gross gene deletions of 63 genes, and 220 control sequences from 68 genes) using IRF 94,95 software (Figure 1b). IRF 94 was run with match, mismatch, indel and minimum score parameters of 2, 3, 5 and 40, respectively.
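To give a feel for what these four parameters mean, the Python sketch below scores a candidate stem alignment under a simple additive scheme. This is an assumption made for illustration: it treats the match value as a reward and the mismatch and indel values as penalties, and it does not reproduce IRF's actual search algorithm. The function names are hypothetical.

```python
# Parameter values from the text; the additive scoring rule is an assumption.
MATCH, MISMATCH, INDEL, MIN_SCORE = 2, 3, 5, 40


def alignment_score(matches: int, mismatches: int, indels: int) -> int:
    """Score = reward for matches minus penalties for mismatches and indels."""
    return MATCH * matches - MISMATCH * mismatches - INDEL * indels


def passes_min_score(matches: int, mismatches: int, indels: int) -> bool:
    """True when the assumed alignment score reaches the minimum score."""
    return alignment_score(matches, mismatches, indels) >= MIN_SCORE


print(alignment_score(20, 0, 0), passes_min_score(20, 0, 0))
print(alignment_score(21, 1, 0), passes_min_score(21, 1, 0))
```

Under this assumed scheme, a perfectly matching 20 bp stem reaches the minimum score exactly, whereas a 22 bp stem with one mismatch falls just short of it.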
LIRs with stem length >20 bp, internal spacer of 0-10 kb and stem identity ≥70%, located within the ±10 kb fragments flanking each of the 5′ and 3′ breakpoint sequences of human gross gene deletions or within the 20 kb segments of control genes, were investigated (Figure 1b). Total LIR numbers were determined (see Supplementary Table S2 online) and statistically compared between control and deletion groups. In addition, associations between LIR numbers at 5′ and 3′ breakpoints and deletion size were statistically investigated.
Recently, Wang and Leung reported that LIRs with stem length >30 bp, stem identity >85% and internal spacer <2 kb were highly recombinogenic in humans and other organisms 2. It was also shown that long Alu IRs with 75% stem identity caused mild replication blockage in E. coli 3. Thus, LIRs with a distance of 0-3 kb from breakpoints, stem length >20 bp, internal spacer of 0-2.5 kb, and stem identity ≥70% were selected for determining associations between LIR features, distance from breakpoint and deletion size (see Supplementary Table S3 online; Figure 1c). At this stage, if many LIRs were observed in the same breakpoint region, the one that best fitted the above criteria was chosen.
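The two sets of thresholds in this paragraph can be expressed as simple filters. The Python sketch below is illustrative only: the `LIR` record, its field names and the example values are hypothetical, and the rule for choosing the best-fitting LIR when several occur in one region is not specified in the text and is therefore not implemented.

```python
from dataclasses import dataclass


@dataclass
class LIR:
    stem_length: int       # bp
    spacer_length: int     # bp, internal spacer (loop)
    stem_identity: float   # percent identity between stem copies
    distance: int          # bp from the breakpoint


def meets_study_criteria(lir: LIR) -> bool:
    """Selection thresholds stated for the present study."""
    return (0 <= lir.distance <= 3000
            and lir.stem_length > 20
            and 0 <= lir.spacer_length <= 2500
            and lir.stem_identity >= 70.0)


def meets_wang_leung_criteria(lir: LIR) -> bool:
    """Stricter thresholds reported by Wang and Leung."""
    return (lir.stem_length > 30
            and lir.stem_identity > 85.0
            and lir.spacer_length < 2000)


# Hypothetical candidates (not data from the study).
candidates = [
    LIR(stem_length=45, spacer_length=300, stem_identity=88.0, distance=1200),
    LIR(stem_length=25, spacer_length=2400, stem_identity=72.5, distance=2900),
    LIR(stem_length=60, spacer_length=100, stem_identity=95.0, distance=4500),
]
selected = [lir for lir in candidates if meets_study_criteria(lir)]
strict = [lir for lir in selected if meets_wang_leung_criteria(lir)]
print(len(selected), len(strict))
```

The broader thresholds keep LIRs that the stricter criteria would discard, which is the motivation given in the text for relaxing stem length, identity and spacer limits.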
In addition, the 40 of 109 gross gene deletions that contained LIRs in only one of the regions flanking the 5′ and 3′ breakpoints were further examined. The capacity to form new LIRs between breakpoints with LIRs and the other breakpoint sites (including the regions without LIRs in the related deletions) was investigated using IRF 94.
For this, 5 kb of DNA sequence from breakpoints containing LIRs, and 10 kb of DNA sequence including other breakpoints but containing no LIRs, were combined before scanning for LIRs using IRF 94. During this process, the deleted gene segments were excluded and the combined DNA sequences were used. LIRs with stem length >150 bp and >70% stem identity were selected for determining associations between LIR features and distance from breakpoints (see Supplementary Table S4 online).
Statistical analysis. The Mann-Whitney U test was used for statistical comparison of mean ranks of LIR numbers between gross gene deletion and control groups. Pearson's (r) and Spearman's (rs) correlation coefficients were used to examine associations between LIR features (stem length and identity, and loop length), distance from breakpoint and gene deletion size. In addition, Pearson's and Spearman's correlation coefficients were also used for determining associations between deletion size and 5′ and 3′ LIR numbers within the ±10 kb sequence spanning each breakpoint in 109 gross deletions. Correlation coefficients (r, rs) were classified as low (0.00-0.24), moderate (0.25-0.49), strong (0.50-0.74) and very strong (0.75-1.00) 105. Two-sided P values < 0.05 were considered statistically significant. All analyses were performed using SPSS 11.0 software (Chicago, USA).
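The classification of correlation coefficients by magnitude can be written as a small lookup. In the Python sketch below the function name is hypothetical, the bands are applied to the absolute value of the coefficient, and "very strong" is the label assumed here for the 0.75-1.00 band.

```python
def classify_correlation(r: float) -> str:
    """Map a correlation coefficient to the strength bands used in the text."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    if magnitude < 0.25:
        return "low"
    if magnitude < 0.50:
        return "moderate"
    if magnitude < 0.75:
        return "strong"
    return "very strong"   # label assumed for the 0.75-1.00 band


# Coefficients reported in the Results, used here as examples.
for r in (0.85, -0.30, 0.19, -0.51):
    print(r, classify_correlation(r))
```

The sign of the coefficient gives the direction of the association and is ignored by the classification, so r = −0.30 and r = 0.30 fall in the same band.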