Bishara, A et al. Genome Res. 10.1101/gr.191189.115 (18 Aug 2015).

Around 5% of the human genome is made up of low-copy repeat regions containing structural and copy-number variations that likely have phenotypic effects, but these regions are inaccessible to current short-read sequencing technology because the reads cannot be aligned unambiguously. Bishara et al. developed an algorithmic solution, the Random Field Aligner (RFA), which uses the TruSeq synthetic long-read protocol. TruSeq retains contiguity information in long pieces of DNA by fragmenting the genome and barcoding the short reads derived from the same long fragment before the pooled reads are sequenced. RFA relies on high coverage of fragments but low coverage of read clouds (groups of reads that share a barcode) to iteratively map the synthetic long reads to a reference. The authors used RFA to discover variants in repeat regions of normal and cancer-derived human genomes.
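
The notion of a read cloud can be made concrete with a small sketch. The Python below only illustrates grouping short reads by barcode; it is not the RFA algorithm and performs no alignment. The function name `group_read_clouds`, the `(barcode, sequence)` tuple layout, and the toy reads are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def group_read_clouds(reads):
    """Group (barcode, sequence) pairs into read clouds keyed by barcode.

    Hypothetical helper for illustration; it does not align anything.
    """
    clouds = defaultdict(list)
    for barcode, sequence in reads:
        # Reads sharing a barcode are taken to derive from the same long fragment.
        clouds[barcode].append(sequence)
    return dict(clouds)


# Toy input: invented barcodes and sequences, not real data.
reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GTACGG"),
]
clouds = group_read_clouds(reads)
```

In this toy input, the two reads tagged `BC01` end up in one cloud and the single `BC02` read in another, which mirrors how a shared barcode preserves the link between short reads and the long fragment they came from.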