HANDS2: accurate assignment of homoeallelic base-identity in allopolyploids despite missing data

Khan, Amina; Belfield, Eric J.; Harberd, Nicholas P.; Mithani, Aziz

doi:10.1038/srep29234

Published: 05 July 2016

Scientific Reports volume 6, Article number: 29234 (2016)

Abstract

Characterization of homoeallelic base-identity in allopolyploids is difficult since homeologous subgenomes are closely related and becomes further challenging if diploid-progenitor data is missing. We present HANDS2, a next-generation sequencing-based tool that enables highly accurate (>90%) genome-wide discovery of homeolog-specific base-identity in allopolyploids even in the absence of a diploid-progenitor. We applied HANDS2 to the transcriptomes of various cruciferous plants belonging to genus Brassica. Our results suggest that the three C genomes in Brassica are more similar to each other than the three A genomes, and provide important insights into the relationships between various Brassica tetraploids and their diploid-progenitors at a single-base resolution.

Introduction

Allopolyploidy is an important evolutionary process in plants that involves interspecific hybridization of two or more differentiated genomes as well as genome doubling¹. As a result, allopolyploid genomes consist of two or more homeologous subgenomes that have high sequence similarity, which makes it difficult to assign individual sequences to the specific subgenome from which they are derived. Nevertheless, despite their extensive sequence relatedness, the subgenomes present in a polyploid genome evolve over time and diverge in sequence from their common ancestor, resulting in positions in a polyploid genome where the homeologous subgenomes have different bases¹,². This is in addition to the nucleotide differences accumulated by the diploid-progenitors since their own divergence, which are carried forward to the polyploid subgenomes at the time of polyploidization³,⁴. Collectively, these base differences between the subgenomes within a polyploid genome are termed Homeolog-Specific Polymorphisms (HSPs; Fig. 1a)⁵. HSPs are the genetic markers of choice in many transcriptomic and evolutionary studies involving polyploids and have been used to characterize homeolog-specific gene-expression³,⁴,⁶. However, this has been done for only a limited number of genes because of the complexity arising from genome-wide duplication and the cost associated with the process⁷.

**Figure 1: Characterization of homoeallelic base-identities using HANDS2.**

With the advent of next-generation sequencing (NGS), it is now possible to survey the whole polyploid genome at a single-base resolution for positions where the individual subgenomes differ from each other⁸,⁹. We have previously developed a tool, ‘HSP Assignment using NGS data through Diploid Similarity’ (HANDS), that uses RNA-seq data for the polyploid and its progenitor-diploids and characterizes HSPs with an accuracy of >90%⁵. Similarity with progenitor-diploids has recently been exploited to classify gene assemblies in bread wheat (Triticum aestivum)⁸ and rapeseed (Brassica napus)⁹ subgenomes, and has led to the development of tools like PolyCat¹⁰ and PolyDog¹¹ for classification of sequencing reads in allopolyploid cotton. Despite its high predictive accuracy, the applicability of HANDS, like that of the other tools, is limited by the requirement for single-base substitution data from all diploid-progenitors in order to characterize homoeallelic base-identities. This is a major limitation since in some cases a diploid-progenitor may be unknown or its genome/transcriptome may be unsequenced, as in the case of some Brassica species (see below). Also, the existing tools support only up to three diploid-progenitors and hence cannot be used to study complex polyploids such as strawberry and sugarcane, which have four to six diploid-progenitors¹²,¹³.

To address these limitations, we have developed HANDS2, a significantly improved successor to HANDS that characterizes homoeallelic base-identities with high accuracy even in the absence of RNA-seq data for one of the diploid-progenitors. It also supports up to ten diploid-progenitors, allowing it to analyze a wide range of natural as well as synthetic allopolyploids. The limit of ten was chosen because there are no known allopolyploids with more than ten diploid-progenitors; the underlying architecture of HANDS2 is designed such that it can analyse allopolyploids containing any number of homeologous subgenomes, provided sufficient computational resources are available. We have used HANDS2 to study the relationships between various cruciferous plants belonging to the genus Brassica, thus providing important insights into the relationships between different Brassica tetraploids and their diploid-progenitors.

Results and Discussion

The HANDS2 framework

HANDS2 involves comparative alignments of next-generation sequencing (NGS) reads from polyploid and diploid-progenitors onto a suitable reference sequence and uses diploid similarity to assign base-identities to subgenomes at HSP positions. It is able to characterize homoeallelic base-identities with high accuracy even in the absence of RNA-seq data for one of the diploid-progenitors and supports up to ten diploid-progenitors (see above). For this, HANDS2 first creates base patterns (sequence of the pairs (position, nucleotide) for all HSPs found in a read/read-pair) from the NGS reads and then assigns these base patterns, and subsequently subgenome-specific base-identities, to individual subgenomes based on their similarity with the diploids (Fig. 1b). To achieve high accuracy when dealing with missing data, HANDS2 provides a unique option of iteratively merging the overlapping base patterns, an option not available in HANDS, to obtain longer base patterns that have a higher chance of unambiguous assignment to one of the subgenomes than the original base patterns (see below).

HANDS2 Input

HANDS2 uses the position-sorted sequence alignment/mapping (SAM)¹⁴ file of the polyploid genome obtained using sequence alignment tools such as BWA¹⁵, BOWTIE¹⁶ or BOWTIE2¹⁷ along with VCF files containing lists of HSPs and single base substitutions (SBSs) present in the polyploid and the diploid-progenitors respectively to assign base-identity to the polyploid subgenomes. The VCF files can be obtained using standard variant calling tools such as SAMtools¹⁴, GATK¹⁸ or FreeBayes¹⁹. HANDS2 uses VCF version 4.0 or greater unlike HANDS, which used a non-standard format for HSP and SBS lists. HANDS2 also requires a General Feature Format (GFF) file (http://www.sequenceontology.org/gff3.shtml) containing start and end coordinates for each gene/contig. This file is automatically generated by HANDS2 when a transcriptomic reference is constructed from a set of unigenes/contigs using ‘seq2ref’ command. HANDS2 only uses entries of the ‘gene’ type from the GFF file when assigning homoeallelic base-identities.

HANDS2 also accepts two optional inputs. The first is a set of base coverage files, in a tab-delimited format, containing the number of reads supporting a particular base at each position; these are used for HSP/SBS validation. The second is a list of positions in the reference (a tab-delimited file containing the sequence/chromosome names, positions and reference bases) to be checked for HSPs during the pre-processing step, in addition to the positions present in the HSP list. Base coverage files can be generated from a SAM file using the ‘coverage’ command available in HANDS2.

Data Pre-processing

HSP characterization can be preceded by an optional pre-processing step, which validates the lists of HSPs and SBSs provided as input. The pre-processing step is triggered by providing, for one or more genomes, the base coverage files (see above) containing the number of reads supporting a particular base at each reference position. A base must be present in at least 5% of the reads at an HSP position to be considered a valid subgenome base, whereas at least 30% of the reads must support a diploid base for it to be considered part of an SBS. Both cut-offs are user-defined parameters in HANDS2 (unlike HANDS), giving users better control. In addition to the supplied HSP positions, HANDS2 also checks for HSPs at positions containing SBSs in one or more diploid-progenitors, unlike its predecessor, where diploid positions were not checked for HSPs; this results in a higher number of characterized positions than before (Table 1). HANDS2 also accepts an optional list of positions to be checked for HSPs, in addition to the positions present in the HSP list, as part of pre-processing. This option is not available in HANDS and allows the user to study and compare HSPs across multiple polyploids, as in the case of the Brassica species (see below).
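The read-support cut-offs described above can be illustrated with a minimal Python sketch. It is not the HANDS2 implementation (which is written in Java); the function name `supported_bases`, the example read counts, and the rule that a position counts as an HSP when more than one base passes the cut-off are assumptions made for illustration.

```python
from typing import Dict, List


def supported_bases(counts: Dict[str, int], min_fraction: float) -> List[str]:
    """Return the bases supported by at least min_fraction of the reads at one position."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(base for base, n in counts.items() if n / total >= min_fraction)


# Hypothetical per-base read counts at a single reference position.
polyploid_counts = {"A": 60, "G": 37, "T": 3}
diploid_counts = {"A": 80, "G": 20}

hsp_bases = supported_bases(polyploid_counts, 0.05)  # 5% cut-off for polyploid bases
sbs_bases = supported_bases(diploid_counts, 0.30)    # 30% cut-off for diploid bases

# Assumption: the position is treated as an HSP if more than one base survives.
is_hsp = len(hsp_bases) > 1
```

Here the polyploid base T (3% of reads) is discarded as noise, and the diploid base G (20% of reads) does not reach the diploid cut-off.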

Table 1 Performance comparison of HANDS2 versus HANDS using T. aestivum data.

Base characterization using HANDS2

HANDS2 characterizes homoeallelic base-identities using a six-step algorithm (Fig. 1b). These are described below.

Step 1: Creation of Base Patterns

First, base patterns are created from the aligned reads. A base pattern is a sequence of pairs (position, nucleotide) for all HSPs found in a read-pair (or a read for single-end sequencing). Duplicate patterns are removed and a count of reads is kept for each unique base pattern.
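As a rough illustration of this step, the sketch below builds base patterns from simplified alignments. It assumes each read is an ungapped alignment given as a (1-based start, sequence) tuple and that a read-pair is a list of such tuples; real SAM records with CIGAR strings would need more handling. All names are hypothetical and not taken from HANDS2.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pattern = Tuple[Tuple[int, str], ...]


def base_pattern(reads: Iterable[Tuple[int, str]], hsp_positions: Set[int]) -> Pattern:
    """Collect (position, nucleotide) pairs at HSP positions covered by a read or read-pair."""
    pairs = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in hsp_positions:
                # Simplification: if mates overlap, the later mate's base is kept.
                pairs[pos] = base
    return tuple(sorted(pairs.items()))


def count_patterns(fragments: List[List[Tuple[int, str]]], hsp_positions: Set[int]) -> Counter:
    """Count the reads or read-pairs supporting each unique base pattern."""
    counts: Counter = Counter()
    for reads in fragments:
        pattern = base_pattern(reads, hsp_positions)
        if pattern:
            counts[pattern] += 1
    return counts


hsps = {3, 7, 12}
fragments = [
    [(1, "ACGTACGT")],               # single read covering HSPs 3 and 7
    [(1, "ACGTACGT"), (10, "TTA")],  # read-pair covering HSPs 3, 7 and 12
    [(1, "ACGTACGT")],               # duplicate of the first pattern
]
pattern_counts = count_patterns(fragments, hsps)
```

The first and third fragments collapse into one unique pattern with a read count of 2, while the read-pair yields a longer pattern spanning three HSP positions.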

Step 2: Filtering of base patterns and removal of embedded patterns

In the second step, base patterns containing a base pair (bases present at two consecutive HSP positions) that is present in less than 5% (a user-specified parameter) of the reads are filtered to remove noise from the data. Furthermore, base patterns that are embedded within another base pattern are also removed.
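A minimal sketch of this filtering step is shown below. Two details are assumptions rather than statements from the text: the support of a base pair is measured against all reads that cover the same two consecutive HSP positions, and a pattern is considered embedded when its (position, nucleotide) pairs are a proper subset of another pattern's pairs.

```python
from collections import Counter
from typing import Dict, Tuple

Pattern = Tuple[Tuple[int, str], ...]


def filter_patterns(counts: Dict[Pattern, int], min_fraction: float = 0.05) -> Dict[Pattern, int]:
    """Drop patterns containing a base pair seen in fewer than min_fraction of the reads."""
    pair_support: Counter = Counter()
    site_total: Counter = Counter()
    for pattern, n in counts.items():
        for left, right in zip(pattern, pattern[1:]):
            pair_support[(left, right)] += n
            site_total[(left[0], right[0])] += n
    kept = {}
    for pattern, n in counts.items():
        pairs = zip(pattern, pattern[1:])
        if all(pair_support[(l, r)] / site_total[(l[0], r[0])] >= min_fraction for l, r in pairs):
            kept[pattern] = n
    return kept


def remove_embedded(counts: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Drop patterns whose pairs are a proper subset of another pattern's pairs."""
    patterns = list(counts)
    return {p: n for p, n in counts.items()
            if not any(set(p) < set(q) for q in patterns)}


counts = {
    ((3, "G"), (7, "G")): 50,
    ((3, "A"), (7, "T")): 48,
    ((3, "G"), (7, "T")): 2,              # rare base pair, likely noise
    ((3, "G"), (7, "G"), (12, "A")): 10,
}
cleaned = remove_embedded(filter_patterns(counts))
```

In this example the G/T pair at positions 3 and 7 is supported by only 2 of 110 reads and is filtered out, and the two-position G/G pattern is removed because it is embedded in the longer three-position pattern.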

Step 3: Iterative merging of overlapping base patterns

This is a new feature introduced in HANDS2. After the noise has been removed from the data, the overlapping base patterns are iteratively merged (Fig. 1c) as follows. The base patterns are first sorted in descending order of size, and the pairwise overlap between them is calculated. The two base patterns with the longest overlap are merged, and the resulting base pattern replaces the two original patterns in the list. The base patterns are then re-sorted by size and the process continues until no more base patterns can be merged. This is the default mode of HANDS2 when dealing with a missing diploid and allows it to achieve high accuracy, but it can be turned off by the user if desired.
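The merging loop can be sketched in Python as follows. This is a simplified reading of the description, not the HANDS2 code: overlap is taken to be the number of shared HSP positions, two patterns are only considered mergeable when they agree at every shared position, and ambiguous cases (one pattern compatible with several conflicting partners) are not treated specially.

```python
from itertools import combinations
from typing import List, Tuple

Pattern = Tuple[Tuple[int, str], ...]


def overlap(p: Pattern, q: Pattern) -> int:
    """Number of shared HSP positions if p and q agree on all of them, otherwise 0."""
    dp, dq = dict(p), dict(q)
    shared = dp.keys() & dq.keys()
    if any(dp[pos] != dq[pos] for pos in shared):
        return 0
    return len(shared)


def merge_patterns(patterns: List[Pattern]) -> List[Pattern]:
    """Repeatedly merge the two patterns with the longest overlap until none overlap."""
    current = sorted(set(patterns), key=lambda p: (-len(p), p))
    while True:
        best = max(combinations(current, 2), key=lambda pq: overlap(*pq), default=None)
        if best is None or overlap(*best) == 0:
            return current
        p, q = best
        merged = tuple(sorted(set(p) | set(q)))
        current = [r for r in current if r not in (p, q)]
        if merged not in current:
            current.append(merged)
        current.sort(key=lambda r: (-len(r), r))  # re-sort by size, longest first


patterns = [
    ((3, "G"), (7, "G")),
    ((7, "G"), (12, "A")),
    ((3, "A"), (7, "T")),
    ((7, "T"), (12, "C"), (15, "T")),
]
merged_patterns = merge_patterns(patterns)
```

The four short patterns collapse into two longer ones, each internally consistent, which is what gives merged patterns a better chance of unambiguous assignment.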

Step 4: Assignment of base patterns to subgenomes

In the fourth step, the merged base patterns are assigned to subgenomes based on their similarity to the diploid bases. A base pattern is assigned to a subgenome if at least 50% (a user-specified parameter) of its bases match the corresponding diploid genome. When a diploid-progenitor is missing, HANDS2 exploits the fact that a base pattern must come from one of the subgenomes of the polyploid: a base pattern that is not assigned to any subgenome (due to low diploid-identity) is assigned to the subgenome corresponding to the missing diploid-progenitor.
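The assignment rule, including the fallback for a missing diploid-progenitor, might look like the sketch below. The data layout (a dictionary of diploid bases keyed by position), the choice of the highest-identity diploid when several pass the threshold, and the function names are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

Pattern = Tuple[Tuple[int, str], ...]


def identity(pattern: Pattern, diploid: Dict[int, str]) -> float:
    """Fraction of the pattern's bases that match the diploid base at the same position."""
    matches = sum(1 for pos, base in pattern if diploid.get(pos) == base)
    return matches / len(pattern)


def assign_pattern(pattern: Pattern,
                   diploids: Dict[str, Dict[int, str]],
                   missing: Optional[str] = None,
                   min_identity: float = 0.5) -> Optional[str]:
    """Return the subgenome a pattern is assigned to, or None if it stays unassigned."""
    scores = {name: identity(pattern, bases) for name, bases in diploids.items()}
    best = max(scores, key=scores.get)  # assumption: ties resolve to the first diploid listed
    if scores[best] >= min_identity:
        return best
    # No available diploid is similar enough: fall back to the missing progenitor, if any.
    return missing


# Hypothetical tetraploid with subgenomes A and B, where the B diploid is missing.
diploids = {"A": {3: "G", 7: "G", 12: "A"}}
a_like = ((3, "G"), (7, "G"), (12, "A"))
b_like = ((3, "A"), (7, "T"), (12, "A"))

a_call = assign_pattern(a_like, diploids, missing="B")
b_call = assign_pattern(b_like, diploids, missing="B")
```

The second pattern matches the A diploid at only one of three positions, so it falls below the 50% threshold and is attributed to the subgenome whose progenitor data is missing.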

Step 5: Assignment of base-identities to subgenomes

Once the base patterns are assigned to each subgenome, HANDS2 checks each position in turn and assigns bases to the subgenomes using the already assigned base patterns. In the case of more than one base being present at an HSP position, the base belonging to the base pattern having the maximum percentage identity with the diploid is assigned at that position. HANDS2 introduces a new option of using additive mode instead of the default maximum mode for base assignment whereby a base having the highest sum of percentage identities across all base patterns containing the base is assigned at that HSP position. In both modes, no base assignment is made at a position in the case of a tie. Once all positions have been processed, all base patterns are rechecked for consistency and those that conflict with the assigned bases are removed. This step is repeated until no more bases can be assigned.
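The difference between the maximum and additive modes can be shown with a small Python sketch. It assumes that, for one subgenome and one HSP position, the candidate bases arrive as (base, percentage identity) tuples from the base patterns already assigned to that subgenome; the function name and the mode strings are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def choose_base(candidates: Iterable[Tuple[str, float]], mode: str = "maximum") -> Optional[str]:
    """Pick a base for one subgenome at one HSP position, or None on a tie."""
    scores = defaultdict(float)
    for base, ident in candidates:
        if mode == "additive":
            scores[base] += ident                     # sum identities over patterns
        else:
            scores[base] = max(scores[base], ident)   # keep the best single pattern
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no assignment is made in either mode
    return ranked[0][0]


candidates = [("G", 0.9), ("T", 0.6), ("T", 0.5)]
max_call = choose_base(candidates, mode="maximum")
add_call = choose_base(candidates, mode="additive")
tie_call = choose_base([("G", 0.75), ("T", 0.75)])
```

With these inputs the maximum mode selects G (single best pattern, identity 0.9), whereas the additive mode selects T because two patterns jointly support it.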

Step 6: Finalization of base-identities

In the sixth and the final step, the base assignments to the subgenomes are finalized using the inherent information from HSPs and NGS platform. Since all the bases present at an HSP position must belong to one of the polyploid subgenomes, a base that is left unassigned at a position is allocated to a subgenome if all the remaining bases have been assigned to other subgenomes. This additional step, which is not present in HANDS, results in a higher number of fully characterized positions by HANDS2 compared to HANDS. This step, however, may lead to incorrect base assignments at positions where the polyploid genome contains allelic polymorphism within a subgenome and can be turned off by the user when dealing with heterozygous species. In this step, base patterns that are left unassigned or were removed in the previous step are also rechecked for base assignment by calculating their percentage identity with the already assigned bases.
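The elimination rule used in this step can be sketched as follows, assuming the bases observed at one HSP position and the current per-subgenome assignments are available. The sketch covers only the elimination rule, not the rechecking of unassigned base patterns, and all names are illustrative.

```python
from typing import Dict, Optional, Set


def finalize(observed: Set[str], assigned: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Give the single leftover base to the single unassigned subgenome, if both are unique."""
    used = {base for base in assigned.values() if base is not None}
    free_bases = set(observed) - used
    free_subgenomes = [name for name, base in assigned.items() if base is None]
    result = dict(assigned)
    if len(free_bases) == 1 and len(free_subgenomes) == 1:
        result[free_subgenomes[0]] = next(iter(free_bases))
    return result


# Hypothetical hexaploid position: A and D already share base A, base G is left over.
resolved = finalize({"A", "G"}, {"A": "A", "B": None, "D": "A"})

# No leftover base: the unassigned subgenome could carry either base, so nothing changes.
unresolved = finalize({"A", "G"}, {"A": "A", "B": "G", "D": None})
```

As the text notes, this inference is unsafe when a subgenome is itself heterozygous at the position, which is why the step is optional.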

HANDS2 output

HANDS2 writes the base assignments for all subgenomes as standard VCF files and reports all positions where one or more subgenomes have been assigned a base-identity. This is the default output format in HANDS2, unlike HANDS, which only allows a tab-delimited format. VCF output enables a user to use the HANDS2 output as an input to other tools. For example, the ‘FastaAlternateReferenceMaker’ command in GATK¹⁸ could be used to create homeolog-specific fasta files from the VCF files. HANDS2 also provides an option to generate tab-delimited output, with each file containing the name of the reference sequence, HSP position, reference base, diploid base (‘0’ for no coverage, ‘<’ for low coverage, ‘*’ for ambiguous/heterozygous base and ‘?’ for missing diploid), assigned base in the subgenome and a yes/no flag to indicate fully characterized positions (a feature not available in HANDS) for all positions where one or more subgenomes have been assigned a base-identity. The tab-delimited output allows a direct comparison between polyploid subgenomes and their diploid progenitors.

Assessment of HANDS2 performance

To evaluate the performance of HANDS2, we analyzed the high-throughput RNA sequencing (RNA-seq) data for hexaploid bread wheat and its diploid-progenitors generated for the validation of HANDS⁵, and compared the results, with and without enabling the support for a missing genome in HANDS2, to those obtained using HANDS. HANDS2 was able to characterize 20% more HSP positions than HANDS with similar accuracy (>96%) for wheat chromosomes 1 and 5 (Table 1 and Methods). Even when Aegilops speltoides, the distant donor of the B subgenome in bread wheat, was specified as the missing genome, we obtained the same accuracy level (>96%) (Table 1), thus demonstrating the high predictive accuracy of HANDS2 even in the absence of complete data.

Characterization of base-identities in Brassica tetraploids

We next used HANDS2 to characterize different cruciferous plants belonging to the genus Brassica. In Brassica, three diploid species (Brassica rapa, AA; Brassica nigra, BB; and Brassica oleracea, CC) have paired up in all possible combinations, giving rise to three tetraploid species (Brassica napus, AACC; Brassica juncea, AABB; and Brassica carinata, BBCC), known as the ‘Triangle of U’²⁰,²¹ (Fig. 2). The unavailability of RNA-seq data for B. nigra has so far prevented the study of relationships between the three tetraploids at a single-base resolution, since base assignments cannot be made in B. juncea and B. carinata, which have B. nigra as the B diploid-progenitor. We applied HANDS2 to published transcriptomic sequencing datasets of different Brassica species (Supplementary Table S1), which were aligned against an in silico B. rapa transcriptomic reference constructed using Ensembl Plants build 1.27 (http://plants.ensembl.com) (Supplementary Fig. S1 and Methods). We first characterized base-identities in B. napus, since data for both its diploid-progenitors are available. HANDS2 reported a total of 495,164 HSP positions, out of which 448,972 (90.7%) positions were fully characterized, i.e. base-identities were assigned to both subgenomes (Fig. 2, Table 2 and Supplementary Table S2). When either B. rapa or B. oleracea was designated as the missing genome, 467,321 (94.4%) and 461,528 (93.2%) positions respectively were fully characterized. Out of these fully characterized positions, ~92% had the same homoeallelic base assignments as those obtained when both diploid-progenitors were specified. HANDS, on the other hand, reported only 401,653 HSP positions in B. napus, out of which 319,239 (79.5%) positions were fully characterized. Out of these, 294,279 (92.1%) positions were present in the list of fully characterized positions reported by HANDS2, with ~94% of positions having the same base-identities assigned to the two subgenomes by both tools.
To test whether the use of B. rapa as the reference sequence had resulted in any bias, we repeated the above analysis using B. oleracea as the transcriptomic reference sequence (Table 2). No significant difference was found (G-test, P-value≈1) suggesting that base assignments made by HANDS2 were independent of the choice of the reference sequence.

**Figure 2: Characterization of homoeallelic base-identities in *Brassica*.**

Table 2 HSP characterization in B. napus using HANDS2.

We subsequently characterized homoeallelic base-identities in the remaining two tetraploids B. juncea and B. carinata resulting in a total of 579,667 and 225,280 fully characterized HSP positions, respectively (Fig. 2, Supplementary Tables S3 and S4). Given the high accuracy level of HANDS2 on T. aestivum and B. napus genomes, it is safe to assume that these base assignments are of high quality even in the absence of B. nigra RNA-seq data. The low number of HSP characterization in B. carinata is likely due to the low coverage of sequencing reads resulting in a number of HSP positions being ignored (see Methods) as well as unavailability of paired-end sequencing data (Supplementary Table S1), which results in shorter base patterns having a lower chance of unambiguous assignment to subgenomes compared to the paired-end data.

Analysis of relationship between different Brassica tetraploids and their diploid-progenitors

Once the homoeallelic base-identities were characterized, we analyzed the relationship between different Brassica tetraploids and their diploid-progenitors. B. napus subgenomes were found to have lower similarity with their diploid-progenitors (83.5% for A and 90.4% for C subgenomes) compared to B. juncea (87.3% for A subgenome) and B. carinata (96.3% for C subgenome) (Fig. 2, Supplementary Tables S5–S8). Also, the number of shared bases between the B. napus A subgenome and B. rapa were significantly lower than the bases shared between the B. napus C subgenome and B. oleracea (G-test, P-value < 1.6 × 10⁻¹⁶). Interestingly, a higher gene loss in the A subgenome compared to C subgenome has been reported in B. napus recently⁹. The highest level of similarity between B. carinata C subgenome and B. oleracea can be attributed to the recent origin²² and low genetic diversity of the tetraploid²³. Next we evaluated the positions that were shared between different tetraploids (Fig. 2 and Supplementary Tables S9–S11). The two C subgenomes had the highest similarity to each other with 50,581 out of 54,767 shared HSP positions (92.4%) having the same base. Out of these, 95.8% positions were also shared with B. oleracea (Supplementary Table S11). However, the A subgenomes were least similar to each other with 102,071 out of 119,334 positions (85.5%) having the same base and shared 87.8% positions with B. rapa (Supplementary Table S9). To test if the use of both diploid-progenitors for B. napus versus single diploid-progenitors for the other two tetraploids while assigning homoeallelic base-identities had resulted in any bias, we repeated the whole analysis using a single diploid-progenitor for B. napus (B. oleracea and B. rapa designated as missing when comparing A and C subgenomes respectively). 
The two C subgenomes were still found to be more similar to each other with 54,100 out of 56,634 HSP positions (92.3%) having the same base compared to the two A subgenomes where only 106,819 out of 126,680 positions (84.3%) had the same base. Finally, we studied the positions that were shared between all three tetraploids (Fig. 2 and Supplementary Table S12). Again, the two Brassica C subgenomes had the highest similarity amongst each other suggesting a significantly higher conservation in C subgenomes compared to A and B subgenomes (G-test, P-value < 1.6 × 10⁻¹⁶). Collectively, these observations suggest that Brassica C genomes are more similar to each other than the A genomes.

Conclusion

In summary, HANDS2 provides a highly accurate approach to genome-wide discovery of homoeallelic base identities in allopolyploids and works even in the absence of a diploid-progenitor. HANDS2 is implemented in Java and is available for download at https://genomics.lums.edu.pk/software/hands2/ with a user manual and test datasets from T. aestivum and B. napus genomes. Since HANDS2 uses read alignments to characterize HSP bases, it requires the RNA-seq data to be of high quality with sufficient coverage to be able to accurately identify and assign the base-identities. Also, like other current tools that work on diploid similarity^5,10,11, it has difficulty in detecting instances where a gene is silenced in one or more subgenomes, and may therefore incorrectly assign base-identities at silenced positions. Similarly, cases like gene conversion or homeologous exchanges, which are frequent in polyploids^9,24, may also result in biased base assignments. Nevertheless, the ability to accurately assign base-identities at non-silenced positions despite missing data and the support for up to ten diploid-progenitors make HANDS2 a viable approach for HSP characterization in polyploids thus enabling important insights into the complex genome architecture and evolution of polyploids at a single-base resolution.

Methods

Assessment of HANDS2 accuracy

We tested the accuracy of base assignments made by HANDS2 using the high-throughput RNA sequencing (RNA-seq) data for hexaploid bread wheat (T. aestivum; AABBDD) and its diploid-progenitors (Triticum urartu; AA, Aegilops speltoides; BB, and Aegilops tauschii; DD), which was generated for the validation of HANDS⁵. The sequencing data for the polyploid and diploid-progenitors was first aligned, filtered and variants were called (see below). Base characterization was subsequently done using HANDS and HANDS2. To test the accuracy of the tool, especially in the case of missing genome, we used the RNA-seq data for wheat chromosomes 1 and 5 nullisomic-tetrasomic (NT) lines^5,26. Wheat NT lines are a set of lines each missing a single chromosome (nullisomic) which is substituted by an additional copy of a homeologous chromosome (tetrasomic)²⁵, and provide an ideal framework, albeit at a very high cost, to accurately characterize wheat HSPs at the genome-wide level^5,26. The base assignments made by HANDS and HANDS2 were evaluated against those obtained using these NT lines⁵.

Construction of B. rapa and B. oleracea in silico transcriptomic references

The B. rapa and B. oleracea transcriptomic references were constructed using Ensembl Plants build 1.27 (http://plants.ensembl.com) containing 41,393 and 59,225 cDNA sequences respectively. The in silico transcriptomic references were obtained using ‘seq2ref’ command in HANDS2, which concatenates the contigs/cDNA sequences such that two consecutive sequences are separated by a gap of 200 (a user specified parameter) Ns (Supplementary Fig. S1). The B. rapa reference comprised of a total of 56,434,134 bases out of which 8,278,800 bases were ‘N’s used as separators. Similarly, B. oleracea reference contained a total of 73,571,942 bases with 11,845,000 ‘N’s.
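The concatenation performed when building such a reference can be sketched in Python. This is an illustration of the idea only, not the HANDS2 ‘seq2ref’ command: the function name, the GFF source and sequence-name columns, and the placement of separators strictly between consecutive sequences are assumptions.

```python
from typing import List, Tuple


def seq2ref_sketch(sequences: List[Tuple[str, str]], gap: int = 200) -> Tuple[str, List[str]]:
    """Concatenate named sequences separated by runs of N and emit matching GFF 'gene' lines."""
    parts: List[str] = []
    gff: List[str] = []
    offset = 0
    for i, (name, seq) in enumerate(sequences):
        if i > 0:
            parts.append("N" * gap)  # separator between consecutive sequences
            offset += gap
        start, end = offset + 1, offset + len(seq)  # GFF coordinates are 1-based, inclusive
        gff.append("\t".join(
            ["reference", "sketch", "gene", str(start), str(end), ".", "+", ".", f"ID={name}"]
        ))
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), gff


reference, gff_lines = seq2ref_sketch([("g1", "ACGT"), ("g2", "GG")], gap=3)
```

The separators keep reads from aligning across the boundary between two unrelated cDNA sequences, and the GFF lines record where each original sequence sits in the concatenated reference.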

Sequence alignment, filtering and visualization

Whole transcriptome sequencing (RNA-seq) reads for T. aestivum were mapped to the T. aestivum Unigene build 60 reference⁵, whereas the reads for the Brassica species (Supplementary Table S1) were mapped to the B. rapa and B. oleracea transcriptome 1.27 references, using the Burrows-Wheeler Aligner (BWA)¹⁵ with default parameters (Supplementary Tables S13 and S14). Reads with low mapping quality (phred score ≤20) were filtered out using custom scripts written in C++. Additionally, for paired-end data, reads whose mate was not aligned, as well as reads that mapped to a different gene than their mates, were removed. The alignments were visualized using the Integrative Genomics Viewer (IGV; Fig. 1a)²⁷.
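The original filtering was done with custom C++ scripts that are not shown in the article. The Python sketch below only illustrates the stated criteria under simplifying assumptions: reads are reduced to a name, a 1-based start position and a mapping quality; a read's gene is looked up from its start position alone; and a pair is dropped as a whole if either mate fails the quality cut-off.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Read:
    name: str
    pos: Optional[int]  # 1-based alignment start, None if the read is unaligned
    mapq: int


def gene_at(pos: int, genes: List[Tuple[str, int, int]]) -> Optional[str]:
    """Return the ID of the gene interval containing pos, or None."""
    for gene_id, start, end in genes:
        if start <= pos <= end:
            return gene_id
    return None


def keep_pair(r1: Read, r2: Read, genes: List[Tuple[str, int, int]], max_low_mapq: int = 20) -> bool:
    """Keep a pair only if both mates align, exceed the MAPQ cut-off and hit the same gene."""
    if r1.pos is None or r2.pos is None:
        return False
    if r1.mapq <= max_low_mapq or r2.mapq <= max_low_mapq:
        return False
    g1, g2 = gene_at(r1.pos, genes), gene_at(r2.pos, genes)
    return g1 is not None and g1 == g2


# Hypothetical gene intervals on a concatenated reference with 200-base separators.
genes = [("g1", 1, 1000), ("g2", 1201, 2000)]
good = keep_pair(Read("a/1", 100, 60), Read("a/2", 300, 60), genes)
low_quality = keep_pair(Read("b/1", 100, 20), Read("b/2", 300, 60), genes)
split_genes = keep_pair(Read("c/1", 900, 60), Read("c/2", 1250, 60), genes)
orphan = keep_pair(Read("d/1", 100, 60), Read("d/2", None, 0), genes)
```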

Variant calling and filtering

The filtered alignments were used to generate pileups using SAMtools¹⁴ version 0.1.19 ‘mpileup’ command with probabilistic realignment for the computation of base alignment quality (BAQ) disabled, and the minimum and maximum coverage thresholds set to 3 and 50,000 respectively. These pileup files were then used to call variants (HSPs for the polyploid and single base substitutions (SBSs) for the progenitor-diploids) using bcftools version 0.1.19 ‘view’ command (available as part of SAMtools) using default parameters. The variant lists were subsequently filtered using ‘varFilter’ command of vcfutils.pl with parameters ‘-1 0 -4 0 -d 3 -D 50000’ available in SAMtools version 0.1.19 to remove potential false positives including errors arising during DNA sequencing itself. Variants with low quality (phred score ≤20) were also removed. For diploids, ambiguous (heterozygous) base calls were also ignored.
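The last two filters (low-quality variants, and heterozygous calls in diploids) can be illustrated with a short Python sketch operating on VCF data lines. It is not part of the published pipeline: the function name is hypothetical, QUAL is assumed to be numeric, and heterozygosity is inferred from the GT field of the first sample column.

```python
def keep_variant(vcf_line: str, diploid: bool, max_low_qual: float = 20.0) -> bool:
    """Return True if a VCF data line passes the quality and (for diploids) homozygosity filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])  # column 6 of a VCF data line is QUAL
    if qual <= max_low_qual:
        return False
    if diploid and len(fields) > 9:
        keys = fields[8].split(":")    # FORMAT column
        values = fields[9].split(":")  # first sample column
        genotype = dict(zip(keys, values)).get("GT", "")
        alleles = genotype.replace("|", "/").split("/")
        if len(set(alleles)) > 1:
            return False  # ambiguous (heterozygous) call in a diploid
    return True


# Hypothetical VCF data lines.
hom = "ref\t101\t.\tA\tG\t45.0\t.\tDP=30\tGT:PL\t1/1:70,20,0"
het = "ref\t202\t.\tC\tT\t60.0\t.\tDP=25\tGT:PL\t0/1:60,0,55"
low = "ref\t303\t.\tG\tA\t20.0\t.\tDP=5\tGT:PL\t1/1:30,9,0"

hom_kept = keep_variant(hom, diploid=True)
het_kept_in_diploid = keep_variant(het, diploid=True)
het_kept_in_polyploid = keep_variant(het, diploid=False)
low_kept = keep_variant(low, diploid=True)
```

Heterozygous-looking calls are retained for the polyploid because positions with more than one base are exactly the HSP candidates of interest there.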

Additional Information

How to cite this article: Khan, A. et al. HANDS2: accurate assignment of homoeallelic base-identity in allopolyploids despite missing data. Sci. Rep. 6, 29234; doi: 10.1038/srep29234 (2016).

References

1. Wendel, J. F. Genome evolution in polyploids. Plant molecular biology 42, 225–249 (2000).
2. Blanc, G. & Wolfe, K. H. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. The Plant cell 16, 1679–1691, 10.1105/tpc.021410 (2004).
3. Udall, J. A., Swanson, J. M., Nettleton, D., Percifield, R. J. & Wendel, J. F. A novel approach for characterizing expression levels of genes duplicated by polyploidy. Genetics 173, 1823–1827 (2006).
4. Akhunova, A. R., Matniyazov, R. T., Liang, H. & Akhunov, E. D. Homoeolog-specific transcriptional bias in allopolyploid wheat. BMC genomics 11, 505 (2010).
5. Mithani, A. et al. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids. BMC genomics 14, 653, 10.1186/1471-2164-14-653 (2013).
6. Chen, Z. J. & Pikaard, C. S. Transcriptional analysis of nucleolar dominance in polyploid plants: biased expression/silencing of progenitor rRNA genes is developmentally regulated in Brassica. Proceedings of the National Academy of Sciences of the United States of America 94, 3442–3447 (1997).
7. Saintenac, C., Jiang, D. & Akhunov, E. D. Targeted analysis of nucleotide and copy number variation by exon capture in allotetraploid wheat genome. Genome biology 12, R88, 10.1186/gb-2011-12-9-r88 (2011).
8. Brenchley, R. et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491, 705–710, 10.1038/nature11650 (2012).
9. Chalhoub, B. et al. Plant genetics. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 345, 950–953, 10.1126/science.1253435 (2014).
10. Page, J. T., Gingle, A. R. & Udall, J. A. PolyCat: a resource for genome categorization of sequencing reads from allopolyploid organisms. G3 (Bethesda) 3, 517–525, 10.1534/g3.112.005298 (2013).
11. Page, J. T. & Udall, J. A. Methods for mapping and categorization of DNA sequence reads from allopolyploid organisms. BMC genetics 16 Suppl 2, S4, 10.1186/1471-2156-16-S2-S4 (2015).
12. Le Cunff, L. et al. Diploid/polyploid syntenic shuttle mapping and haplotype-specific chromosome walking toward a rust resistance gene (Bru1) in highly polyploid sugarcane (2n approximately 12x approximately 115). Genetics 180, 649–660, 10.1534/genetics.108.091355 (2008).
13. Rousseau-Gueutin, M. et al. Comparative Genetic Mapping Between Octoploid and Diploid Fragaria Species Reveals a High Level of Colinearity Between Their Genomes and the Essentially Disomic Behavior of the Cultivated Octoploid Strawberry. Genetics 179, 2045–2060, 10.1534/genetics.107.083840 (2008).
14. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, 10.1093/bioinformatics/btp352 (2009).
15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, 10.1093/bioinformatics/btp324 (2009).
16. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology 10, R25, 10.1186/gb-2009-10-3-r25 (2009).
17. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357–359, 10.1038/nmeth.1923 (2012).
18. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 20, 1297–1303, 10.1101/gr.107524.110 (2010).
19. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 (2012).
20. Nagaharu, U. Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Jap J Bot 7, 389–452 (1935).
21. Stewart, C. N., Halfhill, M. D. & Warwick, S. I. Transgene introgression from genetically modified crops to their wild relatives. Nature reviews. Genetics 4, 806–817 (2003).
22. Warwick, S. I., Gugel, R. K., McDonald, T. & Falk, K. C. Genetic Variation of Ethiopian Mustard (Brassica carinata A. Braun) Germplasm in Western Canada. Genet Resour Crop Evol 53, 297–312, 10.1007/s10722-004-6108-y (2006).
23. Song, K. M., Osborn, T. C. & Williams, P. H. Brassica taxonomy based on nuclear restriction fragment length polymorphisms (RFLPs). Theoret. Appl. Genetics 75, 784–794, 10.1007/BF00265606 (1988).
24. Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427, 10.1038/nature11798 (2012).
25. Sears, E. R. In Chromosome Manipulations and Plant Genetics (eds Riley, R. & Lewis, K. R.) 29–45 (Oliver and Boyd, 1966).
26. Leach, L. J. et al. Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid bread wheat. BMC genomics 15, 276, 10.1186/1471-2164-15-276 (2014).
27. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in bioinformatics 14, 178–192, 10.1093/bib/bbs017 (2013).

Acknowledgements

This study was supported by LUMS Faculty Start-up Grant to A.M.

Author information

Authors and Affiliations

Department of Biology, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), D.H.A., Lahore, 54792, Pakistan
Amina Khan & Aziz Mithani
Department of Plant Sciences, University of Oxford, Oxford, OX1 3RB, United Kingdom
Eric J. Belfield & Nicholas P. Harberd

Contributions

A.M. and N.P.H. conceived HANDS2 with E.J.B. providing further initial input. A.M. developed and implemented HANDS2. A.K. validated HANDS2 and analysed the data. E.J.B., N.P.H. and A.M. provided additional input. A.M. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Aziz Mithani.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information (DOC 84 kb)

Supplementary Tables S1,S13-14 (XLS 36 kb)

Supplementary Table S2 (XLS 17681 kb)

Supplementary Table S3 (XLS 7768 kb)

Supplementary Table S4 (XLS 7729 kb)

Supplementary Table S5 (XLS 6888 kb)

Supplementary Table S6 (XLS 6888 kb)

Supplementary Table S7 (XLS 6888 kb)

Supplementary Table S8 (XLS 6888 kb)

Supplementary Table S9 (XLS 7793 kb)

Supplementary Table S10 (XLS 6889 kb)

Supplementary Table S11 (XLS 6517 kb)

Supplementary Table S12 (XLS 2182 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 08 December 2015
Accepted: 14 June 2016
Published: 05 July 2016
DOI: https://doi.org/10.1038/srep29234
