Genome-Wide SNP discovery and genomic characterization in avocado (Persea americana Mill.)

Talavera, Alicia; Soorni, Aboozar; Bombarely, Aureliano; Matas, Antonio J.; Hormaza, Jose I.

doi:10.1038/s41598-019-56526-4

Article
Open access
Published: 27 December 2019

Scientific Reports volume 9, Article number: 20137 (2019)

An Author Correction to this article was published on 21 February 2023

This article has been updated

Abstract

Modern crop breeding is based on the use of genetically and phenotypically diverse plant material and, consequently, a proper understanding of population structure and genetic diversity is essential for the effective development of breeding programs. An example is avocado, a woody perennial fruit crop native to Mesoamerica with an increasing popularity worldwide. Despite its commercial success, there are important gaps in the molecular tools available to support on-going avocado breeding programs. In order to fill this gap, in this study, an avocado ‘Hass’ draft assembly was developed and used as reference to study 71 avocado accessions which represent the three traditionally recognized avocado horticultural races or subspecies (Mexican, Guatemalan and West Indian). An average of 5.72 M reads per individual and a total of 7,108 single nucleotide polymorphism (SNP) markers were produced for the 71 accessions analyzed. These molecular markers were used in a study of genetic diversity and population structure. The results broadly separate the accessions studied according to their botanical race in four main groups: Mexican, Guatemalan, West Indian and an additional group of Guatemalan × Mexican hybrids. The high number of SNP markers developed in this study will be a useful genomic resource for the avocado community.

Introduction

Avocado (Persea americana Mill.) is a subtropical evergreen tree native to Mesoamerica. Avocado belongs to the Lauraceae, a family in the order Laurales that, together with the orders Canellales, Piperales and Magnoliales, is included in the Magnoliid clade of early-divergent angiosperms¹. This pantropical family has about 50 genera and 2500 to 3000 species. Besides avocado, only a few species in the family have economic importance and these include mainly spices [bay laurel (Laurus nobilis L.) and cinnamon (Cinnamomum verum J.Presl)], camphor (C. camphora (L.) J.Presl) and timber trees (Nectandra spp., Ocotea spp. and Phoebe spp.).

Traditionally, avocado genotypes have been classified into three horticultural races or subspecies, mainly on the basis of ecological preferences and botanical characteristics². The Mexican and Guatemalan subspecies are adapted to highland areas in Central America (cold climates), the Guatemalan race being more susceptible to low temperatures. The West Indian subspecies is adapted to lowland areas in the same region (tropical climates).

Avocado market demand has increased exponentially in recent years and in 2017 avocado world production was close to 6 million tons. Most of the production is concentrated in a few countries (Mexico, Dominican Republic, Peru, Indonesia, Colombia, Brazil), Mexico being the largest producer with 34% of the total world production (more than 2 million tons)³. However, in spite of the increasing importance of this crop, there are important bottlenecks for efficient breeding and development of new avocado cultivars, due to the absence or poor availability of molecular resources and phenotypic data and to the limited genetic pool in breeding programs worldwide. Developing new high quality avocado cultivars is an urgent need in this crop since approximately 90% of the avocado production worldwide depends on a single cultivar, ‘Hass’, that originated as a chance seedling in California ninety years ago⁴.

Different types of genetic markers have been utilized in avocado for genotype fingerprinting, paternity analyses, diversity and phylogenetic studies, linkage map construction and screening for traits of interest. Initial work included minisatellites⁵, Variable Number of Tandem Repeats (VNTRs)⁶, Random Amplified Polymorphic DNA (RAPDs)⁷ and Restriction Fragment Length Polymorphisms (RFLPs)^8,9. More recently, Simple Sequence Repeats (SSRs), which are codominant and highly polymorphic and thus facilitate the study of intraspecific relations and diversity, have been specifically developed in avocado and used for fingerprinting and diversity analyses^{10,11,12,13,14,15,16,17,18,19}. However, in spite of the inherent advantages of SSR markers, they are not uniformly distributed over the genome and their use in association analyses is problematic²⁰. Moreover, it is difficult to compare SSRs from different populations or systems, and the analyses are laborious and costly compared to next-generation sequencing (NGS) technologies²¹. Indeed, Single Nucleotide Polymorphism (SNP) markers are becoming the marker of choice in crop genetic studies with different aims: linkage mapping, analysis of quantitative trait loci (QTL), association studies, marker-assisted selection (MAS) or genomic selection (GS)²². The advantages of SNPs include the large number of markers that can be generated at a reduced cost, the fact that they are the most frequent source of variation in eukaryotic genomes, their bi-allelic nature that offers accuracy in variant calling, their high reproducibility and a low cost that makes them accessible to most laboratories^23,24,25. These advantages are especially relevant in woody perennial crops since their application would significantly reduce the time and cost of breeding programs.

Up to now, NGS applied to avocado research has been limited to transcriptome analyses^26,27 and the development of SNPs to characterize genetic diversity^28,29,30. In addition, very recently, a first avocado nuclear genome sequence has been published³¹. In order to provide additional high quality SNPs for the avocado research community, in this work a collection of 71 avocado accessions representing the three classical botanical races was genotyped and characterized using newly developed SNP markers. The reads were mapped to a draft genome of the most important avocado cultivar worldwide, ‘Hass’, in order to increase the quality of the markers developed.

Results

Development of an avocado draft genome for mapping the raw reads

A draft genome of the avocado ‘Hass’ variety was developed to assist with read mapping and SNP calling. The sequencing of ‘Hass’ DNA produced 487.54 million raw Illumina reads (73.13 Gb) and 487.21 million processed reads (72.15 Gb). The estimated haploid genome size for ‘Hass’ ranged from 1.33 Gb (17-mer) to 1.63 Gb (73-mer) with an estimated genomic heterozygosity ranging from 1.05% (73-mer) to 1.41% (17-mer). The assembly statistics are summarized in Table 1. The assembly size represents 77% of the estimated genome size (1.33 Gb). The total number of sequences indicates a highly fragmented assembly in which the average sequence size (0.54 Kb) and the L50 (0.68 Kb) are below the average plant gene length (e.g. 2.01 Kb for Arabidopsis thaliana) and, consequently, no gene structural annotation could be performed³².
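
As an illustration of how contiguity statistics of this kind are usually derived, the following Python sketch computes the number of sequences, the total size, the mean length and a length-weighted median from a list of sequence lengths. It assumes that the value reported above as L50 corresponds to the statistic often called N50 elsewhere; that reading, the function names and the toy lengths are assumptions and are not taken from the scripts used in this work.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of the assembly."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Basic contiguity statistics for a set of assembled sequences."""
    n50_value = n50(lengths)  # also validates that the list is not empty
    total = sum(lengths)
    return {
        "n_sequences": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "n50_bp": n50_value,
    }


# Hypothetical toy lengths (bp), not data from this study.
toy_lengths = [200, 300, 400, 500, 600, 1000]
summary = assembly_summary(toy_lengths)
print(summary)
```

With an assembly as fragmented as the one described here, both the mean length and this statistic fall well below typical gene lengths, which is why structural annotation is not feasible.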

Table 1 Summary of the Persea americana Mill. cv ‘Hass’ draft genome assembly.

GBS sequencing, mapping and variant calling

GBS (Genotyping-By-Sequencing) libraries for 71 avocado accessions (Table 2) were constructed and sequenced by Illumina HiSeq 2500 (1 × 100) and Illumina HiSeq 4000 (2 × 150). The sequencing produced 405.93 million raw Illumina reads. After processing (see Methods), 345.37 million reads were obtained with differences among accessions in the number of reads (Supplementary Fig. S1). A higher number of processed reads is often associated with a higher number of reads mapped to each of the GBS locations. The reads of the individual genotypes were mapped onto the reference genome and only reads mapped to a unique location in the genome were retained. Such uniquely mapped reads represented approximately 80% of the total. Finally, 1,070,902 variants were detected. Of those, 945,064 were SNPs, 22,321 were InDels, 69,500 were MNPs (multi-nucleotide polymorphisms) and 6,604 were complex (a combination of the previous types).

Table 2 List of the 71 avocado accessions studied with SNPs in this work.
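
The four variant classes mentioned above can be told apart from the reference and alternative alleles alone. The sketch below is a simplified heuristic written for illustration; it is not the classification logic of the variant caller used in this work, and the function names and toy alleles are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Assign a variant to SNP, MNP, InDel or complex from its two alleles.

    Simplified rule set (an assumption, not the caller's own definition):
    - both alleles one base long: SNP
    - equal length, longer than one base: MNP
    - one allele is a prefix or suffix of the other: InDel
    - anything else: complex
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    shorter, longer = sorted((ref, alt), key=len)
    if longer.startswith(shorter) or longer.endswith(shorter):
        return "InDel"
    return "complex"


def count_variant_types(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally variant classes over (ref, alt) pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in pairs)


# Hypothetical (ref, alt) pairs used only to exercise the function.
toy_pairs = [("A", "G"), ("AT", "GC"), ("A", "ATT"), ("ATG", "A"), ("ATG", "CA")]
counts = count_variant_types(toy_pairs)
print(counts)
```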

SNP development

After filtering (see Methods), 7,108 SNPs with no missing data, of which 19.45% were private (Supplementary Table S1), were detected for the 71 accessions (Table 2). The SNPs were categorized according to nucleotide substitutions: 61.04% were transitions [C/T (2195) or A/G (2144)] and 38.96% transversions [A/C (778), C/G (646), A/T (666), G/T (679)]. The transition/transversion ratio was 1.57, similar to the results reported in other species^33,34,35. The mean of observed heterozygosity was 0.16 whereas the mean of expected heterozygosity was 0.17 and the average frequency of minor alleles was 0.11, although, for the samples studied, the population was not in Hardy-Weinberg equilibrium. This last result was expected taking into account that the material studied does not represent a randomly obtained population.
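
The transition and transversion percentages and their ratio follow directly from the substitution counts given above. The short Python check below reproduces that arithmetic; only the six counts come from the text, and the variable and function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(base_a: str, base_b: str) -> bool:
    """A substitution is a transition when both bases are purines or both are pyrimidines."""
    pair = {base_a, base_b}
    return pair <= PURINES or pair <= PYRIMIDINES


# Substitution counts as reported in the text.
substitution_counts = {
    ("C", "T"): 2195,
    ("A", "G"): 2144,
    ("A", "C"): 778,
    ("C", "G"): 646,
    ("A", "T"): 666,
    ("G", "T"): 679,
}

ti = sum(n for pair, n in substitution_counts.items() if is_transition(*pair))
tv = sum(n for pair, n in substitution_counts.items() if not is_transition(*pair))
total = ti + tv

ti_percent = 100 * ti / total
tv_percent = 100 * tv / total
ti_tv_ratio = ti / tv

print(f"transitions: {ti} ({ti_percent:.2f}%)")
print(f"transversions: {tv} ({tv_percent:.2f}%)")
print(f"Ti/Tv ratio: {ti_tv_ratio:.2f}")
```

The counts add up to the 7,108 filtered SNPs, with 4,339 transitions and 2,769 transversions.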

Diversity and population structure using filtered SNPs

Distinct relationships among accessions were obtained with different analyses of the filtered SNPs. A first approximation to study genetic structure was obtained using principal component analysis (PCA) for the complete set of biallelic SNPs (Fig. 1). The first two components explained more than 40% of the variation (26.1% and 15.1%). Three differentiated groups that correspond with the three different horticultural races were observed. As expected, interracial hybrid accessions could be observed between the three main groups.

Prevosti’s distance³⁶ was used to evaluate the genetic structure as a second approximation. This distance determines the fraction of different sites between samples. It was plotted as a dendrogram based on Neighbor Joining (NJ) showing the relationships between genotypes (Fig. 2a). Two main clusters weakly supported by bootstrap values (27.8) were revealed in the dendrogram. One of the clusters was composed of a big strongly supported subgroup (71.8) which included mainly Guatemalan × Mexican (GxM) hybrid genotypes (‘Pinkerton’, ‘Lyon’, ‘Iriet’, ‘Gem’, ‘Hass’, ‘Lamb Hass’, among others), a few genotypes categorized as Mexican (‘Teague’, ‘Negra de la Cruz’), as well as genotypes considered as Guatemalan (‘Shepard’), and a genotype of unknown race (‘TX531’). Another subgroup (bootstrap value of 38.1) included mainly accessions considered as Guatemalan (‘Reed’, ‘Nabal’, ‘Nimlioh’, ‘Linda’, ‘Murrieta Green’) and it was close to genotypes of unknown race (‘A0.67’, ‘Mike’, ‘Mrs Tooley’). Moreover, the other two genotypes that are reported as Guatemalan (‘NN10’, ‘NN63’) form a strongly supported cluster (67.6), whereas ‘Maluma’ and ‘Alcaraz’ appear isolated from these subgroups.

The second cluster was formed by two genotypes of unknown origin (‘A0.68’ and ‘1.14.2’) and a strongly supported group (bootstrap value of 80.5) composed of two subgroups. One of them (well supported with a bootstrap value of 85.9) contained genotypes considered as Mexican (‘G-6’, ‘Thomas’, ‘Gottfried’), a MxWI hybrid (‘Vero Beach No. 1’), as well as genotypes of unknown race (‘RR-86’, ‘Telez’, ‘Rustenburg Round’, ‘C.A. Bueno’ and ‘Hansie’). The other subgroup was weakly supported (bootstrap value of 26.1) and was composed of two subgroups. One of them (29.1 bootstrap value) contained mostly West Indian genotypes (‘Pollock’, ‘Bernecker’, ‘Waldin’, ‘Russel’, ‘Catalina’, ‘Butler’, ‘Wester’, ‘Trapp’, ‘Fuchsia’, ‘Largo’), together with some Guatemalan × West Indian (GxWI) (‘Beta’, ‘Collinred B’) or Mexican × West Indian (MxWI) (‘Lisa’) hybrids. The other subgroup was also weakly supported (52.6), and was represented by GxWI hybrids (‘Yon’, ‘Choquette’, ‘Collinson’, ‘Melendez 2’ and ‘Semil 43’) and a MxWI hybrid (‘Monroe’).

An admixture analysis using the ADMIXTURE software³⁷ was performed after the PCA analysis. The most favorable number of clusters was 4, followed by 3 and 5 although the differences among the number of populations were small with a cross-validation error between 0.28 and 0.29. At K = 4, the division between genotypes reported as Mexican, West Indian and Guatemalan was evident. Furthermore, a separated cluster was formed with the GxM hybrid genotypes (Fig. 2b). In order to have a broader view of the genetic structure of the populations, the STRUCTURE software³⁸ and STRUCTURE HARVESTER³⁹ were also implemented. In agreement with the ADMIXTURE results, K = 4 was revealed as the most probable number of clusters (Supplementary Figs. S2 and S3b) but, in this case, accessions considered as Guatemalan and as GxM hybrids were not clearly differentiated.

In order to describe the diversity between pre-defined groups, Discriminant Analysis of Principal Components (DAPC) was performed to obtain the number of clusters. These results were consistent with the cross-validation errors (ADMIXTURE) and Evanno algorithm (STRUCTURE) regarding the number of clusters (K). K = 4 was again revealed as the most likely scenario, closely followed by K = 3 and K = 5 (Fig. 3) (Supplementary Table S2). At K = 3, accessions were divided in agreement with the other methods (ADMIXTURE and STRUCTURE). One group included mainly Guatemalan race accessions and GxM hybrids. A second group consisted of West Indian race accessions, GxWI hybrids and MxWI hybrids. The third group included Mexican race genotypes, GxM hybrids and MxWI hybrids (Supplementary Table S2). For K = 4, the West Indian race accessions were divided into two groups, one which included mainly pure West Indian genotypes and another one which included mainly GxWI hybrid genotypes. For K = 5, Guatemalan genotypes and GxM hybrid genotypes were split into two different groups (Supplementary Table S2).

In order to validate the pre-defined clusters shown above, the fixation index (Fst value) was calculated for every pair of populations using the pre-defined groups (K = 3–5) by DAPC (Supplementary Table S2). In all cases, a contrast between populations was shown and supported the previous analysis. For K = 4, the lowest value was 0.18 between groups two (mostly genotypes considered as GxM hybrids, and some cultivars considered Guatemalan) and one (mostly cultivars considered as GxWI hybrids). The highest value was 0.61 between groups three (mostly cultivars considered as West-Indian) and two (mostly cultivars considered as GxM hybrids) (Table 3).
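
To make the pairwise differentiation values more concrete, the sketch below computes Fst between two groups from per-SNP allele counts. It uses Hudson's estimator combined across SNPs as a ratio of sums, which is a choice made here for illustration; the estimator actually applied by the software used in this work may differ, and the function name and toy counts are hypothetical.

```python
from typing import List, Tuple

# Each SNP is described by (alternative allele count, number of sampled chromosomes).
AlleleCount = Tuple[int, int]


def hudson_fst(pop1: List[AlleleCount], pop2: List[AlleleCount]) -> float:
    """Hudson's Fst for two populations, combined over SNPs as a ratio of sums."""
    if len(pop1) != len(pop2) or not pop1:
        raise ValueError("both populations need the same, non-zero number of SNPs")
    numerator = 0.0
    denominator = 0.0
    for (c1, n1), (c2, n2) in zip(pop1, pop2):
        p1 = c1 / n1
        p2 = c2 / n2
        # Squared frequency difference, corrected for finite sample size.
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return numerator / denominator


# Hypothetical counts: one fixed difference and one SNP with equal frequencies.
group_a = [(20, 20), (10, 20)]
group_b = [(0, 20), (10, 20)]
fst = hudson_fst(group_a, group_b)
print(round(fst, 4))
```

Values near 0 indicate little differentiation and values near 1 indicate nearly fixed differences, which is the scale on which the 0.18 and 0.61 reported above should be read.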

Table 3 Fst genetic differentiation of 71 avocado accessions grouped by K = 4.

Nucleotide diversity was also studied for each cluster using different indexes (Pi and Watterson’s Theta) (Table 4). For K = 4, Pi ranged from 270.14 to 515.27, and Watterson’s Theta ranged from 304.74 to 471.15. A higher diversity was obtained in the cluster with mainly Mexican genotypes, followed by the cluster with mainly West Indian and Guatemalan genotypes, whereas a lower diversity was shown in the group with mainly GxM hybrids.
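
Both diversity indexes can be computed from the alternative allele counts of biallelic SNPs. The sketch below sums them over sites without dividing by sequence length, which would be in line with the magnitude of the values reported above; that interpretation, the function names and the toy counts are assumptions made for illustration.

```python
from typing import List


def nucleotide_diversity(alt_counts: List[int], n_chromosomes: int) -> float:
    """Pi: expected number of pairwise differences, summed over biallelic sites."""
    n = n_chromosomes
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    return sum(2 * c * (n - c) / (n * (n - 1)) for c in alt_counts)


def watterson_theta(alt_counts: List[int], n_chromosomes: int) -> float:
    """Watterson's theta: segregating sites divided by the harmonic number a_n."""
    n = n_chromosomes
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    segregating = sum(1 for c in alt_counts if 0 < c < n)
    a_n = sum(1 / i for i in range(1, n))
    return segregating / a_n


# Hypothetical example: three sites scored in four chromosomes (two diploid samples).
toy_counts = [2, 1, 0]
pi_value = nucleotide_diversity(toy_counts, 4)
theta_value = watterson_theta(toy_counts, 4)
print(round(pi_value, 4), round(theta_value, 4))
```

Pi weights each site by its allele frequencies whereas Watterson's theta only counts segregating sites, so the two respond differently to an excess of rare alleles.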

Table 4 Nucleotide diversity statistics according to population structure (K = 3, K = 4, and K = 5) performed by DAPC.

The genetic diversity per group established by DAPC and minor allele frequencies were also analyzed. The highest observed heterozygosity (0.20) was shown in the cluster with mainly Mexican race cultivars and, in the case of minor allele frequencies, the highest values (0.11) were observed in the same group (Table 5).
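
As a minimal sketch of how the per-group summary statistics are defined, the code below derives observed heterozygosity, expected heterozygosity and minor allele frequency for biallelic SNPs coded as the number of alternative alleles (0, 1 or 2) per diploid accession with no missing data. Expected heterozygosity is taken as 2pq without a sample-size correction, which is an assumption; the names and the toy genotypes are hypothetical.

```python
from typing import Dict, List


def locus_stats(genotypes: List[int]) -> Dict[str, float]:
    """Ho, He and MAF for one biallelic SNP scored as 0/1/2 alternative alleles."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    alt_freq = sum(genotypes) / (2 * n)
    return {
        "ho": sum(1 for g in genotypes if g == 1) / n,  # share of heterozygotes
        "he": 2 * alt_freq * (1 - alt_freq),            # 2pq under random mating
        "maf": min(alt_freq, 1 - alt_freq),
    }


def mean_stats(matrix: List[List[int]]) -> Dict[str, float]:
    """Average the per-locus statistics over all SNPs (one row per SNP)."""
    per_locus = [locus_stats(row) for row in matrix]
    return {key: sum(s[key] for s in per_locus) / len(per_locus) for key in ("ho", "he", "maf")}


# Hypothetical genotypes: two SNPs scored in four accessions.
toy_matrix = [
    [0, 1, 2, 1],
    [0, 0, 0, 1],
]
group_summary = mean_stats(toy_matrix)
print(group_summary)
```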

Table 5 Proportion of observed heterozygosity (Ho) and average minor allele frequency for K = 3, K = 4, and K = 5.

Assignment of genotypes of unknown or confusing pedigree to established groups

Based on the above analyses, the assignment of some genotypes of unknown or confusing pedigree to racial groups could be established. Among known genotypes with ambiguous racial assignments, examples include ‘Bacon’, ‘Edranol’, ‘Fuerte’, ‘Gem’, ‘Gwen’, ‘Hass’, ‘Lyon’, ‘Pinkerton’, ‘Toro Canyon’ and ‘TX531’ which have been considered by different authors as pure Mexican⁴⁰, Guatemalan^4,12,41 or GxM hybrids^4,11,12 (Table 2). The ADMIXTURE results obtained in this work indicate that all are indeed GxM hybrids, although in ‘Edranol’ a West Indian component was also found. Some samples whose pedigree was unknown (‘A0.25’, ‘A0.68’, ‘87.17.1’, ‘1.14.2’ and ‘Alcaraz’) seem to be GxM hybrids although some probably are three-race hybrids with a low proportion of West Indian heritage. Other accessions (‘Mike’ and ‘Mrs Tooley’) seem to be pure Guatemalan whereas others (‘Hansie’ and ‘C.A. Bueno’) appear as pure Mexican.

Discussion

Although numerous crop breeding programs are benefiting from new molecular genotyping approaches, these advances are slower in most woody perennial species and especially in tropical and subtropical fruit crops since, in most cases, no previous significant genomic information is available. Regarding avocado, in spite of the different ongoing breeding programs and the different types of molecular markers that have been developed and used in the last two decades^{5,8,10,14,15,16,17,18,19,28,29,30,31,40,42,43}, there is still a need to generate additional markers that can be used at a large scale, especially to link molecular markers to most of the traits of agronomic interest, which are controlled by multiple genes. Therefore, the use of new approaches such as high throughput sequencing can fill this gap in order to speed up avocado breeding, as has occurred in other crops.

A draft ‘Hass’ avocado genome for diversity analyses

In this study a fragmented avocado (cv. ‘Hass’) genome with small contigs was developed. This fragmentation presents several limitations for genomic studies, such as the impossibility of performing a gene structure annotation and, consequently, of using the assembly for gene discovery. Nevertheless, this draft genome allowed aligning the reads from a reduced-representation approach and obtaining a high number of molecular markers. Since the use of non-reference variant calling approaches such as Stacks⁴⁴, TASSEL-UNEAK⁴⁵ and GBS-SNP-CROP⁴⁶ can increase the possibility of variant miscalls^46,47,48, the approach followed in this work using a fragmented genome draft is appropriate to reduce this problem. Previous studies have developed some SNP markers in avocado^{28,29,30,31,43} but, to our knowledge, this is the first time that an avocado draft genome has been used to facilitate SNP calling from reduced-representation sequencing. Work is currently underway to generate a reference genome of avocado starting from the draft ‘Hass’ genome developed in this work.

Diversity analyses and population structure

A total of 7,108 Single-Nucleotide Polymorphisms (SNPs) were detected for the 71 accessions studied using a ‘Hass’ draft genome to align the reads. These molecular markers showed a higher proportion of transitions (61.04%) than transversions (38.96%). This is commonly known as ‘transition bias’; it is explained by the fact that transitions are more conservative in their effect on proteins, and it has been reported in previous studies with different crops including avocado^28,49,50,51. Probably due to the lack of sterility barriers between the avocado horticultural races, a low percentage (19.45%) of private SNPs was observed.

The average observed heterozygosity (0.16) was lower than the values reported in other studies based on simple sequence repeat (SSR) markers^15,16,17, which analyzed accessions different from those studied in this work. Such differences between SNP and SSR markers have been obtained in other studies^50,52 and were expected considering the nature of SSRs^49,53. The observed heterozygosity was also lower than that reported in other woody perennial crops such as peach, litchi or olive^54,55,56. These differences could be due to the kind of accessions considered. The avocado market worldwide is currently dominated by a single cultivar, ‘Hass’, whereas in other fruit crops, such as peach and olive, a wide range of cultivars is grown around the world. ‘Hass’ or ‘Hass’ descendants, such as ‘Gwen’, are part of the pedigree of different varieties in the GxM group (the most represented in this study) and this biased selection could result in a decrease of heterozygosity.

In this work, different analyses utilizing SNP markers (PCA, Neighbour-Joining, ADMIXTURE, STRUCTURE, and DAPC) were performed. These show a clear separation between horticultural races, although with exceptions in some STRUCTURE and DAPC results, in which a clear distinction between genotypes considered as Guatemalan and GxM hybrids was not obtained for K = 4, in contrast to ADMIXTURE, with which a separation between those two groups was found. This difficulty in separating both groups was expected since Guatemalan genes predominate in current avocado germplasm⁵⁷. Moreover, as there are no sterility barriers among the botanical races, admixture between different races may have occurred during avocado evolutionary history and domestication processes². In any case, overall, the clustering inferred with DAPC resulted in lower admixture among accessions than that inferred with either STRUCTURE or ADMIXTURE. Similar results of genetic admixture underestimation with DAPC have been shown in other studies and could be due to overestimation of posterior membership probability by DAPC^58,59. Interestingly, at K = 5 a new subgroup is obtained with ADMIXTURE (Fig. 2b) in the GxM group. This new group could represent accessions with a higher Mexican component.

The group with mainly Mexican race accessions shows the highest genetic diversity and the highest proportion of private SNPs (46.42%) (Supplementary Table S3) together with a high observed heterozygosity. Similar results were also obtained in other studies^11,12,16. Regarding the genetic diversity results, it should be noted that the group with mainly Guatemalan accessions and the group with mainly Mexican accessions show a higher genetic diversity than the GxM hybrid group, despite their lower sample size. The results obtained also show a clear separation of West Indian accessions from the two other horticultural races as has been reported in previous studies^9,16,18,40 using a lower number of molecular markers. This is expected taking into account that the Mexican and Guatemalan races have a common ecological niche, in the tropical highlands, whereas the West Indian race is adapted to lowlands in Central America².

Assignment of genotypes of unknown pedigree to established groups

In avocado, the main criteria used to assign genotypes to the three botanical races have been based on morphological traits and, since most of the accessions were developed from chance seedlings, their pedigree is unknown. The approach followed in this work allowed the assignment of some unknown or unclear genotypes to established groups. In agreement with previous work⁴⁰, admixture among the three botanical races is shown for some cultivars, although GxM genotypes account for most of the accessions studied. These hybrids represent the most important avocado cultivars grown worldwide.

In this study, the development of a high number of SNPs after mapping the raw reads to a draft avocado (cv. ‘Hass’) genome has allowed the genotyping and efficient discrimination of avocado accessions, revealing a clear grouping based on racial origin. The SNP markers developed are a public resource that will be useful for future studies of avocado germplasm management and characterization, Genomic Selection (GS), Marker Assisted Selection (MAS), Genome Wide Association Studies (GWAS) or Quantitative Trait Loci (QTL) analyses and, consequently, will help to significantly reduce breeding costs in this crop. However, this progress will need additional studies to increase the number of available markers in order to have an optimum number of markers in the different avocado breeding populations.

Methods

Plant material

Seventy one avocado (Persea americana Mill.) accessions were selected and young leaves were collected in the field. The accessions analyzed combine genotypes from the different avocado races obtained from breeding programs (such as ‘Gem’, ‘Gwen’, ‘Iriet’ or ‘Lamb Hass’), commercial varieties (‘Bacon’, ‘Choquette’, ‘Edranol’, ‘Fuerte’, ‘Hass’ or ‘Reed’), rootstocks (‘Dusa’, ‘Thomas’ or ‘Toro Canyon’) and local Spanish accessions with interest as possible source of new rootstocks (‘La Piscina’ or ‘C.A. Bueno’). Those accessions are maintained in three different germplasm collections: IHSM La Mayora (IM; Algarrobo Costa, Spain), Westfalia Fruit (WF; Tzaneen, South Africa) and the US National Avocado Germplasm Repository (UA; Miami, FL, US) (Table 2). Two different samples of ‘Hass’ from two different germplasm collections were included in the analyses as control of the results obtained.

DNA extraction, library preparation, sequencing and processing the raw reads

DNA from leaves of each accession was isolated using a Qiagen DNeasy Plant Mini Kit following the manufacturer’s guidelines. The DNA purity and concentration were determined using a NanoDrop spectrophotometer and a Qubit 2.0 Fluorometer. The choice of restriction enzyme for library preparation was optimized on a ‘Hass’ genomic DNA sample digested with PstI, EcoT22I, and ApeKI. The DNA fragment distribution was assessed with an Agilent 2100 Bioanalyzer System. Libraries were prepared following the protocol of Sonah et al.⁶⁰, digesting 100 ng of genomic DNA of each variety with ApeKI. The resulting libraries were sequenced with the Illumina HiSeq 2500 platform (1 × 100) at the Duke Center for Genomics and Computational Biology and the Illumina HiSeq 4000 platform (2 × 150) at the Novogene Corporation.

The raw reads were demultiplexed using the GBSx package⁶¹. Reads were then processed to remove possible adapter sequences, discard reads shorter than 50 bases and filter low-quality regions using Fastq-mcf software version 1.04.807⁶² (-l 50 and -q 30).
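
The effect of the two thresholds can be illustrated with a simplified Python sketch that trims bases below Q30 from both ends of a read and discards the read if fewer than 50 bases remain. This is not a reimplementation of the trimming software used in this work, whose algorithm also handles adapters and other cases; the Phred+33 encoding, the function name and the toy reads are assumptions.

```python
from typing import Optional, Tuple


def trim_and_filter(
    seq: str,
    qual: str,
    min_quality: int = 30,
    min_length: int = 50,
    phred_offset: int = 33,
) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from both read ends; return None if the read gets too short."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    scores = [ord(ch) - phred_offset for ch in qual]
    start, end = 0, len(scores)
    # Remove low-quality bases from the 3' end, then from the 5' end.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    while start < end and scores[start] < min_quality:
        start += 1
    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
kept = trim_and_filter("A" * 60, "I" * 55 + "#" * 5)
dropped = trim_and_filter("A" * 52, "I" * 45 + "#" * 7)
print(len(kept[0]) if kept else None, dropped)
```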

A draft avocado (cv.‘Hass’) genome assembly

In order to map the reads to a draft avocado genome, the ‘Hass’ genotype was sequenced (2 × 150) with a depth of 100X using the Illumina platform. The genome size and heterozygosity were estimated using the Kmer distribution approach described in Liu et al. 2013⁶³. In brief, Kmer distributions for 19, 25, 31, 37, 43, 55, 61, 67, 73 and 85-mers were calculated with Jellyfish and then loaded in the GenomeScope web portal⁶⁴. Two different assemblers were used to assemble the Illumina reads, Minia⁶⁵ and SOAPdenovo2⁶⁶. Although both of them use algorithms for de novo short read assemblies, Minia requires lower computational resources than SOAPdenovo2 and filters false positives⁶⁵. Kmer sizes ranging from 17 to 115-mers (steps of 8) were used with both assemblers. The statistics of the assembled contigs were compared across the different conditions and assemblers, and the assembly produced by Minia⁶⁵ with a Kmer of 115 was selected as the one that produced the most contiguous assembly, as reported in other studies⁶⁵. Contigs were scaffolded using SSPACE v3.0⁶⁷.
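
The idea behind Kmer-based genome size estimation can be sketched with a naive calculation: the total number of Kmer occurrences divided by the depth of the main peak of the Kmer histogram. The portal used in this work fits a statistical model that also accounts for heterozygosity and sequencing errors, so the code below only illustrates the principle; the function name, the error cutoff and the toy histogram are hypothetical.

```python
from typing import Dict, Tuple


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> Tuple[float, int]:
    """Naive estimate: total Kmer occurrences divided by the peak depth.

    histogram maps a depth to the number of distinct Kmers seen at that depth.
    Depths below min_depth are ignored as likely sequencing errors (an assumed cutoff).
    In a highly heterozygous genome the tallest peak may be the heterozygous one,
    which this simple approach does not handle.
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no histogram bins at or above the minimum depth")
    peak_depth = max(usable, key=usable.get)
    total_occurrences = sum(depth * count for depth, count in usable.items())
    return total_occurrences / peak_depth, peak_depth


# Hypothetical histogram: an error tail at depths 1-2 and a main peak at depth 50.
toy_histogram = {1: 1000, 2: 500, 48: 10, 49: 20, 50: 40, 51: 20, 52: 10}
size_estimate, peak = estimate_genome_size(toy_histogram)
print(size_estimate, peak)
```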

Mapping, SNP discovery and filtering

The generated reads were mapped with BWA version 0.7.10-r789⁶⁸ with default parameters. Unmapped reads were removed using Samtools version 1.3.1⁶⁹ and BAM files were produced with the retained reads. All BAM files were merged with Bamaddrg (https://github.com/ekg/bamaddrg), and Samtools version 1.3.1⁶⁹ was used to sort and index the BAM files. FreeBayes version 0.9.20⁷⁰ was run to detect variants and remove SNPs with mapping quality <20 and read depth <5. The raw SNPs obtained were further filtered using the VCFtools package version 0.1.12⁷¹, removing non-biallelic SNPs, SNPs with missing data and SNPs within 1000 bp of each other. Before and after filtering, summary statistics were generated using Vcf-stats version 0.1.12⁷¹. Finally, only SNP variants were retained; their diversity was analyzed using the Adegenet package version 2.1.1⁷² and Hardy-Weinberg equilibrium was tested using the pegas package version 0.10⁷³.
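
The three site-level filters described above (biallelic SNPs only, no missing genotypes, and a minimum spacing of 1000 bp) can be sketched as follows. The record structure, the function name and the order in which the filters are applied are assumptions made for illustration, and the boundary handling of the spacing filter may differ from that of the filtering software.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class SnpRecord(NamedTuple):
    """Hypothetical minimal variant record; None marks a missing genotype."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]
    genotypes: Tuple[Optional[int], ...]


def filter_snps(records: List[SnpRecord], min_distance: int = 1000) -> List[SnpRecord]:
    """Keep biallelic SNPs without missing data, spaced at least min_distance apart."""
    kept: List[SnpRecord] = []
    last_kept_pos: Dict[str, int] = {}
    for rec in sorted(records, key=lambda r: (r.chrom, r.pos)):
        is_biallelic_snp = len(rec.ref) == 1 and len(rec.alts) == 1 and len(rec.alts[0]) == 1
        if not is_biallelic_snp:
            continue
        if any(gt is None for gt in rec.genotypes):
            continue
        previous = last_kept_pos.get(rec.chrom)
        if previous is not None and rec.pos - previous < min_distance:
            continue
        kept.append(rec)
        last_kept_pos[rec.chrom] = rec.pos
    return kept


# Hypothetical records on two contigs.
toy_records = [
    SnpRecord("c1", 100, "A", ("G",), (0, 1, 2)),
    SnpRecord("c1", 600, "A", ("T",), (0, 0, 1)),      # too close to position 100
    SnpRecord("c1", 1500, "C", ("T", "G"), (0, 1, 2)),  # more than one alternative allele
    SnpRecord("c1", 1600, "G", ("A",), (0, None, 1)),   # missing genotype
    SnpRecord("c1", 2100, "T", ("C",), (1, 1, 0)),
    SnpRecord("c2", 50, "A", ("C",), (0, 1, 1)),
]
filtered = filter_snps(toy_records)
print([(r.chrom, r.pos) for r in filtered])
```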

Analysis of the genetic structure of diverse avocado accessions

In order to show the usefulness of the SNPs generated, the genetic relationships, genetic structure and group divergence of 71 avocado accessions were thoroughly analyzed using different methods such as PCA, NJ distance tree, DAPC and Bayesian clustering, as well as genetic properties of these populations through parameters such as Fst, Pi and Watterson’s theta.

PCA was performed using Adegenet package version 2.1.1⁷² and was plotted using ggplot2 packages version 3⁷⁴ in RStudio version 1.1.453⁷⁵ and R version 3.5.1.

Prevosti’s distance³⁶ between two populations a and b was computed as \(D_{\mathrm{Prevosti}}(a,b)=\frac{1}{2\upsilon }\sum_{k=1}^{\upsilon }\sum_{j=1}^{m(k)}|P_{ajk}-P_{bjk}|\), where \(\upsilon \) is the number of loci considered, \(m(k)\) the number of alleles at locus k, \(P_{ajk}\) the frequency of allele j at locus k in population a, and \(P_{bjk}\) the corresponding value in population b. The distance matrix and the Neighbor-joining (NJ) tree were generated via the Poppr package version 2.8.2^76,77 with 2000 bootstrap replicates using the SNP data set. The figures were plotted with FigTree version 1.4.4⁷⁸.
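
A minimal Python sketch of this distance is given below, under the assumption that the sum of absolute allele-frequency differences is normalised by twice the number of loci. Each sample or population is represented as one dictionary of allele frequencies per locus; this data layout, the function name and the toy values are hypothetical and do not reflect the internals of the package used in this work.

```python
from typing import Dict, List

# One dictionary per locus, mapping allele label to its frequency.
LocusFrequencies = Dict[str, float]


def prevosti_distance(a: List[LocusFrequencies], b: List[LocusFrequencies]) -> float:
    """Sum of absolute allele-frequency differences divided by twice the number of loci."""
    if len(a) != len(b) or not a:
        raise ValueError("both inputs need the same, non-zero number of loci")
    total = 0.0
    for freqs_a, freqs_b in zip(a, b):
        for allele in set(freqs_a) | set(freqs_b):
            total += abs(freqs_a.get(allele, 0.0) - freqs_b.get(allele, 0.0))
    return total / (2 * len(a))


# Hypothetical diploid samples: they differ completely at locus 1 and are identical at locus 2.
sample_a = [{"A": 1.0}, {"C": 0.5, "T": 0.5}]
sample_b = [{"G": 1.0}, {"C": 0.5, "T": 0.5}]
distance = prevosti_distance(sample_a, sample_b)
print(distance)
```

With this normalisation the distance lies between 0 and 1 and, for individual genotypes, can be read as the fraction of allelic differences across loci.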

The population structure was studied with three different approaches (ADMIXTURE, STRUCTURE and DAPC). The three programs basically assign each of the accessions to one or more ancestral populations or clusters. They differ in how the data are processed and in the algorithm used. Maximum likelihood estimation of individual ancestries was performed with ADMIXTURE version 1.3³⁷, which was run iterating K from 1 to 20. This analysis is based on the same statistical model as STRUCTURE, although it performs a maximum likelihood estimation of individual ancestries instead of a Bayesian approach and, consequently, allows a faster cluster estimation from a large SNP dataset. Furthermore, in order to choose the optimum number of populations (K), a cross-validation approach was used for all the SNPs. Each chosen value of K was plotted using RStudio version 1.1.453⁷⁵ and R version 3.5.1. The STRUCTURE program was run five times for each number of populations (K). Each run was implemented with a burn-in period of 20000 steps followed by 200000 Markov chain Monte Carlo replicates^79,80,81. The method of Evanno et al.⁸² was used to determine the most probable value of K with the software STRUCTURE HARVESTER³⁹. Subsequently, since STRUCTURE-like approaches assume that markers are not linked and that populations are panmictic³⁸, Discriminant Analysis of Principal Components (DAPC) was also applied in order to identify and describe well-defined clusters of genetically related genotypes using the R package Adegenet version 2.1.1⁷². To perform this analysis, data were transformed using PCA. The find.clusters function was used to identify the number of clusters. The Bayesian Information Criterion (BIC) was calculated to select the number of subgroups, and a cross-validation function (xvalDapc) was used to corroborate the best number of principal components to retain. Before this analysis, the files were read using read.vcf and converted into genind and genlight objects with vcfR2genind and vcfR2genlight.
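
The Evanno criterion used to pick K can be illustrated with a short Python sketch. For each K it takes the absolute second-order difference of the mean log-likelihood across replicate runs and divides it by the standard deviation of the log-likelihood at that K. The log-likelihood values below are invented for the example and the function name is hypothetical; the calculation is a sketch of the general method rather than the exact procedure of the software used in this work.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(log_likelihoods: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using run means for L.

    Delta K is only defined for values of K that have both neighbours,
    at least two replicate runs and a non-zero standard deviation.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    delta: Dict[int, float] = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue
        if len(log_likelihoods[k]) < 2:
            continue
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue
        delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / spread
    return delta


# Invented log-likelihoods for three replicate runs at K = 2..5.
toy_runs = {
    2: [-900.0, -902.0, -898.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -792.0, -788.0],
    5: [-785.0, -787.0, -783.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because the statistic needs both neighbours, it cannot be evaluated for the smallest or largest K tested, which is one reason to scan a range wider than the values of interest.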

Finally, the Fixation index (Fst) which allows differentiating populations with ranges between 0 (no differentiation) and 1 (complete differentiation)⁸³ was also obtained with the R package PopGenome version 2.6.1⁸⁴ to analyze group distinction. Moreover, Nucleotide diversity statistics Pi and Watterson’s theta were estimated considering the grouping produced by DAPC, K = 3, K = 4, and K = 5 and were also determined with the same package.

Data availability

The ‘Hass’ draft genome raw reads have been deposited at NCBI under the BioProject PRJNA564097. The GBS dataset is deposited under PRJNA564105. Most of the analyses have been carried out using R software 3.5.1. All scripts have been deposited at https://github.com/IHSMFruitCrops/Hass-genotyping.

Change history

21 February 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41598-023-29346-w

References

Chase, M. W. et al. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181(1), 1–20 (2016).
Schaffer, B., Wolstenholme, B. N. & Whiley, A. W. Introduction in The Avocado: Botany, Production, and Uses. (eds. Schaffer, B., Wolstenholme, B. N & Whiley, A. W.) 1–9 (CABI, Wallingford, UK, 2013).
FAO. Statistics Division of Food and Agriculture Organization of the United Nations (FAOSTAT) http://www.fao.org/faostat/es/#data/QC (Accessed September 13th 2019).
Crane, J. H. et al. Cultivars and rootstocks in The Avocado: Botany, Production, and Uses (eds. Schaffer, B., Wolstenholme, B. N & Whiley, A. W.) 1–9 (CABI, Wallingford, UK, 2013).
Lavi, U., Hillel, J. & Vainstein, A. Application of DNA fingerprints for identification and genetic analysis of avocado. J. Am. Soc. Hort. Sci. 116, 1078–1081 (1991).
Mhameed, S. et al. Level of heterozygosity and mode of inheritance of variable number of tandem repeat loci in avocado. J. Am. Soc. Hort. Sci. 121, 778–782 (1996).
Fiedler, J., Bufler, G. & Bangerth, F. Genetic relationships of avocado (Persea americana Mill.) using RAPD markers. Euphytica 101, 249–255 (1998).
Furnier, G. R., Cummings, M. P. & Clegg, M. T. Evolution of the avocados as revealed by DNA restriction site variation. J. Hered. 81, 183–188 (1990).
Davis, J., Henderson, D., Kobayashi, M., Clegg, M. T. & Clegg, M. T. Genealogical relationships among cultivated avocado as revealed through RFLP analysis. J. Hered. 89, 319–323 (1998).
Sharon, D. et al. An integrated genetic linkage map of avocado. Theor. Appl. Genet. 95, 911–921 (1997).
Schnell, R. J. et al. Evaluation of avocado germplasm using microsatellite markers. J. Am. Soc. Hort. Sci. 128, 881–889 (2003).
Ashworth, V. E. T. M. & Clegg, M. T. Microsatellite markers in avocado (Persea americana Mill.): genealogical relationships among cultivated avocado genotypes. J. Hered. 94, 407–415 (2003).
Ashworth, V. E. T. M., Kobayashi, M. C., De La Cruz, M. & Clegg, M. T. Microsatellite markers in avocado (Persea americana Mill.): development of dinucleotide and trinucleotide markers. Sci. Hortic. 101, 255–267 (2004).
Borrone, W. J., Schnell, R. J., Viola, H. A. & Ploetz, R. C. Seventy microsatellite markers from Persea americana Miller (avocado) expressed sequences tags. Mol. Ecol. Notes 7, 439–444 (2007).
Alcaraz, M. L. & Hormaza, J. I. Molecular characterization and genetic diversity in an avocado collection of cultivars and local Spanish genotypes using SSRs. Hereditas 144, 244–253 (2007).
Gross-German, E. & Viruel, M. A. Molecular characterization of avocado germplasm with a new set of SSR and EST-SSR markers: genetic diversity, population structure, and identification of race-specific markers in a group of cultivated genotypes. Tree Genet. Genomes 9, 539–555 (2013).
Guzmán, L. F. et al. Genetic structure and selection of a core collection for long term conservation of avocado in Mexico. Front. Plant. Sci. 8, 243, https://doi.org/10.3389/fpls.2017.00243 (2017).
Boza, J. E. et al. Genetic differentiation, races and interracial admixture in avocado (Persea americana Mill.), and Persea spp. evaluated using SSR markers. Genet. Resour. Crop. Ev. 65, 1195–1215 (2018).
Ge, Y. et al. Transcriptome sequencing of different avocado ecotypes: de novo transcriptome assembly, annotation, identification and validation of EST-SSR Markers. Forests 10, 411, https://doi.org/10.3390/f10050411 (2019).
Ching, A. et al. SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genetics 3, 19, https://doi.org/10.1186/1471-2156-3-19 (2002).
Rasheed, A. et al. Crop breeding chips and genotyping platforms: progress, challenges, and perspectives. Mol. Plant 10, 1047–1064 (2017).
Scheben, A., Batley, J. & Edwards, D. Genotyping-by-sequencing approaches to characterize crop genomes: choosing the right tool for the right application. Plant Biotechnol. J. 15, 149–161 (2017).
Studer, B. & Kölliker, R. SNP Genotyping Technologies. In Diagnostics in Plant Breeding (eds. Lübberstedt, T. & Varshney, R. K.) (Springer Science + Business Media Dordrecht, 2013).
Chagné, D. et al. Development of a set of SNP markers present in expressed genes of the apple. Genomics 92, 353–358 (2008).
Wang, B., Tan, H. W. & Fang, W. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm. Hortic. Res. 2, 14065, https://doi.org/10.1038/hortres.2014.65 (2015).
Ibarra-Laclette, E. et al. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids. BMC Genomics 16, 599, https://doi.org/10.1186/s12864-015-1775-y (2015).
Vergara-Pulgar, C. et al. De novo assembly of Persea americana cv. “Hass” transcriptome during fruit development. BMC Genomics 20, 108, https://doi.org/10.1186/s12864-019-5486-7 (2019).
Kuhn, D. N. et al. Application of genomic tools to avocado (Persea americana) breeding: SNP discovery for genotyping and germplasm characterization. Sci. Hortic. 246, 1–11 (2019).
Ge, Y. et al. Genome-wide assessment of avocado germplasm determined from Specific Length Amplified Fragment sequencing and transcriptomes: population structure, genetic diversity, identification, and application of race-specific markers. Genes 10, 215, https://doi.org/10.3390/genes10030215 (2019).
Rubinstein, M. et al. Genetic diversity of avocado (Persea americana Mill.) germplasm using pooled sequencing. BMC Genomics 20, 379, https://doi.org/10.1186/s12864-019-5672-7 (2019).
Rendón-Anaya, M. et al. The avocado genome informs deep angiosperm phylogeny, highlights introgressive hybridization, and reveals pathogen-influenced gene space adaptation. PNAS 116, 17081–17089 (2019).
Wortman, J. R. et al. Annotation of the Arabidopsis genome. Plant Physiol. 132, 461–468 (2003).
Soorni, A., Fatahi, R., Salami, S. A., Haak, D. C. & Bombarely, A. Assessment of genetic diversity and population structure in Iranian cannabis germplasm. Sci Rep. 7, 15668, https://doi.org/10.1038/s41598-017-15816-5 (2017).
Shearman, J. R. et al. SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome. PLoS. One 10, e0121961, https://doi.org/10.1371/journal.pone.0121961 (2015).
Pootakham, W. et al. Genome-wide SNP discovery and identification of QTL associated with agronomic traits in oil palm using genotyping-by-sequencing (GBS). Genomics 105, 288–295 (2015).
Prevosti, A., Ocaña, J. & Alonso, G. Distance between populations of Drosophila subobscura based on chromosome arrangement frequencies. Theor. Appl. Genet. 45, 231–241 (1975).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Earl, D. A. & vonHoldt, B. M. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv. Genet. Resour. 4, 359–361 (2012).
Chen, H., Morrell, P. L., Ashworth, V. E. T. M. & Clegg, M. T. Tracing the geographic origins of major avocado cultivars. J. Hered. 100, 56–65 (2009).
Variety Database of the Univ. of California at Riverside, http://ucavo.ucr.edu/ (Accessed September 13th 2019) (2019).
Lavi, U., Cregan, P. B. & Hillel, J. Application of DNA markers for identification and breeding of fruit trees. Plant Breed. Rev. 12, 195–226 (1994).
Chen, H., Morrell, P. L. & de la Cruz, M. Nucleotide diversity and linkage disequilibrium in wild avocado (Persea americana Mill.). J Hered. 99, 382–389 (2008).
Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, W. & Postlethwait, J. H. Stacks: Building and genotyping loci de novo from short-read sequences. G3-Genes Genom. Genet. 1, 171–182 (2011).
Lu, F. et al. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS. Genet. 9, e1003215, https://doi.org/10.1371/journal.pgen.1003215 (2013).
Melo, A. T. O., Bartaula, R. & Hale, L. GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data. BMC Bioinformatics 17, 29, https://doi.org/10.1186/s12859-016-0879-y (2016).
Leggett, R. M. & MacLean, D. Reference-free SNP detection: dealing with the data deluge. BMC Genomics 15, S10, https://doi.org/10.1186/1471-2164-15-S4-S10 (2014).
Berthouly-Salazar, C. et al. Genotyping-by-Sequencing SNP identification for crops without a reference genome: using transcriptome based mapping as an alternative strategy. Front. Plant. Sci. 7, 777, https://doi.org/10.3389/fpls.2016.00777 (2016).
Taranto, F., D’Agostino, N., Greco, B., Cardi, T. & Tripodi, P. Genome-wide SNP discovery and population structure analysis in pepper (Capsicum annuum) using genotyping by sequencing. BMC Genomics 17, 943, https://doi.org/10.1186/s12864-016-3297-7 (2016).
Pootakham, W. et al. Construction of high-density integrated genetic linkage map of rubber tree (Hevea brasiliensis) using genotyping-by-sequencing (GBS). Front. Plant. Sci. 6, 367, https://doi.org/10.3389/fpls.2015.00367 (2015).
Kujur, A. et al. Employing genome-wide SNP discovery and genotyping strategy to extrapolate the natural allelic diversity and domestication patterns in chickpea. Front. Plant. Sci. 6, 162, https://doi.org/10.3389/fpls.2015.00162 (2015).
Micheletti, D. et al. Whole-Genome Analysis of diversity and SNP-major gene association in peach germplasm. Plant. Genome 5, 92–102 (2015).
Helyar, S. J. et al. Application of SNPs for population genetics of nonmodel organisms: new opportunities and challenges. Mol. Ecol. Resour. 1, 123–36 (2011).
Aranzana, M. J., Illa, E., Howad, W. & Arús, P. A first insight into peach [Prunus persica (L.) Batsch] SNP variability. Tree Genet. Genomes 8, 1359–1369 (2012).
Biton, I. et al. Development of a large set of SNP markers for assessing phylogenetic relationships between the olive cultivars composing the Israel olive germplasm collection. Mol. Breed. 35, 107 (2015).
Liu, W. et al. Identifying litchi (Litchi chinensis Sonn.) cultivars and their genetic relationships using single nucleotide polymorphism (SNP) markers. PLoS. One 10, e0135390, https://doi.org/10.1371/journal.pone.0135390 (2015).
Chanderbali, A. S., Soltis, D. E., Soltis, P. S. & Wolstenholme, B. N. Taxonomy and botany in The Avocado: Botany, Production, and Uses. (eds. Schaffer, B., Wolstenholme, B. N & Whiley, A. W.) 32–50 (CABI, Wallingford, UK, 2013).
Söderquist, P. et al. Admixture between released and wild game birds: a changing genetic landscape in European mallards (Anas platyrhynchos). Eur. J. Wildl. Res. 63, 98, https://doi.org/10.1007/s10344-017-1156-8 (2017).
Frosch, C. et al. The genetic legacy of multiple beaver reintroductions in Central Europe. PLoS. One 9, e97619, https://doi.org/10.1371/journal.pone.0097619 (2014).
Sonah, H. et al. An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping. PLoS. One 8, e54603, https://doi.org/10.1371/journal.pone.0054603 (2013).
Herten, K., Hestand, M. S., Vermeesch, J. R. & Van Houdt, J. K. J. GBSX: a toolkit for experimental design and demultiplexing genotyping by sequencing experiments. BMC Bioinformatics 16, 73, https://doi.org/10.1186/s12859-015-0514-3 (2015).
Aronesty, E. Comparison of sequencing utility programs. Open Bioinforma. J. 7, 1–8, https://doi.org/10.2174/1875036201307010001 (2013).
Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Preprint at, https://arxiv.org/abs/1308.2012 (2013).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Chikhi, R. & Rizk, G. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithm. Mol. Biol. 8, 22, https://doi.org/10.1186/1748-7188-8-22 (2013).
Luo, R. B. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18, https://doi.org/10.1186/2047-217x-1-18 (2012).
Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–9 (2011).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at, http://arxiv.org/abs/1207.3907 (2012).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Jombart, T. Adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405 (2008).
Paradis, E. Pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics 26, 419–420 (2010).
Wickham, H. Ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2009).
R core Team. R: a language and environment for statistical computing. R foundation for statistical computing, Vienna; https://www.R-project.org (Accessed September 13th 2019) (2018).
Kamvar, Z. N., Tabima, J. F. & Grünwald, N. J. Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2, e281, https://doi.org/10.7717/peerj.281 (2014).
Kamvar, Z. N., Brooks, J. C. & Grünwald, N. J. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Front. Genet. 6, 208, https://doi.org/10.3389/fgene.2015.00208 (2015).
Rambaut, A. FigTree version 1.4.4, http://tree.bio.ed.ac.uk/software/figtree/ (Accessed September 13th 2019).
Larrañaga, N. et al. A Mesoamerican origin of cherimoya (Annona cherimola Mill.): Implications for conservation of plant genetic resources. Mol. Ecol. 26, 4116–4130 (2017).
Martin, C., Herrero, M. & Hormaza, J. I. Molecular characterization of apricot germplasm from an old stone collection. PLoS. One 6, e23979, https://doi.org/10.1371/journal.pone.0023979 (2011).
Pritchard, J. K., Wen, X. & Falush, D. Documentation for structure software: version 2.3. Preprint at, http://burfordreiskind.com/wp-content/uploads/Structure_Manual_doc.pdf (Accessed September 13th 2019) (2010).
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620 (2005).
Hahn, M. W. Population structure in Molecular Population Genetics. (eds Sinauer Associates) 81–83 (Oxford University Press. U.S.A., 2018).
Pfeifer, B., Wittelsbürger, U., Ramos-Onsins, S. E. & Lercher, M. J. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–36, https://doi.org/10.1093/molbev/msu136 (2014).
Hofshi, R. Avocado database, http://www.avocadosource.com/AvocadoVarieties/QueryDB.asp (Accessed September 13th 2019).
U.S. National Plant Germplasm System, https://npgsweb.ars-grin.gov/gringlobal/search.aspx? (Accessed September 13th 2019).
Avocado information database, https://www.myavocadotrees.com/beta-avocado.html (Accessed September 13th 2019).
Wolfe, H. S., Toy, L. R. & Stahl, A. L. Avocado production in Florida. Fl. Agr. Ext. Serv. Bull. 141 (1949).
Ben-Ya’cov, A., Zilberstaine, M., Goren, M. & Tomer, E. The Israeli avocado germplasm bank: where and why the items had been collected. In Proc. V World Avocado Congress. Spain. October 19–24 (2003).

Acknowledgements

This work was supported by Ministerio de Economía y Competitividad- European Regional Development Fund. (AGL2016-77267-R). AT was supported by an FPI fellowship from Ministerio de Economía y Competitividad (BES-2014-068832). We thank T. Hasing for help in library preparation and Y. Verdún for technical assistance. The authors acknowledge Advanced Research Computing at Virginia Tech for providing computational resources and technical support that have contributed to the results reported within this paper. The authors also thank Therese Bruwer and Zelda van Rooyen (Westfalia Fruit, South Africa) for providing some of the leaf material used in this study.

Author information

Authors and Affiliations

Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora (IHSM La Mayora -UMA-CSIC), 29751, Algarrobo-Costa, Málaga, Spain
Alicia Talavera, Antonio J. Matas & Jose I. Hormaza
Department of Biotechnology, College of Agriculture, University of Technology, Isfahan, 84156-83111, Iran
Aboozar Soorni
School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
Aureliano Bombarely
Department of Biosciences Università degli Studi di Milano, Milan, Italy
Aureliano Bombarely
Departamento de Biología Vegetal, Universidad de Málaga, Málaga, Spain
Antonio J. Matas

Contributions

J.I.H., A.B., A.T. and A.J.M. conceived the experimental design. A.T. participated in the sample collection and DNA extraction. A.T. and A.S. prepared the libraries. A.T. and A.B. analyzed the data. All the authors discussed the results and contributed to the preparation of the final manuscript.

Corresponding author

Correspondence to Jose I. Hormaza.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Talavera, A., Soorni, A., Bombarely, A. et al. Genome-Wide SNP discovery and genomic characterization in avocado (Persea americana Mill.). Sci Rep 9, 20137 (2019). https://doi.org/10.1038/s41598-019-56526-4

Received: 14 May 2019
Accepted: 13 December 2019
Published: 27 December 2019
DOI: https://doi.org/10.1038/s41598-019-56526-4
