Local ancestry inference provides insight into Tilapia breeding programmes

Avallone, Alex; Bartie, Kerry L.; Selly, Sarah-Louise C.; Taslima, Khanam; Campos Mendoza, Antonio; Bekaert, Michaël

doi:10.1038/s41598-020-75744-9

Download PDF

Article
Open access
Published: 29 October 2020

Local ancestry inference provides insight into Tilapia breeding programmes

Alex Avallone¹,
Kerry L. Bartie¹,
Sarah-Louise C. Selly¹,
Khanam Taslima^1,2,
Antonio Campos Mendoza³ &
…
Michaël Bekaert¹

Scientific Reports volume 10, Article number: 18613 (2020) Cite this article

1470 Accesses
4 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Tilapia is one of the most commercially valuable species in aquaculture with over 5 million tonnes of Nile tilapia, Oreochromis niloticus, produced worldwide every year. It has become increasingly important to keep track of the inheritance of the selected traits under continuous improvement (e.g. growth rate, size at maturity or genetic gender), as selective breeding has also resulted in genes that can hitchhike as part of the process. The goal of this study was to generate a Local Ancestry Interence workflow that harnessed existing tilapia genotyping-by-sequencing studies, such as Double Digest RAD-seq derived Single-Nucleotide Polymorphism markers. We developed a workflow and implemented a suite of tools to resolve the local ancestry of each chromosomal locus based on reference panels of tilapia species of known origin. We used tilapia species, wild populations and breeding programmes to validate our methods. The precision of the pipeline was evaluated on the basis of its ability to identify the genetic makeup of samples of known ancestry. The easy and inexpensive application of local ancestry inference in breeding programmes will facilitate the monitoring of the genetic profile of individuals of interest, the tracking of the movement of genes from parents to offspring and the detection of hybrids and their origin.

Whole genome re-sequencing reveals recent signatures of selection in three strains of farmed Nile tilapia (Oreochromis niloticus)

Article Open access 13 July 2020

Linkage mapping, comparative genome analysis, and QTL detection for growth in a non-model teleost, the meagre Argyrosomus regius, using ddRAD sequencing

Article Open access 29 March 2022

Detection of selection signatures in farmed coho salmon (Oncorhynchus kisutch) using dense genome-wide information

Article Open access 06 May 2021

Introduction

Despite their prominent role in aquaculture, the volume of research involving tilapia is relatively low when compared to other fish species, like salmonids. As a result, existing research on ancestry tracing in tilapia is not abundant either. A fast and accurate method for tracing the hybridisation of admixed fish, i.e. fish of mixed ancestry, would uncover their composition, thus reconstructing their origins and even aiding identification of escapees and monitor introgression of native species. It would also help follow the movement of unwanted or unexpected traits alongside selected ones in a population, thus yielding useful information to produce more economically and environmentally favourable variants. Local Ancestry Inference (LAI) applications are more frequent in studies of dog breeds, as in the case of Alaskan sled dogs¹, in which tracing of ancestry in sprint and long-distance sled dogs contributed to the identification of the genomic regions that correlated with performance-enhancing traits. Similarly, in humans, such tools have been more widely used to analyse how past migration events have impacted existing populations² and to improve identification of ancestry-specific genetic susceptibility to disease in genome-wide association studies³.

Due to the relative scarcity of research specific to tilapia, or even fish in general, most of the literature currently available on inference of local ancestry focuses on human applications⁴. In over 15 years, more than 20 new LAI methods for human applications have been introduced⁵. Less often, relevant literature can be found on other animals like insects⁶ or, as already mentioned, dogs⁷.

While it is still possible to apply processes and tools developed for other animals to tilapia, a major obstacle persists, which is the vast difference in the amount, quality and variety of genotyped individuals available to build a reference panel. In humans, genetic studies often benefit from thousands⁸, if not tens of thousands, of individuals of certain descent, as well as publicly available data like that produced by the 1000 genomes project⁹. In tilapia, only hundreds of individuals are usually available, and it is much more difficult to accurately trace specific families, which limits the variety of the reference samples and negatively impacts the accuracy of phasing and LAI tools.

The selective breeding of tilapia revolves around the creation and maintenance of variants which would ideally display the most economically and environmentally favourable traits of their ancestors. Tilapia, and in particular Nile tilapia (Oreochromis niloticus), are highly common among breeding programmes due to their relatively short reproduction cycle, hardiness, and resistance to disease and parasites¹⁰. It has become increasingly necessary to track the inheritance of selected traits under continuous improvement, as selective breeding may also result in genes to hitchhike along in the process. Implementation of LAI in breeding programmes allows the monitoring of the genetic makeup architecture of each individual, the tracking the genes inheritance from parents to offspring, and this ensures that only loci of interest are selected by the breeding programmes.

The goal of this project was to generate a LAI workflow that harnessed existing tilapia genotyping-by-sequencing studies^11,12,13, such as Double Digest RAD-seq (ddRAD) derived Single-Nucleotide Polymorphism (SNP) markers^14,15. This provided an insight into breeding programmes with a more in-depth look at the genetic makeup of admixed individuals, significantly contributing to the identification of hybrids, and the development of new variants for aquaculture. We resolved the local ancestry of admixed individuals successfully and in detail, and the workflow was applied to the samples sourced from breeding programmes. We implemented a fast and accurate pipeline providing useful insights for breeding programmes of both tilapia and other animals, whether these are aimed at maintaining specific broodstocks or producing new variants.

Results

ddRAD library sequencing

High throughput sequencing of the animal from the four breeding programmes and additional individuals (93 individuals, Supplementary Table S1 online) produced 34,091,027 paired-end reads in total. After the filtering the reads, 82.5% of the total reads were retained (28,113,599 paired-end reads). The new reads as well as the published reads (275 samples; Supplementary Table S2 online) were mapped against the O. niloticus genome assembly (NCBI Assembly accession GCA_001858045.3). A total of 19,041 bi-allelic SNPs was extracted with a minor allele frequency (MAF) of at least 0.01, no deviation from the expected Mendelian segregation (P > 0.01) and common to at least 4 populations and 50% of their individuals (Table 1 and Supplementary Data S3 online).

Table 1 Tilapia species and populations.

Full size table

Population structures

A Multidimensional scaling (MDS) analysis of identity by state (IBS) was utilised to separate the individuals into clusters based on their genetic distance¹⁶. This process grouped individuals of same origin together, while positioning the hybrids between the populations which more heavily contributed to their genome (Fig. 1).

The O. n. filoa (CAN-M) individual originating from Lake Metahara¹⁷ was grouped with O. n. cancellatus (CAN-H/K). As their similarity has prompted a proposition for a re-classification of O. n. cancellatus as O. cancellatus, with two sub-populations, O. c. cancellatus and O. c. filoa¹⁸, these samples were grouped with the remaining O. n. cancellatus populations due to the species being virtually indistinguishable.

Most populations were clearly resolved (Fig. 1, Table 1 for abbreviations), with the exception of the breeding programmes populations (BRE-C/L/M/V and Genetically Improved Farmed Tilapia (GIFT) programmes), which remained tightly clustered together with NIL-E/K/N (O. niloticus populations, their species of expected origin). For most populations, species-specific grouping was representative of the genetic closeness of their samples, e.g. AUR-E and AUR-I (O. aureus), or the three CAN (O. n. cancellatus) sub-populations: while still distinguishable, these populations of different origin were clustered under one (species) group. The same could be said for O. niloticus: the NIL-E, NIL-K and NIL-N sub-populations were grouped under NIL. Multidimensional Scaling Analysis also highlighted the presence of outliers, especially in the form of one MOZ-Z (O. mossambicus, Zambia), one AND (O. andersonii) and one BRE-V individual (Fig. 1, marked by a *). BRE-V disparate positioning was found to be due to the high incidence of missing genotypes in some individuals, rather than due to sample impurity. Finally, the genetic closeness of some species was noted, especially of some of those only represented by a single small population (KAR and MAC, or GAL and MEL), and was expected to cause ambiguities when trying to resolve the ancestry of individual samples.

Ancestry inference

Before undergoing ancestry inference, these genotypic data were phased with BEAGLE¹⁹. Phasing is required to improve ancestry recognition, as separating the paternal and maternal contributions allows to infer the origin of each separately, since they could belong to different species or populations. Once these genotypic data are phased, RFMix separates each chromosome into a series of equally-sized windows, and the likelihood of each window belonging to each of the reference populations is calculated²⁰. For each one, a random forest is trained to recognise the ancestry based on the reference panel: each tree of the random forest infers a putative ancestry, and a sum of all the votes determines the probabilities of that window originating from each possible ancestral population.

The inference accuracy using only the 155 references samples from 10 species was optimised to minimise the fragmentation, while maximising recognition of the reference samples (Fig. 2). The final combination featured 500 BEAGLE iterations, combined with 50 EM iterations and a SNPs window size of 7.

Digital chromosome painting

Using the reference samples as a training set, the breeding programme individuals were analysed for LAI. In contrast to the reference population, all of the individual samples exhibited a relatively high level of fragmentation (5 to 30%). As expected, the main contributor of the genome composition was O. niloticus, with variable contribution of O. aureus and O. mossambicus (Fig. 3). Several individuals showed a different composition (Fig. 2, samples marked with *).

Discussion

Regardless of the species, LAI studies follow similar steps. First, a large number of markers are gathered from populations of known descent to build a reference panel²¹. The genotypes then undergo phasing to reverse crossing-overs, separating the contributions of the two parents²². The reference panel is then used to train the LAI model^21,23. This also includes a “smoothing” algorithm, which improve the results by solving phasing errors, in case the maternal and paternal contributions have been swapped at certain loci, as well as genotyping errors, i.e. mistakes in the genome sequencing process that can be identified by their dissimilarity with the rest of the genome. Finally, the results are displayed graphically, on the chromosome³, or by probability representations²⁴.

Intergenetic tilapia hybrids such as Sarotherodon melanotheron × O. niloticus as not uncommon and have been used in aquaculture to produce highly saline-tolerant hybrid²⁵. Closer species crosses have commonly observed in the wild or feral populations. The ability of the methodology to correctly identify ancestries remains directly dependent on the “purity” and closeness of the reference samples, and on the quality of their genotyping: in fact, some species in this study could not be accurately recognised due to a high incidence of missing genotypes found in their samples (e.g. AND), the small sample number available (e.g. GAL or MEL), or because of their genetic closeness with other populations of small numbers (e.g. KAR and MAC), which prevented the algorithm from distinguishing the species appropriately.

While missing genotypes can be easily detected by examining the dataset, the same cannot be said for a lack of purity of the samples. Once the individuals are sampled and only sequenced SNPs remain from them, it is difficult to verify whether samples truly belong to the species and population they may be claimed to or if they have been mislabelled. Therefore, some of the uncertainty in the re-assignment of known ancestries may derive from unknown levels of admixture in supposedly pure samples. However, the ddRAD markers did support purity of species by tight species clustering.

O. n. cancellatus, also known as Tilapia cancellata, has been assimilated to O. niloticus since the sub-species was identified²⁶, and is considered to be synonymous. The same is true for O. n. filoa. In this study these sample grouped with O. n. cancellatus. The results produced by this project, however, draw a clear distinction between NIL and CAN, as the multidimensional scaling analysis indicates that although genetically close, the groups do not overlap (Fig. 1). If samples of Tilapia cancellata were considered to be part of O. niloticus, and the two were mixed to form a supposedly pure broodstock, O. niloticus would feature not only NIL, but also CAN markers, as well as contributions from all the other species that are considered synonymous to the species. This would explain the presence of CAN contamination in the hybrids, while also justifying the difference in accuracy of recognition of CAN versus NIL samples.

Similarly, further investigation of the relationship between O. niloticus and O. aureus revealed that the significant incidence of AUR contribution in the NIL samples as well as the breeding hybrids was not an isolated occurrence of this project. Different studies^17,27 observed that the two species are likely to share a common ancestral mitochondrial haplotype: the presence of some degree of hybridisation between these two species would explain why, even though AUR is genetically distant from all other populations, the AUR species has such a significant presence in the ancestry assignment of NIL and NIL-derived samples, while also being absent from all other species.

Regarding the quality of the breeding hybrids, the analysis of their ancestry showed that, on average, more than 90% of their genome derived from O. niloticus, with only minor contributions from other species. These results showed that the pipeline is capable of confidently recognising the ancestry of admixed individuals, as the hybrids analysed were indeed descendants of O. niloticus. Fish from the GIFT breeding programme have already been shown as having a small contribution from O. mossambisus genome and O. aureus^28,29,30. The actual extent depends on the methodology used, but it ranges from 1 to 7% for O. mossambisus and 0.5% to 3% for O. aureus. In this study, which used a genome wide approach, we identified the contribution to be on average 3% and 2% for O. mossambisus and O. aureus respectively.

Hybridisation is common and affects original phenotypes in most of the areas. However, it lacks the suitable method to identify the individual derived from hybrid and introgression in a simple and inexpensive way. We report, the new pipeline which could be used in further to evaluate the most the marker-based studies without further expensive experimental sampling or sequencing. This methodology based on ddRAD SNP markers has shown itself capable of identifying the contribution of multiple ancestral populations to the genome of admixed individuals, both from a local and global perspective. If provided with a large body of fully genotyped populations of known origin, the results produced by this pipeline would contribute to a more informed breeding process for the creation and maintenance of tilapia variants.

Methods

Biological materials

Fin samples were collected from a total of 71 individuals from four breeding programmes located in Mexico: Colima (BRE-C), Morelos (BRE-M) and Veracruz (BRE-V) broodstock; descendants from the Institute of Aquaculture fish, over 15 years ago. BRE-L, YY fish were obtained from a stock originating in Costa Rica. An additional 6 O. mossambicus (3 MOS-A and 3 MOS-Z) and 16 O. niloticus (NIL) reference samples were included. Samples were stored in 95% ethanol at − 20 °C until required. Details of the samples and origins are listed in Table 1. Attempts were made to balance the sex ratios (Supplementary Table S1 online) in order to minimise any potential bias due to sex-specific regions of the genome.

Genomic DNA extraction

Purified DNA was extracted by a modified salt precipitation method³⁰. Small pieces of fin tissue were digested in 300 μL SSTNE lysis solution (0.3 M NaCl, 50 mM Tris base, 0.2 mM EDTA pH 8.0, 0.2 mM EGTA, 0.5 mM spermidine, 0.25 mM spermine and 0.1% SDS) containing 1.5 μL Proteinase K (10 mg/mL) at 55 °C overnight. Lysed samples were treated with 5 μL RNaseA (2 mg/mL) at 37 °C for 1 h and the supernatant centrifuged twice at 21,000×g after precipitation with 180 μL 5 M NaCl on ice. The resulting DNA was precipitated in an equal volume of isopropanol, washed twice in 70% ethanol and dissolved in TE buffer (10 mM Tris, 1 mM EDTA pH 8.0) until DNA quantification. The quantity and quality of DNA were assessed by measurement on a Nanodrop spectrophotometer (Labtech International Ltd, UK) and by agarose gel electrophoresis. Standardised dilutions of 8 ng/μL DNA for each sample were prepared in 5 mM Tris buffer pH 8.0 according to fluorimetry values.

Double Digest RAD library preparation and sequencing

Two libraries were constructed (Supplementary Table S1 online) following the ddRAD library preparation protocol with slight modifications¹¹. Briefly, for each library, individual DNA samples (36 ng–4.5 μL) were simultaneously digested with two high fidelity restriction enzymes (New England Biolabs, NEB, UK): SbfI (CCTGCA|GG recognition site), and SphI (GCATG|C recognition site). Digestions were incubated for 90 min at 37 °C, using 0.72 U of each enzyme in 1× CutSmart Buffer (NEB) and in a 9 μL reaction volume. The reactions were then cooled to 22 °C, 4.5 μL of a pre-made barcode/adaptor mix was added to each digested DNA sample and incubated at 22 °C for 10 min. The adaptor mix included individual-specific barcoded combinations of P1 (SbfI-compatible) and P2 (SphI-compatible) adaptors at 6 nM and 72 nM concentrations respectively, in 1× reaction buffer 2 (NEB). The adaptors included an inline five- or seven-base barcode for sample identification. Ligation was performed over 2.5 h at 22 °C by addition of a further 4.5 μL of a ligation mix including 4 mM rATP (Promega, UK), and 2000 cohesive-end units of T4 ligase (NEB) per μg DNA in 1× CutSmart buffer. Samples for each library were combined into a single pool. The pooled libraries were column-purified (MinElute PCR Purification Kit, Qiagen, UK), and eluted in 60 μL EB buffer (Qiagen, UK). Size-selection of fragments, ranging from 320 to 590 bp, was performed by agarose gel separation. Following gel purification (MinElute Gel Extraction Kit, Qiagen, UK), the eluted size-selected template DNA (65 μL in EB buffer) was PCR amplified (11–12 cycles PCR dependent on library; 32 separate 12.5 μL reactions, each with 1.25 μL template DNA) using a high fidelity Taq polymerase (Q5 Hot Start High-Fidelity DNA Polymerase, NEB). The PCR reactions were combined (400 μL total), and column-purified (MinElute PCR Purification Kit). The c. 50 μL eluate, in EB buffer, was then subjected to a further size-selection clean up using an equal volume of AMPure magnetic beads (Perkin-Elmer), to maximise removal of small fragments (less than c. 200 bp). Each final library was eluted in 15 μL EB buffer, QUBIT quantified and diluted to 10 nM stocks and sequenced in house on a separate Illumina MiSeq run (v2 chemistry, 300 cycle kit, 150 base paired-end reads).

Data origins

A total of 10 different species^11,12, along with individuals sourced from breeding programmes, were used to produce ddRAD markers³¹ following the same protocol: restriction enzymes set (SbfI and SphI), size selection (320 bp to 590 bp) and comparable sequencing platforms (150 nucleotide paired-ends), rendering their results compatible. Efforts were made to use populations with known histories, an absence of hybridisation, and from multiple locations (Table 1). The O. niloticus samples consisted of three sub-species (O. niloticus sensu stricto and O. n. filoa and O. n. cancellatus); O. aureus, O. mossambicus and Tilapia zillii (Gervais: reclassification as Coptodon zillii proposed by Dunz and Schliewen³²) comprised samples from two locations each, while O. karongae (Trewavas), O. urolepis hornorum (Norman), O. andersonii, O. macrochir, Sarotherodon galilaeus (Linnaeus) and S. melanotheron consisted of samples from one location each. As far as could be ascertained, each originated from a single wild population (in some cases then maintained and bred in captivity). Additionally, a ddRAD dataset from the popular Genetically Improved Farmed Tilapia (GIFT) breeding programme¹³ and samples from four breeding programmes in Mexico were assessed (Table 1).

Dataset preparation

Reads of low quality (i.e., with an average quality score less than 20), lacking the restriction site or having ambiguous barcodes were discarded during the samples demultiplexing stage. Retained reads were aligned against the genomic assembly of the tilapia species O. niloticus (NCBI Assembly accession GCA_001858045.3) using bwa³³ and assembled using Stack³⁴. Markers produced through ddRAD sequencing were collected from the 275 samples. All loci that were common to at least 4 populations and at least 50% of their individuals, a minor allele frequency over 0.05 and not deviating from the expected Mendelian segregation (P > 0.01) were retained, as the missing data could be inferred by imputation.

Ancestry inference

BEAGLE¹⁹ was used for the phasing of genotypes. BEAGLE performs multiple phasing iterations per SNP. After the phasing was carried out and the model was fit, the data were analysed again to obtain new estimates that allowed a better refit of the model. RFMix²³ was used for LAI. To optimise the inference accuracy using only the 155 references samples from 10 species, the number of phasing iterations, number of expectation-maximisation (EM) iterations, and the chromosomal window size were varied, and their results were compared. The combination of parameters that produced the least amount of fragmentation in theoretically pure individuals was chosen as most suitable.

Multidimensional scaling analysis

R v3.5.2³⁵ was used to carry out Multidimensional Scaling Analysis on the dataset using the package R/SNPRelate v1.16.0³⁶ to calculate the Identity-By-State (IBS) proportion for each sample.

Digital chromosome painting

Inferred local ancestry data, produced by RFMix, were visualised using R for the distribution of the local probabilities, and a dedicated script rendered the final distribution as a painted karyotype for each sample. Full scripts and pipelines are available on GitHub at https://github.com/pseudogene/fish_pedigree.

Ethical approval

Animal handling and collection was conducted under the UK Home Office guidelines and regulations [Samples MOZ-A/Z and NIL] and the Michoacán de Ocampo authority (Mexico) guidelines and regulations [Samples from the Breeding programme; BRE-C/L/M/V]. The ethical approval for the study was obtained from the University of Stirling (UK) Ethical committee. The data analytics and bioinformatics were assessed by the Institute of Aquaculture Ethical Review Committee and passed the University of Stirling Ethical Review Process.

Data availability

The raw sequencing reads generated for this article are available from the EBI/ENA, accession Numbers ERR4170942 and ERR4171069, project PRJEB38387.

Code availability

The versions, settings and parameters of the software used in this work are as follows: (1) process_radtags.pl: Stack version 2.3, parameters: -E phred33 -filter_illumina -s 20 -c -q -t 135 -inline_inline -renz_1 sbfI -renz_2 sphI; (2) bwa: version 0.7.17, default parameters; (3) ref_map.pl: Stack version 2.3, parameters: -unpaired; (4) populations: Stack version 2.3, parameters: -write-single-snp -p 4 -r 50 -R 50 -min-maf 0.05 -hwe -vcf; (5) BEAGLE: version 5.1, parameters: java -Xmx30688m -jar beagle.21Sep19.ec3.jar imp-nsteps=50 iterations=1000 window=0.7 overlap=0.07; (6) RFMix: version 2.03, parameters: -e 50.

References

Huson, H. J. et al. Breed-specific ancestry studies and genome-wide association analysis highlight an association between the myh9 gene and heat tolerance in alaskan sprint racing sled dogs. Mamm. Genome 23, 178–194. https://doi.org/10.1007/s00335-011-9374-y (2012).
Article CAS PubMed Google Scholar
Henn, B. M. et al. Genomic ancestry of North Africans supports back-to-africa migrations. PLoS Genet. 8, e1002397. https://doi.org/10.1371/journal.pgen.1002397 (2012).
Article CAS PubMed PubMed Central Google Scholar
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649. https://doi.org/10.1016/j.ajhg.2017.03.004 (2017).
Article CAS PubMed PubMed Central Google Scholar
Tang, H., Coram, M., Wang, P., Zhu, X. & Risch, N. Reconstructing genetic ancestry blocks in admixed individuals. Am. J. Hum. Genet. 79, 1–12. https://doi.org/10.1086/504302 (2006).
Article CAS PubMed PubMed Central Google Scholar
Geza, E. et al. A comprehensive survey of models for dissecting local ancestry deconvolution in human genome. Brief. Bioinform. https://doi.org/10.1093/bib/bby044 (2018).
Article PubMed Central Google Scholar
Corbett-Detig, R. & Nielsen, R. A hidden markov model approach for simultaneously estimating local ancestry and admixture time using next generation sequence data in samples of arbitrary ploidy. PLoS Genet. 13, e1006529. https://doi.org/10.1371/journal.pgen.1006529 (2017).
Article CAS PubMed PubMed Central Google Scholar
Choi, B. H. et al. Genome-wide analysis of the diversity and ancestry of korean dogs. PLoS ONE 12, e0188676. https://doi.org/10.1371/journal.pone.0188676 (2017).
Article CAS PubMed PubMed Central Google Scholar
Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98–101. https://doi.org/10.1038/nature07331 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Clarke, L. et al. The 1000 genomes project: data management and community access. Nat. Methods 9, 459–462. https://doi.org/10.1038/nmeth.1974 (2012).
Article CAS PubMed PubMed Central Google Scholar
Gupta, M. V. & Acosta, B. O. From drawing board to dining table: the success story of the gift project. NAGA, WorldFish Center Q. 27, 4–14 (2004).
Google Scholar
Syaifudin, M. et al. Species-specific marker discovery in Tilapia. Sci. Rep. 9, 13001. https://doi.org/10.1038/s41598-019-48339-2 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Syaifudin, M., McAndrew, B. J. & Penman, D. J. Species-Specific DNA Markers for Improving the Genetic Management of Tilapia. Ph. D. thesis, University of Stirling (2015).
Taslima, K. et al. Sex determination in the gift strain of Tilapia is controlled by a locus in linkage group 23. BMC Genet. 21, 49. https://doi.org/10.1186/s12863-020-00853-3 (2020).
Article CAS PubMed PubMed Central Google Scholar
Miller, M. R., Dunham, J. P., Amores, A., Cresko, W. A. & Johnson, E. A. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated dna (rad) marker. Genome Res. 17, 240–248. https://doi.org/10.1101/gr.5681207 (2007).
Article CAS PubMed PubMed Central Google Scholar
Baird, N. A. et al. Rapid snp discovery and genetic mapping using sequenced rad markers. PLoS ONE 3, e3376. https://doi.org/10.1371/journal.pone.0003376 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 374, 20150202. https://doi.org/10.1098/rsta.2015.0202 (2016).
Article ADS MathSciNet MATH Google Scholar
Bezault, E. et al. Spatial and temporal variation in population genetic structure of wild Nile tilapia (Oreochromis niloticus) across Africa. BMC Genet. 12, 102. https://doi.org/10.1186/1471-2156-12-102 (2011).
Article PubMed PubMed Central Google Scholar
Seyoum, S. & Kornfield, I. Taxonomic notes on the Oreochromis niloticus subspecies-complex (pisces: Cichlidae), with a description of a new subspecies. Can. J. Zool. 70, 2161–2165. https://doi.org/10.1139/z92-291 (1992).
Article Google Scholar
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for wole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097. https://doi.org/10.1086/521987 (2007).
Article CAS PubMed PubMed Central Google Scholar
Padhukasahasram, B. Inferring ancestry from population genomic data and its applications. Front. Genet. https://doi.org/10.3389/fgene.2014.00204 (2014).
Article PubMed PubMed Central Google Scholar
Moreno-Estrada, A. et al. Reconstructing the population genetic history of the caribbean. PLoS Genet. 9, e1003925. https://doi.org/10.1371/journal.pgen.1003925 (2013).
Article CAS PubMed PubMed Central Google Scholar
Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nat. Rev. Genet. 12, 215–223. https://doi.org/10.1038/nrg2950 (2011).
Article CAS PubMed PubMed Central Google Scholar
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. Rfmix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288. https://doi.org/10.1016/j.ajhg.2013.06.020 (2013).
Article CAS PubMed PubMed Central Google Scholar
Gravel, S. Population genetics models of local ancestry. Genetics 191, 607–619. https://doi.org/10.1534/genetics.112.139808 (2012).
Article PubMed PubMed Central Google Scholar
Lemarié, G., Baroiller, J. F., Clota, F., Lazard, J. & Dosdat, A. A simple test to estimate the salinity resistance of fish with specific application to O. niloticus and S. melanotheron. Aquaculture 240, 575–587. https://doi.org/10.1016/j.aquaculture.2004.07.014 (2004).
Article CAS Google Scholar
Nichols, J. T. A new wrasse and two new cichlids from Northeast Africa. Am. Museum Novitat 65, 1–4 (1923).
Agnèse, J. F., Adépo-Gourène, B., Abban, E. K. & Fermon, Y. Genetic differentiation among natural populations of the nile tilapia Oreochromis niloticus (teleostei, cichlidae). Heredity 79, 88–96. https://doi.org/10.1038/hdy.1997.126 (1997).
Article PubMed Google Scholar
Van Bers, N. E. M., Crooijmans, R. P. M. A., Groenen, M. A. M., Dibbits, B. W. & Komen, J. SNP marker detection and genotyping in tilapia. Mol. Ecol. Resour. 12, 932–941. https://doi.org/10.1111/j.1755-0998.2012.03144.x (2012).
Article PubMed Google Scholar
Hong Xia, J. et al. Signatures of selection in tilapia revealed by whole genome resequencing. Sci. Rep. 5, 14168. https://doi.org/10.1038/srep14168 (2015).
Article ADS CAS PubMed Central Google Scholar
Bartie, K. L. et al. Species composition in the molobicus hybrid Tilapia strain. Aquaculture https://doi.org/10.1016/j.aquaculture.2020.735433 (2020).
Article Google Scholar
Peterson, B. K., Weber, J. N., Kay, E. H., Fisher, H. S. & Hoekstra, H. E. Double digest radseq: an inexpensive method for de novo snp discovery and genotyping in model and non-model species. PLoS ONE 7, e37135. https://doi.org/10.1371/journal.pone.0037135 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Dunz, A. R. & Schliewen, U. K. Molecular phylogeny and revised classification of the haplotilapiine cichlid fishes formerly referred to as Tilapia. Mol. Phylogenet. Evol. 68, 64–80. https://doi.org/10.1016/j.ympev.2013.03.015 (2013).
Article PubMed Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics 25, 1754–1760. https://doi.org/10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
Catchen, J. M., Amores, A., Hohenlohe, P., Cresko, W. & Postlethwait, J. H. Stacks: building and genotyping loci de novo from short-read sequences. G3 Genes Genomes Genet. 1, 171–182. https://doi.org/10.1534/g3.111.000240 (2011).
Article CAS Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328. https://doi.org/10.1093/bioinformatics/bts606 (2012).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

All Authors wish to thank Mochamad Syaifudin, Khalfan M. Al Rashdi, Andrew M. Deines, Gideon Hulata, Helena D’Cotta, Jean-Francois Baroiller and Brendan J. McAndrew for the original samplings. As well as David J. Penman for his kind recommendations. This research was funded by the AQUAculture infrastructures for EXCELlence in European fish research towards 2020 (Grant reference ERC H2020 652831). The Commonwealth Scholarship Commission (UK) supported the PhD studies of K.T. (CSC reference BDCA-2013-5). The Institutional Link Newton Initiative (British Council) supported A.C.M. (Grant reference 172726393). Bioinformatic analysis for the study was also partly supported by the MASTS pooling initiative (The Marine Alliance for Science and Technology for Scotland) funded by the Scottish Funding Council (Grant reference HR09011) and contributing institutions. Publication fees were covered by the University of Stirling.

Author information

Authors and Affiliations

Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, FK9 4LA, Scotland, UK
Alex Avallone, Kerry L. Bartie, Sarah-Louise C. Selly, Khanam Taslima & Michaël Bekaert
Department of Fisheries Biology and Genetics, Bangladesh Agricultural University, Mymensingh, 2202, Bangladesh
Khanam Taslima
Faculty of Biology, Universidad Michoacana de San Nicolás de Hidalgo, 58040, Morelia, Michoacán, Mexico
Antonio Campos Mendoza

Authors

Alex Avallone
View author publications
You can also search for this author in PubMed Google Scholar
Kerry L. Bartie
View author publications
You can also search for this author in PubMed Google Scholar
Sarah-Louise C. Selly
View author publications
You can also search for this author in PubMed Google Scholar
Khanam Taslima
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Campos Mendoza
View author publications
You can also search for this author in PubMed Google Scholar
Michaël Bekaert
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.A. performed the analytic experiments, interpreted these data, wrote the initial draft of the manuscript; M.B. conceived the study, interpreted the data and wrote the final manuscript. K.T. provided the extra O. mossambicus and O. niloticus samples (NIL) and completed the ddRAD libraries. A.C.M. provided the samples from Mexico (BRE-C/M/V/L). K.L.B. completed the ddRAD-seq on the breeding programmes, interpreted the data and wrote the final manuscript. S.L.C.S. processed the breeding programmes samples. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Michaël Bekaert.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Avallone, A., Bartie, K.L., Selly, SL.C. et al. Local ancestry inference provides insight into Tilapia breeding programmes. Sci Rep 10, 18613 (2020). https://doi.org/10.1038/s41598-020-75744-9

Download citation

Received: 16 June 2020
Accepted: 12 October 2020
Published: 29 October 2020
DOI: https://doi.org/10.1038/s41598-020-75744-9

This article is cited by

Genetic resources of Nile tilapia (Oreochromis niloticus Linnaeus, 1758) in its native range and aquaculture
- Temesgen Tola Geletu
- Jinliang Zhao
Hydrobiologia (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Whole genome re-sequencing reveals recent signatures of selection in three strains of farmed Nile tilapia (Oreochromis niloticus)

Linkage mapping, comparative genome analysis, and QTL detection for growth in a non-model teleost, the meagre Argyrosomus regius, using ddRAD sequencing

Detection of selection signatures in farmed coho salmon (Oncorhynchus kisutch) using dense genome-wide information

Introduction

Results

ddRAD library sequencing

Population structures

Ancestry inference

Digital chromosome painting

Discussion

Methods

Biological materials

Genomic DNA extraction

Double Digest RAD library preparation and sequencing

Data origins

Dataset preparation

Ancestry inference

Multidimensional scaling analysis

Digital chromosome painting

Ethical approval

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Genetic resources of Nile tilapia (Oreochromis niloticus Linnaeus, 1758) in its native range and aquaculture

Comments

Search

Quick links