Introduction

Sexual interactions between species, also termed reproductive interference, were first documented by behavioral biologists in 1929 and have been observed in a wide range of animal taxa such as arthropods, insects, and vertebrates (Groning and Hochkirch 2008). Although cichlid fish species differ in traits relevant for mating preferences, courtship behavior, timing, and location, they can be readily hybridized in the laboratory to produce viable offspring (Stelkens et al. 2010). This makes cichlid hybrids an ideal model system for understanding the genetics of adaptation and speciation for these traits as well as reproductive interference (Henning and Meyer 2014). Nevertheless, there are no published reports of causative genes that affect behavioral sexual incompatibility between cichlids.

In tilapia, courtship behavior is known to involve chemical, visual, and auditory sensations (Keller-Costa et al. 2014; Longrie et al. 2013; Simoes et al. 2015). Of the multiple genomic loci that mediate these sensory pathways, genes that mediate olfactory stimuli conveyed by the urine may have a major impact (Keller-Costa et al. 2014; Simoes et al. 2015). Based on blood concentration profiles, expression pattern, level of glycosylation, and protein structures, two families of major urinary protein (MUP) orthologs in tilapia, male-specific protein (MSP) and tributyltin-binding protein (TBTBP) gene families, were proposed to have a role as reproductive pheromones (Shirak et al. 2008; Shirak et al. 2012). Pheromones are mainly detected by their binding to neuronal G protein-coupled receptors, and little is known about the function of these receptors in cichlids (Keller-Costa et al. 2015). In cichlids, chemosensory receptors may belong to three gene families: trace amine-associated receptors (TAARs) and vomeronasal type 1 and 2 receptors (V1R and V2R). As fish lack a vomeronasal organ, this nomenclature follows their orthology to similar gene families in tetrapods (Keller-Costa et al. 2015). Metabolites such as F prostaglandins, C18, C19, and C21 sex steroid derivatives, and amino and bile acids have pheromonal function in phylogenetically distant fish species (Sorensen and Baker 2015). Yet, individuals of closely related species can distinguish between conspecific and allospecific partners. It was hypothesized that, while the set of pheromones is common among fish, their mixtures, behavioral context, timing, and location convey the species specificity (Stacey 2015). In goldfish and common carp, initial communal signals promoting female ovulation are not species specific. However, advanced stages of courtship result in interspecific barriers mostly by visual and auditory sensations (Kobayashi et al. 2002). 
Pheromone transporters such as lipocalins, which have a species-specific structure and expression, may also form interspecific barriers (Shirak et al. 2012).

Four tilapia species, Nile tilapia (Oreochromis niloticus, On), blue tilapia (Oreochromis aureus, Oa), Mozambique tilapia (Oreochromis mossambicus), and Wami tilapia (Oreochromis urolepis hornorum), and their hybrids are agriculturally important. Tilapia males grow faster than females and perform better when cultured in the absence of females, thus eliminating territorial and reproductive behavior. Most of the intensive production of tilapia relies on inducing mono-sex cultures of males through interspecific crossbreeding of On females and Oa males followed by post-larval masculinization using steroid hormones supplemented to the feed. This steroid treatment is essential to reduce residual production of females, which appears to result from contamination of both purebred stocks by hybrids (Beardmore et al. 2001; Rothbard and Pruginin 1975). The use of hormones has been criticized by consumer advocates and has even been banned in Europe. Moreover, steroid treatment still results in the appearance of 2–8% females (Desprez et al. 1995; Wohlfarth and Wedekind 1991).

In Israel, starting from 1960, crosses between On females and Oa males are practiced as the best strategy for achieving all-male population with uniform and high growth rate, cold tolerance, and most important, preventing uncontrolled reproduction (Wohlfarth et al. 1994). Since the end of the 1970s, On stock in Bar-Ilan University originating from Ghana was the reference for a purebred stock (Galman et al. 1987; Timan et al. 2002). Artificial fertilization of females from this stock always produced all-male progeny with wild-type Oa males from the Dead Sea region (Trombka and Avtalion 1993). Although these two stocks were important models for immunologic and genetic studies, they were unsuitable for aquaculture, due to the reproductive barrier between them. Tilapia hybridization in Eastern Africa between local species was reported since the 1960s (Elder and Garrod 1961; Pruginin et al. 1975). This is probably due to contamination of On natural populations that were used to establish cultivated and laboratory broodstocks (Timan et al. 2002). Several studies have been focused on finding the optimal conditions for their crossbreeding in tanks. However, these studies resulted in fry numbers that were insufficient for commercial aquaculture (Mires 1980, 1995). Thus the problem of inefficient mass production of all-male tilapia by crossing purebred species remained unsolved.

In commercial ponds, males of Oa built normal nests along the pond perimeter but failed to attract purebred On females that remain in the center of the pond (Lovshin 1980; Mires 1980). In aquaria, the male cleans a place for spawning and tries to attract the On females, but the females try to avoid the males, frequently resulting in injuries and even in death, due to the male aggression (Mires 1995). In aquaria, Oa males may attract F2 hybrid females. However, when the courtship dance is too prolonged the male removes the non-responding females from the nest despite their ripeness. In natural conditions, a number of studies reported that, by experimental testing and through a specific scheme of crosses between Oa males and On females, it is possible to detect “responders”, which are hybrid females that interact with purebred Oa males bearing all or near all-male progeny (Lahav and Lahav 1994). However, over several generations, the parental stocks either lose their capability of interspecies communication or their ability to produce all male spawn. Hence, understanding the genetic regulation of responsiveness can be the key to sustainability.

Three sex determination (SD) loci on linkage groups (LGs) 1, 3, and 23 have been detected in different Oreochromis species and their hybrids (Eshel et al. 2012; Shirak et al. 2006). In purebred tilapia species, SD is apparently mono-factorial (Hammerman and Avtalion 1979; Wohlfarth and Wedekind 1991). In O. mossambicus, an XX/XY SD system was detected on LG1 (Liu et al. 2013). In Oa and O. urolepis hornorum, the SD system is WZ/ZZ encoded by a locus on LG3 (Ezaz et al. 2004; Zhu et al. 2016). In On of the Ghana strain, the SD factor resides on LG23, resulting in an XX/XY system (Eshel et al. 2012; Shirak et al. 2006). In On broodstocks originating from Eastern Africa, several SD loci were observed, indicating multi-factorial SD (Cnaani et al. 2008; Lee et al. 2003; Palaiokostas et al. 2013) probably due to original stock contamination. To date, only the master key regulator, AMH, on LG23 has been identified (Eshel et al. 2014; Li et al. 2015). AMH is present in two and single copies on the Y and X chromosomes, respectively. On the Y chromosome, one copy is similar to that of the X chromosome and the other has a 5 bp insertion in exon 6 and 233 bp deletion in exon 7. A supernumerary copy of AMH is present in most of the On stocks that originated from Eastern and Western Africa (Eshel et al. 2014; Li et al. 2015).

Our preliminary study demonstrated that selecting hybrid females with On alleles for the SD loci restored the ability to produce all-male progeny when mated to Oa males (https://www.aquacultureinisrael.com/he/component/k2/item/download/24_1c3cd4678c35e56910b6e448896d3c83). However, this resulted in a decrease of progeny yield of up to 90%. The objective of this study was to detect quantitative trait loci (QTLs) for interspecific female responsiveness. Via marker-assisted selection, it should then be possible to increase progeny yield.

Materials and methods

Fish

On broodstock (OnDOR2006) of Dor Research Station was produced in 2006 by artificial egg fertilization of six On females (a strain originating from Ghana, Bar-Ilan University) by the milt of a single On male (a strain originating from Egypt). Oa Dor stock (OaDOR2006) originated from the Eynot Tzukim nature reserve - Ein Feshkha, Dead Sea region, Israel (Zak et al. 2014). All experimental protocols were approved by the Institutional Animal Care and Use Committee at Animal Experimentation Ethics Committee of the Agricultural Research Organization, Volcani Center, Rishon LeTsiyon, Israel.

Stock control by microsatellite markers

From the original 2006 broodstocks, current Dor broodstocks were established in years 2008 (G1, OaDOR2008; OnDOR2008) and 2011 (G5, OaDOR2011; OnDOR2011) (Fig. 1). Both stocks were genotyped for a panel of nine microsatellite markers (BYL018, UNH168, TA13, UNH310, UNH362, GM472, UNH890, UNH954, and MIC176; Table 1). Three markers BYL018, UNH168, and MIC176 were positioned in the three SD loci on LG 1, 3, and 23, respectively (Shirak et al. 2006). Genotyping of markers for SD was performed in each generation (G2–G5) through the period 2008–2011 and individuals that carried alien alleles were removed. As all On microsatellite alleles were different in length from the Oa alleles, alleles observed in generations G2–G5 were considered alien for a broodstock if they were not observed in the G1 fish of this broodstock. Markers with the prefixes UNH and GM have been previously published and used for the construction of the tilapia second-generation linkage map (Lee et al. 2005); BYL018 marker on LG1 was developed by Dr. Bo-Yung Lee in Professor T.D. Kocher's laboratory. MIC176 on LG23 and TA13 on LG3 were developed in our laboratory.
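
The alien-allele rule described above can be illustrated with a minimal Python sketch. It assumes each genotype is a pair of allele lengths at one marker; the function names, fish identifiers, and allele lengths are hypothetical and are not taken from Table S1.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[int, int]  # two allele lengths (bp) at one microsatellite marker


def founder_alleles(g1: Dict[str, Genotype]) -> Set[int]:
    """Collect every allele length observed in the G1 fish of one broodstock."""
    return {allele for genotype in g1.values() for allele in genotype}


def flag_alien_carriers(g1: Dict[str, Genotype],
                        later: Dict[str, Genotype]) -> List[str]:
    """Return IDs of later-generation fish carrying an allele absent from G1."""
    known = founder_alleles(g1)
    return sorted(fish for fish, genotype in later.items()
                  if any(allele not in known for allele in genotype))


# Hypothetical allele lengths for a single marker in the On broodstock
g1_on = {"On_G1_a": (180, 184), "On_G1_b": (184, 188)}
g3_on = {"On_G3_a": (180, 188), "On_G3_b": (184, 202)}

print(flag_alien_carriers(g1_on, g3_on))  # ['On_G3_b']
```

In this sketch a fish is flagged as soon as one of its alleles is absent from the G1 set, which mirrors the culling of individuals that carried at least one alien allele.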

Fig. 1

Description of seven generations of natural matings and artificial insemination in experimental ponds and tanks and detection of female responders. Ponds are represented by ovals, and tanks by rectangles. The Oa species is denoted in blue and the On species in red. Purebred generations are denoted G1–G7. Lines between ovals (or rectangles) indicate parenthood and arrows denote physical relocation and re-mating of the same fish. Artificial inseminations or natural matings are indicated by black or green crosses, respectively. In generation G5, one pond and five tanks were used for mating progeny of G4 to produce the G6 generation and G6 hybrids. The numbers of known female responders in each tank are denoted by bold and italicized numerals. The unsuccessful tank mating is denoted by a rectangle covered by an “X”. A total of 46 tanks were used to mate 175 G6 female progeny to G6 males of both species: 23 tanks for On × Oa matings, and 23 for On × On matings, with 5 G6 females in each tank. The same 175 G6 On females were mated to both Oa and On males. The rate of detection of female responders is indicated in percentages (large red font) to the right of the year and generation annotation

Table 1 Primer sequences used for microsatellite genotyping and variant sequencing

Genotyping of microsatellite markers

Fin samples or whole larvae were immersed in absolute alcohol and stored until extraction. DNA was extracted using the MasterPure™ DNA Purification Kit (Epicentre® Biotechnologies). PCR was applied using Super-Therm Taq DNA polymerase (JMR Holding Inc., London, UK), dye-labeled (FAM, PET, VIC, or NED) forward primers (Table 1) and 20 ng of genomic DNA. The amplified products were separated on an ABI3130 DNA sequencer and automatically sized using the GeneMapper software v. 4.0 (Applied Biosystems) using GeneScan-500 LIZ size standard (Applied Biosystems) (Dor et al. 2016).

Reproduction in experimental ponds and tanks

From 2006 to 2011, five generations of purebred mating were produced by artificial insemination and rearing in ponds (Fig. 1). To reduce contamination from uncontrolled crosses in the facilities, samples of the OaDOR2008 (n = 50), OnDOR2008 (n = 50), OnDOR2011 (n = 100 females), and OaDOR2011 (n = 40 males) were genotyped; the unique alleles in each stock were identified and the numbers of their occurrences were determined (Table S1). Broodstocks (G1–G4) were routinely selected against alien alleles in the SD loci (BYL018, UNH168, and MIC176) by genotyping a sample of 80–90 fish for each gender. This resulted in culling 3–16% of the sampled fish that carried at least one alien allele.

To test for the rate of interspecific female responsiveness in the fifth generation (Fig. 1, G5), in 2011, 100 OnDOR2011 females and 40 OaDOR2011 males were stocked at the age of 1.5 years in an experimental earth pond of 16 × 25 m² and 1.2 m depth. The pond’s water temperature varied between 26 and 29 °C. Prior to transfer, all individuals were tagged by individual PIT-tags and their fins were clipped for DNA extraction. Tilapia larvae and fingerlings were collected from the drained pond 40 days after stocking and were reared in tanks (28–29 °C) until they reached 2 g weight (about 2 months). Gender was determined by microscopic examination of the gonads using the squash technique (Guerrero and Shelton 1974). The first groups of larvae were observed near the pool shore 12 days after stocking. After a period of 40 days, we reduced the pond water level by half and collected ~6000 offspring. We observed two age groups: fingerlings born just after parent stocking, and larvae born at least 20 days later. A comparison of the genotypes between fingerlings (n = 32) and their putative parents revealed that they belonged to three pairs of parents (D51 × S4, D79 × S4, and D76 × S29), whereas larvae (n = 32) belonged to the same dams and a single sire (D51 × S4, D79 × S4, and D76 × S4). At the age of 4 months, sexing of a sample of hybrids from both groups (n = 2400) indicated that all were males. To determine the involvement of reproductive hierarchy in breeding performance of both genders, three experiments were performed following removal of: (1) reproducing males (38 males were retained); (2) reproducing females (97 females were retained); and (3) all reproducing individuals from the pond. After removing only the two reproducing males (S4 and S29) from the pool for 40 days, ~4000 offspring belonging to two age groups were collected and genotyped (n = 32, each). These fingerlings belonged to two families (D79 × S20 and D76 × S20); larvae were offspring of three pairs of parents (D51 × S20, D79 × S20, and D76 × S20). At the age of 4 months, sexing of a sample of hybrids from both groups (n = 1700) indicated that all were males. When the three dams that responded to Oa males (D51, D76, and D79) were removed and the reproducing sires (S4 and S29) were re-introduced, or removed, no offspring were detected through a period of the next 40 days.

Intraspecific or interspecific reproductive activities were examined in tanks of 1 m³ at temperatures of 28–30 °C with five water exchanges a day. These tanks were used for three different purposes: first, an On female responder and four On non-responding females were introduced into each tank with an Oa male to re-examine the findings in the experimental pond; second, this male was replaced with an On male to confirm that the non-responding females can successfully reproduce with conspecific males; third, 5–6 daughters from the latter experiment (On × On) were introduced into each tank with an Oa male to prepare segregating families for genome-wide association study (GWAS). Every 7–10 days females were examined for the presence of eggs or larvae in their mouth cavity. Collected eggs were incubated in Zuger bottles (28–29 °C) to confirm their fertilization. To test sex ratio, fry of interspecific spawns were transferred into rearing tanks (28–29 °C) and gender was determined at the age of 4 months by microscopic examination of the squashed gonads after fry dissection.

Selecting a broodstock for female responders

To select for the sixth generation (Fig. 1, G6), progeny of the D51 and D76 dams from the intraspecific crosses were reared and sexed. A total of 175 daughters were distributed, 5 individuals to a tank, with an On male for a period of 3 weeks (Fig. 1, G6). Sixty (34%) of them reproduced. These females were removed and the rest were again tested with On males. After repeating this test for the third time, a total of 116 (66%) females successfully reproduced. These were assigned randomly to groups of 5–6 females and re-examined in 23 tanks each with an Oa male. Six females having several non-fertilized eggs in their mouth cavity and an injured female were removed from the experiment.

Genotyping-by-sequencing (GBS)

DNA samples (of 500 ng each) of 22 and 47 responding and non-responding females, respectively, from two full-sib families and their parents were submitted to the sequencing facility of University of Wisconsin Biotechnology Center. All samples were digested by ApeKI restriction enzyme. A set of barcoded adapters with an ApeKI overhang and a common Y-tail adaptor were designed according to Poland et al. (2012). After the ApeKI digestion and adaptor ligation, all samples were pooled and PCR-amplified in a single tube, producing a library of the 73 samples, which was sequenced on a single lane of an Illumina HiSeq2500 flow cell.

Analysis of single-nucleotide polymorphism (SNP) data

Using the barcode adapters, sequences of each individual were grouped into separate files. Sequence reads were mapped to the repeat-masked version of the On genome (GenBank accession No. ASM185804v2) using the Fast Alignment Search Tool suite (mrsFAST-ultra-3.2.00; Hach et al. 2014). The obtained BAM-formatted files were analyzed using the GenomeAnalysisTK module of the Genome Analysis Toolkit (GATK; McKenna et al. 2010) for variant discovery and genotyping, following the recommended pipeline of the Broad Institute (https://software.broadinstitute.org/gatk/best-practices/), which includes realignment around indels, base recalibration, and variant discovery using the HaplotypeCaller function.
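
The first step above, grouping reads by barcode, can be sketched in Python as follows. This is only an illustration of the idea and not the facility's actual software; the barcodes, sample names, and reads are hypothetical, and the sketch assumes that each read starts with its sample barcode and that barcodes may differ in length.

```python
from typing import Dict, List


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by sample, trimming the leading barcode from each read."""
    by_sample: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    # Try longer barcodes first so a short barcode cannot shadow a longer one
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for barcode in ordered:
            if read.startswith(barcode):
                by_sample[barcodes[barcode]].append(read[len(barcode):])
                break  # reads without a matching barcode are simply skipped
    return by_sample


# Hypothetical barcodes and reads, for illustration only
barcodes = {"ACGT": "dam_D51", "TTGCA": "G6_female_01"}
reads = ["ACGTCAGCGGA", "TTGCACTGCAA", "GGGGCAGC"]
groups = demultiplex(reads, barcodes)
print(groups)
```

Each per-sample list would then be written to its own file before mapping.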

Filtering and validation of SNP data

Initially, only SNPs that were informative in both families were considered. Hence, we further analyzed only two types of crosses (AB × AA; AB × AB). SNPs were deleted if Mendelian inheritance was not observed. This occurred in two cases: (i) obtaining BB progeny from a cross of AB × AA parents; (ii) not obtaining at least two homozygous progeny of each type (AA and BB) from an AB × AB parental cross. Of the 69 G6 progeny analyzed, 28 were deleted owing to low SNP call rate, leaving 13 responders and 28 non-responders. As an initial cutoff for the minimal read coverage, we set the number of 154 reads, which is required for representation of at least 5 reads for 75% of the 41 progeny. The 29,132 candidate SNPs that passed these criteria were further examined for balanced allele frequencies with a minimal allele frequency (MAF) of 0.15 in both families. Of the 23,255 SNPs with this MAF, 18,648 SNP candidates that showed no more than 75% heterozygotes were further analyzed, under the assumption that SNPs with >75% heterozygotes probably represent sites with multiple genome copies. For assigning individual heterozygous genotypes, we required that the frequencies of reads of the two SNP alleles would have a ratio ranging between 1/5 and 4/5. SNPs with valid genotypes for <75% of the individuals were deleted, leaving 4983 valid SNPs. The obtained sequence reads were mapped onto the On genome and genotypes were called taking into account realignment around indels, base recalibration, and haplotype variation.
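
Some of the filters above can be expressed as a short Python sketch. It rests on stated assumptions: the 154-read cutoff is taken to be 0.75 × 41 × 5 rounded up, the 1/5 to 4/5 criterion is read as the fraction of reads carrying one allele, and the rule for calling homozygotes outside that range is our own guess. All function names are hypothetical.

```python
from math import ceil
from typing import List, Optional, Tuple


def min_total_reads(n_progeny: int, call_rate: float = 0.75,
                    reads_per_call: int = 5) -> int:
    """Fewest reads at a SNP that could give 5 reads to 75% of the progeny."""
    return ceil(n_progeny * call_rate * reads_per_call)


def call_genotype(a_reads: int, b_reads: int, min_reads: int = 5) -> Optional[str]:
    """Call AA, AB, or BB from allele read counts; None if coverage is too low."""
    total = a_reads + b_reads
    if total < min_reads:
        return None
    # Heterozygote if the B-allele fraction lies in [1/5, 4/5] (integer arithmetic)
    if total <= 5 * b_reads <= 4 * total:
        return "AB"
    # Assumption: outside that range the majority allele is called homozygous
    return "AA" if 5 * b_reads < total else "BB"


def mendelian_ok(cross: Tuple[str, str], progeny: List[Optional[str]]) -> bool:
    """Apply the two Mendelian checks described in the text."""
    calls = [g for g in progeny if g is not None]
    if cross == ("AB", "AA"):
        return "BB" not in calls  # (i) no BB progeny expected
    if cross == ("AB", "AB"):
        # (ii) at least two homozygotes of each type are required
        return calls.count("AA") >= 2 and calls.count("BB") >= 2
    return False  # other cross types were not analyzed


print(min_total_reads(41))  # 154
```

Under these assumptions, `min_total_reads(41)` reproduces the stated cutoff of 154 reads.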

Deep sequencing of Oa female and male genomes

The On reference genome was used to find variations between the tilapia species that may underlie the differences in reproductive behavior. To allow this comparison, DNA of a single Oa female and a single Oa male was deep-sequenced using the Illumina (San Diego, CA, USA) HiSeq2000 platform according to the manufacturer’s paired-end protocol (ENA accession no. PRJEB23203). Average fragment length was 580 bp, and 100-bp sequence reads were obtained from both ends. Each DNA sample was applied to one lane yielding ~32- and ~26-fold coverage for the male and female Oa genomes, respectively.

Analysis of candidate genes

Candidate textual search in NCBI Gene database was performed using the following text query: (Oreochromis niloticus[Organism]) AND LG9[Chromosome] (sex OR pheromone OR olfactory OR taste OR prostaglandin OR lipocalin). Expression of candidate genes was evaluated using NCBI’s Sequence Read Archive (SRA), Nucleotide BLAST tool, and RNA-seq SRA data of On (GenBank accession No. PRJNA78915). For identifying variation between Oa and On within the coding sequence of candidate genes, the On reference gene sequences were used as templates for mapping DNA-seq reads obtained for Oa using the GAP5 software (Bonfield and Whitwham 2010). BWA options for this mapping were set to bam bwasw -t 8 -T 60 (Li and Durbin 2010). Further analysis of variation was performed with Sanger sequencing: DNA was amplified using PCR primers (Table 1) and the Bio-X-ACT™ Long Kit (Bioline Ltd., London, UK) according to the manufacturer’s instructions under the following conditions: 30 cycles for 40 s at 92 °C, 60 s at 63 °C, and 60 s at 68 °C. The PCR products were separated on agarose gels, excised, and purified with the AccuPrep® Gel Purification Kit (BioNeer Corp., Seoul, Korea). Chromatograms were obtained by ABI3730 sequencing using a BigDye® Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA). Detection and characterization of indels was performed using ShiftDetector and the ABI tracefiles (Seroussi et al. 2002). Copy number proportions were estimated based on the ABI chromatograms peak-height ratios (Seroussi et al. 2013; Shirak et al. 2017).

Statistics

The 4983 SNPs that passed the edits were tested for their effects on mating response of females from the two families that were selected as broodstock for female responsiveness. The effect of each SNP was determined by Student’s t test comparing the 13 responders to the 28 non-responders and a chi-squared test for random association of genotypes with respect to response status. For the t test for each SNP, homozygotes for the more frequent allele were scored as 0, heterozygotes as 1, and homozygotes for the less frequent allele as 2. Owing to the large number of comparisons, experiment-wise significance was determined first by application of the Bonferroni correction for multiple tests and also by controlling the false discovery rate (FDR) at 15% (Benjamini and Hochberg 1995). That is, the highest probability was determined with an expectation that no more than 15% of the selected SNPs would obtain this level of nominal significance by chance. SNPs that met this criterion were also analyzed by Fisher’s exact test, which unlike the chi-squared test can be applied even if the expected cell frequencies are <5.
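
The genotype scoring and the two multiple-testing corrections described above can be sketched in plain Python. This is a minimal illustration and not the software used in the study: the function names are hypothetical, the example p-values are invented, and the t statistic shown is the standard pooled-variance form without the conversion to a probability.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def code_genotypes(genotypes: List[str]) -> List[int]:
    """Score each genotype as 0, 1, or 2 copies of the less frequent allele."""
    n_a = sum(g.count("A") for g in genotypes)
    n_b = sum(g.count("B") for g in genotypes)
    minor = "B" if n_b <= n_a else "A"
    return [g.count(minor) for g in genotypes]


def pooled_t(x: List[int], y: List[int]) -> float:
    """Student's t statistic with pooled variance for two groups of scores."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def bonferroni_adjust(p: float, n_tests: int) -> float:
    """Experiment-wise probability: nominal p times the number of tests."""
    return min(1.0, p * n_tests)


def benjamini_hochberg(pvalues: List[float], q: float) -> List[int]:
    """Indices of the tests retained when the FDR is controlled at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_selected = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            n_selected = rank  # keep the largest rank that meets its threshold
    return sorted(order[:n_selected])


# Invented p-values for five SNPs, FDR controlled at 15%
print(benjamini_hochberg([0.001, 0.2, 0.03, 0.04, 0.9], q=0.15))  # [0, 2, 3]
```

With 4983 SNPs, `bonferroni_adjust` multiplies each nominal probability by 4983, which is why only very small nominal probabilities remain significant experiment-wise.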

The effects of 9 microsatellites on response status of 69 G6 progeny were tested by the General Linear Model (GLM) procedure of SAS (Statistical Analysis System 1999). In the first stage, the effect of genotypes of each of the microsatellite markers on breeding response was tested with responders coded as “one” and non-responders coded as “zero.” Since progeny were from two families, and the parental genotypes were different in each family, the effect of progeny genotype was nested within family, and the model also included a family effect. All effects were considered categorical effects. In the second stage, all two-way combinations of markers with significant effects were tested, and coefficients of determination were computed. The two-way combinations were also analyzed by the GLIMMIX procedure of SAS that assumes a categorical dependent variable. However, this procedure does not estimate the coefficient of determination.

Results

Association between infiltration of foreign alleles and interspecific reproductive interaction

To investigate the relationship between cross-contamination and the ability to produce all-male progeny in On × Oa crosses, we analyzed our broodstocks for the presence of foreign genetic alleles for the three SD loci of tilapia. These broodstocks were maintained by artificial insemination generating a new generation each year in most years (G1–G6, Fig. 1). To ensure purebred quality, the parents of the Oa stock (OaDOR2006) were isolated from a nature reserve in the wild. The initial On broodstock (G1, OnDOR2006) originated from laboratory broodstocks and its females produced all-male progeny when artificially inseminated with OaDOR2006 sperm (data not shown). In the 2008 purebred stocks, each of the Oa and On stocks had a specific range of allele lengths. Thus alleles of these markers can be used to estimate the level of cross-contamination in current broodstocks. In the period between 2009 and 2010, broodstocks were selected against alien alleles in the SD loci; and thus in the 2011 stocks, no contaminating alleles were observed for the three microsatellites residing in the SD regions. However, in 5 out of 6 other markers placed on autosomal regions that are scattered in the genome, we detected Oa alleles in the OnDOR2011 stock at a low frequency of 4–5.5% (Table S1, red font on bluish background). Hence, the number of alleles in OaDOR2008 and OaDOR2011 remained stable, whereas OnDOR2011 had more alleles than the original OnDOR2008 stock, suggesting infiltration of autosomal alleles that may mediate interspecific reproductive interaction with Oa.

In 2012, OnDOR2011 females (n = 100) and OaDOR2011 males (n = 40) were crossed in natural conditions in an experimental pool (G5, ×, green font, Fig. 1). Genetic analysis of the ~6000 offspring revealed that they belonged to three dams, which were denoted responders. These females were placed in tanks with four other non-responders and tested for interspecific and intraspecific interactions with males. The tests confirmed that only these dams were capable of interspecific reproduction, whereas all other females were healthy and capable of intraspecific reproduction, thus indicating a detection frequency of responders of 3% (Fig. 1). This rate is close to the rate of autosomal Oa alleles that infiltrated the OnDOR2011. Therefore, we devised an experiment that would examine the hypothesis that the responsiveness trait is genetically inherited and related to the infiltrating Oa alleles. Progeny of two of the dams from the intraspecific crosses (G6, Fig. 1) were reared and further tested for interspecific and intraspecific interactions with males. The detection frequency of responders in G6 was estimated to be ~20% (Fig. 1). The responding status of each of the remaining females was corroborated by re-examining the responders and non-responders in separate groups. In all cases, the original status indicated in the first test was confirmed. Thus 22 responders (20.2%) were detected, and 87 were determined to be non-responders. The frequency of responders in each family was similar: 12 of 57 (21.1%) and 10 of 52 (19.2%). Thus, by application of a single generation of genetic selection, the rate of responders that produce nearly all-male offspring increased from 3.0% (3/100) to 20.2% (22/109). The progeny (n = 1080) of the 22 responders were reared for 4 months and sexed. Unlike in the ponds for which all progeny were males, in the tanks 0.74% of the progeny were females. 
The elevation in the rate of detection of female responders supported the hypothesis that the responsiveness trait is genetically inherited and suggested that the G6 population may be used to genetically map the genomic regions that underlie female responsiveness.
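
The rates quoted in this paragraph can be re-derived from the reported counts with a few lines of Python. The sketch only repeats the arithmetic; the variable names are ours, and the count of females among the tank progeny is inferred from the reported 0.74% and is not a number stated in the text.

```python
def percent(k: int, n: int) -> float:
    """Return k out of n as a percentage."""
    return 100.0 * k / n


tested = 116 - 7             # reproducing females minus the seven removed ones
g5_rate = percent(3, 100)    # responders among OnDOR2011 females (G5)
g6_rate = percent(22, tested)  # responders among the tested G6 daughters
fold_increase = g6_rate / g5_rate

family_rates = [percent(12, 57), percent(10, 52)]
implied_females = round(0.0074 * 1080)  # inferred count, not reported directly

print(tested, round(g6_rate, 1), round(fold_increase, 1))
print([round(r, 1) for r in family_rates], implied_females)
```

The fold increase computed this way is slightly below seven, consistent with the approximate sevenfold rise reported for the selected broodstock.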

GWAS for female responsiveness

Following the indication that the percentage of female responders increases (~7×) in a broodstock raised from female responders, we concluded that this trait is likely to have significant heritability and that a genome-wide scan of genetic variants in responder and non-responder females may point to the major genomic regions involved. This study can be facilitated with the use of SNPs genotyped by sequencing. GWAS t test results based on the 41 females with valid genotypes and determination of responder status are given in Fig. 2. The number of individuals with valid genotypes for each SNP varied from 31 to 41 with a mean of 36.5. The mean frequency of heterozygotes was only 0.35, as compared to the expected value of 0.5. This bias may be partially due to the requirement that the frequencies of reads of the two SNP alleles would have a ratio ranging between 1/5 and 4/5 for calling heterozygotes. The most significant association with females’ reproductive interaction was at position 22.9 Mb on LG9. The association with this SNP was the only one that met experiment-wise significance after Bonferroni correction (experiment-wise p = 0.001). Six additional SNPs on LGs 1, 5, 14, and an unknown location of scaffold 320 met an FDR of 6% (Benjamini and Hochberg 1995; Table 2). It is expected that at least five of these markers represent real effects. Chi-squared and Fisher’s exact test probabilities for the SNPs with the lowest probabilities were higher (data not shown). It should be also noted that the observed distribution of SNPs with the greatest effects on LGs 9 and 14 was not concentrated in a single peak, as these were positioned on both ends of these LGs (Fig. 2).

Fig. 2

Whole-genome association study of 4983 SNPs for females’ reproductive interaction

Table 2 Top ten ranking SNPs associated with female responsiveness controlling for false discovery rate of 15%

Selection of candidate genes on LGs 9 and 14

Seeking corroboration for the existence of major QTLs for female responsiveness, we screened for candidate genes within LGs 9 and 14, on which the most significant SNPs were located. A gene was considered a strong candidate if its orthologs were shown to mediate species-specific reproductive interaction; if its expression pattern fits this role; and if there was significant functional variation between the coding sequences of the On and Oa gene, for which we determined the sequence as part of our whole-genome Oa sequencing project. Using key words (sex, pheromone, olfactory, taste, prostaglandin, lipocalin), a search in the On annotated genome build (NCBI Gene database) revealed 12 and 74 relevant genes on LGs 9 and 14, respectively.

On LG9, we flagged a gene annotated as lipocalin (LOC100712094, PTGDSL, GenBank accession No. XP_003443553), which was highly orthologous (identity 30%, similarity 49%) to the human prostaglandin-H2 D-isomerase (PTGDS, GenBank accession No. NP_000945). As prostaglandins comprise the postovulatory female sex pheromone in goldfish (Sorensen et al. 1988) and their synthesis also plays a key role in ovulation, with species-specific differences in the regulation or timing (Lister and Van der Kraak 2009), we further analyzed the expression of this gene in On using RNA-seq data deposited in the Sequence Read Archive (GenBank accession No. PRJNA78915). This analysis indicated that PTGDSL is mostly expressed in the skin, which is compatible with mediating exogenous interactions (Fig. 3a). Comparison of the coding region between the putative On and Oa prostaglandin synthases revealed that the latter had a non-conservative amino acid substitution encoded by the fourth exon (M86T).

Fig. 3 Expression and sequence variation of candidate genes. a Transcript abundance histogram showing the expression of PTGDSL and CASRL in 11 tissues of Nile tilapia. Data of RNA sequencing of Oreochromis niloticus from a project in which each tissue sample was sequenced in three runs (GenBank accession No. PRJNA78915) was meta-analyzed and transcript abundance was estimated in fragments per kilobase of transcript per million (FPKM). The standard error was approximated from three runs (error bars). b Typical sequencing chromatograms of the second exon of CASRL. Genomic DNA extracted from Oreochromis aureus males (n = 4) and responsive and non-responsive females (n = 10, each) was sequenced using Sanger sequencing of a PCR amplicon (Table 1) from the forward primer. Three arrows point to three polymorphic positions that encode I130V, V131L amino acid variations, and a synonymous nucleotide substitution. A larger fourth arrow indicates the first position in the codon for the methionine (ATG), which is missing in the gene variant with the shorter exon 2

On LG14, we observed three clusters of chemosensory receptors, each including a large number of candidate genes. Of special interest was the telomeric cluster stretching over a million bases (15.1–16.1 Mb on LG14) with over 90 genes and pseudo-genes encoding extracellular calcium-sensing receptor-like (CaSRL) protein domains, one of which, LOC100690618 (CASRL, GenBank accession No. XP_005464210), was specifically transcribed in the ovary (Fig. 3a). The fact that CaSR genes are expressed in ovarian surface epithelial cells, and function in oocyte proliferation, maturation, and follicle survival, supports the possible involvement of CASRL in the calcium signal that activates oocyte maturation. The signal for egg activation is species dependent (Ellinger 2016). Analysis of variation in CASRL coding sequences between On and Oa was complicated by copy number variation and indicated an amino acid deletion (ΔM134) and non-conservative amino acid substitutions encoded by the second exon (Fig. 3b). Variant ΔM134 was associated with female responders, for which the peak ratio in the sequencing chromatogram also suggested the presence of an additional copy (e.g., ratio 1:3 in the synonymous substitution, Fig. 3b).

To further evaluate association of the selected candidate genes with female responsiveness, we designed microsatellite markers within these genes. These markers were positioned at the 5′ intergenic region and in the untranslated region of the last exon of PTGDSL and CASRL, respectively (Table 1).

Confirmation of effects on LGs 9 and 14 using microsatellite markers

To confirm the SNP associations obtained in our GWAS analysis, we tested microsatellite markers on the two most significant LGs. For each of LGs 9 and 14, we applied two established markers (GM171 and GM613 on LG9; GM103 and GM237 on LG14; Lee et al. 2005) and one marker developed within an attractive candidate gene (PTGDSL-LOC100712094 and CASRL-LOC100690618, respectively). All markers were significantly associated with female responsiveness (p < 0.0001) in the individual marker GLM analyses. Significance was also determined for the nine two-way combinations of the six markers that include one marker from each LG. The logarithm of the odds (LOD) scores ranged from 1.9 to 6.1 (Fig. 4). LOD scores and coefficients of determination were highest for the combination of the GM171 and CASRL microsatellites, supporting the hypothesis that the CASRL gene cluster affects this trait. LOD scores were lower for combinations with PTGDSL, suggesting that the related gene is a less likely candidate. The combined effects of GM171 and CASRL from LGs 9 and 14, respectively, resulted in a coefficient of determination of 0.37 for female responsiveness. Although this includes the “family effect”, as noted previously there was virtually no difference between the response rates in the two families, and this effect was not significant in any of the models tested.

Fig. 4 Combined effects of microsatellite markers originating from LGs 9 and 14 on females’ reproductive interaction. LOD scores were derived from the F-values for the complete models that include the effects of family and marker genotypes for the two markers nested within family
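The legend of Fig. 4 states that LOD scores were derived from F-values, but the conversion itself is not given in the text. One standard relationship for linear models with normal errors, assumed here purely for illustration, is LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares of the null and complete models. Because F = [(RSS0 - RSS1)/df_model]/(RSS1/df_resid), the ratio RSS0/RSS1 equals 1 + F × df_model/df_resid. The function name and the example numbers below are hypothetical.

```python
import math


def lod_from_f(f_value, df_model, df_resid, n):
    """Convert a model F-value to a LOD score.

    Assumes the identity RSS0/RSS1 = 1 + F * df_model / df_resid and
    LOD = (n / 2) * log10(RSS0 / RSS1); this may differ from the
    conversion actually used for Fig. 4.
    """
    rss_ratio = 1.0 + f_value * df_model / df_resid
    return (n / 2.0) * math.log10(rss_ratio)


# Hypothetical example: 20 individuals, 3 model and 16 residual degrees of freedom.
print(lod_from_f(48.0, 3, 16, 20))
```

An F-value of zero maps to a LOD of zero under this conversion, and the LOD grows with both the F-value and the sample size.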

Discussion

Our study aimed at a better understanding of the genetics of adaptation and speciation for traits related to reproductive incompatibility between two closely related tilapia species and their hybrids. Identification of the master genes for SD and the genes for the behavioral reproductive barrier between Oa and On will have important implications for the production of monosex culture required for effective aquaculture (Beardmore et al. 2001). In tilapia, it has been previously demonstrated that LGs 1, 3, and 23 play a major role in SD and that the Oa male has a super male genotype (LG3, ZZ, Fig. 5a) when crossed with the On female. In both broodstocks and wild crosses, contamination between these species may occur (Bakhoum et al. 2009; Rognon and Guyomard 2003), and therefore careful monitoring of the alleles of the master SD genes involved is advised when using this hybridization scheme for maintaining all-male culture. However, currently only the master gene AMH on LG23 has been identified, and genomic selection for the relevant SD alleles is based on microsatellite analysis (Eshel et al. 2014; Shirak et al. 2006).

Fig. 5 A proposed strategy for mass production of all-male tilapia hybrids. a The allele patterns that are assumed to determine all-male progeny by an Oa male × On female cross are represented using a delineation of the tilapia sex determination (SD) systems. These are derived from: O. mossambicus, which has an XX/XY system on LG1 (alleles A, A) (Hammerman and Avtalion 1979); O. aureus and O. hornorum having WZ/ZZ system on LG3 (alleles Z, B) (Ezaz et al. 2004; Zhu et al. 2016); and O. niloticus, which has an XX/XY system on LG23 (alleles C, X) (Shirak et al. 2006). b A putative crossing scheme for mass production of all-male tilapia. The allele patterns that determine SD and types of sexual interaction are represented using a delineation of two crossing steps for acquiring On female with the original SD systems (aaBBXX) and restored capability of sexual interaction. The final crossing step between this female (aaBBXX) and Oa male (AAZZCC) is expected to derive all-male progeny. Lettering of SD alleles follows that of a. Types of sexual interaction of individual females are symbolized by arches colored according to their sexual-mediated interaction with either Oa (blue) or On (red)

In the present study, we showed that contamination with 4–5.5% Oa alleles for both the SD and autosomal loci accumulated during the two generations of 2009 and 2010 in our On stock. According to the reports of the Israeli Division of Fishery and Aquaculture in 2011, 100 females of this stock, mated with 40 Oa males, produced approximately 60,000 fry in an experimental pond over a period of 40 days, but 27–31% females were detected in the hybrid broodstock. As a control, On females harboring a single Oa allele for any of the SD loci were crossed with Oa males, yielding up to 50% females in their fry (data not shown). After culling of hybrid females with alien Oa alleles for the three SD loci (Table S1) and mating with Oa males, all-maleness was restored in the progeny, although fry production was reduced by 90% (to 4000–6000). Thus, culling of On females containing Oa alleles prior to mating with Oa males is essential to maintain all-male progeny.

Interestingly, LG1 was involved in SD of these hybrids’ offspring, in spite of the fact that only LGs 3 and 23 are associated with SD in the parental Oa and On stocks, respectively. This may be in line with the hypothesis that some SD loci that do not segregate in intraspecific crosses become polymorphic and influential only in interspecific crosses (Hammerman and Avtalion 1979). Other explanations suggest that LG1 contains the original SD locus of On, as it is active in Eastern Africa On sub-species (Palaiokostas et al. 2013) or that it is an ancient cross-contamination of On stock by O. mossambicus or by other tilapia species with original SD locus on LG1 (Elder and Garrod 1961; Pruginin et al. 1975). It has also been shown that interaction between genetic factors and environmental temperatures may mediate LG1’s effect on SD in Oa female × On male crosses (Baroiller et al. 2009).

Under our experimental conditions, interspecific reproductive activity was apparent in only 3 out of 100 On females (3%). This coincides with a similar rate of contamination of alien Oa alleles for autosomal loci in the On stock (Table S1; 4–5%). Moreover, we detected that 20% of the daughters of these active females also responded to allospecific males, producing all-male or nearly all-male progeny. Enrichment of the stock in responsive females from 3% to 20% within one generation supported our initial hypothesis of genetic control of the reproductive barrier between Oa male and On female. This prompted us to conduct a GWAS aimed at the initial characterization of the genetic loci underlying the observed female responsiveness.

Using the GBS method, we analyzed 4983 SNPs for GWAS and found 7 SNPs with an FDR < 0.06 that were distributed in 5 regions: LGs 1, 5, 9, 14, and an unmapped scaffold 320, with the most significant effects on LGs 9 and 14. Neither the t test nor the Chi-squared or Fisher’s exact tests are completely valid. The t test assumes a continuous trait distribution, the Chi-squared test is generally considered valid only if the expected value is >5 individuals for each cell, and Fisher’s exact test has been found to be too conservative (Crans and Shuster 2008), which is critical for the very low probabilities considered. The effect of the number of rare alleles in each SNP (0, 1, 2) on the response status was analyzed. Estimating the number of rare alleles as a regression effect, as is done by the t test, is more powerful than the Chi-squared or Fisher’s exact tests, which only test for deviation from statistical independence. Logistic regression could not be applied owing to the small sample size and the presence of empty cells.
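The contrast between the two kinds of test can be sketched in Python. One way to read the t test described above is as a simple linear regression of the 0/1 response status on the number of rare alleles, with the t statistic taken for the slope; the second function computes the smallest expected cell count of the response-by-genotype table, which is the quantity behind the >5 rule for the Chi-squared test. The function names and the toy data are hypothetical and do not reproduce the study's actual analysis.

```python
import math


def allele_count_t(rare_allele_counts, responses):
    """Regress a 0/1 response on rare-allele count (0, 1, 2).

    Returns (slope, t statistic of the slope). Assumes more than two
    individuals, variation in genotype, and a non-perfect fit.
    """
    n = len(responses)
    mean_x = sum(rare_allele_counts) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in rare_allele_counts)
    if sxx == 0:
        raise ValueError("all individuals share the same genotype")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(rare_allele_counts, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(rare_allele_counts, responses))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


def min_expected_cell(rare_allele_counts, responses):
    """Smallest expected count in the response-by-genotype table."""
    n = len(responses)
    row_totals = [responses.count(r) for r in set(responses)]
    col_totals = [rare_allele_counts.count(g) for g in set(rare_allele_counts)]
    return min(r * c / n for r in row_totals for c in col_totals)


# Toy data (hypothetical): six females, two per genotype class.
genotypes = [0, 0, 1, 1, 2, 2]
responder = [0, 0, 0, 1, 1, 1]
print(allele_count_t(genotypes, responder))
print(min_expected_cell(genotypes, responder))
```

In such small samples the expected cell counts fall well below 5, which illustrates why the Chi-squared approximation is questionable here, whereas the regression uses the ordering of the genotype classes in a single degree of freedom.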

The combined effects of microsatellites on LGs 9 and 14 explain 37% of the variance between female responders and non-responders. Unlike the GWAS, which included genotypes only for the part of the sample that passed the filtering criteria (36.5 individuals on average), the microsatellite analysis included 69 individuals. The results from this follow-up experiment, which included additional individuals and tested new markers, support the validity of the most significant GWAS findings of QTLs on LGs 9 and 14 affecting female responsiveness.

SNPs with the greatest effects on LGs 9 and 14 were positioned on both ends of their LGs (Fig. 2). We suspect that this distribution may be an artifact due to assembly problems in the current On genome version, which also makes use of linkage data. Erroneous end construction in linkage maps is often due to the existence of one obligate crossing-over on short chromosomal arms (Lukaszewski and Curtis 1993), and discrepancies between the current On genome build and the On radiation-hybrid map have indeed been observed in the lower parts of LGs 9 and 14 (Guyon et al. 2012). Thus, repeating the GWAS with larger families and with a better genome map and coverage is essential to further narrow the critical regions for female responsiveness, for which the current confidence intervals apparently encompass the entire LGs.

Once the critical regions are obtained, marker-assisted selection (MAS) can be applied to identify females with sexual interaction. In Fig. 5, we present the putative genetic systems for SD and for reproductive interaction. A strategy of three crossing steps is proposed to control the SD system and restore the sexual interaction, thus enabling all-male progeny production. In the first crossing of an Oa female and an On male, both females and males are obtained in the progeny (Fig. 5b). Based on a genetic test for reproductive interaction, a female responder offspring will be selected and backcrossed to an On male. In the final cross, based on genetic tests for the original genotype for SD (aaBBXX) and for sexual interaction, the desired female will be selected from the progeny and crossed with an Oa male (AAZZCC) to derive all-male progeny.

In conclusion, our study has shown that female responsiveness in tilapia has high heritability. We have taken the initial steps toward identifying the major genomic loci responsible for sexual incompatibility among tilapia species. The identification of the genomic loci underlying both SD and female responsiveness will enable mass production of all-male tilapia through Oa male and On female hybridization, backcrossing, and MAS (Fig. 5b).

Data archiving

Sequencing data from this article have been deposited in ENA https://www.ebi.ac.uk/ena/data/view/PRJEB23203.