Cross-species gene enrichment revealed a single population of Hilsa shad (Tenualosa ilisha) with low genetic variation in Bangladesh waters

Tenualosa ilisha is a popular anadromous and significant trans-boundary fish. For sustainable management and conservation of this fish, drawing an appropriate picture reflecting population status of this species is very essential based on their all-strategic habitats in Bangladesh. In this study, 139 samples from 18 sites were collected and cross-species gene enrichment method was applied. Like most of the Clupeiforms, nucleotide diversity of this shad was very low (0.001245–0.006612). Population differences between most of the locations were low and not significant (P > 0.05). However, P values of a few locations were significant (P < 0.05) but their pairwise FST values were very poor (0.0042–0.0993), which is inadequate to recognize any local populations. Our study revealed that the presence of a single population in the Bangladesh waters with some admixtured individuals, which may contain partial genes from other populations. Most of the individuals were admixed without showing any precise grouping in the ML IQtree and Network, which might due to their highly migratory nature. Fishes from haors and small coastal rivers were not unique and no genetic differences between migratory cohorts. The hilsa shad fishery should be managed considering it as a single panmictic population in Bangladesh with low genetic diversity.

www.nature.com/scientificreports/ Therefore, to avoid these previously discussed confusions and better understanding, we collected sequence data of 4434 nuclear genes from 139 Hilsa samples taken from the Bay of Bengal, its estuaries and all possible lotic and lentic waters and two migratory cohorts, applying a cross-species gene enrichment method 22 , to examine the genetic diversity and population structure of this shad. Our goal is to provide a solid estimation of the population status of Hilsa shad using genome-wide data and to infer its genetic diversity. Our study will provide a comprehensible look into the genetic diversity of this commercially important species and an evaluation of its population genetic structure. The findings should be important for the management and conservation of this important fisheries resource.

Material and methodology
Sample collection and DNA extraction. Samples (dead fish) have been taken from the commercial fishing boats or directly from fishermen at fish landing sites. In total 139 individuals of Hilsa shad were collected from the diverse ecosystems of Bangladesh including 18 locations involved all fresh water, brackish and marine habitats for this fish (Fig. 2). Furthermore, three primary routes of migration of Hilsa shad from Bay of Bengal were also considered. These sampling locations were categorized into seven different habitat groups based on their habitat nature i.e., 1. Western Riverine (Freshwater) 2. Eastern Riverine (Freshwater) 3. Haor and hill stream river 4. Middle Meghna 5. Meghna Estuary 6. Small Coastal Rivers (Estuary) 7. Bay of Bengal (Fig. 2, Table 1). The samples were identified based on morphological features 23,24 . Five closely related Kelee shad (Hilsa kelee) were collected from the Arabian Sea coast for using as out-group. For sampling, muscles were collected from the base of dorsal fin and fin clips were collected from the tip of caudal fin. For fixation and preservation of tissue samples, 100% and 95% ethanol were used respectively. Finally, samples were stored in 4 0 C refrigerator until DNA extraction started. DNA was extracted from 25 mg of tissue using an Ezup DNA extraction kit following the protocol of the manufacturer (Sangon Biotech, Shanghai, China).
DNA library preparation, gene capture and sequencing. Extracted genomic DNA was sheared to about 500 bp using Covaris M220 Focused-ultrasonicator (Woburn, Massachusetts, USA) according to the manufacturer' instructions. Size of sheared DNA and product of every further step was measured by using agarose gel electrophoresis. DNA libraries were constructed and "with-beads" method was adopted in this protocol to obtain higher yield 22 . Inline indices were added to the adapter to label the samples in the ligation step of library preparation to ease the possible risk of cross contamination among the samples during subsequent gene capture step. After that library preparation, products were pooled together equimolarly.
A cross species gene capture was done and genes were captured for two consecutive trials that increase the recovery rate of enriched gene 22 . A bait set was designed based on the sequence of two Clupeiform species Denticeps clupeoides (Acession number: GCA_900700345.2) and Ilisha elongata (unpublished) for capturing highest number of genes. The enriched libraries (average concentration: 17,073 ng/ml) were amplified by IS4 and indexing primers 25 . Finally, captured genes were pooled in equimolar ratios for sequencing on Illumina HiSeq X10 lane at Annoroad Inc (Beijing, China).
Data preparation read assembly and post assembly processing. According to the description in Assexon pipeline 26 , data processing, read assembly and post assembly processing were done. Raw reads from each sample were parsed according to their 8 bp barcodes (139 unique barcodes were used) on P7 adapter using bcl2fastq v1.8.3 (Illumina). Trim galore v0.4.1 (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ trim_ galore/) was used to trim low quality bases and sequence adaptors. Coding frame of each marker sequence was predicted and corrected using a Perl script (predict_frames.pl). Coding sequences were extracted and translated into amino acid sequences by using Bio:: Seq module in Bioperl 27 . PCR duplicates were excluded by using "-fastx_uniques" command in USEARCH v10.0.240 28 . Sequence of Danio rerio (https:// doi. org/ 10. 5061/ dyrad. 2j5b4) was used as a reference sequence to parse reads to each gene file. Reads were sorted to references with BLAST hit using UBLAST with a relaxed e-value of 1 × 10 -4 . Reads of each locus were assembled (De novo assembly) into contigs by a conservative assembler SGA 29 . Contigs were locally aligned to protein sequences of references using the "protein2dna" model in the package under Exonerate 30 . Reciprocal blast method was used to pick up the orthologous genes.
The output amino acid (AA) sequences were aligned in batch using MAFFTv7.369b 31 . The AA sequences were translated back to DNA after alignment. Poor and badly aligned sequences in coding regions were removed by filter.pl to avoid interfere in phylogenetic inference. Summary statistics (e.g., number of enriched samples, GC content and percentage of missing data) for coding and flanking region of each locus and sample was extracted by using statistics.pl 26 . SNP calling. A custom Perl script was used to make consensus sequences for each target locus from assembled contigs 26 and then reads with adapter sequences were trimmed (Quality Phred score cutoff: 20) and low quality reads were excluded and finally mapped to the consensus by using BWA v0.7.5. The sequence map format (SAM) files were converted into binary format (BAM) by using Samtools 32 . SNP sites were genotyped based on the BAM files using GATK-3.2.2 33 . GATK Best Practices recommendations were followed 34 . Single SNP per locus with least amount of missing data and highest quality score was kept for most analyses to meet the requirement of linkage equilibrium. Custom Perl script was used again to convert the SNP VCF file into NEXUS file and STRU CTU RE input file 26  with the input of population summary 37 . SNP arlequin (.arp) file was used as an input file in the analysis of molecular variance (AMOVA), which was performed using ARLEQUIN 3.5.2 with 10,000 permutations 38 .

Result
Sequencing results (NGS). Each sample produced 4,015,989 raw reads on average and then 4,000,188 filtered reads (on average) were obtained from raw reads after trimming off adapter sequences and reads with low quality score (Q < 20). After removing the 230,829 reads (on average) of PCR duplicates, 93.87% of filtered reads were scrutinized as unique reads (3,769,359 on average). From each sample 1,395 target loci were obtained on average with the best one had 2,223 loci and the lowest one had 504 captured loci (Table S1). The average number of captured loci of the out-group sample was 1,104. All loci (3,399 loci) of studied species and out-group were checked manually. Loci with weird segments, samples from only one location or with lower than four samples were also excluded. After exclusions, 2,461 loci were kept. The deviated locus number was (average): 1,344 (28.05%) and highest deviation from Hardy-Weinberg equilibrium was 0.00053.
Phylogenetic relationships based on genome-scale nuclear data. The maximum likelihood tree was built using IQtree on all of the individuals of Bangladesh waters, collected from different ecosystems mixed together. No location had any unique cluster, but some portion of phylogenetic tree had partial groupings (Fig. 3, www.nature.com/scientificreports/ S3). The phylogenetic tree revealed that the Hilsa shad represent a single genetic population in the Bangladesh water and there is no significant cluster. In network of 842 SNP loci, all samples were randomly interconnected together without any type of pattern (Fig. 4). There is no isolation based on distance, water quality, nature of the habitat and migratory seasons. Negligible samples of same location showed inter-connections among them. That means network result also supported the presence of single population in Bangladesh water like the assumption depicted from maximum likelihood IQtree.
Genetic diversity and differentiation. Average nucleotide diversity (Pi) of Hilsa shad of Bangladesh waters was 0.004632, with highest value in Chilmari (0.008811) and lowest in Balashi Ghat (0.001809) ( Table 2). Analysis of molecular variance (AMOVA) represented that percentage of variation among suspected significant habitat groups ( i.e., Western riverine freshwater, Eastern riverine freshwater, haor and hill stream river, Middle Meghna freshwater, Meghna river estuary, small coastal rivers, the Bay of Bengal) was very low (0.99%).
Percentage of variations of among population within groups and within populations were 1.07% and 97.93% respectively (Table 3). Pairwise F ST values of maximum locations were very poor (61% F ST value was in between 0.0009 and 0.0993, 2% F ST value was more than that value and rest showed negative value) and most of the case, P value was not significant. Populations of fresh water rivers (Western turbid and eastern clean rivers) also had poor F ST value and non significant P value except between Manik Ganj (MG) and Chilmari (CM), which was more than significant level (P < 0.05) ( Table 4). Populations of main migratory route and alternative migratory  www.nature.com/scientificreports/ route had some differences. Samples of the Kocha river (PP) of alternative migratory route were different from all of the locations of main migratory route in the downstream (i.e., CF, MP, BL and CP) based on significant P value (P < 0.05) and samples of another alternative route location (KN) was also different from MP in the same way. However, F ST values among them were not high. The dendrogram based on the F ST values also represented same pattern (Fig. S4).
Population structure. Population of Hilsa shad belonged two groups (K = 2) was supported by Structure analysis, dominant group (green colored group) belonged to the maximum individuals of the population and only few individuals carrying some genes of other group (red colored group) along with dominant group genes (Fig. 5, Fig. S2, Table S2). Samples of CM, CN, MG, MO and SS belonged to the dominant group without any admixtured individuals whereas BG, BR, CB, CF, CP, PC, PG and RS mostly belonged to the dominant group with few admixtured individuals. Moreover, BL, KN, MK, MP and PP had more admixtured individuals than dominant group individuals (Fig. 5). There was no single location that had only admixtured individuals or no one individual that only carried the genes of other small group (red colored group). DAPC result was also similar to structure result (Fig. S1). All samples of Bangladesh water including all types of strategic ecosystems made a single cluster that represent only one population. There was no isolation between sea, estuary and freshwater ecosystems and no separate clusters between western and eastern freshwater rivers.  www.nature.com/scientificreports/

Discussion
Present results showed that Hilsa shad had low nucleotide diversity (0.001809-0.008811) like most of the Clupeiforms, e.g., Elongate ilisha (0.001-0.010), Tapertail anchovy (0.0011-0.0029) in Yangtze river and Japanese anchovy (0.0014-0.0090) [44][45][46] . Sea fish population had higher genetic diversity than anadromous population within same species or among same group 47 . Although, Hilsa and Kelee shad belonged to the same subfamily Dorosomantinae but Hilsa shad is anadromous in nature and Kelee shad is exclusively marine 48 . Because of this habit, nucleotide diversity of Hilsa shad was lower than Kelee shad (Hilsa kelee) (0.010337-0.014690) 49 . Correspondingly, marine Pacific herring (0.020) 50 also had higher nucleotide diversity than Hilsa shad. There were several researchers also reported low nucleotide diversity of Hilsa shad population in the Hoogli, the Ganges and the Brahmaputra river of India 10,17,18 . Low genetic diversity suggested that only small portion of the total population had the scope of successful spawning. That might be associated with their long anadromous breeding migration journey. At that time huge numbers of individuals were caught in their long migratory routes by  www.nature.com/scientificreports/ the fishermen. Frequent changing of spawning pattern is another reason of unsuccessful spawning 51 . Therefore, Government of Bangladesh should place some safety and protection actions including, public conscious, restriction on fishing gear, Hilsa fisheries management activities and proper timing of the fishing ban period. Previous studies on genetic population structure of T. ilisha were mostly based on allozymes, allele frequencies, microsatellite DNA markers and mitochondrial DNA regions: Cytochrome b (CytB), ATPase 6&8 (ATPase), 12 s and 16 s rRNA 10,[15][16][17][18] . However, genomic data is more powerful marker than previous markers to present the history, evolution, population status and phylogeny of a fish. Recently, A study discover the population genomics and structure of Hilsa shad in Bangladesh waters based on genomic data at NGS platform by NextRAD sequencing, however they mistakenly assigned samples collected from the confluent of the Meghna River as the north-eastern riverine group 19,20 . Our study was also based on genomic data at the NGS platform. Conversely, we collected sequence data of 4434 nuclear genes applying a cross-species gene enrichment method 22 , to examine the genetic diversity and population status of hilsa shad from the Bay of Bengal, its estuaries and all possible lotic and lentic waters and two migratory cohorts.. This study provided a solid estimation of the population status of Hilsa shad using genome-wide data and to infer its genetic diversity.
Result of the maximum likelihood IQtree and the population structure suggested that the fresh, estuarine and marine water of Bangladesh have a single population of Hilsa shad. In-addition DAPC, dendrogram and network on SNP loci analysis also represented the same trend. In the phylogenetic tree, samples of all locations were mixed together without making any specific cluster. In the population structure analysis, a single population was present with some admixtured individuals bearing small portion of genes from other group. Pairwise F ST value between most locations were poor with non-significant P value (P > 0.05), that support the deprived local population differences and homogeneity of this fish population throughout our studied locations. The hilsa shad population in Bangladesh might retrieve from a collapsed population. Once upon a time (upto first half of 1990s), this fish was most available and cheap fish in Bangladesh. Because of overexploitation and lack of proper management, the fish population was collapsed more than one decade. After that period, because of fishing ban Bangladesh has diversified fresh water habitats for Hilsa shad migration including main river system, coastal and freshwater small rivers, hill stream rivers, haors etc. but anadromous migration of this shad starts from same marine water body, the Bay of Bengal, which is their living ground. Furthermore, this fish has highly migratory nature among marine, estuarine and fresh water bodies. Therefore, it is difficult to draw a conclusion that there is more than one population in this water system. Low variation among groups and among population within groups also did not support more than one population. F ST value between most of the locations was poor with non-significant P value, which suggested that the population differences were not significant. Although in some cases, P value was significant but due to their poor F ST value that did not provide strong support of local population differences. Here present findings of this study were supported by the findings of some previous researchers who represented the single gene pool or stock of this species in the Bay of Bengal with a substantial gene flow 18,52,53 .
All of the spawning grounds of Hilsa shad were identified in the coastal areas of Bangladesh especially at the lower stretches of the Meghna, the Tetulia, the Ander Manik and the Shahabazpur River e.g., Hatia (Moulavir char) Sandwip (Kalir char) and Bhola (Dhal char and Monpura) 6,21 . However, migratory plan is mainly initiated during the spawning season, which is activated with follow of fresh water runoff from the inland rivers, and naturally it occurs with the commencement of the south-west monsoon and consequent flooding of all the major rivers draining down to the upper Bay of Bengal and there are no considerable differences in any context. Isolation of spawning ground is an important factor for population differentiation 11 . Due to presence of unalienated spawning grounds, it is less feasible to draw population differences of Hilsa shad in the upper streams of different rivers and in their living ground, Bay of Bengal. Therefore, the unique spawning grounds and sole major migratory down-stream route strengthen the presence of single population in all over the Bangladesh water without any significant population clusters. Without specify exact spawning grounds for every cluster, it is unrealistic to draw several clusters in this population.
Hilsa population studies in Indian part across the Hoogli, the Bhagirathi, the Ganges and the Brahmaputra Rivers also suggested single and genetically homogeneous population in Indian part 10,17,18 . Hilsa shad population of the Hoogli-Bhagirathi river system and Hilsa stock of Bangladesh water used same natal habitat, Bay of Bengal. Moreover, the River Ganges is the upstream of the Padma River (Bangladesh) and the Bhagirathi River (India) as well as the Brahmaputra is the upstream of the Jamuna River (Bangladesh). Most of the Hilsa shad of River Ganges comes from the Padma River and as the same way the Brahmaputra river has no other significant source of this fish except the Jamuna River. So genetic homogeneity and unique population across these rivers of Indian part also supported the Hilsa shad's single population in the Bangladesh water.
Nevertheless, Rahman and Naevdal (2000) based on allozymes and muscle proteins as well as Mazumder and Alam (2009) based on mitochondrial D-loop region figured out more than one Hilsa population in Bangladesh waters 15,54 . Rahman and Naevdal (2000) mentioned two populations: 1. Marine and 2. Estuary and fresh water but they processed without explaining how this highly migratory species was separated into two distinct cohorts. Mazumder and Alam (2009) divided the population into two clusters like previous study but poor pairwise F ST value between two groups showed that there were no differences between fresh water and marine-estuarine locations. Recently  reported three clusters in the Hilsa population in Bangladesh waters, first one was in marine and estuarine waters and another two belonged to north-western riverine (turbid freshwater) and north-eastern riverine (clear freshwater) ecotypes 20 . Existing of a single population, the most likely assumption from the present research varied with their findings. Our result suggested that as a highly migratory species, Hilsa shad is incapable to belong to more than one population when sampled at different www.nature.com/scientificreports/ sections of their migration route. Our postulation is the presence of single cluster in the Bangladesh water because all water bodies are almost connected to each other, raising high rate of gene flow and created large population size. Western and eastern river systems of Bangladesh have immaterial dissimilar water quality (e.g., turbidity) but this is not enough to make population differences of Hilsa shad since they migrate and start their life from same spawning grounds and used almost same route across the lower stream and coastal estuaries during their breeding migration.  reported that samples of the Meghna river (MR) was included in the north-eastern riverine (clear freshwater) ecotypes by DAPC and neighbor-joining tree analysis 20 . However, their sample collection site (MR) was located in the common migratory route for north-western riverine (turbid freshwater) and north-eastern riverine (clear freshwater) ecotypes. Therefore, this site should be representing the samples of both ecotypes rather than specific one. If we draw several specific populations or clusters in the upper streams of Bangladesh that means we had the scope to find this shad in the freshwater all over the year round. However, in the freshwater of Bangladesh, this fish was available in the summer (June-October) and winter season (January-March) only; these were related to their summer and winter migration respectably 55 . If one or two groups of this fish, continue their complete lifecycle in the freshwater (Western/Eastern part of Bangladesh) that states the assurance of continuous supply of this fish almost year round. However, the original scenario does not support this hypothesis. Finally we can conclude that, only one population of this fish inhabit in the Bangladesh waters without any instance of different populations and clusters (2-4) but in some specific locations, they had some particular characteristics. The Bay of Bengal is their main living ground, at the time of their breeding they come to the freshwater upper streams, spawn in the estuaries and finally return to the sea. Therefore, using all the same ecosystems (sea, estuary and freshwater rivers) in a cyclic fashion is essential to support their life cycle, which certainly pushes all the individuals to belong a unique population.
In the population structure analysis, only one population of Hilsa shad was identified with some admixtured individuals (32%) containing partial genes from other population in the water bodies of Bangladesh. The mentioned other population might not represent the Hilsa population of the Hoogly and Bhagirathi river system, India because, the Hilsa shads of both migratory routes of Bangladesh and India showed genetic homogeneity 10,17 . The Ganges and Brahmaputra rivers of Indian part are the upstream of the Padma and the Jamuna river of Bangladesh and might be belonged to the same population. However, Hilsa population of the Arabian Sea was genetically heterogeneous from the Bay of Bengal 18 and those different population genes of admixtured individuals might come from the Arabian Sea by oceanographic dispersion. Once (almost 18,000 years ago) the Arabian Sea had a close connection with the Bay of Bengal through the Laccadive Sea, the Gulf of Mannar and the Palk Bay. Therefore, this likely was an easy way for oceanographic dispersion of Hilsa shad between these two water bodies. After that period, a bridge of limestone shoals, coral reefs and tombolo called as 'Ram Bridge' or ' Adam's Bridge' (about 48 km) originated between Pamban Island off the south-eastern coast of Tamil Nadu, India, and Mannar Island, off the north-western coast of Sri Lanka 56,57 . Sarker et al. (2020) also mentioned that type of oceanographic dispersion between these two water bodies for another Clupeid fish species, Hilsa kelee 49 . The Irrawaddy, the Naaf and the Sittang River of Myanmar were also regarded as another important route for Hilsa migration 6,58 . There is also a possibility of inflowing of these different genes of other population from such population. Still there is no population structure study was conducted in the Myanmar part. Therefore, there is no scope to compare those admixtured individuals with the Hilsa population of Myanmar. However, for completing the full scenario, the Hilsa population of Myanmar also claims research attention in population genomics field.
In the present study, Samples of both migration cohorts (summer and winter) were studied. The maximum likelihood IQ tree, population structure and DAPC suggested that samples of both migration cohorts were homogenous. Similarly, Jhingran and Natarajan (1969) and Ramakrishnaiah (1972) also did not find any significant temporal population differences in their previous studies 59, 60 . Dwivedi (2019) found morphometric variations between seasonal migrants of Hilsa shad from Hooghly estuary, India using geometric morphometrics (GM) data 61 . They explained that these morphotypes might be related to the food availability and temperature fluctuation of summer and winter season but they did not incorporate that to the genetic level of the population. Quddus et al. (1984) reported two seasonal migratory populations of Hilsa shad in Bangladesh water based on spawning, fecundity and sex ratio 8 . Based on our findings and previous studies we can conclude these mentioned seasonal cohorts might be associated with their food availability and breeding rather than genome level.
Hill stream river and haor were two important and unique ecosystems for fish diversity in Bangladesh, they belong to the unique characteristics in the ecological factors as well as fish diversity 62,63 . Infrequently Hilsa shad use these two water bodies as their migratory routes. Samples were collected from the Shomeswari River and the Dingapota Haor, Mohanganj as the representatives of hill stream river and haor population respectively. However, Hilsa shad of these two exclusive water bodies were similar to the samples of the some other fresh water bodies (i.e., CM, CN and MG) as they were belonging to the Hilsa population without any admixtured individuals. Samples of SS do not have any significant P value with other locations whereas MO samples had significant P value with five other locations but having poor F ST value with three locations (i.e., BL, PP, MG). MO samples had only mentionable F ST value with MP (estuarine) and MK (marine), which might be the result of differences in water quality of these two water bodies. In DAPC, phylogenetic tree and in network, the samples of hill stream river and haor failed to make any unique cluster or monophyletic clade that represent they are also the part of single unique Hilsa population of Bangladesh waters.
Main migration was occurred through the Meghna river estuary, which is connected to the Padma, Meghna and Jamuna river system. However, there are some other alternative routes through some small coastal rivers e.g., the Pashur, the Bishkhali, the Balaswar, the Kocha river, which are connected to the Padma river through the Modhumati and the Gorai river. These coastal rivers passed through or beside the world largest mangrove forest Sundarban. Thus, these two routes are ecologically different from each other. Samples of these two routes have some genetic differences, because most of the locations (MK, CF and BL with PP and KN) of these two estuarine www.nature.com/scientificreports/ routes had significant P value, but their F ST value was not satisfactorily high to make population differences. Ecological differences of these two routes might be played an important role to create this type of slight differences among them. Therefore, these scenarios were not significant enough to describe noteworthy differences in the population level, but may make a sign of upcoming population differences.

Conclusion
In conclusion, the Hilsa shad collected from diverse habitats of Bangladesh belonged to the same population without mentionable more clusters. Although, recently Hilsa shad supply in Bangladesh is almost satisfactory but genetic diversity of this fish was very poor. Because of breeding failure of large group in the breeding migration and changing spawning pattern, the fish might experience a genetic bottleneck currently. This scenario is not a good sign for the survival of this population. Bangladesh, India and Myanmar already took some fisheries management strategy that may increase their number but failed to increase the genetic variation. Therefore, all three coastal countries of the Bay of Bengal should take a joint plan for the fisheries management and conservation of this fish species.

Data availability
Gene-capture data with adapters and low-quality reads were deposited in NCBI (PRJNA643346). www.nature.com/scientificreports/