Genome Wide Analysis for Growth at Two Growth Stages in A New Fast-Growing Common Carp Strain (Cyprinus carpio L.)

In order to identify candidate genes or loci associated with growth performance of the newly established common carp strain, Xinlong, we conducted a genome-wide association analysis using 2b-RAD technology on 123 individuals. We constructed two sets of libraries associated with growth-related parameters (weight, length, width and depth) measured at two different grow-out stages. Among the 413,059 SNPs identified using SOAP SNP calling, 147,131 were tested for GWAS after quality filtering. Finally, 39 overlapping SNPs, assigned to four genomic locations, were associated with growth traits in two stages. These loci were assigned to functional classes related to immune response, response to stress, neurogenesis, cholesterol metabolism and development, and proliferation and differentiation of cells. By overlapping results of Plink and EMMAX analyses, we identified three genes: TOX, PLK2 and CD163 (both methods P < 0.05). Our study results could be used for marker-assisted selection to further improve the growth of the Xinlong strain, and illustrate that largely different sets of genes drive the growth of carp in the early and late grow-out stages.


Results
Evaluation of growth-related traits. We analyzed four growth parameters in two different grow-out stages: five months after tagging, and at the end of the production cycle (17 months). Within-stage comparisons revealed a statistically significant linear relationship between the four growth traits. Notably, we observed particularly strong correlation between Blen5m/Bwid5m (r = 0.94) and Blen5m/Bdep5m (r = 0.91) in the first stage, and between hBwt and hBdep (r = 0.93) in the second stage (Table 1). At the end of the growth cycle, all four growth parameters were statistically significantly higher in the Selection group, whereas control group exhibited higher SD values (Fig. 1).
2b-RAD raw data processing. Raw 2b-RAD data were filtered via several quality control steps (see methods for details), clean data were then mapped to the genome of domesticated common carp 19 , and 413,059 SNPs detected after SNP calling by RADtyping program. After the SNP quality checking (QC), 147,131 SNPs were left and successfully assigned to 50 chromosomes and 3,377 scaffolds of the draft genome. The average sequencing depth was 44.56 and mapping successfulness rate ranged from 46.89% to 50.83%. We obtained 27,730,499 unique tags (225,451 tags per specimen) (Supplementary Table S1). Five regions accounted for 96.01% of all SNPs; with the highest proportion annotated in the intron (34.00%), followed by the intergenic (27.18%), upstream genes in the annotation file of reference genome (13.37%), downstream genes (13.18%), and finally exon regions (8.28%) (Supplementary Table S2). exploring genomic markers associated with growth traits in two grow-out stages. Analyses with Plink and EMMAX were conducted for candidate SNPs associated with the four growth performance traits in each stage: 44,306 SNPs were associated with growth traits in the first stage and 33,039 in the second stage (P < 0.05). After the Bonferroni correction, statistically significant values (P < 0.05) were found for only 109 SNPs in the first stage and only one in the second stage (Table 2).
Manhattan graphs plotted for parameters of all growth stages [−log10 (P value)> 5] allowed us to isolate 345 related SNPs. Using EMMAX software, which can account for full sib information, we identified 39 SNPs that were associated with the studied growth parameters via comparison of 45 (10 traits * 9/2 combination pairs) pairs of growth traits in both grow-out stages (Table 3). No overlapping loci were identified in pairwise comparisons of loci associated with growth parameters between first and second stages. These 39 SNPs were annotated using the reference genome (Table 4). Among these 39 SNPs, 36 were identified only by EMMAX, and three were further corroborated by plink analysis. These 39 loci were located in the 5' flank regions, 3' flank regions, CDS and intron regions of corresponding genes, and belonged to the following five functional classes: 1) Immune response to pathogens. In response to the microbial infection, interferon alpha/beta receptor 2 (IFNA/betaR2) triggers a complex cascade of events 20 . Many genes associated with these downstream events are involved in cellular immune processes. For T-cell development and trafficking, thymocyte selection-associated high mobility protein (TOX) is involved in chromatin assembly, transcription, and modulation of T-cell development, and growth regulation 21 . DCs and T cells in semaphorin-4A (Sema4a)-knockout mice displayed poor allostimulatory activities and T helper cell (Th) differentiation, respectively 22 . Spinster homolog 2 (SPNS2) regulates the levels of Sphingosine-1-phosphate (S1P) gradient that controls lymphocyte trafficking 23 . In addition, dnaJ homolog subfamily C member 25 (DNAJC25) can significantly increase cell apoptosis, and its overexpression inhibits cell growth 24 . Deletion of inositol polyphosphate multikinase (IPMK) in cell lines and mice virtually abolished lipophagy, caused liver damage and inflammation, and impaired hepatocyte regeneration 25 . Among the regulation factors, laccase domain-containing protein 1 (LACC1) regulates TNF and IL-17 in mouse models of arthritis and inflammation 26 . A knockout of scavenger receptor cysteine-rich type 1 protein M130 (CD163) improved the resistance to viral infection in pigs 27 . A knockout of apolipoprotein A-I (APOA1) decreased parenchymal and vascular β-amyloid pathology in the Tg2576 mouse model of Alzheimer's disease 28 . 2) Development, proliferation and differentiation of cells. In the embryonic development of mice, tyrosine-protein kinase JAK1 deficiency causes arrested development, and apnea and death in newborn pups 29 .
In the cell proliferation and differentiation category, nudC domain-containing protein 1 (NUDCd1) expression was positively correlated with cell proliferation, migration, and invasion in A498 cells 30 . Compared to smarcb1a NUDCd1 function, SWI/SNF-related, matrix-associated, actin-dependent regulator   of chromatin, subfamily B, member 1-A (smarcb1a) is a critical component of the mammalian SWItch/ Sucrose Non-Fermentable (mSWI/SNF) protein, which promotes MyoD-mediated muscle differentiation by altering the chromatin structure in promoter regions of endogenous loci 31 . APC membrane recruitment protein 1 (Amer1) acts as a scaffold protein for the β-catenin destruction complex and promotes stabilization of Axin at the plasma membrane, thereby exerting negative regulatory role in Wnt signaling (key role in embryonic development) 32 . Complete loss of DNA damage checkpoint protein RAD9A in male mice caused a radical loss of spermatogenic cells (infertility or sub-fertility) 33 . 3) Cholesterol metabolism. Phosphatidylinositol 5-phosphate 4-kinase type-2 alpha (PIP4K2A) regulates intracellular cholesterol transport 34 . As a result, hormone-sensitive lipase (HSL) has negative correlation with cholesterol content in the testes 35 , and dela(24)-sterol reductase (DHCR24) knockout in brain caused cholesterol deficiency in mice 36 . 4) Neurogenesis. The genes involved in this category can be divided into two groups, one of which is related to the neurons and neuritogenesis. Rotein kinase C-binding protein (NELL2) promotes survival of neurons 37 . Homeobox protein (DBX1-B) is crucial for production of dorsal habenular neurons 38 . Synapse differentiation-inducing gene protein 1 (SYNDIG1) regulates the excitatory synapse maturation in rats 39 . Mice lacking Nrxn2α (α-variant of NRXN2 (neurexin 2)) exhibit behavioral abnormalities (social interaction deficits and increased anxiety) 40 . The other group is associated with signal transmission and regulation factors. Septin 8 B (sept8b) expression pattern in glial cells of zebrafish indicated that it was involved in signal transmission in nervous systems 41 . Mtss1 impinges on directional persistence and neuritogenesis 42 . Rap guanine nucleotide exchange factor 6(Rapgef6)-knockout mice exhibited mild behavioral abnormalities (hyperlocomotion and working-memory defects) 43 . 5) Response to stress, comprising genes included in the interaction between environment and genetics.
Catechol-O-methyltransferase (COMT) was identified as gene × environment interaction candidate gene (probably a regulatory factor), associated with childhood adversity, posttraumatic stress disorder, and major depressive disorder 44 . Mixed lineage kinase dual leucine zipper kinase (DLK) regulates the JNKbased stress response pathway 45 . Overexpression of IMPACT in yeast cells inhibited growth under all stress conditions that require GCN2 (eIF-2-alpha kinase) and GCN1 for cell survival, probably by the IMPACT promoting the dissolution of the GCN2-GCN1 complex 46 . Serine/threonine-protein kinase PLK2-like (PLK2-like) is involved in catalysis of the following reaction in response to stress: ATP + protein serine = ADP + protein serine phosphate, and ATP + protein threonine = ADP + protein threonine phosphate 47 .

Validation of growth-related markers.
To verify the authenticity of the three genes (TOX, PLK2 and CD163) identified by overlapping the results of Plink and EMMAX analyses in 188 specimens, three primer pairs were designed to amplify and sequence the PCR products. Among the sequenced products, two PLK2 genotypes, AG and GG, were identified. We found that two genotypes exhibited significant differences in the BWid5m growth parameter (Fig. 2).

Discussion
Genome-wide association studies are a very efficient approach for associating SNPs with important economic traits, as they allow analysis of large amounts of SNP data and identification of specific markers 6,48 . Genomic selection based on specific genes can be a cost-effective and fast (regarding the number of genome-wide markers used) method to considerably facilitate the selection for complex quantitative traits by precisely selecting favourable alleles. Obtaining thousands of SNP markers is becoming easier with the development of new technologies for population genotyping, thus allowing construction of high-density genetic maps for domesticated animals, as well as identification of QTL and specific markers for economically important traits 49 . High and low-density genome-wide association assays have been constructed both for terrestrial and aquatic farmed animals. Examples include the bovine SNP50 chip with 54,001 SNPs 50 , the SNP linkage map developed for Atlantic salmon, with around 5500 markers resulting from 16.5 K SNPs, and the SNP linkage map of common carp with 28,194 SNPs from a 250 K genechip 51,52 . These SNP arrays were used to explore QTL related to important economic traits, such as body weight and age at sexual maturation 4 . Xu et al. relied on QTL mapping to identify a number of growth performance regulators, including those associated with cell-proliferation (IGF1) and growth and energy metabolism (ERBB4, BMPR1B, SMTLB, GS, KISS2 and NPFFR1) 52 . In our previous genome-wide analysis of the Xinlong strain, we found a number of genes associated with growth parameters at the early grow-out stage, including those related to neural pathways (RHEB, MDGA2, NRXN1a, BDNF, SHANK2 and REEP2), sex dimorphism (NRXN1A, SMOC2, RAB10 and HSD17B), and fatty acid metabolism (RAB10, TNMD, VPAC2 and ACSF2) 18 . However, it remained unknown which candidate genes or loci contribute to the growth performance during later growth stages.
In this study, we were interested in genes controlling the growth parameters in later grow-out stages, up until the marketable fish size. By studying ten growth traits in two different grow-out stages, and then merging the results of Plink and EMMAX analyses, we identified 39 growth-related SNPs. SNPs associated with five functional classes were associated with four growth parameters in the first grow-out stage. No overlapping SNPs were found between growth performance parameters in the first and second grow-out stages. This may be a genetic molecular marker contribution to the comparatively high correlation coefficients among the former four parameters and lower coefficients between the growth parameters in the two grow-out stages (Table 1). These overlapping genes may be interacting in a way that produces the observed increased growth performance. Firstly, both HSL (regulates the cholesterol content 35 ) and PIP4K2A (regulates the intracellular cholesterol transport 34 ) can influence the availability of cholesterol. The cholesterol level appears to limit development of the central nervous system (CNS) 36,53 . The LENSSPQAPARRLLPP (BigLEN) neuropeptide delivered to synapse is believed to act through the G protein-coupled receptor 171 (GPR171) to regulate body weight by food intake and metabolism 54 . Beside the body weight, hypothalamic neurons may be involved in the regulation of metabolism, energy balance, and social www.nature.com/scientificreports www.nature.com/scientificreports/ behavior 55 . Within the brain, endogenous leptin plays a physiologically important role in the control of food intake 56 . In the neural cell lines, leptin-induced transactivation of NPY gene (involved in the central and sympathetic regulation of food intake) promoter can be mediated by JAK1 57 . This gene was singled out in our pairwise comparisons of growth parameters, and six other genes were involved in pathways associated with neurons and neurogenesis (NELL2, dbx1b, Mtss1, SYNDIG1, NRXN2 and BDNF). Thus, cholesterol content and transportation can affect the central nervous system and the brain, both of which play a role in the control of food intake via leptin, JAK1 and GPR171 genes.
By overlapping Plink and EMMAX results, we identified three genes. TOX is required for a critical transitional step in selection and maturation of thymocytes, so this gene may have a direct effect on the cell-mediated immunity 58 . Therefore, we speculate that this gene may play a role in immune responses and growth of the common carp. PLK2-like is associated with metabolic responses to stress, involved in cell proliferation by contributing to the mitosis and centrosome cycle, and plays an important role in embryonic and skeletal development, as evidenced by experiments on cultured PLK2 embryonic fibroblasts, where the proliferation was stronger in cells expressing PLK2 59,60 . And finally, CD163-knockout pigs were completely resistant to viral infection during the PRRSV challenge due to the fact that this gene acts as a cellular receptor for PRRSV 27 .
By LD analysis of three common loci, two candidate genes were found: AGPAT5 and MRPL57. Polymorphism of AGPAT5 gene was associated with the pig and cattle meat quality 61,62 . Liver-specific Agpat5 knockout mice had significantly reduced fasting plasma insulin and hepatic triglycerides after 12 weeks of high-fat diet 63 . AGPAT2 enzyme catalyzes the acylation of lysophosphatidic acid to form phosphatidic acid, a key intermediate in the biosynthesis of triacylglycerol and glycerophospholipids 64 . Therefore, regulation of biosynthesis of triacylglycerol and glycerophospholipids may be the genetic explanaiton behind higher content of polyunsaturated fatty acids in the muscle of the Jinlong carp 18 .
The candidate genes explored in this paper indicate that growth of common carp is a complex trait, associated with immunity, metabolism, stress and development, and contingent upon the interaction between environment and genetic backgroud 65 . Therefore, genome-based selective breeding should not only focus on traditional growth-related genes, but also use reliable breeding population to explore additional genes that may affect growth. As many of these genes remain unknown for the time being, it may be a promising direction for future studies.

conclusions
The objective of this study was to use the cost-effective 2b-RAD technology to investigate the genomic regions related to the growth performance of a new common carp strain. Growth-related parameters, measured at 8 months of age and at the end of the grow-out cycle, were evaluated for 123 specimens. According to these results, we analyzed 2b-RAD data and identified a total of 147,131 SNPs. A comparison of loci associated with all growth parameters in the first and second grow-out stages identified 39 SNPs. None of these SNP overlapped between the first grow-out stage and second grow-out stage. Genes associated with these SNPs belonged to the following five functional classes: immune response to pathogens, development, proliferation and differentiation of cells, cholesterol metabolism, neurogenesis and response to stress. Plink and EMMAX analyzes singled-out three genes: TOX, PLK2 and CD163.
Methods experimental material and sampling. Focusing on growth performance as the target trait, a genetically distinct strain was developed from the traditional Huanghe carp strain at the Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences using BLUP 66 method according to the best-predicted breeding values. A population of the new strain was obtained using artificial breeding (spawning in females was stimulated with hormonal injections, and then roe and milt mixed manually), fertilized eggs were incubated in separate cage settle nets for one week, and larvae then transferred to labelled (ID + spawning time) nursery happas. After 3 months in the nursery, parents and 8 of their full-sib F1 progenies randomly selected from each family were anesthetized with clove oil (75 mg/L) 67 and tagged with passive integrated transponder tags (PIT) produced by Biomark. In total, 123 tagged full-sib common carp progeny specimens belonging to 25 families were selected for the experiment. Xinlong strain was represented by 95 specimens belonging to 19 families, and the control population (Huanghe strain) by 28 specimens belonging to 6 families. These two populations were cultivated in concrete tanks supplied with aerated and filtered water. During 17 months of the culture period, counting from the date of tagging (3 months of age) to the harvesting time (20 months of age), the fish were fed with a commercial feed (30% protein content) at a daily dose of 5% of their approximate body weight. Physicochemical parameters of the water were regularly monitored: dissolved oxygen, pH and temperature.
Growth parameters were measured after 5 (5 m) and 17 (h) months of the growth trial: body weight (Bwt5m and hBwt), body length (Blen5m and hBlen), body depth (Bdep5m and hBdep) and finally body width (BWid5m and hBWid), respectively. Fish were anesthetized with clove oil (75 mg·l −1 ) during the measurement procedure. Between 50 and 100 mg of caudal fin tissue was collected from each individual at the harvesting stage and preserved in 95% alcohol at 4 °C until DNA extraction.
Experimental procedures and animal handling were carried out in accordance with guidelines for the care and use of animals for scientific purposes set by the Institutional Animal Care and Use Committee of the Freshwater Fisheries Research Center and approved by the animal ethics committee of Chinese Academy of Fishery Sciences.
DNA extraction and 2b-RAD sequencing. DNA was isolated using universal genomic DNA kit (CWBIO, China) following the manual. Briefly, around 25 mg of fin tissue was digested for 1 hour at 56 °C with 20 µl Proteinase K to remove the proteins, and then incubated for 10 minutes at 70 °C after adding Buffer GL and 100% ethanol. Once the solutions were transferred to a column with a collection tube (Spin Columns DM), they Scientific RepoRtS | (2020) 10:7259 | https://doi.org/10.1038/s41598-020-64037-w www.nature.com/scientificreports www.nature.com/scientificreports/ were processed with Buffer GW1 and Buffer GW2 in order to improve the DNA purity. Finally, Buffer GE was added and DNA precipitated, collected and stored at −20 °C. Quality and concentration of the extracted DNA were checked by spectrophotometry (optical density reading at 260 and 280 nm) and electrophoresis on 1.0% agarose gel.
In order to construct the 2b-RAD libraries for each of the 123 individuals, we followed a simplified RAD (restriction site-associated DNA) genotyping method, as described by Su et al. 18 , with minor modifications. This method is based on sequencing of uniform fragments produced by type IIB restriction endonucleases 8 , as described in the 2b-RAD protocol 3 . PCR products were purified and quantified using SPRI select purification kit (Beckman Coulter, Pasadena, CA, USA) and Qubit 2.0 fluorometer (Invitrogen). The quality of all amplicon libraries was checked on 1.8% agarose gels and verified on Agilent 2100 Bioanalyzer.
2b-RAD raw data processing. In order to filter the sequenced reads and get high-quality reads, quality check (QC) (conducted on the basis of two criteria: max-missing (integrity parameter) <0.8 and Minor Allele Frequency (maf)> 0.01) and adapter trimming 9 were performed. SNPs were discovered by aligning reads against a reference genome 68 using STACKS v1.23 (parameters m3, M2 and N4). Genotyping was then done following steps reported in Jiao et al. 69 , excluding 3-bp positions, ambiguous reads, long homopolymer regions, and excessive numbers of low-quality positions (between 5 to 10).
The paired-end reads were merged by Pear software (Version 0.9.6) 70 . The merged reads were processed using a custom Perl script to trim adaptor sequences. The terminal 3-bp positions were also excluded from each read to eliminate artifacts that might have arisen from ligation sites. Reads with ambiguous bases (N) exceeding 8%, poor quality (15% nucleotide positions with a Phred quality <30), or without restriction sites were removed. The BsaXI tags in the genome of common carp were extracted based on the enzyme's recognition site, which served as a reference for SNP discovery. High quality reads of each individual were aligned to the reference genome using SOAP2 (version 2.21) 71 with the following parameters: r = 0, M = 4, v = 2. The aligned data for each individual were then used for SNP detection by RADtyping 72 program with default parameters. For co-dominant markers, we used an ML algorithm to estimate for homozygotes or heterozygotes. In order to obtain robust results in the subsequent analyses, the following criteria were applied for SNP filtering. (1) Segregating markers that could be genotyped in at least 80% of the individuals were kept for analyses. (2) SNPs with a minor allele frequency (MAF) < 0.01 were discarded. (3) Polymorphic loci with more than two alleles possibly derived from sequencing or clustering errors were excluded. (4) Tags with more than two SNPs were excluded Genome-wide analysis and candidate SNP identification. Two programs were used to identify the candidate SNPs. Plink applies a general linear model and EMMAX applies a mixed linear model, which can control for the false positive rate. The BH method in p.adjust function in R was used to calculate the false discovery rate. Using Bonferroni correction (P < 0.05 and P < 0.01), −log10 (P value) was calculated and plotted on Manhattan graphs with 100,000 window size using Plink 73 , EMMAX 74 and qqman package 75 in R. In order to identify the candidate SNPs, two models (linear and mixed linear model) were considered. Plink applies the general linear model, and EMMAX applies the mixed linear model, which can control for the false positive rate. The BH method in p.adjust function in R language was used to calculate the false discovery rate. For each stage, we identified significant SNPs, and their chromosomal locations. After we selected the SNPs for these two stages, we filtered the common SNPs according to their positions on chromosomes in the reference genome.
EMMAX conducts GWAS analysis on the basis of variance component approach. The algorithm conducts association analysis of quantitative traits as described below: Let n be the sample size, p the total number of genotyped SNPs and Y the vector of observed phenotypes. We used the genotype data to calculate the n x n matrix ∧ S pairwise genetic relatedness between individuals, such as IBS or Balding-Nichols matrix, and normalized ∧ S to have sample variance 1 using a Gower's centered matrix. We also used BN matrix as recommended by the authors. This tests the hypothesis H0: σ = 0 a 2 . If the null hypothesis is rejected, it proceeds to step 3; otherwise, use the ordinary least squares to estimate the coefficients of each of the SNPs genotyped. For each marker, we used GLS F-test, or alternatively a score test, to estimate the effects β k and test the hypothesis β ≠ 0 k in the following model: We substituted β 0 by a multicolumn matrix containing the group information, as we thought that group information may be a confounding variable. We suspected that groups may cause population stratification (e.g. First and second columns are sample names, third column is always 1, and fourth column is group information, with selected group coded as 1 and control group coded as 0. In order to identify the candidate growth-associated genes, loci containing the selected SNP were annotated by SnpEff (version 4.1 g) 76 and designated according to the published reference genome of the common carp (Songpu strain) 19 . Finally, plink and EMMAX results were merged by seeking a mathematic Union set. For each stage, we found significant SNPs, and identified their chromosome positions. After we selected the SNPs for these two stages, we filtered the common SNPs, i.e. the ones with corresponding relevant information (positions on chromosomes) and significant P values between the two stages. For these significant SNPs, we conducted LD analysis using Plink. The overlapping loci identified by these two methods were used to conduct the validation test. The common SNPs were used to conduct the linkage disequilibrium (LD) analysis, with ±0.5 M window along the chromosome or scaffold, using Plink and LDheatmap package 77 . Position name was constructed using chromosome or scaffold name plus position.
Validation of growth performance markers. On the basis of the genomic region of DNA designated from the reference genome of common carp, specific primers for the genome-wide loci significantly associated with growth-related traits were designed using Primer3web 4.1.0 (for picking primers from DNA sequences) and Primer-BLAST (NCBI; for finding specific primers). PCR was conducted using 2×Es Taq MasterMix (CWBio, China), and results were checked on 1.8% agarose gels and verified on Agilent 2100 Bioanalyzer.
In total 180 individuals from the F3 generation of Xinlong carp strain (no relationship with the 2b-RAD sequence data) were randomly selected and used to validate the efficiency of the identified markers. After amplification of the SNPs of the three genes in these 188 individuals, three primer pairs were designed on the basis of corresponding sequences in the reference common carp genome (Table 5). PCR products were sequenced (oneway) and aligned using DNAMAN6.0 software. Statistical data analysis. All data were represented as mean ± SE and subjected to analysis by Chi-square tests and Student Test using Statistical Package for Social Sciences (SPSS) for Windows (SPSS Inc., Chicago, IL, USA). Differences were considered statistically significant when P values were <0.05.

Data availability
All of the 2b-RAD sequences used during this study are available from the NCBI Sequence Reads Archive (SRA) database under the accession numbers SRR6241620 and SRR6262716. The other datasets supporting the conclusions are included within the article and its additional files..  Table 5. Specific primers designed for the loci overlapping between Plink and EMMAX analyses. Note: SNP name using scaffold or chrosomone name plus position