Large-scale SNP screenings identify markers linked with GCRV resistant traits through transcriptomes of individuals and cell lines in Ctenopharyngodon idella

Grass carp (Ctenopharyngodon idella) is an important economic species in freshwater aquaculture and its industry has been confined due to variety degeneration and frequent diseases. Marker-assisted selection is a feasible method for selective breeding of new varieties. Transcriptome data have greatly facilitated high-throughput single nucleotide polymorphism (SNP) marker discovery and phenotype association study. In this study, we gained a total of 25,981 and 5,775 high quality SNPs in two transcriptomes from individuals and cell lines, respectively. Comparative transcriptome analysis identified 413 and 832 grass carp reovirus (GCRV)-resistant-association SNPs as well as 1,381 and 1,606 GCRV-susceptible-association SNPs in individuals and cell lines, respectively. Integrated analysis indicated 22 genes with single SNP share common resistant/susceptible traits in two transcriptomes. Furthermore, we infected grass carp with GCRV, genotyping and association analyses were performed, and 9 in 22 SNPs were confirmed by PCR-RFLP. Meanwhile, mRNA expression profiles of 6 genes containing confirmed SNPs were examined by qRT-PCR. The results demonstrated that mRNA expressions were significant differences in resistant/susceptible individuals and cell lines. The present study develops an important strategy for high throughput screening of phenotype association genetic markers and the results will serve in grass carp breeding for GCRV resistance.


SNP detection and screening. Illumina MiSeq (2 × 250 bp) in individuals and Illumina HiSeq2500
(2 × 125 bp) in cell lines generated 20.04 Gb (89.07% of raw data) and 17.23 Gb (96.74% of raw data) clean data bulk, respectively (Table 1). SAMtools were employed for SNP detection 17 . In the present study, SNP only refers to single base substitution (transition and transversion), InDel represents single base insertion and deletion. Meanwhile, in order to improve the accuracy of SNP analysis, those with total read depth over 20 and mutation read depth over 10 are regarded as high quality SNPs and InDels. A total of 33,433 and 12,081 high quality SNPs were picked out from SS1, SR2, KS3 and KR4 libraries of individuals and C1, R2 and S3 libraries of cell lines, respectively (Fig. 1A,C and Supplementary Dataset 1). In addition, a total of 2,746 and 782 high quality InDels were found in two transcriptomes, respectively (Fig. 1B,D and Supplementary Dataset 2). However, in view of the identical SNPs in SS1, SR2, KS3 and KR4 libraries, it showed a total of 25,981 SNPs in individuals. Similarly, taking into consideration the identical SNPs in C1, R2 and S3 libraries, it showed a total of 5,775 SNPs in cell lines ( Fig. 1E and F). In the identified SNPs, the frequencies of transitions (66.79% in individuals and 72.32% in cell lines, respectively) were higher than those of transversions (33.21% in individuals and 27.68% in cell lines, respectively). In terms of transition, the similar amounts of A/G (11,375) and C/T (10,957) were found. Likewise, the frequencies of the four transversion types (A/C, A/T, G/C and G/T) were approximately alike as well ( Fig. 1A and C). As expected, the ratio of transition to transversion was about 2.0. With respect to InDels, the amounts of A and T as well as G and C were similar, while the ratio of A and T to G and C was about 7.0 ( Fig. 1B and D).
Based on these high quality SNPs, exploring specific SNPs associated with resistance/susceptibility to GCRV was feasible. We used the venn diagram to display the SNPs among different tissues and cell lines by comparative transcriptome analysis. The results showed 1,381 identical SNPs between SS1 and KS3 (excluding SR2 and KR4) and 413 identical SNPs between SR2 and KR4 (excluding SS1 and KS3) in individuals, 832 specific SNPs in R2 (excluding S3) and 1,606 specific SNPs in S3 (excluding R2) in cell lines (Fig. 1E,F and Supplementary Dataset 3). In the subsequent analysis, these SNPs were regarded as resistant or susceptible SNPs in individuals and cell lines. Furthermore, in order to find communal SNPs in individuals and cell lines, these SNPs (1,381 and 413 in individuals, 832 and 1,606 in cell lines) were mapped to the corresponding genes. The results showed that 3 communal genes were shared by resistant individuals (IR) and cells (CR) as well as 19 communal genes were shared by susceptible individuals (IS) and cells (CS) (Fig. 1G). Meanwhile, These 22 genes possess single SNP associated with resistance/susceptibility to GCRV in individuals or cells, whose non-redundant annotations and gene names were listed in Supplementary Table S1. And the positions of SNPs in 22 genes except C7N1 were different between individuals and cell lines, so there were totally 43 SNPs in the 22 genes in individuals and cells. To study the practical significance of SNPs, 22 SNPs in the transcriptome of individuals were selected for verification and association analysis, in which fish from different population were used. Meanwhile, other SNPs that were not selected for further analysis and verification might also play some roles in the antiviral immune responses.
Read depth and distribution of SNPs associated with resistance/susceptibility to GCRV. As the read depth in SNP position is closely related to the prediction accuracy of SNP 18 , statistical analyses of read depth for each SNP were performed in the transcriptomes of individuals (including SS1, SR2, KS3 and KR4 libraries) and cell lines (including C1, R2 and S3 libraries), respectively ( Fig. 2A and D). SNPs with a read depth between 10 and 59 times account for 99% and 67% in individuals and cell lines respectively. While SNPs with a read depth from 60 to 100 times and more than 100 times account for nearly 16% in cell lines separately. Among these SNPs associated with resistance/susceptibility to GCRV, corresponding genes with single SNP were more common and those with no more than 5 SNPs occupied more than 94% of total genes in individuals and cell lines ( Fig. 2B,C,E,F and Supplementary Dataset 4). On the other hand, the number of genes with single SNP was nearly 2.6-fold higher in susceptible groups than that in resistant groups in individuals ( Fig. 2B and C) and approximate-ly1.5-fold in cell lines ( Fig. 2E and F). SNP distribution among genes is important when considering the marker density and genome coverage 19 . We examined the genomic distribution of SNPs associated with resistance/susceptibility to GCRV by BLAST analysis and found that SNPs from individuals were located throughout the genome (Fig. 3A). However, no SNP was blasted to the chromosomes 4, 8, 9, 10, 15, 17, 20 and 23 in cell lines (Fig. 3B). The distribution of SNPs in susceptible samples was more dispersed than that in resistant samples of individuals and cell lines (excluding the SNPs that could not be mapped to genome) ( Fig. 3 and Supplementary Dataset 5). The number of SNPs in susceptible groups was 2-fold higher than that in resistant groups except on chromosomes 7, 10, 11 and 20 in individuals and chromosomes 5, 7, 13, 16, 19, 21 and 24 in cell lines. More than 30 SNPs were located on chromosomes 6, 12 and 21 in susceptible group in individuals (Fig. 3A), and more than 200 SNPs were located on chromosomes 6 and 12 in susceptible group in cell lines (Fig. 3B). On the contrary, chromosome 21 carried more than 150 SNPs associated with resistance to GCRV, which was 10-fold higher than those in susceptible groups in cell lines (Fig. 3B). . All transcripts had significant hits to proteins in the non-redundant database and these genes were annotated by the corresponding top best BLASTX hit. After GO annotation, many genes were assigned with one or more GO terms. The plotted GO annotation of these genes are shown in Fig. 4. The number of genes with a GO term was 76 and 305 in resistant and susceptible groups respectively (Supplementary Dataset 6).
In addition, the top 20 assignments to KEGG pathways in resistant and susceptible groups are shown in Fig. 5 and Supplementary Dataset 7. The venn diagram describes the overlap between resistant and susceptible top 20 assignments. Herein, 10 pathways were shared between resistant and susceptible groups. In these pathways, the number of SNPs in susceptible groups is 2-fold higher than that in resistant groups except viral myocarditis  pathway. The pathway with the highest density of SNPs in resistant groups is viral myocarditis with 14 SNPs, and that in susceptible groups is phagosome with 22 SNPs (Fig. 5).

SNP verification and significant relationship with resistance/susceptibility to GCRV.
During the whole challenge experiment, no dead fish was found in the control group. According to the symptoms and death status, 30 fish were grouped into resistant stock and 30 fish were grouped into susceptible stock. Based on the results of above venn diagram analysis (Fig. 1G), 22 genes with single SNP associated with resistance/susceptibility to GCRV were selected to validate their reliability. The primers for SNP verification were listed in Table 2. The results showed that 9 SNPs were validated by PCR-RFLP and their target fragment can be digested with corresponding enzyme, their gene structure and polymorphism confirmation were displayed in Fig. 6. Agarose gel electrophoresis showed that Gsto, Yes, C7N1 and Napi-llb2 genes contain three genotypes, Hnrpa and Chsg have two genotypes, while Hiat, Mef2d and Zfyve26 are heterozygote and possess just one genotype (Fig. 6). Subsequently, SNPs with more than one genotype were genotyped in resistant/susceptible groups and the results were listed in Table 3. Genotyping and association analysis results preliminarily revealed that heterozygote in the site (4807309 A/G) of C7N1 gene was significantly more in resistant population than that in susceptible population (P = 0.017). Aside from this SNP, there was no other SNP that was significantly related to the resistance/ susceptibility to GCRV (P > 0.05) ( Table 3).  The mortality of C. idella post GCRV infection ranges from 50% to 90%, and that of rare minnow (Gobiocypris rarus) is almost 100%. Interestingly, D. rerio post GCRV challenge is nearly 100% survival 20 . In order to investigate the differences among them, comparisons of genomic base types of the corresponding SNP positions among C. idella, G. rarus and D. rerio were listed in Supplementary Table S2. Meanwhile, chromosome locations of 22 SNPs and corresponding amino acids in C. idella were listed in Supplementary Table S3. Most SNPs were synonymous except for the mutation of C7N1, which can cause a change of amino acid (M/I). C7N1 is a novel gene that is homologous with human C15orf39 and locates on chromosome 7 (named as C7N1). Interestingly, comparative genomic analysis revealed that corresponding base of C7N1 SNP in G. rarus and D. rerio were G and A, respectively (Supplementary Table S3). Polymorphisms in resistant and susceptible groups were in Hardy-Weinberg equilibrium (HWE) ( Table 3). These results threw light on the significant association with resistance/susceptibility to GCRV. Furthermore, considering some SNPs may change the modifications (phosphorylation and ubiquitination) and localizations of proteins which are very critical for protein functions, and SNPs could change transcription and translation levels of genes by interacting with microRNAs, corresponding predictions were conducted (http://www.expasy.org/tools/). But these results revealed that there were no differences among them.

Gene expression signatures in spleen tissue and C. idella kidney (CIK) cells. Intraspecific allelic
variation may bring phenotypic variation by influencing the gene expression, including the possibility of hybrid vigour as beneficial traits that are exploited in animal and crop breeding 2, 21 . To investigate mRNA expression levels of genes corresponding to resistant/susceptible SNPs, we selected 6 genes with validated genetic variation to examine their mRNA expression levels in spleen tissue and CIK cells. The results showed that their expression patterns were accordant between RNA-Seq and qRT-PCR. Meanwhile, mRNA expression profiles were similar between individuals and cell lines except for Hnrpa ( Fig. 7A and B). The expression level of Hnrpa was 2-fold higher in resistant groups than that in susceptible groups and was consistent with transcriptome data in individuals ( Fig. 7A and C), while that was nearly 2-fold higher in susceptible groups than that in resistant groups and was consistent with transcriptome data in cell lines ( Fig. 7B and D). Yes, Chsg, C7N1 and Napi-llb2 were highly expressed in susceptible individuals and cell lines. Yes and Napi-llb2 expressions were nearly 2-fold higher in susceptible groups than those in resistant groups in cell lines, and they were 3-fold higher and 2-fold higher in susceptible groups than those in resistant groups in transcriptome of cell lines respectively ( Fig. 7B and D). The expression level of Gsto was 4-fold higher and 5-fold higher in resistant groups than that in susceptible groups in individuals and cell lines respectively ( Fig. 7A and B).

Discussion
In recent several years, the rapid development of high-throughput sequencing technology boosts the deep and efficient probing of transcriptomes and genomes 22,23 . It ensures a sufficient resource for SNP discovery. SNP diversity is important source for genetic diversity, molecular evolution and disease resistance. Some researchers primarily focus on non-synonymous coding SNPs, because those SNPs might influence the protein activity directly. However, human GWAS show that the synonymous SNPs play important roles as well as non-synonymous coding SNPs 24 . In this study, individuals and cell lines were challenged with GCRV and divided into resistant and susceptible groups according to their antiviral ability, and their mRNAs were sequenced to identify the different SNPs. In order to generate more SNP information associated with resistance/susceptibility to GCRV, integrative analysis was employed to explore the conserved SNPs. Herein, the purpose of this study is to develop a great many of convinced and specific SNPs for the selective breeding of GCRV resistant varieties through transcriptomes in individuals and cell lines. It is notable that substitution is conserved in the process of evolution 25 . The ratio of transition to transversion was about 2.0 in transcriptomes of individuals and cell lines, similar to the results of other SNP studies 26 . Meanwhile, the ratio of A/T to G/C were about 7 and 6 in individuals and cell lines separately, which are similar with that in higher vertebrates, for example, it is about 5 in human 27 . These results may indicate that the preferences of base mutations result from their molecular structures and are conserved. The significant differences of SNPs and InDels in individuals (between female and male) have been reported in the C. idella genome 1 . However, SNPs and InDels associated with resistance/susceptibility to GCRV were not reported by omics sequencing till now. Population differentiation has also been observed in Streblospio benedicti 28 . Abundant genetic diversity in wild C. idella germplasm resources is observed and the breeding of improved C. idella varieties has a good genetic basis 2 .
Owing to the read depth play important roles in prediction accuracy of SNPs 18 . One advantage of Illumina sequencing platform is the higher read depth comparing to 454 sequencing platform 29 , which could ensure that most of expected SNPs in the sequenced population could be detected 30 . It's worth noting that SNPs with much higher read depth should be excluded since too high read depth might be caused by paralogous sequence variants 31 . In this study, read depth of SNP positions between 10 and 59 accounted for majority, meanwhile, all read    depth did not surpass 300. These were enough to ensure the accuracy of predicted SNPs. What's more, genes with single SNP were more common, which increases the coverage of SNPs associated with resistance/susceptibility to GCRV in C. idella genome. In view of the whole genome assembly, the exact assessment pattern of the SNP distribution in the genome is possible. For this reason, when these genes containing SNPs associated with resistance/ susceptibility to GCRV were plotted to the C. idella genome by BLAST analysis, they had a good coverage of all 24 chromosomes. Some SNPs were not plotted to the corresponding chromosomes due to the gaps in genome sequence, which is acceptable at the genomic scale. These sequences and SNPs will provide novel materials for genome sequence amendment and facilitate the studies on selective breeding and immune mechanisms.
In this study, we identified many differential genes between resistant and susceptible population in individuals. To better understand the molecular functions of these candidate genes, we performed GO classification and KEGG pathway analyses. The ratio of differentially expressed sequences to the total sequences of corresponding GO categories or KEGG pathways were regarded as the major criteria for enrichment assessment 32,33 . GO subcategories or KEGG pathways with high ratio of SNPs are the major concerns. Our results revealed that GO subcategory with the highest SNP enrichment ratio was "binding" in resistant and susceptible groups. GO analysis also showed that some cellular components tended to be less polymorphic. Whereas KEGG pathway analysis showed that some pathways tended to be more polymorphic. Because disease-related mutations are unequally distributed throughout protein sequences, having a higher occurrence in structurally/functionally important sites, we can expect the number of localization mutations to be higher than that of the calculated. Localization mutations are rare events, but they should be taken into account when predicting consequences of mutations 34 . Biological systems are defined by their components and the interactions among these components. Likewise, mutations can affect the components and their interactions. Mutations that alter interactions are most likely to be detrimental 35,36 . These observations should be explored and verified in future studies.
A higher number of short-read alignments at regions of interest may be helpful in more precisely resolving the real allele frequency of mutant allele in bulked DNA 37 . Therefore, genotyping-by-sequencing (GBS) may be employed for further gene fine-mapping and allele analysis 38 . All the high-quality sequence reads were aligned to the C. idella whole genome 1 , with the objective of SNP identification. In this work, we picked up 22 SNPs associated with resistance/susceptibility to GCRV infection for verification and 9 SNPs were validated by PCR-RFLP. With SHEsis genotyping and association analysis in 30 resistant and 30 susceptible individuals, the mutation (4807309 A/G) in C7N1 gene was significantly associated with resistance/susceptibility to GCRV infection in individuals. Our study preliminarily showed that heterozygous genotype A/G was significantly resistant to GCRV infection than homozygous genotype A/A and G/G (P < 0.05). Furthermore, comparative genomic analysis showed that the mutation site (4807309 A/G) in C7N1 gene corresponds to A and G in D. rerio (resistant to GCRV) and G. rarus (susceptible to GCRV) respectively. These results may suggest that the base G was associated with susceptibility to GCRV and the heterozygous genotype can improve the ability of resistant to GCRV in C. idella. Similarly, heterozygous SNP variation can contribute to increase latex yield in the hybrid 21 . The molecular functions of C7N1 have not been studied in fish and even higher vertebrates. The relationship between above significant SNPs and resistance/susceptibility to GCRV was preliminarily verified by an independent infection experiment, these significant SNPs can be considered as candidate genetic markers and further verification needs to be done in other natural populations in the future. In the present study, SNPs were identified at the antiviral transcriptome level, which will enable us to further understand the roles of SNPs in antiviral responses.
A powerful feature of transcriptome is fully examining changes of gene expression levels among individuals or populations. There has long been a realization that gene expression differences play vital roles in species differentiation and population adaptation 39 . Several studies on closely related species indicate that there is a genetic basis for differences in transcript levels 40 , which could lead to adaptive divergence in the wild 41,42 . In this study, Hnrpa has different expression patterns between individuals and cell lines, which may indicate the difference in antiviral immune mechanism between in vivo and in vitro. Quantitative estimate of gene expression can be associated with change in nucleotide sequence 43 . Most SNPs controlling gene expression occur outside the coding regions of genes, so finding relationship between SNP genotype and expression level can provide an indirect link between SNP and phenotype 44 . Other methods can furthermore be used to study regulatory network changes by analysing co-expression patterns and associating with nucleotide changes and phenotypic traits 45 . The superiority of this approach is that experiments on natural selection for gene expression differences can now be monitored in a more effortless way than that in the past. Finally, cross-species comparisons of transcriptomes have recently shown promise for conservation genetics of endangered animals 46 and also for enhanced understanding of the fundamental principles of population genomics 47 , allowing us to potentially predict the responses of natural populations to future environmental perturbations.
SNP as molecular marker plays vital roles in animal breeding. As an important economic fish with long growing and breeding cycle, conventional breeding in C. idella is labor-intensive and time-consuming 2, 48 . Therefore, it is necessary for breeders to use an effective selection method to increase breeding efficiency as opposed to the traditional pure phenotype-based selection process. Natural germplasm resources of wild C. idella were genetically diverse, which provides the basis for constructing basal populations for QTL and GWAS analysis 2 . This study is an important step towards the generation of SNPs in specific feature and provides precious SNP resource for the selective breeding of resistant varieties. In addition, our surveys provide valuable information that will facilitate the studies on genomic variation underlying traits of interest in fish, including immune responses, regulatory mechanisms and environmental adaptability.

Statement. All experiments were approved by the Animal Care Committee of Huazhong Agricultural
University. The Administration of Affairs Concerning Animal Experimentation Guidelines stated approval from the Science and Technology Bureau of China. The methods were carried out in accordance with the approved guidelines. Total 200 experimental individuals were fed in four 300-liter aquaria with suitable illumination, water temperature, dissolved oxygen content, and adequate forage in the Huazhong Agricultural University, China. Approval from the Department of Wildlife Administration is not required for the experiments conducted in this paper. All surgery was performed to minimize suffering by using 3-Aminobenzoic acid ethyl ester methanesulfonate (MS-222) anesthesia.

Transcriptome data collection.
For the viral challenge, healthy C. idella were infected with GCRV (097 strain, suspended in phosphate-buffered saline (PBS)). In this process, the moribund fish with hemorrhagic symptom between 24 and 72 hours post challenge were regard as susceptible individuals and the surviving fish after 10 days post challenge were resistant individuals. The spleen and head-kidney tissues from 12 resistant and 12 susceptible individuals were collected and their RNA-Seq libraries were constructed and sequenced, which were divided into 4 groups: spleen tissue from susceptible fish (SS1) and resistant fish (SR2), head-kidney tissue from susceptible fish (KS3) and resistant fish (KR4). These transcriptome data were deposited in the NCBI with the Sequence Read Archive (SRA) accession number of SRP049081 8 . Meanwhile, to obtain monoclonal cells, CIK cells were digested and then filtered twice with a 150 μM nylon mesh. 100 μL Dulbecco's modified Eagle's medium (DMEM) supplemented with 7% fetal bovine serum (FBS), 100 U/mL of penicillin and streptomycin sulfate as well as 5 μL/mL insulin (Gibco, USA) were added to each well of the 96-well culture plates beforehand, the single cell was ultimately instilled utilizing the BD FACSAria ™ III Cell Sorter (USA) and subcultured up to being the monoclonal cells with a cell density of 5 × 10 4 /well in 96-well plates. Identified as resistant, susceptible or ambiguous by three strategies: (1) CPE analysis. (2) Cell proliferation assay. (3) Antiviral activity assay. These samples were divided into control (C1, unsorted), resistant (R2) and susceptible (S3) groups. The RNA-Seq data were obtained by Illumina HiSeq2500 sequencing technology and deposited in the NCBI Gene Expression Omnibus with accession number of GSE87414. All reads were filtered with NGS QC Toolkit for further analysis.
SNP and InDel detection. Short reads of two transcriptomes were separately mapped to C. idella genome using BWA version 0.5.9 (http://bio-bwa.sourceforge.net/) with the default settings except for no gap tolerance. The data from individuals (SS1, SR2, KS3 and KR4) and cell lines (C1, R2 and S3) were aligned for SNP and InDel identification. One reliable and frequently used software program, SAMtools, was independently applied for the identification of SNP and InDel with C. idella genome as reference sequences. InDel represents single base insertion and deletion. The software package SAMtools (http://samtools.sourceforge.net/) was used to convert sequence alignment/map (SAM) file to sort binary alignment/map (BAM) file. And the command dump was employed to remove duplicates. SNPs were further investigated by BCFtools. Those with total read depth more than 20 and mutation read depth over 10 were identified as high quality SNPs or InDels. Since accuracy of SNP prediction is dependent on sequence coverage 30 , we combined two transcriptomes to improve the prediction accuracy.

Statistical analysis of SNP information.
The ratio of mapped reads in each dataset was calculated by flagstaff command in SAMtools software 8 . SNP and InDel ratios were obtained by each type of DNA substitution and SNP read depth in the result file in SAMtools. SNPs involved in resistance/susceptibility to GCRV were analysed using the tool (venny 2.1) of BioinfoGP (http://bioinfogp.cnb.csic.es/index.html) in individuals and cell lines, respectively. Meanwhile, the number of these SNPs in genes (including exon, intron and flanking region) was analysed. And the BLAST was employed to analyse the chromosome location of genes with resistance/susceptibility SNP in individuals and cell lines.
Functional annotation of genes containing SNPs. The genes possessing SNPs associated with resistance/susceptibility to GCRV were annotated using NCBI non-redundant database by BLASTX (e-value < 0.00001) 49 . After that, BLAST2GO software were employed to allot the genes with GO terms of biological process, molecular function and cellular component 33 . Subsequently, annotated information was imported into BGI WEGO program (http://wego.genomics.org.cn) in WEGO native format to plot GO annotation results. KEGG pathways were assigned to genes containing SNP associated with resistance/susceptibility to GCRV by the online KEGG Automatic Annotation Server (KAAS) (http://www.genome.jp/tools/kaas/). KEGG Orthology (KO) assignment was applied by Bi-directional Best Hit (BBH) method.

GCRV infection, SNP verification, genotyping and association analysis.
For verification experiment, grass carp (approximate body length of 10 cm) were collected from three fish farms (Hubei province, China) where no hemorrhagic disease of C. idella was found in recent years, which were different population from RNA-Seq. Fish were injected with GCRV (097 strain, suspended in PBS) at a dose of 3.63 × 10 7 TCID 50 /g as previous report 6 . The moribund fish were sampled between 24 h to 72 h post injection and the surviving fish after 10 days post-injection were serenely sacrificed. Spleen tissue was collected and kept at −80 °C until DNA and RNA isolation. These samples were divided into resistant and susceptible groups. DNA and RNA were prepared for SNP verification and gene expression analyses.
In order to validate the accuracy of SNPs screened from transcriptome, 22 SNPs associated with resistance/ susceptibility to GCRV in individuals were investigated by PCR-RFLP, tetra-primer ARMS-PCR 50 and sequencing all samples. The primers were designed to amplify the target sequence with fragment length of 100 ~300 bp containing polymorphism sites and synthesized in TsingKe Biotech (Wuhan, China) ( Table 2). 30 resistant and 30 susceptible DNA samples were used as templates for SNP verification, genotyping and association analysis. 5 μl PCR products were electrophoresed on 1.0% agarose gel for quality measuring. Another 5 μl products were digested with corresponding restriction enzyme (Supplementary Table S3) according to the protocol and the mixtures were examined by electrophoresis on 2.0% or 4.0% agarose gel. The confirmed SNPs were selected to analyse their relationship with resistance/susceptibility to GCRV.
Single site analysis and haplotype analysis of SHEsisPlus online version -beta in SHEsis (http://analysis.bio-x.cn) software were employed to estimate allele and genotype frequencies and analyse their relationship with resistance/ susceptibility to GCRV. The logistic regression model was performed to verify the interaction by using SPSS 16.0 software, the odds ratio (OR) and 95% confidence interval (95% CI) were calculated. Propensity covariate adjustment was also performed to verify results. P-value less than 0.05 was considered to be statistically significant. HWE for genotypic frequencies was evaluated by the goodness-of-fit χ 2 -test for each genotyped SNP.

Confirmation of gene expression profiles by qRT-PCR.
To confirm the expression profiles of genes containing resistant/susceptible SNPs, qRT-PCR was performed using a Roche LightCycler ® 480 system. Total RNA were extracted from spleen tissues and CIK cells with TRIzol Reagent (TaKaRa, Japan). 18S rRNA and EF1α genes were employed as reference genes in individuals and cell lines separately. qRT-PCR and data analysis were performed according to the protocol and method as described previously 51 . Fold changes of gene expression levels were relative to corresponding susceptible samples (defined as 1). P-value less than 0.05 was accepted as significant difference (*P < 0.05; **P < 0.01; ***P < 0.001).