Transcriptome profiling of interaction effects of soybean cyst nematodes and soybean aphids on soybean

Soybean aphid (Aphis glycines; SBA) and soybean cyst nematode (Heterodera glycines; SCN) are two major pests of soybean (Glycine max) in the United States of America. This study aims to characterize three-way interactions among soybean, SBA, and SCN using both demographic and genetic datasets. SCN-resistant and SCN-susceptible soybean cultivars with a combination of soybean aphids (biotype 1) and SCN (HG type 0) in a randomized complete block design (RCBD) with six blocks were used to evaluate the three-way interactions in a greenhouse setup. Treatments receiving SCN were infested at planting with 2000 nematode eggs, and the treatments with soybean aphids were infested at second trifoliate growth stage (V2) with 15 soybean aphids. The whole roots were sampled from plants at 5 and 30 days post SBA infestation for RNA sequencing using Illumina Hiseq. 3000. The data comprises of 47 libraries that are useful for further analyses of important genes, which are involved in interaction effects of SBA and SCN on soybean.


Background & Summary
Soybean [Glycine max (L.) Merr.], considered as the source of high-quality sugar, protein, and oil, is one of the most important crops worldwide 1 . Soybean aphid (SBA), Aphis glycines Matsumura (Hemiptera: Aphididae) and soybean cyst nematode (SCN), Heterodera glycines Ichinohe (Tylenchida: Heteroderidae) are the two most economically important pests of soybean in the Midwestern United States 2,3 . Soybean aphid, an aboveground herbivore (pest), feeds on phloem sap whereas SCN, a belowground pest, infests the soybean roots. These infestations can co-occur and amplify further reduction in soybean yield 4,5 . In the United States, annual economic losses due to the SBA and SCN have been estimated to be approximately $4 billion and $1.3 billion, respectively [6][7][8] . To counteract these devastating pests, farmers rely on various management strategies that include host plant resistance and chemical measures [9][10][11] . For SBA, dependency on the use of chemical management has resulted in pyrethroid resistance in SBA populations in Iowa, Minnesota, North Dakota and South Dakota as well as the impacts on non-target beneficial organisms 12,13 . In addition, the long-term use of SCN resistance has resulted in SCN populations that are capable of overcoming the resistance genes (i.e., HG types) 14 . Although host plant resistance has not been implemented on a large scale for SBA management, multiple virulent SBA biotypes have been discovered in the U.S. Virulent SBA biotypes and SCN races threaten the sustainability of host plant resistance for these two pests [14][15][16][17] . Thus, genetic data generated from greenhouse experiments on the effects of SBA and SCN on soybean cultivars are of tremendous importance for unraveling resistance genes and regulatory networks that can potentially be used for developing durable resistance in soybean to both pests.
Although above-and below-ground herbivores are spatially segregated, they both share the host plant through systemic tissues and are able to influence each other 18 . Previously, the influence of SCN on soybean aphid infestation or vice versa has been studied on soybean using demographic datasets 4,5,[19][20][21] 5 demonstrated the relationship between the aboveground feeding of soybean aphid and belowground reproduction of SCN in the SCN resistant Dekalb 27-52 (PI 88788 derived) cultivar, and SCN susceptible Kenwood 94 cultivar. In 30 days, both SCN eggs and the number of females increased by 33% in SCN-resistant cultivar and reduced by 50% in the SCN-susceptible cultivar. In 60 days, the number of SCN eggs and female count remained unaffected in the resistant cultivar but decreased in the susceptible cultivar. The authors concluded that soybean aphid feeding improved the quality of soybean as a host for SCN but this result was varied significantly with the cultivar and length of the experiment. Apart from these demographic studies, molecular characterization of SBA-SCN-soybean interaction has not been reported previously.
RNA-Sequencing (RNA-Seq) has been a standard tool for studying qualitative and quantitative gene expression assays that provide information on transcript abundance with their variation 22,23 . The major objective of this study was to evaluate differential gene expression of soybean plants that are infested with SCN in the presence or absence of SBA. To achieve the objective, we conducted experiments on two genotypes of G. max [H. glycines susceptible Williams 82 (PI518671), and H. glycines resistant MN1806CN] that were infested with biotype 1 SBA and HG Type 0 SCN for RNA-sequencing. More than 1.1 billion reads (61.4 GB) of transcriptomic data were obtained from 47 samples derived from the experiment using whole roots of G. max. An overview of the experimental design, methods and transcriptome analysis pipeline is shown in Fig. 1a-c, respectively. A comprehensive understanding of these transcriptome data will enhance our understanding of interactions among soybean, SBA, and SCN at the molecular level. The rapid advancement of bioinformatics tools is facilitating the search of candidate genes and their function that might play a crucial role in various pathways for host resistance against both herbivores.

Methods
Plant material, soybean aphid, and SCN. Two cultivars of soybean-Williams 82 (PI518671) and MN1806CN were used in this experiment. Williams 82 is susceptible to both HG Type 0 (race 3) of the SCN and SBA. MN1806CN is resistant to HG Type 0 (race 3) of the SCN but susceptible to SBA. Soybean aphid biotype 1 populations were originally obtained from the Ohio State University and were reared on susceptible cultivar LD12-15838R at South Dakota State University. This biotype is defined by an avirulent response to all known SBA resistance (Rag) genes and was first identified in Illinois 24 . The SCN population used was HG type 0, which is defined by having less than 10% reproduction documented by studies of SCN resistance and is avirulent to all SCN resistance genes in soybean. experimental design and sample collection. A greenhouse experiment was designed using a randomized complete block design (RCBD) with eight treatments (four treatments per cultivars) with eight experimental units (plants) in six blocks. The treatments were factors of soybean genotype, SBA infestation, and SCN For this experiment, the soil-sand mixture was prepared by adding construction sand and clay soil including SCN infested clay soil in the ratio of 3:1. The 125 cc of the mixture was distributed in cone-tainers (diameter of 3.8 cm, a depth of 21 cm and a volume of 164 cc; Greenhouse Megastore, USA). For SCN included treatments, each cone-tainer received approximately 2,000 SCN eggs. The cone-tainers with three soybean seeds were arranged in a 2.0 U.S. gallon (7.57 liter) plastic buckets (Leaktite, USA) filled with construction sand (Quikrete, GA). www.nature.com/scientificdata www.nature.com/scientificdata/ These buckets were kept in a water bath for maintaining soil temperature between 26.7 °C and 28.9 °C to ensure the reproduction of SCN (i.e. ~30 days) 5 . The temperature of the water bathes were regularly monitored using thermometers. The plants were grown under 16:8 (L:D) in a greenhouse with a temperature of 28 °C and 45% relative humidity. The plants were thinned down to one plant per cone-tainer upon reaching the second vegetative growth stage (V2). The V2-staged plants with the SBA included treatments were infested with 15 mixed age (i.e., fourth instar nymphs and adults) biotype 1 SBA using a 000 fine tip paintbrush (Winsor & Newton, England). The SBA were applied on the abaxial surface of the first trifoliate of V2-staged plants. All plants in each bucket were covered with a large no-see-um mesh net (Quest Outfitters, Sarasota, FL) to prevent inter-bucket movement of aphids. After SBA infestation, soybean plants were regularly checked to confirm the successful establishment of soybean aphids. Soybean aphid populations were counted at 5, 15, and 30 days post infestation (dpi). For SBA only treatment, the populations on the two soybean varieties were not significantly different, indicating that both lines were susceptible to SBA. SCN eggs were sampled at 30 dpi. The whole roots were collected on 5 and 30 dpi by snap freezing in liquid Nitrogen and stored at −80 °C for further analysis. The 5 dpi and 30 dpi root samples treated with each treatment were collected from Water bath I and Water bath II, respectively, representing each plant from three blocks (three biological replicates). The SCN soil and SCN infested roots were used for SCN cysts collection (except root samples collected for transcriptomic study) and the soil was examined for SCN counts.
RNA extraction, library construction, and RNA sequencing. RNA was extracted from all samples representing three biological replicates of each treatment that constituted 24 samples collected at 5 and 30 dpi each. As the major foci of the project were to determine whether the gene expression differed between SCN resistant and SCN susceptible soybeans, and to evaluate the gene expression of soybeans that were dual infested with SCN and SBA, we selected two timepoints (5 and 30 dpi). We selected 30 dpi to observe gene expression of treatment effects on a single generation of SCN reproduction keeping 5 dpi as a reference in the presence or absence of SBA. Frozen root samples from each treatment were grounded in liquid nitrogen with a mortar and pestle to a fine powder followed by total RNA extraction using PureLink RNA mini kit (Invitrogen, USA). RNA samples were treated with TURBO TM DNase (Invitrogen, USA) to remove any DNA contamination following the manufacturer's instructions. Assessment of the isolated RNA integrity was performed by 1% agarose gel electrophoresis, and RNA concentration was measured by Nanodrop 2000 (Thermo Fisher Scientific, USA). The cDNA libraries were constructed using NEBNext Ultra II RNA library 96 single index prep kit and sequenced using Illumina HiSeq. 3000 (single read end utilizing 100 bases read length) at Iowa State University Sequencing Facilities.

Data Records
All sequence reads were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession SRR8427366-SRR8427408) under Bioproject PRJNA514200 (Project ID: SRP178193) 33 ( Table 1). The raw transcript abundance counts for all the samples was deposited at the Gene Expression Omnibus (GEO) database, GSE125103 34   www.nature.com/scientificdata www.nature.com/scientificdata/ Technical Validation Quality control. Forty-eight RNA libraries were prepared and sequenced with the sequencing depth ranging from 9,847,269 to 57,520,568 reads (see Table 1). Sequencing of one of the replicates of control resistant cultivar (MN1806CN collected at 30 dpi) came in error. Therefore, total reads of more than 1.1 billion from 47 libraries were subjected to FastQC analysis, which helped determine the data quality using various quality metrics such as mean quality scores (Fig. 2a), per sequence quality scores (Fig. 2b), per sequence GC content (Fig. 2c), and sequence length distribution (Fig. 2d). Phred quality scores per-base for all samples were higher than 30 and GC www.nature.com/scientificdata www.nature.com/scientificdata/ content ranged from 43 to 45%, following a normal distribution. After trimming, more than 99% of the reads were retained as clean and good quality reads. Upon mapping these reads, a high mapping rate of 73.8% to 94.3% was obtained. Among these, 67.1% to 87.6% reads were uniquely mapped.
Assessment of transcriptomic data. The 43,122 genes passed the filter upon filtering with 0.5 CPM in at least one sample. To reduce the mean dependent variance, the quantified transcript reads were transformed as shown in Fig. 3a-c and available in Figshare 32 (the transformed transcript abundance count for all the samples). The transformed data were subjected to hierarchical clustering and principal component analysis (PCA) followed by visualization using t-SNE map 35 in order to assess the global transcriptomic data. The hierarchical clustering of top 6000 variable genes based on two time points (5 dpi and 30 dpi) showed distinct clustering except for some samples [ Fig. 4a; Figshare 32 (the hierarchical clustering of top 6,000 variable genes)]. Figure 4b represents the standard deviation (SD) distribution of the top variable 6,000 genes. Figure 4c represents the Pearson's correlation between the samples using the top 75% genes and available in Figshare 32 (the correlation between the samples using the top 75% genes). The t-SNE map revealed four clusters (A, B, C, and D) for 6,000 variable genes [ Fig. 4d; Figshare 32 (the four clusters for 6,000 variable genes)]. Regarding the PCA, PC1 is correlated with time (P = 1.16e-06) with 28% variance, and PC2 is correlated with Treatment (P = 2.02e-08) with 15% variance (Fig. 4e).

Usage Notes
This data represents the first publicly available transcriptomic data for soybean roots from the three-way interaction among G. max, H. glycines, and A. glycines. The raw compressed fastq files (fastq.gz) were submitted to the National Center for Biotechnology Information (NCBI) and are available with accession numbers (SRR8427366-SRR8427408; http://identifiers.org/ncbi/insdc.sra:SRP178193) 33 . The data could be retrieved using fastq-dump tool SRA toolkit (https://www.ncbi.nlm.nih.gov/sra). There are various tools such as Trimmomatic 36 , cutadapt 37 , Fastq_clean 38 that could be used for trimming purpose. Apart from the Salmon tool for the alignment and quantification of reads, other tools such as STAR aligner (https://github.com/alexdobin/STAR), Bowtie 39 , HISAT2 40 , TopHat2 41 , Cufflinks with HTSeq can be employed, which requires reference genome of G. max and annotation file in gff3 format. For differential gene expression analysis, EdgeR 42 and limma 43 could be used instead of DEseq. 2 31 . Apart from the standalone tools like iDEP 30 , Galaxy (https://usegalaxy.org), CyVerse (http://www.cyverse.org), MeV (http://mev.tm4.org) 44 , and integrated RNA-seq interpretation system for gene expression data analysis tool (http://bmbl.sdstate.edu/IRIS/) 45 could also be used for both analysis and visualization of RNA-seq data.

Code Availability
Codes used for RNA-seq data processing in the current study are available as supplementary material in Figshare at: https://doi.org/10.6084/m9.figshare.7755152.v3 32 (Codes used for RNA-seq data processing).