De novo transcriptome assemblies of four accessions of the metal hyperaccumulator plant Noccaea caerulescens

Blande, Daniel; Halimaa, Pauliina; Tervahauta, Arja I; Aarts, Mark G.M.; Kärenlampi, Sirpa O

doi:10.1038/sdata.2016.131

Data Descriptor
Open access
Published: 31 January 2017

Daniel Blande¹, Pauliina Halimaa¹, Arja I Tervahauta¹, Mark G.M. Aarts² & Sirpa O Kärenlampi¹

Scientific Data volume 4, Article number: 160131 (2017) Cite this article

Abstract

Noccaea caerulescens of the Brassicaceae family has become the key model plant among the metal hyperaccumulator plants. Populations/accessions of N. caerulescens from geographic locations with different soil metal concentrations differ in their ability to hyperaccumulate and hypertolerate metals. Comparison of transcriptomes in several accessions provides candidates for detailed exploration of the mechanisms of metal accumulation and tolerance and local adaptation. This can have implications in the development of plants for phytoremediation and improved mineral nutrition. Transcriptomes from root and shoot tissues of four N. caerulescens accessions with contrasting Zn, Cd and Ni hyperaccumulation and tolerance traits were sequenced with Illumina HiSeq2000. The transcriptomes were assembled using the Trinity de novo assembler and annotated, and protein sequences were predicted from them. The comparison against the BUSCO plant early release dataset indicated high-quality assemblies. The predicted protein sequences have been clustered into ortholog groups with closely related species. The data serve as important reference sequences in whole transcriptome studies, in analyses of genetic differences between the accessions and other species, and for primer design.

Design Type(s)	replicate design • strain comparison design • organism part comparison design
Measurement Type(s)	transcription profiling assay
Technology Type(s)	RNA sequencing
Factor Type(s)	selectively maintained organism
Sample Characteristic(s)	Noccaea caerulescens • shoot system • root

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Noccaea caerulescens, also known as Alpine pennycress, is a metal hyperaccumulating plant of the Brassicaceae family, previously classified as Thlaspi caerulescens¹. Hyperaccumulation is a very rare characteristic in plants, with around 500 species identified². Metal hyperaccumulation was first defined in relation to Ni hyperaccumulation³. A Ni hyperaccumulator was defined as a plant that could accumulate Ni in shoots at levels >1000 μg g⁻¹ of dry weight. Hyperaccumulation has been extended to other metals with metal-specific thresholds. For Zn, levels of 3000 μg g⁻¹ are used and for Cd 100 μg g⁻¹ (ref. 2). Plant hypertolerance refers to plants that are able to grow under high metal concentrations without showing symptoms of toxicity. Metallophytes, plants that occur on metal-enriched soils, can be obligate and require the presence of a particular metal, or facultative, which can grow with or without the metal present. Only a small subset of metallophytes are metal hyperaccumulators. Accessions of N. caerulescens are facultative hyperaccumulators of Ni, Zn and Cd, with Zn hyperaccumulation being a species-wide trait, and Ni and Cd hyperaccumulation population-level traits⁴. N. caerulescens is used as a model plant species for studies on heavy metal hyperaccumulation due to its small genome size and the high degree of variation in metal hypertolerance and hyperaccumulation profiles between different accessions²,⁵,⁶.
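The thresholds quoted above can be expressed as a small lookup. The following is a minimal sketch in Python, not part of the published pipeline; the function name and the example measurements are hypothetical, and applying a strict "greater than" comparison to all three metals is an assumption.

```python
# Shoot dry-weight thresholds (micrograms per gram) quoted in the text.
HYPERACCUMULATION_THRESHOLDS_UG_PER_G = {"Ni": 1000.0, "Zn": 3000.0, "Cd": 100.0}


def hyperaccumulated_metals(shoot_concentrations):
    """Return the metals whose shoot concentration exceeds its threshold.

    shoot_concentrations maps a metal symbol to micrograms per gram dry weight.
    Metals without a threshold in the table above are ignored.
    """
    return sorted(
        metal
        for metal, value in shoot_concentrations.items()
        if metal in HYPERACCUMULATION_THRESHOLDS_UG_PER_G
        and value > HYPERACCUMULATION_THRESHOLDS_UG_PER_G[metal]
    )


# Hypothetical measurements, for illustration only (not data from this study).
example = {"Zn": 12000.0, "Cd": 40.0, "Ni": 15.0}
print(hyperaccumulated_metals(example))
```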

Metal hyperaccumulating plants are of interest for several reasons. These include biofortification, where attempts are made to increase levels of nutrients in plants, e.g. Fe and Zn in staple crops⁷,⁸; phytoremediation, where plants are used to concentrate polluting or contaminating metals, which can then be removed from the environment⁹; and the reduction of toxic metal levels in plants, e.g. Cd in rice¹⁰.

Here we provide transcriptomes of four commonly studied accessions for which detailed Zn, Ni and Cd accumulation and tolerance data are available⁶. Two calamine accessions, La Calamine (LC) and Ganges (GA), are much more tolerant to Zn and Cd than the nonmetallicolous accession Lellingen (LE) and the serpentine accession Monte Prinzera (MP). Furthermore, the GA accession is a Cd hyperaccumulator, whereas MP is sensitive to Cd but hyperaccumulates Ni. The LE accession is least tolerant to Zn, but also has the most efficient Zn translocation capacity among the four accessions. Overall, the accessions show metal-specific root to shoot translocation rates. These mechanisms may be related to gene expression level¹¹, but variation in hyperaccumulation or tolerance may also originate from differences in the protein sequences, for example differences that alter the metal specificity of a metal transporter protein.

Sequence information available for N. caerulescens includes 454-sequencing of the transcriptome of the GA accession¹² yielding 23725 sequences, and an EST library of 4289 sequences from the LC accession¹³. Genome sequencing of the GA accession is underway. SOLiD sequencing of root transcriptomes of GA, LC and MP accessions has been utilised for gene expression analysis¹¹ but not for transcriptome assembly and sequence analysis.

The present data consist of assembled transcriptome sequences of the roots and shoots of the N. caerulescens accessions GA, LC, LE and MP grown in hydroponics under optimal Zn and Ni exposure. The transcriptomes have been annotated and clustered into ortholog groups with other closely related plant species. The transcriptome data can be used for genome, whole transcriptome and gene level studies, serving as a reference sequence, and also providing a sequence resource for primer design. The ortholog clustering will support comparative gene level studies for linking protein sequence variation to phenotypes. Assembly and release of annotated transcriptomes from Illumina data for the four accessions will serve as a valuable sequence resource for future studies.

Methods

Experimental design

Seeds of the N. caerulescens accessions GA, LC, MP and LE were germinated in soil, and plants with eight to ten leaves were rinsed and transferred to 10-l containers filled with half-strength Hoagland solution (modified from Schat et al.¹⁴): 3 mM KNO₃, 2 mM Ca(NO₃)₂, 1 mM NH₄H₂PO₄, 0.5 mM MgSO₄, 1 μM KCl, 25 μM H₃BO₃, 2 μM MnSO₄, 0.1 μM CuSO₄, 0.1 μM (NH₄)₆Mo₇O₂₄, 20 μM Fe(Na)EDTA. For GA and LC, 10 μM ZnSO₄, and for MP and LE 2 μM ZnSO₄ was added. In addition, 10 μM NiSO₄ was added to MP. MES (2 mM) was added and the pH was adjusted to 5.5 with KOH. The plants were grown in three climate chambers: 20/15 °C day/night, 250 μmol m⁻² s⁻¹, 75% RH, light period 14 h per day. Continuously aerated solutions were changed twice a week. After three weeks, twelve plants of uniform appearance (with approx. 14–16 leaves) were pooled from each chamber to obtain three independent biological replicates (roots and shoots separately), frozen in liquid N₂ and stored at −80 °C.

Generation of the datasets

RNA was extracted using the RNeasy Plant Mini kit (Qiagen). The quality and quantity of the RNA samples were verified by Bioanalyzer (Agilent) analysis. Library preparation and sequencing were performed at the Weill Cornell Medical College Genomics Resources Core Facility (NY, USA). RNA libraries were prepared using the Illumina TruSeq RNA-Seq Sample Prep Kit following the manufacturer's instructions. Libraries were multiplexed, pooled and sequenced using the Paired End Clustering protocol with 2 × 51 cycles of sequencing on four lanes of Illumina HiSeq2000 (Data Citation 1).

Processing of the datasets

The overall process for transcriptome assembly, annotation, ortholog clustering and validation is summarised in Fig. 1. After checking the technical quality of the sequencing with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), root and shoot samples for each accession were combined and assembled using the Trinity¹⁵ de novo assembly program at kmer values of 25 and 32. Quality of the assemblies was assessed using BUSCO (ref. 16) (Benchmarking Universal Single-Copy Orthologs) and TransRate¹⁷. For the MP accession, which had a higher number of reads, the reads were subsampled to 105 million using seqtk (https://github.com/lh3/seqtk.git). This step was performed because an optimum coverage for de novo transcriptome assembly has previously been reported¹⁸. Assembly for the MP accession was conducted on both the subsampled and the complete sets of reads.

**Figure 1: Overview of data processing.**

Quality of the assemblies was assessed using TransRate and BUSCO. The kmer 32 assemblies and the MP subsampled kmer 32 assembly were chosen for annotation and ortholog identification. These assemblies are available in the NCBI Transcriptome Shotgun Assembly Sequence Database (Data Citations 2–5). Annotation for each assembly was conducted using the Trinotate program. Orthologs were identified using OrthoFinder. As a final step in the pipeline, each assembly was filtered to remove sequences that did not have a top blast hit to Viridiplantae (green plant) sequences. After filtering, the BUSCO assessment was repeated on the filtered datasets to check whether the coverage had been reduced.

De novo assembly

Reads for all samples (three biological replicates of both roots and leaves) from each accession were combined, and each accession was assembled separately using the Trinity v2.0.6 de novo transcriptome assembler¹⁵. The total number of reads assembled for each accession is shown in Table 1. The settings that were used for Trinity included quality and adapter trimming using Trimmomatic¹⁹. No path merging was set so that all sequences with small differences were included in the output. Other settings were kept at default values. Reads were assembled using kmer values of 25 (default) and 32. For the MP accession 219 million reads were sequenced compared to approximately 105 million for the GA, LC and LE accessions. Since it has previously been reported that there is an optimum sequencing depth for transcriptome assembly¹⁸, we also subsampled 105 million reads from MP using seqtk and assembled these at kmer values of 25 and 32.
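The subsampling step can be illustrated with a short sketch. The study used seqtk; the Python below is not seqtk's implementation but a generic reservoir-sampling illustration of drawing a fixed number of read pairs while keeping mates together. The function names are hypothetical, and holding the reservoir in memory is only practical for small inputs.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples from an open text handle or iterator."""
    while True:
        record = tuple(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def subsample_pairs(left_records, right_records, n_pairs, seed=0):
    """Reservoir-sample n_pairs read pairs so that both mates are kept together."""
    rng = random.Random(seed)
    reservoir = []
    for index, pair in enumerate(zip(left_records, right_records)):
        if index < n_pairs:
            # Fill the reservoir with the first n_pairs pairs.
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n_pairs / (index + 1).
            slot = rng.randint(0, index)
            if slot < n_pairs:
                reservoir[slot] = pair
    return reservoir
```

Because the left and right files are iterated in lockstep, each selected forward read is returned with its own mate, which is the property a paired-end assembler relies on.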

Table 1 Raw number of reads for each accession.

Assessment of assembly quality

The quality of each assembly was checked using TransRate to generate metrics for comparison. The reads generated during the assembly following trimming were provided and used by TransRate to calculate mapping statistics. For the MP subsampled assembly, the complete read files (before subsampling) were used for the mapping. The protein set from Eutrema salsugineum²⁰ was downloaded from Phytozome 10.2 (ref. 21) and used for TransRate comparative metrics. Assemblies were compared against the BUSCO (ref. 16) plant early release dataset to calculate the extent of coverage (Table 2).

Table 2 Assembly quality metrics.

Existing sequences for GA from a 454-sequencing experiment were obtained from the Transcriptome Shotgun Assembly database entry GASZ01000000 (ref. 12). These sequences were used for validation and to compare coverage of the assemblies. TransRate and BUSCO quality assessments were performed on this dataset. The highest TransRate scores were obtained for the kmer 32 assemblies and, in the case of MP, for the kmer 32 assembly from subsampled reads.

Annotation

The transcripts for each accession for the kmer 32 assemblies were annotated using the Trinotate¹⁵,²²⁻³⁰ annotation pipeline following the method outlined at http://trinotate.github.io/. Initially, the transcripts were searched against the custom UniProt and UniRef90 databases using blastx allowing one hit and with output in tabular format. No e-value cut-off was set. The expected protein translations were obtained using TransDecoder and then searched against UniProt and UniRef90 using blastp. The same blast parameters were used as for the blastx searches. The blast searches were loaded into the Trinotate.sqlite database that was obtained from the Trinity ftp site and an annotation report generated. An e-value of 1e-5 was used as the threshold for the blast results during the report generation.
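The effect of applying the e-value threshold at report time rather than at search time can be sketched as follows. This is an illustrative Python snippet, not Trinotate's own code; it assumes the default 12-column tabular BLAST layout, in which the e-value is the eleventh column, and the choice of "less than or equal to" at the boundary is an assumption.

```python
def filter_hits_by_evalue(lines, max_evalue=1e-5):
    """Keep tabular BLAST hits whose e-value is at or below max_evalue.

    Each line is expected in the default 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept
```

Running the searches without a cut-off and filtering afterwards keeps the raw results reusable if a different threshold is wanted later.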

OrthoFinder

Protein sequences from six other plant species were obtained to identify ortholog groups. Arabidopsis thaliana (ATH)³¹, Arabidopsis lyrata (ALY)³², Thellungiella parvula (TPA)³³, Brassica rapa (BRA)³⁴ and Capsella rubella (CRU)³⁵ protein sequences were downloaded from Plaza v 3.0 (ref. 36). Eutrema salsugineum (EUT)²⁰ sequences were downloaded from Phytozome 10.2 (ref. 21). OrthoFinder³⁷ was used to identify groups of orthologs between the species.

Filtering by top blast hit

As the annotated transcripts could still include non-plant sequences, all transcripts were also searched against the NCBI non-redundant protein sequences (nr) database using blastx and nucleotide collection (nt) database using blastn, both with an e-value cut-off of 1e-5. The blast output format was set as -outfmt ‘6 qseqid staxids sseqid’ to output the taxonomic information for each hit. A python script available in Data Citation 6 was used to parse the taxonomic group information from the NCBI Taxonomy database. Transcripts with a top blast hit to Viridiplantae (‘green plants’) were retained. The fasta files were filtered using cdbfasta (https://sourceforge.net/projects/cdbfasta/) providing the ID of the transcripts to be retained. The BUSCO scores were calculated for the filtered transcript sets to ensure that the assembly coverage was not reduced by the filtering (Table 3). Filtered transcript sequences have been deposited in the NCBI Transcriptome Shotgun Assembly (TSA) sequence database (Data Citations 2–5).
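The top-hit filter can be sketched in a few lines. The Python below is not the script deposited in Data Citation 6; it is a minimal illustration that assumes the three-column output described above (qseqid, staxids, sseqid), that the first line for each query is its top hit, and that a child-to-parent taxonomy map (here a hypothetical `parent_of` dictionary, such as one built from the NCBI Taxonomy dump) is available.

```python
VIRIDIPLANTAE_TAXID = 33090  # NCBI Taxonomy identifier for Viridiplantae


def is_descendant(taxid, ancestor, parent_of):
    """Walk up the taxonomy tree from taxid and report whether ancestor is reached."""
    seen = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        # Unknown taxids and the root map to themselves, which ends the walk.
        taxid = parent_of.get(taxid, taxid)
    return False


def plant_transcripts(blast_lines, parent_of):
    """Return IDs of queries whose first (top) hit belongs to Viridiplantae."""
    top_taxids = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        qseqid, staxids = fields[0], fields[1]
        if qseqid in top_taxids:
            continue  # only the first hit per query is considered
        # staxids can hold several identifiers separated by semicolons.
        top_taxids[qseqid] = [int(t) for t in staxids.split(";") if t.isdigit()]
    return [
        qseqid
        for qseqid, taxids in top_taxids.items()
        if any(is_descendant(t, VIRIDIPLANTAE_TAXID, parent_of) for t in taxids)
    ]
```

The returned list of IDs is the kind of input that a FASTA extraction tool such as cdbfasta would then use to pull the retained transcripts.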

Table 3 BUSCO quality metrics after assembly filtering.

Multiple alignment

Ortholog groups that contained at least one N. caerulescens sequence after top blast hit filtering were retained. The sequences of each group were collected into a separate fasta file per cluster. The sequences in each cluster were aligned using MUSCLE v3.8.31 (ref. 38), with output in fasta and html format. Fasta files and html alignment files for each cluster are available in Data Citation 6.
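Selecting and exporting the ortholog groups can be sketched as below. This is an illustration rather than the authors' code: it assumes a groups file with one group per line in the form `GROUP_ID: id1 id2 ...`, and the function names are hypothetical. Whether filtered-out sequences were also dropped from the retained groups is not stated in the text; this sketch keeps every member of a retained group.

```python
def parse_groups(lines):
    """Parse lines of the form 'GROUP_ID: id1 id2 ...' into a dict of member lists."""
    groups = {}
    for line in lines:
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = members.split()
    return groups


def groups_with_retained(groups, retained_ids):
    """Keep groups containing at least one retained N. caerulescens sequence ID."""
    return {
        group_id: members
        for group_id, members in groups.items()
        if any(member in retained_ids for member in members)
    }


def group_fasta(members, sequences):
    """Build FASTA text for one group from an ID -> sequence mapping."""
    return "".join(
        f">{member}\n{sequences[member]}\n" for member in members if member in sequences
    )
```

Each FASTA string produced this way corresponds to one per-cluster input file for the multiple alignment step.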

Code availability

The python code used to parse taxonomy information is available in Data Citation 6.

Data Records

The raw sequence data (Data Citation 1 and Table 4) were deposited in the NCBI Sequence Read Archive. The dataset contains 24 records. For each accession (GA, LC, LE and MP) three replicates were sequenced for root and shoot samples. Each replicate comprised 12 plants.
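The record count follows directly from the design: four accessions, two tissues and three replicates. A minimal Python sketch of that enumeration is given below; the sample labels are hypothetical and do not correspond to the identifiers used in the Sequence Read Archive.

```python
from itertools import product

ACCESSIONS = ["GA", "LC", "LE", "MP"]
TISSUES = ["root", "shoot"]
REPLICATES = [1, 2, 3]

# One label per sequenced sample; the naming scheme is illustrative only.
samples = [
    f"{accession}_{tissue}_rep{replicate}"
    for accession, tissue, replicate in product(ACCESSIONS, TISSUES, REPLICATES)
]
print(len(samples))  # 4 accessions x 2 tissues x 3 replicates = 24
```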

Table 4 Description of samples that have been submitted to the NCBI Sequence Read Archive.

The assemblies for each accession at a kmer size of 32 and with subsampled reads for MP (Data Citations 2–5 and Table 5) were deposited in the NCBI Transcriptome Shotgun Assembly Sequence Database.

Table 5 Description of the Accession numbers for the sequences that have been submitted to the NCBI Transcriptome Shotgun Assembly Sequence Database.

Full annotation information for the assemblies (as Excel files) and fasta files of the ortholog groups are available on Dryad (Data Citation 6).

Technical Validation

Computational Validation

Comparison against the BUSCO plant early release dataset showed that 90 to 91% of single-copy orthologs in the benchmarking dataset were present and complete in the assemblies before and after filtering (Tables 2 and 3). TransRate statistics for both mapping and reference-based metrics were also high, with over 90% of reads mapping to the assemblies and over 80% classed as good mappings (Table 2).

Manual validation of the assemblies

To manually validate the assembly results, GenBank was searched for complete protein sequences from the accessions. Sequences were found for GA and LC, but none were available for LE or MP. In total, 14 sequences for GA corresponding to 9 genes and 10 sequences for LC corresponding to 8 genes were analysed. First, a search using blastp was conducted to obtain the matching sequence from the de novo assemblies. The sequences were then grouped where more than one GenBank sequence matched the same assembled sequence, and a multiple alignment was performed. The similarity of the known sequences to the assembly and the length of the alignment were recorded (Table 6). From these sequences, 14 out of 17 had at least 98.9% identity. Sequences that were difficult to assemble from the transcriptome included genes that are known to have multiple copies, e.g. HMA4 (ref. 39) and IRT1 (ref. 40).
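A percent identity value of the kind reported here can be computed from a pairwise alignment as sketched below. The text does not state exactly how identity was calculated, so this Python function is an assumption-laden illustration: it counts identical residues over alignment columns in which neither sequence has a gap. The example sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment: five ungapped columns, four of them identical.
print(percent_identity("MKT-AL", "MKTQAV"))
```

Other conventions (for example, dividing by the full alignment length including gaps) give lower values for the same alignment, so the denominator should be stated alongside any reported identity.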

Table 6 Comparison of assembled sequences to sequences available in Genbank.

Additional information

How to cite this article: Blande, D. et al. De novo transcriptome assemblies of four accessions of the metal hyperaccumulator plant Noccaea caerulescens. Sci. Data 4:160131 doi: 10.1038/sdata.2016.131 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Koch, M. A. & German, D. A. Taxonomy and systematics are key to biological information: Arabidopsis, Eutrema (Thellungiella), Noccaea and Schrenkiella (Brassicaceae) as examples. Frontiers in plant science 4, 267 (2013).
2. van der Ent, A., Baker, A. J., Reeves, R. D., Pollard, A. J. & Schat, H. Hyperaccumulators of metal and metalloid trace elements: facts and fiction. Plant Soil 362, 319–334 (2013).
3. Brooks, R., Lee, J., Reeves, R. D. & Jaffré, T. Detection of nickeliferous rocks by analysis of herbarium specimens of indicator plants. J. Geochem. Explor. 7, 49–57 (1977).
4. Pollard, A. J., Reeves, R. D. & Baker, A. J. Facultative hyperaccumulation of heavy metals and metalloids. Plant Science 217, 8–17 (2014).
5. Escarre, J., Lefebvre, C., Frerot, H., Mahieu, S. & Noret, N. Metal concentration and metal mass of metallicolous, non metallicolous and serpentine Noccaea caerulescens populations, cultivated in different growth media. Plant Soil 370, 197–221 (2013).
6. Assunção, A. G. et al. Differential metal‐specific tolerance and accumulation patterns among Thlaspi caerulescens populations originating from different soil types. New Phytol. 159, 411–419 (2003).
7. White, P. J. & Broadley, M. R. Biofortifying crops with essential mineral elements. Trends Plant Sci. 10, 586–593 (2005).
8. Ortiz-Monasterio, J. et al. Enhancing the mineral and vitamin content of wheat and maize through plant breeding. J. Cereal Sci. 46, 293–307 (2007).
9. Bhargava, A., Carmona, F. F., Bhargava, M. & Srivastava, S. Approaches for enhanced phytoextraction of heavy metals. J. Environ. Manage. 105, 103–120 (2012).
10. Yu, H., Wang, J., Fang, W., Yuan, J. & Yang, Z. Cadmium accumulation in different rice cultivars and screening for pollution-safe cultivars of rice. Sci. Total Environ. 370, 302–309 (2006).
11. Halimaa, P. et al. Gene expression differences between Noccaea caerulescens ecotypes help to identify candidate genes for metal phytoremediation. Environ. Sci. Technol. 48, 3344–3353 (2014).
12. Lin, Y., Severing, E. I., te Lintel Hekkert, B., Schijlen, E. & Aarts, M. G. M. A comprehensive set of transcript sequences of the heavy metal hyperaccumulator Noccaea caerulescens. Frontiers in plant science 5, 261 (2014).
13. Rigola, D., Fiers, M., Vurro, E. & Aarts, M. G. M. The heavy metal hyperaccumulator Thlaspi caerulescens expresses many species-specific genes, as identified by comparative expressed sequence tag analysis. New Phytol. 170, 753–766 (2006).
14. Schat, H., Vooijs, R. & Kuiper, E. Identical major gene loci for heavy metal tolerances that have independently evolved in different local populations and subspecies of Silene vulgaris. Evolution 50, 1888–1895 (1996).
15. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
16. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
17. Smith-Unna, R., Boursnell, C., Patro, R., Hibberd, J. M. & Kelly, S. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res. 26, 1134–1144 (2016).
18. Francis, W. R. et al. A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly. BMC Genomics 14, 167 (2013).
19. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
20. Yang, R. et al. The reference genome of the halophytic plant Eutrema salsugineum. Front Plant Sci 4, b10 (2013).
21. Goodstein, D. M. et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186 (2012).
22. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
23. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 32, D138–D141 (2004).
24. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature methods 8, 785–786 (2011).
25. Krogh, A., Larsson, B., Von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
26. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
27. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
28. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
29. Powell, S. et al. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res. 40, D284–D289 (2012).
30. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
31. Kaul, S. et al. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
32. Hu, T. T. et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat. Genet. 43, 476–481 (2011).
33. Dassanayake, M. et al. The genome of the extremophile crucifer Thellungiella parvula. Nat. Genet. 43, 913–918 (2011).
34. Wang, X. et al. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035–1039 (2011).
35. Slotte, T. et al. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat. Genet. 45, 831–835 (2013).
36. Proost, S. et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 43, D974–D981 (2015).
37. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 1 (2015).
38. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
39. Lochlainn, S. Ó et al. Tandem quadruplication of HMA4 in the zinc (Zn) and cadmium (Cd) hyperaccumulator Noccaea caerulescens. PLoS ONE 6, e17814 (2011).
40. Plaza, S. et al. Expression and functional analysis of metal transporter genes in two contrasting ecotypes of the hyperaccumulator Thlaspi caerulescens. J. Exp. Bot. 58, 1717–1728 (2007).

Data Citations

1. NCBI Sequence Read Archive SRP077889 (2016)
2. Blande, D., Halimaa, P., Tervahauta, A. I., Aarts, M. G. M. & Kärenlampi, S. O. GenBank GEVI00000000 (2016)
3. Blande, D., Halimaa, P., Tervahauta, A. I., Aarts, M. G. M. & Kärenlampi, S. O. GenBank GEVK00000000 (2016)
4. Blande, D., Halimaa, P., Tervahauta, A. I., Aarts, M. G. M. & Kärenlampi, S. O. GenBank GEVL00000000 (2016)
5. Blande, D., Halimaa, P., Tervahauta, A. I., Aarts, M. G. M. & Kärenlampi, S. O. GenBank GEVM00000000 (2016)
6. Blande, D., Halimaa, P., Tervahauta, A. I., Aarts, M. G. M. & Kärenlampi, S. O. Dryad https://doi.org/10.5061/dryad.380n3 (2016)

Acknowledgements

This work was financially supported by the Academy of Finland (Project Number 260552). The authors wish to acknowledge The University of Eastern Finland Bioinformatics Center, CSC-IT Center for Science, Finland and the Finnish Grid Infrastructure (FGI) for generous computational resources.

Author information

Authors and Affiliations

Department of Environmental and Biological Sciences, University of Eastern Finland, Kuopio, 70210, Finland
Daniel Blande, Pauliina Halimaa, Arja I Tervahauta & Sirpa O Kärenlampi
Wageningen University, Laboratory of Genetics, Wageningen, 6708 PB, The Netherlands
Mark G.M. Aarts

Contributions

D.B. performed assembly, annotation, alignments and computational analyses. P.H. and A.I.T. collected and prepared samples. P.H., A.I.T. and S.O.K. were involved in study design. All authors were involved in writing the manuscript.

Corresponding author

Correspondence to Daniel Blande.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

About this article

Received: 06 September 2016
Accepted: 24 November 2016
Published: 31 January 2017
DOI: https://doi.org/10.1038/sdata.2016.131
