Abstract
Climate change and population growth are putting increasing pressure on global food security. The development of high-yielding varieties for important crops such as wheat is crucial to meet these challenges. The basis for this is extensive exploitation of beneficial genetic variation resting in genebanks around the world. Selecting suitable donor genotypes from the vast number of wheat accessions stored in genebanks is a difficult task and depends critically on the density of information on the performance of individual accessions. Therefore, this study aimed to access phenotypic data from the Czech genebank, storing over 13,000 wheat accessions. We curated and analyzed data on heading date, plant height, and thousand grain weight for more than one-third of all available accessions regenerated across 70 years. The data underwent analysis using a linear mixed model, revealing high quality of curated data with heritability reaching 99%. The raw data, but also derived data such as the best linear unbiased estimations, are now available for the wheat collection of the Czech genebank for research and breeding.
Background & Summary
With a worldwide cultivation area of nearly 220 million ha and a production of about 770 million tons, wheat is one of the most widely grown crops and provides about one-fifth of the calorie and protein intake of the world’s population1,2. Wheat supplies about 40% of the dietary intake of essential micronutrients such as zinc, iron, manganese, magnesium, and vitamins B and E for millions of people who rely on a wheat-based diet3. It is also an important source of energy for livestock4 and is processed for various other purposes, including fuel5,6. Considering a human population growth rate of 1.2% per year, more efficient wheat varieties will be needed in the future to ensure food security.
An important component in breeding superior varieties is to utilize the variability hidden within collections of genetic resources stored in genebanks around the globe. However, selecting suitable accessions that have the potential to improve desired traits represents a tremendous challenge in this regard, especially considering that the pool of accessions in global genebanks includes hundreds of thousands of individuals and little information is available for individual accessions.
In the Czech Republic, the Genetic Resources Department was launched in 1951, just after the establishment of the Crop Research Institute (CRI). Germplasm collections of the main crops have been gathered and are maintained. Cereals, and mainly wheat, represent the prevailing share of stored germplasm. The groundbreaking event was the opening of the genebank in 1988, after which regeneration cycles were prolonged from 5 to up to 30 years. The National Program for the Conservation and Use of Genetic Resources of Plants and Agrobiodiversity (NP) was launched in 1993 and led to the standardization of processes related to storing and maintaining genetic resources. Nowadays, the collections include 1,392 species, both cultivated species and crop wild relatives, representing 463 genera in total. The total number of accessions maintained in the NP amounts to 56,789, of which wheat accessions account for about one fifth. The coordinator of the NP is the Genebank Department, CRI Prague-Ruzyně.
The CRI genebank contains 12,598 publicly available accessions of wheat (Triticum spp.) recorded in the GRIN Czech documentation system (GRIN Czech, accessed 31 May 2024, https://grinczech.vurv.cz/gringlobal/query/query.aspx). During the period 1951–1988, all accessions were regenerated every 5 to 7 years. This interval was extended to 20–30 years after the introduction of climatized storage in the genebank. Data collected during the regeneration cycles are available only in the form of field notes, but most accessions were subjected to systematic experiments with the aim of characterization and evaluation according to the descriptor list for wheat7. The results from these usually three-year trials were averaged, converted to descriptor scale points, and uploaded to the information system EVIGEZ for the period 1980–2015 or GRIN Czech for the period 2015 and later. The raw data of these trials were archived as field notes and published in the form of summaries in the annual or final project reports since 1951. Since 1998, the raw evaluation data have been available in digital form. It is evident that a large amount of evaluation data has been generated from the trials since 1951, but it was available only in the form of mean values in the documentation system. Raw data from the trials were only accessible in person by reviewing the handwritten field books and in the CRI library. These data represent a source of information potentially valuable for wheat breeding, but have not been curated and were not available according to the F(indable) A(ccessible) I(nteroperable) R(eusable) principles for data publications8.
Our study relies on historical phenotypic data of spring and winter wheat accessions regenerated in Prague-Ruzyne since the 1950s. The main goal of this study was to curate and publish the historical phenotypic data of the CRI spring and winter wheat collections following the FAIR principles. We focused on three important agronomic traits: heading date (HD), plant height (PH), and thousand grain weight (TGW), which have been generated during the past 70 years.
Methods
Plant material
The wheat collection of the Czech genebank, harbored at the Crop Research Institute in Prague, comprises nearly 13,000 accessions within the genus Triticum, of which the majority correspond to Triticum aestivum (GRIN Czech 2024, queried with the following parameters: Accessions Available From a Site; site_acronym: CZE122; Limit: 20,000, accession IDs: 01C01* and 01C02* for winter and spring wheat, respectively). Other cultivated species include the diploid T. monococcum, the tetraploids T. durum, T. dicoccum, T. polonicum, T. ispahanicum, T. timopheevii, and T. carthlicum, and the hexaploids T. spelta, T. vavilovii, T. compactum, and T. petropavlovskyi. In addition, the collection includes wild species at the diploid and tetraploid levels: T. boeoticum, T. urartu, T. dicoccoides, and T. araraticum. The phenotypic data presented in this study cover 4,534 accessions (1,065 spring wheat and 3,469 winter wheat), which corresponds to more than one-third of the entire collection.
Phenotyping protocol
The main task of genebanks is to conserve genetic material for future generations and provide it for current users, which entails regular regeneration and multiplication in field plots. Seed multiplication is required when (i) seed stocks are no longer sufficient, (ii) germination rates decrease below a critical threshold, (iii) extensive amounts of seeds are demanded by research, or (iv) new accessions are added to the genebank. During the regeneration process, morphological and agronomical traits are scored for phenotypic comparison with previous regeneration cycles, following strict quality guidelines. If there is doubt whether morphological shifts or drifts occurred during propagation, the voucher spike collection kept in the genebank can be used for comparison. Individual accessions were regenerated between 1951 and 2020 in Prague, Ruzyně (latitude 50° 5′ 10.3698″N, longitude 14° 16′ 49.926″E, 364 m.a.s.l., local soil type Orthic Luvisol, 8.5 °C average annual temperature, 510.5 mm average annual rainfall). Not all accessions were regenerated every year, resulting in a non-orthogonal structure of the data.
For seed propagation, spring wheat accessions were usually sown from February to April, while winter wheat accessions were sown from September to October. For regeneration, the accessions were grown in plots with a size of 2 m². Evaluation trials consisted of experimental plots with a size of 4 or 10 m² in four replications in a completely randomized design. Data were recorded for 25–30 traits according to the descriptor list for wheat7. The experiments were repeated for 3 (in a few exceptions 2) years in different experimental fields of the genebank. Three traits were selected for this study: heading date (HD), plant height (PH), and thousand grain weight (TGW). HD was assessed for both spring and winter wheat accessions as the number of days from January 1st until 50% of the plants reached heading (BBCH 59)9. PH was assessed in cm from the soil surface to the top of the spike, including awns. TGW was determined after seed harvest and expressed in g on a ~15% grain moisture basis.
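The HD scoring described above amounts to a day-of-year calculation. The following minimal sketch illustrates it in Python; the function name `heading_day` and the choice to count January 1st as day 1 are assumptions made for illustration, since the source does not state whether January 1st is counted as day 0 or day 1.

```python
from datetime import date


def heading_day(heading: date, count_jan_first_as_one: bool = True) -> int:
    """Number of days from January 1st of the same year to the heading date.

    Assumption: January 1st counts as day 1 unless the flag is switched off.
    """
    elapsed = (heading - date(heading.year, 1, 1)).days
    return elapsed + 1 if count_jan_first_as_one else elapsed


# The same calendar date maps to different values in leap and non-leap years.
print(heading_day(date(2019, 5, 30)))  # 150
print(heading_day(date(2020, 5, 30)))  # 151
```

Because the value is tied to the calendar rather than to the sowing date, HD values of autumn-sown winter wheat and spring-sown spring wheat share one scale but are not directly comparable as growth durations.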
Data analyses
Phenotypic data of each growth type were analyzed separately based on the following mixed model:
\[{y}_{ij}=\mu +{g}_{i}+{a}_{j}+{e}_{ij}\qquad (1)\]
where \(y_{ij}\) stands for the observed phenotypic value of the ith accession in the jth year, \(\mu\) is the population mean, \(g_{i}\) is the effect of the ith accession (fixed), \(a_{j}\) is the effect of the jth year (random), and \(e_{ij}\) is the error term (random).
The analysis was performed with ASReml-R10 (v. 4.1.0.154), and error variances were modelled as specific to each year. In a first step, Eq. (1) was used for outlier detection: studentized residuals were computed and Bonferroni-Holm tests were applied to correct for multiple testing11,12. The outliers were then removed from the dataset, and best linear unbiased estimations (BLUEs) for each accession were generated by fitting Eq. (1) to the enhanced data set.
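To illustrate why accession estimates need to be adjusted for year effects in such non-orthogonal data, the sketch below fits an additive accession-plus-year model by alternating least squares on a toy data set. It is a simplification and not the ASReml-R analysis used in the study: year effects are treated as fixed rather than random, a single error variance is assumed, and the function name `adjusted_accession_means` and all values are hypothetical.

```python
from collections import defaultdict


def adjusted_accession_means(records, n_iter=500):
    """Fit y = accession + year by alternating least squares.

    records: iterable of (accession, year, value) tuples.
    Year effects are centred on zero so that accession estimates stay on
    the trait scale. Returns {accession: adjusted mean}.
    """
    records = list(records)
    acc = defaultdict(float)
    year = defaultdict(float)
    for _ in range(n_iter):
        # Update accession means given the current year effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for a, y, v in records:
            sums[a] += v - year[y]
            counts[a] += 1
        for a in sums:
            acc[a] = sums[a] / counts[a]
        # Update year effects given the current accession means.
        sums, counts = defaultdict(float), defaultdict(int)
        for a, y, v in records:
            sums[y] += v - acc[a]
            counts[y] += 1
        for y in sums:
            year[y] = sums[y] / counts[y]
        # Centre the year effects on zero.
        shift = sum(year.values()) / len(year)
        for y in year:
            year[y] -= shift
    return dict(acc)


# Toy plant heights (cm): true accession values A=100, B=110, C=90 and
# year effects 2001=+5, 2002=-5, 2003=0; each accession is seen in two years.
toy = [
    ("A", 2001, 105.0), ("A", 2002, 95.0),
    ("B", 2002, 105.0), ("B", 2003, 110.0),
    ("C", 2001, 95.0), ("C", 2003, 90.0),
]
estimates = adjusted_accession_means(toy)
print({k: round(v, 3) for k, v in estimates.items()})
```

In this toy example the raw mean of accession B is 107.5 cm because it was never grown in the favourable year 2001, whereas the adjusted estimate recovers the 10 cm difference between B and A.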
In the next step, the heritability (h²) of the main traits was calculated considering both years and accessions as random effects in Eq. (1):
\[{h}^{2}=\frac{{\sigma }_{G}^{2}}{{\sigma }_{G}^{2}+\frac{{\sigma }_{e}^{2}}{Year}}\qquad (2)\]
where \({\sigma }_{G}^{2}\) is the genetic variance of the accessions, \({\sigma }_{e}^{2}\) stands for the average error variance across regeneration years, and Year is the average number of years in which each accession was tested.
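As a worked illustration of the heritability calculation, the sketch below assumes the usual form implied by the definitions above: genetic variance divided by the sum of genetic variance and the error variance scaled by the average number of years. The function name and the numeric values are hypothetical and are not results from this study.

```python
def heritability(var_g: float, var_e: float, n_years: float) -> float:
    """Heritability as var_g / (var_g + var_e / n_years).

    var_g   : genetic variance of the accessions
    var_e   : average error variance across regeneration years
    n_years : average number of years each accession was tested
    """
    if n_years <= 0:
        raise ValueError("n_years must be positive")
    return var_g / (var_g + var_e / n_years)


# Hypothetical variance components: 9 / (9 + 3 / 3) = 0.9
print(heritability(var_g=9.0, var_e=3.0, n_years=3.0))
```

The formula makes explicit why removing outliers can raise heritability: a smaller error variance shrinks the second term of the denominator, as does testing each accession in more years.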
Data Records
The data analyzed in this study can be accessed in the Plant Genomics and Phenomics Research Data Repository (PGP)13 and can be found here14. The dataset follows the ISA-Tab format standards to provide a uniform and easy-to-read description. Both original and processed (outlier-corrected) data are supplied.
Information about the investigated accessions is provided in two main files (s_winter_wheat.txt and s_spring_wheat.txt). These files contain the following information: (i) national accession identifier, (ii) harvest year, (iii) values for the three main traits, (iv) type of value, stating whether the respective trait value is single (one year of cultivation and one value) or average (one average value based on multiple years of cultivation), and (v) geographic origin, identifying the country reported by donors or the country where the respective accession was collected.
The assay files of the presented study comprise the original historical phenotypic data (“a_spring_wheat.txt” and “a_winter_wheat.txt”) as provided by the CRI genebank that were tested for presence of outliers. By discarding the identified outliers, enhanced data sets were generated (“Outlier.corrected.spring.txt” and “Outlier.corrected.winter.txt”).
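Since ISA-Tab files are tab-delimited text, the assay files can be inspected with standard tools. The sketch below is a minimal reader using only the Python standard library; the column names in the example (`Accession`, `Year`, `PH`) are hypothetical placeholders, as the actual headers are not listed here and should be taken from the downloaded files.

```python
import csv


def read_tab_rows(lines):
    """Parse tab-delimited text (an open file or any iterable of lines)
    into a list of dictionaries keyed by the header row.

    Caveat: if a file repeats a column header, later columns overwrite
    earlier ones in the returned dictionaries.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def count_non_missing(rows, column):
    """Count rows with a non-empty value in the given column."""
    return sum(1 for row in rows if (row.get(column) or "").strip())


# Hypothetical example content; real files would be opened with
# open("a_winter_wheat.txt", newline="") after downloading them.
example = [
    "Accession\tYear\tPH\n",
    "X1\t2001\t95\n",
    "X2\t2001\t\n",
]
rows = read_tab_rows(example)
print(count_non_missing(rows, "PH"))  # 1
```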
There are 1,065 unique spring wheat accessions with data for at least one trait in the original records. Mexican accessions are the most represented (20.28%), followed by Russian (7.89%), former Czechoslovakian (6.29%; CSK), German (5.63%), and USA (4.32%) (Fig. 1; Table 1). Additionally, 2.54% of the accessions are of Czech provenance (CZE), which used to be a part of CSK. Therefore, 8.83% of the accessions are of Czech (CZE) or Slovak (SVK) origin. The most data are present for TGW (1,810), followed by HD (1,673) and PH (1,652). The data span 4 consecutive decades, with data from the 1990s and 2010s being the most common (Fig. 2).
Historical data for winter wheat include 3,469 unique accessions, with German accessions being the most abundant (13.09%), followed by French (8.56%), CSK (7.9%), Swiss (4.81%), and Hungarian (4.73%) (Fig. 1; Table 1). More than 4% of the accessions are of Czech origin (CZE). Together with the accessions originating from CSK, there are 12.17% accessions of either CZE or SVK origin. The most abundant data are for PH (8,105), followed by HD (7,857), and TGW (7,115). The data span 7 consecutive decades, with most data from the 2000s and 2010s (Fig. 2).
The number of spring wheat accessions tested in a single year reached up to 300 (1975), and the numbers were comparable for all three traits (Fig. 3). Individual accessions were regenerated in 1 to 8 different years, most frequently in 1 or 3 years (Fig. 4). For winter wheat, the number of accessions tested within individual years was even higher, with at least 500 accessions tested in each of at least four different years. Individual accessions were regenerated in up to 25 (HD), 28 (PH), and 23 (TGW) different years, with a prevalence of 2- and 3-year regeneration cycles. Additionally, the best linear unbiased estimations (BLUEs; Fig. 5) are provided for the accessions in the “BLUEs.spring.txt” and “BLUEs.winter.txt” files and were calculated using the enhanced historical data files.
The strongest year effect was observed for PH followed by HD. TGW was the least affected. The same scenario applies to both spring and winter wheat (Fig. 6).
Technical Validation
The validation performed in this study involves correction for outliers in the individual traits, following the routines described by Philipp et al.15. A brief description of the methods follows.
Outlier corrections to curate the data
Due to the diversity of our dataset, which spans seven different decades and includes data collected under varying climate conditions and seed regeneration cycles, the occurrence of outliers is to be expected. However, as these outliers can disrupt the statistical estimation of the data, it is essential to manage them properly. Dealing with outliers in such unbalanced historical datasets can be challenging. To address this issue, we employed an outlier inspection approach that combines the rescaled median absolute deviation of standardized residuals with a Bonferroni-Holm test to identify and flag data points as outliers. To do so, we established a predefined significance threshold of p-value < 0.05 for the implemented test and removed the identified outliers from the historical dataset, resulting in an enhanced dataset. After the removal of outliers, we re-fitted Eq. (1), treating genotypes and years as random effects, and evaluated the impact of outlier exclusion on variance components and heritability.
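The outlier inspection described above can be sketched as follows. This is an illustrative stand-alone version, not the authors' “outlier.correction.R” script: it assumes the residuals from the fitted model are already available, standardizes them by the rescaled median absolute deviation (MAD multiplied by 1.4826), derives two-sided p-values from the standard normal distribution, and applies the Holm step-down procedure. The function name and the example residuals are hypothetical.

```python
import math
from statistics import median


def flag_outliers(residuals, alpha=0.05):
    """Return a list of booleans, True where a residual is flagged."""
    n = len(residuals)
    med = median(residuals)
    # Rescaled MAD: the factor 1.4826 makes the MAD comparable to the
    # standard deviation under normality.
    scale = 1.4826 * median(abs(r - med) for r in residuals)
    if scale == 0:
        raise ValueError("MAD is zero; residuals cannot be standardized")
    z = [r / scale for r in residuals]
    # Two-sided p-values from the standard normal distribution.
    pvals = [math.erfc(abs(v) / math.sqrt(2)) for v in z]
    # Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is
    # compared with alpha / (n - k); stop at the first non-rejection.
    order = sorted(range(n), key=lambda i: pvals[i])
    flagged = [False] * n
    for k, i in enumerate(order):
        if pvals[i] > alpha / (n - k):
            break
        flagged[i] = True
    return flagged


# Hypothetical residuals with one clearly aberrant value (12.0).
res = [0.5, -0.3, 0.1, -0.6, 0.2, 12.0, -0.1, 0.4]
print(flag_outliers(res))
```

Using the MAD instead of the standard deviation keeps the scale estimate from being inflated by the very outliers it is meant to detect, which matters when a sizeable share of records is aberrant.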
The correction of outliers had a significant impact on the heritability (h²) of spring wheat, leading to an improvement of up to 24% compared to the original data (see Table 2). This improvement can be attributed to a reduction in error variance, which was particularly pronounced for spring wheat, where the original dataset contained a relatively high proportion of outliers (up to 26%; see Table 3 and Fig. 7). However, in the case of HD, the correction of outliers interfered with the model used, making it impossible to calculate the effect of outlier correction on h². As for winter wheat, outliers were present in only 3.5% of the data, resulting in a modest reduction in error variance and a smaller improvement in heritability (up to 10%; see Table 2).
Usage Notes
The FAIR data presented here for a further significant wheat collection should be seen in the context of already published data15,16,17,18. They represent an important step towards a global catalog of plant genetic resource information, which is essential to transform genebanks into bio-digital resource centers19 and thus support future research and breeding initiatives. Genebank material is available upon request and can be accessed through the GRIN Czech platform under the terms of a Standard Material Transfer Agreement (SMTA).
Code availability
The statistical approaches described were performed within the R environment (version 4.0.3). Scripts used for both the outlier correction (“outlier.correction.R”) and the estimation of BLUEs (“BLUEs.estimation.R”), along with input datasets containing the original historical data, can be found in the e!DAL-Plant Genomics and Phenomics Research Data Repository (PGP). To run the scripts, the input data files (a_spring_wheat.txt; a_winter_wheat.txt) need to be downloaded and the proper working directory needs to be set. The script named “outlier.correction.R” shows the outlier handling procedure using plant height data in winter wheat as an example. It generates output files such as “Var.comp.PH.txt”, “Outliers.PH.txt”, and “Data.corrected.PH.txt”, which correspond to the variance components of regeneration years, the list of removed outliers, and the enhanced (outlier-corrected) data, respectively. The script called “BLUEs.estimation.R” uses the outlier-corrected data from the previous step to generate the BLUEs for individual accessions, which are written to the “BLUEs.PH.txt” output. As data for all traits are available in the input files, the scripts can easily be adapted for other traits and for spring wheat by following the notes within the scripts.
References
Erenstein, O. et al. in Wheat Improvement: Food Security in a Changing Climate (eds M. P., Reynolds & H.-J., Braun) 47–66 (Springer International Publishing, 2022).
Faostat, D. Crops and livestock products. Statistics Division, Food and Agriculture Organization of the United Nations: Rome, Italy (2022).
Velu, G., Singh, R. P., Huerta, J. & Guzmán, C. Genetic impact of Rht dwarfing genes on grain micronutrients concentration in wheat. Field Crops Research 214, 373–377, https://doi.org/10.1016/j.fcr.2017.09.030 (2017).
Eugenio, F. A., van Milgen, J., Duperray, J., Sergheraert, R. & Le Floc’h, N. Feeding intact proteins, peptides, or free amino acids to monogastric farm animals. Amino Acids 54, 157–168, https://doi.org/10.1007/s00726-021-03118-0 (2022).
Talebnia, F., Karakashev, D. & Angelidaki, I. Production of bioethanol from wheat straw: An overview on pretreatment, hydrolysis and fermentation. Bioresource Technology 101, 4744–4753, https://doi.org/10.1016/j.biortech.2009.11.080 (2010).
Tufail, T. et al. Wheat straw: A natural remedy against different maladies. Food Science & Nutrition 9, 2335–2344 (2021).
Bareš, I., Dotlačil, L., Stehno, Z., Faberová, I. & Vlasák, M. Původní a povolené odrůdy pšenice v Československu v letech 1918-1992. Sbírka VÚRV, Genetické zdroje č 65, 305 (1995).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018, https://doi.org/10.1038/sdata.2016.18 (2016).
Meier, U. BBCH-Monograph. Growth Stages of Plants–Entwicklungsstadien von Pflanzen–Estadios de las plantas–Dévelopement des Plantes. Berlin and Wien. (Blackwell Wissenschaftsverlag, 1997).
Asreml: Fits the linear mixed model. R package (Version 4.1.0.149) (2021).
Holm, S. A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics 6, 65–70 (1979).
Nobre, J. S. & Singer, J. M. Leverage analysis for linear mixed models. Journal of Applied Statistics 38, 1063–1072, https://doi.org/10.1080/02664761003759016 (2011).
Arend, D. et al. e!DAL - a framework to store, share and publish research data. BMC Bioinformatics 15, 214, https://doi.org/10.1186/1471-2105-15-214 (2014).
Svoboda, P., Holubec, V., Reif, J. & Berkner, M. Historical phenotypic data of spring and winter wheat accessions regenerated in Czech genebank in Prague-Ruzyne since the 1950s. e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP) https://doi.org/10.5447/ipk/2024/6 (2024).
Philipp, N. et al. Historical phenotypic data from seven decades of seed regeneration in a wheat ex situ collection. Scientific Data 6, 137, https://doi.org/10.1038/s41597-019-0146-y (2019).
Hinterberger, V., Douchkov, D., Lueck, S., Reif, J. C. & Schulthess, A. W. High-throughput imaging of powdery mildew resistance of the winter wheat collection hosted at the German Federal ex situ Genebank for Agricultural and Horticultural Crops. GigaScience 12, https://doi.org/10.1093/gigascience/giad007 (2023).
Schulthess, A. W. et al. Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nature Genetics 54, 1544–1552, https://doi.org/10.1038/s41588-022-01189-7 (2022).
Schulthess, A. W. et al. Large-scale genotyping and phenotyping of a worldwide winter wheat genebank for its use in pre-breeding. Scientific Data 9, 784, https://doi.org/10.1038/s41597-022-01891-5 (2022).
Mascher, M. et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nature Genetics 51, 1076–1081, https://doi.org/10.1038/s41588-019-0443-6 (2019).
Acknowledgements
This study was funded by the AGENT project which itself received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 862613 and by the Ministry of Agriculture of the Czech Republic, institutional support MZE-RO0423. MOB received funding from the aforementioned AGENT project. Special thanks belong to MSc Jiří Hermuth and MSc Alena Šímová for experimental support with the plant material.
Author information
Contributions
P.S. wrote the manuscript, did the data analysis supported by M.O.B. and J.C.R. and formatted the ISA-Tab compliant metadata description for the presented data. V.H. designed the study and provided background information about the historical data. J.C.R. and M.O.B. revised the manuscript and supported P.S. in the data analysis. All authors agree with the current statement.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Svoboda, P., Holubec, V., Reif, J.C. et al. Curation of historical phenotypic wheat data from the Czech Genebank for research and breeding. Sci Data 11, 763 (2024). https://doi.org/10.1038/s41597-024-03598-1