## Introduction

Fusarium circinatum Nelson Nirenberg & O’Donnell, formerly also known as Gibberella circinata Nirenberg & O’Donnell, is a harmful fungus causing pitch canker, a serious disease on pine trees. This pathogenic ascomycete attacks all Pinus species and Pseudotsuga menziesii (Mirb.) Franco, with varying levels of virulence1,2,3. All stages of pine are susceptible to the pathogen: seedling blight and damping-off, dieback of branches and stems on young and mature trees, where the most conspicuous symptoms are cankers accompanied by sometimes copious resin exudate (pitch canker)4. Dieback symptoms are also commonly observed in the crown due to the obstruction of water flow caused by the cankers and saturation of xylem by the excess resin produced by the tree. The multiplication of severe cankers on young or mature trees may lead to tree death5.

As for most tree pathogens, no economically and environmentally viable treatment is available to control or eradicate the fungus on a large scale. Management strategies are therefore focused on preventing the introduction of the pathogen, early detection and eradication of outbreaks in previously disease-free areas. Fusarium circinatum has been reported in different parts of the world, where it causes severe losses to the pine production industry (USA, South Africa, Korea, Japan, Spain) as well as in nurseries (Mexico, Haiti, Chile, Uruguay, Brazil, Colombia, South Africa) (EPPO Global database, https://gd.eppo.int/). In Europe, F. circinatum is officially present in Spain and Portugal in pine forests6,7, while it is also occasionally found in pine nurseries. The pathogen has also been found in French nurseries8, and in a public garden in Italy9, but it is currently considered as officially eradicated in both countries (EPP0 Global database, https://gd.eppo.int/). Since 2007, F. circinatum has been listed as a quarantine fungus for the EU, in order to prevent new introductions of infected material and further spread of the disease10. As a consequence of its quarantine status, a zero-tolerance policy is in force.

Management efforts should therefore focus on early detection of the pathogen in the different pathways of movement and introduction. Based on the pest risk assessment issued by the European Food and Environment Safety Agency (EFSA), the main pathways for the potential introduction of this fungus to disease-free areas are through contaminated pine seeds and seedlings8. To minimize the probability of an introduction and to reduce the cost associated with the eradication and control of this invasive pathogen, efficient management measures are needed11. For these reasons, the development of reliable diagnostic protocols is fundamental for the early and accurate detection of F. circinatum in pine-related commodities which can harbor and consequently spread the pathogen (i.e. substrates) such as seeds, seedlings, plants, young and mature trees. The diagnostic protocols should therefore be as specific and sensitive as possible10,12. False-negative detection results may lead to introduction of the pathogen in disease-free areas, while false-positive results may be responsible for unfair and inappropriate destruction of plant material, or a ban on trade with severe economic consequences.

Numerous diagnostic protocols targeting F. circinatum are currently available in the scientific literature and their use is suggested in international standard protocols by international bodies such as EPPO, ISTA or FAO-IPPC13,14,15. Despite most of them providing validation data supporting their accuracy, performance values are inconsistently reported or assessed in the original articles describing their development. Validation actions should be carried out to provide objective evidence that the test is suitable for the circumstances of use and can be considered for screening purposes16.

For instance, identification of F. circinatum may be achieved by isolation and morphological characterization of the pathogen, a common technique used in mycology. Fusarium circinatum displays several typical features, such as the presence of mono- and polyphialides as well as coiled sterile hyphae that aid diagnosis17,18. However, a recent study based on the phenotypical characteristics of isolates from a wide geographical range (Europe, America, Africa, and Asia) found that coiled sterile hyphae may not be a reliable morphological trait of F. circinatum, as previously reported19. Recent description of new Fusarium species in Colombia, which are also pine pathogens and can produce similar coiled hyphae in culture, further challenges the specificity of this morphology-based technique20. Additionally, morphological characterization is lengthy and requires considerable mycological expertise, whilst also not being efficient for the detection of quiescent forms of the pathogen that can be encountered in seedlings21 or in seeds.

A number of conventional and real-time polymerase chain reaction (PCR) assays targeting F. circinatum have been developed13,22,23,24,25,26,27,28, and another one is presented in the Supplementary Information section (Supplementary Information 1, hereafter referred to as Baskarathevan et al., unpublished). In the original articles, variable levels of validation are presented. For instance, the assessment of specificity and inclusivity used a more or less exhaustive range of Fusarium strains, depending on the availability of testing material, which included numbers of F. circinatum strains from different continents and newly described species, genetically related to F. circinatum. Considering the paramount importance of the reliability of a test when dealing with a quarantine pathogen, efforts should be focused on the continuous verification of the specificity of the protocols. This is particularly relevant when a Fusarium species is targeted, since this genus includes a steadily increasing number of newly described and cryptic species20,29,30. In addition, there is typically no data available to support the reliability of a given protocol when carried out in different laboratories, with different equipment and reagents than those described in the original papers. For instance, one may imagine that the specificity and the sensitivity of a test using conventional and real-time PCR may be altered by changing the brand of DNA polymerase enzyme, commercial ready-to-use master mix, or thermal-cycler/software. This in turn may affect the reliability of a diagnostic protocol targeting economically important pathogens such as F. circinatum31,32. Validation of diagnostic protocols is therefore a key element in establishing reference methods and to assess a laboratory’s competence and ability to produce reliable analytical data12.

The currently ongoing European COST action FP1406 “Pine Pitch Canker Strategies for Management of Gibberella circinata in Green Houses and Forests” (PINESTRENGTH) brings together 34 countries to establish a European-focused network dedicated to the Pine Pitch Canker pathogen. The main objectives are to increase knowledge on the biology, ecology and spread pathways of F. circinatum, to evaluate the potential for the development of effective and environmentally friendly prevention and mitigation strategies and to deliver these outcomes to stakeholders and policy makers. Early and accurate detection of F. circinatum is essential to achieve these goals. In this study, we compared for the first time performance of existing molecular tools targeting F. circinatum, with a wide range of DNA from target and non-target Fusarium species, in a large panel of 23 laboratories from European countries, South Africa and Chile. This collaborative study enabled us to provide useful data for the transferability of the different molecular-based diagnostic protocols. We propose recommendations for the preparation and use of future standard diagnostic protocols using conventional or real-time PCR assays.

## Results

### Indeterminate results

All 23 participants carried out the tests as requested. However, a few samples could not be tested by some of the participants due to loss or shortage of DNA template after multiple PCR run failures and were thus not considered in the analyses.

All nine protocols generated indeterminate results (Table 1, Fig. 1A). The total number of indeterminate results was 58 (1.5% of the total analysis) and ranged from 0.42% for protocol p1 to 4.58% for protocol p6 (Table 1). The participants reported several reasons for rating results as indeterminate. These included late mean cycle threshold (Ct) values or inconsistent Ct values from the replicates for hydrolysis probe-based protocols, melting temperatures (Tm) and melting peaks slightly different from the positive control, or late mean Ct values for SYBR Green based protocols. Generally, laboratories performing conventional PCR protocols did not provide an explanation for indeterminate results. However, one participant sequenced each amplicon and rated the results as indeterminate when the sequence was not readable. Comparison of indeterminate results by Fisher’s exact tests revealed significant differences (P < 0.001) between protocols (Table 1, Fig. 1A). In particular, protocol p6 based on SYBR Green real-time PCR, yielded significantly more indeterminate results than most of the other protocols. In this case, the melting peak analysis required by protocol p6 often revealed the presence of melting peaks slightly different from the positive control, the presence of double melting peaks, or only one out of two replicates were positive for some of the DNA extracts.

When indeterminate results were analyzed with only pure Fusarium strain DNA (no DNA extracts from inoculated seeds), 37 indeterminate results were reported (63.8% of the total number of indeterminate results). Fisher’s exact tests revealed significant differences between protocols for these indeterminate results (P < 0.001, Table 1, Fig. 1B), and the rate of indeterminate ranged from 0.0% (protocol p5 and protocol p7) to 4.6% (protocol p6). In general, these indeterminate results mostly concerned non-target species (88.9% of the Fusarium strain DNA material indeterminate results).

Comparisons of indeterminate results between laboratories by protocol revealed significant differences for p2 (indeterminate data by laboratory ranging from 0% to 5.1%, P = 0.02), p5 (indeterminate data by laboratory ranging from 0% to 7.6%, P = 0.002), p6 (indeterminate data by laboratory ranging from 0% to 16.5%, P < 0.001), p8 (indeterminate data by laboratory ranging from 0% to 7.6%, P < 0.001) and p9 (indeterminate data by laboratory ranging from 0% to 5.1%, P < 0.001).

### Positive and negative deviation rates

All nine protocols exhibited positive deviations (or false positives) and negative deviations (or false negatives) regardless of the data set used (Table 1, Fig. 1C–F). Analysis of DS1 (the dataset for which an indeterminate result is ultimately determined to be the expected result, see methods section) revealed positive deviation rates (PD) ranging from 3.0% (protocol p7) to 17.3% (protocol p8) (Table 1, Fig. 1C). Comparison of PD by Fisher’s exact tests revealed significant differences (P < 0.001) between protocols (Table 1, Fig. 1C). Negative deviation rates (ND) in DS1 ranged from 14.5% (protocol p3) to 19.9% (protocol p8), but according to Fisher’s exact tests, the ND rates between protocols were not different (P = 0.71) (Table 1, Fig. 1E). Concerning DS2 (the dataset for which an indeterminate result is ultimately determined to be the contrary of the expected result, see methods section), PD ranged from 3.0% (protocol p7) to 21.2% (protocol p6) (Table 1, Fig. 1D). Significant differences between protocols for PD rates were observed according to Fisher’s exact tests (P < 0.001). ND in DS2 ranged from 15.9% (protocol p3) to 20.3% (protocol p8) (Table 1, Fig. 1F). ND comparisons between protocols using Fisher’s exact tests did not reveal any significant differences (P = 0.88).

Comparisons between laboratories using DS1 revealed significant PD rate differences only for p2 (PD rate by laboratory ranging from 3.0% to 27.3%, P = 0.007) and p8 (PD rate by laboratory ranging from 9.1% to 37.5%, P = 0.04). Concerning the ND rate in DS1, significant differences were observed in p2 (ND rate by laboratory ranging from 6.5% to 32.6%, P = 0.031) and p6 (ND rate by laboratory ranging from 4.3% to 40.9%, P < 0.001). When comparisons were performed using DS2, significant differences between laboratories were revealed for p2 (PD rate by laboratory ranging from 3.0% to 27.3%, P = 0.007) and p6 (PD rate by laboratory ranging from 6.1% to 63.6%, P < 0.001), for the PD rate, and for p6 (ND rate by laboratory ranging from 6.5% to 40.9%, P < 0.001) for the ND rate.

### Pattern of cross-reactions with non-target species

As shown previously, all protocols exhibited different levels of positive deviations (Table 1, Fig. 1C,D). Cross-reactions with DNA from strains of non-target Fusarium species were encountered for all nine protocols, but inconsistently between participating labs (Table 2). For many strains, a cros-s reaction with the DNA extract was reported for only a single participant among the four, five, or six laboratories involved, which corresponds to a unique reagent/equipment/operator combination.

Fusarium subglutinans was the species that accounted for the most frequent cross-reactions, and its DNA yielded false-positive results for seven out of the nine protocols. Depending on the protocol, from three, up to six out of six laboratories observed cross-reaction with DNA of this species. Fusarium temperatum was the only, albeit consistent, cross-reaction observed with protocol p7. DNA from the newly-described species on pine from Colombia, i.e. F. marasasianum, F. pinninemorale, F. sororula, F. fracticaudum, and F. parvisorum also yielded frequent cross-reactions with four protocols (p1, p2, p8 and p9), and less frequently with protocol p6.

Reference sequences of target genes from species whose DNA cross-reacted with some of the PCR or real-time PCR tests were retrieved from GenBank and aligned with orthologous reference sequences of F. circinatum. The regions upstream and downstream of the forward and reverse PCR primers were removed. The alignments were manually scrutinized to check for the presence of interspecific polymorphism between the regions targeted by the primers. For some of the PCR or real-time PCR tests, it was shown that the presence of polymorphisms after sequencing the amplicon would be helpful to confirm the occurrence of cross-reactions, and possible false-positive results (Table 2).

### Inclusivity of the different protocols

In this work, inclusivity is defined as the ability of each protocol to detect DNA of the target species, regardless of the host plant, mating type, geographical origin and year of collection. Different patterns of inclusivity were observed between protocols. DNA from some of the F. circinatum strains yielded inconsistent negative results. Protocols p4, p7 and p9 successfully picked up all the 38 F. circinatum strains of both mating types included in the panel, regardless of the equipment, reagents and operator, thus supporting their excellent level of inclusivity. By contrast, protocols p1 and p8 almost systematically failed to yield positive results with DNA from the Japanese strain of F. circinatum NRRL2643. However, the rest of the false negative results were not reproducible between laboratories, and were only reported for one participant out of the four, five, or six involved, meaning that they were only observed for some operator/reagent/equipment combinations. Except for the Japanese strain of F. circinatum NRRL2643, these false negative results were observed with DNA from different F. circinatum strains originating from Spain, France, Chile, USA, and South Africa without any obvious pattern.

### Performance criteria and reproducibility

Diagnostic sensitivity (SE) ranged from 80.1% (protocol p8) to 85.5% (protocol p3) and from 79.7% (protocol p8) to 84.1% (protocol p3) using the DS1 and DS2 datasets, respectively. Fisher’s exact tests did not reveal significant differences for SE between the nine protocols either for DS1 (P = 0.72) or DS2 (P = 0.88) (Table 1, Fig. 2A,B). By contrast, diagnostic specificity (SP) differed significantly, using both datasets (both P-values < 0.001). SP ranged from 82.6% (protocol p8) to 97% (for both protocols p3 and p7) in DS1 (Table 1, Fig. 2C). When SP was assessed for DS2, it ranged from 78.8% (protocol p6) to 97% (protocol p7) (Table 1, Fig. 2D). Significant differences in Diagnostic accuracy (AC) were observed between protocols for both datasets (P = 0.002 and P < 0.001 for DS1 and DS2, respectively). AC ranged from 81.2% (protocol p8) to 90.3% (protocol p3) in DS1 and from 80.5% (protocol p8) to 89.2% (protocol p3) in DS2 (Table 1, Fig. 2E,F). Concordance ranged from 74.6% to 97.7% for p6 and p7, respectively. Fisher’s exact tests revealed significant differences between methods (P < 0.001, Table 1).

### Analytical sensitivity

The analytical sensitivity was assessed for each protocol using serial dilutions of DNA from P. pinaster seeds spiked with F. circinatum conidia. However, inconsistent results were obtained with the serial dilutions. For example, when analyses were performed with the DS1 scenario, seven out of the nine evaluated protocols were able to detect at least one of the samples containing F. circinatum DNA in seeds at the highest concentration, i.e. 2 105 conidia/mL of ground seed homogenate: p2, p3, p4, p5, p6, p7 and p9. Only p6 was able to give a positive result for all samples of this concentration. Similarly, these inconsistent results were also observed in data from DS2, and only p2, p3, p5 and p6 were able to detect at least one of the samples of the highest concentration of F. circinatum in seeds. This inconsistent behavior of inoculated material was observed for all of the serial dilution samples, in both datasets (Supplementary Dataset 1).

## Discussion

To establish an international surveillance network for the detection of outbreaks of pine pitch canker across Europe and other disease-free areas, harmonization of protocols for F. circinatum diagnosis is needed. To our knowledge, two international diagnostic protocols targeting this pathogen already exist. Nevertheless, one of them is a list of protocols that the international community has agreed are satisfactory, with minimal validation data available13,15 and the other relies solely on techniques such as mycological plating, that were selected because of their low cost and ease of implementation14. Most of the protocols targeting F. circinatum described in the literature lack a comprehensive evaluation of some basic performance criteria, such as specificity, inclusivity and sensitivity. Additionally, in most cases, specificity has not been re-evaluated in light of newly emerged or described Fusarium species occurring on Pinus spp20. In this study, we selected nine different detection protocols based on PCR and its variants. Their performance was assessed using a panel of 71 Fusarium strains and eight pine seed samples spiked with F. circinatum conidia. In order to achieve the best representation of F. circinatum strains (i.e. inclusivity), the panel included strains originating from six countries across four continents, and both mating types. Also included were recently-described Fusarium species isolated from pines, genetically related to F. circinatum, with overlapping morphological features, and some of them being pathogenic to Pinus20. This large panel of strains was used to assess the performance and transferability of the different protocols through an international collaboration involving a broad consortium of 23 partners. Each protocol was evaluated by a minimum of four, and up to six laboratories therefore ensuring a robust dataset. Practical and technical constraints led to an unbalanced number of laboratories involved per protocol, which means that some of the results should be read with care. In particular, although significant differences were observed for some criteria between participant laboratories and some of the protocols, caution is required before generalizing the results regarding non-significance of some of the statistical tests. Indeed, non-significant differences between protocols means non-identification of a difference rather than the absence of a difference altogether.

Our results showed that all protocols presented acceptable performance values in both datasets (>75%) for diagnostic accuracy, specificity and sensitivity, with some laboratories obtaining individual values close to 100% (Supplementary Dataset 1). Yet diagnostic specificity and accuracy differed considerably between protocols, irrespective of the technology involved (i.e. end-point PCR or real-time PCR using SYBR Green dye or specific hydrolysis probes). These differences were principally linked to cross-reactions with non-target species (positive deviations), and less commonly to consistent or erratic negative deviations with particular strains of F. circinatum.

The cross-reactions with DNA from non-target species observed in our study have not been reported in the original articles describing the protocols. We included a broader and more comprehensive panel of strains, revealing more information about the level of specificity of these protocols. However, our panel is not exhaustive and of course does not cover the entire biological range of genetically related Fusarium strains. Other unexpected cross-reactions may therefore occur, particularly with DNA from as yet undescribed Fusarium taxa. Cross-reactions were observed in all protocols, although at different levels, depending on the laboratory involved. Some of the erratic cross-reactions and false negative results may have occurred because of issues such as pipetting errors, DNA shearing, among others. However, some of the cross-reactions were more frequent and were due to lack of specificity/sensitivity of the molecular markers toward strains that had not been assessed during the original validation step of the original protocol by the authors, such as for genetically related F. temperatum and F. subglutinans. From a practical point of view, the presence of some of the Fusarium species whose DNA yielded false positive results are very unlikely on pine tissue, but the recent finding of Herron et al.20 showed that previously unknown species may be found on pine. In our experiment, a common DNA extraction procedure was followed for all the fungal strains, which sometimes differed from the original article describing each of the nine protocols. It cannot be ruled out that the DNA extraction procedure used in our study had an effect of PCR or real-time PCR specificity or inclusivity. Another aspect to consider is that for the sake of harmonization, a standard concentration of Fusarium DNA was used throughout the study (0.5 ng/µL). This may not always reflect the actual concentration that may occur when testing real pine samples contaminated with these Fusarium species, and the likelihood of cross-reactions with non-target DNA probably increases with higher concentrations. At the same time, certain strains of F. circinatum were ‘missed’ by some of the protocols (especially the F. circinatum strain from Japan), with false-negative results that were not reported in the original articles, except by Ramsfield et al. (2008) regarding protocol p1. This suggests that some of the F. circinatum strains travelling with plant material such as seeds might not be detected when using some of the protocols assessed here. Our data provide a first evaluation of the inclusivity of nine protocols, which can be useful for laboratories in charge of official analyses, by elucidating the level of uncertainty associated with some of the protocols used throughout the world. Although diagnostic specificity across all protocols was rather high (>75%), no protocol was 100% specific with the present panel of F. circinatum strains. These false-positives may not be acceptable when dealing with a pathogen subjected to strict phytosanitary regulations.

Concordance varied between protocols and ranged from 74.6 to 97.7%. Analysis of the differences between laboratories that tested the same protocol also showed that indeterminate results and negative and positive deviations differed significantly. These results suggest that molecular detection methods may not always be easily transferable. Basically, they clearly illustrate that deviations from the “original recipe”, i.e. the use of different equipment, consumables, but also operators, might compromise the stringency of the reactions, and therefore the specificity of the results. This is particularly important when dealing with a quarantine pathogen, with a zero-tolerance policy. Positive deviations may lead to the inappropriate destruction of goods, whereas negative deviations could fail to prevent introduction of the pathogen into disease-free areas.

We also showed that all the protocols exhibited some problems in result interpretation (indeterminate results), independently of the PCR technology used. However, end-point PCR generally yielded fewer indeterminate results, across all partners, probably linked to the simplicity of result interpretation, based on the observation of a band on an electrophoresis gel, with little room for doubt. Concerning the other PCR techniques, it can be suggested that interpretation of the melting curves was not straightforward when using the SYBR Green real-time PCR, and setting the fluorescence threshold for the calculation of the cycle threshold value when using a hydrolysis probe was sometimes done inconsistently between partners. In addition to this problem of result interpretation, some of the protocols required particular settings that may not have worked well under different conditions. For example, protocol p224 requires an unusually high hybridization temperature of 70 °C to ensure specificity, which seemed to cause a sensitivity problem when used in certain laboratories or with different reagents/equipment than the ones originally described. In this work, the statistical analysis of indeterminate results was enabled by processing data under two scenarios (DS1 and DS2). In all cases results from both datasets were consistent, leading consequently to the same conclusions. This is an important point because it means that differences in performance criteria between protocols were not influenced by indeterminate results, which represented less than 2% of the total results.

We did not evaluate and compare the sensitivity of a protocol based on mean Ct values, but rather on its ability to yield positive results with lower target concentrations. This approach enabled comparison of conventional PCR, for which no quantitative results are generated with real-time PCR protocols. Additionally, we chose not to provide cutoff values because the sensitivity of a test should not be dependent on Ct values, but rather on its ability to reliably amplify and detect a low concentration of target DNA33. Late Ct values may still be valid and confidently used if the test specificity has been correctly designed and evaluated34. Despite protocols p2, p3, p5 and p6 consistently yielding positive results for the highest conidia spiking quantity, the data of our study showed that positive results for lower concentrations were rarely obtained. This made it very difficult to compare the protocols to each other regarding analytical sensitivity with seeds. However, using protocols p2 and p6, successful detection of F. circinatum in naturally infected pine seeds has been reported by Ioos et al.24 and Dreaden et al.25, and protocol p2 has been used for years in ANSES for the interception of naturally infected imported seed lots (Guinet, C., ANSES, pers. Comm.). This suggests that the modified method used in our study for the preparation of artificially infected seed DNA was not able to provide samples with a sufficient level of contamination, probably inferior to what is expected with real-world samples. In this respect, a preliminary biologic enrichment of the seed in a broth of culture medium seems a very efficient method to improve detection of F. circinatum in seeds at low levels13.

One of the main recommendations resulting from the present study is that the transferability of a PCR or real-time PCR protocol should be thoroughly and continuously assessed before becoming a standard. In other words, the ability to yield accurate results when used under slightly different conditions should be checked. Indeed, it is very unlikely that all the specific brands of reagents and equipment described in the original scientific papers are available to end-users. In this study, partner laboratories were free to use their own real-time equipment and brand of reagents, such as DNA polymerases, PCR or real-time PCR master mixes. Discrepancies of results between laboratories regarding false positive and false negative rates, as well as the different analytical sensitivities confirm that changing the brand and type of DNA polymerase35 and equipment31,36 may affect the reliability of the results. Changing the DNA polymerase may also generate the amplification of non-specific amplicons, especially when working with symptomatic pine DNA extracts (Piškur, B., pers. comm.). This observation is in line with Bustin & Huggett33 who showed that the performance of a real-time PCR assay varied with different master mixes, probably due to differences in Mg2+ concentrations and the addition of undisclosed stabilizers to the buffer affecting primer and probe annealing.

Another parameter to be considered is that interpretation of the fluorescence levels yielded in the real-time PCR reactions requires the enforcement of decision rules, either by the operator, or the analysis software. In turn, the decision to rate a Ct value as a positive or indeterminate result may be influenced by internal rules, which are not the same between laboratories. In addition, slight variations induced by the operator or the equipment, such as pipetting errors, temperature drift or thermic heterogeneity of the thermal cycler block may have an effect on the stringency of the PCR reaction and thus, in turn may affect the analytical specificity and sensitivity33,37,38,39.

In line with other guidelines proposed for testing genetically modified organisms40, we therefore recommend that a preliminary assessment of the robustness and transferability of a new protocol should be carried out to provide an indication of its performance under different conditions than the ones used during its development. This assessment should be carried out in addition to the classical performance criteria assessed during the initial validation process. This may be achieved by, for instance, the organization of a collaborative study, using a large and representative panel of target and non-target taxa, and involving as many different reagent brands and thermal cycler types as possible. Therefore, the end users should bear in mind that the performance data of a conventional or real-time PCR protocol described in the original articles are intimately linked to the reagents, equipment and decision rules used. It is strongly suggested that individual laboratories should carry out their own characterization if these parameters are modified, and even if no parameter is modified at all. To this aim, a series of “reference samples” should be maintained and provided by a “reference laboratory” to any laboratory intending to establish and maintain an accurate diagnostic test41. The organization of training sessions by these reference laboratories would also help to share experience and knowledge about the use of a given protocol, and would harmonize the practices and decision rules. This is of paramount importance when targeting a quarantine pathogen, for which very strict regulations are enforced.

We also advocate the continuous verification of the specificity of published protocols, in order to consider new taxa that are continuously described in the literature. This can be achieved by in silico evaluation, by blasting the primer and probe sequences on international DNA databases such as GenBank on a regular basis, and by wet lab testing of newly described strains. Another suggestion to ensure the accuracy of the positive results is to analyze the amplicon sequence and/or to use additional tests targeting other loci in the genome of the target organism. Currently, the two international protocols for the diagnosis of F. circinatum13,15 recommend sequencing of the amplicon after a positive result via conventional or SYBR Green real-time PCR using the CIRC1A-CIRC4A primers22. However, our study suggests that a similar procedure should also apply for the other available protocols targeting F. circinatum, even for those using hydrolysis probe-based methods. It is advisable that such a complementary approach should be followed to verify results of particular importance such as first reports in disease-free areas. For some of the protocols, analysis of the amplicon sequence, trimmed from the primers’ sequences may help confirm the accuracy of the result. However, sequencing will not always be sufficient. Firstly, this is dependent on reference sequences being available in databases. Secondly, this approach will not work if undescribed species share a 100% match to F. circinatum (see Table 2 footnotes). Lastly, confirmation by sequencing is not always possible when the conventional or real-time PCR test targets a region of unknown function such as a Sequence Characterized Amplified Region (SCAR). Hence, no orthologous sequences for other Fusarium species are available for the SCAR targeted by protocol p1 and p8.

In addition, positive samples could be further processed in order to isolate the pathogen in pure culture, allowing the identification of the pathogen by both morphological and molecular features42. Combining molecular and morphological data would of course secure identification of the pathogen, particularly important for first reports, and will help to increase knowledge of the morphologic and genetic diversity of the pathogen. In this respect, it is necessary to establish protocols providing a representative sampling strategy, starting from plant tissue. With the exception of seeds, for which a strategy has been proposed43, there is to our knowledge no standard for sampling plants or adult trees, tackling for instance the minimum number of samples that should be taken for assuring the absence of the pathogen, irrespective of the analysis technique chosen (molecular or isolation).

## Methods

### Participants and selection of protocols

An official call for participation was issued in 2016 in the framework of the COST action FP1406 PINESTRENGTH. In all, 23 laboratories representing 18 countries participated in the study (Table 3). Only laboratories with sufficient experience in molecular biology-based detection techniques and appropriate equipment were involved. Samples were sent to each participant on June 7th, 2017. The analyses were to be completed and results returned to the organizer by the end of August 2017.

At the time the project was started, a total of nine conventional or real-time PCR protocols targeting F. circinatum (p1 to p9) were available in the literature or were brought to our knowledge (Table 4). Protocols included several formats of PCR amplification and labeling. Protocols p1 and p9 use conventional or end-point PCR13,23, p4, p5, and p6 use SYBR Green-based real-time PCR22,25,26, and p2, p3, p7, and p8 use real-time PCR hydrolysis probe-based tests24,27,28. In order to balance the comparison among the protocols and to provide a sufficient amount of data to compute the performance criteria, each protocol was assessed by at least four different partners.

Protocols were conducted following the description in the original article, observing the amplification parameters (cycling conditions, temperatures settings) and reaction mixtures (primers and probe concentrations, reaction and DNA template volumes) indicated by the authors and summarized in a reference document that was sent to each participant along with the DNA samples (Supplementary Information 2). However, if not available in the participant laboratory, the DNA polymerase or commercial real-time PCR master mix described in the original articles were replaced by the reagents typically used by the participant laboratory (for further information refer to Supplementary Information 3). Each participating laboratory was free to use its own PCR equipment.

### F. circinatum-specific primers and probes

Each partner provided the primers for the protocols using end-point PCR and SYBR Green real-time PCR, as described in the original articles. To cut down costs, the primers/probe combinations required for the different hydrolysis probe real-time PCR protocols were only purchased once by one of the partners and distributed to all the participants as ready-to-use aliquots of 30 µM (primers) or 10 µM (probe) solutions, in 1.5 mL amber microtubes. Primers/probe combinations for p2 and p7, p3, and p8 were custom made by Eurogentec (Seraing, Belgium), Integrated DNA technology (Skokie, Illinois), and Biosearch Technologies (Petaluma, California), respectively. Primers and probes were shipped at room temperature by a fast delivery service and kept in a freezer until used for testing.

### Fungal strains and preparation of panels of DNA samples

A panel of 71 monosporic Fusarium spp. strains representing 29 distinct species was used (Table 5). Species identity was confirmed by EF1 alpha gene sequencing44, if the strain was not obtained from an international fungal collection. It included 38 F. circinatum strains from different geographical origins, mating types, host tree species and environments, thus covering as much of the genetic diversity of the pathogen as possible. Thirty-three other Fusarium strains were also included. They represented species that are either genetically close to F. circinatum, or inhabit the same ecological niche, i.e. pine woody tissue, pine seeds, pine roots, or soil. Also included were recently described species of Fusarium associated with pine cankers in Colombia, i.e. F. parvisorum, F. sororula, F. marasasianum, F. pininemorale, and F. fracticaudum20. The strains were sent from different providers, and were kept on agar slants at 5 °C before handling. As F. circinatum is considered a quarantine organism for the European Union (EU), all strains from this species were maintained and manipulated in level 3 biohazard containment facilities in ANSES Plant Health Laboratory (here named as ANSES), in Malzéville, France, in compliance with EU Directive 2008/61/EC. Taking into consideration that the Fusarium strains from Colombia are recently described species20 not found in the EU, it was decided to manipulate them under the same conditions as F. circinatum.

To avoid biases generated by the involvement of different operators and laboratories and to minimize the risk of moving around living F. circinatum strains, all participants worked with DNA extracts rather than with living cultures. All strains were first gathered in ANSES and kept on site. For DNA extraction, the strains were first cultured on potato dextrose liquid media (PD Broth, DIFCO™), for approximately 5 days, after which 100 to 200 mg of fresh mycelium was harvested. Genomic DNA (gDNA) was extracted using the DNeasy plant mini kit™ (Qiagen, Courtaboeuf, France) following the manufacturer’s guidelines, after grinding mycelium with a Lysis matrix A tube containing one 6-mm ceramic sphere and garnet matrix (MP Biomedicals, Santa Ana, CA, USA) and homogenized for 20 s at 6.5 U (m/sec) using a FASTprep 24 device (MP Biomedicals). DNA concentration was estimated using the Nanodrop 2000 Spectrophotometer™. For each strain, genomic DNA was produced from biological replicates and mixed/homogenized in order to obtain enough DNA to be tested in all of the different protocols and to supply to all of the partners. For each strain, the quality of the DNA extract was assessed by successful PCR amplification of the Internal Transcribed Spacer rDNA using the ITS1/ITS4 primer pair45, and the DNA concentration normalized to 0.5 ng µL−1 and distributed as 50-µL aliquots in individual 2-mL microtubes (F1 to F84, Table 5). In total, each laboratory received 38 F. circinatum DNA samples (target DNA) and 33 non-target DNA samples. All samples were anonymously labeled, shipped at room temperature by fast delivery service, and kept in a freezer until analysis.

A set of DNA from F. circinatum-artificially infected pine seeds was also prepared. Two strains of F. circinatum (F7 and F11) were cultured for 6 days at 22 °C under cool-white fluorescent lights with a 12-h light period on Spezieller Nährstoffarmer Agar (SNA) medium to allow macro- and microconidia production46. Microconidia were harvested by washing the surface of cultures with 10 mL of deionized sterile water with 0.01% Tween 20. The resulting suspension was diluted with sterile water to obtain a final concentration of 2.52 × 103 and 1.90 × 103 conidia µL−1 for F7 and F11, respectively, based on counts made using a hemocytometer. Healthy Pinus pinaster seeds were first incubated in liquid PD Broth media as described by Ioos et al.24, in order to simulate biological enrichment of natural samples, as is performed in routine detection analysis. Four replicates of one thousand healthy seeds each were incubated for 72 h at 22 °C in a sterile Easy flat flask containing 50 mL of PD Broth. After incubation, the contents of the Easy flask (healthy seeds + liquid medium) were aseptically transferred into a sterilized grinding bowl, and ground for 1 min using a Microtron MB 550 mixermill (Kinematica, Lucerne, Switzerland). Seventy subsamples of 500 µL of homogenate were collected using a micropipette and transferred into individual sterile 2-mL microtubes. For each of the two F. circinatum strains, 4 sets of 14 homogenized healthy seed subsamples were spiked with 1 × 105, 1 × 104, 1 × 103, and 1 × 102 conidia, respectively. One set was spiked with sterile water to be used as negative control for the seeds. Total DNA was extracted from each spiked homogenate as described by Ioos et al.24 using the Nucleospin Plant II miniprep (Macherey-Nagel) DNA extraction kit. For each level of contamination, all the DNA extracts were pooled, homogenized, and then distributed as 50 µL F. circinatum contaminated seed DNA to be used as template for PCR testing (F85 to F92, Table 5). In total, each laboratory received eight pine seed DNA extracts anonymously labeled, which were transported using a fast delivery service and kept in a freezer until analysis.

In total, each partner received an identical panel of 79 DNA extracts, to be tested in duplicate analysis for each protocol that was assessed.

### Data generation and analysis

#### Indeterminate results

Data were processed anonymously, and no communication was allowed about the trials between partners before the end of the collaborative study. For each protocol, each participant tested all 79 DNA extracts in duplicate. For each sample, results of the tests were reported as either “positive” or “negative”, based on the duplicate analyses. With the exception of protocol p8 (where a Ct < 36 should be considered as a positive result), none of the five published real-time PCR protocols recommended a decision cut-off value. Therefore, the decision to rate a DNA sample as “positive” or “negative” was up to each participant, following the decision rules in force in the laboratory. However, in case of doubt or difficulty in interpretation of the results, “indeterminate result” could be reported. The participants were nevertheless encouraged to submit a brief description of the problem encountered. Indeterminate results between protocols, as reported by the participating laboratories, were compared by Fisher’s exact tests for count data.

Indeterminate results were also compared between laboratories by protocol using Fisher’s exact tests. Ideally, these comparisons would have been performed comparing all indeterminate results generated by laboratories, across all tests. However, it must be noted that not all participants implemented all nine protocols, and missing data for some partners exist in the datasets (for example laboratories that only participated in one test would not be included in the statistical test). Therefore, the decision was made to compare laboratories by protocol, in order to have an idea of potential differences that can exist for example between equipment, location or staff in charge of the tests. In both cases, Fisher’s exact tests were performed with simulated P-values based on 1 × 105 replicates.

Indeterminate results were then transformed following two scenarios as suggested by Chabirand et al.47 and Loreti, et al.48. The first scenario considered that an indeterminate result would be further assessed by the laboratory, and would always be rated “as expected” (i.e. a sample containing F. circinatum DNA would be rated as positive, and a sample not containing F. circinatum DNA would be rated as negative), so that the participant would always make the right decision, eventually. This dataset is here referred to as Dataset 1 (DS1). In the second scenario, an indeterminate result would always be rated “not as expected” (i.e. a sample containing F. circinatum DNA would be rated as negative, and a sample not containing F. circinatum DNA would be rated as positive), so that the participant would always make the wrong decision. This dataset is here referred to as Dataset 2 (DS2).

As the objective was to show the potential biases in the application of a protocol that may arise through differences between equipment, location or staff in charge of the analysis, identity of laboratories is not revealed and only the range of indeterminate results is shown.

#### Rates of false positive and false negative results

Results of the protocols were assessed by computing a number of parameters using both datasets: (i) PA, the number of positive accords or true positives, defined as the number of DNA samples from F. circinatum strains (or DNA from seed samples contaminated with F. circinatum) yielding positive results with the protocol; (ii) NA, the number of negative accords or true negatives, here defined as the number of DNA samples from other Fusarium species yielding negative results with the protocol; (iii) PD, the number of positive deviations or false positives, which takes into account the number of DNA samples from other Fusarium species (or DNA from seed samples not contaminated with F. circinatum) yielding positive results with the protocol; and iv) ND, the number of negative deviations or false negatives, which corresponds to the number of DNA samples from F. circinatum strains (or DNA from seed samples contaminated with F. circinatum) yielding negative results with the protocol.

Similarly, for both datasets, the performance of the protocols regarding specificity was assessed using the PD rate, computed as PD rate = 100 × (number of misclassified known positive samples/total number of known negative samples), and the ND rate, computed as, ND rate = 100 × (number of misclassified known negative samples/total number of known positive samples). As for indeterminate results, the PD and ND rates were compared between laboratories by protocol (in total nine comparisons) for both datasets.

All comparisons were performed using Fisher’s exact tests for count data with simulated P-values based on 1 × 105 replicates.

#### Other performance criteria

For each protocol and for each participant the results obtained for the blind samples were processed according to EN ISO 16140 standard49 and the PM7/98 (2) EPPO standard16. Three performance criteria were assessed: relative accuracy (AC), diagnostic specificity (SP) and diagnostic sensitivity (SE). AC represents the agreement between the expected results and the results obtained using the protocol. SE provides an estimation of the ability of the procedure to detect the target when it is present (presence of F. circinatum DNA). SP provides an estimation of the ability of the procedure not to detect the target when it is not present (no F. circinatum DNA present in the sample). AC, SP and SE were estimated using PA, NA, PD and ND, described in the previous sections, as follows:

$$\begin{array}{rcl}{\rm{AC}} & = & 100\ast ({\rm{PA}}+{\rm{NA}})/({\rm{NA}}+{\rm{PA}}+{\rm{PD}}+{\rm{ND}})\\ {\rm{SP}} & = & 100\ast {\rm{NA}}/({\rm{NA}}+{\rm{PD}});\\ {\rm{SP}} & = & 100\ast {\rm{PA}}/({\rm{PA}}+{\rm{ND}});\end{array}$$

Tests on the equality of SE, SP and AC between methods were performed using Fisher’s exact test.

Qualitative reproducibility or concordance (CO) was also estimated for each protocol. Concordance is the probability that two identical test materials sent to different laboratories will both provide the same results (i.e. both found positive or both found negative)50. Concordance for qualitative analyses is similar to reproducibility for quantitative analyses, and this performance criterion is a means to assess the ability of a protocol to provide consistent results with identical samples that are tested under different conditions: operator, equipment, master mix or DNA polymerase brand, location, time49. In order to have a reliable estimation of CO, it was calculated for each protocol using the original data reported by the participating laboratories. This means that positive, negative and indeterminate results were included. CO between protocols was compared for both datasets using Fisher’s exact tests for count data.

#### Statistical software for data analysis

All statistical tests were performed using the R statistical software version 3.4.051. Statistical tests were considered as significant for estimated P-values with a confidence of less than 5%. All figures were produced using the R package “ggplot2”52.