Chloroplast DNA analysis of the invasive weed, Himalayan balsam (Impatiens glandulifera), in the British Isles

Impatiens glandulifera, or Himalayan balsam (HB), is an invasive alien weed throughout the British Isles (BI). Classical biological control of HB in the BI using a rust fungus from the Himalayan native range was implemented in 2014. However, not all HB populations are susceptible to the two rust strains currently released. Additional strains that infect resistant populations are needed in order to achieve successful control. These are best sourced from the historical collecting sites. A molecular analysis was conducted using six chloroplast DNA sequences from leaf material from across the BI and the native range. Herbarium samples collected in the Himalayas between 1881 and 1956 were also included. Phylogenetic analyses resulted in the separation of two distinct groups, one containing samples from the BI and the native range, and the other from the BI only, suggesting that HB was introduced into the BI on at least two occasions. The former group is composed of two subgroups, indicating a third introduction. Ten and 15 haplotypes were found in the introduced and native range, respectively, with two of these found in both regions. The results show where to focus future surveys in the native range to find more compatible rust strains.


Results
Our preliminary work showed that variability in the internal transcribed spacer regions, including the 5.8S rDNA (rDNA-ITS), was very low (data not shown), which is consistent with previous research 30. The six cpDNA regions (trnL-trnF, atpB-rbcL, rps16 Intron, trnG Intron, psbA-trnH (GUG) and rpl32-trnL (UAG)) were selected due to their high variability. The aligned sequences of trnL-trnF, atpB-rbcL, rps16 Intron, trnG Intron, psbA-trnH (GUG) and rpl32-trnL (UAG) were 906, 804, 813, 632, 397 and 1,047 bp in length, respectively. Out of the total of 4,599 bp, there were 42 variable sites, comprising 30 substitutions and 12 indels (Table 1). Variable sites in each region were as follows: trnL-trnF, one substitution; atpB-rbcL, one substitution; rps16 Intron, five substitutions and three indels; trnG Intron, three substitutions and three indels; psbA-trnH (GUG), nine substitutions and two indels; rpl32-trnL (UAG), 11 substitutions and four indels. Based on these variable sites, 23 haplotypes (A-W) were found across the 86 individuals (52 populations) of HB sampled: eight unique to the introduced range, 13 unique to the native range, and two occurring in both regions. Of the 24 populations in which two or more individuals were analysed, 22 were fixed for a single haplotype; only two populations, both in the introduced range, were polymorphic, each with two haplotypes.
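As a quick consistency check, the per-region figures reported above can be tallied in a few lines of Python. This is only an illustrative sketch: the numbers are taken from the text, while the dictionary layout and variable names are our own assumptions and not part of the original analysis.

```python
# Aligned lengths (bp) and variable-site counts per cpDNA region, as reported in the text.
# The data structure and names are illustrative assumptions.
regions = {
    "trnL-trnF":       {"length": 906,  "substitutions": 1,  "indels": 0},
    "atpB-rbcL":       {"length": 804,  "substitutions": 1,  "indels": 0},
    "rps16 Intron":    {"length": 813,  "substitutions": 5,  "indels": 3},
    "trnG Intron":     {"length": 632,  "substitutions": 3,  "indels": 3},
    "psbA-trnH (GUG)": {"length": 397,  "substitutions": 9,  "indels": 2},
    "rpl32-trnL (UAG)": {"length": 1047, "substitutions": 11, "indels": 4},
}

total_length = sum(r["length"] for r in regions.values())
total_substitutions = sum(r["substitutions"] for r in regions.values())
total_indels = sum(r["indels"] for r in regions.values())
variable_sites = total_substitutions + total_indels
percent_variable = 100 * variable_sites / total_length

# Haplotype counts by range, as reported in the text.
haplotypes = {"introduced_only": 8, "native_only": 13, "shared": 2}
total_haplotypes = sum(haplotypes.values())
introduced_haplotypes = haplotypes["introduced_only"] + haplotypes["shared"]
native_haplotypes = haplotypes["native_only"] + haplotypes["shared"]

print(f"Total aligned length: {total_length} bp")
print(f"Variable sites: {variable_sites} "
      f"({total_substitutions} substitutions, {total_indels} indels)")
print(f"Proportion variable: {percent_variable:.2f}%")
print(f"Haplotypes: {total_haplotypes} total, "
      f"{introduced_haplotypes} introduced, {native_haplotypes} native")
```

The totals agree with the 4,599 bp, 42 variable sites and 23 haplotypes stated above, and with the 10 introduced-range and 15 native-range haplotypes given in the abstract.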
A phylogenetic tree for HB was reconstructed using the maximum-likelihood (ML) and Bayesian inference (BIf) methods with the 23 different cpDNA haplotype sequences, along with Impatiens parviflora and Cornus controversa as outgroups (Fig. 1). The ML tree was identical to that reconstructed by the BIf method. All the individuals of HB formed a monophyletic group with high bootstrap values and posterior probabilities. Based on the phylogenetic analyses, the 23 haplotypes could be divided into two groups: one containing haplotypes from both the introduced and native range (Group 1), and one containing haplotypes from the introduced range of the BI only (Group 2). Although Group 1 forms a monophyletic group, it could be divided into two subgroups (Subgroups 1A and 1B). The majority of samples from Pakistan are included in Subgroup 1A and those from India in Subgroup 1B (Table 2, Fig. 1). Haplotypes J and K, belonging to Group 2, diverged earlier than Group 1, with a bootstrap value of 55%. The unrooted network analysis of cpDNA using all haplotypes, in which indels were treated as missing data, is similar to the trees created with the ML and BIf methods (Fig. 2). The sequence ACCA at positions 179-182 of psbA-trnH in haplotype W and the sequence GGATA at positions 380-384 of rpl32-trnL (UAG) in haplotype R were the reverse complements of TGGT and TATCC, respectively, found in the other haplotypes (Table 1); each was therefore treated as a single inversion event in the analysis. In this analysis, haplotypes A and L in Subgroup 1A are the same as haplotypes B and C, and P and U, respectively. In Subgroup 1B, haplotype E is identical to haplotypes F, M, N and V. Haplotype S is the same as haplotype T. When each indel was treated as a fifth state in the analysis, homoplasy could be increased due to indels (see Supplementary Fig. S1).
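The inversion interpretation can be checked directly: a short motif is an inversion of another if it equals that motif's reverse complement. The sketch below uses only the motifs quoted in the text; the function name and the list layout are illustrative assumptions rather than part of the published workflow.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# (description, motif in the divergent haplotype, motif in the other haplotypes)
inversions = [
    ("psbA-trnH, positions 179-182, haplotype W", "ACCA", "TGGT"),
    ("rpl32-trnL (UAG), positions 380-384, haplotype R", "GGATA", "TATCC"),
]

results = {}
for label, inverted, common in inversions:
    # An inversion replaces a motif with its reverse complement.
    results[label] = reverse_complement(common) == inverted
    print(f"{label}: {common} -> {inverted}, inversion: {results[label]}")
```

Both motifs satisfy the test, which is why scoring each as one inversion event, rather than as four or five independent substitutions, is the more parsimonious reading.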
Based on both the network and phylogenetic analyses, Group 2, containing haplotypes J and K, is located on the outer branch and is likely to be the ancestral lineage of HB. In addition, the phylogenetic analyses suggest that the centre of diversification of HB in the native range is likely to be in the eastern part of the Indian Himalayas/western Nepal, with the plant spreading and potentially evolving new genotypes in a north-westerly direction along the Himalayas through India and into Pakistan. Among the 23 cpDNA haplotypes, haplotype E was found to be the most widespread (Figs. 3, 4). Twenty of the 34 populations sampled in the BI region of the introduced range (18 of the 31 populations in the UK and two of the three populations in Ireland) and one of the 18 populations in the native range (Wangat Valley, India) were categorised as haplotype E. The geographical distribution of each haplotype in the UK, based on the cpDNA sequences, showed that haplotype E was most widespread in the south (Fig. 3). Haplotype J, the second most common haplotype, was found in seven populations, mainly concentrated in the north and east of the UK. Haplotype A was found in three populations in the introduced range (UK and Ireland) and in two populations in the native range, including an unknown site in Kashmir, India. As would be expected, the haplotypes in the native range were more diverse than those in the introduced range (Fig. 4).

Discussion
It is important to identify the native range of an invasive weed prior to the start of a classical biological control (CBC) programme. However, the centre of origin of the invasive population within the native range can prove to be equally important, particularly if there are high levels of genetic diversity in the target plant in its invasive range. Molecular methods are among the tools that can be employed to determine the origin of introduced weeds, particularly when they have a wide native range. The success rate of CBC programmes has been shown to improve significantly when natural enemies are collected from populations in the native range that genetically match those in the introduced range 41.
CpDNA is inherited maternally 42 and lacks recombination 43. Therefore, data from cpDNA can provide important insights into evolutionary processes in plants such as hybridisation, population structure and phylogeography 44. A total of 23 haplotypes of HB were found during this study across the introduced and native range [...] 29 and Nagy and Korpelainen 30, demonstrating that there is more variation in HB in the native range than in the introduced range. HB is native to the foothills of the Himalayas, with populations of the weed occurring in valleys separated by mountain ranges. As a result, populations have evolved in isolation and, as shown here, represent different haplotypes. This study demonstrates that haplotypes of HB in the BI and the native range form two distinct genetic groups. Group 1 can be divided into two subgroups: Subgroup 1A contains two haplotypes which are present in England and Ireland (A and C), and Subgroup 1B contains seven haplotypes which are present in the BI (E, G, H, J, K, M, N). Haplotype E is the most common haplotype found in the BI and is particularly prevalent in southern England and Wales. Haplotypes from the native range could be divided between the two subgroups, with the majority from Pakistan belonging to Subgroup 1A and those from India to Subgroup 1B. Group 2 consists of two haplotypes which are present in the UK (J and K), but which have not yet been matched to haplotypes in the native range. Haplotype J is predominantly distributed in the northern and eastern parts of England and in Scotland.
The earliest herbarium samples provided by the Natural History Museum, dating back to before the First World War, gave particular insight into the introduction of HB into the BI. Stately homes and country residences of the aristocracy were significant growers of new plant species brought to the BI by plant hunters and botanists 45 such as John Forbes Royle, who described I. glandulifera. It is likely that HB would have first been grown on these estates, and indeed the plant can be found in many of them today. These historical samples, dating back more than 100 years, are likely to represent genotypes with very little genetic divergence from the initial founder populations. Interestingly, our analyses placed these oldest samples into haplotypes from both genetic groups and the two subgroups, suggesting that HB was introduced from the native range to the BI on at least two, but probably three, separate occasions. These herbarium specimens, in addition to those from the native range, can provide significant evolutionary insights into the species, as they represent a genetic snapshot at a particular time and place 46. In Group 1, it was possible to match both subgroups to populations in the native range. Subgroup 1A had an identical cpDNA profile (haplotype A) to two populations, one from Batakundi in the Kaghan Valley of Pakistan (PA02) and one from an unknown location in Jammu and Kashmir, India (IN11), whilst Subgroup 1B had an identical cpDNA profile (haplotype E) to the population analysed from the Wangat Valley in Kashmir (IN02). For the sites with full location information, the two regions, the Kaghan Valley and the Wangat Valley, are geographically distinct from each other, separated by the Pir Panjal range and the Nanga Parbat region, which are high mountain ranges that divide the Vale of Kashmir in India and the North-West Province of Pakistan.
However, Group 2, which consists of haplotypes J and K, was not associated with any of the native range populations included in the study. The position of Group 2 in the phylogenetic tree (closest to the outgroup) indicates that these haplotypes may be an ancestral lineage, and it is possible that Groups 1 and 2 have diversified from north-eastern India and potentially north-western Nepal (the most easterly part of the native range of Himalayan balsam). The theory of the diversification of I. glandulifera along the mountain slopes of Nepal and India towards Pakistan is supported by a study by Janssens et al. 47, who concluded that the centre of origin for Impatiens species is south-west China. In order to confirm this, additional samples collected more widely from the native range, especially from Nepal, should be included in future studies. In addition, because of the low bootstrap value in the phylogenetic analysis shown in this study, other methods such as inter-simple sequence repeat PCR or next-generation sequencing should be used to confirm whether haplotypes J and K represent an ancestral lineage.
The results of this study provide an insight into the genetic diversity of HB in the BI and yield crucial data concerning where to focus future searches for additional compatible strains of the rust P. komarovii var. glanduliferae in the native range of the species. In a mountainous region such as the Himalayas, it is probable that the rust has evolved with distinct plant biotypes in isolation and that, as such, distinct strains or pathotypes of the rust exist 48,49. Consequently, the potential for intraspecies specificity of biological control agents, particularly co-evolved, biotrophic pathogens such as P. komarovii var. [...] critical: most notably, control of rush skeleton weed, Chondrilla juncea, in Australia 41. In this example, a strain of the rust Puccinia chondrillina collected from Italy was released and had a severe impact on one of three forms of the weed in the field, with population levels being decreased by up to 99% 50,51. Unfortunately, as not all forms were targeted, because they were not initially recognised as distinct morphotypes, the distribution of the two other forms increased significantly. This necessitated the introduction of additional rust strains, more virulent towards the resistant forms of the plant 52, to achieve successful control 41. This case clearly illustrates the possibility of countering natural resistance in weed populations by introducing new pathogen strains 53. There are, however, other instances where this appears not to be a factor for successful control, for example, mistflower, Ageratina riparia, in Australia, New Zealand and Hawaii 54. In some cases, an isolate of a pathogen has broad intraspecies specificity, independent of where it was collected 55. The susceptibility of HB genotypes to strains of the rust, however, is not clear cut; the rust strain originally collected from India was from the centre of the native range (IN07), an area where the molecular evidence reported here indicates that none of the BI haplotypes originated. However, this rust strain is able to infect some of the BI populations in England and Wales and has established in the field at a site where haplotype E is the dominant type (UK18) 14.
Figure caption (fragment): [...] Table 2. The letters in the circles correspond to the haplotypes in Figs. 1 and 2 and Tables 1 and 2. Haplotype O (UK31) is not included as its location is unknown. The map was generated using the ggplot2 34, ggspatial 35, sf 36, rnaturalearth 37 and rnaturalearthdata 38 [...]
The release of the rust strain from Pakistan has resulted in the infection of a separate cohort of HB populations 14, enhancing the success of the CBC programme. In the case of HB, it will be advisable to collect a range of rust strains from throughout the native range, in addition to focusing on the areas matching the likely origin of a plant genotype. Potentially, this will also create more genetic diversity within the introduced populations of the rust, making local adaptation to evolving HB populations more likely and more rapid 56. The number of samples used in this study was limited and the number of plants sampled at each site in the BI was low, which may have masked a larger number of sites with mixed haplotypes. Two haplotypes were found at only two sites (UK30 and IR03), and at the four sites where 4-8 plants were sampled only one haplotype was revealed. The results of this research have confirmed previous findings that populations of I. glandulifera in the BI have been introduced from both India and Pakistan 30. The samples from the BI can be organised into two groups, one being composed of two subgroups, based on the sequences of the six cpDNA regions. Although caution should be taken when interpreting these results, they do provide compelling evidence of where searches for additional rust strains that are fully compatible with the dominant haplotypes in the BI should be targeted. Of the two native range haplotypes present in the introduced range, one is from Batakundi, Kaghan Valley (Pakistan), and is positioned within Subgroup 1A, whilst the other, from the Wangat Valley, Kashmir (India), is positioned in Subgroup 1B. No haplotypes were found amongst the native range samples that matched UK haplotypes J and K from Group 2.
However, the position of the haplotypes J and K on the phylogenetic tree suggests that these haplotypes may be found in the most easterly part of the native range (where India borders Nepal). Thus, this region may yield rust strains compatible with the HB haplotypes J and K, and should, therefore, also be targeted in future surveys. These additional rust strains would thereby enhance the likelihood of successful biological control of HB in the BI.

Methods
Plant material. British Isles (BI).
A total of 34 distinct HB populations from across the BI, including some rust-release sites, were used in this study (Table 2). The majority of the leaf samples (26 populations) were collected in 2016 from living plants, dried immediately in silica gel and stored at 4 °C, using the techniques described by Gaskin et al. 18. The number of plants sampled from each population (N) varied from one to eight, although single samples were available for only three populations (UK03, River Till, Northumberland; UK05, River Tweed, Northumberland; and UK23, Nanstallon, Cornwall). For 23 populations, leaf samples were collected from a minimum of two individual plants. In order to check the validity of using limited sample numbers at each site, eight plants were sampled from four separate parts of the HB infestation at Harmondsworth Moor, Middlesex (UK14), and four plants each from Silwood Park, Berkshire (UK15) and Lanivet, Cornwall (UK24). Leaf samples from seven sites in England were provided by the Natural History Museum (London, UK) from historic herbarium material dating back to before the end of the First World War; collection dates are from 1898 to 1945 (and one sample from the UK from an unknown date and location). The aim was to maximise the chance of obtaining samples from the original introductions, before HB became widespread.
Himalayan native range. Eighteen HB populations from the native range (11 from India and seven from Pakistan) were also included in the study (Table 2). Leaf samples from eight populations were taken from CABI's fungal herbarium material that had been collected during surveys for natural enemies of HB between 2006 and 2010. Consequently, leaf material was only available from one plant per population, apart from Rohtang, Himachal Pradesh, India (IN07), where two plants were available. The herbarium samples had been dried in a plant press and stored in wax packets at room temperature. The remaining 10 herbarium samples were obtained from the Natural History Museum. These included eight samples from India and two from Pakistan, collected 50-100 years ago. In addition, in order to use I. parviflora as an outgroup species for phylogenetic analyses, leaf samples of this species were collected from one plant in Egham (Surrey, UK).
[...] Supplementary Table S1). Consequently, these six cpDNA fragments were amplified and sequenced for all samples from the introduced and native range. PCR amplifications were undertaken in reaction volumes of 20 μl containing either 10 ng of genomic DNA template, 10 μl of MegaMix-Royal (Microzone, Haywards, UK) and 0.5 μM of each primer, or 10 ng of genomic DNA template, 0.5 U KOD Hot Start DNA polymerase (Novagen/Toyobo, Darmstadt, Germany), 1 × PCR Buffer, 0.2 mM of dNTPs, 1.0 mM MgSO4 and 0.3 μM of each primer. DNA amplification was performed in a Mastercycler (Eppendorf AG, Hamburg, Germany). The PCR conditions were as follows: denaturation at 96 °C for 5 min, followed by 36 cycles of 94 °C for 45 s, 50-60 °C (depending on the primers) for 30 s, and 72 °C for 90 s. PCR products were fractionated in 1.5% (w/v) agarose gels, and visualised using SafeView Nucleic Acid Stain (NBS Biologicals, Huntingdon, UK) and UV illumination.
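For planning purposes, the programmed hold time of the PCR profile described above can be estimated with a short calculation. This is a minimal sketch using only the step durations stated in the text; it assumes no final extension step (none is mentioned) and ignores ramping between temperatures, so real run times on a thermocycler will be longer. All variable names are illustrative.

```python
# Step durations taken from the stated PCR conditions.
initial_denaturation_s = 5 * 60          # 96 °C for 5 min
cycles = 36
cycle_steps_s = {
    "denaturation (94 °C)": 45,
    "annealing (50-60 °C, primer dependent)": 30,
    "extension (72 °C)": 90,
}

# Hold time of one cycle and of the whole programme.
# Assumption: ramp times and any final extension are not included.
seconds_per_cycle = sum(cycle_steps_s.values())
total_hold_s = initial_denaturation_s + cycles * seconds_per_cycle
total_hold_min = total_hold_s / 60

print(f"Hold time per cycle: {seconds_per_cycle} s")
print(f"Total programmed hold time: {total_hold_s} s ({total_hold_min:.0f} min)")
```

Under these assumptions the profile holds for 165 s per cycle and 104 min in total.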
The PCR products were purified with MicroCLEAN (Microzone, Haywards, UK) and directly sequenced bidirectionally using an ABI 3100 Genetic Analyzer (Applied Biosystems, Tokyo, Japan) with a Big Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Austin, USA) and the same primers used for PCR. All haplotype sequences were deposited in DDBJ/EMBL/GenBank under the accession numbers LC379653-LC379796.
Phylogenetic analysis. Sequence alignments were generated using the MUSCLE algorithm in the program MEGA 7.0.14 57 and manually optimised. In order to confirm the position of I. parviflora as the outgroup species, the sequence of C. controversa (Cornaceae), designated as one of the outgroup species, was obtained from the GenBank database. ML and BIf analyses were performed using RAxML version 8.2.9 58 and MrBayes version 3.2.6 59, respectively, for both the individual data partitions and the combined aligned dataset. The best-fit nucleotide substitution model was determined using Kakusan 4 60, which also generates input files for ML and BIf. Best-fit models were evaluated using the Akaike Information Criterion (AIC) 61 for ML and the Bayesian Information Criterion (BIC), with significance determined by Chi-square analysis. The equal-rate model among regions with GTR + G was selected under AIC, and the non-partitioned model among regions under BIC. In the ML analyses, the resultant tree was evaluated by bootstrap analysis with 1,000 replicates. In the BIf analyses, two independent runs, each with four Markov chain Monte Carlo chains, were run for 3,000,000 generations. Samples were taken every 100 generations. Burn-in values were estimated using Tracer v1.6 62. The first 25% of generations were discarded as burn-in and then a 50% majority rule consensus tree was generated from the sampled trees. Phylogenetic trees output from the ML and BIf analyses were processed via FigTree v1.4.3 63.
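The sampling scheme for the Bayesian analysis implies a specific number of trees contributing to the consensus. The sketch below works through that arithmetic from the stated settings; it is an illustration rather than a reproduction of the MrBayes run. In particular, it counts one sample per 100 generations and does not count the starting state, which MrBayes may also record, so actual sample counts could differ by one per run.

```python
# Settings taken from the text; variable names are illustrative.
generations = 3_000_000
sample_every = 100
independent_runs = 2
burnin_fraction = 0.25

# Assumption: the starting state (generation 0) is not counted as a sample.
samples_per_run = generations // sample_every
burnin_samples_per_run = int(samples_per_run * burnin_fraction)
retained_per_run = samples_per_run - burnin_samples_per_run
retained_total = retained_per_run * independent_runs

print(f"Samples per run: {samples_per_run}")
print(f"Discarded as burn-in per run: {burnin_samples_per_run}")
print(f"Retained per run: {retained_per_run}")
print(f"Retained across {independent_runs} runs: {retained_total}")
```

On these assumptions, each run yields 30,000 sampled trees, of which 22,500 remain after burn-in, giving 45,000 trees across both runs for the 50% majority rule consensus.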
Scientific Reports | (2020) 10:10966 | https://doi.org/10.1038/s41598-020-67871-0
A statistical parsimony network of the combined cpDNA regions was constructed using TCS v. 1.21 64. Insertions of 5 bp observed at rps16 Intron and psbA-trnH were considered as single mutation events. The analyses were conducted twice: once with each indel treated as missing data (Fig. 2) and once with each indel treated as a fifth state (see Supplementary Fig. S1).
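The effect of the two indel treatments used for the network analysis can be illustrated with a small pairwise-difference sketch. The sequences and haplotype names below are hypothetical toy data, not the study's alignment, and the functions are a simplified stand-in for what TCS does internally: they count differences column by column and group haplotypes greedily, so multi-base indels are not merged into single events here.

```python
def count_differences(a: str, b: str, gap_mode: str) -> int:
    """Count differing alignment columns between two aligned sequences.

    gap_mode="missing": columns where either sequence has a gap are ignored.
    gap_mode="fifth":   a gap is scored as a fifth character state.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in ("missing", "fifth"):
        raise ValueError("gap_mode must be 'missing' or 'fifth'")
    differences = 0
    for x, y in zip(a, b):
        if gap_mode == "missing" and "-" in (x, y):
            continue
        if x != y:
            differences += 1
    return differences


def collapse_haplotypes(seqs: dict, gap_mode: str) -> list:
    """Greedily group haplotypes that show zero differences from a group's first member."""
    groups = []
    for name, seq in seqs.items():
        for group in groups:
            if count_differences(seqs[group[0]], seq, gap_mode) == 0:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical aligned fragments: hap1 and hap2 differ only by an indel,
# hap3 differs from hap1 by one substitution.
toy_alignment = {
    "hap1": "AAGGT--TTC",
    "hap2": "AAGGTA-TTC",
    "hap3": "AAGGT--TAC",
}

groups_missing = collapse_haplotypes(toy_alignment, "missing")
groups_fifth = collapse_haplotypes(toy_alignment, "fifth")
print("Indels as missing data:", groups_missing)
print("Indels as fifth state: ", groups_fifth)
```

In the toy data, haplotypes that differ only by an indel collapse into one node when gaps are treated as missing data but remain distinct when gaps are scored as a fifth state, which mirrors why several haplotypes appear identical in the missing-data network.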

Data availability
Sequence data were deposited in DDBJ/EMBL/GenBank under the accession numbers LC379653-LC379796.