Genome skimming and NMR chemical fingerprinting provide quality assurance biotechnology to validate Sarsaparilla identity and purity

Sarsaparilla is a popular natural health product (NHP) that has been reported to be one of the most adulterated botanicals in the marketplace. Several plausible explanations are documented, including economically motivated product substitution, unintentional errors due to an ambiguous trade name associated with several different taxa, and wild harvesting of incorrect non-commercial plants. Unfortunately, the adulterants include Decalepis hamiltonii, a medicinal plant species that is Red Listed by the International Union for Conservation of Nature (IUCN) and declared a species of high conservation concern by the National Biodiversity Authority of India (NBA). This study provides validated genomic (genome skimming & DNA probes) and metabolomic (NMR chemical fingerprints) biotechnology solutions to prevent adulteration of both raw materials and finished products. This is also the first use of Oxford Nanopore sequencing on herbal products, enabling the use of genome skimming as a tool for quality assurance within the supply chain of botanical ingredients. The validation of both the genomic and the metabolomic approaches provided a quality assurance perspective on both product identity and purity. This research enables manufacturers and retailers to verify that their supply chain is authentic, so that consumers can enjoy safe, healthy products.


Scientific Reports | (2020) 10:19192 | https://doi.org/10.1038/s41598-020-76073-7 | www.nature.com/scientificreports/
DNA barcoding is one of the molecular methods that can be used for species identification. Standardized gene regions such as rbcL, matK and ITS2 are generally used as barcode markers 9,10. In the barcoding approach, 600-800 bp sequences of candidate barcode regions are amplified from the target species using conserved primers, and the resulting amplicons are sequenced to find the genetic differences that identify the species. Mishra et al. 8 examined the efficacy of the recommended barcode markers rbcL, matK, trnH-psbA, ITS and ITS2 in differentiating three congeneric species of the genus Decalepis. They also examined the ability of these markers to separate Decalepis spp. from H. indicus. Their results suggest that the combination rbcL + matK + ITS provided an accurate signal for differentiating Decalepis spp., and that a SNP at 230 bp in matK was able to distinguish H. indicus from Decalepis spp. However, using the barcoding approach for species identification in herbal products has some limitations. It is generally difficult to obtain ~700 bp amplicons from herbal products because the DNA present in them is sheared 11. Barcoding also involves several post-PCR processing steps, which may take several days, to identify the species ingredients of a product. The resources required to achieve a successful DNA barcoding test are prohibitive for commercial applications.
To overcome the limitations of DNA barcoding, we advocate combining metabolomic and genomic approaches. One metabolomic approach widely used in the pharmaceutical industry is Nuclear Magnetic Resonance (NMR). Because of its efficacy, several studies have used this technique for quality assurance in the herbal industry. For example, Hachem et al. 12 demonstrated the ability of 1H NMR spectroscopy to detect, identify and quantify adulterants in 160 herbal food supplements sold in the market. That study highlighted poor manufacturing practices, as evidenced by the variability of active pharmaceutical ingredients among capsules from the same box. A similar study of 150 herbal dietary supplements 13 demonstrated the utility of 1H NMR spectroscopy in verifying quality in a good manufacturing process (GMP). That study also used mass spectrometry as a complementary method to confirm the chemical structures identified in the NMR profiles. The qNMR spectroscopy method provides accurate absolute and relative quantification, with errors reported to be less than 1% and 0.2%, respectively 14. Choi et al. 15 reported that NMR fingerprinting could efficiently differentiate between transgenic and wild-type plants. Hence, NMR tools can be used for differentiating GMO and non-GMO plants, animals and derived by-products. In addition, NMR has been used to assess the quality of coffee and to distinguish between superior (Arabica coffee) and inferior (Robusta coffee) cultivars 16. Likewise, Vogels et al. 17 studied the ability of proton NMR spectroscopy to detect adulteration in orange juice. The literature states that qNMR has the unique capability of both authenticating and quantifying commercial botanical ingredients 18.
qNMR is a standard-free quantification tool that can analyze multiple mixtures without internal or external standards 19. NMR is a fast and accurate analytical method. Most recently, phytochemical analysis and metabolomic/chemical profiling by NMR have also been used for the authentication of botanicals and plant extracts 20. An added benefit of NMR is that it provides a metabolite profile that can be used to identify closely related species that genetic markers do not differentiate. This provides a quick screen for unknown adulterants, overcoming the very difficult task of developing a DNA-based test assay for some extract ingredients used in food and natural health products.
Genome skimming involves shallow sequencing of the target genome to retrieve highly repetitive regions such as ribosomal DNA (rDNA) and plastid DNA. Previous studies used this approach to resolve the phylogeny of tropical trees 21,22 and applied it to herbarium material for plant identification and phylogenomics 23,24. In this study, we used nanopore sequencing technology for the first time on botanical products to retrieve long stretches of the chloroplast genome from D. hamiltonii and H. indicus and so generate more markers for assay development. Using this new sequence information, hydrolytic probe-based assays were designed for quick identification of Sarsaparilla species. Although some barcodes were able to differentiate the specimens of the Sarsaparilla group, genome skimming offers several advantages, such as extra sequence information that can supply more reference sequences for the development of multiple assays. We expect the genome skimming approach to overcome issues known to affect single barcode markers, such as low specificity (lack of species differentiation) and low sensitivity caused by amplification problems and allele dropout in processed samples with degraded DNA. The development of hydrolytic probe-based assays will enable qPCR assays on portable devices, which can provide a quick and accurate response for supply chain verification.
The goal of this study is to provide validated genomic (DNA barcoding, genome skimming & DNA probes) and metabolomic (NMR chemical fingerprints) biotechnology solutions to prevent adulteration of both raw materials and finished Sarsaparilla products. More specifically, we used both genomic and metabolomic methods to identify samples of known provenance and samples from the commercial supply chain. The validation of both the genomic and the metabolomic approaches provided a quality assurance perspective on both product identity and purity.

Results
DNA barcoding of Sarsaparilla group. The conventional barcode markers rbcL and matK failed to amplify 50% of the samples tested in this study. We tested all 24 samples using generic rbcL and matK barcoding primers, which have amplicon lengths of around 500-700 bp. Of the 24 samples tested, the four reference samples Pteaq_51470, 75NAT, SR140 and SR281 were successfully amplified by both markers. However, the amplification rate in the samples obtained from manufacturing and retail units was very low: samples 461NW, 867NW, BRM278, BRM391, 335BI, 336BI, 337BI and Amasar2 were amplified, and the remaining 12 samples were not amplified by these markers. Given this low rate of amplification, we did not proceed further with this ineffective method. Filtering removed 24,000 reads that did not pass the quality threshold; the remaining 120,000 reads had good quality scores.
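The amplification counts reported above can be tallied in a few lines. This is only a bookkeeping sketch: the sample IDs and the total of 24 come from the text, while the function name and the grouping into two lists are illustrative choices.

```python
# Sample IDs and the total of 24 are taken from the text above.
TOTAL_SAMPLES = 24
reference_amplified = ["Pteaq_51470", "75NAT", "SR140", "SR281"]
supply_chain_amplified = ["461NW", "867NW", "BRM278", "BRM391",
                          "335BI", "336BI", "337BI", "Amasar2"]


def amplification_summary(total, *groups):
    """Return (amplified, failed, success_rate) for the given sample groups."""
    amplified = sum(len(group) for group in groups)
    failed = total - amplified
    return amplified, failed, amplified / total


amplified, failed, rate = amplification_summary(
    TOTAL_SAMPLES, reference_amplified, supply_chain_amplified)
print(f"amplified={amplified}, failed={failed}, success rate={rate:.0%}")
```

The tally reproduces the 12 failed samples and the 50% failure rate stated in the paragraph.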

Development of new multi-loci markers for identification of Sarsaparilla samples. Using the new sequence information obtained from nanopore sequencing, chloroplast regions other than the available barcode markers were targeted for primer and probe design. According to NCBI taxonomy, both D. hamiltonii and H. indicus belong to Apocynaceae. Accordingly, Cynanchum auriculatum, which belongs to the same family, was used as the reference for assembling chloroplast contigs for both species with the Read Mapper in the CLC Genomics Workbench program. Although a full-length contiguous chloroplast genome was not assembled for either species, contigs long enough (2-54 kb) to identify variable regions were assembled. These contigs were aligned with chloroplast sequences of Smilax china and Pteridium aquilinum to detect the diagnostic regions characteristic of each species. Six primer sets were designed from these alignments (Table 1). All six primer sets were tested on DNA samples of Decalepis hamiltonii, Hemidesmus indicus, Pteridium aquilinum and Smilax spp. using conventional PCR. Sequences derived from these samples were used to design probes for testing samples collected from various phases of the supply chain.
[Table and figure residue, partially recovered. Samples: Pteaq_51470, SR140, SR281, 461NW, 867NW, BRM278, BRM391, 44PR, 67NW, 104NW, 223BI, 226BI, 295NW, 341BI, 342BI, 357BI, 379BI, 335BI, 336BI, 337BI, Amasar1, Amasar2, Amasar3. Primer Decha1: 5′-GATTTCGCCAAGTCGATTCT-3′, 361.]
[...] aquilinum probe appropriately as suggested by their labels. All the positive samples were amplified by their respective probes (Fig. 1).
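As a quick sanity check on a primer such as Decha1, its length, GC fraction and reverse complement can be computed directly from the sequence. The sketch below is not part of the study's workflow; the function name is hypothetical, and the melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough estimate for short oligos that the authors do not report using.

```python
def primer_stats(seq):
    """Basic properties of a DNA primer written 5'->3' (spaces are ignored)."""
    seq = seq.replace(" ", "").upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    if sum(counts.values()) != len(seq):
        raise ValueError("primer contains non-ACGT characters")
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    complement = str.maketrans("ACGT", "TGCA")
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        # Wallace rule: an approximation, assumed here for illustration only.
        "tm_wallace_c": 2 * at + 4 * gc,
        "reverse_complement": seq.translate(complement)[::-1],
    }


# Decha1 sequence as it appears in the recovered table fragment.
decha1 = primer_stats("GAT TTC GCC AAG TCG ATT CT")
for key, value in decha1.items():
    print(key, value)
```

For probe design, dedicated software with nearest-neighbour thermodynamics would normally replace the Wallace estimate.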
Chemical fingerprints and clustering. In this study, we included 24 samples belonging to four species of the Sarsaparilla group to determine their classification and grouping using chemometric modelling of 1H NMR data. In trade, the Sarsaparilla group comprises four taxonomically distant groups. The chemical fingerprint (Fig. 2) shows how the samples cluster (HCA) according to the diversity and intensity of their metabolites. There is a clear metabolite difference between the four taxa, which separated all the samples in ordination space in accordance with their taxonomic classification (Fig. 3). The clusters are mapped in the space defined by the first three principal components (PC1 to PC3), which explain 86.2% of the overall variance (Fig. 3). The ellipsoids in the cluster map (Fig. 3) represent 95% confidence levels and give a visual indication of the separated groups. The area of the cluster Dec_ham (D. hamiltonii) is larger than that of the other clusters, possibly because of differences in the processing methods of the products. The outlier in the Smilax cluster is the sample labeled S. officinalis (44PR); all the others are S. aristolochiifolia. However, all the samples grouped taxonomically with their reference materials, in agreement with the genomic clustering (Fig. 4).
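The grouping idea behind hierarchical clustering with Ward's criterion on Euclidean distances can be sketched in plain Python. This is a toy illustration, not the FactoMineR analysis used in the study: the sample labels and three-value "fingerprints" below are invented for the example, and the clustering runs on raw vectors rather than on principal components.

```python
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(profiles, k):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [([label], [vector]) for label, vector in profiles.items()]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]][1], clusters[ij[1]][1]))
        labels_j, vectors_j = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] = (clusters[i][0] + labels_j, clusters[i][1] + vectors_j)
    return sorted(sorted(labels) for labels, _ in clusters)


# Hypothetical fingerprints: two made-up replicates for each of four groups.
toy = {
    "Smi_1": [9.0, 1.0, 0.5], "Smi_2": [8.5, 1.2, 0.4],
    "Pte_1": [1.0, 9.0, 0.6], "Pte_2": [1.2, 8.7, 0.5],
    "Dec_1": [0.5, 1.0, 9.0], "Dec_2": [0.7, 1.3, 8.0],
    "Hem_1": [5.0, 5.0, 5.0], "Hem_2": [5.2, 4.8, 5.1],
}
groups = ward_clusters(toy, k=4)
print(groups)
```

With real bucketed spectra, each vector would hold one value per bucket, and a library implementation would be preferable to this cubic-time loop.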

Discussion
We demonstrated for the first time that Oxford Nanopore sequencing can be used for quality assurance of authentic ingredients in the natural health products industry. The MinION provided useful sequence data at a reasonable cost. Genome skimming allows for shallow sequencing of high-copy regions of a genome, such as chloroplast, mitochondrial and ribosomal DNA, which are useful for taxonomic identification. Oxford Nanopore's MinION is a handheld, portable sequencing device that can be used for genome sequencing in a limited-resource setting 25.
Using a probe-based technology has several advantages over traditional primer-based methods for species identification in herbal products. The qPCR method can be validated 27 and is fit for purpose in identity testing of species ingredients. The barcoding approach has several disadvantages: (1) it does not work well on finished products because the DNA fragments are smaller than the barcode regions; (2) it is resource intensive (time and cost); (3) primers designed in conserved regions tend to amplify few to many non-target species, depending on the level of sequence conservation; (4) it does not allow proper assay design with proper positive controls; (5) it cannot be properly validated for sensitivity (e.g., LOD/LOQ); and (6) several post-PCR processing steps, such as running the gel, sequencing the amplicons and analysing the sequence information, are necessary to identify the target species. Conversely, real-time PCR, which monitors the amplification of target DNA in real time using a fluorescent dye, removes the need for post-PCR processing and provides quick species identification. This becomes even more useful when hundreds of samples must be tested and the authentication results delivered to the industry within a specified time.
In addition, having a portable qPCR device on-site further reduces the need to ship samples to a specified testing location and provides even quicker results. Therefore, we developed probe-based assays for four species of the Sarsaparilla group and tested them successfully on a portable qPCR device.
Understanding the chemical composition of a phytomedicine is vital for providing safe consumer products. This is especially important for products labelled as Sarsaparilla, which may contain taxonomic species and adulterants that have similar morphological characteristics but a different chemical composition and pharmacological efficacy from the original Sarsaparilla group. The botanical industry has faced these problems for a long time. Notably, these issues need to be highlighted and addressed for both the botanical industry and consumers of natural health products. For example, Srirama et al. 28 reported that Phyllanthus debilis and Phyllanthus amarus are chemically different but biologically equivalent. Their study suggested that the hepatoprotective property was found in P. amarus but not in P. debilis. In contrast, Saraca asoca, which is used in gynecological disorders, and its adulterant Polyalthia longifolia have similar chemical compositions but different biological effects 29. In our study, 'Sarsaparilla' is one such botanical entity that poses problems for field identification by localized suppliers. The name Sarsaparilla designates the following botanicals in the industry: Smilax spp., Pteridium aquilinum (North American Sarsaparilla), Decalepis hamiltonii and Hemidesmus indicus (South East Indian Sarsaparilla). These botanicals are taxonomically different entities that are traded and used as bio-active equivalents/substitutes. Conventional pharmacognosy techniques based on macro-morphological characters may not be effective in distinguishing the four botanicals traded as "Sarsaparilla" when they are in a processed form such as root fragments or powdered extracts. In this study, we demonstrated that using both genomic and metabolomic approaches provides robust authentication of Sarsaparilla products.
Our results present a combined approach in which the two methods produced very similar results in segregating the four species of the Sarsaparilla group (Fig. 4). Either of these tools could be used to verify Sarsaparilla species ingredients in raw or finished natural health products.
Sample preparation. The standard sample preparation for NMR required 50 mg of tissue to be homogenized with a mortar and pestle in liquid nitrogen and dissolved in 1 ml of deuterated methanol (CD3OD). All the samples were prepared in triplicate. The solvent was chosen for its ability to dissolve diverse chemical compounds. Samples were incubated in a sonicating bath for 30 min at room temperature. Sonicated samples were centrifuged for 5 min at 3000 rpm, and 600 μl of clear supernatant was then collected in a 5 mm Wilmad® NMR tube. The same samples used for the NMR extractions were also used for DNA extraction. A final volume of 100 µl of DNA was extracted from the samples using the NucleoSpin Plant II kit (Macherey-Nagel) following the manufacturer's protocol.
Spectra acquisition for NMR. To analyse the chemical fingerprints of the samples, 1H-NMR spectra were acquired on a 600 MHz Bruker Avance III NMR spectrometer equipped with a 5 mm "TXI" room-temperature probe. Data were acquired with the Bruker pulse program "zgpr" and the acquisition mode "DQD", with the following parameters: number of complex points, 28 [...]
Data processing and analysis for NMR. 1H-NMR spectra were processed using TopSpin 4.0.7. Phase and baseline were corrected automatically except for two samples (Pte_aqu_BRM391-b and Dec_ham_337BIb). These two had baseline problems and were corrected manually using a fourth-order polynomial algorithm. The corrected samples grouped well with their replicates. Spectra were calibrated to the residual methanol peak at 3.31 ppm.
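Calibrating to the residual methanol signal amounts to shifting the chemical-shift axis so that the reference peak sits at 3.31 ppm. The sketch below illustrates the idea only and is not TopSpin's routine: the function name, the 0.10 ppm search window, and the choice of the tallest point in that window as the reference peak are all assumptions, and the data are invented.

```python
def calibrate_to_reference(ppm, intensity, reference_ppm=3.31, window=0.10):
    """Shift the ppm axis so the tallest point near the reference lands on it."""
    candidates = [i for i, x in enumerate(ppm) if abs(x - reference_ppm) <= window]
    if not candidates:
        raise ValueError("no data points within the search window")
    peak = max(candidates, key=lambda i: intensity[i])
    offset = reference_ppm - ppm[peak]
    return [x + offset for x in ppm], offset


# Hypothetical spectrum whose methanol peak was recorded at 3.35 ppm.
ppm_axis = [3.20, 3.25, 3.30, 3.35, 3.40, 3.45]
signal = [1.0, 2.0, 3.0, 10.0, 2.0, 1.0]
shifted, offset = calibrate_to_reference(ppm_axis, signal)
print(f"offset applied: {offset:+.2f} ppm")
```

A real residual CD3OD signal is a multiplet, so production code would locate its centre rather than the single tallest point.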
Processed spectra were bucketed with simple rectangular buckets of positive intensities without scaling (AMIX 4.0.1). The chemical shift range used for bucketing was −1 to 12 ppm, with a bucket width of 0.01 ppm. During bucketing, the residual solvent signals of water and methanol were removed at 4.75-5.06 ppm and 3.16-3.45 ppm, respectively. After bucketing, each spectrum was normalized by setting values below the mean to 0 and binning values above the mean from 1 to 100. All the spectra were analysed and clustered by Hierarchical Clustering Analysis (HCA) and Hierarchical Clustering on Principal Components (HCPC), using a Euclidean dissimilarity matrix and Ward's clustering method in the R package FactoMineR 30. The spectral intensities of the chemical compounds were converted to chemical fingerprints and hierarchically clustered (HCA).
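The bucketing and normalization steps can be sketched as follows. This is not the AMIX implementation: the ppm range, bucket width and solvent regions come from the text, while the function names, the linear spacing of the 1 to 100 bins between the spectrum mean and maximum, and the toy data are assumptions made for illustration.

```python
# Solvent regions (ppm) removed during bucketing, as given in the text.
EXCLUDED = [(4.75, 5.06), (3.16, 3.45)]


def in_excluded(ppm):
    return any(low <= ppm <= high for low, high in EXCLUDED)


def bucket_spectrum(ppm, intensity, start=-1.0, stop=12.0, width=0.01):
    """Sum positive intensities into fixed-width buckets, skipping solvent regions."""
    n_buckets = round((stop - start) / width)
    buckets = [0.0] * n_buckets
    for x, y in zip(ppm, intensity):
        if y <= 0 or in_excluded(x) or not (start <= x < stop):
            continue
        index = min(int((x - start) / width), n_buckets - 1)
        buckets[index] += y
    return buckets


def normalize_buckets(buckets, n_bins=100):
    """Below-mean buckets become 0; the rest are binned 1..n_bins (assumed linear)."""
    mean = sum(buckets) / len(buckets)
    top = max(buckets)
    if top <= mean:  # flat spectrum: nothing rises above the mean
        return [0] * len(buckets)
    normalized = []
    for value in buckets:
        if value < mean:
            normalized.append(0)
        else:
            fraction = (value - mean) / (top - mean)
            normalized.append(min(n_bins, 1 + int(fraction * n_bins)))
    return normalized


# Invented data points: two fall in solvent regions and one is negative.
ppm_points = [1.005, 1.006, 3.30, 4.90, 7.253, 7.254]
heights = [2.0, 3.0, 50.0, 80.0, 10.0, -1.0]
buckets = bucket_spectrum(ppm_points, heights)
fingerprint = normalize_buckets(buckets)
print(len(buckets), [(i, v) for i, v in enumerate(fingerprint) if v])
```

The resulting integer vector per spectrum is the kind of fingerprint that would then be passed to the clustering step.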

Methods
DNA barcoding and conventional PCR. Samples were tested using the DNA regions rbcL and matK recommended by the CBOL Plant Working Group 9, as well as new primers designed in this study. The selected loci were amplified by polymerase chain reaction (PCR) on a PTC-100 thermocycler (Bio-Rad). DNA was amplified in 20 μL reaction mixtures containing 1 U AmpliTaq Gold Polymerase with GeneAmp 10× PCR Buffer II (100 mM Tris-HCl pH 8.3, 500 mM KCl) and 2.5 mM MgCl2 (Applied Biosystems), 0.2 mM dNTPs, 0.1 mM of each primer (0.5 mM for matK), and 20 ng template DNA. Amplified products for the new markers were sequenced in both directions with the primers used for amplification, following the protocols of the University of Guelph Genomics facility (www.uoguelph.ca/~genomics). Products from each specimen were cleaned using Sephadex columns and run on an ABI 3730 sequencer (Applied Biosystems). Bidirectional sequence reads were obtained for all the PCR products. Sequences were assembled using Sequencher 4.5 (Gene Codes Corp) and aligned manually using BioEdit version 7.0.9.
Library preparation and sequencing on the MinION. DNA quantity was measured using a Qubit fluorometer, and the fragment size of the genomic DNA was measured with Agilent's TapeStation instrument. Genomic DNA was used for library preparation following the 1D Genomic DNA by Ligation (SQK-LSK109) protocol of Oxford Nanopore Technologies. The flow cell was attached to the MinION device and primed as per the above protocol. The library containing adapter-ligated high molecular weight genomic fragments was then loaded onto the flow cell, and sequencing was run with default settings for 24 h. Throughout the run, raw sequencing reads were collected on the MinIT in both FASTQ and FAST5 formats; the FASTQ files were used for data analysis.
Bioinformatic analyses. To assemble the chloroplast genome, FASTQ files were uploaded into CLC Genomics Workbench version 12.0.2. Chloroplast sequences were mapped directly to the reference Cynanchum auriculatum chloroplast sequences from GenBank using the CLC Workbench Read Mapper. Other genomic contigs were assembled using Canu, setting the genome size to 15 kb and discarding reads shorter than 100 bp.
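The read-length cutoff described above can be illustrated with a small FASTQ filter. This is a standalone sketch of the idea, not how Canu applies its own minimum read length; the function names are hypothetical, the parser assumes four-line FASTQ records, and the two reads are invented.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_by_length(records, min_length=100):
    """Keep only reads with at least min_length bases (100 bp cutoff from the text)."""
    for header, sequence, quality in records:
        if len(sequence) >= min_length:
            yield header, sequence, quality


# Two made-up reads: one of 120 bases and one of 40 bases.
records_text = "\n".join([
    "@read_long", "A" * 120, "+", "I" * 120,
    "@read_short", "C" * 40, "+", "I" * 40,
]) + "\n"
kept = [header for header, _, _ in filter_by_length(read_fastq(io.StringIO(records_text)))]
print(kept)
```

For real MinION output, the same generator can be fed an open file handle instead of the in-memory string.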