Unusual Five Copies and Dual Forms of nrdB in “Candidatus Liberibacter asiaticus”: Biological Implications and PCR Detection Application

“Candidatus Liberibacter asiaticus” (CLas), a non-culturable α-proteobacterium, is associated with citrus Huanglongbing (HLB, yellow shoot disease), which currently threatens citrus production worldwide. Here, the whole genome sequence of CLas strain A4 from Guangdong, China, was analyzed. Five copies of nrdB, encoding the β-subunit of ribonucleotide reductase (RNR), a critical enzyme involved in bacterial proliferation, were found. Three nrdB copies were in long form (nrdBL, 1,059 bp) and two were in short form (nrdBS, 378 bp). nrdBS shared >99% identity with the 3′ end of nrdBL and had no active site. Sequences of CLas nrdB genes formed a distinct monophyletic lineage among eubacteria. To make use of the high copy number, an nrdB-based primer set, RNRf/RNRr, was designed and evaluated using real-time PCR with 262 HLB samples collected from China and the USA. Compared to the current standard primer set HLBas/HLBr derived from the 16S rRNA gene, RNRf/RNRr had Ct value reductions of 1.68 (SYBR Green PCR) and 1.77 (TaqMan PCR), thus increasing the detection sensitivity about three-fold. Meanwhile, RNRf/RNRr was more than twice as stable as primer set LJ900f/LJ900r, derived from a multi-copy prophage. The nrdB-based PCR thereby provides sensitive and reliable CLas detection with broad application, especially for the early diagnosis of HLB.

Scientific Reports | 6:39020 | DOI: 10.1038/srep39020
dedicated for RNR research has been established 9. Currently, no information about CLas RNR has been published, except for a brief mention of a partial RNR gene sequence in PCR detection 10.
Detection of CLas mainly relies on PCR technologies that use specifically designed primer sets based on genomic DNA sequences, mostly the 16S rRNA gene. Examples are primer set OI1/OI2c for standard PCR 11 and primer set HLBas/HLBp/HLBr for TaqMan real-time PCR 12. The chromosome of CLas has three copies of the 16S rRNA gene 13. One strategy for further improving PCR detection is to identify and target genes with > 3 copies. Proof of concept has recently been achieved in PCR detection of Spiroplasma citri, the cause of citrus stubborn disease, by targeting multi-copy phage genes 14. In CLas, a phage-based primer set (LJ900f/LJ900r) has been developed and tested 15. However, recent investigations showed that CLas prophages and their sequences are highly variable, including cases where the prophage is absent 5,16, which could impede detection reliability or accuracy. The high-copy-number nrdB provides an ideal target for sensitive detection of CLas.
The aims of this research were to: (1) characterize nrdB in CLas based on available RNR information and bacterial genome sequences and predict its possible biological role; (2) elucidate phylogenetic relationships of CLas among eubacteria based on nrdB DNA and amino acid sequences; and (3) evaluate the use of an nrdB-based primer set for improving CLas detection, in comparison with existing PCR primers such as the 16S rRNA gene-based primer set HLBas/HLBr and the prophage sequence-based primer set LJ900f/LJ900r.

Results
Identification of multiple-copy regions in A4 genome. As shown in Fig. 1, ten repeat regions were detected in the A4 genome by Dot Matrix analysis. Examination of the retrieved sequences revealed that regions 3, 4, and 6 were identical DNA sequences of 5,769 bp, each containing the 16S, 23S, and 5S rRNA genes, i.e. the rrn operon (Supplementary Table S1). The other seven regions were sequences of three different sizes: 1,881 bp for regions 1 and 10; 1,059 bp for regions 2, 5, and 9; and 1,491 bp for regions 7 and 8. Sequence alignments showed that regions 1, 2, 5, 9, and 10 contained a common 390 bp sequence (red in Fig. 1); regions 1, 7, 8, and 10 contained a common 1,492 bp sequence (green in Fig. 1); and regions 2, 5, and 9 contained a common 769 bp sequence (purple in Fig. 1). Genes or open reading frames (ORFs) corresponding to each region are listed in Supplementary Table S1.
Characterization of CLas nrdB. Since the 390 bp sequence was repeated five times (the highest copy number) in the CLas genome, the 390 bp-containing sequences, i.e. regions 1, 2, 5, 9, and 10 (Fig. 1), were selected for further study. In regions 1 and 10, 378 of the 390 bp formed ORFs CD16_00035 and CD16_04445, respectively (Supplementary Table S1). In regions 2, 5, and 9, the whole 1,059 bp formed ORFs CD16_00300, CD16_03625, and CD16_04230, respectively (Supplementary Table S1). All five sequences were annotated as nrdB encoding the β-subunit of RNR Class Ia (EC 1.17.4.1), two (CD16_00035 and CD16_04445) in short form (nrdBS, 125 amino acids) and three (CD16_00300, CD16_03625, and CD16_04230) in long form (nrdBL, 352 amino acids) (Table 1). Note that 12 bp at the 5′ end of the 390 bp sequence were not part of nrdBS (Fig. 2). nrdBS1 and nrdBS2 had a SNP at position 389, part of the synonymous stop codons. Five SNPs were found among nrdBL1, nrdBL2, and nrdBL3 without causing frame shifts (Fig. 2). Conserved domain analysis indicated that the long nrdBL protein (352 aa) contained a diiron center (ion binding site), the tyrosyl radical, a putative radical transfer pathway, and a dimer interface.
Figure 1. The dot-matrix map was created by self-comparison using the BLAST program available from the National Center for Biotechnology Information. Genome length is marked on both the X- and Y-axes, with the prophage region identified. The upper-left diagonal (in blue shadow) carries the same information as the bottom-right diagonal. Examination of one diagonal (e.g. the bottom-right) reveals ten repeat regions, labeled with numbers accordingly. Sequences sharing > 99.9% similarity (repeats) among the ten regions are marked with the same color. The red sequence (390 bp) has the highest copy number, five (regions 1, 2, 5, 9, and 10). Regions 3, 4, and 6, in blue, are the rrn operon.
BLASTn search against all published CLas genome sequences revealed that all CLas strains had the same number of nearly identical nrdB genes (both nrdBS and nrdBL) (Table 2), except for CLas strain SGCA5. This exception could be an artifact of de novo assembly 17, which can drop repeat sequences, because reassembly using the A4 sequence as a reference showed the same five nrdB genes (unpublished data). The copy number of nrdB in CLas (five) was much higher than in all non-CLas Liberibacters, as well as in other bacterial species (Table 2). Phylogenetic trees of selected representative bacteria based on the 16S rRNA gene and on the amino acid and DNA sequences of the nrdB gene are shown in Fig. 4. In all three trees, Liberibacters clustered together. Within Liberibacters, CLas strains clustered together, demonstrating that the nrdB gene supports the same monophyletic lineage of CLas as the 16S rRNA gene. It is noted, however, that in the 16S rRNA gene tree Agrobacterium was closely related to Liberibacters. This was not the case in the nrdB gene tree.
Specificity of RNR primer set. Primer set RNRf/RNRr was designed based on the 390 bp repeats in the CLas genome (Fig. 2; Table 3). A BLASTn search (word size = 16) using the RNRf/RNRr primer sequences as queries against the GenBank nr/nt database, which contained > 1,000 bacterial genome sequences, returned hits only to the RNR gene of CLas. DNA samples extracted from two healthy citrus plants and one CLas-free psyllid reared in our laboratory showed no amplification with primer set RNRf/RNRr in SYBR Green real-time PCR. The melting temperature of the RNRf/RNRr amplicon was 81.50 °C.
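The in silico part of the specificity check above can be illustrated with a small exact-match "virtual PCR". This is a minimal sketch and not the BLASTn procedure used in the study. The function names, the `max_len` cutoff, and all sequences are hypothetical toy values, since the actual RNRf/RNRr sequences (Table 3) are not reproduced in the text. Only perfect matches on the forward strand are considered.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_amplicons(template: str, fwd: str, rev: str, max_len: int = 500):
    """List (start, end) of predicted amplicons on the forward strand.

    Assumption: primers must match exactly; real primer searches such as
    BLASTn tolerate mismatches and also scan the reverse strand.
    """
    template = template.upper()
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)  # where the reverse primer anneals
    hits = []
    start = template.find(fwd)
    while start != -1:
        # nearest reverse-primer site downstream of this forward site
        site = template.find(rev_site, start + len(fwd))
        if site != -1 and site + len(rev_site) - start <= max_len:
            hits.append((start, site + len(rev_site)))
        start = template.find(fwd, start + 1)
    return hits


# Toy example (made-up sequences): a 25 bp target present twice in a "genome".
toy_fwd = "ACGTTGCA"
toy_rev = "ACGGTCAA"
toy_target = "ACGTTGCAGGGAAACCCTTGACCGT"
toy_genome = "TTTT" + toy_target + "CCCCC" + toy_target + "GG"
print(find_amplicons(toy_genome, toy_fwd, toy_rev))
```

Each returned pair marks one predicted amplicon, so a multi-copy target such as the 390 bp repeat would be expected to yield one hit per copy, and a non-target genome none.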

Evaluation of RNRf/RNRr with field samples from China and USA. A total of 262 DNA samples extracted from CLas-infected plants and psyllids in seven provinces in China and three states in the USA were tested in the SYBR Green real-time PCR format (Table 4). Overall, there was a significant difference between the Ct values of RNRf/RNRr and HLBas/HLBr (P < 0.0001), although variation existed from location to location in both countries. The largest P value in China was from Guangxi Province and the largest P value in the USA was from Florida. However, in all cases, P values were < 0.05 and ΔCt values were negative, ranging from −1.36 to −1.75 (Table 4). In addition, RNRf/RNRr qPCR assays on three different qPCR systems (ABI, MJ, and CFX) also showed the robustness of RNRf/RNRr for detection of CLas (Table S2).
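A ΔCt can be translated into an approximate fold difference in sensitivity with a short calculation. The sketch below assumes ideal amplification, in which the product doubles every cycle; the `efficiency` parameter and the function name are illustrative and not part of the study. The two ΔCt values are those reported in the abstract for the SYBR Green and TaqMan formats.

```python
def fold_change_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference in detectable template implied by a Ct shift.

    efficiency = 1.0 means perfect doubling per cycle (an assumption);
    a negative delta_ct (earlier threshold crossing) gives a fold change > 1.
    """
    return (1.0 + efficiency) ** (-delta_ct)


# Delta Ct of RNRf/RNRr relative to HLBas/HLBr, as reported in the abstract.
for label, delta_ct in [("SYBR Green", -1.68), ("TaqMan", -1.77)]:
    fold = fold_change_from_delta_ct(delta_ct)
    print(f"{label}: delta Ct = {delta_ct} -> about {fold:.1f}-fold")
```

Under this assumption the two ΔCt values correspond to roughly 3.2-fold and 3.4-fold, consistent with the approximately three-fold gain stated in the text. Real assays rarely reach perfect efficiency, so these figures are upper-end estimates.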

Discussion
The inability to culture CLas in vitro limits the use of traditional culture-based methodologies to study its biology. Genome sequence analyses in this study provided the first insight into an RNR gene of CLas and revealed previously unknown properties of the bacterium. According to model studies, RNRs are divided into three classes (Classes I, II, and III), largely based on their interaction with oxygen and the way in which they generate their tyrosyl radical 19. The CLas nrdB described in this study belongs to Class Ia, which is exclusively oxygen-dependent 8, implying an aerobic lifestyle for CLas. This is the first report on the oxygen usage status of CLas, which will benefit future efforts on in vitro cultivation of the bacterium.
Typically, bacterial RNR genes are arranged in an operon. Class Ia RNR genes form nrdAB, where nrdA encodes the RNR α-subunit and nrdB encodes the RNR β-subunit. This does not seem to be the case in CLas, where both nrdA and nrdB are dispersed separately in the bacterial genome (Table S3), encoding a riboflavin biosynthesis protein similar to that of nrdI in the Class Ib operon nrdHIEF, where nrdH encodes a glutaredoxin-like protein, nrdI encodes a flavoprotein, nrdE encodes the RNR α-subunit, and nrdF encodes the RNR β-subunit 20. It was also reported that in Mycobacterium tuberculosis, RNR subunit genes are not arranged in an operon 21. Interestingly, both CLas and M. tuberculosis are nutritionally fastidious intracellular pathogens. The HLB-associated CLas is not cultivable. The slow-growing M. tuberculosis causes human tuberculosis. The most intriguing finding from this study is that CLas has five copies of nrdB, three in a long form designated nrdBL and two in a short form designated nrdBS, along with a single nrdA (Table 1). As shown in Table 2, among the known Liberibacter genomes, only CLas has multiple copies of RNR genes. Although it is common to find multiple RNR classes within a single bacterial species 8, only a few cases of direct nrd gene duplication have been reported. For example, M. tuberculosis has a second class Ib-like subunit gene 21 and Streptococcus pyogenes has two clusters of class Ib genes, nrdHEF and nrdF*I*E* 22. In both cases, the duplicated genes show significant variation at the DNA sequence level (< 71% identity). In this study, the sequences of the three nrdBL copies are almost identical and the two nrdBS copies are nearly identical. The common regions between nrdBL and nrdBS are also identical. These observations indicate that the nrdB gene duplication events are recent.
Duplication of RNR genes has been shown to be important for bacterial proliferation. In the cases of M. tuberculosis and S. pyogenes, the two different nrd genes allowed bacterial growth under different growth environments 21,22. Along these lines, the nrdB duplication in CLas could be related to its environmental adaptation, likely by increasing functional dosage 23. Although more evidence is needed, it will be of interest to study whether this possible dosage effect could be linked to the current dominance of CLas in HLB. In Brazil, both CLas and CLam were reported to be associated with HLB 24. However, as observation continued, the population of CLas increased whereas the population of CLam decreased 25,26.
Figure caption: The iron-binding residues are shown in pink, centered on a purple dot (binding site); the tyrosyl radical in red; and the putative radical transfer pathway in green. The regions targeted by primer set RNRf/RNRr are highlighted in cyan. All conserved residues and the model were generated using the Phyre server 29. The final refinement of all 3-D structure figures was made using the Pymol Molecular Graphics System (v1.7.6).
It is noted that nrdBS has no active site (Fig. 4). Its biological role(s) could be an interesting topic. In early research, a strain of Escherichia coli (C600) was found to have two forms of the β-subunit of RNR: one was a full-length, functional β-polypeptide and the other was a truncated, non-functional β′-polypeptide 27. In a model RNR structure of α2β2, there could be two possible homodimeric β-subunits (ββ and β′β′) and one heterodimeric β-subunit (ββ′). The heterodimeric β-subunit was found to conform to half-site reactivity, which might be involved in the regulation of enzyme activity. In this regard, we speculate that the non-functional short form nrdBS could be used at the transcriptional level to generate a heterodimer as part of RNR regulation in CLas proliferation.
While in silico genome sequence analyses of RNR genes provide information only for understanding CLas biology, the high copy number and conserved nature of nrdB were also explored for CLas detection. The use of primer set HLBas/HLBr along with a hybridization probe (TaqMan PCR) has been regarded as a standard protocol for CLas detection. However, problems arise when high Ct values, e.g. Ct = 30 or higher, are encountered. This situation is common when testing citrus trees for the presence of CLas, especially with symptomless or atypically symptomatic samples. The RNRf/RNRr PCR detection system provides a remedy. First, like HLBas/HLBr, RNRf/RNRr is based on a highly conserved gene. This assures the reliability of CLas detection, in contrast to the prophage-based primer set LJ900f/LJ900r (Fig. 5). In fact, because of its universal presence, the RNR gene has been recommended as a key target for phylogeny research of viruses that lack ribosomal RNA genes 28. Second, the RNRf/RNRr locus has five copies, more than the three copies of the 16S rRNA gene. This means more initial targets are available for PCR, leading to increased sensitivity of detection. As demonstrated in Fig. 5, RNRf/RNRr PCR is at least three times more sensitive than HLBas/HLBr PCR in both SYBR Green and TaqMan formats. In this study, the robustness of the RNRf/RNRr qPCR assays was also confirmed on three different real-time PCR systems, although the greater sensitivity of the RNR primers was shown on the ABI and MJ systems rather than on the CFX system (Table S2).
Table 3. General information of PCR primers in this study.
In summary, through genome sequence analyses, we discovered that CLas has five copies of the RNR β-subunit gene nrdB. CLas nrdB has both long and short forms that could play a role in RNR regulation during bacterial proliferation. Phylogenetically, all CLas nrdB genes clustered together, forming a stable evolutionary lineage, as with the 16S rRNA gene. The high copy number and conserved nature of nrdB provide a foundation for sensitive and reliable detection of CLas. Primer set RNRf/RNRr has been developed and tested. The detection system is recommended for resolving CLas detection issues when primer set HLBas/HLBr yields borderline Ct values for interpretation.

Materials and Methods
Bacterial genome sequences and strains. The whole genome sequence of CLas strain A4, which originated from an HLB-affected citrus tree in Guangdong, China (CP010804) 3, was used for DNA/gene copy evaluation. All bacterial genome sequences were downloaded from the GenBank database (v211.0) hosted by the National Center for Biotechnology Information (NCBI) (Table 2). Field strains were collected for population study. A CLas strain was represented by DNA extracted from an infected leaf sample of citrus (Citrus sp.) or periwinkle (Catharanthus roseus) or from an individual ACP. Samples were from seven provinces (Guangdong, Guangxi, Yunnan, Fujian, Jiangxi, Zhejiang, and Hainan) in China and three states (Florida, Texas, and California) in the USA (Table 4).
Identification of nrdB and in silico characterization. The CLas strain A4 genome sequence was self-compared using the BLASTn program on the NCBI web service with the word size set at 128 bp. The result was visualized with the Dot-Matrix option. DNA sequence regions with the highest number of repeats were retrieved. The genetic nature of the DNA sequences was characterized according to genome annotation, assisted by BLAST search against the NCBI conserved domain database (CDD, v3.14). Since the identified DNA sequences were longer than the annotated genes, only gene sequences were downloaded and used for analyses. Protein structure analyses were initially carried out with the Phyre server (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index) using a profile-profile alignment algorithm 29. Final 3-D structures were made using the Pymol Molecular Graphics System (v1.7.6, Schrödinger LLC).
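The idea behind the genome self-comparison can be sketched with exact k-mer matching. This is a simplified stand-in for the BLASTn dot-matrix analysis, not the procedure used in the study. The function names are hypothetical, only the forward strand is scanned, and matches must be exact, whereas BLASTn extends word hits into gapped alignments. The word size `k` plays the role of the 128 bp BLASTn word size, and the demo uses a short made-up sequence.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def offdiagonal_dots(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (x, y) pairs with x < y: the off-diagonal dots of a self dot plot."""
    dots = []
    for pos in repeated_kmers(seq, k).values():
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                dots.append((pos[a], pos[b]))
    return sorted(dots)


# Toy sequence (made up): an 8 bp unit present twice, at positions 2 and 16.
toy = "TT" + "ACGTACGG" + "CATCAG" + "ACGTACGG" + "GA"
print(offdiagonal_dots(toy, k=8))
```

Runs of consecutive dots along a line parallel to the main diagonal correspond to repeat regions such as those numbered in Fig. 1. Storing every k-mer as a string is memory-hungry for a whole genome, so this form is intended for short illustrative sequences.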
For phylogenetic studies, all published CLas genomes and selected bacterial species representing major bacterial groups were used (Table 2). DNA and amino acid sequences of nrdB were retrieved according to genome annotation or from the ribonucleotide reductase database (v0.901) 9. The total number of nrdB genes in each genome was counted directly from the genome annotation and further confirmed by similarity searching the bacterial genome with the corresponding nrdB sequence. DNA sequences of 16S rRNA genes were downloaded from the NCBI GenBank nucleotide database (GenBank version 211.0). Phylogenetic trees were constructed using the Neighbor-joining method with MEGA 6.0 30.
Primer/probe designs and PCR experiments. CLas nrdB sequences were aligned with the Clustal Omega software 31. Common regions across all nrdB sequences were identified and used to design PCR primers and TaqMan probe sequences with the Primer 3 software 32 (Table 3). Primer and probe sequence specificity was checked through BLASTn against the GenBank nucleotide database (GenBank version 211.0). The TaqMan probe was synthesized by a commercial source, labeling the 5′-terminal nucleotide with 6-carboxy-fluorescein (FAM) reporter dye and the 3′-terminal nucleotide with Black Hole Quencher (BHQ)-1 (Table 3). Primers HLBas/HLBr, probe HLBp, and primers LJ900f/LJ900r were synthesized according to the original publications 12,15. Both SYBR Green and TaqMan real-time PCR formats were used in this study. The SYBR Green real-time PCR assays were performed in three different real-time PCR systems. In the USA, MJ Research DNA Engine
For evaluation of differences among primer sets RNRf/RNRr, HLBas/HLBr, and LJ900f/LJ900r, 34 CLas samples from China and 10 CLas samples from the USA were used (Table 3). The SYBR Green real-time PCR format was used for primer set evaluations. Since HLBas/HLBr-HLBp (TaqMan real-time PCR format) is widely used, RNRf/RNRr-RNRp was also used.
To substantiate the evaluation results, a total of 262 CLas samples collected from China and the USA (Table 4) were tested in the SYBR Green format.