Investigation of the cause of geographic disparities in IDEXX ELISA sensitivity in serum samples from Mycobacterium bovis-infected cattle

Accurately identifying Mycobacterium bovis-infected cattle is critical for bovine tuberculosis prevention and control. One method for identifying infected cattle is an ELISA developed by IDEXX Laboratories, which detects antibodies to two M. bovis proteins, MPB70 and MPB83. The assay’s sensitivity varies by geographic region, with sensitivities of 77%, 45%, and 9% in bovine serum samples from the United Kingdom (n = 126), the United States (n = 146), and Mexico (n = 128), respectively. We hypothesized that geographically biased sequence variation in mpb70 and mpb83, or in the genes that regulate their expression (sigK and rskA), may explain these differing sensitivities. This hypothesis was tested by comparing the sequences of these four genes in 455 M. bovis strains isolated from cattle in the aforementioned countries. For each gene, a single, common sequence was identified in most genomes of the M. bovis strains collected in all three countries. Twelve of the 455 strains were isolated from infected cattle for which the IDEXX ELISA was also performed. Five of the seven ELISA-positive genomes and three of the five ELISA-negative genomes contained the most common sequence of all four genes. Thus, sequence variation in mpb70, mpb83, sigK, and rskA does not explain the geographic disparities in IDEXX ELISA sensitivity.

• Supplementary Table S1 (supplementary table S1.xlsx): Complete list of M. bovis strains examined. Each row contains information for a single M. bovis strain. The first column contains the name of the strain, while the second column contains its sample code, if applicable. The third column contains the country of origin of the bovine from which the strain was isolated, while the fourth column indicates the organization that sequenced the genome of the strain.
• Supplementary Table S2 (supplementary table S2.xlsx): List of cases where the extended coding sequence of a given gene from the de novo assembly of a given M. bovis strain differed from that of the reference assembly for the same strain. The first column contains the strain for which the discrepancy occurred. The second column contains a description of the global alignment between the two sequences, while the third column describes the problem that appeared to cause the sequences to differ. The fourth column indicates which assembly process (de novo or reference) appeared to contain the correct sequence.
• Supplementary Table S3 (supplementary table S3.xlsx): List of anomalous extended coding sequences. The first column contains the strain from which the extended coding sequence was extracted. The second column indicates whether or not the extended coding sequence contained ambiguous nucleotides. The third column contains a description of the global alignment between the extended coding sequence from that strain and the corresponding extended coding sequence from M. bovis strain AF2122/97. The fourth column indicates whether the sequence was used for subsequent analyses (retained) or discarded.
• Supplementary Table S4 (page 15): Sequence variation in the extended mpb70 coding sequences. A multiple sequence alignment was constructed among the extended mpb70 coding sequences from 451 M. bovis strains, and alignment positions that did not contain the same nucleotide in all strains were identified. The header row of the table contains the alignment positions that were variable. Numbers prefixed with a minus sign or a plus sign indicate positions upstream or downstream of the coding sequence, respectively, while numbers with no prefix indicate positions within the coding sequence. For example, -1 means the base immediately preceding the start of the coding sequence, while 1 means the first base of the coding sequence and +1 means the base immediately after the stop codon. The extended canonical mpb70 coding sequence (i.e., the mpb70 sequence from M. bovis strain AF2122/97) was used as a reference. Specifically, the first main row of the table shows the bases that occur in each variable position in the reference sequence. The remaining rows show mutations in a given strain relative to the reference strain. If the nucleotide in a given position was the same as in the reference strain, then a period is shown; otherwise, the nucleotide (or gap) at that position is indicated. Only strains whose extended mpb70 coding sequence differed from that of M. bovis strain AF2122/97 are shown; the extended mpb70 coding sequences of all strains not shown were 100% identical to that of M. bovis strain AF2122/97.
• Supplementary Figure S1 (page 23): The extended canonical mpb70 coding sequence. The coding sequence is shown in blue, while the upstream and downstream regions (500 bp each) are shown in black.
• Supplementary Figure S2 (page 24): Multiple alignment of the extended mpb70 coding sequences from 451 M. bovis genomes. The consensus sequence is shown at the bottom of the alignment. Nucleotides in the individual sequences that match the consensus sequence are shown in blue, while those that differ from the consensus sequence are shown in white. The coding sequence is indicated by a red box.
• Supplementary Figure S3

Supplementary Discussion S1: Detailed background information on mpb70, mpb83, sigK, and rskA
This document provides background information on the antigens used by the IDEXX ELISA (the proteins MPB70 and MPB83), as well as their corresponding genes (mpb70 and mpb83). Specifically, their sequence characteristics are summarized in Section 1, while the three-dimensional structures of MPB70 and MPB83 are described in Section 2. Current knowledge concerning the functions of MPB70 and MPB83 is covered in Section 3, and Section 4 describes the existence of homologues of MPB70 and MPB83 in other bacteria. Background data are also given on two proteins and their corresponding genes that are known to regulate the expression of mpb70 and mpb83 (Section 5).
The information presented in this section comprises a mixture of results previously reported in the literature and simple in silico analyses performed by the authors of this study. Unless otherwise specified, the information is specific to M. bovis strain AF2122/97, which was the first M. bovis strain to have its genome sequenced.

Sequences of mpb70, mpb83, and their protein products
Including the stop codon, the coding sequence of mpb70 is 582 bp in length, and the MPB70 protein thus contains 193 amino acid residues. The first 30 residues of MPB70 constitute a signal peptide that is cleaved off by signal peptidase I, giving a mature protein of 163 residues 1-3 . The only post-translational modification of MPB70 appears to be a disulfide bond linking residues C38 and C172 2 . These residue numbers, and all others in this document, correspond to the immature (pre-cleavage) form of the protein being described.
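The length arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it assumes a standard coding sequence whose length (including the stop codon) is a multiple of three.

```python
def protein_length_from_cds(cds_bp, includes_stop=True):
    """Number of amino acid residues encoded by a coding sequence of cds_bp base pairs."""
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_bp // 3
    # The stop codon contributes to the length in bp but encodes no residue.
    return codons - 1 if includes_stop else codons


def mature_length(precursor_residues, signal_peptide_residues):
    """Residues remaining after the N-terminal signal peptide is cleaved off."""
    return precursor_residues - signal_peptide_residues


# mpb70: 582 bp / 3 = 194 codons; minus the stop codon gives the precursor length.
mpb70_precursor = protein_length_from_cds(582)
mpb70_mature = mature_length(mpb70_precursor, 30)
print(mpb70_precursor, mpb70_mature)  # 193 163
```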
At 220 residues, MPB83 is slightly larger than MPB70 and contains a 23-residue N-terminal signal peptide that is cleaved by signal peptidase II 3,4 . Beginning at position 22, the protein contains the post-translational lipidation motif LAGC, with the cysteine residue being lipidated 5 . MPB83 is also post-translationally glycosylated, with O-mannose linkages at two adjacent threonine residues (T48 and T49) 6 .
Previously, it was observed that mpb70 and mpb83 are paralogues with significant sequence identity 7 . When a Needleman-Wunsch global alignment 8 between their coding sequences was performed using the EMBOSS 9 program needle, 63.6% of the alignment positions were identical, while 18.0% of the positions were gaps and 18.4% were mismatches (Figure 1). Most of the gaps occurred in mpb70 near the beginning of the alignment. In a global alignment between the corresponding proteins, 60.5% of the alignment positions were matches, 9.4% were conservative substitutions, 14.8% were non-conservative substitutions, and 14.8% were gaps (Figure 2).
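For readers unfamiliar with global alignment, the sketch below implements a basic Needleman-Wunsch alignment and computes the same kind of summary percentages reported above (identical, mismatched, and gapped positions). It is a simplification, not a reimplementation of needle: it assumes a linear gap penalty and simple match/mismatch scores chosen for illustration, whereas needle scores alignments with a substitution matrix and separate gap-opening and gap-extension penalties, so its alignments and percentages will generally differ. The function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gap positions.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_stats(x, y):
    """Percentages of identical, mismatched, and gapped alignment columns."""
    total = len(x)
    gaps = sum(1 for p, q in zip(x, y) if p == "-" or q == "-")
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    mismatches = total - gaps - identical
    return {
        "identity": round(100 * identical / total, 1),
        "mismatches": round(100 * mismatches / total, 1),
        "gaps": round(100 * gaps / total, 1),
    }


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT')
```

In practice, one would pass the mpb70 and mpb83 coding sequences (not reproduced here) as `a` and `b`; the three percentages always sum to 100 apart from rounding.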

Structures of MPB70 and MPB83
The three-dimensional structure of MPB70 has been determined using nuclear magnetic resonance spectroscopy 2 . It contains a single β-barrel consisting of seven antiparallel β-strands (Figure 3A). Half of the β-barrel is exposed to the solvent, while the other half forms part of the hydrophobic core of the protein.
[Figure caption: The global alignment was produced using the EMBOSS 9 program needle, while the visual representation was produced using the EMBOSS program prettyplot.]
MPB70 also contains eight α-helices, which are largely packed against the solvent-inaccessible portion of the β-barrel (Figure 3A). The structure of MPB83 has not yet been experimentally determined. Thus, the homology-modeling software SWISS-MODEL 10 was used to predict the structure of MPB83 based on its homology to MPB70. Only the portions of MPB70 and MPB83 that are homologous to one another were included in the modeling process: residues 32-193 of MPB70 and residues 58-220 of MPB83 (see Figure 2). The model built by SWISS-MODEL was nearly identical to the experimentally determined structure of MPB70, with eight α-helices and seven antiparallel β-strands forming a β-barrel (Figure 3B).

Functions of MPB70 and MPB83
While MPB70 and MPB83 have been extensively characterized as antigens, much less is known about their functions. MPB70 is secreted 11 , while MPB83 is anchored to the cell surface 12 . Both exhibit sequence homology to the FAS1 domain, which is a component of several proteins (such as fasciclin I) involved in cell adhesion 13 . It has been hypothesized that MPB70 and MPB83 may contribute to osteitis after tuberculosis infection or vaccination by interacting with periostin 3,14 , which is a protein that contains multiple FAS1 domains, is found on the surface of osteoblasts, and is thought to be involved in bone formation and repair 15 . Currently, however, there appears to be no direct evidence to support this hypothesis. Beyond this, little is known about the function of these proteins, with no function-related keywords, gene ontology terms 16 , or other functional annotations being associated with either protein 17 . A search for potentially better-characterized bacterial homologues of MPB70 and MPB83 was performed using four iterations of position-specific iterative BLAST (PSI-BLAST) 18 ; however, little additional information was obtained, with most hits being proteins of unknown function (e.g., "hypothetical protein") or proteins annotated as "fasciclin" or "cell surface protein".
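A search like the one described above could be scripted roughly as follows. This is a hedged sketch, assuming a local BLAST+ installation and a protein database such as nr; the query file name, database name, output file name, and the keyword rules used to bin hit descriptions are illustrative assumptions, not the settings used in this study. Only the iteration count (four) comes from the text.

```python
import shlex
from collections import Counter


def psiblast_command(query_fasta, db, iterations=4, out="psiblast_hits.tsv"):
    """Build (but do not run) a BLAST+ psiblast command line."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(iterations),
        # Tabular output: subject ID, E-value, and subject title (description).
        "-outfmt", "6 sseqid evalue stitle",
        "-out", out,
    ]


def read_unique_titles(tsv_path):
    """Collect one description per subject ID from the tabular output.

    psiblast reports hits for each iteration, so a subject can appear more
    than once; lines without three tab-separated fields are skipped.
    """
    titles = {}
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                titles.setdefault(fields[0], fields[2])
    return list(titles.values())


def categorize_hits(titles):
    """Bin hit descriptions using simple, assumed keyword rules."""
    counts = Counter()
    for title in titles:
        t = title.lower()
        if "hypothetical protein" in t or "uncharacterized" in t:
            counts["unknown function"] += 1
        elif "fasciclin" in t:
            counts["fasciclin"] += 1
        elif "cell surface protein" in t:
            counts["cell surface protein"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file and database names.
    command = psiblast_command("mpb70_protein.fasta", "nr")
    print(shlex.join(command))
    # After running the command:
    # print(categorize_hits(read_unique_titles("psiblast_hits.tsv")))
```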

Regulators of mpb70 and mpb83 expression
Two genes, sigK and rskA, encode sigma factor K (SigK) and regulator of sigma factor K (RskA), respectively; these proteins have been found to control the expression of mpb70 and mpb83. In a comparison of several different M. bovis strains, some of which expressed mpb70 and mpb83 in high amounts and some in low amounts, Charlet et al. 24 found that strains in the former group expressed sigK in high amounts, while strains in the latter group contained a mutation in the third position of the start codon of sigK that caused its expression to be very low. When low-producing strains were complemented with a non-mutated sigK gene, expression of mpb70 and mpb83 was similar to that of the high-expressing strains 24 . Thus, SigK appears to be a positive regulator of mpb70 and mpb83 expression. As a sigma factor, SigK contains both a DNA-binding domain and an RNA polymerase-binding domain 25 . DNA binding sites for SigK were found to be located at promoter boxes approximately 10 bp and 35 bp upstream of the transcription start site for both mpb70 and mpb83 26 . Another regulator of mpb70 and mpb83, RskA, has been investigated in the context of the differential expression of these genes in M. bovis strain AF2122/97 compared to Mycobacterium tuberculosis strain H37Rv. (In M. tuberculosis, mpb70 and mpb83 are conventionally called mpt70 and mpt83, respectively; however, for simplicity, they will be referred to here by their M. bovis-specific names.) Specifically, it was observed that M. bovis strain AF2122/97 expresses high amounts of mpb70 and mpb83, while the expression of these genes in M. tuberculosis strain H37Rv is low, despite their sigK sequences being identical 27 . The authors hypothesized that rskA, a gene located physically close to sigK in the genome, was responsible for this difference. It was found that RskA in M. bovis contained two amino acid substitutions (G107D and G184E) relative to its counterpart in M. tuberculosis. When M. bovis was complemented with the version of rskA found in M. tuberculosis, expression of mpb70 and mpb83 was greatly reduced. To provide further evidence that RskA represses the transcription of mpb70 and mpb83, the authors deleted the rskA gene in M. tuberculosis, which caused it to express mpb70 and mpb83 in quantities similar to M. bovis strain AF2122/97 27 .
RskA represses the transcription of mpb70 and mpb83 by forming a complex with SigK that blocks both its DNA-binding domain and its RNA polymerase-binding domain, rendering it inactive 25 . Under oxidizing conditions, the SigK-RskA complex is stabilized by a disulfide bond between residues C133 and C183 of SigK; under reducing conditions, however, this disulfide bond is broken, causing SigK and RskA to dissociate 25 . SigK has thus been described as a redox sensor 25 .
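The regulatory relationships described above can be summarized as a toy Boolean model. This is a qualitative simplification for illustration only, not a model used in this study: the `Strain` fields are hypothetical, any non-ATG sigK start codon is treated as abolishing SigK production (in reality a start-codon mutation may only reduce expression, and mycobacteria also use alternative start codons), and RskA is assumed to repress SigK only in its G107/G184 form.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    """Hypothetical minimal description of a strain's sigK/rskA genotype."""
    sigk_start_codon: str = "ATG"  # low-expressing M. bovis strains carry a third-position start-codon mutation
    rska_present: bool = True      # False models an rskA deletion
    rska_107: str = "D"            # M. bovis AF2122/97 carries D107 and E184;
    rska_184: str = "E"            # M. tuberculosis H37Rv carries G at both positions


def rska_functional(strain):
    # Assumption: only the G107/G184 form represses SigK; mixed forms are not described in the text.
    return strain.rska_present and strain.rska_107 == "G" and strain.rska_184 == "G"


def mpb70_mpb83_expression(strain, reducing=False):
    """Qualitative 'high'/'low' call for mpb70 and mpb83 expression."""
    # Simplification: without an ATG start codon, no SigK is made, so mpb70/mpb83 are not transcribed.
    if strain.sigk_start_codon != "ATG":
        return "low"
    # A functional RskA sequesters SigK unless reducing conditions dissociate the complex.
    if rska_functional(strain) and not reducing:
        return "low"
    return "high"
```

For example, the default `Strain()` resembles M. bovis AF2122/97 and is called "high", whereas `Strain(rska_107="G", rska_184="G")` mimics M. tuberculosis H37Rv (or M. bovis complemented with M. tuberculosis rskA) and is called "low"; additionally setting `rska_present=False` mimics the rskA deletion and restores "high".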
The structures of SigK and the cytoplasmic domain of RskA have been determined in complex with one another using X-ray crystallography 25 (Figure 4). SigK contains two domains, each with four α-helices. The cytoplasmic portion of RskA contains four α-helices. Three of these are sandwiched between the two domains of SigK, while the fourth lies on the outside of the complex near the N-terminus of SigK.

Supplementary Discussion S2: Mutations in mpb70, mpb83, sigK, and rskA that could cause false negatives by the IDEXX ELISA
Several types of mutations in mpb70, mpb83, sigK, and rskA could explain false negatives by the IDEXX ELISA. These mutations can be divided into two categories: those that could prevent the anti-MPB70 or anti-MPB83 antibodies produced by the bovine host from interacting with the versions of those proteins used in the IDEXX ELISA, and those that could prevent the bovine host from generating anti-MPB70 or anti-MPB83 antibodies altogether. Many of these potential mutations relate to specific sequence- or structure-related attributes of MPB70, MPB83, SigK, and RskA (see Supplementary Discussion S1 online).
Potential mutations falling into the first category are as follows. For ease of discussion, some potential mutations are described in terms of the genes themselves, whereas others are described with reference to changes in their protein products.
• Substitutions in MPB70 or MPB83 epitopes. These could cause the antibodies generated by the infected bovine to be reactive to the mutated versions of the proteins, but not to the versions used in the IDEXX ELISA. Although it is not clear which regions of MPB70 and MPB83 are typically recognized by antibodies, solvent-accessible regions (such as residues 114-117, 123-128, 133-135, and 138-143 in MPB70; see Supplementary Discussion S1 online) would be more likely to be epitopes.
• Substitutions in the glycosylated residues of MPB83 (T48 and T49). Such mutations could prevent MPB83 from being glycosylated, and it is possible that antibodies that recognize the unglycosylated form of the protein would not recognize the glycosylated form in the IDEXX ELISA. Substitutions in residues near the glycosylation sites, which may act as a recognition motif for the enzyme catalyzing the glycosylation reactions, could have the same effect.
• Frameshift mutations in the mpb70 or mpb83 coding sequences. An insertion or deletion that changes the reading frame would result in the production of a very different protein product. Antibodies that recognize this product would be unlikely to recognize the corresponding protein used in the IDEXX ELISA.
Potential mutations falling into the second category are as follows.
• Mutations that alter the signal peptide of MPB70 or MPB83. These could prevent them from being secreted (for MPB70) or anchored to the cell surface (for MPB83), making them inaccessible to the host's antibodies.
• Missense mutations in the start codon of mpb70 or mpb83. These could prevent or greatly reduce their translation, so that little or no corresponding protein would be produced.
• Substitutions in the lipidation motif (residues 22-25) of MPB83. These could prevent the lipidation of MPB83, preventing it from becoming anchored to the cell surface.
• Missense mutations in the start codon of sigK. These could prevent or greatly reduce its translation, and without SigK, mpb70 and mpb83 would no longer be transcribed.
• Missense mutations, insertions, deletions, or nonsense mutations in the sigK coding sequence. Such mutations could render its protein product nonfunctional, preventing mpb70 and mpb83 from being transcribed.
• The mutations D107G and E184G in RskA. As described in Supplementary Discussion S1 online, it has been shown that the presence of Asp and Glu at positions 107 and 184, respectively, of RskA renders it non-functional, whereas it is functional when glycine residues are present at both positions. As RskA negatively regulates the expression of mpb70 and mpb83, the mutations D107G and E184G could cause only small amounts of MPB70 and MPB83 to be produced.
• Nonsense mutations in the mpb70 or mpb83 coding sequences. A premature stop codon would result in a shortened protein product, and antibodies may not be elicited to the truncated protein.
• Mutations in the regulatory regions upstream of the mpb70 or mpb83 coding sequences. Mutations in these regions could reduce the ability of SigK to bind, preventing the transcription of these genes.
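Several of the mutation types listed above can be screened for mechanically once a variant's position in a coding sequence is known. The sketch below is a hypothetical helper, not the pipeline used in this study. It assumes a reference coding sequence running from the start codon through the stop codon, uses the standard genetic code (bacterial translation table 11 encodes the same amino acids), and treats any substitution in the first codon as a start-codon mutation. The residue features are copied from Supplementary Discussions S1 and S2 and use pre-cleavage numbering.

```python
# Standard genetic code with bases in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(cds, pos, alt):
    """Classify a single-base substitution at 0-based position pos of a coding sequence."""
    cds, alt = cds.upper(), alt.upper()
    if cds[pos] == alt:
        return "no change"
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    mut_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    if codon_start == 0:
        return "start codon"  # assumption: any change to codon 1 is flagged
    ref_aa, mut_aa = CODON_TABLE[ref_codon], CODON_TABLE[mut_codon]
    if ref_aa == mut_aa:
        return "synonymous"
    if mut_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop lost"
    return f"missense {ref_aa}{codon_start // 3 + 1}{mut_aa}"


def classify_indel(length):
    """Insertions or deletions whose length is not a multiple of 3 shift the reading frame."""
    return "frameshift" if length % 3 else "in-frame indel"


# Residue-level features (1-based, pre-cleavage numbering) as stated in the text.
FEATURES = {
    "MPB70": {
        "signal peptide": range(1, 31),
        "disulfide": (38, 172),
        "solvent-accessible": [*range(114, 118), *range(123, 129),
                               *range(133, 136), *range(138, 144)],
    },
    "MPB83": {
        "signal peptide": range(1, 24),
        "lipidation motif": range(22, 26),
        "O-mannosylation": (48, 49),
    },
    "RskA": {"positions 107/184": (107, 184)},
}


def residue_features(protein, residue):
    """Names of the listed features that contain the given residue."""
    return sorted(name for name, positions in FEATURES[protein].items() if residue in positions)
```

For instance, `classify_indel(1)` returns "frameshift", and `residue_features("MPB83", 22)` reports that residue 22 falls in both the signal peptide and the lipidation motif as those regions are defined in the text.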