Genetic variation and linkage disequilibrium in Bacillus anthracis

Zwick, Michael E.; Thomason, Maureen Kiley; Chen, Peter E.; Johnson, Henry R.; Sozhamannan, Shanmuga; Mateczun, Alfred; Read, Timothy D.

doi:10.1038/srep00169

Article
Open access
Published: 24 November 2011

Scientific Reports volume 1, Article number: 169 (2011)

Abstract

We performed whole-genome amplification followed by hybridization of custom-designed resequencing arrays to resequence 303 kb of genomic sequence from a worldwide panel of 39 Bacillus anthracis strains. We used an efficient algorithm contained within a custom software program, UniqueMER, to identify and mask repetitive sequences on the resequencing array to reduce false-positive identification of genetic variation, which can arise from cross-hybridization. We discovered a total of 240 single nucleotide variants (SNVs) and showed that B. anthracis strains have an average of 2.25 differences per 10,000 bases in the region we resequenced. Common SNVs in this region are found to be in complete linkage disequilibrium. These patterns of variation suggest there has been little if any historical recombination among B. anthracis strains since the origin of the pathogen. This pattern of common genetic variation suggests a framework for recognizing new or genetically engineered strains.

Introduction

Characterizing the patterns of genomic variation found among microbial pathogens often reveals unique aspects of their biology and evolutionary history^1,2. In bacteria and archaea, recombination arises from proximate mechanisms that include transduction, conjugation and transformation and can shape the levels of genomic variation and the observed patterns of statistical association between variant sites³. Detecting the patterns of association among common variant sites, termed linkage disequilibrium, has been used in both bacteria and archaea to help elucidate the effects of recombination on genomic variation^4,5,6,7. More recently, elegant methods for detecting recombination have confirmed that historical recombination rates show extraordinary levels of variation within some bacteria genera^8,9.

Whole-genome sequencing studies of the highly virulent gram-positive endospore-forming bacterium Bacillus anthracis, the agent used in the 2001 bioterrorist attacks in the United States, have led to a number of major findings. B. anthracis is found to be a recently emerged, monophyletic lineage from within the polyphyletic Bacillus cereus sensu lato group, with low levels of genetic variation. A variety of approaches, including multilocus variable number of tandem repeats analysis (MLVA)^{10,11,12,13,14,15,16}, amplified fragment length polymorphism¹⁷, Sanger sequencing^18,19,20, multilocus sequence typing (MLST)²¹ and microarray-based resequencing²², note a paucity of genetic variation within B. anthracis. Complete genome sequencing of a limited number of B. anthracis genomes lends further support for these findings^23,24. Recently, reports show that genotyping B. anthracis strains with “canonical” single nucleotide polymorphism (SNP) typing can efficiently illuminate the organism's global population structure²⁵.

While the B. anthracis lineage appears to be monophyletic, the existence of “canonical SNPs” implies that historical recombination between strains is likely a rare event. In a previous study, we used custom-designed resequencing arrays to resequence 29 kb from a worldwide panel of 56 B. anthracis strains (3.1 Mb total sequence). Our analysis showed not only low levels of genetic variation, but also complete linkage disequilibrium among the common single nucleotide variants we discovered²². These variant sites were located on the pXO1 and pXO2 plasmids in addition to the main chromosome. These observations are consistent with a model in which all extant B. anthracis strains arose from a single clone, with no historical recombination occurring among the different strains; thus, the common variants observed today in B. anthracis strains reflect the mechanism of mutation as opposed to the acquisition of sequences from other strains by recombination. If true, this hypothesis would explain why it is possible to use a few canonical SNPs to characterize the global population structure of clonal B. anthracis strains.

Here we report the results of a sequencing study whose aim was to quantitatively assess the levels and patterns of genomic variation in B. anthracis to replicate our original findings for a much larger genomic region. Using a novel experimental protocol consisting of whole-genome amplification of different samples followed by hybridization to custom-designed resequencing arrays, we resequenced 303 kb in each of 39 B. anthracis strains from a worldwide strain collection (9.6 Mb total sequence). We used an efficient algorithm contained within a custom software program, UniqueMER, to identify and mask repetitive sequences on the resequencing array to reduce false-positive identification of genetic variation, which can arise due to cross-hybridization. Our analysis of the resulting sequencing data estimates a remarkably low level of DNA sequence variation, by functional class, in B. anthracis. Furthermore, our analysis shows complete linkage disequilibrium among common segregating sites in the region that we resequenced. The patterns of variation we see are consistent with an absence of historical recombination among B. anthracis strains since the origin of the pathogen.

Results

We performed targeted sequencing of 39 B. anthracis strains from the Biological Defense Research Directorate's strain collection using custom-designed Affymetrix resequencing arrays (Table 1). We determined the raw sequence from each RA image file by using the ABACUS algorithm as implemented within the RATools software package^22,26,27 and then filtered as described in the Materials and Methods. A total of 9.5 Mb (∼245 kb per B. anthracis strain) of genome sequence was obtained (Supplemental File 1, Supplemental Table 2). Figure 1 shows the phylogeny of the B. anthracis strains inferred from these sequences. Two results are apparent. First, we see a statistically significant differentiation between the B. anthracis A and B strains, as found previously by both ourselves and others^13,22,25. Second, we see that the sequenced Ames strains cluster together (BAN 003, 039, 032, 041), which reflects the recent origin of these strains from a common ancestor.

Table 1 List of the worldwide collection of 39 B. anthracis strains resequenced

Population genomic analysis of the 39 B. anthracis strains sequenced revealed a total of 240 single nucleotide variants (SNVs), with strains having an average of 2.25 differences per 10,000 bases sequenced (Table 2). This analysis shows that B. anthracis has a remarkably low level of genomic variation, consistent with our previous estimate and what has been seen in a number of newly arising bacterial pathogens^2,22. For the purpose of comparison, this level of variation is roughly a quarter of that observed in the human genome when sampled in a similar worldwide fashion^28,29,30. After functionally annotating the 240 SNVs, we found that on a per-site basis, replacement sites (those sites that change amino acids in proteins) are the least variable, silent sites are the most variable and intergenic regions have intermediate levels of genetic variation. Our data provide a slightly lower (0.39 vs 0.58), albeit not dramatically different, estimate for the dN/dS ratio than that previously reported for B. anthracis (Table 2).
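The "differences per 10,000 bases" figure is an average pairwise difference per site. As a minimal sketch of how such a statistic can be computed while skipping uncalled bases, the Python below averages per-pair mismatch rates over all pairs of aligned sequences. It is not the popgen_fasta2.0.c code used in the study; the function name, the use of "N" for a no-call and the toy sequences are assumptions made for illustration, and the study's exact treatment of missing data may differ.

```python
from itertools import combinations

def pairwise_diversity(seqs, missing="N"):
    """Average per-site pairwise difference across all pairs of sequences.

    Sites where either sequence has a no-call (assumed to be `missing`)
    are skipped for that pair, so each pair is scored only over the
    bases that both strains have called.
    """
    per_pair = []
    for a, b in combinations(seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a, b):
            if x == missing or y == missing:
                continue
            compared += 1
            diffs += (x != y)
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair)

# Hypothetical toy alignment: four strains, ten sites, one no-call.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACNTAC"]
diversity = pairwise_diversity(toy)
print(f"{diversity * 10_000:.1f} differences per 10,000 bases")
```

Multiplying the per-site value by 10,000 gives the same units as the estimate reported in Table 2.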

Table 2 Characteristics of single nucleotide variants (SNVs) observed within genomic regions sequenced in a worldwide collection of 39 B. anthracis strains.

We saw that 141 of the 240 SNVs we discovered were found in just a single sample. To assess the significance of this observation, we performed an analysis of the site frequency spectrum compared with what would be expected under the neutral theory, with the neutral theory expectation assuming we sampled a constant-sized population at mutation-drift equilibrium³¹. Our analysis revealed an excess of rare variants relative to the neutral theory expectation, as evidenced by a negative value for the Tajima's D statistic³² (Table 2). Statistically significant departures from the neutral expectation were observed for all SNVs (Figure 2) and for the class of replacement SNVs (Figure 3). Possible explanations for the pattern we observed are rapid demographic expansion of B. anthracis, or purifying selection acting to remove deleterious alleles, similar to what we reported in our earlier, more limited study²².
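An excess of singletons pulls the average pairwise difference below the value expected from the number of segregating sites, which is what a negative Tajima's D reports. The sketch below implements the standard Tajima (1989) formula from a list of minor-allele counts. It assumes complete data (every site called in all n samples), which the study's own code does not require, and the function name and toy counts are hypothetical.

```python
import math

def tajimas_d(minor_counts, n):
    """Tajima's D from per-site minor-allele counts in a sample of size n.

    Assumes every segregating site is biallelic and called in all n samples.
    """
    s = len(minor_counts)
    if s == 0:
        raise ValueError("need at least one segregating site")
    # Average number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in minor_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy inputs: eight singleton sites versus eight intermediate-frequency sites.
print(tajimas_d([1] * 8, 10))   # negative: excess of rare variants
print(tajimas_d([5] * 8, 10))   # positive: excess of common variants
```

The sign alone does not separate population expansion from purifying selection; both of the explanations offered above produce negative values.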

We previously noted an absence of historical recombination in a worldwide collection of B. anthracis strains²². We predicted that if B. anthracis arose from multiple independent clones or underwent recombination since the time of the most recent common ancestor of the worldwide collection of strains we sequenced, there should be genetic evidence of this. To test this hypothesis and characterize the extent to which recombination has shaped patterns of genomic variation in B. anthracis, we analyzed our data to seek evidence of historical recombination in the region that we resequenced. We first used LDhat to estimate the amount of recombination among 64 common SNVs found at greater than 10% frequency in our sample^33,34. The 97.5% upper bound for our estimate of historical recombination (2N_e r) was 1.0×10⁻⁵ per site. Strikingly, this upper bound is 22 times lower than Watterson's estimator of the population mutation rate (Θ_w per site) shown in Table 2³⁵. This finding implies that historical recombination has had little or no effect on the patterns of genetic variation we saw. Furthermore, the estimate for 2N_e r obtained from LDhat is identical to the minimum amount of recombination that can be estimated with this program, implying that the true value could be substantially lower. In a related test, we asked whether any of the 2,278 pairs of common (>10% frequency) SNVs we detected showed all four possible two-site haplotypes³⁶, as would be expected after historical recombination between different B. anthracis strains. We never saw this outcome in our data, providing a point estimate of 0 for the historical recombination among the SNVs in the region we resequenced³⁷. The absence of historical recombination is evident from the complete lack of any pairs of sites with four haplotypes (Figure 4, Haploview 4.2). Combined, our data confirm our previous observation that historical recombination within B. anthracis is exceedingly rare or nonexistent, consistent with a model whereby all contemporary B. anthracis strains arose from a single common clonal ancestor²².
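The four-haplotype check described here is the four-gamete test: under an infinite-sites model without recombination, two biallelic sites can show at most three of the four possible two-site haplotypes. A minimal sketch follows, assuming each strain is encoded as a string with one character per common SNV and "N" for a no-call. The study used Haploview 4.2 for this step; the function name and encoding below are hypothetical.

```python
from itertools import combinations

def four_gamete_violations(haplotypes, missing="N"):
    """Return index pairs (i, j) of sites at which all four gametes occur.

    `haplotypes` holds one equal-length string per strain, with one
    character per biallelic site. Strains with a no-call at either
    site are ignored for that pair.
    """
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {
            (h[i], h[j])
            for h in haplotypes
            if h[i] != missing and h[j] != missing
        }
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Toy data: a strictly clonal genealogy versus one with a recombinant.
clonal = ["000", "100", "110", "111"]
recombinant = ["00", "01", "10", "11"]
print(four_gamete_violations(clonal))       # no pair shows four gametes
print(four_gamete_violations(recombinant))  # the single pair (0, 1) does
```

An empty result across every pair of common SNVs corresponds to the point estimate of 0 reported above.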

Discussion

Our data show that whole-genome amplified bacterial genomes can be hybridized to oligonucleotide resequencing microarrays to determine genome sequences. Analysis of the resulting sequence data gives us an important insight into the population structure and history of B. anthracis. A great many studies have supported a monophyletic origin of B. anthracis^{10,11,12,13,14,17,21,22,25,38} and our analysis reveals no evidence for recombination in the history of the worldwide collection of B. anthracis strains we sequenced. This finding provides a clear explanation for why canonical SNPs are able to successfully type strains, because if recombination were common, as has been shown in other larger bacterial genera, then different genomic regions would have distinct evolutionary histories^3,9. The extensive linkage disequilibrium in B. anthracis that we describe stands in stark contrast to some human pathogens, in which exchange of genetic material is fundamental to the organism's pathogenicity^6,7,39,40,41 (but see⁴²).

The apparent absence of historical recombination in B. anthracis could have at least two explanations. The first is that B. anthracis has reduced recombination, perhaps because it is inherently refractory to transduction, conjugation and transformation, or because there are defects in the DNA replication machinery. These deficiencies would have to have arisen very early in the history of B. anthracis to be passed on to the worldwide population of the species. Arguing against this hypothesis is the observation that genetic studies have shown it is possible to create recombinant B. anthracis strains in the laboratory⁴³. Furthermore, the induction of natural competence in B. cereus ATCC14579^44,45 suggests that transformation could occur in natural populations. Finally, historical recombination has been inferred by comparing genome sequences from strains that compose the larger B. cereus group⁹. We believe the more plausible explanation is that low levels of genetic variation, combined with the recent global population expansion, have limited the opportunities for vegetative B. anthracis strains that are divergent enough for recombination to be detectable to co-locate. If true, this hypothesis predicts that future dense surveys of B. anthracis from Africa, where the most genetically diverse strains are found, might be able to detect recombination, if it is in fact occurring.

An analysis of recently evolved pathogens that included B. anthracis reported an elevated dN/dS ratio compared with more distantly related microbial taxa deriving from much more ancient last common ancestors, such as Escherichia coli⁴⁶. The authors interpreted their data as providing evidence for relaxed natural selection in newly arising pathogens. This interpretation depends formally upon treating the variants within a clonal lineage, like B. anthracis, as older divergent sites (found between species), as opposed to younger, segregating polymorphic sites (found within species). Our data provide a slightly lower (0.39 vs 0.58) albeit not dramatically different estimate for the dN/dS ratio than that previously reported for B. anthracis (Table 2). But we disagree with the classification of the sites for recently derived clonal lineages⁴⁶. Rocha et al. (2006) previously showed that comparisons of dN/dS between closely related bacterial genomes need to explicitly consider the time since divergence of the analyzed strains⁴⁷. Furthermore, population genetic theory predicts that the behavior of statistics like dN/dS will differ for polymorphic and divergent sites^48,49 and that the use of this statistic in population-genetic samples is relatively insensitive to the strength of natural selection⁵⁰. In fact, the elevated dN/dS ratios seen are those predicted for segregating polymorphic variants (see Figure 3 in⁴⁹). Thus, the inference of relaxed natural selection in newly arising pathogens, like B. anthracis, is not well supported by the data observed.

Finally, the apparent absence of recombination within B. anthracis suggests that the patterns of association seen among common sites could be a powerful tool to help recognize newly arising or genetically engineered strains. As new strains are typed for their common SNP variation, their allelic configurations could be compared against other previously characterized strains. Novel allelic configurations would indicate a previously unobserved strain variation and possibly point to a need for greater genetic and phenotypic characterization. The increasing throughput and ever-decreasing costs of pathogen whole-genome sequencing mean that in the very near future, these sorts of sequence-based experiments that can rapidly detect both common and rare variants are likely to become routine³⁸. Methods of analysis of these rich datasets that directly characterize the patterns of linkage disequilibrium among variant sites could give us valuable insights into the origins and evolutionary processes shaping the genomes of pathogens.

Methods

B. anthracis Strains Sequenced

We selected a diverse panel of 39 Bacillus strains from the Biological Defense Research Directorate (BDRD) collection at the Navy Medical Research Center (NMRC) for chip resequencing (see Table 1). Twenty-one of the strains were also typed by MLST using ABI sequencing⁵¹. The MLST data are available through the Bacillus cereus MLST website (http://pubmlst.org/bcereus/).

RA design, Hybridization, Sequence determination

The RA design queried 303,006 base pairs and was based upon the B. anthracis Ames reference sequence (5.2 Mbp, NC_003997). Unique sequences targeted for sequencing were identified as previously described²². Genomic DNA from each strain was isolated using standard protocols as previously described^22,27. We obtained target DNA for RA hybridization by performing whole-genome amplification (WGA) on 100 ng of genomic DNA following the manufacturer's instructions (REPLI-g Kit from Qiagen, Valencia, CA). The typical yield was 20–30 µg per strain. The WGA DNA was then DNase digested, biotin end-labelled and hybridized to individual RAs overnight following established protocols^22,27. Subsequent washes and stains were carried out following the RA manufacturer's standard protocols (Affymetrix, Sunnyvale, CA). RAs were scanned at 570 nm, with a pixel size of 3 µm per pixel averaged over 2 scans. Genomic sequences were determined for each sample by using the ABACUS algorithm as implemented in RATools (http://www.dpgp.org)^22,26,27.

Filtering of Raw Sequence Files

The raw sequence files obtained from RATools were filtered in two ways. First, we used UniqueMER to mask repeated 30-mers for each of the 39 strains sequenced. UniqueMER is an open-source program that locates all unique and repeat n-mers in an input space consisting of a given set of genomes (https://sourceforge.net/projects/uniquemer/). The algorithm for the program is a distributed hashing scheme consisting of one hash table per computing node. The genomes in the input space are divided among the available computing nodes and all hash tables are processed in parallel. A hash table can represent one or more genomes, and each entry in a table represents one n-mer and its frequency of occurrence in the entire input space. A sliding window equal to the length of the n-mer slides in 1-bp increments across each genome; the subsequence in the window is hashed and its frequency of occurrence is updated. Each n-mer is hashed using the following hash function:

where s is the n-mer, s[i] is the ith nucleotide in the n-mer and n is the length of the n-mer. Thus, an n-mer is unique if it does not have a maximal, exact match to any other n-mer. Because space complexity was the bottleneck limiting the number of genomes that could be processed, in-memory load was reduced by not storing sequence in the hash tables. Collisions between n-mers with identical hash codes but different underlying sequences are resolved by retrieving the n-mer from disk.

A sequence is considered unique if it is unique among all sequences, in both forward and reverse orientations, in the input space. The program tracks the copy number of each n-mer and outputs the frequencies as a histogram. Neighboring n-mers that overlap in physical location are further grouped into blocks of unique and repeat sequences. The blocks of unique and repeat n-mers are output in GFF format (http://www.sanger.ac.uk/resources/software/gff/spec.html). The GFF file containing the coordinates of repeated exact-match 30-mer sequences was then used to filter the strain sequences using a custom Perl script.
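The counting logic described above can be sketched on a single machine as follows. This is not UniqueMER itself: it relies on Python's built-in dictionary hashing rather than the program's own hash function or its distributed, disk-backed tables, and it treats "both orientations" by counting each n-mer under the lexicographically smaller of itself and its reverse complement, which is an assumption about how orientation is handled. All names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(nmer):
    """Return the smaller of an n-mer and its reverse complement."""
    reverse_complement = nmer.translate(_COMPLEMENT)[::-1]
    return min(nmer, reverse_complement)

def repeat_mask(genome, n):
    """Flag every base covered by an n-mer that occurs more than once.

    A sliding window of length n moves in 1-bp steps; occurrences are
    counted over both strand orientations via the canonical form.
    """
    windows = range(len(genome) - n + 1)
    counts = Counter(canonical(genome[i:i + n]) for i in windows)
    mask = [False] * len(genome)
    for i in windows:
        if counts[canonical(genome[i:i + n])] > 1:
            for j in range(i, i + n):
                mask[j] = True
    return mask

# Toy example with n = 3: ACG occurs twice, and CGT is its reverse complement.
toy_genome = "ACGTTACG"
toy_mask = repeat_mask(toy_genome, 3)
print("".join("x" if flagged else "." for flagged in toy_mask))
```

Runs of flagged bases correspond to the repeat blocks that the program reports in GFF format; in the study the window length was 30.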

The second screening method consisted of masking those sequenced bases called in less than 80% of the sequenced samples. The final sequence files are contained within a .zip archive in Supplemental File 1. Supplemental Table 2 reports the genome coordinates and percent bases called for each sample sequenced. Supplemental Table 3 contains the position and genotype of all single nucleotide variants (SNVs) discovered in this study.
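The second filter amounts to a per-site call-rate threshold. A minimal sketch, assuming the strain sequences are aligned strings of equal length and that a no-call is written as "N" (the character and function name are illustrative, not taken from the study's scripts):

```python
def mask_low_call_sites(seqs, min_call_rate=0.8, missing="N"):
    """Mask sites called in fewer than `min_call_rate` of the samples.

    Returns new sequences in which every site that fails the threshold
    is replaced by the no-call character in all samples.
    """
    n_samples = len(seqs)
    length = len(seqs[0])
    keep = [
        sum(s[i] != missing for s in seqs) / n_samples >= min_call_rate
        for i in range(length)
    ]
    return [
        "".join(base if ok else missing for base, ok in zip(s, keep))
        for s in seqs
    ]

# Toy alignment of five samples: site 1 is called in 4/5, site 2 in 3/5.
toy_calls = ["ACG", "ANG", "ACN", "ACN", "ACG"]
print(mask_low_call_sites(toy_calls))
```

A site called in exactly 80% of samples is retained, matching the "less than 80%" wording above.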

Phylogenetic Analyses

The PHYLIP package (v3.69) was used for all phylogenetic analyses⁵². UniqueMER-filtered RA genome sequences for each strain were concatenated to create a single strain sequence in FASTA format. RA sequences were converted to PHYLIP format using Clustal X for subsequent analyses⁵³. A custom Perl script (Phylip_neighbor_distance.pl) that called the PHYLIP programs dnadist and neighbor was used to generate a distance matrix and determine a neighbor-joining (NJ) tree for the RA datasets. A separate Perl script (Phylip_boot_distance.pl) that called the PHYLIP programs seqboot, dnadist, neighbor and consense was used to generate 1000 replicate data sets for bootstrap analysis of the NJ trees. The PHYLIP program drawgram was used to draw the NJ trees. The PHYLIP programs dnapars and proml were used to confirm the distance trees using parsimony and likelihood, respectively.

Population Genetic Analyses

All population genetic analyses were performed using the popgen_fasta2.0.c code (Cutler DJ, unpublished work) on the 39 B. anthracis fasta files as previously described²². This code calculated the average number of pairwise differences and Watterson's estimator of the population mutation rate (Θ_w per site) for the entire sequenced region and different annotated SNV functional classes while accounting for missing data. A point estimate for Tajima's D was determined for all the data and different SNV functional classes. The statistical significance of these point estimates was determined relative to the standard neutral theory expectation, namely a constant-sized population at mutation-drift equilibrium. Our linkage disequilibrium analysis of common single nucleotide variants (SNVs) included sites at greater than 10% frequency with genotype calls in at least 80% of samples analyzed. To analyze the list of common SNVs with McVean's LDhat program, we wrote a custom conversion script to generate the necessary sites and locs files. These files provide the input for convert. Within convert, all common SNVs as defined above were analyzed. The output files from convert, together with a custom-generated likelihood file, were then used as input for interval. Interval generates rates.txt and bounds.txt using an assigned start value for 2N_e r that dictates the starting point for the RJMCMC. Multiple values for this starting parameter were tested, all of which provided identical output. Finally, stat was run to generate summary files for both the rates and bounds output files that were generated by interval. These summary files provide the mean, median, 2.5th percentile and 97.5th percentile estimates of the recombination rates between each pair of SNPs, as well as the estimated locations of recombination rate changes in the region being analyzed. Haploview 4.2 was used to assess and visualize the linkage disequilibrium in the samples by performing the 4-gamete test⁵⁴. The default 4-gamete color scheme was used, with black blocks representing less than 4 distinct 2-marker haplotypes.
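Watterson's estimator, used above as the per-site population mutation rate, divides the number of segregating sites by a harmonic-number correction for sample size and by the number of sites surveyed. The sketch below uses the textbook formula for complete data; it is not the popgen_fasta2.0.c implementation, which additionally accounts for missing data, and the function name is hypothetical. The example inputs (240 SNVs, 39 strains, about 245,000 called bases per strain) are approximate figures taken from the Results, so the printed value is illustrative rather than a reproduction of Table 2.

```python
def watterson_theta(num_segregating, n_samples, n_sites):
    """Per-site Watterson estimator: S / (a_n * L), complete data assumed.

    a_n is the (n - 1)th harmonic number, the sum of 1/i for i = 1..n-1.
    """
    a_n = sum(1.0 / i for i in range(1, n_samples))
    return num_segregating / (a_n * n_sites)

# Approximate inputs from the Results; about 245 kb was called per strain.
theta_w = watterson_theta(240, 39, 245_000)
print(f"theta_w per site ~ {theta_w:.2e}")
```

With these approximate inputs the estimate is roughly 2.3 × 10⁻⁴ per site, the same order of magnitude as 22 times the LDhat upper bound of 1.0×10⁻⁵ reported in the Results.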

References

Maiden, M. C. J. Multilocus sequence typing of bacteria. Annual Review of Microbiology 60, 561–588 (2006).
Article CAS PubMed Google Scholar
Achtman, M. Evolution, population structure and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol 62, 53–70 (2008).
Article CAS PubMed Google Scholar
Didelot, X. & Maiden, M. C. J. Impact of recombination on bacterial evolution. Trends in microbiology 18, 315–322 (2010).
Article CAS PubMed PubMed Central Google Scholar
Jolley, K. A., Wilson, D. J., Kriz, P., McVean, G. & Maiden, M. C. The influence of mutation, recombination, population history and selection on patterns of genetic diversity in Neisseria meningitidis. Mol Biol Evol 22, 562–569 (2005).
Article CAS PubMed Google Scholar
Wirth, T. et al. The rise and spread of a new pathogen: seroresistant Moraxella catarrhalis. Genome Res 17, 1647–1656 (2007).
Article CAS PubMed PubMed Central Google Scholar
Tanabe, Y., Sano, T., Kasai, F. & Watanabe, M. M. Recombination, cryptic clades and neutral molecular divergence of the microcystin synthetase (mcy) genes of toxic cyanobacterium Microcystis aeruginosa. BMC Evol Biol 9, 115 (2009).
Article CAS PubMed PubMed Central Google Scholar
Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5, e1000344 (2009).
Article CAS PubMed PubMed Central Google Scholar
Didelot, X. & Falush, D. Inference of bacterial microevolution using multilocus sequence data. Genetics 175, 1251–1266 (2007).
Article CAS PubMed PubMed Central Google Scholar
Didelot, X., Lawson, D., Darling, A. & Falush, D. Inference of Homologous Recombination in Bacteria Using Whole-Genome Sequences. Genetics 186, 1435–1449 (2010).
Article CAS PubMed PubMed Central Google Scholar
Jackson, P. J. et al. Characterization of the variable-number tandem repeats in vrrA from different Bacillus anthracis isolates. Appl Environ Microbiol 63, 1400–1405 (1997).
CAS PubMed PubMed Central Google Scholar
Keim, P. et al. Molecular diversity in Bacillus anthracis. J Appl Microbiol 87, 215–217 (1999).
Article CAS PubMed Google Scholar
Smith, K. L. et al. Meso-scale ecology of anthrax in southern Africa: a pilot study of diversity and clustering. J Appl Microbiol 87, 204–207 (1999).
Article CAS PubMed Google Scholar
Keim, P. et al. Multiple-locus variable-number tandem repeat analysis reveals genetic relationships within Bacillus anthracis. J Bacteriol 182, 2928–2936 (2000).
Article CAS PubMed PubMed Central Google Scholar
Smith, K. L. et al. Bacillus anthracis diversity in Kruger National Park. J Clin Microbiol 38, 3780–3784 (2000).
CAS PubMed PubMed Central Google Scholar
Fouet, A. et al. Diversity among French Bacillus anthracis isolates. J Clin Microbiol 40, 4732–4734 (2002).
Article PubMed PubMed Central Google Scholar
Fasanella, A. et al. Molecular diversity of Bacillus anthracis in Italy. J Clin Microbiol 43, 3398–3401 (2005).
Article CAS PubMed PubMed Central Google Scholar
Jackson, P. J., Hill, K. K., Laker, M. T., Ticknor, L. O. & Keim, P. Genetic comparison of Bacillus anthracis and its close relatives using amplified fragment length polymorphism and polymerase chain reaction analysis. J Appl Microbiol 87, 263–269 (1999).
Article CAS PubMed Google Scholar
Price, L. B., Hugh-Jones, M., Jackson, P. J. & Keim, P. Genetic diversity in the protective antigen gene of Bacillus anthracis. J Bacteriol 181, 2358–2362 (1999).
CAS PubMed PubMed Central Google Scholar
Radnedge, L. et al. Genome differences that distinguish Bacillus anthracis from Bacillus cereus and Bacillus thuringiensis. Appl Environ Microbiol 69, 2755–2764 (2003).
Article CAS PubMed PubMed Central Google Scholar
Ko, K. S. et al. Identification of Bacillus anthracis by rpoB sequence analysis and multiplex PCR. J Clin Microbiol 41, 2908–2914 (2003).
Article CAS PubMed PubMed Central Google Scholar
Helgason, E., Tourasse, N. J., Meisal, R., Caugant, D. A. & Kolstø, A. B. Multilocus sequence typing scheme for bacteria of the Bacillus cereus group. Appl Environ Microbiol 70, 191–201 (2004).
Article CAS PubMed PubMed Central Google Scholar
Zwick, M. E. et al. Microarray-based resequencing of multiple Bacillus anthracis isolates. Genome Biol 6, R10 (2005).
Article PubMed Google Scholar
Read, T. D. et al. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296, 2028–2033 (2002).
Article CAS ADS PubMed Google Scholar
Read, T. D. et al. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423, 81–86 (2003).
Article CAS ADS PubMed Google Scholar
Van Ert, M. N. et al. Global genetic population structure of Bacillus anthracis. PLoS ONE 2, e461 (2007).
Article ADS CAS PubMed PubMed Central Google Scholar
Cutler, D. J. et al. High-throughput variation detection and genotyping using microarrays. Genome Research 11, 1913–1925 (2001).
Article CAS PubMed PubMed Central Google Scholar
Zwick, M. E., Kiley, M. P., Stewart, A. C., Mateczun, A. & Read, T. D. Genotyping of Bacillus cereus strains by microarray-based resequencing. PLoS ONE 3, e2513 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).
Article CAS ADS PubMed Google Scholar
International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
Kimura, M. The neutral theory of molecular evolution (Cambridge University Press, Cambridge [Cambridgeshire] ; New York, 1983).
Tajima, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595 (1989).
CAS PubMed PubMed Central Google Scholar
McVean, G. A. T. et al. The fine-scale structure of recombination rate variation in the human genome. Science (New York, NY) 304, 581–584 (2004).
Article CAS ADS Google Scholar
Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science (New York, NY) 310, 321–324 (2005).
Article CAS ADS Google Scholar
Watterson, G. A. The homozygosity test of neutrality. Genetics 88, 405 (1978).
CAS PubMed PubMed Central Google Scholar
Hudson, R. R. & Kaplan, N. L. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985).
CAS PubMed PubMed Central Google Scholar
Maynard Smith, J. & Smith, N. H. Detecting recombination from gene trees. Molecular Biology and Evolution 15, 590–599 (1998).
Article CAS PubMed Google Scholar
Chen, P. E. et al. Rapid identification of genetic modifications in Bacillus anthracis using whole genome draft sequences generated by 454 pyrosequencing. PLoS ONE 5 (2010).
Suerbaum, S. et al. Free recombination within Helicobacter pylori. Proc Natl Acad Sci USA 95, 12619–12624 (1998).
Article CAS ADS PubMed PubMed Central Google Scholar
Gomes, J. P. et al. Evolution of Chlamydia trachomatis diversity occurs by widespread interstrain recombination involving hotspots. Genome Res 17, 50–60 (2007).
Article CAS PubMed PubMed Central Google Scholar
Chen, P. E. et al. Genomic characterization of the Yersinia genus. Genome Biol 11, R1 (2010).
Article CAS PubMed PubMed Central Google Scholar
Holt, K. E. et al. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nature Genetics 40, 987–993 (2008).
Article CAS PubMed PubMed Central Google Scholar
Janes, B. & Stibitz, S. Routine markerless gene replacement in Bacillus anthracis. Infection and Immunity 74, 1949 (2006).
Article CAS PubMed PubMed Central Google Scholar
Mironczuk, A. M., Kovacs, A. T. & Kuipers, O. P. Induction of natural competence in Bacillus cereus ATCC14579. Microb Biotechnol 1, 226–235 (2008).
Article CAS PubMed PubMed Central Google Scholar
Kovacs, A. T., Smits, W. K., Mironczuk, A. M. & Kuipers, O. P. Ubiquitous late competence genes in Bacillus species indicate the presence of functional DNA uptake machineries. Environ Microbiol 11, 1911–1922 (2009).
Hershberg, R. & Petrov, D. Evidence That Mutation Is Universally Biased towards AT in Bacteria. PLoS Genet 6, e1001115 (2010).
Rocha, E. P. et al. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol 239, 226–235 (2006).
Sawyer, S. A. & Hartl, D. L. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176 (1992).
Akashi, H. Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics 139, 1067–1076 (1995).
Kryazhimskiy, S. & Plotkin, J. The Population Genetics of dN/dS. PLoS Genet 4, e1000304 (2008).
Priest, F. G., Barker, M., Baillie, L. W., Holmes, E. C. & Maiden, M. C. Population structure and evolution of the Bacillus cereus group. J Bacteriol 186, 7959–7970 (2004).
Felsenstein, J. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle (2010).
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882 (1997).
Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).

Acknowledgements

The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, or the U.S. Government. The BDRD, NMRC authors are employees of the U.S. Government. This work was prepared as part of their official duties. Title 17 U.S.C. §105 provides that “Copyright protection under this title is not available for any work of the United States Government.” Title 17 U.S.C. §101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person's official duties. This work was supported by the Transformational Medical Technologies Program under contract TMTI_IB06RSQ002 through the Defense Threat Reduction Agency to TDR and SS.

Author information

Authors and Affiliations

Biological Defense Research Directorate, Naval Medical Research Center, Silver Spring, MD, USA
Michael E. Zwick, Maureen Kiley Thomason, Peter E. Chen, Shanmuga Sozhamannan, Alfred Mateczun & Timothy D. Read
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
Michael E. Zwick & Timothy D. Read
Division of Infectious Diseases, Emory University School of Medicine, Atlanta, GA, USA
Timothy D. Read
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Henry R. Johnson

Authors

Michael E. Zwick
Maureen Kiley Thomason
Peter E. Chen
Henry R. Johnson
Shanmuga Sozhamannan
Alfred Mateczun
Timothy D. Read

Contributions

MZ, MKT, PC, HJ and TR wrote the main manuscript text and MZ prepared figures 1–3. All authors reviewed the manuscript.

Ethics declarations

Competing interests

Drs. Zwick and Read are paid consultants with the Henry M. Jackson Foundation for Military Medicine with the Biological Defense Research Directorate, Naval Medical Research Center, Silver Spring, MD, USA.

Electronic supplementary material

Supplementary Information

Supplemental Material

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

About this article

Cite this article

Zwick, M., Thomason, M., Chen, P. et al. Genetic variation and linkage disequilibrium in Bacillus anthracis. Sci Rep 1, 169 (2011). https://doi.org/10.1038/srep00169

Received: 11 August 2011
Accepted: 03 November 2011
Published: 24 November 2011
DOI: https://doi.org/10.1038/srep00169
