Artificial selection on anther exsertion in wild radish, Raphanus raphanistrum

To study the genetic architecture of anther exsertion, a trait under stabilizing selection in wild radish, artificial selection on anther exsertion was applied for 11 generations. Two replicate lines each of increased and decreased exsertion plus two randomly-mated controls were included. Full pedigree information is available from generation five. To estimate correlated responses to selection, 571 plants from all lines and matrilines were grown in the greenhouse and a number of floral, growth, and phenology traits were measured. To create an outbred F2 mapping population, all possible crosses among the two high and two low exsertion lines were made, using a multiple-family design to capture the genetic variance still present after 11 generations of selection. Six floral traits were measured on 40 parents, 240 F1, and 4,868 F2 offspring. Opportunities for reuse of these data include analyses of traits not previously analyzed and other analyses, especially those using the pedigree and fitness data. Seeds from all generations and photos of flowers from the later generations are also available.


Background & Summary
The genetic correlation between filament and corolla tube lengths in wild radish is very high in magnitude (0.85, ref. 1), estimated with precision, and known to be caused by pleiotropy or extremely tight linkage 2. The relative lengths of these two traits determine the position of the pollen-bearing anthers relative to the opening of the corolla tube; this composite trait is called anther exsertion, which can be defined as ln-long stamen filament length minus ln-corolla tube length. The high filament-corolla tube correlation is likely due to stabilizing selection on anther exsertion by bees in the family Halictidae 3,4; functionally, intermediate anther exsertion maximizes pollen removal by these bees 5. Stabilizing selection on the difference between two traits is equivalent to correlational selection to increase the correlation between the traits 6,7.
This paper describes six datasets derived from a series of studies designed to understand selection and genetics of anther exsertion; Figure 1 gives a flowchart of all of the experiments. All plants were derived from a single natural population (see Data Records). To create increased variance in anther exsertion to better test for stabilizing selection, as well as to determine the rate of response to selection perpendicular to the major axis of variation, artificial selection for increased and decreased exsertion was performed for a total of 11 generations, with two replicates of each selection treatment 8. There were also two randomly-mated control lines for a total of six selection lines; each line contained 12 outbred maternal 'lines'. We refer to these as 'matrilines' because they are denoted and followed based on the maternal parent, but each generation these lines were outcrossed using pollen from a unique randomly-chosen plant in a different matriline, so that matrilines are not distinct from each other in the nuclear genome. The floral, fitness, and pedigree data for the four selected lines (not the controls) are contained in the 'ArtificialSelectionExsertion.csv' file.
To test for correlated responses to this selection, after five (replicate 1) or six (replicate 2) generations, 571 plants evenly distributed across the two high, two low, and two control selection lines were grown and 12 floral traits were measured, as well as flowering time and aboveground biomass. These data are in 'CorrelatedResponses.csv'. To quantify floral trait variation over the lifetime of these annual plants, seven floral traits were measured five or six times, and pollen viability was scored twice, over a period of three months on seventy-two of these plants ('2001FieldFlowerMeas.csv').
An F2 mapping population was created by crossing high and low selection lines to determine the genetic basis of the rapid evolution of anther exsertion in the selection lines. The 11th generation of selection consisted of choosing the extreme anther exsertion plants from 10 matrilines in each of the four selection lines ('QTLParentalMeasurements.csv'). These 40 plants were crossed in all four high by low exsertion line combinations. The resulting F1 plants ('QTLF1measurements.csv') were outcrossed to produce 4,863 F2 plants distributed among 20 full-sibling families, five per cross type (see Methods; Fig. 2 and Table 1). Six floral traits were measured from floral photographs on each of these F2 plants ('QTL F2 Measurements.csv').
Many analyses of these data are possible in addition to those in the one paper that has been published to date 8, which used only one of the six datasets included here ('ArtificialSelectionExsertion.csv'). Because anther exsertion and the component traits of filament and corolla tube lengths were measured in 10 generations under selection, multiple times across the lifespan of the same plants, and in a very large outbred F2 composed of full-sibling families from reciprocal crosses, a variety of questions concerning genetic and microenvironmental causes of trait variation can be addressed. Additional unanalyzed traits are also included in some of the datasets, and photos of the flowers from top and side views are available for additional trait measurements. Novel integrated analyses across these datasets are

Artificial selection
We conducted 10 generations (11th in the F2 study below) of selection for increased and decreased long-stamen anther exsertion (ln long filament length minus ln corolla tube length), with two replicate lines each for increased exsertion, decreased exsertion, and randomly-mated controls. Each of the six replicate selection lines consisted of 12 unique matrilines; the most extreme of up to 10 offspring in each matriline was mated in each generation. The matriline of each plant is noted; full pedigree information (paternity) is available starting at generation 5. The first three generations of selection were done at University of Illinois; generations 4 and 5 of replicate 1 of the artificial selection and half of the correlated responses plants were grown at Reed College; and the rest of the greenhouse and lab work was done at Kellogg Biological Station. Replicate 1 is therefore also referred to as Reed or R, and replicate 2 as KBS or K. For details see ref. 8.

Outbred F2 QTL design
For future QTL analysis, six plants from each of the 12 matrilines in the two high and two low exsertion selection lines were grown for a total of 288 plants. One flower from each was photographed and the lengths of the corolla tube, short and long filaments, and short and long filament anthers were measured; this was done for all F1 and F2 plants as well. The plant with the highest or lowest exsertion (matching the selection direction) within each matriline was chosen; this represents the 11th generation of artificial selection on exsertion. The most extreme plants from the 10 most extreme matrilines in each selection line were chosen for the outbred crossing design; the other two matrilines in each selection line were discarded. These 40 parental plants were then randomly paired to make five pairs within each selection line, and then each pair was randomly grouped with a pair from each of the other three lines to form five 'octets' of plants. Each octet was used to produce four outbred full-sibling F2 families, one from each of the four cross types; the design for one octet is shown in Figure 2.
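The random pairing and grouping into octets can be sketched as follows. This is an illustrative sketch only, not the procedure actually used to randomize the plants; the function name `make_octets`, the line labels, and the plant IDs are hypothetical.

```python
import random

def make_octets(plants_by_line, seed=None):
    """Pair plants at random within each line, then group one pair per line.

    plants_by_line maps a (hypothetical) line label to its list of chosen
    plants, e.g. 10 plants for each of the four selection lines.
    Returns a list of octets; each octet maps line label -> pair of plants.
    """
    rng = random.Random(seed)
    pairs_by_line = {}
    for line, plants in plants_by_line.items():
        if len(plants) % 2 != 0:
            raise ValueError("each line needs an even number of plants")
        shuffled = list(plants)
        rng.shuffle(shuffled)
        # Consecutive plants in a random order form random pairs, and the
        # order of the pairs is itself random, so matching pairs by index
        # across lines groups them at random.
        pairs_by_line[line] = [
            (shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)
        ]
    n_octets = len(next(iter(pairs_by_line.values())))
    return [
        {line: pairs[k] for line, pairs in pairs_by_line.items()}
        for k in range(n_octets)
    ]

# Hypothetical labels: two high (H1, H2) and two low (L1, L2) lines.
lines = {name: [f"{name}-{i}" for i in range(1, 11)]
         for name in ("H1", "H2", "L1", "L2")}
octets = make_octets(lines, seed=1)
```

With 10 plants in each of four lines this yields five octets of eight plants each, matching the counts given above.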
To produce the F1 generation, each plant was mated to one plant from each of the other lines within the same octet, producing four F1 families, one for each of the four possible crosses between high and low exsertion selection lines. Because there were two pairs of parental plants from each selection line, this design produced pairs of unrelated F1 plants for each of these four cross types; these pairs were then crossed reciprocally to produce one of the 20 outbred full-sibling F2 families (Figure 2). Due to the reciprocal crosses, each of the 20 F2 families is subdivided into A and B groups depending on maternal plant. A total of 4,863 F2 plants were grown in 10 blocks of up to 500 plants each, with each full sibling family represented by up to 25 plants per block, and each octet represented by up to 100 plants per block. Blocks alternated between consisting entirely of seeds from the A moms, or entirely of seeds from the reciprocal B moms; thus all the odd number blocks were A seeds, and all the even number blocks B seeds.
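The block capacities and the alternation of A and B seeds follow directly from the counts above. The short sketch below restates that arithmetic; the constant and function names are hypothetical and are not column names in the datasets.

```python
# Counts taken from the design described in the text.
N_FAMILIES = 20            # outbred full-sibling F2 families
N_OCTETS = 5               # each octet contributes four families
N_BLOCKS = 10              # temporal blocks
MAX_PER_FAMILY_PER_BLOCK = 25

FAMILIES_PER_OCTET = N_FAMILIES // N_OCTETS                              # 4
MAX_PER_OCTET_PER_BLOCK = FAMILIES_PER_OCTET * MAX_PER_FAMILY_PER_BLOCK  # 100
MAX_PER_BLOCK = N_FAMILIES * MAX_PER_FAMILY_PER_BLOCK                    # 500

def block_mom(block):
    """Return 'A' for odd-numbered blocks and 'B' for even-numbered blocks."""
    if not 1 <= block <= N_BLOCKS:
        raise ValueError(f"block must be between 1 and {N_BLOCKS}")
    return "A" if block % 2 == 1 else "B"
```

These are upper bounds: the 10 blocks could hold at most 5,000 plants, and 4,863 were grown.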

Data Records
The six datasets are stored at Dryad (Data Citation 1). Some contents are common across datasets:
Matriline: All of the plants are descended from the Binghamton NY population (BINY; 42.184089N, 75.835319W) and most have a code with a capital letter A-E and a number up to 475. This refers to the original mothers in the seed collection, where 5 transects (A-E), one meter apart, were run across an alfalfa field and seeds were collected from one maternal plant every meter. The transects varied in length; the last plant collected in each was A368, B385, C355, D475, and E100. The numbers refer to the same grid position in each transect, i.e., B1 is one meter from A1, B2, and C1. Seeds were collected from a total of 1,575 maternal plants, although some have no seeds left. A total of eight matrilines in the high and low replicate 1 populations have different codes without the initial letter; these are descendants from the BINY population but their pedigree cannot be traced back to the original field maternal plant. In a number of cases over the generations a matriline produced no viable seeds, so two families in the next generation came from one matriline; these are denoted with decimals added to the number and/or lowercase letters at the end of the code, but in all these cases the maternal lineage can be traced back to the field maternal plant.
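For users who want to recover field positions from matriline codes, a parser along the following lines may be a useful starting point. The exact layout of the decimal and lowercase suffixes is an assumption here (for example a code such as 'D475.1a' is hypothetical), so the pattern should be checked against the actual codes in the files; the function name is also hypothetical.

```python
import re

# Last plant collected in each transect, as listed in the text.
TRANSECT_LAST = {"A": 368, "B": 385, "C": 355, "D": 475, "E": 100}

# Assumed layout: transect letter, grid position, then an optional decimal
# and/or lowercase letters marking a split matriline.
_CODE = re.compile(r"^([A-E])(\d+)((?:\.\d+)?[a-z]*)$")

def parse_matriline(code):
    """Split a matriline code into transect, position, and split suffix.

    Returns None for codes that do not fit the assumed layout, such as the
    eight replicate 1 codes that lack the initial transect letter.
    """
    match = _CODE.match(code.strip())
    if match is None:
        return None
    transect = match.group(1)
    position = int(match.group(2))
    if not 1 <= position <= TRANSECT_LAST[transect]:
        return None
    return {
        "transect": transect,
        "position": position,
        "split_suffix": match.group(3),
    }
```

Codes that return None are not necessarily errors; they simply need to be handled separately.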
Floral traits: the core set are Petal Length (PetLen), Petal Width (PetWid), Corolla Tube Length (Tube), Short Filament Length (ShrtFil), Long Filament Length (LongFil), and Pistil Length. In the early generations of artificial selection these traits were measured using calipers on dissected flowers as described in Conner and Via 1. In later studies, these were measured from floral photographs, and the measurements also include the length of the anther on one short and one long stamen (ShrtAnther and LongAnther). Often the ovules were counted (Ovule#). We often calculated Anther Exsertion as Long Filament minus Corolla Tube. All values are in mm.
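Because anther exsertion appears in this paper both as a difference in mm and as a difference of natural logs, the two forms can be sketched side by side. The function names are hypothetical, and the inputs correspond to the LongFil and Tube columns in mm.

```python
import math

def exsertion_mm(long_fil_mm, tube_mm):
    """Anther exsertion in mm: long filament length minus corolla tube length."""
    return long_fil_mm - tube_mm

def exsertion_ln(long_fil_mm, tube_mm):
    """Log-scale exsertion: ln(long filament length) - ln(corolla tube length).

    Equivalent to the natural log of the filament-to-tube ratio, so it is
    unitless and requires both lengths to be positive.
    """
    return math.log(long_fil_mm) - math.log(tube_mm)
```

Users comparing values across files should confirm which of the two scales a given exsertion column uses before combining them.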
Treatment: High or H, selection for increased exsertion; Low or L, selection for decreased exsertion; Cntrl, randomly mated controls.
Replicate line: 1 (= Reed = R) or 2 (= KBS = K), respectively, for the two replicates nested within each Treatment.
Photo: Some files have the code from the camera denoting the image the measurement was made from, available from the first author.

ArtificialSelectionExsertion.csv
Offspr: the replicate offspring grown from each matriline; in later generations usually 1-10. ID: a unique integer identifier added in later generations to track the pedigree. MomID, DadID: the ID of the parents of that plant. In the first generation with IDs, these are lower case letters, because the parents of these individuals were not recorded.
RelFit: Relative fitness = RawFit/Mean Fitness for that line and generation; this is used to estimate selection differentials and gradients.
RawFit: Number of offspring grown and measured in the next generation from that plant. Within each matriline, typically only one will have nonzero fitness, that is, the selected plant, except when different plants within a matriline were used as males versus females due to incompatibility or where a matriline was split due to failure of a different matriline (see above).
Gen: generation of selection.
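The RelFit definition, and its use in estimating a selection differential, can be sketched as below. This is a minimal illustration rather than the analysis code used for ref. 8: the 'line' key is a hypothetical label combining Treatment and Replicate line, and the selection differential is computed here as the covariance between relative fitness and the trait, which is one standard estimator.

```python
from collections import defaultdict

def relative_fitness(rows):
    """Return RawFit divided by mean RawFit within each (line, Gen) group.

    rows is a list of dicts with keys 'line' (hypothetical), 'Gen', and
    'RawFit'. Assumes every group has a nonzero mean fitness.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["line"], row["Gen"])].append(row["RawFit"])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [row["RawFit"] / means[(row["line"], row["Gen"])] for row in rows]

def selection_differential(trait, relfit):
    """Covariance between relative fitness and the trait within one group."""
    n = len(trait)
    mean_trait = sum(trait) / n
    mean_fit = sum(relfit) / n
    return sum(
        (t - mean_trait) * (w - mean_fit) for t, w in zip(trait, relfit)
    ) / n
```

When only the selected plant in a group has nonzero fitness, this covariance equals the trait value of the selected plant minus the group mean.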

CorrelatedResponses.csv
Matriline, AvPetLen, AvgTube, AvShrtFil, AvgLongFil, Pistil, Ovules: See above, except the four traits with 'Av' were the average of two measurements of different structures within the same flower, i.e., two different petals, filaments, etc. The third flower was measured in most cases, but sometimes a later flower close to the third was used. Treatment: Direction of artificial selection. Replicate Line: the two replicates within each treatment. Block: Plants were grown at KBS or Reed; some traits differed between sites. CRoffspring#: up to four plants were grown at each location from each matriline. Days to flower: number of days from planting to first open flower.
Nectar vol: volume of nectar in microliters from the 5th and 6th flowers on the central inflorescence.
Nectar conc%: % sugar concentration from refractometry using the same nectar sample. FlowerNo: The total number of flowers was counted on some plants at Reed at harvest, just over two months after planting.
Biomass: aboveground dry biomass in grams was measured at Reed at harvest. Total pollen: The number of pollen grains produced was counted using a Coulter Counter on all six anthers from one flower at KBS, and 3 long and 1 short stamen anther at Reed.
LongPollen: the count for the four long stamen anthers at KBS. ShrtPollen: the count for the short stamen anthers at KBS. Array: These plants were divided into three arrays of 24 plants each; arrays were taken into the field five or six times.
FieldRow and Field Column: The grid positions used for the plants in the field. AvgFlwr#: The number of flowers open when the plants were taken into the field, averaged over the five or six field days.
QTL ParentalMeasurements.csv and QTL F1 measurements.csv
All columns as described above except Offspr denotes the six offspring grown from each maternal plant. There are two additional Cross Types in the F1 dataset, the High X High and Low X Low; seeds from these are available, but have not been used to make F2 plants to date. Family: There are five outbred full sib families within each cross; these correspond to the parental 'octets'.
Mom: Crosses of the F1 to make each full-sib F2 family were done reciprocally, so there is Mom A or B depending on the direction of the cross.
F2: Replicate offspring from each cross. Note that this is redundant with Mom A or B, because all F2s within each family were given a unique number: A is mom for 1-25, 51-75, etc., and B is mom for 26-50, 76-100, etc.
Block: 1-10 for the 10 temporal blocks. Flwr date: the date that the first flower opened on that plant.
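The redundancy between the F2 number and Mom noted above can be written as a small lookup. This sketch assumes that the alternation in runs of 25 continues beyond the ranges listed explicitly; the function name is hypothetical.

```python
def f2_mom(f2_number):
    """Return 'A' or 'B' for an F2 offspring number within a family.

    Numbers 1-25, 51-75, ... map to mom A and 26-50, 76-100, ... to mom B,
    assuming the pattern of alternating runs of 25 continues.
    """
    if f2_number < 1:
        raise ValueError("F2 numbers start at 1")
    run_index = (f2_number - 1) // 25   # 0 for 1-25, 1 for 26-50, ...
    return "A" if run_index % 2 == 0 else "B"
```

Comparing this derived value with the recorded Mom column is a simple consistency check on the F2 file.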

Technical Validation
The distributions of all floral measurements show a good fit to a normal distribution; all outliers (identified graphically as clearly outside the normal distribution) were either validated or corrected using original data or photos (Figure 3). For the artificial selection lines, the very tight fit of the data (R² = 0.99 for both replicates; Figure 4 in ref. 8) to the fitted regression of response to selection on the selection differential strongly indicates that the data are precise and reliable.
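A check of this kind can be repeated with an ordinary least-squares regression of response on selection differential. The sketch below is a generic implementation, not the code used in ref. 8; the function name is hypothetical, and whether cumulative or per-generation values are used as inputs is left to the user.

```python
def regress_response(selection_differential, response):
    """Least-squares fit of response on selection differential.

    Returns (slope, intercept, r_squared). Both inputs are equal-length
    sequences of numbers with nonzero variance.
    """
    n = len(selection_differential)
    if n != len(response) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(selection_differential) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in selection_differential)
    syy = sum((y - mean_y) ** 2 for y in response)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(selection_differential, response)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

When cumulative response is regressed on cumulative selection differential, the slope is commonly interpreted as the realized heritability.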