Radiation-induced alternative transcription and splicing events and their applicability to practical biodosimetry

Accurate assessment of the individual exposure dose from easily accessible samples (e.g. blood) immediately after a radiological accident is crucial. We aimed to develop a robust transcription-based signature for biodosimetry from human peripheral blood mononuclear cells irradiated with different doses of X-rays (0.1 and 1.0 Gy) at a dose rate of 0.26 Gy/min. Genome-wide radiation-induced changes in mRNA expression were evaluated at both the gene and the exon level. Using exon-specific qRT-PCR, we confirmed that several biomarker genes are alternatively spliced or transcribed after irradiation and that different exons of these genes exhibit significantly different levels of induction. Moreover, a significant number of radiation-responsive genes were found to be genomic neighbors. Using three different classification models, we found that gene and exon signatures performed equally well in dose prediction, as long as more than 10 features were included. Together, our results highlight the necessity of evaluating gene expression at the level of single exons for radiation biodosimetry in particular and for transcriptional biomarker research in general. This approach is especially advisable for practical gene expression-based biodosimetry, for which primer- or probe-based techniques would be the method of choice.

In each panel, top tracks indicate Affymetrix probesets while bottom tracks indicate known splice variants. Arrows indicate the 5' to 3' orientation of the gene.
Figure S3. Positional gene enrichment analysis shows significant co-localisation of radiation-responsive genes. Scale bar indicates percentage of enrichment, with 100% enrichment corresponding to genomic neighbors. Arrows indicate the 5' to 3' orientation of the genes. Please note that the separate clusters ACTA2-FAS and PANK1-KIF20B are in close proximity (< 1 Mb) on chromosome 10.

In vitro irradiation
The beam quality can be approximated to H-250 (ISO 4037): 250 kV, 15 mA, 1.2 mm Al equivalent inherent filtration and 1 mm Cu additional filtration. The air kerma (K_air) at the reference position was measured using an NE2571 ionisation chamber (SN309) connected to a Farmer 2500 electrometer. The chamber, together with the electrometer, was calibrated in terms of K_air, and traceability to the international standards was assured. The reference point of the ionisation chamber was placed at the same distance as the reference position of the samples. The ionisation chamber was always placed in the beam, next to the samples, for a precise measurement of the time-integrated K_air. In this way, the stability of the X-ray generator during the irradiation was also verified.
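As a rough plausibility check on the irradiation settings, the beam-on time for each nominal dose follows from the dose rate given in the abstract (0.26 Gy/min). The sketch below assumes a constant dose rate over the whole exposure; the function name is hypothetical and is not part of the original protocol, in which the delivered dose was monitored with the ionisation chamber rather than by timing alone.

```python
# Nominal dose rate and doses as stated in the abstract.
DOSE_RATE_GY_PER_MIN = 0.26
NOMINAL_DOSES_GY = (0.1, 1.0)


def beam_on_seconds(dose_gy, dose_rate_gy_per_min=DOSE_RATE_GY_PER_MIN):
    """Return the exposure time in seconds needed to deliver dose_gy.

    Assumes the dose rate stays constant during the irradiation
    (an illustrative simplification).
    """
    if dose_rate_gy_per_min <= 0:
        raise ValueError("dose rate must be positive")
    return dose_gy / dose_rate_gy_per_min * 60.0


if __name__ == "__main__":
    for dose in NOMINAL_DOSES_GY:
        print(f"{dose} Gy: {beam_on_seconds(dose):.1f} s")
```

Under this assumption the two doses correspond to roughly 23 s and 231 s of beam-on time.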

RNA extraction
For RNA isolation from PBMCs, a combination of the TRIzol® reagent (Invitrogen, Carlsbad, CA, USA) extraction method and purification on Qiagen RNeasy columns (Qiagen, Venlo, The Netherlands) was used. Briefly, 5 × 10^6 cells were lysed in 1 ml of TRIzol® reagent and further processed following the manufacturer's recommendations. Following RNA precipitation with isopropanol, the pellet was resuspended in 1 ml of ethanol and transferred to the RNeasy column. Further purification was done according to the manufacturer's instructions.

Microarray hybridisation
Ten µg of cRNA, synthesised and purified from 0.25 µg of total RNA using the Ambion® WT Expression kit (Ambion, USA), was used for cDNA synthesis, followed by cDNA fragmentation and labeling with the GeneChip® Terminal Labeling kit (Affymetrix, Santa Clara, CA, USA).
Fragmented and labeled cDNA was hybridised to Human Gene 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) using the GeneChip® Hybridization, Wash and Stain kit (Affymetrix, Santa Clara, CA, USA) (hybridization module) and hybridization controls (Affymetrix, Santa Clara, CA, USA) with rotation at 45°C for 16 hours. After hybridization, arrays were washed and stained using the GeneChip® Hybridization, Wash and Stain kit (stain module), after which the arrays were immediately scanned using an Affymetrix GeneChip® Scanner.

Predictive analysis
Generalized linear models were trained using the R glmnet package. This method builds a regularised linear model in which the lasso penalty performs feature selection, so that only relevant features receive nonzero weights. A multinomial model was used for the three-class classification problem, and an internal five-fold cross-validation was used to tune the model's regularisation parameter lambda. Feature importance measures were then derived from the weights of the linear model.
The Random Forest based classification model uses an ensemble of randomised decision trees to perform classification. We used a collection of 1000 decision trees to build these models, and subsequently used the internal feature importance mechanism based on entropy reduction to obtain feature importance values.
The Nearest Shrunken Centroid Classifier gradually shrinks the average gene expression centroid of each group towards the overall centroid. Non-differentially expressed genes are removed first: the distance between the group centroids is small for these genes, so their group centroids quickly reach the overall centroid. Differentially expressed genes, in contrast, "survive" the shrinkage much longer and have a higher probability of being used for classification. The optimal level of shrinkage, and with it the number of genes used for class prediction, is determined by ten-fold cross-validation. Finally, new samples are assigned to the group whose shrunken centroid is nearest.
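The nearest shrunken centroid procedure described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study. The toy expression values, the class labels, the function names and the fixed threshold `delta` are all hypothetical. The sketch standardises by the pooled within-class standard deviation plus its median, uses the class-size factor sqrt(1/n_k - 1/n) (one common choice), assumes equal class priors, and takes `delta` as given instead of tuning it by cross-validation.

```python
import math
from statistics import median

# Hypothetical toy data: two features per sample (index 0 responds to dose,
# index 1 is essentially noise) and three dose classes.
X = [[1.0, 5.0], [1.2, 5.2], [0.8, 4.8],
     [2.0, 5.2], [2.2, 5.0], [1.8, 5.1],
     [4.0, 5.0], [4.2, 4.9], [3.8, 4.8]]
y = ["0Gy"] * 3 + ["0.1Gy"] * 3 + ["1.0Gy"] * 3


def fit_shrunken_centroids(X, y, delta):
    """Shrink class centroids towards the overall centroid by threshold delta."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    by_class = {k: [r for r, lab in zip(X, y) if lab == k] for k in classes}
    means = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
             for k, rows in by_class.items()}

    # Pooled within-class standard deviation of each feature.
    s = []
    for i in range(p):
        ss = sum((r[i] - means[k][i]) ** 2
                 for k, rows in by_class.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset that guards against very small deviations

    centroids, kept = {}, set()
    for k, rows in by_class.items():
        m_k = math.sqrt(1.0 / len(rows) - 1.0 / n)
        shrunk = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (means[k][i] - overall[i]) / scale
            # Soft thresholding: move d towards zero by delta, clip at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                kept.add(i)  # feature still separates at least one class
            shrunk.append(overall[i] + scale * d_shrunk)
        centroids[k] = shrunk
    return {"centroids": centroids, "overall": overall,
            "scale": [si + s0 for si in s], "kept": sorted(kept)}


def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def score(k):
        return sum(((xi - ci) / sc) ** 2
                   for xi, ci, sc in zip(x, model["centroids"][k], model["scale"]))
    return min(model["centroids"], key=score)


if __name__ == "__main__":
    model = fit_shrunken_centroids(X, y, delta=1.0)
    print("surviving feature indices:", model["kept"])
    print("predicted class:", predict(model, [3.9, 5.0]))
```

With a moderate threshold only the dose-responsive feature survives, while a very large threshold collapses every class centroid onto the overall centroid, which is why the shrinkage level has to be tuned.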
While each of these classifiers has an internal mechanism to select informative features based on their internal weight or importance, we also experimented with explicitly reducing the number of features describing the data. To this end, internal model information was