Abstract
In transfusion medicine, the identification of the Rhesus D type is important to prevent anti-D immunisation in Rhesus D negative recipients. In particular, the detection of the very low expressed DEL phenotype is crucial and hence constitutes the bottleneck of standard immunohaematology. The current method of choice, adsorption-elution, does not provide unambiguous results. We have developed a complementary method of high sensitivity that allows reliable identification of D antigen expression. Here, we present a workflow composed of high-resolution fluorescence microscopy, image processing, and machine learning that, for the first time, enables the identification of even small amounts of D antigen on the cellular level. The high sensitivity of our technique captures the full range of D antigen expression (including D+, weak D, DEL, D−), allows automated population analyses, and results in classification test accuracies of up to 96%, even for very low expressed phenotypes.
Introduction
The high immunogenicity of the Rhesus factor1 renders it one of the most relevant blood markers in transfusion medicine next to the factors of the AB0 system2. The Rhesus antigens are encoded by two homologous genes, RHCE and the clinically more relevant RHD3. Currently, more than 280 RHD alleles4 and approximately 30 Rhesus D (RhD) epitopes are known and account for the strong immunogenicity and the huge complexity of the RhD blood group assignment. Accurate classification of blood samples, however, is of utmost importance as a false assignment can cause dangerous anti-D immunisations potentially leading to haemolytic transfusion incidents or maternal alloimmunisation inducing haemolytic disease of the newborn5,6.
The RHD gene encodes the Rhesus D protein expressed on erythrocyte membranes (Supplementary Fig. 1). Based on the molecular background, the D antigen expression level, and the presence of epitopes, the following five types are defined7,8 (Table 1): D-positive (D+; high expression), D-negative (D−; no expression due to gene deletion), partial D (extracellular mutations that do not affect the expression level but the existence of certain epitopes), weak D (intracellular or transmembrane mutations causing reduced expression), and DEL (D-elute; intracellular or transmembrane mutations leading to very low expression). The DEL variant9 primarily occurs in Eastern Asia (up to 10–30% of all seemingly D− typed individuals10,11). DEL phenotypes are caused by different RHD missense mutations, splice site mutations, or RHD-CE hybrid genes12,13. To date, 32 DEL types are listed in the RhesusBase Site and unidentified ones are still emerging14,15,16 (http://www.rhesusbase.info17).
Rhesus D phenotyping of blood donors is routinely performed by incubation of blood samples with anti-D antibodies and visual observation of haemagglutination. This method, however, does not identify all D phenotypes due to a lack of sensitivity. Therefore, very low expressed D variants are frequently misclassified as D−15,18,19. Numerous studies report primary and secondary anti-D immunisation of D− recipients after transfusion of weak D and DEL products12,19,20. Recently, population surveys, lookbacks, and multi-centre studies using PCR and sequencing were performed worldwide and revealed the presence of weak D and DEL samples in the seemingly D− donor pool. Thus, 0.1% of Caucasians and up to 30% of Asians with D− phenotype were reclassified as DEL6,11,15,21. While more precise D− characterisation constitutes a loophole in the legislation of most countries, regulations that dictate a more detailed analysis of D− donors were recently introduced in the United States18 and Switzerland22. RHD genotyping of D− samples is performed by molecular biology. Information about protein abundance of known alleles is given by the RhesusBase. In case of novel alleles the D antigen expression level has to be quantified, commonly by flow cytometry23. Using flow cytometry, the D antigen is detected by fluorescently labelled standard antibodies24,25,26. This technique, however, exhibits huge inter-laboratory variability27, lacks standardised reagents28, and frequently fails to detect D variants with low expression due to an inherent lack of sensitivity (limit of detection: 22 antigens/cell)13,14,19. Identification of very low expressed D variants is commonly performed by adsorption-elution assay9,26,29,30,31,32. This laborious technique, however, requires experience, is time-consuming, and lacks standardised protocols. Varying numbers of washing steps and/or different incubation times may lead to contradictory results.
DEL samples yielding inconsistent results, as well as DEL samples without detectable D antigen expression, were reported in the literature3,14,15,33.
We present a method that allows reproducible classification of all known D types, in particular potentially immunogenic, very low expressed D variants. The workflow combines high-resolution fluorescence microscopy and bioinformatic algorithms. The D antigen is labelled with three fluorescent antibodies that are also used in standard immunohaematology. Sequentially, three replicas of each specimen are marked with antibodies binding to different D antigen epitopes. Samples are analysed by high-resolution fluorescence microscopy allowing single molecule sensitive detection. The high sensitivity of our technique captures the D antigen expression of D+, weak D, DEL, and D− phenotypes. Protein expression levels are observed directly at single cell level and reveal huge variations within and between the populations. These variations in RhD expression are caused by the stochastic nature of protein expression attributable to the statistical behaviour of chemical systems34,35,36. Image processing and feature extraction are used for cell recognition and automatic determination of characteristics (features) that describe the D antigen occurrence in cell populations. Based on the feature values, non-linear machine learning algorithms are applied and result in the final Rhesus D type classification. Thus, this straightforward method holds great promise for reliable D antigen classification.
Results
We have established an effective combinatory workflow (depicted in Fig. 1) consisting of high-resolution fluorescence microscopy, image processing, and machine learning techniques. For Rhesus D type classification, 51 human blood test samples of already known Rhesus D types (namely D+, weak D, DEL, and D−; listed in Table 1) were measured and more than 2000 microscopy images (image size ~82 × 82 μm²) were analysed at single molecule and single cell level.
High-resolution fluorescence microscopy of Rhesus D phenotypes
Three replicas of each sample were independently marked with Atto655-labelled antibodies H41 (Atto655-H41-Ab), BRAD3 (Atto655-BRAD3-Ab), and BIRMA-D6 (Atto655-BIRMA-D6-Ab) binding D antigen epitopes not affected in the analysed weak D and DEL types. Their use in immunohaematological routine implies that they have good binding affinities and accessibility to the corresponding epitopes. Fluorophore Atto655 was covalently attached to the antibodies (average degree of labelling of 1.8) and marked antibodies were applied in amounts to ensure saturation of all RhD molecules. In order to avoid false positive signals from antibodies sticking to the bottom and RhD immobilization caused by protein adhesion to the glass, the apical side of the erythrocytes was imaged. One-colour imaging using red-absorbing fluorophore Atto655 was applied in order to reduce autofluorescence and cytotoxic photodamage. Positive (D+ blood samples) and negative controls (D− blood samples) were carried along during each experiment. All amino acids changed in the weak D and DEL types analysed within this study are located in the intracellular or transmembrane part of the protein (Supplementary Fig. 1) and do not interfere with antibody-antigen interaction.
Cell contours of erythrocytes and intensity peaks corresponding to the RhD antigen were detected and analysed using the implemented automated image processing techniques. Parametrisation was done in a user-assisted fashion but was not changed during the experiments (see Methods for further details). Figure 2b.I. depicts the automatically detected contours of all erythrocytes from a bright-field image. Atto655-H41-Ab labelled D antigens correspond to the intensity peaks on the cells with intensities proportional to the count rates of fluorescence emission. The yellow dots in Fig. 2b.II mark the centres of the automatically detected peaks. All imaged cells are detected, regardless of their size and shape. Incompletely imaged cells, however, as for instance chopped cells on the edge of the images, are not considered in the subsequent analysis steps, as incomplete information about cells would bias all further statistical results and implications.
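The exclusion of incompletely imaged cells can be illustrated with a minimal Python sketch. It assumes that each detected contour is available as a list of (x, y) pixel coordinates and that a cell counts as incomplete when its contour touches the outermost pixel row or column. Both the data layout and this border criterion are assumptions made for illustration, as are the function names; the text does not state how incomplete cells are identified.

```python
def touches_border(contour, width, height):
    """Return True if any contour point lies on the outermost pixel row or column."""
    return any(
        x <= 0 or y <= 0 or x >= width - 1 or y >= height - 1
        for x, y in contour
    )


def keep_complete_cells(contours, width, height):
    """Keep only contours that lie entirely inside the image (assumed criterion)."""
    return [c for c in contours if not touches_border(c, width, height)]


# Hypothetical example: one cell fully inside a 512 x 512 image, one cut by the left edge.
cells = [
    [(10, 10), (20, 10), (20, 20), (10, 20)],
    [(0, 50), (8, 50), (8, 60), (0, 60)],
]
complete = keep_complete_cells(cells, width=512, height=512)
```

Only the first contour survives the filter, so the cut-off cell does not enter the subsequent statistics.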
Descriptive statistical analysis of fluorescence peak intensities
High-resolution fluorescence microscopy of blood samples revealed differences in the peak intensities between the four different RhD populations, namely D+, weak D, DEL, and D− (Fig. 2a). Figure 2a.I. shows a typical image of a cell obtained from a D+ sample, representative for all analysed samples (number of samples n = 12). Here, we use the peak intensities as a parameter that describes the cell populations and hints at the RhD clustering behaviour of different samples. Each peak intensity value has been calculated as the sum of intensities in a 3 × 3 pixel area around the peak maximum (Table 2). Analysis revealed average peak intensities (μ ± σ) of the D+ cell populations in the range of (11.7 ± 4.0) × 10³ counts/peak using Atto655-H41-Ab labelling, (7.6 ± 2.4) × 10³ counts/peak using Atto655-BRAD3-Ab labelling, and (7.0 ± 1.4) × 10³ counts/peak using Atto655-BIRMA-D6-Ab labelling. All D+ cells showed a fluorescence signal.
In total, we analysed 5894 cells using Atto655-H41-Ab (average cell count per sample 115), 5826 cells using Atto655-BRAD3-Ab (average cell count per sample 110), and 5731 cells using Atto655-BIRMA-D6-Ab (average cell count per sample 111). Figure 2a.II. depicts a cell representative for a weak D sample (n = 14), in this case RHD*weak D type 3. All analysed cells in this population were labelled with at least one fluorescent antibody. Average peak intensities of all weak D cell populations are in the range of (7.4 ± 1.4) × 10³ counts/peak using Atto655-H41-Ab labelling, (6.3 ± 0.8) × 10³ counts/peak using Atto655-BRAD3-Ab labelling, and (6.2 ± 0.6) × 10³ counts/peak using Atto655-BIRMA-D6-Ab labelling. Analysis of DEL cell populations (n = 12, Fig. 2a.III.), with only ~10% of all cells labelled, revealed average peak intensities in the range of (6.2 ± 1.0) × 10³ counts/peak using Atto655-H41-Ab labelling, (6.4 ± 0.7) × 10³ counts/peak using Atto655-BRAD3-Ab labelling, and (6.2 ± 0.7) × 10³ counts/peak using Atto655-BIRMA-D6-Ab labelling. Figure 2a.IV. shows a cell representative for the D− cell population; for ~1% of the cells, sparsely distributed peaks were detected. The average peak intensities of all D− cell populations are in the range of (6.7 ± 2.0) × 10³ counts/peak using Atto655-H41-Ab labelling, (6.0 ± 0.7) × 10³ counts/peak using Atto655-BRAD3-Ab labelling, and (6.0 ± 0.7) × 10³ counts/peak using Atto655-BIRMA-D6-Ab labelling.
In a separate experimental setting, we performed a statistical comparison of the distributions of individual Atto655 and single Atto655-marked antibodies on protein G coated glass to the very sparse signal occurrences on D− and DEL cells (Supp. Note including Supp. Figs 8 and 9)37,38. Statistical analyses revealed that the distribution of individual Atto655-marked antibodies on coated glass and that of sparsely distributed antibodies on D− as well as on DEL cells are highly similar. High average peak intensities of the fluorescently labelled D+ population indicate that some of the signals originate from several fluorescent antibodies.
A simple comparison of the peak intensity distributions of different populations may lead to the incorrect conclusion that a Rhesus D type classification can be achieved using just a single parameter. The analysis of the intensity distributions shows large overlaps between individual populations, as for instance a 68% overlap between D− and DEL samples for Atto655-H41-Ab labelling or 70% for Atto655-BRAD3-Ab and Atto655-BIRMA-D6-Ab labelling. All calculated percentages of overlaps between the four populations are summarised in Fig. 3b and Supplementary Figs 2 and 3.
Population differentiation by machine learning
Machine learning supported algorithms, however, are capable of automatic classification of such overlapping populations. Rhesus D blood group assignment nevertheless requires more comprehensive information about the D antigen abundance on individual cells and cell populations than a single parameter can provide. Therefore, several features based on single molecule information were used for machine learning. The following features were extracted: number of peaks, cell intensity, standard deviation of cell intensity, peak density, distance complete and nearest, and intensity ratio (Table 3). The feature cell intensity is of special interest, as this parameter is comparable to the parameters used in flow cytometry.
Subsequently, we applied machine learning on the extracted features in order to determine the Rhesus D type assignment of the sample. A schematic representation of the machine learning workflow is shown in Fig. 4a. A cross validation approach was used to ensure the reliability and accuracy of our results. The used dataset was split multiple times into training and testing partitions. The training subset was used to create mathematical models that can be considered as functions used to generate a classification vote out of given input parameters. The testing subset was used to test the previously created mathematical models on new data, as well as to assess the classification performance.
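The repeated splitting into training and testing partitions can be sketched as a k-fold cross validation index generator. The sketch below is written in Python using only the standard library; it uses five folds as stated in the Methods, while the function name, the shuffling seed, and the fold assignment scheme are illustrative assumptions and not the HeuristicLab implementation used in the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle (assumed)
    folds = [indices[i::k] for i in range(k)]     # k roughly equal-sized folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


# 51 samples, as in the cohort described in the text.
splits = k_fold_indices(51, k=5)
```

Each model is fitted on the `train` indices of one split and evaluated on the corresponding `test` indices, so no sample is used to assess a model that was trained on it.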
The classification task of the here analysed Rhesus D types is rather challenging, because large peak intensity overlaps between each population and a high heterogeneity within each population are present. In order to obtain a final Rhesus D type assignment, a combined classification method (see Methods) was implemented. This combinatory classification method comprises two different approaches, namely image level classification (method 1) and sample level classification (method 2).
Image level classification performed well for classification of weak D and DEL samples with accuracies of up to 92% for DEL and 83% for weak D using Atto655-BRAD3-Ab labelling, 75% for DEL and 92% for weak D using Atto655-BIRMA-D6-Ab labelling, and 100% for DEL and 92% for weak D using Atto655-H41-Ab labelling. However, the classification results for D+ and D− samples are not sufficient, as for instance none of the D− and only 33% of all D+ samples using Atto655-BRAD3-Ab labelling were correctly identified (Supplementary Tables 1, 2 and 3).
Sample level classification provides a higher overall classification accuracy compared to image level classification: 58% of all D+ and 53% of all D− samples were classified correctly using Atto655-BRAD3-Ab labelling, 75% and 84% using Atto655-BIRMA-D6-Ab labelling, and 83% and 76% using Atto655-H41-Ab labelling. In contrast, sample level classification results for low expressed RhD types are worse compared to image level classification results.
Method 3 combines the advantages of method 1 (high classification accuracies for low expressed and highly heterogeneous cell population) and method 2 (high classification accuracies for common and rather homogenous cell population). Furthermore, classification rules are defined to determine the final Rhesus D type assignment by choosing between image and sample level classification results (see Methods, Fig. 4b).
This new method achieves higher classification accuracies: for instance, an overall test classification accuracy of 64% has been obtained using Atto655-BRAD3-Ab labelling. Here, ten out of twelve (83%) DEL samples are classified correctly. Using Atto655-BIRMA-D6-Ab labelling, the majority of DEL samples are classified properly (83%) with an overall test classification accuracy of 78%. Best results are obtained using Atto655-H41-Ab labelling: A test classification accuracy of 96% has been observed. All D+ samples and all weak D samples are classified correctly. Furthermore, all but one of the very low expressed DEL samples are classified correctly; only one out of 13 D− samples is classified as DEL. A comprehensive result listing can be found in Table 4 and Supplementary Tables 1, 2, and 3.
Discussion
We provide a comprehensive workflow for improved Rhesus D type classification. This classification is achieved by acquiring high-resolution fluorescence microscopy images, detecting single cells and fluorescence signals, extracting and calculating features from the given information, creating mathematical models, and finally applying the latter ones in order to get a final Rhesus D type assignment. We have developed multiple classification rules that enable a more accurate and sensitive Rhesus D type classification compared to commonly used laboratory methods by taking into account information on protein expression at single cell level.
Most accurate results for automatic Rhesus D type assignment were obtained using method 3 and Atto655-H41-Ab labelling. Here, 49 out of 51 human blood samples were classified correctly. Only one D− and one DEL sample were classified incorrectly, which yields an overall classification accuracy of 96%. In both cases only one method of the combinatory classification approach failed in correct assignment.
Comparison of the peak intensity distributions of Atto655-BIRMA-D6-Ab and Atto655-BRAD3-Ab labelled D+ and weak D samples reveals a larger overlap compared to Atto655-H41-Ab labelled samples (Supplementary Figs 2 and 3 compared to Fig. 3a). Consequently, the accuracy of Rhesus D type classification for D+ and weak D with Atto655-BIRMA-D6-Ab and Atto655-BRAD3-Ab labelling is reduced. In the case of Atto655-BRAD3-Ab and Atto655-BIRMA-D6-Ab labelled samples, the peak intensity distributions of all four RhD populations overlap, complicating correct classification. The large overlaps of the peak intensity distributions lead to the assumption that the misassignment is of biochemical origin, caused by differences in the accessibility of extracellular epitopes of the D antigen as well as diversity in antibody affinity to related protein motifs36.
The analysis of the DEL type RHD*DEL8 (RHD IVS3+1 G>A) shows a surprising result: Körmöczi et al.13 briefly mentioned that binding of antibodies BRAD3 and BIRMA-D6 in this DEL variant has not been detected by adsorption-elution technique. This observation is not fully consistent with our results, since we observed signals for a part of the DEL population using the same antibodies. We assume that this discrepancy can be explained by the higher sensitivity of our method. However, a more detailed analysis is beyond the scope of this contribution.
Our workflow can be used as a complementary method to standard immunohaematological techniques to reveal otherwise undetected very low expressed Rhesus D types, which have a prevalence of up to 30% in the Asian D− population11. Whereas the commonly used adsorption-elution technique lacks standardised protocols, is time-consuming and requires an experienced technician, our method is less laborious. The substantially shorter incubation time (30 minutes compared to several hours) of blood cells with a high concentration of antibodies and the shorter washing procedure both save time. Moreover, less experience is required for sample preparation as well as fluorescence microscopy. The developed machine learning based analysis software performs feature calculation and RhD phenotype classification automatically. Of additional advantage are the visualisation of individual antigens on cells without limitations on sample size and the visual control of detected antigens used for analysis. In our case, the quality of separation of the cell populations of different Rhesus D types is limited only by the biochemistry of the applied antibodies.
Here, we have shown that our method achieves a reliable discrimination of well described RhD subpopulations. The high sensitivity of our method revealed intra-population variability, which has not yet been observed and hence represents a new form of blood group typing. The application of standard antibodies facilitates the straightforward implementation of our technique in immunohaematological routine. Since high throughput methods for expression level analyses (e.g. RHD typing) are gaining in importance18,22, we also suggest the use of multi-colour labelling and implementation of a high speed imaging system (e.g. a nanoreader39,40) to accelerate RhD type classification.
The presented method can be used to characterise the expression level of novel RHD alleles or to validate new methods in which determination of very low levels of protein expression is essential41. This technique holds great promise to improve the safety of red blood cell units and to prevent dangerous transfusion incidents. Moreover, this workflow is broadly applicable in a variety of scientific fields, such as in molecular biology and medicine (in cases of cell population classification by rarely expressed cell markers) as well as in biophysics and material science.
Methods
Blood samples
Ethylenediaminetetraacetate (EDTA)-anticoagulated blood samples were provided by the Red Cross Transfusion Service (Linz, Upper Austria, Austria). RhD assignment was done by standard serology24 and RHD gene sequencing was performed on samples with weak D and DEL phenotypical expression. The sample cohort (Table 1) consisted of the most common weak D types in Europe, RHD*weak D type 1 (n = 6), RHD*weak D type 2 (n = 3), and RHD*weak D type 3 (n = 5), and two RHD alleles causing DEL phenotypical expression, RHD*DEL8 (n = 6) and RHD*09.05 (n = 6). D+ (n = 12) and D− control samples (n = 13) were provided for each analysis. Red blood cells were prepared within 7 days of sampling.
Statement on the use of human blood samples
All human blood samples were kindly provided by the Red Cross Transfusion Service (Linz, Upper Austria, Austria) and were collected during routine blood donations in accordance with the strict policies of the Red Cross Transfusion Service Linz. The usage of residual blood material from blood donations is, as captured in a written consent of the Upper Austrian Ethic Commission, not subject to the Austrian Tissue Safety Act. Nevertheless, all blood donors gave their signed informed consent that potential residual blood material can be used for research and development purposes. All experimental protocols were approved by and carried out in collaboration with the Red Cross Transfusion Service Linz.
Immunohaematology
All samples were incubated using monoclonal antibodies targeting different epitopes of the D antigen: Atto655-H41-Ab (binds to epitope 3.1), Atto655-BRAD3-Ab (binds to epitope 6.2), and Atto655-BIRMA-D6-Ab (binds to epitope 9.1). Atto655-H41-Ab was generously supplied by Bio-Rad (Dreieich, Germany). Atto655-BRAD3-Ab and Atto655-BIRMA-D6-Ab were obtained from the International Blood Group Reference Laboratory (Bristol, UK).
Antibody labelling
The primary antibodies were labelled independently via an N-hydroxysuccinimide (NHS)-ester with Atto655 (ATTO-TEC, Siegen, Germany): Atto655 was dissolved in anhydrous dimethyl sulfoxide to yield a final concentration of 1 mg/mL. Monoclonal antibodies were mixed with Atto655 in 0.2 M sodium bicarbonate buffer at pH 8.4. An average degree of labelling of 1.8 ensures a high amount of antibodies with a single fluorophore molecule attached. The reaction mixture was incubated for 1 hour at room temperature. In order to remove unbound dye, gel filtration was applied using PD-10 Sephadex™ G-25M columns (GE Healthcare, Buckinghamshire, UK). Fluorescently labelled antibodies were concentrated by cut-off filters (several centrifugation steps at 1500 g for 3 minutes with Vivaspin 6, MWCO: 10,000, Sartorius Stedim Biotech, Goettingen, Germany), aliquoted and stored at −20 °C.
Sample preparation
100 μL EDTA-anticoagulated blood samples were washed with sodium chloride (0.9%, Fresenius Kabi Austria, Linz, Austria) at 79 g for 3 minutes. Erythrocytes were incubated for 30 minutes at 37 °C with antibodies in ID-CellStab (buffer specially formulated for erythrocytes; Bio-Rad Laboratories, Cressier, Switzerland). Unbound antibodies were removed by washing three times with sodium chloride. Subsequently, cells were resuspended in ID-CellStab.
Fluorescence microscopy and image acquisition
Images were acquired with a modified Olympus IX81 inverted epifluorescence microscope, using a two axis scanning stage and an Olympus UAPON 100x/1.49 NA oil objective. Blood samples were illuminated with a diode laser at 642 nm (Omicron-laserage Laserprodukte GmbH, Phoxx 642, Rodgau-Dudenhofen, Germany). The signal was acquired using an Andor iXonEM+ 897 (back-illuminated) EMCCD (16 μm pixel size). The following filter sets were used: Dichroic filter (ZT405/488/561/640rpc, Chroma, Olching, Germany), emission filter (446/523/600/677 nm BrightLine quad-band band-pass filter, Semrock, Rochester), and an additional emission filter (HQ 700/75 M, NC209774, Chroma Technology GmbH, Olching, Germany). The signal was acquired for 10 ms with 50 ms delay at 0.75 kW/cm² excitation power. Conversion of fluorescence intensities into photon counts is given by: 1 count/pixel = 0.3 photons/pixel.
The signal-to-noise ratio was 31 ± 9. An image sequence of 150 images was recorded. The first ten images were acquired using bright field microscopy to enable assignment of fluorescence signals to distinct cells. The illumination protocols were performed with custom-made LabView-based control software. All samples were measured within 24 hours since results of test experiments (data not shown) proved that fluorescence intensities remained constant within this time period: For D+ samples the peak intensity variation between measurements at days 1 and 2 was 2 ± 10%, for DEL samples 1 ± 19%, and for weak D samples 6 ± 17%; for D− samples no change was measured. A sketch of the fluorescence microscopy setup can be found in Fig. 5.
Data analysis: cell and single molecule detection
All data analysis tasks were performed using implemented and adapted image processing techniques. For cell detection tasks we applied thresholding, mean filtering, convolution, evolution strategies, and an active contour method42,43,44,45,46. Those methods allow detection of all erythrocytes in each image, regardless of their shape or size. For molecule detection (D antigen occurrences on the cell membrane) conservative smoothing, top-hat filtering, thresholding, and region growing were applied47,48. Details on the used image processing methods and the used parameterisations can be found in the Supplementary Material. The so developed analysis framework, a short documentation, and exemplary microscopy images can be found on the Bioinformatics Research Group homepage (http://bioinformatics.fh-hagenberg.at/site/index.php?id=16).
Feature definition and extraction
The identification of cell contours enables the assignment of fluorescence signals to the corresponding erythrocyte. Based on this assignment, further statistical analyses at the cell level (considering data of individual cells) as well as at the image level (considering all cells per image) were performed. Features that include information obtained at the cellular and molecular level were used to distinguish between different D antigen types. In Table 3 all extracted features and their short explanations are listed in detail. A detailed explanation and calculation formulas can be found in the Supplementary Information.
These features further serve as input for machine learning methods that are used to learn models which classify samples according to their Rhesus D type. Boxplots for each feature, showing their distributions and mean values among the analysed Rhesus D types, are depicted in Supplementary Fig. 7.
Statistical analyses
Statistical analyses were carried out using R49 and the main implemented statistical functionalities. Data sorting and filtering was done using Microsoft Excel. Distribution plots were generated using the R ggplot package. If not stated otherwise, all data are expressed as the mean ± SD.
Peak intensity distribution overlap
For each RhD type we calculated the average peak intensity distributions of all analysed samples. The peak intensity has been calculated as the sum of intensities in a 3 × 3 pixel area around the peak maximum. Figure 3a and Supplementary Figs 2 and 3 show the peak intensity distributions for samples labelled with Atto655-H41-Ab, Atto655-BIRMA-D6-Ab, and Atto655-BRAD3-Ab. If the distributions were clearly separated from each other, then this feature would be sufficient for a clear Rhesus D type identification.
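The peak intensity definition given above can be written down in a few lines. The following Python sketch assumes that the image is available as a nested list of count values and that the peak maximum does not lie on the outermost row or column; the function name and the toy image are hypothetical.

```python
def peak_intensity(image, row, col):
    """Sum the counts in the 3 x 3 pixel neighbourhood centred on (row, col)."""
    return sum(
        image[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
    )


# Toy 5 x 5 image with a single peak at the centre pixel (values in counts).
image = [
    [0,   0,   0,   0, 0],
    [0, 100, 200, 100, 0],
    [0, 200, 900, 200, 0],
    [0, 100, 200, 100, 0],
    [0,   0,   0,   0, 0],
]
total = peak_intensity(image, 2, 2)  # 2100 counts for this toy peak
```

With the conversion factor stated in the acquisition section (1 count/pixel = 0.3 photons/pixel), such a sum can be rescaled to photons by multiplying with 0.3.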
For calculating the overlap of distributions we split the intensity range into bins of size 50. For each pair of Rhesus D types we extracted the overlapping area of each bin by extracting the minimum of the two values. The sum of all detected minima of each bin reflects the overlapping percentages of these two Rhesus D types.
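A minimal sketch of this overlap calculation is given below in Python. It uses bins of size 50 and the bin-wise minimum as described; that each histogram is normalised to unit area, so that the summed minima can be read as a fraction between 0 and 1, is an assumption, as are the function names and the toy intensity values.

```python
from collections import Counter


def normalised_histogram(intensities, bin_size=50):
    """Map bin index -> fraction of peaks falling into that bin."""
    counts = Counter(int(v // bin_size) for v in intensities)
    n = len(intensities)
    return {b: c / n for b, c in counts.items()}


def distribution_overlap(a, b, bin_size=50):
    """Sum of bin-wise minima of two normalised histograms (0 = disjoint, 1 = identical)."""
    ha = normalised_histogram(a, bin_size)
    hb = normalised_histogram(b, bin_size)
    return sum(min(ha[k], hb[k]) for k in ha.keys() & hb.keys())


# Hypothetical peak intensities (counts/peak) for two populations.
d_neg = [5010, 5020, 5060, 5110]
del_pop = [5030, 5070, 5080, 5210]
overlap = distribution_overlap(d_neg, del_pop)  # 0.5, i.e. 50% overlap
```

Bins that are populated in only one of the two histograms contribute a minimum of zero and can therefore be skipped, which is why only the shared bin indices are summed.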
Machine learning algorithms
In general, data mining is understood as the practice of automatically searching for patterns in large stores of data. In order to do so, a set of input parameters and a set of target variables are defined and further used to create a mathematical model. The generation of a mathematical model is done by machine learning algorithms. In the here presented study classification algorithms were used to generate mathematical models that are able to classify samples on the basis of their features50.
Here, all tasks were performed as classification tasks using the implementation in the HeuristicLab framework51. The following classification algorithms were applied: random forests (RFs52,53), support vector machines (SVMs54), genetic programming with offspring selection (GP50,55,56), and k-nearest neighbour classification (kNN57). Further details can be found in the Supplementary Materials.
Each algorithm was performed using 5-fold cross validation and was repeated multiple times (n = 40), which results in multiple classification models that are combined via majority voting. The majority voting was performed by counting the votes of all mathematical models for each Rhesus D type separately. Afterwards, the final assignment was made by selecting the class with the majority of votes58. Thus, images or cells without signals had a lower impact on the overall classification result.
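The majority voting step can be sketched as follows in Python. The class labels are written in ASCII ("D-" stands for D−), the vote counts are invented for illustration, and the handling of ties is not specified in the text; in this sketch a tie is resolved in favour of the class that received its first vote earliest, which is an assumption.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class label that received the most votes."""
    counts = Counter(votes)
    # most_common keeps first-seen order among equal counts (assumed tie rule).
    return counts.most_common(1)[0][0]


# Hypothetical outcome of 160 models (4 algorithms x 40 repetitions) for one sample.
votes = ["DEL"] * 90 + ["D-"] * 50 + ["weak D"] * 20
winner = majority_vote(votes)  # "DEL"
```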
Method 1: Image level classification
Classification method 1 implemented image level based classification by classifying all images separately according to their Rhesus D type assignment. All images and the corresponding extracted features were used as input. For each image 160 (each algorithm is repeated 40 times for each image) mathematical models (classifiers) were created, and each model votes for a certain Rhesus D type. Subsequently, we used a majority voting step which collects all classification statements (votes) of all images from each sample. A final classification for each sample was made via the majority of votes. This method renders a robust class assignment for low expressed Rhesus D phenotypes, since more information on cell population heterogeneity is captured.
Method 2: Sample level classification
Sample level classification was based on the averages of the feature values of all images from one sample. This new dataset was used to create 160 mathematical models that vote for certain Rhesus D types. The final assignment for each sample was made by choosing the Rhesus D type with the majority of votes. The sample level classification allowed distinguishing RhD types with homogenous cell populations.
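The construction of the sample level dataset, averaging each feature over all images of one sample, might look like the following Python sketch. The dictionary layout, the feature keys, and the numerical values are hypothetical; only the averaging itself follows the description above.

```python
def average_features(image_features):
    """Average every feature over all images belonging to one sample."""
    names = image_features[0].keys()
    n = len(image_features)
    return {name: sum(f[name] for f in image_features) / n for name in names}


# Two hypothetical images of the same sample with two of the extracted features.
images = [
    {"number_of_peaks": 4, "cell_intensity": 1200.0},
    {"number_of_peaks": 6, "cell_intensity": 1000.0},
]
sample = average_features(images)
# sample == {"number_of_peaks": 5.0, "cell_intensity": 1100.0}
```

Averaging removes the image-to-image variability, which is consistent with the observation that this method suits homogeneous populations better than heterogeneous ones.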
Method 3: Combinatory classification based on sample and image level information
Method 3 is based on sample and image level information and combines the advantages of methods 1 and 2: method 1 performs best for heterogeneous cell populations, whereas method 2 enables a robust classification for homogeneous cell populations. For this purpose, methods 1 and 2 were applied independently of each other and all classification results were stored. Subsequently, classification rules were defined to arbitrate between the classification results of methods 1 and 2:
- First, the concordance of both classification methods was examined. If both classification results were concordant, the sample was assigned to this class.
- Otherwise, the involved classes were further analysed:
  - If the decision had to be made between D− and DEL samples, the two most similar classes, the decision was based on the number of images acquired for the specific sample:
    - If there were at least 11 images, the image classification result was chosen, as we considered this information sufficient for a reliable class assignment.
    - If there were fewer images, it was more reliable to choose the classification result at the sample level.
  - Additional rule for weak D and DEL classification: for differentiation between weak D and DEL, the classification result of method 1 was chosen, as it includes more information about cell heterogeneity and image heterogeneity.
  - In any other case, the result of method 2 was the assignment of choice.
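The rules above can be written as a small decision function. This is a sketch of one reading of the rules, not the authors' code: the function name `combine`, the argument names, and the string labels are assumptions, while the threshold of 11 images is taken from the text.

```python
MIN_IMAGES_FOR_IMAGE_LEVEL = 11  # threshold stated in the text


def combine(image_level, sample_level, n_images):
    """Arbitrate between the results of method 1 (image level) and method 2 (sample level)."""
    # Rule 1: concordant results are accepted directly.
    if image_level == sample_level:
        return image_level
    involved = {image_level, sample_level}
    # Rule 2: D-negative versus DEL depends on the number of acquired images.
    if involved == {"D-", "DEL"}:
        if n_images >= MIN_IMAGES_FOR_IMAGE_LEVEL:
            return image_level
        return sample_level
    # Rule 3: weak D versus DEL always follows method 1.
    if involved == {"weak D", "DEL"}:
        return image_level
    # Rule 4: any other disagreement follows method 2.
    return sample_level


# Hypothetical cases ("D-" stands for D-negative)
enough_images = combine("DEL", "D-", n_images=11)
too_few_images = combine("DEL", "D-", n_images=10)
```

With 11 images the image-level result (DEL) is kept, whereas with 10 images the sample-level result (D-negative) takes precedence.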
Additional Information
How to cite this article: Borgmann, D. M. et al. Single Molecule Fluorescence Microscopy and Machine Learning for Rhesus D Antigen Classification. Sci. Rep. 6, 32317; doi: 10.1038/srep32317 (2016).
References
Landsteiner, K. & Wiener, A. S. An agglutinable factor in human blood recognized by immune sera for rhesus blood. Proc. Soc. Exp. Biol. Med. 43, 41–42 (1940).
Avent, N. D. & Reid, M. E. The Rh blood group system: a review. Blood 95, 375–387 (2000).
Tippett, P. A speculative model for the Rh blood groups. Ann. Hum. Genet. 50, 241–247 (1986).
Patnaik, S. K., Helmberg, W. & Blumenfeld, O. BGMUT: NCBI dbRBC database of allelic variations of genes encoding antigens of blood group systems. Nucleic Acids Res. 40, D1023–1029 (2012).
Ostgård, P., Fevang, F. & Kornstad, L. Anti-D in a ‘D positive’ mother giving rise to severe haemolytic disease of the newborn. A dilemma in antenatal immunohaematological testing. Acta. Paediatr. Scand. 75, 175–178 (1986).
Xu, W., Zhu, M., Wang, B.-L., Su, H. & Wang, M. Prospective Evaluation of a Transfusion Policy of RhD-Positive Red Blood Cells into DEL Patients in China. Transfus. Med. Hemotherapy 15–21 (2014).
Reid, M. E., Lomas-Francis, C. & Olsson, M. L. The Blood Group Antigens Fact Book. (Elsevier, 2012).
Flegel, W. & Wagner, F. Molecular biology of partial D and weak D: implications for blood bank practice. Clin. Lab. 48, 53–9 (2002).
Okubo, Y., Yamaguchi, H., Tomita, T. & Nagao, N. A D variant, Del? Transfusion 24, 542–542 (1984).
Flegel, W. A. Blood group genotyping in Germany. Transfusion 47, 47–53 (2007).
Srijinda, S., Suwanasophon, C., Visawapoka, U. & Pongsavee, M. RhC phenotyping, adsorption/elution test, and SSP-PCR: the combined test for D-elute phenotype screening in Thai RhD-negative blood donors. ISRN Hematol. 2012, 358–316 (2012).
Wagner, T. et al. Anti-D immunization by DEL red blood cells. Transfusion 45, 520–526 (2005).
Körmöczi, G. F., Gassner, C., Shao, C.-P., Uchikawa, M. & Legler, T. J. A comprehensive analysis of DEL types: partial DEL individuals are prone to anti-D alloimmunization. Transfusion 45, 1561–1567 (2005).
Gassner, C. et al. Presence of RHD in serologically D−, C/E+ individuals: A European multicenter study. Transfusion 45, 527–538 (2005).
Krog, G. R. et al. Is current serologic RhD typing of blood donors sufficient for avoiding immunization of recipients? (CME). Transfusion 51, 2278–2285 (2011).
Garcia, F. et al. New RHD variant alleles. Transfusion 55, 427–429 (2015).
Wagner, F. F. & Flegel, W. A. The Rhesus Site. Transfus. Med. Hemother. 41, 357–363 (2014).
Westhoff, C. M. Rh complexities: serology and DNA genotyping. Transfusion 47, 17–22 (2007).
Kim, K. H. et al. Primary anti-D immunization by DEL red blood cells. Korean J. Lab. Med. 29, 361–365 (2009).
Yasuda, H., Ohto, H., Sakuma, S. & Ishikawa, Y. Secondary anti-D immunization by Del red blood cells. Transfusion 45, 1581–1584 (2005).
Flegel, W. A., Von Zabern, I. & Wagner, F. F. Six years’ experience performing RHD genotyping to confirm D− red blood cell units in Germany for preventing anti-D immunizations. Transfusion 49, 465–471 (2009).
Crottet, S. L. et al. Implementation of a mandatory donor RHD screening in Switzerland. Transfus. Apher. Sci. 50, 169–174 (2014).
Mannessier, L. & Broly, H. Evaluation of human and murine monoclonal anti-rhésus antibodies. Rev. Fr. Transfus. Immuno-hématologie XXXI, 175–185 (1988).
Polin, H., Danzer, M., Hofer, K., Gassner, W. & Gabriel, C. Effective molecular RHD typing strategy for blood donations. Transfusion 47, 1350–1355 (2007).
Bauer, K. et al. CAR, A novel mediator of erythroid differentiation and migration, is specifically downregulated in erythropoietic progenitor cells in MDS. Leuk. Res. 39, 16–17 (2015).
Polin, H. et al. Identification of RHD alleles with the potential of anti-D immunization among seemingly D− blood donors in Upper Austria. Transfusion 49, 676–681 (2009).
Flegel, W. A. et al. Section 1B: Rh flow cytometry coordinator’s report. Rhesus index and antigen density: An analysis of the reproducibility of flow cytometric determination. Transfus. Clin. Biol. 9, 33–42 (2002).
Arndt, P. A. & Garratty, G. A critical review of published methods for analysis of red cell antigen-antibody reactions by flow cytometry, and approaches for resolving problems with red cell agglutination. Transfus. Med. Rev. 24, 172–194 (2010).
Massuet, L. & Armengol, R. A New Method of Antibody Elution from Red Blood Cells Using Organic Solvents. Vox Sang. 39, 343–344 (1980).
Mak, K. H., Yan, K. F., Cheng, S. S. & Yuen, M. Y. Rh phenotypes of Chinese blood donors in Hong Kong, with special reference to weak D antigens. Transfusion 33, 348–351 (1993).
Polin, H. et al. On the trail of anti-CDE to unexpected highlights of the RHD*weak 4.3 allele in the Upper Austrian population. Vox Sang. 103, 130–136 (2012).
Daniels, G. L. An investigation of the immune response of homozygotes for the Rh haplotype -D- and related haplotypes. Rev. Fr. Transfus. Immunohematol. XXV, 185–197 (1982).
Roberts, G. H. Elution Techniques in Blood Bank. J. Contin. Educ. Top. Issues 8, 28–31 (2006).
McAdams, H. & Arkin, A. It’s a noisy business: Genetic regulation at the nanomolecular scale. Trends Genet. 15, 65–69 (1999).
Raser, J. M. & O’Shea, E. K. Control of stochasticity in eukaryotic gene expression. Science 304, 1811–4 (2004).
Edwards, B. M. et al. The remarkable flexibility of the human antibody repertoire; isolation of over one thousand different antibodies to a single protein, BLyS. J. Mol. Biol. 334, 103–118 (2003).
Jacak, J., Hesch, C., Hesse, J. & Schütz, G. J. Identification of immobile single molecules using polarization-modulated asynchronous time delay and integration-mode scanning. Anal. Chem. 82, 4288–92 (2010).
Wiesbauer, M. et al. Nano-Anchors with Single Protein Capacity Produced with STED Lithography. Nano Lett. 13(11), 5672–5678 (2013).
Hesse, J., Wechselberger, C., Sonnleitner, M., Schindler, H. & Schütz, G. J. Single-molecule reader for proteomics and genomics. J. Chromatogr. B. Analyt. Technol. Biomed. Life Sci. 782, 127–135 (2002).
Hesse, J. et al. RNA expression profiling at the single molecule level. Genome Res. 16, 1041–1045 (2006).
Kim, Y. et al. Rh D blood group conversion using transcription activator-like effector nucleases. Nat. Commun. 6, 1–12 (2015).
Gonzalez, R. C. & Woods, R. E. Digital Image Processing. (Prentice Hall, Inc, 2002).
Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–697 (1986).
Jähne, B. Digitale Bildverarbeitung. (Springer-Verlag, 2005).
Rechenberg, I. Evolutionsstrategie. (Friedrich Frommann Verlag, 1994).
Kass, M., Witkin, A. & Terzopoulos, D. Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1988).
Dewan, M. A. A., Ahmad, M. O. & Swamy, M. N. S. A method for automatic segmentation of nuclei in phase-contrast images based on intensity, convexity and texture. IEEE Trans. Biomed. Circuits Syst. 8, 716–728 (2014).
Burger, W. & Burge, M. Principles of Digital Image Processing: Fundamental Techniques. (Springer Verlag, 2011).
Hornik, K. R FAQ at https://cran.r-project.org/doc/FAQ/R-FAQ.html (2016)
Affenzeller, M., Winkler, S. M., Wagner, S. & Beham, A. Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications (Chapman & Hall/CRC Press, 2009).
Wagner, S. et al. Architecture and Design of the HeuristicLab Optimization Environment. Advanced Methods and Applications in Computational Intelligence 6, (Springer, 2014).
Breiman, L. Bagging Predictors. Mach. Learn. 24, 123–140 (1996).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Vapnik, V. Statistical Learning Theory. (Wiley, 1998).
Koza, J. Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992).
Kommenda, M., Kronberger, G., Wagner, S., Winkler, S. M. & Affenzeller, M. On the Architecture and Implementation of Tree-based Genetic Programming in HeuristicLab. in GECCO ’12 Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation 1, 101–108 (2012).
Duda, R., Hart, P. & Stork, D. Pattern Classification (Wiley, 2000).
Winkler, S. M. et al. Data based prediction of sentiments using heterogeneous model ensembles. Soft Comput. 1–12 (2014).
Acknowledgements
This work was done within the FIT-IT project (number 835918) NanoDetect sponsored by the Austrian Research Promotion Agency (FFG) and within the project Tomo3D (project number 845419) funded by FFG COIN Cooperation & Innovation. Furthermore, the authors wish to thank Barbara Becker for supplying antibodies.
Contributions
J.J. and C.G. designed the experiments. H.P., T.E. and S.M. prepared the blood samples. S.M. and T.E. performed fluorescence microscopy. D.M.B., S.M.W., S.S., V.D. and L.O. designed the algorithms. D.M.B. and S.M.W. implemented the algorithms, analysed the data, and generated results and statistics. D.M.B., S.M., S.M.W. and J.J. discussed the results and wrote the paper. J.J. and S.M.W. supervised the project. All authors commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Borgmann, D., Mayr, S., Polin, H. et al. Single Molecule Fluorescence Microscopy and Machine Learning for Rhesus D Antigen Classification. Sci Rep 6, 32317 (2016). https://doi.org/10.1038/srep32317