Background & Summary

In recent years, deep learning has revolutionised microscopy-based image recognition. Outstanding results can be achieved in well-defined tasks, provided that sufficient high-quality datasets are available1,2,3. For certain species and/or certain pathologies, however, available data may be sparse. Approaches such as transfer learning and domain adaptation make it possible to develop algorithms that generalise across species, although they come with their own challenges and limitations2. The generalised applicability of deep learning models between species could offer enormous scientific and economic value. For domains that lack appropriate training data, for example due to data protection and privacy restrictions, approaches that allow for this transferability may be especially useful in the context of animal models for human diseases.

To develop, investigate and apply such algorithms, suitable cross-species datasets have to be available. The dataset described in this work aims to close several gaps in currently available datasets. Firstly, whereas a few highly domain- and target-specific whole slide image (WSI) datasets are publicly available for tissue1,2,3, to the authors' knowledge none exist for cytologic research questions. Secondly, no publicly available dataset provides annotated WSIs from multiple species for the same pathology. Finally, as shown in our previous publications4,5, there is high inter- and intra-observer variability in grading pulmonary hemosiderophages, which can be reduced by algorithmically supporting experts during labelling. The development of such algorithms requires large high-quality datasets, which is a further motivation for this publication.

In the following sections, we describe the creation of a novel multi-species pulmonary hemorrhage (P-Hem) WSI dataset. P-Hem describes repeated bleeding into the lung; it can have a broad range of causes, such as congestive heart failure, leukaemia, physical exercise, or autoimmune disorders6,7,8,9,10,11, with possibly life-threatening consequences12. In sport horses, a specific disease entity called exercise-induced pulmonary hemorrhage (EIPH) has a very high incidence and may lead to reduced athletic performance9,13,14. This disease is therefore highly relevant for the equine sport industry and has been used as an animal model for human P-Hem15. P-Hem is often diagnosed by cytologic examination of pulmonary fluid (bronchoalveolar lavage fluid, BALF) with quantification of the hemosiderin content in alveolar macrophages7,16. In chronic bleeding, macrophages (hemosiderophages) degrade red blood cells into hemosiderin, an iron-containing protein complex. Usually, special stains for iron, such as Prussian Blue or modified Turnbull's Blue (Quincke reaction), are used to highlight the hemosiderin concentration in alveolar macrophages. For the diagnosis of P-Hem in humans, a 5-tier grading system has been developed by Golde et al.7, and Doucet and Viel17 have adapted this system for EIPH in horses. Hooi et al.11 have recently described a similar scoring system for cats.

For the creation of this novel dataset, we digitised and fully annotated 55 equine, seven feline and 12 human BALF samples with a total of 297,383 manually verified macrophage annotations in the form of bounding boxes. To improve labelling efficiency and data quality, we applied expert-algorithm cooperation in the following manner. Firstly, we applied a publicly available pre-trained EIPH model4 for equine WSI grading to our multi-species dataset, resulting in 585,600 candidate annotations. Secondly, visualisation and clustering techniques were applied to semi-automatically remove 45,944 false positive annotations. Afterwards, a trained pathologist (C. A. B.) screened and reviewed the complete dataset, which left 303,289 hemosiderophages. As a final validation step, the hemosiderophages were arranged and presented according to their grade and conclusively checked by the same trained pathologist (C. A. B.), resulting in a total of 297,383 manually verified annotated hemosiderophages.

As a result of this expert-algorithm pipeline (visualised in Fig. 1), we present the largest publicly available fully-annotated multi-species cytology WSI dataset to date. Our dataset provides researchers with unprecedented opportunities to develop new inter-species algorithms and can help to overcome domain adaptation limitations. We evaluated the quality of the dataset by conducting a species-wise 3 × 3 cross-validation and performed an ablation study to estimate how many annotated WSIs are needed to adapt to new species.

Fig. 1

Overview of the macrophage annotation and validation pipeline: The publicly available RetinaNet object-detection model trained on equine slides4 is used to perform inference on the unannotated slides, followed by a semi-automatic clustering step which clusters cells by size. Error-prone cells are highlighted and can then be efficiently deleted by a human expert. Afterwards, a human expert screens all WSIs to increase the dataset consistency. Finally, a regression-based clustering system is applied to support experts in searching for misclassifications of the hemosiderin grade.

Methods

The following section describes the sample collection, staining and digitisation procedure, followed by our annotation processing pipeline. The BALF samples of the three species were collected at different institutes for routine diagnostic evaluation of respiratory disease; therefore, no animal was harmed for the creation of this dataset. Individual case histories were not considered in the present study, and all samples we received were anonymised by the providing laboratory. Approval for the use of animal specimens was given by the State Office of Health and Social Affairs of Berlin (approval ID: StN 011/20) and for human samples by the University of Lübeck (approval ID: AZ 19–428). The 74 cytological slides were prepared by cytocentrifugation and stained for iron content with Prussian Blue (n = 37) or modified Turnbull's Blue using the Quincke reaction (n = 37). Both staining methods result in similar insoluble blue pigments18 and therefore in similar hemosiderophage appearances. Digitisation of the glass slides was performed using a linear scanner (Aperio ScanScope CS2, Leica Biosystems, Germany) at a magnification of 400× (resolution: \(0.25\frac{\mu m}{px}\)). To be as consistent as possible in the data pre-processing phase, all samples were stained and digitised in the same laboratory (Institute of Veterinary Pathology, FU Berlin).

Equine datasets

Fifty-seven equine samples were prospectively collected at VetPath Laboratory Services (Australia) from 29 BALF samples of 25 horses with clinical signs of lower respiratory tract disease. Samples were prepared by cytocentrifugation (CYTOPRO 7620, Wescor Inc, Logan, UT, USA) at 510 × g for 3 minutes using a variable volume of BALF depending on cellular density. Subsequently, unstained slides were shipped to FU Berlin, Germany, stained with both staining methods and digitised as described above.

Manually expert labelled equine (MELE) dataset

A preliminary dataset of 17 equine WSIs was developed for a previous publication4 and revised for this publication. Initially, these slides were fully annotated by one expert (C. A. B.) with the open-source software SlideRunner19 in a two-stage process: first, all macrophages/hemosiderophages were annotated by screening the WSIs; afterwards, each cell annotation was assigned a corresponding grade. From these 17 WSIs, 16 were added to this publication and one was removed due to a significant fungal contamination (>1% of the cells) in the Turnbull's Blue staining, resulting in 10 Prussian Blue and 6 Turnbull's Blue samples from 16 horses. Subsequently (for this publication), the same expert (C. A. B.) modified this dataset in a second screening process and reviewed the grades with the help of density maps (see section Density map). In the following, we will refer to this dataset as the manually expert labelled equine (MELE) dataset.

Expert-algorithm labelled equine (EALE) dataset

For the creation of the expert-algorithm labelled equine (EALE) dataset, we used 39 additional WSIs from 26 horses. A detailed overview of the dataset's meta-data is provided in the supplementary Table images_meta_data.csv. The samples were prepared at the same laboratory as the MELE dataset and were processed according to the same protocol. The dataset consists of 18 Prussian Blue and 21 Turnbull's Blue samples. The labels were created by inference on the WSIs with an algorithm developed on the initial dataset (MELE), followed by multiple steps of quality control (clustering, screening, density maps), as summarised in Fig. 1.

Expert-algorithm labelled feline (EALF) dataset

Seven feline samples were retrospectively obtained from the study by Hooi et al.11, which was designed to evaluate the presence of hemosiderophages in feline BALF samples. Samples were initially prepared by cytocentrifugation and stained with Wright's stain11. For this study, the specimens were de-stained and re-stained with Turnbull's Blue. The re-staining is assumed to have a negligible effect on dataset creation in light of the applied expert-algorithm collaboration pipeline, which can correct for algorithmic confusion due to the lower quality of the input data4,5. Labels were created by inference and multi-step quality control (clustering, screening, density maps). In the following, we will refer to this dataset as the expert-algorithm labelled feline (EALF) dataset.

Expert-algorithm labelled human (EALH) dataset

The samples were collected by a BALF procedure using bronchoscopy under local anaesthesia. None of the patients had undergone steroid or other immunoregulatory therapy. After the volume of recovered BALF had been assessed, the fluid was filtered through a layer of sterile gauze, centrifuged (15 min, 4 °C, 65 × g) and resuspended. Total cell counts were assessed in a Neubauer chamber, and viability was determined by trypan blue exclusion. Each cytospin slide was prepared from BALF with 50,000 cells (600 cpm, 15 min; Heraeus Sepatech Omnifuge 2.0 RS, Hanau, Germany). Following staining with May-Grünwald-Giemsa and HEMATOGNOST Fe® (SIGMA), routine cytological examination was performed to confirm P-Hem due to different underlying diseases. Supplementary preparations were made from the 12 cases with proven P-Hem; the unstained specimens were subsequently sent to FU Berlin, where three were stained with Turnbull's Blue and nine with Prussian Blue. In the following, we will refer to this dataset as the expert-algorithm labelled human (EALH) dataset.

Labelling and visualisation platform

To create this multi-species WSI dataset, we used the open-source online platform EXACT20, which was specifically modified for this project. The software supports the creation of this dataset with multiple features, which we briefly summarise in the following. Manual WSI annotation is supported by a special screening mode, which allows for systematic screening of slides at a user-defined magnification while saving each expert's progress, making it convenient to resume the work at a later point in time. Furthermore, the bounding box annotation process is streamlined by a single-click annotation mode, which incorporates the average hemosiderophage size and therefore minimises the need to further adjust the bounding box to the cell size. Annotation versioning supports the tracking of changes and provides detailed and reproducible insights into the development process of datasets.
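As a minimal illustration of this single-click mode, the sketch below expands a click position into a fixed-size bounding box; the function name and the assumed average cell size of 70 px are our own choices for illustration, not part of the EXACT code base.

```python
# Minimal sketch of the single-click annotation idea (not the actual EXACT code):
# a click at (x, y) is expanded to a bounding box using the average cell size.

AVG_CELL_SIZE_PX = 70  # assumed average hemosiderophage diameter in pixels at 400x

def click_to_bbox(x: int, y: int, size: int = AVG_CELL_SIZE_PX):
    """Return (x1, y1, x2, y2) centred on the clicked position."""
    half = size // 2
    return (x - half, y - half, x + half, y + half)

print(click_to_bbox(1024, 2048))  # -> (989, 2013, 1059, 2083)
```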

Inter-species inference from a pre-trained model

At the time of dataset development, no annotations for feline or human P-Hem slides were publicly available, which limited the options for transfer learning-based methods. Therefore, we directly applied the publicly available4 equine P-Hem deep learning model trained on the MELE dataset to the WSIs of the EALH, EALF and EALE datasets (Fig. 1, Inference). The equine deep learning model uses a custom RetinaNet model4,21 optimised for hemosiderophage detection on WSIs. The model was trained with the Adam optimiser on 14 fully annotated WSIs from the MELE dataset, using a learning rate schedule with a maximum of 0.01, until convergence was reached. The model was validated on the three remaining fully annotated WSIs from the same dataset. As described in the section on the MELE dataset, we excluded one slide for this publication due to considerable fungal contamination, resulting in 16 MELE slides.

Inference on the 58 unannotated WSIs of the EALE, EALF and EALH datasets took on average 120 seconds per WSI on an NVIDIA Quadro P5000 graphics card. To minimise the probability of missing hemosiderophages, we applied a classification probability threshold of 0.35 to all slides to obtain a highly sensitive, less specific model, resulting in 585,600 macrophage/hemosiderophage candidate annotations.
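A minimal sketch of this sensitivity-oriented filtering step is shown below; the prediction data structure and field names are assumptions for illustration.

```python
# Hedged sketch: keep only detections at or above the sensitivity-oriented
# threshold of 0.35 used in the paper. The prediction format is assumed.
SCORE_THRESHOLD = 0.35

def filter_candidates(predictions):
    """predictions: list of dicts with 'bbox', 'score', 'grade' (assumed format)."""
    return [p for p in predictions if p["score"] >= SCORE_THRESHOLD]

raw = [
    {"bbox": (10, 10, 80, 80), "score": 0.92, "grade": 1},
    {"bbox": (200, 40, 260, 110), "score": 0.21, "grade": 0},  # dropped
]
candidates = filter_candidates(raw)  # -> only the first detection survives
```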

Semi-automatic data cleaning via customised clustering

The accuracy of deep learning models depends on multiple factors, which are often difficult to control. One influencing factor that may lead to varying results is the quality of the source dataset, which, in turn, strongly depends on various pre-analytic steps such as image acquisition. Additionally, the label quality used for training deep learning models has a strong influence on the final performance, and for P-Hem grading a high inter- and intra-observer variability has previously been described4,5. Special stains for iron are ideal to quantify the intracytoplasmic hemosiderin content (stained as blue pigment), but introduce considerable difficulties in differentiating cell types due to the weak staining of cellular components. An additional aspect is the domain shift between species, which might manifest in altered cell morphology and texture compared to the source domain (i.e., equine tissue). An example of such domain-shift artefacts is the reduced performance of the initial algorithm on the feline samples due to false-positive detections of granulocytes or multiple bounding box predictions per cell.

To minimise the effect of the above-described implications on this dataset, we established the following semi-automatic pipeline. Firstly, all cell patches of a slide were copied into a new image on the EXACT server and sorted by width in ascending order on the x-axis (Fig. 1, 3rd row, Clustering). Predictions were grouped by the width-to-height ratio of the bounding box in an annotation map. Thereby, a human expert could remove obvious false positive predictions (small cell types and non-maximum-suppression artefacts) using the web interface; this is implemented by drawing a rectangle around groups of cells with the computer mouse to delete them from the dataset. The size-based visualisation also allowed the efficient re-labelling of false-positive granulocytes in feline samples due to their significantly smaller cell size compared to macrophages. The semi-automatic data cleaning step removed 17.45% of the cells created in the inference step.
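The following sketch illustrates the idea behind the size-based annotation map, assuming a simple dictionary representation of bounding boxes; it is not the actual EXACT server code.

```python
# Hedged sketch of the size-based annotation map: cells are sorted by bounding-box
# width so that implausibly small detections (e.g. granulocytes) and
# non-maximum-suppression artefacts cluster together and can be deleted in bulk.

def bbox_width(a):
    return a["x2"] - a["x1"]

def bbox_ratio(a):
    return bbox_width(a) / max(1, a["y2"] - a["y1"])

def build_annotation_map(annotations):
    """Order annotations by width (ascending) for row-wise layout on the x-axis."""
    return sorted(annotations, key=bbox_width)

anns = [
    {"x1": 0, "y1": 0, "x2": 30, "y2": 28},   # small cell, likely false positive
    {"x1": 0, "y1": 0, "x2": 72, "y2": 70},   # typical macrophage size
]
for a in build_annotation_map(anns):
    print(bbox_width(a), round(bbox_ratio(a), 2))
```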

Expert screening

For labelling data, expert-algorithm collaboration is considered suitable for creating high-quality datasets5. Diligent expert review of algorithmic predictions is indispensable, especially for WSIs that may exhibit a significant domain shift to the initial training data. To keep the screening process as consistent as possible, the same veterinary pathologist (C. A. B.) performed all annotation tasks. To enable an efficient validation of all algorithm-created annotations across the WSIs, we used the screening mode provided by the EXACT software. With this mode, it is possible to check a WSI patch by patch and correct errors at a user-selected magnification. An overlap of 15% per patch is applied, and the expert's progress is saved automatically (Fig. 1, Screening). In this screening step, the expert removed 44.8% of the automatically detected cells (236,367) and introduced 560 new hemosiderophages on 51,110 patches. These numbers are in line with the high sensitivity and low specificity expected from setting low cutoff values for algorithmic predictions. Similar to the screening of the computer-aided annotations, the original manual annotations from the equine MELE dataset, initially developed for a previous publication4, were reviewed. Here, the expert deleted 17,050 of the 77,004 annotations (22.1%) and introduced 30 new annotations. The deletion of such a high number of manual and algorithmic labels was mostly attributed to the difficulty of classifying cell types (macrophages versus other cell types) with the special iron stain. Clear identification of macrophages (including hemosiderophages) in BALF is largely based on the morphology of the cell nucleus, which is, however, only very weakly highlighted by iron stains. Cellular size and shape alone are only vague cell classification criteria. We noticed that the task of distinguishing hemosiderophages from neutrophils may be complicated by the positive iron staining of both cell types. While the initial manual labelling of the MELE dataset had a high sensitivity for labelling hemosiderophages, its re-evaluation suggested that many neutrophils had been wrongly annotated. During expert screening, unambiguous non-macrophagic cells, especially cells with a small cell size, were deleted; however, this had no influence on the overall hemosiderin score of the respective WSIs.
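The patch-wise traversal with 15% overlap can be sketched as follows; the patch size and slide dimensions are illustrative assumptions.

```python
# Hedged sketch of the screening grid: the WSI is traversed patch by patch
# with a 15% overlap so that cells on patch borders are seen twice.

def screening_grid(slide_w, slide_h, patch=1024, overlap=0.15):
    """Yield top-left (x, y) coordinates covering the slide with overlap."""
    stride = int(patch * (1 - overlap))
    for y in range(0, slide_h, stride):
        for x in range(0, slide_w, stride):
            yield min(x, slide_w - patch), min(y, slide_h - patch)

n_patches = sum(1 for _ in screening_grid(80_000, 30_000))
print(n_patches)  # number of patches an expert would screen for this slide size
```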

Density map

Initially, all hemosiderophages were classified into discrete grades from zero to four depending on their hemosiderin concentration, both for the computer-aided annotations of the EALE, EALF and EALH datasets and the expert-created annotations of the MELE dataset. However, hemosiderin uptake by hemosiderophages is a continuous process which is only mapped onto a discrete grading system. This can lead to inconsistent classification between neighbouring grades, as previously described by Marzahl et al.4. To overcome this limitation, we utilised the previously published cell-based regression approach4 to assign a continuous grade between zero and four to each hemosiderophage. Afterwards, we created a new image-map where the hemosiderophages were arranged in ascending order along the x-axis according to their hemosiderin score. These image-maps were created for each WSI individually and reviewed by the same trained pathologist (C. A. B.) to make the process of identifying mislabelled cells on the border between two grades (Fig. 2, Density Map) as consistent as possible. On the density maps, the expert changed the grade of 38,799 annotations (13.04%; up: 13,591, down: 25,208), of which 99.92% were changes by a single grade. The density maps also provided another opportunity to review the cell type of the annotations, which led to the deletion of 5,906 (1.95%) annotations.
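A minimal sketch of the density-map ordering is given below, assuming each annotation carries a continuous regression score; the boundary tolerance is an illustrative choice, not a parameter of the published pipeline.

```python
# Hedged sketch of the density-map review: each cell carries a continuous
# regression score in [0, 4]; cells are sorted by that score so that
# grade-boundary cases (scores near 0.5, 1.5, ...) sit next to each other.

def density_map_order(annotations):
    """Sort annotations by continuous hemosiderin score, ascending."""
    return sorted(annotations, key=lambda a: a["score"])

def near_grade_boundary(score, tol=0.1):
    """Flag cells whose continuous score lies close to a discrete grade border."""
    return min(abs(score - b) for b in (0.5, 1.5, 2.5, 3.5)) < tol

cells = [{"score": s} for s in (0.12, 0.48, 1.53, 3.9)]
for c in density_map_order(cells):
    print(c["score"], near_grade_boundary(c["score"]))
```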

Fig. 2

Left: Statistics on the density map in the EXACT user interface. Right: Visualisation of a density map which was screened by an expert for mislabelled cells (especially regarding the label class). Subfigure a) displays six manually deleted annotations. Visualisations b) to e) show the border region between two grades, with the first two columns representing the lower grade and the last two the upper grade, which were sometimes corrected by the expert, as visualised by the different colour of the bounding box.

Data Records

We provide the 55 equine, 12 human and seven feline original WSIs in the Aperio SVS format, without any identifying properties, publicly available on figshare22. Alongside, we supply all hemosiderophage annotations after each of the four processing steps (Inference, Cluster, Screening, DensityMap) as comma-separated files for easy access, as binary files compatible with our training and evaluation pipelines, and in SQLite format for SlideRunner19. Each annotation provides the following information (a loading sketch follows the list):

  • The annotation source slide name

  • A universally unique identifier (UUID)

  • The absolute bounding box coordinates (x1,y1,x2,y2) on the WSI

  • The EIPH grade in a discrete range from zero to four
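A minimal loading sketch for the comma-separated files is given below; the file name and exact column names are assumptions based on the fields listed above.

```python
# Hedged loading sketch for the comma-separated annotation files; the column
# names are assumptions derived from the fields listed above.
import pandas as pd

anns = pd.read_csv("annotations_densitymap.csv")  # hypothetical file name
# Expected columns (assumed): slide, uuid, x1, y1, x2, y2, grade
per_grade = anns.groupby("grade").size()
print(per_grade)                        # label counts per hemosiderin grade
print(anns["slide"].nunique(), "slides")
```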

Additionally, we provide a Docker build with all packages installed to download the WSIs and annotations for reproducing our experiments. Table 1 gives an overview of the dataset's meta-data. Detailed per-image statistics can be examined in the supplementary Table images_meta_data.xlsx. The dataset column distinguishes between the MELE, EALE, EALF and EALH datasets. The version column indicates the processing step (Inference, Cluster, Screening, DensityMap) to which the following statistical data refer. The EIPH score was calculated by the method of Doucet and Viel17.
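For illustration, a common formulation of the slide-level hemosiderin score following Golde et al.7 is the mean grade of the counted macrophages scaled by 100, yielding values between 0 and 400; the sketch below uses this formulation and should not be read as the authors' exact computation.

```python
# Hedged sketch of a slide-level hemosiderin score: mean grade of the counted
# macrophages scaled by 100 (an illustration, not the authors' exact method).

def eiph_score(grades):
    """grades: iterable of discrete grades (0-4), one per macrophage."""
    grades = list(grades)
    return 100 * sum(grades) / len(grades)

print(eiph_score([0, 0, 1, 1, 2]))  # -> 80.0
```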

Table 1 Overview of the dataset meta-data, including the species, the dataset name, the number of slides, the version of the post-processing refinement step (Inference, Cluster, Screening, DensityMap) and the number of labels per hemosiderin score.

In total, the expert screened 51,110 patches on 74 WSIs from three species, covering a total area of 5,196.17 mm². This resulted in 297,383 annotated macrophages/hemosiderophages, making this the largest published multi-species dataset of macrophages/hemosiderophages and one of the largest cytology WSI datasets in general.

Technical Validation

To gain a deeper understanding of the data and to establish a baseline for future studies, we conducted multiple experiments. Firstly, during the screening phase, we noticed that the expert (C. A. B.) deleted a high number of his own manually created annotations from the dataset of our previous work (the MELE dataset). Furthermore, our deep learning method, which was trained on these initial annotations, also introduced many false positive annotations, even at conservative thresholds. This effect was amplified by the decision to configure the model with a relatively high sensitivity in order to miss as few cells as possible. The observation that the initial object detection model was configured to have a high sensitivity (and therefore a low specificity) is backed by the statistic that only 560 new hemosiderophages were introduced in the screening phase of the dataset development (EALH, EALF, EALE) compared to 229,054 deleted cells. The combination of these effects caused the manual deletion of large quantities of annotations, as shown in Table 1. To quantify and compensate for this effect, in the first experiment described below, we investigated whether the trained deep learning model can be efficiently adapted to this change in annotation behaviour by retraining on the updated annotations from the MELE dataset created for this publication. In a second experiment, we evaluated inter-species domain transfer and performed an inter-species cross-validation study. This experiment is followed by an ablation study to estimate the quantity of annotations needed to train an accurate EIPH object detector. To evaluate the object detection performance of the models trained in our experiments, we used the mAP metric introduced in the 2007 PASCAL VOC challenge23.
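For reference, a minimal sketch of the 11-point interpolated average precision used in the 2007 PASCAL VOC evaluation is given below; the input arrays are illustrative.

```python
# Hedged sketch of the PASCAL VOC 2007 evaluation style: average precision is
# the mean of interpolated precision at 11 equally spaced recall levels;
# mAP is then the mean over classes.
import numpy as np

def voc07_ap(recall, precision):
    """11-point interpolated AP, given matching recall/precision arrays."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

r = np.array([0.1, 0.4, 0.7, 0.95])
p = np.array([1.0, 0.9, 0.8, 0.6])
print(round(voc07_ap(r, p), 3))
```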

Reevaluation of the inference step

To investigate whether and how efficiently the deep learning model can adapt to the changed annotation behaviour, we trained models with the initial and the reviewed MELE dataset and optimised thresholds for the different datasets individually. To make the results comparable to the initial publication4, we used the original 17 slides, including the slide with fungal contamination. We applied the customised RetinaNet architecture with a ResNet-18 backbone pre-trained on ImageNet. The network was trained with the Adam optimiser using a maximal learning rate of 0.001 until the validation loss started to increase. As a metric to quantify how effectively the deep learning model adapted to the new annotations, we calculated the mAP score with an intersection over union (IoU) >0.5 and compared total cell numbers. With the new annotations, the mAP score increased by five percentage points, from 0.66 to 0.71, compared to the object detection results reported in earlier work. This indicates that the expert's annotations are more consistent. The optimal threshold calculated on the validation set increased from 0.35 to 0.65 for equine samples and from 0.35 to 0.80 for human and feline slides. The total number of detections decreased from originally 585,600 to 301,109 (ground truth: 297,383), while the number of false negatives increased from 560 to 7,351 with respect to the final dataset. In conclusion, the deep learning model is able to adapt to new annotation behaviour; a stronger focus on finding optimal thresholds could reduce manual interaction, but introduces the risk of overlooking false-negative annotations.
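The IoU matching criterion can be sketched as follows; the box coordinates are illustrative.

```python
# Hedged sketch of the IoU criterion used for matching: a detection counts as a
# true positive when its IoU with a ground-truth box exceeds 0.5.

def iou(a, b):
    """Boxes as (x1, y1, x2, y2); returns intersection over union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143, below the 0.5 cutoff
```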

Inter-species domain adaptation

As shown by Bullone et al.15, equines can be used to better understand human asthma on an immunopathological level. To support scientific research in this direction, the use of machine learning models across species is of great scientific and economic importance. To investigate the potential and limitations of transferability across different species, we carried out a 3 × 3 cross-validation in which we trained on one species and validated on each species separately. To support the comparability of the results across species with their varying numbers of available WSIs, and to keep the computational effort within reasonable limits, we decided to use only five WSIs for training and two other WSIs for validation (see Table 2). This is further motivated by the availability of only seven feline WSIs. For the other two species, the training and validation subsets were selected by using the seven most balanced slides with respect to the number of grade zero and grade one macrophages/hemosiderophages (see Table 2). We used this WSI sampling strategy to minimise the effect of an imbalanced dataset, which could negatively impact the transferability study. Because the feline WSIs only contain hemosiderophages of grades zero and one, we used only these two classes for the cross-validation of all species, reasoning that the transferability of these two classes can be generalised to the remaining classes. Example patches and results from this cross-validation experiment are visualised in Fig. 3. The experiment achieves the best results when the source is equal to the target domain, with an average mAP of 0.90 (equine 0.88, human 0.90, feline 0.91). Training on equine slides resulted in an mAP of 0.88 on human data, which indicates that a domain transfer without adaptations to the deep learning model might be possible. Further studies need to show whether these algorithms can be used for specific human diseases such as COVID-1924. When the source domain is human or feline, the average inter-species mAP is 0.80 (min 0.77, max 0.81). Moreover, EIPH can also affect other species such as dogs11, and future studies may evaluate whether the described domain transfer can be reproduced.
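The protocol of this experiment can be sketched as follows; train_model and evaluate_map are stand-ins for our training and evaluation pipeline, not an actual API.

```python
# Hedged sketch of the 3 x 3 cross-validation protocol: a model is trained on
# five slides of one species and evaluated on the two validation slides of
# every species. The two functions below are stand-ins, not the authors' API.

def train_model(train_slides):
    return {"trained_on": train_slides}   # stand-in for RetinaNet training

def evaluate_map(model, val_slides):
    return 0.0                            # stand-in for the mAP evaluation

species = ["equine", "human", "feline"]
splits = {s: ([f"{s}_train_{i}" for i in range(5)],
              [f"{s}_val_{i}" for i in range(2)]) for s in species}

results = {}
for source in species:                    # source training domain
    model = train_model(splits[source][0])
    for target in species:                # target evaluation domain
        results[(source, target)] = evaluate_map(model, splits[target][1])
print(results)
```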

Table 2 The filenames of the five training and two validation slides (below the double line) per species used for the ablation and inter-species cross-validation studies.
Fig. 3

Each of the nine panels shows on the left the source training species and at the top the target species, together with the obtained mAP. Green bounding boxes represent grade zero hemosiderophages, while red ones show grade one.

Ablation study

Annotating WSIs manually is a laborious and expensive task. Therefore, one of the most interesting questions in creating datasets and training deep learning models is the number of WSIs and annotations needed to reach converging performance. To answer this question, we started training for each species separately on one uniquely sampled patch (size 1024 × 1024 pixels; number of annotations: mean = 6.19, SD = 3.74) from one slide and then doubled the number of patches from the same slide each time training reached convergence on the validation set. The training set was chosen to have a balanced number of grade zero and grade one hemosiderophages. The cell-covered area of each WSI contains on average 1,000 unique patches; we therefore continued the ablation study using up to five different WSIs for training after reaching 1,024 training patches on the first slide. To increase the comparability between our experiments, we used the same network, parameters, annotations and slides as described in the section on domain adaptation. As visualised in Fig. 4, the performance of the model increased significantly, independently of the species, up to 128 patches with around 1,000 unique hemosiderophages, and started to converge afterwards, even when additional WSIs were introduced and the total number of annotations was increased up to twentyfold. As described above, to keep the experiments between species comparable, we only used grade zero and grade one hemosiderophages and therefore reason that around five hundred cells per type are sufficient to reach convergence. To put this into perspective: the 12 human samples contain only 127 grade four hemosiderophages, making the shown inter-species domain transfer highly valuable for creating deep learning models for human data. This is especially relevant for P-Hem, which has a particularly high incidence in horses.
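The doubling schedule of this ablation can be sketched as follows; the exact switching point to multiple slides is an illustrative assumption.

```python
# Hedged sketch of the ablation schedule: the number of 1024 x 1024 training
# patches is doubled after each converged run, switching from a single slide
# to up to five slides once 1,024 patches are exceeded.

def ablation_schedule(max_patches=5 * 1024):
    n = 1
    while n <= max_patches:
        n_slides = 1 if n <= 1024 else min(5, -(-n // 1024))  # ceiling division
        yield n, n_slides
        n *= 2

for n_patches, n_slides in ablation_schedule():
    print(f"{n_patches:5d} patches from {n_slides} slide(s)")
```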

Fig. 4

Results of the ablation study using our customised RetinaNet object detector on an increasing number of human, equine and feline training patches of size 1024 × 1024 pixels from one WSI or up to five complete WSIs. The boxes represent the total number of hemosiderophages used for training, in combination with the mAP graphs for each species.

Usage Notes

Due to the multiple dependencies of our repository, we provide a Docker file to streamline the setup process and install all packages necessary for tracing and reproducing our results. The most prominent dependencies are: fast.ai25, a deep learning library built on PyTorch26; matplotlib27 for visualisation; object-detection-fastai with our custom RetinaNet implementation; and OpenSlide28.

The repository is structured as follows: At the top level, the “Download.ipynb” Jupyter notebook automatically downloads all slides and annotations from figshare22. The folder Statistics contains notebooks which analyse the dataset annotations and general information about the slides. Inference contains code to train the described models and perform inference on slides. Regression trains the regression models to predict a continuous EIPH grade and is used for creating the density maps. Cluster contains code to create custom annotation maps and synchronise the generated images and annotations with EXACT.
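As a usage illustration, individual patches can be read from the downloaded SVS files with OpenSlide; the file name below is hypothetical.

```python
# Hedged sketch: after running Download.ipynb, patches can be read from the
# SVS files with OpenSlide. The slide path is a hypothetical example.
import openslide

slide = openslide.OpenSlide("example_equine_slide.svs")   # hypothetical path
print(slide.dimensions)                                   # (width, height) at level 0
patch = slide.read_region((10_000, 10_000), 0, (1024, 1024)).convert("RGB")
patch.save("patch.png")
```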