Terahertz refractive index-based morphological dilation for breast carcinoma delineation

This paper reports investigations into combining the refractive index and morphological dilation to improve breast tumour margin delineation during conserving surgeries. The refractive index maps of invasive ductal and lobular carcinomas were constructed by solving an inverse electromagnetic problem. Morphological dilation combined with refractive index thresholding was conducted to classify the tissue regions as malignant or benign. A histology routine was conducted to evaluate the performances of various dilation geometries associated with different thresholds. It was found that the combination of a wide structuring element and a high refractive index threshold improved the correctness of tissue classification in comparison to other configurations or to classification without dilation. The method reports a sensitivity of around 80% and a specificity of 82% for the best case. These results indicate that combining the fundamental optical properties of tissues denoted by their refractive index with morphological dilation may open routes to define supporting procedures during breast-conserving surgeries.

www.nature.com/scientificreports/ to favor low recurrence risk and to attain more widely clear surgical margins. In return, a second surgery concomitantly increases the morbidity rate. So far, different research teams worldwide have reported the ability of terahertz imaging and spectroscopy to discriminate between healthy and malignant breast tissues. These studies were primarily conducted on formalin-fixed and paraffin-embedded breast tissue 32,33. Such investigations opened the route for clinical studies on freshly excised breast volumes 18,22,24. The capability of terahertz radiation to demarcate normal and abnormal tissue regions was originally attributed to free-water content. Indeed, free-water molecules have been shown to present a specific permittivity step around 900-GHz 5. Moreover, cancer tissues are known to exhibit a greater free-water content than normal tissues 34. However, further studies have suggested that the origin of contrast could not be solely attributed to water, because specific dielectric features exhibited by breast tissues in the low terahertz frequency band were not observed in the dielectric profile of water 35. Hence, it has been suggested that specific functional groups play a potential role 22. Globally, the refractive index of breast cancer tissues has been shown to be higher than that observed for normal tissues over the terahertz band. In contrast, the related absorption coefficient was reported as an unsatisfactory parameter for demarcation 35,36. Additionally, the contrast level between healthy and malignant tissues depends on cancer cell density. In fact, since the resolution of any light-based imager remains dictated by the diffraction limit, two objects separated by a distance smaller than the wavelength cannot be distinguished. For instance, the spatial resolution of a far-field imaging system operating at 1-THz will be limited to 0.3-mm.
Hence, the respective responses to the external terahertz stimulus of two biological entities separated by a distance smaller than 0.3-mm will be averaged. Considering that the typical diameter of a eukaryotic cell is of the order of tens of microns, it can be concluded that such a terahertz imager cannot resolve entities at the cellular level. It has, however, been demonstrated that a computational imaging system operating in a total internal reflection geometry could resolve features with a sub-wavelength lateral resolution 37. While it can be expected that high densities of cancer cells will lead to a well-defined demarcation, the dielectric response of isolated abnormal groups may be blurred by the healthy surroundings, ultimately leading to failure of the recognition analysis. The diffraction limit of resolution not only complicates recognition in areas sparsely populated by cancer cells 38, but also raises delicate questions about the exact frontier between two well-localized normal and abnormal regions. Indeed, rather than depicting a sharp contrast between areas, the obtained cliché may inevitably exhibit a smooth gradient from one area to another as a result of class-overlapping. That is particularly limiting when it comes to providing a pixel-by-pixel diagnosis based on the information collected.
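The 0.3-mm figure quoted for a 1-THz far-field system can be checked with a short calculation. The sketch below simply takes the free-space wavelength as a rough proxy for the diffraction-limited resolution; the function name is illustrative and the proxy itself is an assumption, since the exact limit depends on the optics.

```python
# Free-space wavelength, used here as a rough proxy for the far-field
# diffraction-limited resolution (an assumption; real optics add a factor
# that depends on the numerical aperture).
C_VACUUM = 299_792_458.0  # speed of light in vacuum, in m/s


def wavelength_mm(frequency_hz: float) -> float:
    """Return the free-space wavelength in millimetres."""
    return C_VACUUM / frequency_hz * 1e3


if __name__ == "__main__":
    for f in (0.55e12, 1.0e12):
        print(f"{f / 1e12:.2f} THz -> {wavelength_mm(f):.3f} mm")
```

At 550-GHz the wavelength is longer than at 1-THz, which illustrates why a lower operating frequency trades spatial resolution for other benefits.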
The present work proposes a new approach for the clinical classification of breast tissue pixels that overcomes the aforementioned limitations. The method is based on the extraction of the terahertz refractive index map of freshly excised samples followed by morphological dilation. A high value of the refractive index has been reported as a reliable measure of the presence of cancer within a tissue 22,24. Morphological dilation is a part of set theory 39 and is commonly applied to images characterized by ambiguity and vagueness 40. It consists of expanding a given shape contained in the input image. In biology, morphological processing has notably been employed to count blood cells during blood smear tests 41, to isolate female gametocytes 42 and to segment skin cancers 43.
Operating dilation from regions exhibiting a higher refractive index should allow bypassing class-overlapping limitations. Such a process is referred to as terahertz refractive index-based morphological dilation and operates as follows: (1) the refractive index map of a freshly excised breast tissue is extracted through a specific objective function minimization; (2) a refractive index threshold is defined such that pixels exhibiting a refractive index higher than the threshold are classified as malignant while others are classified as benign; (3) morphological dilation is used to spread the malignant zones to the neighborhood.
To conduct these investigations, different freshly excised breast tissues have been scanned in reflection geometry by means of a terahertz spectrometer. The refractive index maps have been extracted. Different refractive index thresholds and dilation shapes have been tested. The related pixel classifications have been compared to those provided by a pathologist. Finally, the sensitivity and specificity of each combination of threshold-dilation shape have been derived.
The paper is organized as follows: "Experimental framework" describes the experimental framework to acquire raw terahertz images of the freshly excised breast tissues. "Refractive index map" describes the mathematical background to extract the refractive index map. "Morphological dilation" defines the morphological dilation and the respective dilation shapes employed in the study. "Image registration" describes the registration of obtained images with respect to the pathological cliché. "Diagnosis compliance" details the evaluation of compliance between the classifications provided respectively by the pathologist and the reported strategy. "Results" presents the results for different samples. Finally, "Conclusions" presents the conclusions.

Experimental framework
The experimental protocol was assessed and approved by the ethics committee of the Bergonié Institute. Human tissue analyses were conducted in accordance with the fundamental ethical principles stipulated in the Declaration of Helsinki and its later revisions. Written informed consent was collected from each patient undergoing breast surgery, stipulating their agreement regarding the use of their tissues for research purposes.
Breast tissue samples. Following surgery, breast excisions were cut into slices of a few millimeters and kept in physiological serum before measurement to preserve the moisture content and delay necrosis. A maximum of one hour elapsed between the end of surgery and the start of the terahertz acquisition. Once the measurement was complete, excised tissue samples were placed in a formalin-buffered solution. This process enabled the subsequent histology routine used to compare the diagnoses provided by the reported method and by the pathologist. The biological samples analyzed using the reported method were obtained from three different patients. One sample was excised from each of these patients.

Measurement setup.
Time-domain terahertz pulsed images were acquired with a TPS3000 spectrometer (TeraView Ltd, Cambridge, UK) operating in reflection geometry. In such systems, terahertz pulses are generated by the activation of a GaAs photoswitch. A photoswitch consists of a discontinuous metallic antenna patterned onto a photoconductive layer. Ultra-fast near-infrared pulses with a photon energy greater than the semiconductor band gap are focused onto the gap between the two electrodes forming the photoswitch. The incident pump laser thus propagates within the photoconductive layer and generates electron-hole pairs through absorption. These photocarriers are then accelerated by the electric field of the biased antenna. The acceleration of these charges produces a transient current that drives the metallic antenna and is eventually emitted as a broadband terahertz pulse. The bandwidth directly depends on the lifetime of the carriers before recombination. The carrier lifetime in the GaAs crystal is on the subpicosecond scale, hence enabling pulses with a bandwidth ranging from 200-GHz to 2-THz. The schematic of the experimental set-up is given in Fig. 1. The path of the terahertz pulses is governed by two planar mirrors and a knife-edge right-angle prism mirror (KERAPM). The terahertz pulses are focused via a polytetrafluoroethylene (PTFE) lens on the tissue sample, which is supported by a 2-mm thick non-birefringent C-cut sapphire substrate (see Supplementary Information, Supplementary Fig. 1). The maximum incident angle of the terahertz pulses is 10°. The reflections at both the air-sapphire and sapphire-tissue interfaces are then focused onto a photoconductive antenna detector. The detector is gated by the same ultra-fast near-infrared pulses used for terahertz wave generation, picked off with a beam splitter. These pulses are, however, delayed in time with a mechanical delay line.
The periodic variation of the delay line length allows a time-gated detection of the terahertz pulses reflected by the object. In order to reduce the natural absorption of terahertz pulses by water vapor molecules, the terahertz path is confined within a nitrogen chamber.

Refractive index map
To extract the refractive index from a raw frequency image, a reference electric field has to be recorded. The reference electric field $E_r(\omega)$ refers to the electric field generated by the acquisition system. The reference measurement records the electric field of the reflection from a metal plate located where the sapphire substrate sample holder is to be positioned for tissue imaging. From the reference electric field $E_r(\omega)$, the experimental transfer function $T_s(\omega)$, which is a measure of the disturbance experienced by the incident field as a result of the interaction with the sample, can be calculated:

$$T_s(\omega) = \frac{E_s(\omega)}{E_r(\omega)} \qquad (1)$$

with $E_s(\omega)$ the sample frequency-dependent electric field. The shape of the transfer function $T_s(\omega)$ is a function of the refractive index $n(\omega)$ and the extinction coefficient $\kappa(\omega)$ of the sample under inspection. $E_s(\omega)$ depends on the Fresnel coefficients in transmission $T(\omega)$ and in reflection $R(\omega)$, and on the propagation coefficients $P(\omega, d)$, with $d$ being the thickness of the sapphire substrate. The Fresnel coefficients $T(\omega)$ and $R(\omega)$, as well as the propagation terms $P(\omega, d)$, relate to the refractive index $n(\omega)$ and the extinction coefficient $\kappa(\omega)$ through expressions in which $a$ and $b$ are the indices of the respective media, $\tilde{n}$ is the complex refractive index defined as $\tilde{n} = n(\omega) - j\kappa(\omega)$ and $c$ is the velocity of light in vacuum. Although the extinction coefficient $\kappa(\omega)$ is involved in the calculation of the transfer function $T_s(\omega)$, no significant differences have been reported in the literature between normal and abnormal tissue extinction 22,35. Hence, solely the refractive index is further considered as a possible intrinsic parameter for demarcation.
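For orientation, the textbook normal-incidence forms of the Fresnel and propagation coefficients commonly used in terahertz time-domain parameter extraction are recalled below. They are given as an assumption about the general shape of the coefficients referred to above, not as the exact expressions of this work: the subscripts $a$ and $b$ label the two media, $\tilde{n} = n - j\kappa$ follows the sign convention of the text, and the small (at most 10°) incidence angle of the setup is neglected.

```latex
% Standard normal-incidence forms (assumed; the 10-degree incidence is neglected)
T_{ab}(\omega) = \frac{2\,\tilde{n}_a(\omega)}{\tilde{n}_a(\omega) + \tilde{n}_b(\omega)}, \qquad
R_{ab}(\omega) = \frac{\tilde{n}_a(\omega) - \tilde{n}_b(\omega)}{\tilde{n}_a(\omega) + \tilde{n}_b(\omega)}, \qquad
P_a(\omega, d) = \exp\!\left(-j\,\frac{\tilde{n}_a(\omega)\,\omega\,d}{c}\right)
```

With $\tilde{n}_a = n_a - j\kappa_a$, the propagation term splits into a phase delay $\exp(-j n_a \omega d / c)$ and an attenuation $\exp(-\kappa_a \omega d / c)$, which is why both $n$ and $\kappa$ enter the transfer function even though only $n$ is retained for demarcation.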

Map extraction.
The extraction of the complex refractive index $\tilde{n}(\omega)$ at each pixel from the experimental transfer function $T_s(\omega)$ can be performed by solving an inverse electromagnetic problem. Inverse electromagnetic problems usually minimize a specific convex objective function. This function denotes the discrepancies between the experimental waveform $E_s(\omega)$ and the waveforms $E_{c_x}(\omega)$ successively computed from a set of candidate parameters, where the index $x$ refers to the $x$-th candidate tested. The candidate waveforms $E_{c_x}(\omega)$ are computed as described in 44. The corresponding transfer functions $T_{c_x}(\omega)$ are calculated in the same way as described by (1). The discrepancies $\delta M_x(\omega)$ between the experimental transfer function $T_s(\omega)$ and the computed transfer functions $T_{c_x}(\omega)$ are measured through a natural logarithmic ratio, which is favored here over a standard difference as it is more penalizing. Finally, the objective function $\chi(\omega)$ is minimized over a set of candidate parameters. It was stated before that the sample is supported by the sapphire substrate. Instead of extracting the properties of the sapphire substrate for each pixel, these properties were extracted upstream, in the absence of a sample, following the same minimization process. The sapphire properties are provided in Supplementary Information, see Supplementary Fig. 2. Finally, applying the above process to the electric field stored in each pixel of the sample image allows the refractive index map to be constructed. Once the refractive index map is obtained, it is converted to a binary map that shows areas that are considered malignant or benign. To do so, a threshold within the range of refractive index values has to be set.
Depending on the defined value for the threshold, one may progressively increase or decrease the extent of areas classified as malignant, since pixels with a refractive index higher than the threshold are classified as cancerous. A schematic of the process is given in Fig. 2.
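The thresholding step can be illustrated with a minimal sketch. The refractive index values below are hypothetical and the function name `binarize` is introduced for illustration only; the single rule taken from the text is that a pixel is labeled malignant when its refractive index is strictly higher than the threshold.

```python
def binarize(index_map, threshold):
    """Return a binary map: 1 (malignant) where n > threshold, else 0 (benign)."""
    return [[1 if n > threshold else 0 for n in row] for row in index_map]


# Hypothetical 3 x 4 refractive index map (values invented for illustration).
index_map = [
    [1.9, 2.0, 2.2, 2.7],
    [2.0, 2.3, 2.6, 2.8],
    [1.8, 2.1, 2.4, 2.5],
]

if __name__ == "__main__":
    for threshold in (2.1, 2.6):
        mask = binarize(index_map, threshold)
        print(threshold, sum(map(sum, mask)), "pixels labeled malignant")
```

Raising the threshold shrinks the area labeled malignant, which is the behavior exploited later when a high threshold is paired with a wide structuring element.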
Operating frequency. Although the refractive index is often referred to as an optical constant, its profile varies as a function of frequency. Previous studies have reported the terahertz frequency-dependent refractive index values of abnormal and normal breast tissues 36. Overall, the difference between these values was shown to be the highest roughly between 300- and 700-GHz. Hence, rather than investigating the entire band, the classification was operated at 550-GHz, as a good trade-off between signal-to-noise ratio (SNR) and the better spatial resolution offered by higher frequencies 45. However, naively classifying pixels via the refractive index exhibited at 550-GHz may hardly be relevant. In particular, the refractive index extracted at the edges of malignant regions with low cell density may present values close to those of healthy tissues. Therefore, morphological dilation is introduced to overcome this limitation.

Morphological dilation
Prior to dilation, the refractive index map is converted to a binary image as described in the previous section. The dilation can therefore be referred to as binary dilation. The dilation consists of a shift-invariant addition, denoted "⊕", in the sense of Minkowski 46. Mathematically, let $P$ be the set that contains the pixels $(x, y)$ of the imaged tissue. The binary dilation $\partial_S(P)$ of $P$ by a shape $S \subset \mathbb{Z}^2$, also referred to as a structuring element, is given by:

$$\partial_S(P) = P \oplus S = \{\, p + s \mid p \in P,\ s \in S \,\}$$

where each $s \in S$ produces a translation from $P$ to $\partial_S(P)$. Supposing the matrix $P$ and the structuring element $S$ as represented in Fig. 3, the matrix $\partial_S(P)$ is obtained by superimposing the center of $S$ on each pixel of $P$ that has a value of 1.
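Binary dilation as a Minkowski addition can be written very compactly when the foreground pixels and the structuring element are both stored as sets of integer coordinates. The sketch below assumes this set representation; the cross-shaped element and the names `dilate` and `CROSS` are hypothetical and do not correspond to the geometries of Fig. 3 or Fig. 4.

```python
def dilate(pixels, element):
    """Minkowski addition: translate every foreground pixel by every offset."""
    return {(x + dx, y + dy) for (x, y) in pixels for (dx, dy) in element}


# Hypothetical cross-shaped structuring element centred on the origin.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

# Two isolated foreground pixels (value 1 in the binary map).
foreground = {(2, 2), (5, 5)}
dilated = dilate(foreground, CROSS)

if __name__ == "__main__":
    print(sorted(dilated))
```

Because the origin belongs to the element, the original pixels are always contained in the result, which matches the picture of superimposing the center of the element on every pixel of value 1.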
In the present work, three different structuring elements have been considered to dilate the binary refractive index map. They are referred to as the $S_1$, $S_2$ and $S_3$ classifiers. Their spatial properties are shown in Fig. 4. These specific geometries allow the classifiers to act in the close vicinity of a starting pixel and with the same impact in all directions.
Therefore, depending on the classifier considered, a pixel may be attributed to the malignant group if at least one of the components of the structuring element $S_n$, where $n \in \mathbb{N}^*$, covers a pixel whose refractive index is higher than the defined refractive index threshold. Alternatively, the structuring elements can be seen as the area of influence of a cancerous pixel. Consequently, pixels with a refractive index lower than the threshold but situated in such an area of influence are turned into malignant pixels. It is, however, important to note that the process is constrained to a single dilation and therefore newly classified malignant pixels cannot, in turn, exert a zone of influence.
In order to carry out the dilation and the registration steps that follow, it is essential to preserve the morphology of the imaged sample. To do this, the dilation procedure must be carried out with respect to the initial contour of the sample generated from a standard contouring algorithm, thus preventing the appearance of cancerous pixels outside the original surface of the sample. A schematic of the dilation process operated on a binary refractive index map is given in Fig. 5.
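A single dilation constrained to the sample footprint can be sketched as follows. The disk-shaped elements of increasing radius are a hypothetical stand-in for isotropic structuring elements (the actual geometries are those of Fig. 4), and the rectangular footprint replaces the contour that a contouring algorithm would provide.

```python
def disk(radius):
    """Offsets within Euclidean distance `radius` of the origin (isotropic)."""
    span = range(-radius, radius + 1)
    return {(dx, dy) for dx in span for dy in span
            if dx * dx + dy * dy <= radius * radius}


def constrained_dilation(malignant, sample, element):
    """Dilate the malignant set once, then discard pixels outside the sample."""
    grown = {(x + dx, y + dy) for (x, y) in malignant for (dx, dy) in element}
    return grown & sample


# Hypothetical 6 x 4 sample footprint with one malignant pixel in a corner.
sample = {(x, y) for x in range(6) for y in range(4)}
malignant = {(0, 0)}
result = constrained_dilation(malignant, sample, disk(2))

if __name__ == "__main__":
    print(sorted(result))
```

The dilation is applied to the thresholded set only, so newly added pixels never seed a further expansion, and `disk(0)` reduces to the origin alone, i.e. to classification by the refractive index threshold without dilation.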

Image registration
The classification images provided by the reported method and the ones given by the pathologist do not share the same coordinate system. Image registration is the process of bringing different images into one common coordinate system 47. Therefore, image registration is necessary to enable the comparison between the data sets. Indeed, the spatial resolution of the optical microscope used to acquire pathology clichés is far greater than that of the employed terahertz imager. Additionally, the orientations of the tissue sample in the terahertz image and in the clinical image are expected to differ, as the images are not acquired at the same angle. A simple pixel-by-pixel comparison is therefore not possible as it stands. Prior to comparison, images have to be resized and reoriented. The registration process is feature-based and solely involves image contours to avoid unintentional human bias. The different steps followed to register the images with respect to each other are described hereafter.
Contouring. Contour lines, also called isolines, can be calculated by interpolating the value of the scalar field found at each pointel of each pixel. An infinite number of isolines can however be delineated. The choice of the contour to define the spatial extent of the sample in the image remains therefore subjective. For each sample, the isoline that suited the visualized tissue area best was determined by carefully comparing the terahertz image and the different contour levels.
Resizing. As the resolutions of the images differ, it is necessary to resize the histology pictures. To do so, a bicubic interpolation is applied to the pathology images. Unlike bilinear interpolation, which is based on the four nearest pixels, bicubic interpolation takes into account a neighborhood of sixteen pixels. Therefore, bicubic interpolation provides a smoother histology slide than simple bilinear interpolation.
Reorientation. First, the contour of the terahertz image is manually and progressively rotated to bring it closer to the orientation of the pathology contour. Once the orientations approximately match, the pathology contour is iteratively rotated and the correlation between the two contour matrices is computed at each step. This consists in determining Pearson's correlation coefficient 48. The rotation angle providing the highest positive correlation is selected and the terahertz image is rotated correspondingly. The flow chart of these three pre-treatments for image registration, namely contouring, resizing and reorientation, is provided in Fig. 6.
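The correlation-driven search for the rotation angle can be sketched as below. This is a simplified illustration rather than the registration code of the study: it assumes two binary masks of identical size, uses a nearest-neighbour rotation about the image centre, and scans a coarse list of candidate angles; the names `pearson`, `rotate` and `best_angle` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally sized 2D arrays."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rotate(img, angle_deg):
    """Nearest-neighbour rotation about the image centre (same output size)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for q in range(w):
            # Inverse mapping: locate the source pixel of each output pixel.
            x, y = q - cx, r - cy
            src_x = round(c * x + s * y + cx)
            src_y = round(-s * x + c * y + cy)
            if 0 <= src_x < w and 0 <= src_y < h:
                out[r][q] = img[src_y][src_x]
    return out


def best_angle(moving, fixed, angles):
    """Return the candidate angle whose rotation correlates best with `fixed`."""
    return max(angles, key=lambda a: pearson(rotate(moving, a), fixed))


if __name__ == "__main__":
    mask = [[1, 1, 1, 1, 0], [1, 0, 0, 0, 0], [0] * 5, [0] * 5, [0] * 5]
    target = rotate(mask, 90)
    print(best_angle(mask, target, [0, 90, 180, 270]))
```

In practice a finer angular grid and an interpolating rotation would be preferable; the coarse grid here only keeps the example short.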
Image discrepancies issues. Although one can resize and reorient the two images with respect to each other, the pathology cliché and the terahertz image may not depict exactly the same information. First, while terahertz imaging is performed directly on freshly excised tissues, the pathology diagnosis is established after the histology routine. Moreover, to obtain the pathology image, the excised tissue is first fixed in neutral buffered formalin, then dehydrated in successive alcohol baths of increasing concentration, then cleared in a solvent before being infiltrated and finally embedded in paraffin wax. At this stage, the processed tissue is encased in a paraffin block that can be sliced into sections a few microns thick, which are deposited on glass slides. These tissue sections are deparaffinized, rehydrated and subsequently stained with hematoxylin and eosin dyes. Finally, they are dehydrated in alcohol and cleared in a solvent before being mounted with a coverslip. The embedding, the sectioning and the desiccation alter the global structure of the tissues. These alterations are collectively referred to as artefacts 49. Artefacts include loss of tissue area and details, folds and wrinkles, or cracks and holes. These alterations may result in misinterpretation as they modify the morphological structure of tissues. In addition, these artefacts may drastically limit the evaluation of the terahertz classification compliance (see Supplementary Information, Supplementary Fig. 3, for an example based on one of the tissues reported in the present work). However, histological slides remain the only available reference picture that allows one to examine the performances of the classifier under test. Overall, there are two ways to deal with such issues: (1) correcting the histology slides, at the risk of adding artificial information; (2) comparing the terahertz image directly with the raw pathology image, at the risk of underestimating the efficiency of the method.
The first way would require morphing the pathology image to match the terahertz picture. Some procedures to do so have been reported in the literature 50. However, these methods are cumbersome and the evaluation of the histological cliché reconstruction is often complicated since no perfect reference pathology image exists. As terahertz imaging remains a new technology for breast carcinoma delineation, the second approach was favored, at the risk of underestimating the efficiency of the classifiers.

Diagnosis compliance
Following the histology routine, pathology images are colored in different shades of blue and pink. The pathologist draws the contour of malignant areas based on his/her expertise. From the interpretation of the pathologist, the images were binarized and each pixel was classified either as benign or as malignant 51 .
Once both diagnosis images exhibit binary information and have the same size and orientation, the compliance between them can be evaluated. In case of discrepancies, the pathologist's classification prevails over the terahertz delineation. The present section describes how the ability of the classifiers was evaluated with respect to that of the pathologist.
Performance of the classification test. As each diagnosis presents binary information, four different cases can be distinguished:
• True negative: both methods classify a pixel as benign;
• True positive: both methods classify a pixel as malignant;
• False positive: the terahertz method indicates a malignant pixel while histology indicates a benign pixel;
• False negative: the terahertz method indicates a benign pixel while histology indicates a malignant pixel.
Hence, for each refractive index threshold associated with a specific classifier, one can fill the corresponding confusion matrices that highlight the classification procedure performances. In such error matrices, the rows represent the instances in the terahertz class, here the predicted class, while columns represent the actual diagnosis provided by histology examination 52 .
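Filling a confusion matrix from two registered binary masks amounts to counting the four cases pixel by pixel. The sketch below assumes masks of identical size with 1 for malignant and 0 for benign, the histology mask being the reference; the function name and the tiny example masks are hypothetical.

```python
def confusion_counts(predicted, reference):
    """Count the four outcomes pixel-wise (1 = malignant, 0 = benign)."""
    tp = fp = fn = tn = 0
    for p_row, r_row in zip(predicted, reference):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1  # both methods say malignant
            elif p:
                fp += 1  # terahertz says malignant, histology says benign
            elif r:
                fn += 1  # terahertz says benign, histology says malignant
            else:
                tn += 1  # both methods say benign
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical 2 x 3 masks: terahertz prediction and histology reference.
terahertz_mask = [[1, 1, 0], [0, 0, 0]]
histology_mask = [[1, 0, 0], [1, 0, 0]]

if __name__ == "__main__":
    print(confusion_counts(terahertz_mask, histology_mask))
```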
From these matrices, the effectiveness of the classification method is assessed by creating the receiver operating characteristic (ROC) curve 53 for each classifier. The ROC curve represents the ability of the classifier to provide the correct diagnosis as the refractive index threshold varies. The ROC curve is obtained by plotting the true positive rate (TPR) as a function of the false positive rate (FPR). The TPR is defined as the number of true positives (TP) divided by all pixels classified by the pathologist as positives, i.e. true positives and false negatives (FN):

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (8)$$

The FPR is defined as the number of false positives (FP) divided by all pixels classified by the pathologist as negatives, i.e. false positives and true negatives (TN):

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \qquad (9)$$

The ROC curve can also be thought of as a plot of the sensitivity, which is equivalent to the TPR defined in Eq. (8), against the probability of false alarm, which can be calculated as (1 − specificity) and is defined in Eq. (9) 54. These measures of performance are favored as, unlike accuracy and error rate, they are not sensitive to changes in data distributions. Hence, both metrics can be used with imbalanced data 55. Finally, the best discrimination thresholds are selected as the ones that provide the highest sensitivity while preserving specificity, i.e. protecting the healthy tissue area from false diagnosis. It is noted that the aforementioned classification procedure is studied for the specific case of breast-conserving surgery. Hence, it is essential to preserve the healthy area while removing the malignant zones. Ultimately, the best classifiers are selected as the ones that provide the highest value of TPR − FPR, since higher values of this function indicate more accurate results.
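The selection of a threshold from TPR − FPR, together with the area under the ROC curve, can be sketched as follows. The confusion counts per threshold are invented for illustration and do not reproduce any result of this work; the trapezoidal rule anchored at (0, 0) and (1, 1) is one common way to estimate the AUC and is an assumption here.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, FPR) from the four confusion counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def best_threshold(counts_by_threshold):
    """Return the threshold that maximizes TPR - FPR."""
    def score(threshold):
        tpr, fpr = rates(*counts_by_threshold[threshold])
        return tpr - fpr
    return max(counts_by_threshold, key=score)


def auc(roc_points):
    """Trapezoidal area under (FPR, TPR) points, anchored at (0,0) and (1,1)."""
    pts = sorted(set(roc_points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical counts (TP, FP, FN, TN) for three refractive index thresholds.
counts = {
    2.0: (90, 50, 10, 50),
    2.4: (80, 20, 20, 80),
    2.8: (40, 5, 60, 95),
}
roc = [rates(*c)[::-1] for c in counts.values()]  # list of (FPR, TPR) pairs

if __name__ == "__main__":
    print("best threshold:", best_threshold(counts))
    print("AUC:", round(auc(roc), 3))
```

A low threshold yields high sensitivity at the cost of many false positives, while a very high threshold does the opposite; the TPR − FPR criterion picks the intermediate value in this toy example.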

Results
In this section, the effectiveness of the classifiers is evaluated on three freshly excised breast tissues. Two of these samples were diagnosed as invasive ductal carcinoma (IDC) and one was identified as an invasive lobular carcinoma (ILC). These samples are referred to as test samples TS#1, TS#2 and TS#3.
TS#1. TS#1 is an invasive ductal carcinoma. The pathology image with some highlighted pathology areas, the pathology mask, the raw terahertz image obtained at 550-GHz and the corresponding refractive index map are presented in Fig. 7.
It can be observed that the raw terahertz image as well as the refractive index map exhibit specific features that correspond to the pathology image. The regions depicted in Fig. 7a,b correspond to fibrous tissues that are included in an adipose matrix. Such regions are therefore expected to globally give rise to a lower refractive index than the regions classified as malignant, as depicted in Fig. 7c,d. Although such a refractive index seems overall lower than the refractive index of the tumour, it remains relatively close to it. Therefore, classifying only on the basis of the refractive index would certainly prove to be inefficient. The sensitivity and the specificity of each structuring element classifier for varying refractive index thresholds were calculated for TS#1. The corresponding ROC curves and TPR − FPR functions are given in Fig. 8.
Each $S_n$-dependent ROC curve is located to the left of the TPR = FPR line in Fig. 8, showing that the fraction of true positives is greater than the fraction of false positives. The use of the refractive index alone as a classifier ($S_0$) is clearly less efficient than associating the refractive index with a structuring element. Such a result is not surprising, as the classification then does not consider the neighborhood. While the ROC curves depicted in Fig. 8 do not make it obvious which classifier among $S_1$, $S_2$ and $S_3$ performs best, the TPR − FPR visualization indicates that the structuring element $S_3$ in association with a high refractive index threshold of about 2.6 is the most efficient rule of classification. This association provides a classification with a sensitivity of around 80% and a specificity of 82%. What is more, the wider the structuring element, the higher the refractive index threshold has to be set for good performances. Indeed, starting with a high refractive index threshold makes it possible to identify, in the first instance, tissue areas densely populated with cancer cells, while a broad structuring element makes it possible to efficiently spread the identification over a wide zone.
The corresponding AUC for each ROC curve, the TPR − FPR value, the sensitivity and the specificity for the two best refractive index thresholds are given in Table 1 (see Supplementary Information, Supplementary Table 1, for the complete list of performances). While the structuring elements $S_1$ and $S_2$ are less efficient than $S_3$ for both sensitivity and specificity, the $S_0$ classifier (refractive index alone) provides a slightly greater sensitivity, of about 83%, for a threshold of 2.1. However, the gain of 4% in sensitivity with respect to $S_3$ concomitantly costs 20% in specificity. This gain is arguably not worth such a drastic decrease in classification specificity. Alternatively, if one wants to increase the sensitivity while maintaining specificity at a reasonable level, the second best thresholds may offer a promising substitute. Using the second best threshold of 2.5 provided by $S_3$, an increase of 7% in sensitivity comes with a decrease of about 12% in specificity. By doing so, one reaches a sensitivity of 86%. The superimposition of the classification images from the reported method and the clinical one, corresponding to the performances listed in Table 1, is given in Fig. 9.
TS#2. The TS#2 sample is an invasive ductal carcinoma excised from a 67-year-old woman. The initial tumor site was found to be roughly 100 mm². In Fig. 10, the pathology image with some highlighted pathology areas, the pathology mask, the terahertz image at 550-GHz, and the refractive index map are shown. The pathology image as well as the pathology mask exhibit a hole, where no tissue is found. The lack of tissue in the middle of the section is not natural and highlights the issues previously reported regarding pathology images. Hence, this specific region is not considered for performance evaluation.
The ROC curves as well as the TPR − FPR function for different classifiers with various thresholds are given in Fig. 11. Similarly to the foregoing, all ROC curves are located to the left of the TPR = FPR line, hence proving that the fraction of true positives remains greater than that of false positives.
The most effective classifiers for conserving classification are the structuring elements $S_2$ and $S_3$, both for a threshold set at 2.1. While the combination of such a threshold with $S_2$ provides a sensitivity of 67% and a specificity of 70%, the same threshold operating with $S_3$ gives rise to a sensitivity of about 78% and a specificity of 57%. Hence, tuning the structuring element geometry would offer an interesting trade-off between specificity and sensitivity. The respective performances of each classifier applied to TS#2 are listed in Table 2 (see Supplementary Information, Supplementary Table 1, for the complete list of performances).
The classification maps involving each classifier and their respective best performing thresholds are shown in Fig. 12. These images show the improvement in classification brought by morphological dilation. Moreover, they highlight the difficulty of achieving good predictions at the outer margins. Low performance at the outer margins may come from the non-conformity of the information in these areas between the terahertz image and the histology picture. The most convincing hypothesis for this non-conformity is the tissue deformation imposed by the histological routine.

Table 1. Statistical measure of the performance of the classifiers and AUC. The sensitivity and specificity obtained for the best performing classifier-refractive index threshold association are given in bold.

TS#3. In Fig. 13, the pathology image with some highlighted pathology areas, the pathology mask, the terahertz image at 550-GHz, and the refractive index map are shown. The ROC curves plotted in Fig. 14 indicate a lower classification efficiency than for TS#1 and TS#2. The cause may be found in the distribution of cancer cells within the malignant zone, in comparison to the previously tested samples. While in the other cases the malignant zone was densely populated, cancer cells are found in small quantities and are inhomogeneously distributed over TS#3. Additionally, the histology routine may have altered the tissue morphology, as stated in "Image discrepancies issues".

The AUC values and the performances for each classifier are given in Table 3 (see Supplementary Information, Supplementary Table 1 for the complete list of performances). Despite the lower efficiency, the most accurate classifying strategy remains classifier 3 when associated with a refractive index threshold of 2.3. The TPR − FPR measure is around 0.28, with a sensitivity of 53% and a specificity of 76%. As already indicated, these performances are below those reached for the other study cases. Classifier 2 offers a greater specificity of 85% but simultaneously concedes 11% in sensitivity, thus falling below the critical threshold of half the malignant pixels being correctly classified. The weak density of cancer cells within a lobular carcinoma slice may lead one to opt for classifier 0 operating with a low refractive index threshold to maximize the sensitivity, despite a concomitant loss in specificity.
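The sensitivity, specificity and TPR − FPR figures discussed here are pixel-wise statistics of a binary terahertz classification against a binary pathology mask. A minimal sketch of that bookkeeping is given below. The function name `score`, the optional `valid` mask (used to skip pixels without a reliable pathology label) and the toy arrays are assumptions made for illustration and are not taken from the original software.

```python
def score(predicted, reference, valid=None):
    """Pixel-wise sensitivity, specificity and TPR - FPR of a binary map.

    predicted, reference: 2-D lists of booleans (True = malignant).
    valid: optional 2-D list of booleans; False pixels are ignored.
    Assumes at least one malignant and one benign reference pixel remain.
    """
    tp = fp = tn = fn = 0
    for r, row in enumerate(predicted):
        for c, pred in enumerate(row):
            if valid is not None and not valid[r][c]:
                continue  # no reliable reference label for this pixel
            truth = reference[r][c]
            if pred and truth:
                tp += 1
            elif pred and not truth:
                fp += 1
            elif not pred and truth:
                fn += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # 1 - false positive rate
    return sensitivity, specificity, sensitivity - (1.0 - specificity)


# Toy masks, not measured data.
predicted = [[True, True, False],
             [False, True, False]]
reference = [[True, False, False],
             [True, True, False]]
sens, spec, tpr_fpr = score(predicted, reference)
```

Repeating this scoring for every classifier and every refractive index threshold, then keeping the pair with the largest TPR − FPR, is one plausible way to select a best performing association of the kind reported in the tables.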
The corresponding classification images for TS#3, for each classifier and its best refractive index threshold, are shown in Fig. 15. The global classification clearly suffers from the spatial discrepancies between the fresh tissue and its histological state. Even though such differences are expected to be the main cause of the weak classification accuracy, the histological type of TS#3 may also raise difficulties. It can be assumed that the classification strategy performs better on ductal carcinoma cases than on lobular ones. However, the number of samples investigated is too small to confirm such a hypothesis.

Conclusions
In this paper, a new approach to support breast carcinoma margin delineation during surgery with terahertz radiation was proposed. The method relies on imaging the excised samples with a terahertz time-domain imager, followed by a segmentation based on the refractive index map extracted at 550-GHz and its morphological dilation. Morphological dilation was introduced to overcome the weakness of the refractive index alone as a classifier in tissue regions sparsely populated with cancer cells. Dilation was used to construct a zone of influence around pixels. Hence, tissue areas close to regions identified as malignant were successfully classified as cancerous despite a refractive index suggesting benign zones.
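The two steps summarized above, refractive index thresholding followed by morphological dilation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house implementation: the structuring element is taken to be a square of half-width `radius`, a pixel is flagged when its index is at or above the threshold, and the toy map values are invented for the example.

```python
def threshold_map(index_map, threshold):
    """Binary mask: True where the refractive index is at or above threshold."""
    return [[value >= threshold for value in row] for row in index_map]


def dilate(mask, radius):
    """Binary dilation with a (2 * radius + 1)-wide square structuring element."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Spread each flagged pixel over its zone of influence.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = True
    return out


def classify(index_map, threshold, radius):
    """Threshold the index map, then dilate; radius 0 means no dilation."""
    return dilate(threshold_map(index_map, threshold), radius)


# Toy refractive index map (illustrative values, not measured data).
toy_map = [
    [1.8, 1.9, 1.8, 1.7],
    [1.9, 2.4, 2.0, 1.8],
    [1.8, 2.0, 1.9, 1.7],
    [1.7, 1.8, 1.7, 1.7],
]
no_dilation = classify(toy_map, threshold=2.3, radius=0)
with_dilation = classify(toy_map, threshold=2.3, radius=1)
```

In the toy map only one pixel exceeds the high threshold, yet after dilation its neighbours are also labelled malignant, which mirrors the idea of pairing a high threshold with a wide structuring element.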
The performances of the classifications were assessed for three different samples. Overall, the association of a high refractive index threshold with a wide dilation was shown to be the most appropriate combination to keep both the sensitivity and the specificity of the method at decent levels for invasive ductal carcinoma. The best performances of the method were about 80% in sensitivity and 82% in specificity. On the contrary, the same methodology applied to an invasive lobular carcinoma showed lower performances. Various hypotheses were drawn to explain the classification failure. While lobular carcinomas are globally less populated by cancer cells than the ductal histology type, pathology image alterations may also have contributed by rendering the diagnosis evaluation tedious.

Table 2. Statistical measure of the performance of the classifiers and AUC. The sensitivity and specificity obtained for the best performing classifier-refractive index threshold association is given in bold.

The recognition performance for malignant areas could be improved. Indeed, the terahertz classification shows localized false negatives surrounded by true positives. Therefore, implementing a simple additional processing step that classifies as malignant the benign-predicted pixels that are encircled by cancerous ones would enhance the classification accuracy.
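The post-processing suggested above was not implemented in this work. One possible reading of it is a hole-filling step: a benign-predicted pixel is relabelled malignant when no path of benign pixels connects it to the image border. The sketch below follows that reading. The function name `fill_enclosed`, the 4-connectivity rule and the toy mask are assumptions made for illustration.

```python
from collections import deque


def fill_enclosed(mask):
    """Relabel benign pixels with no 4-connected benign path to the border."""
    rows, cols = len(mask), len(mask[0])
    reachable = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every benign pixel lying on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not mask[r][c]:
                reachable[r][c] = True
                queue.append((r, c))
    # Breadth-first flood fill through benign pixels only.
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < rows and 0 <= cc < cols:
                if not mask[rr][cc] and not reachable[rr][cc]:
                    reachable[rr][cc] = True
                    queue.append((rr, cc))
    # Benign pixels never reached are encircled by malignant ones.
    return [[mask[r][c] or not reachable[r][c] for c in range(cols)]
            for r in range(rows)]


# Toy prediction (True = malignant) with one encircled benign pixel.
predicted = [
    [False, False, False, False, False],
    [False, True,  True,  True,  False],
    [False, True,  False, True,  False],
    [False, True,  True,  True,  False],
    [False, False, False, False, False],
]
filled = fill_enclosed(predicted)
```

Such a step only raises sensitivity when the enclosed pixels are truly malignant. It would also convert any genuinely benign island inside a tumour region into a false positive, so its effect on specificity would need to be checked against histology.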

Although higher-rank classifiers, i.e. classifier n with n > 3, have not been investigated, a more efficient structuring element could be found. Nevertheless, a high rank for a structuring element is accompanied by an equally high refractive index threshold. Thus, a reasonable assumption is that the refractive index threshold suitable for these higher-ranked classifiers lies beyond the optical properties of biological tissues.
This preliminary investigation of terahertz refractive index-based morphological dilation may open routes to refine strategies that improve the accuracy with which breast tumour margins are delineated. However, the field is still in its early stages and faces challenges due to alterations of the pathology reference image, which complicate the assessment of classification correctness. Additionally, performance comparisons with other classification algorithms are yet to be carried out and will be needed to pursue the proposed methodology. Finally, in the authors' opinion, the applicability of terahertz waves to breast carcinoma margin demarcation still requires further studies to evaluate its feasibility in the clinical environment.

Methods
Numerical procedures were conducted with in-house software, written with the MATLAB development framework. The software follows mathematical procedures described in this paper and our preceding works.

Table 3. Statistical measure of the performance of the classifiers and AUC. The sensitivity and specificity obtained for the best performing classifier-refractive index threshold association is given in bold.

Figure 15. TS#3 tissue sample classification maps at 550-GHz for classifiers 0, 1, 2 and 3 and their respective first two best thresholds. "Not applicable" refers to regions where the binary pathology classification and the binary terahertz classification image do not match spatially. The values listed in each box stand respectively for the refractive index threshold, the true positive rate and the false positive rate.