Introduction

The main purpose of forensic anthropology is to reconstruct the biological profile of deceased individuals; that is, to estimate sex, age at death, ancestry and stature from skeletal remains1. Forensic sex prediction has occupied a large place in the literature since the late 1960s, and identification of sex from the human skeleton has been described as an important factor, even a key element, in both forensic medicine and bio-archaeological contexts2,3,4. Sex prediction is an indispensable part of the biological profile. Anthropologists use skeletal markers that vary between the sexes to determine sex5,6.

Sex estimation has been studied in the literature with almost every bone of the human skeleton, and its accuracy has frequently been compared across different populations. Various bones such as the femur2,3, patella7,8, mandible9, calcaneus10, metatarsal bones and phalanges11,12, occipital condyle13, hand bones14,15 and sternum16 have been used in sex prediction. A large number of studies report that the cranium and pelvis, which are considered the most dimorphic parts of the skeleton, can be used in sex prediction with different assessment methods4,10,16,17,18,19.

Identification of sex includes some inherent limitations that are affected by different factors such as ethnicity, socio-economic status, diet and geographic location. The inability to generalize the results obtained from a specific population, especially in skeletal parts such as cranium, to other populations and the need for population-specific studies increase the interest in cranium and mandible in sex determination4,20. For these reasons, all techniques reported for identifying sex are specific to related studies and they may not be applicable to different samples or data sets3.

Methods such as discriminant analysis, machine learning (ML) algorithms, support vector machines and artificial neural networks are commonly used in sex prediction studies that examine these bones2,3,7.

ML is a modern classification approach that is used extensively in engineering and is gradually being integrated into the field of health. ML algorithms are classified as supervised, unsupervised and reinforcement learning. Supervised learning algorithms learn the relationship between inputs and known outputs, unsupervised learning algorithms find structure in data for which no output information is available, and reinforcement learning algorithms learn to map inputs to actions that yield a desired outcome20. The Decision Tree (DT) algorithm is a simple, powerful, fast and frequently used data mining classification algorithm that processes the inputs by splitting them repeatedly8,21,22,23. Logistic Regression (LR) is a classification algorithm that uses the sigmoid function to model the relationship between the parameters and the output probability. Random Forest (RF) is an ensemble algorithm that builds more than one decision tree within the system24. Extra Tree Classifier (ETC) has been described as superior to RF, an advantage attributed to the random splitting of nodes and to using all data as a training set25. Linear Discriminant Analysis (LDA) is a classification algorithm that reveals the difference and relationship between classes26. Quadratic Discriminant Analysis (QDA) is a second-order parametric classifier that has been described as superior to LDA27.
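
For reference, the sigmoidal function used by LR can be written in its standard form as follows. The notation is assumed here and is not taken from the study: x_1, …, x_n denote the measured parameters, β_0, …, β_n the fitted coefficients, and P the predicted probability of one of the two sex classes.

$$\begin{aligned} P(y=1\mid x) & =\frac{1}{1+e^{-z}}, \qquad z=\beta_{0}+\sum_{i=1}^{n}\beta_{i}x_{i} \end{aligned}$$

An individual is assigned to class 1 when P exceeds a chosen threshold, commonly 0.5.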

Computerized tomography (CT) is an imaging method that can show all tissues, and especially bone tissue, with sharp borders. With thin sections, the image orientation can be changed in three dimensions and brought to the orthogonal plane. In this way, length and angle measurements can be made in a way that is less affected by orientation. For these reasons, CT provides superior results compared with studies carried out with more conventional osteometric devices16.

The aim of this study is to show the success of sex prediction by using ML with parameters obtained from CT images of the cranium and mandible.

Results

Of the 25 parameters determined, 20 (NVIC, NSVC, NNL, PC, NIVA, PNIC, VIC, NIC, RML, CML, GHGA, HML, COL, CMHA, HGGC, COIC, HGGMC, HGGMA) were found to differ significantly between males and females (p ≤ 0.05). For 18 of these statistically significant parameters the mean was higher in males, while for 2 parameters (GHGA, CMHA) the mean was higher in females (Tables 1, 2).

Table 1 Comparison of parametric data of males and females.
Table 2 Comparison of non-parametric data of males and females.

ROC analysis was performed with the IBM SPSS (Version 21) package program to reveal the power of the parameters to discriminate between male and female individuals, and the highest AUC value was obtained with the CGL parameter (Fig. 1). The AUC, cut-off, p, Sen and Spe values of all parameters are given in Table 3. In addition, ROC curves and AUC values for each algorithm are given in Fig. 2.
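
The single-parameter ROC analysis described above can be sketched in Python as follows. This is a minimal illustration and not the SPSS procedure used in the study: the measurements are synthetic stand-ins, and choosing the cut-off by Youden's J statistic is an assumption, since the text does not state how the cut-off values were selected.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Synthetic stand-in for one length parameter (mm); NOT the study data.
male = rng.normal(loc=105.0, scale=5.0, size=150)
female = rng.normal(loc=98.0, scale=5.0, size=150)

values = np.concatenate([male, female])
# 1 = male (treated as the positive class here), 0 = female.
is_male = np.concatenate([np.ones(150, dtype=int), np.zeros(150, dtype=int)])

# Area under the ROC curve for this single parameter.
auc = roc_auc_score(is_male, values)

# ROC curve: one (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(is_male, values)

# Assumed cut-off rule: maximise Youden's J = Sen + Spe - 1 = TPR - FPR.
best = int(np.argmax(tpr - fpr))
cut_off = thresholds[best]
sen = tpr[best]
spe = 1.0 - fpr[best]

print(f"AUC={auc:.3f} cut-off={cut_off:.2f} Sen={sen:.3f} Spe={spe:.3f}")
```

With real data, `values` would hold one measured parameter and `is_male` the recorded sex of each individual.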

Figure 1

ROC curve.

Table 3 ROC result table.
Figure 2

ML ROC curve.

The LR algorithm yielded 0.90 Acc, 0.80 Mcc, 0.90 Spe, 0.90 Sen and 0.90 F1. According to the confusion matrix, 27 of 30 males and 27 of 30 females were predicted correctly (Fig. 3). Of the ML algorithms, LR gave the highest Acc and Mcc values (0.90 and 0.80, respectively); the Acc values of the other ML algorithms were between 0.81 and 0.88. The LR coefficients of the parameters were, respectively, −5.33, 1.45, 1.05, 1.01, −6.10, −5.30, −5.29, 2.84, −4.94, 5.80, −7.77, −1.73, −1.50, 1.61, −2.28, 8.12, 1.50, 1.10, 1.22, −2.90, 7.4, −5.59, 4.03, 4.20 and −3.01, and the HGGMA, PC, BIC, HGGA, CMHA and HGGC parameters were statistically significant in terms of sex.

Figure 3

LR confusion matrix.

The LDA algorithm yielded 0.88 Acc, 0.77 Mcc, 0.88 Spe, 0.88 Sen and 0.88 F1, and its confusion matrix showed that 26 of 30 males and 27 of 30 females were predicted correctly. The QDA algorithm yielded 0.83 Acc, 0.67 Mcc, 0.83 Spe, 0.83 Sen and 0.83 F1, with 24 of 30 males and 26 of 30 females predicted correctly. The RF algorithm yielded 0.88 Acc, 0.77 Mcc, 0.88 Spe, 0.88 Sen and 0.88 F1, with 24 of 30 males and 27 of 30 females predicted correctly. The ETC algorithm yielded 0.85 Acc, 0.70 Mcc, 0.85 Spe, 0.85 Sen and 0.85 F1, with 24 of 30 males and 27 of 30 females predicted correctly. The DT algorithm yielded 0.81 Acc, 0.67 Mcc, 0.81 Spe, 0.81 Sen and 0.81 F1, with 24 of 30 males and 23 of 30 females predicted correctly.

To support the reliability of our study, the tenfold cross-validation accuracy of the algorithms is also reported. Tenfold cross validation gave an Acc of 87.766 ± 0.819 with the LR algorithm, 87.733 ± 0.410 with LDA, 86.533 ± 0.592 with QDA, 85.766 ± 1.045 with RF, 77.200 ± 1.970 with ETC and 80.266 ± 1.396 with DT (Table 4).

Table 4 Tenfold cross validation results (%Acc).
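
A tenfold cross-validation of this kind can be sketched with scikit-learn as follows. This is an illustrative sketch on synthetic data generated with `make_classification`, not the study's measurements; the stratified shuffling, the feature scaling step and the `max_iter` value are assumptions that the text does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 300 individuals, 25 parameters, two classes (NOT the study data).
X, y = make_classification(
    n_samples=300, n_features=25, n_informative=8, n_redundant=0, random_state=0
)

# Assumed preprocessing: standardise each parameter before fitting LR.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Ten folds; each fold serves once as the held-out set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Report mean and standard deviation of accuracy in percent.
print(f"Acc = {100 * scores.mean():.3f} ± {100 * scores.std():.3f}")
```

Putting the scaler inside the pipeline means it is refitted on the training folds only, so no information from the held-out fold leaks into the model.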

In our study, the SHAP explanatory model of the RF algorithm was used to reveal the contribution of each parameter to the overall model, and the five largest contributions came from the HGGMC, PC, GGL, HGGA and HGGC parameters (Fig. 4).

Figure 4

RF algorithm SHAP explanatory image.

Discussion

The aim of this study was to test whether sex can be identified by using ML with parameters obtained from cranium and mandible CT images brought to the orthogonal plane. In the statistical analysis, the NVIC, NSVC, NNL, PC, NIVA, PNIC, VIC, NIC, RML, CML, HML, COL, HGGC, COIC, HGGMC and HGGMA parameters were found to be statistically significant in distinguishing between the sexes (p ≤ 0.05). Of the ML algorithms tested, LR yielded 0.90 Acc, 0.80 Mcc, 0.90 Spe, 0.90 Sen and 0.90 F1, and its confusion matrix showed that 27 of 30 males and 27 of 30 females were predicted correctly. The Acc values of the other ML algorithms were between 0.81 and 0.88. The small dataset, the lack of external validation and the absence of testing in different populations are the limitations of our study.

Forensic anthropologists constantly try to improve skeletal identification by applying various methods to different parts of the skeleton or by developing new methods to determine sex4. The pelvis and cranium are known as the most dimorphic skeletal parts and they form the basis of sex determination research4,10,17,18,19. Bertsatos et al.19 reported that they predicted sex with an Acc of 0.71–0.90 overall using discriminant function analysis of parameters taken from the cranium. Franklin et al.28 and Dayal et al.29 reached Acc values of 0.88–0.90 and 0.80–0.85, respectively, using discriminant function analysis of parameters taken from the cranium. In this study, the LR algorithm yielded 0.90 Acc, 0.80 Mcc, 0.90 Spe, 0.90 Sen and 0.90 F1. Because the ML results include the Mcc value, which reflects Acc, Spe and Sen together and indicates the reliability of the algorithm, it is thought that reliability and accuracy were tested with various measures and that reliable results were found in the study12.

While discriminant function analysis is one of the most widely used methods for sex determination in forensic and archaeological cases in the literature, it is known that its error rates are never 0%2. The fact that the ML algorithms in the present study were trained on 80% of the data and evaluated on a separate 20% test set increases the prediction reliability of the study and makes it more advantageous when compared with discriminant analysis.

CT is preferred because it allows bone measurements very close to the original and the reconstruction of each bone part, which provides an advantage in measuring missing and damaged parts, unlike conventional osteometric devices (calliper, odontometer, digital distance meter)16,22. As far as we know, studies that associate parameters taken from the cranium and mandible on the orthogonal plane with ML-based sex prediction are very limited. Even when CT is used in current studies, the results can differ because the image is not brought to the orthogonal plane, and angular measurements in particular are affected by orientation.

In their study predicting sex from the cranium by using CT, Gillet et al.30 used a geometric morphometric model and reported that they reached an Acc of 0.90 for the skull model. Zaafrane et al.31 reported that they estimated sex with an Acc of 0.90 from cranial parameters in CT images that they analysed by using basic statistical methods. Differences in results between studies can be explained by the fact that the evaluation of sexually dimorphic features depends on group-specific standards and that skeletal characteristics differ among populations, as well as by differences in the methods and statistical analyses used.

Imaizumi et al.32 used a support vector machine in their study of 100 skulls and obtained a sex prediction rate of over 90% with 10% cross validation. In this study, we did not choose image-based algorithms such as CNN, SVM, etc. The reason for this is that only anthropometric points were selected, not the entire cranium: image-based algorithms produce a result by learning all the points of the given cranium, whereas here the anthropometric points were measured manually using the Horos Project program and the results were used as input to the ML algorithms.

It has been reported in the literature that the likelihood of recovering the mandible intact is high33. This is because the dense compact bone layer of the mandible makes it durable and therefore more likely to be found intact34. It is reported that measurements of the mandible are generally obtained from panoramic radiography images and that these images are affected by orientation35. According to studies that evaluated only mandibular measurements, the mandible, with Acc values between 0.60 and 0.88, appears to be a reliable structure for sex prediction29,35,36,37. In this study, combining the parameters taken from the mandible with those from the cranium strengthened sex prediction. The RML, CML, GCGA, CFL, PLL, PICA, CGC, PLIC, CGGIC and CGGIA parameters taken from the mandible were found to be statistically significant in sex identification.

Since the identity of individuals should be predicted quickly and accurately in events that deeply affect society, such as war, natural disasters and fire, the CT technology and ML algorithms used in the present study suggest that prediction time can be minimized and high accuracy can be obtained. Considering the high Acc obtained with the LR algorithm, it is thought that the present study will strengthen and contribute to studies on sex prediction.

Materials and methods

Image set and population

The study was conducted at Karabük University Training and Research Hospital, Department of Radiology after 2020/363 coded approval of Karabük University Faculty of Medicine non-interventional clinical research ethics committee.

The image set in the study consisted of the CT images of 150 male and 150 female individuals whose ages ranged between 20 and 65. Individuals with any surgical operation or pathology of the cranium skeleton were excluded from the study. Average age of the males was 54 (min 20, max 65), while average age of the females was (min 21, max 65). No statistically significant difference was found between the average ages of males and females (p = 0.395).

Multidetector CT (MDCT) protocol

Radiological images used in the study were obtained from CT images with a section thickness of 5 mm taken in supine position by using a 16-row MDCT scanner (Aquilion 16; Toshiba Medical Systems, Otawara, Japan) in the department of radiology of Karabük University Training and Research Hospital. Scanning protocol values were tube voltage: 120 kV, gantry rotation: 0.75 s and pitch: 1.0 mm.

Image analysis

The images obtained were transferred in Digital Imaging and Communications in Medicine (DICOM) format to Horos Medical Image Viewer (Version 3.0, USA), a personal workstation program. Images in the sagittal, transversal and coronal planes were obtained from the transferred images by using 3D Curved Multiplanar Reconstruction (MPR). The line passing through the nasion and inion points was determined in these three planes and all images were brought to the orthogonal plane (Fig. 5A). The CT images brought to the orthogonal plane were then overlapped by increasing their section thicknesses (Fig. 5B).

Figure 5

(A) Sagittal, transversal and coronal images brought to orthogonal plane, (B) Overlapped image.

Length, angle, area and curvature length measurements of the anatomic points of the overlapped images were performed. These parameters and their abbreviations are listed below in Tables 5, 6 and 7. Demonstration of all evaluated parameters is shown in Fig. 6.

Table 5 Length parameters and abbreviations.
Table 6 Angle parameters and abbreviations.
Table 7 Curve length-area parameters and abbreviations.
Figure 6

Demonstration of parameters (1: NVIA, 2: ZA, 3: COIC, 4: HGGA, 5: VIC, 6: NNZA, 7: COLI, 8: CML, 9: NIC, 10: NIVA, 11: CMHA, 12: RML, 13: NNL, 14: COL, 15: GA, 16: NFIA, 17: HML, 18: HGGC, 19: NVIC, 20: PC, 21: GCGA, 22: PNIC, 23: HGGMA, 24: NSVC, 25: HGGMC).

Machine learning algorithms

In this study, the scikit-learn library (Version 0.20.0) in the Python programming language (Version 3.7.1) was used for ML modelling38. ML modelling was performed on an HP Folio 1040 computer with an i7 processor and 8 GB of memory. The Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Extra Tree Classifier (ETC) algorithms were used. The dataset was shuffled, and the first 80% (240 measurements) was designated as the training set, while the last 20% (60 measurements) was designated as the test set. In addition, tenfold cross-validation accuracy values are reported to support the reliability of our study.
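
The shuffle-and-split procedure and the six classifiers can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code or measurements: the random seeds, the default hyperparameters, `max_iter=1000` and the mapping of ETC to scikit-learn's `ExtraTreesClassifier` are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 300 individuals, 25 parameters (NOT the study data).
X, y = make_classification(
    n_samples=300, n_features=25, n_informative=8, n_redundant=0, random_state=0
)

# Shuffle, then take the first 80% for training and the last 20% for testing.
order = np.random.default_rng(0).permutation(len(y))
X, y = X[order], y[order]
n_train = len(y) * 8 // 10  # 240 of 300
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Default hyperparameters are an assumption; the text does not report them.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "ETC": ExtraTreesClassifier(random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in results.items():
    print(f"{name}: Acc = {acc:.2f}")
```

Because the split is a plain slice of the shuffled data, the class balance of the test set is not guaranteed; a stratified split would be one way to enforce it.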

Performance criteria

Accuracy (Acc), Specificity (Spe), Sensitivity (Sen), F1 score (F1), and Matthews correlation coefficient (Mcc) values were included as performance criteria.

$$\begin{aligned} \mathrm{Acc} & =\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FN}+\mathrm{FP}+\mathrm{TN}}\\ \mathrm{Sen} & =\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \\ \mathrm{Spe} & =\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\\ \mathrm{Mcc} & =\frac{\mathrm{TP}\times \mathrm{TN}-\mathrm{FP}\times \mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})\times (\mathrm{TP}+\mathrm{FN})\times (\mathrm{TN}+\mathrm{FP})\times (\mathrm{TN}+\mathrm{FN})}}\\ \mathrm{F}1 & =\frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}} \end{aligned}$$
(1)

TP: True positive, TN: True negative, FP: False positive, FN: False negative.
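
As a worked check, the performance criteria can be computed directly from the confusion-matrix counts reported for the LR algorithm (27 of 30 males and 27 of 30 females predicted correctly). The sketch below uses the standard confusion-matrix definitions, with F1 in its usual precision and recall form; treating males as the positive class and the helper name `performance` are assumptions made for illustration.

```python
import math


def performance(tp, fn, tn, fp):
    """Return Acc, Sen, Spe, Mcc and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: Mcc is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"Acc": acc, "Sen": sen, "Spe": spe, "Mcc": mcc, "F1": f1}


# LR test set: 27/30 males (assumed positive class) and 27/30 females correct.
lr = performance(tp=27, fn=3, tn=27, fp=3)
for name, value in lr.items():
    print(f"{name} = {value:.2f}")
```

These counts reproduce the reported LR values: 0.90 for Acc, Sen, Spe and F1, and 0.80 for Mcc.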

Statistical analysis

Mean, standard deviation, minimum and maximum values were reported as descriptive statistics for each parameter by sex group. The Anderson–Darling normality test was applied to each parameter to check whether the data were normally distributed. The two-sample t test was applied to parametric data and the Mann–Whitney U test to nonparametric data, and p ≤ 0.05 was considered statistically significant. In order to reveal the differences of the parameters in terms of sex, ROC analysis was performed and the ROC curve was included. Minitab 17 and the IBM SPSS (Version 21) package program were used in the analyses.
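
The test-selection logic described above can be sketched with SciPy as follows. This is an approximation of the workflow, not the Minitab or SPSS procedure used in the study: the data are synthetic, the helper names `looks_normal` and `compare_groups` are hypothetical, and normality is judged by comparing the Anderson–Darling statistic with SciPy's tabulated 5% critical value rather than by a p value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-in for one parameter in each sex group (NOT the study data).
male = rng.normal(105.0, 5.0, 150)
female = rng.normal(98.0, 5.0, 150)


def looks_normal(x):
    """Anderson-Darling check against the 5% critical value."""
    res = stats.anderson(x, dist="norm")
    idx = list(res.significance_level).index(5.0)
    return res.statistic < res.critical_values[idx]


def compare_groups(a, b, alpha=0.05):
    """Pick the t test or Mann-Whitney U test by normality, then compare groups."""
    if looks_normal(a) and looks_normal(b):
        name = "two-sample t test"
        p = stats.ttest_ind(a, b).pvalue
    else:
        name = "Mann-Whitney U test"
        p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    return name, p, p <= alpha


test_name, p_value, significant = compare_groups(male, female)
print(f"{test_name}: p = {p_value:.3g}, significant = {significant}")
```

In practice this function would be applied once per parameter, with the two arrays holding the male and female measurements.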

Ethical considerations

This retrospective study was initiated with the 2020/363 decision of the Karabük University Faculty of Medicine non-interventional clinical research ethics committee.

Ethical approval

The present study was approved by Karabük University Faculty of Medicine Local Non-Interventional Clinical Trials Ethics Committee with the protocol number 2020/363. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

This study is retrospective and based on images taken from the hospital archive system. Therefore, the requirement for informed consent for the study was waived by the Karabük University Faculty of Medicine Local Non-Interventional Clinical Trials Ethics Committee.

Conference presentation

This study was presented as an oral presentation at the 21st National Anatomy Congress in Turkey.