Introduction

Minor physical anomalies (MPAs) are subtle morphological alterations in the head, eyes, ears, mouth, hands, and feet that do not have significant cosmetic or clinical effects1. Data from past research have proven that MPAs arise during the first or early second trimester and can be considered as indicators of altered brain development during early neurodevelopment due to schizophrenia2,3,4,5,6,7,8. Qualitative minor malformations represent defects of embryogenesis during the process of organogenesis, and the phenogenetic variation of final morphogenesis can be seen through quantitative defects9. Most of the current research on MPAs in schizophrenia has been based on qualitative observation, and only a few study groups have conducted quantitative measurements of MPAs10. Moreover, although MPAs have been suggested as biomarkers for schizophrenia, studies have rarely performed qualitative and quantitative measurements of MPAs in all six body parts (head, ears, eyes, mouth, hands, and feet) nor have they used a large number of cases for validation. Moreover, no research team has established systematic prediction tools for clinical use.

Extensive studies have shown that the MPA score of patients with schizophrenia is higher than that of healthy controls5,11,12,13,14,15,16,17,18,19,20. Some studies based on qualitative MPAs have reported that craniofacial anomalies are more often found in patients with schizophrenia than in healthy controls21,22 and seem to most accurately differentiate patients with schizophrenia from other patient groups19,21. Furthermore, according to a review paper, MPAs in the mouth and palate regions are more common than those in other anatomical sites. However, some studies have presented inconsistent results, with one showing that for five areas (namely mouth, feet, head, eyes, and ears) patients with schizophrenia have higher MPA scores than healthy controls19, but a systematic literature review reporting nobody region was more different or significant during MPA score testing than the others12. Reviewing the results of three papers that recorded detailed qualitative MPA items also did not lead to any consistent suggestions23,24,25. In two of the three studies, eight items exhibited significant differences between patients with schizophrenia and healthy controls (namely fine electric hair, hair whorls, epicanthus, low-seated ears, adherent ear lobes, steepled palate, furrowed tongue, and large gap between first and second toes), while six different items were significant in the last of the three studies (namely fused eyebrows, malformed ears, asymmetric ears, cuspidal ear, tongue with smooth-rough spots, and third toe longer than the second toe).

In previous quantitative assessments of MPAs, only a few studies on schizophrenia have reported comprehensive measurements covering multiple body areas, but rather, they have relied on observations of only specific parts, especially the palate and eye areas. From 2016 to 2020, four studies focused on the palate region. Two presented a significant difference in palate width between patients with schizophrenia and healthy controls26,27, and the other two reported differences in palate shape characteristics but did not find palate width to be significant28,29. Furthermore, a study measuring the quantitative characteristics of the eyes showed that patients with schizophrenia tended to have smaller optical angles and inner canthal distance. These results were supported by another study measuring the eye and skull regions8.

The objective of this study was to construct a web-based risk prediction tool for schizophrenia based on MPA measurements. The qualitative observations spanned six body areas and the detailed quantitative measurements covered five body areas. We used a large-scale training set to analyze and build the prediction model. A validation dataset was used to show the generalizability of the model, demonstrating the potential for clinical applications. Two different statistical methods, logistic regression, and Lasso regression were employed to establish nomograms for predicting the risk of schizophrenia. Comprehensive qualitative and detailed quantitative MPA measurements were assessed in patients with schizophrenia and healthy controls to identify the MPA measurements that could be used to distinguish patients with schizophrenia from healthy controls. Meanwhile, for clinical practice and research purposes, we developed a visualization and easy-to-use tool based on validated nomogram models that can predict the potential risk of schizophrenia.

Results

Descriptive data

Characteristics of patients from the training set (n = 500) and validation set (n = 244) are presented in Table 1. A comparison of the distribution of sex, age, body weight, height, and BMI between patients and healthy controls is presented. In the training set, patients had a mean age of onset of 25 years and a mean disease duration of 18 years. In the validation set, patients had a mean age of onset of 25 years and a mean disease duration of 17 years.

Table 1 Characteristics of participants in the training and validation sets.

Nomogram variables identified based on logistic and Lasso regression

Variables in Table 2 were assessed through univariate and multivariate logistic regression analyses, which identified a total of 11 predictors, namely hair whorls, epicanthus, adherent ear lobes, high palate, furrowed tongue, hyperconvex fingernails, a large gap between first and second toes, skull height, nasal width, mouth width, and palate width as independent predictors of schizophrenia. These predictive factors were incorporated into a nomogram (Fig. 1). The Lasso regression analyses selected 12 variables to construct predictive nomogram, including hair whorls, furrowed tongue, skull height, interpupillary distance, mouth width, palate width, high palate, epicanthus, adherent ear lobes, soft ears, hyperconvex fingernails, and large gap between first and second toes (Fig. 2).

Table 2 Results of a multivariable logistic regression model on MPA of schizophrenia.
Fig. 1: Nomogram of logistic regression analysis.
figure 1

A 11-variable nomogram established by logistic regression for predicting the risk of schizophrenia using minor physical anomalies.

Fig. 2: Nomogram of Lasso regression analysis.
figure 2

A 12-variable clinical nomogram established by Lasso regression for predicting the risk of schizophrenia using minor physical anomalies.

To use the nomogram, the first variable result, i.e., a patient’s hair whorls, is located on the relevant axis. Next, a straight line is drawn upward from the value of the result to the point axis on the top of the nomogram to determine the points received based on hair whorls. This process is then repeated for every variable, and total points are calculated by adding all the points obtained from every variable. The final sum is located on the total points axis, and a straight line is drawn downward from there to obtain the risk of schizophrenia.

The logistic regression and Lasso regression models were implemented as an online risk prediction nomogram to calculate the expected risk of schizophrenia using qualitative and quantitative MPAs. This web-based tool is freely available at https://www.szprediction.net/, with all major browsers supported (Google Chrome, Internet Explorer, Mozilla Firefox, Safari, etc.). It contains the most useful items of MPAs to help clinical practice easily predict schizophrenia on an objective basis.

Predicting the performance of the models

We have collected qualitative and quantitative MPAs data from the patients with schizophrenia and healthy controls and further divided these subjects into a training set and validation set to establish the nomogram predicting schizophrenia and executing the validation for the model.

In the logistic training set, the receiver operating characteristic (ROC) showed that the resulting model had quite good discrimination, with an area under the curve (AUC) of 0.85 (Fig. 3a). The logistic model also displayed good discrimination in the validation set, with the 0.84 AUC, 80.5% accuracy, 80.7% sensitivity, and 80.2% specificity (Fig. 3b). Of the 11 logistic regression model variables, 10 were included in the Lasso regression result, thus consolidating the model. The training set had an AUC of 0.85 and the validation set also showed good discrimination with the 0.85 AUC, 80.5% accuracy, 80.7% sensitivity, and 80.2% specificity (Fig. 3c, d). Furthermore, calibration plots of models were used to provide better information about the models, graphically showing good agreement between the predicted and observed data in the training and validation cohorts (Fig. 3e–h).

Fig. 3: Receiver operating characteristic (ROC) curves and calibration curves for evaluating the discrimination performance of logistic regression and Lasso regression models in both training and validation cohorts.
figure 3

ad ROC curves showing the capabilities of logistic regression and Lasso regression methods in predicting schizophrenia risk. eh Calibration plot with a binary fringe plot of logistic regression and Lasso regression models.

The sex, weight, and BMI showed significant differences between schizophrenia patients and healthy controls in the training set (Table 1). We also have tried to establish the MPAs nomogram with sex and BMI (Supplementary Figs. 1 and 2) in the logistic regression model (the AUCs of the training set and validation set were 0.86 and 0.84, respectively) and in the Lasso regression model (the AUCs of the training set and validation set were 0.87 and 0.84, respectively) (Supplementary Fig. 3). The above results were similar to the results of the model without sex and BMI. In addition, we stratified the male and female subjects in the training set to perform the ROC curve analysis by both logistic regression analysis and lasso regression analysis, respectively. We further use the validation dataset to analyze the performance of models by ROC curve analysis. In male-based models, the AUC was 0.81 and 0.89 in the logistic and lasso regression model, respectively (Supplementary Fig. 4). In female-based models, the AUC was 0.87 and 0.91 in the logistic and lasso regression model, respectively (Supplementary Fig. 4).

Discussion

To the best of our knowledge, this study on the measurement of MPAs in patients with schizophrenia enrolled the largest number of participants of Chinese origin. The reported prediction models predict the risk of schizophrenia by utilizing a nomogram, which was established through logistic or Lasso regression and is readily available. In addition, we measured qualitative and quantitative individual items across six regions of the body to investigate the items that had the greatest predictive power. The most important contribution of this study is that it provides a web-based prediction tool for clinical use. This tool has been tested on hundreds of cases to verify the reliability of the models.

Studies have indicated that MPAs are a reliable marker for distinguishing patients with schizophrenia from healthy controls7. However, the items that are most useful among MPAs and should be used to predict schizophrenia remain unclear. Eleven items of MPAs were included as predictors of schizophrenia based on the logistic regression model, and 12 items were adopted in the Lasso regression model. The Lasso regression model included all the items of the logistic regression model except for nasal width. Both the logistic and the lasso regression models achieved good performance in the training and validation sets with high AUC values, suggesting the reliability of both the models for the classification of schizophrenia. Our improved models achieved higher predictive efficacy for schizophrenia than those reported in previous studies, which correctly classified 81.3 and 84.5% of cases13,23.

There were many studies that used logistic regression or lasso regression to build nomograms30,31,32. Several studies also used logistic regression to determine which variables of MPAs can discriminant schizophrenia patients from healthy controls10,33,34. Lasso regression is a shrinkage and variable selection method for regression models to avoid overfitting the data in the analysis and overestimation of how well the model performs35,36. Since lasso regression has the advantages that can reducing the complexity of high dimensional data, we used this method to confirm the stability of the model. In addition, these two statistical methods do not remove the need to validate a model in an external dataset. Therefore, the models of this study should undertake more validation in the future. In the study, we considered that logistic and lasso regression models are suitable for selecting MPA variables to discriminate schizophrenia patients from healthy controls, so the two methods were used to build the nomogram models.

Studies have shown that hair variations are related to brain anomalies because they have a common developmental origin22,25. Our results showed that people with ≥2 hair whorls are more likely to be patients, and this is in accordance with the results of a study that measured MPAs in a cohort with schizophrenia within the Serbian population; this study reported that approximately 65.1% of patients with schizophrenia had ≥2 hair whorls25. Our results showed that the epicanthus is part of the proposed checklist and this has been proven significant in several previous studies13,23,24,37,38. However, the fact that the epicanthus is more common in the Asian population suggests that MPAs differ with ethnic differences between Asian and other populations24,37. The study participants were homogeneous (all Chinese) and therefore the results do not apply to other ancestral populations. Adherent ear lobes were significant in our study and in two previous studies23,24. Moreover, this qualitative feature is unlikely to be influenced by age5. The palate is believed to be a typical structure in terms of schizophrenia-related MPAs, which originates at 6–9 weeks of pregnancy and continues to develop until 16–17 weeks. Palatal differences in patients with schizophrenia are noteworthy findings. The formation of the primary palate involves a series of local cellular changes during the craniofacial growth period39. Our results revealed that patients with schizophrenia have a significantly higher and narrower palate. This finding is consistent with those of most studies3,4,21,40, but failed to agree with those presented in a paper published in 2016; that study reported that patients with schizophrenia have a significantly wider palate26. The inclusion of the high palate and furrowed tongue in both the models shows that our results are consistent with the findings that palate and tongue anomalies represent the highest prevalence of MPAs in the mouth region of patients with schizophrenia8,25,41,42. Our results revealed that patients with schizophrenia have a greater skull height and nasal width, and smaller mouth width and palate width.

Moreover, the result of a previous study showed schizophrenia patients were significantly more likely to have MPA than normal controls in both genders, but the difference was tended to be more pronounced in males38. This may be related to the fact that the male fetus is more vulnerable to injury during prenatal development43. Previous studies also pointed out that males have lower resistance in the processes of brain development44,45. However, considering the influence of gender on the risk models in this study. We added sex weight and BMI into the models, and the results were similar to the results of the model without sex and BMI. We do not exclude that gender may have an impact on risk prediction, but in this study, gender seems to have little effect on the prediction accuracy of our models. In addition, we also selected MPA variables for the risk of schizophrenia according to different genders. The results showed that high palate and palate width were included in both male and female logistic and lasso regression models, which may imply that the characteristics of the palate were suitable for both men and women to assess the risk of schizophrenia.

This study has the following limitations. First, we performed linear measurements, and this may not be able to completely evaluate three-dimensional structures, for example, that of the palate. Second, we did not assess the subtypes of schizophrenia because schizophrenia has many different subtypes and these may be associated with different symptoms or require different treatments. Whether a specific subtype of schizophrenia has certain MPA features still needs to be determined. Third, our participants were all of Taiwanese Han Chinese origin, and patients of different ethnicities are required to further validate our prediction model. Fourth, we only investigated the relationship between MPAs and schizophrenia in this study; moreover, the relationships between MPAs and other psychiatric disorders need more research in the future.

In conclusion, this study on the measurement of MPAs in patients with schizophrenia had the largest number of participants of Chinese origin. Our findings support the hypothesis of abnormal neurodevelopment in schizophrenia and may provide more understanding of the pathophysiology of schizophrenia. Furthermore, this is the study using MPAs to establish a pioneering web-based tool based on validated risk models of schizophrenia for clinical practice or research purposes. Our easy-to-use nomograms were not intended to replace standard diagnostic methods, but to help researchers or clinicians better assess the risk of schizophrenia. Additional replications of the topic are warranted to test the reported outcomes.

Methods

Study subjects

The present study was conducted using data taken from 463 patients of Taiwanese Han Chinese origin with schizophrenia who were recruited from five medical institutions in southern Taiwan: Chi Mei Medical Center, Jianan Mental Hospital, Lok An Hospital, National Cheng Kung University Hospital, and National Cheng Kung University Hospital Dou-Liou Branch. All patients were diagnosed as having schizophrenia by professional psychiatrists of the participating hospitals based on the Diagnostic and Statistical Manual of Mental Disorders, DSM-IV-TR (2011–2018) or DSM-5 (2018–2021). For comparison, 281 people without a history of psychiatric disorders were recruited from the hospital staff and community and allocated to the healthy control group. The data collected included the participants’ baseline descriptions and their measured MPAs, which could be utilized as indicators of underlying disease susceptibility. The study excluded subjects with histories of illegal substance or alcohol abuse, identifiable neurological disorders, clinical mental retardation, somatic disorders with neurological components, or those whose parents were not Han Chinese. The study design and recruitment procedures received ethical approval from the institutional review boards (IRBs) of the participating hospitals (IRB numbers: 10102-006, 11-011, B-BR-103-036-T, 10301-002, 10612-011, B-BR-106-088, B-BR-108-094, and 10901-006). Written informed consent was obtained from the participants.

Measurements of MPAs and craniofacial features

The qualitative and quantitative assessments of MPAs have performed the Waldrop scale and other scales based on resources34. Using the combined standardized MPA scale in six body regions: head, ears, eyes, mouth, hands, and feet; 41 qualitative assessments and 27 quantitative measurements were performed.

We developed qualitative MPAs using a manual scoring system, with 16 qualitative measurements and 23 new items based on the Waldrop scale46 and the study by Ismail et al.1. We further added two measurement items, strabismus, and cuspidal ear, into the qualitative assessment of MPAs based on a Japanese study37. In the qualitative assessment, 33 items acted as binary variants. We assigned normal present variants a score of 0 and absent or other variants a score of 1. The other 8 qualitative assessment items were scored using an ordinal scale (0–2) to represent the magnitude of the anomaly. MPAs were assessed separately on the right and left sides in 28 bilateral anatomical sites. A total qualitative MPA score was calculated by adding all the scores from each region, with a range between 0 and 83.

The quantitative MPAs consisted of 27 items from the scales of Lane et al.4, McGrath et al.11, and Elizarraras-Rivas et al.47, which included separate right- and left-side measurements taken at 11 bilateral anatomical sites. It also included two items (head circumference and canthal distance) from the Waldrop scale, and we further divided canthal distance into inner and outer canthal distances. The quantitative measurements were conducted using standardized methods with the two most-cited books on anthropometric measurements used as references48,49.

Two well-trained research assistants collected all qualitative and craniofacial measurement data from 20 healthy subjects. The range of inter-rater reliability for the qualitative items was 0.95–1.00, whereas the range of intraclass correlation coefficients for the quantitative items was 0.70–0.96.

Statistical analysis

Demographic parameters including sex, education level, weight, height, and body mass index (BMI) were evaluated. Continuous variables are expressed as the mean and standard deviation, and categorical variables are presented as frequencies and proportions. Between‐group comparisons, Student’s two-sided t-test was used for continuous variables and Pearson’s chi-square test was used for categorical variables. The nomogram was constructed using the following modeling process. We first assigned the patients of three hospitals (Jianan Mental Hospital, Lok An Hospital, and National Cheng Kung University Hospital) to a training set and the patients of the other two hospitals (Chi Mei Medical Center and National Taiwan University Hospital Yun-Lin Branch) to a validation set. A total of 320 patients with schizophrenia were assigned to the training set and 143 were assigned to the validation set. To match the ratio of cases and controls, we randomly assigned two-thirds of the control group (n = 180) to the training set and one-third of the control group (n = 101) to the validation set.

In order to simplify the complex scoring, we combine the left and right features of the qualitative items. As long as there were variants on the left or right sides, the score would be assigned as 1. The measurements were averaged on the left and right sides of the quantitative items. Logistic regression analyses were applied to present significant variables. A total of 68 variables were first analyzed by univariate analysis, then there were 24 variables that got a p-value < 0.05. We further analyzed these 24 variables by multivariate analysis. Finally, based on the results of the multivariate model, there were 11 predictors were got a p-value < 0.05 and were used to construct nomograms for predicting the risk of schizophrenia. Lasso regression was used to select variables as a double-check for logistic regression results. Lasso regression can force certain coefficients to be set to zero, effectively selecting a simpler model that does not contain those coefficients. The optimal value for the tuning parameter λ within one standard error of the minimum was determined with cross-validation (Supplementary Fig. 5). A total of 68 variables were first analyzed by Lasso regression, then 16 variables with p-value < 0.05 were included in the multivariate analysis. Finally, 12 predictors with a p-value < 0.05 were used to construct prediction nomograms of schizophrenia.

The prediction model formed from the training set was applied to the validation set to validate and evaluate the prediction efficacy. We plotted the ROC curve and calculated the AUC to verify the discrimination performance in both the training and validation sets. The AUC represents the model’s ability to discriminate between patients with schizophrenia and those without. AUC must be between 0 and 1. An AUC of 1 denotes that both sensitivity and specificity are 100%, indicating perfect concordance, whereas an AUC of 0.5 indicates a result equal to chance. The nomograms were further evaluated to determine sensitivity and specificity. High sensitivity and specificity values confirm that the ability of the nomogram to accurately identify schizophrenia is good. A calibration plot with a binary fringe plot was generated for the accuracy of the nomogram, which compared predicted probabilities with the observed incidence of schizophrenia. If the smoother lies close to the diagonal, the model is well-calibrated. The butterfly fringe plot is shown in a panel below the predicted probabilities. It indicates the counts of the responses by using lines of various lengths. Lines that point downward indicate the number of counts for Y = 0, whereas lines that point upward indicate the number of counts for Y = 1. All analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, NC, USA) and R Studio (version 1.2.1335, R Foundation, Vienna, Austria).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.