A Raman algorithm to estimate human age from protein structural variations in autopsy skin samples: a protein biological clock

The recent increase of the number of unidentified cadavers has become a serious problem throughout the world. As a simple and objective method for age estimation, we attempted to utilize Raman spectrometry for forensic identification. Raman spectroscopy is an optical-based vibrational spectroscopic technique that provides detailed information regarding a sample’s molecular composition and structures. Building upon our previous proof-of-concept study, we measured the Raman spectra of abdominal skin samples from 132 autopsy cases and the protein-folding intensity ratio, RPF, defined as the ratio between the Raman signals from a random coil an α-helix. There was a strong negative correlation between age and RPF with a Pearson correlation coefficient of r = 0.878. Four models, based on linear (RPF), squared (RPF2), sex, and RPF by sex interaction terms, were examined. The results of cross validation suggested that the second model including linear and squared terms was the best model with the lowest root mean squared error (11.3 years of age) and the highest coefficient of determination (0.743). Our results indicate that the there was a high correlation between the age and RPF and the Raman biological clock of protein folding can be used as a simple and objective forensic age estimation method for unidentified cadavers.

and can be used to estimate the age of a suspect based on blood remains. Usable on a wide spectrum of different tissues, the median error in age prediction of this method is 3.6 y, with a Pearson correlation coefficient, r = 0.96, to chronological age 21 . The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, with this coefficient being either + 1 or − 1 depending on whether the relationship is positive or negative, respectively. Horvath's clock leads to a remarkably precise age estimation as compared to other proposed methods (e.g., telomere length, p16INK4a, or microsatellite mutations) [23][24][25] , which show distinctly lower correlation coefficients and only relate to specific types of cells rather than directly to the chronological age. The epigenetic clock was also found to relate to the body mass index (BMI) with a good correlation, r = 0.42, for analyses conducted on liver tissue 21 . Although Horvath's method proved highly objective and has remarkable accuracy among the various age estimation methods, special technical skills, expensive reagents and equipment are still required as well as time and effort.
Looking for a different approach, we examined the possibility of using Raman spectroscopy of human skin in an attempt to establish a reliable and easily accessible method of age estimation in forensic medicine. Our spectroscopic method relies on the collection of incisional skin samples during forensic autopsy, and spectroscopic examination of the structural characteristics of both the proteins and lipids contained in the dermis (including epidermis, dermis, and hypodermis). Aging of the structures of skin has been deeply studied by employing a number of different analytical techniques 26,27 . Our previously published proof-of-concept study attempted to label all the skin-emitted Raman bands and to locate any biophysical link between vibrational fingerprints and specific structural variations associated with human age 28 . The existence of such links was indeed phenomenologically proven for protein structures; however, it is hindered by insufficient statistics. The final proof for our spectroscopic findings, thus, relies on the statistical validations.
The main purpose here is a statistical validation of the presence of Raman spectral markers for precise age identification from human skin samples. In doing so, we used the same spectral deconvolution algorithm previously proposed 28 . In that previous study, we investigated the Raman spectra of various skin samples to find a candidate of ideal structural change for age estimation with a small number of cases. Accordingly, we found that protein folding may be a good potential "biological clock". In this study, upon increasing the number of autopsy samples (i.e., n = 132) and, concurrently, taking care that a wide age range was included, we attempted to assess the statistical exactness of age estimation by Raman spectrometry. The statistical validation presented here could be attractive for the forensic community because the Raman method is label-free and directly links vibrational features at the molecular scale to practical forensic purposes. Figure 1 gives schematic drafts of the different secondary structures of proteins detected by Raman spectroscopy. Figure 1 also includes drafts of the specific vibrational modes that were monitored in this study, as well as the Raman spectroscopic links in terms of the vibrational frequencies of different structural assemblies. The structure of skin upon aging has been thoroughly studied with a number of different analytical techniques 26,27 . Unfortunately, explicit biophysical links are still missing between the observed vibrational fingerprints and the specific structural variations associated with human age. With reference to the relative intensities of specific Raman scatterings from different protein mesostructures (α-helix structure and disordered random coil protein structures as seen in Fig. 1), we defined the protein-folding intensity ratio (R PF ). The R PF represents the ratio of Raman intensities related to the C=O stretching mode (i.e., in the overall Amide I vibrational spectrum depicted in Table S2) in disordered protein structures (i.e., the random coil Raman signal seen at 1681 cm −1 ) to α-helix structure (i.e., the Raman signal at 1652 cm −1 ) ( Fig. 1) [29][30][31][32][33][34][35][36][37] .
Studies 29,30 have reported that proteins irreversibly fold upon aging. Accordingly, in our previous study 28 , the Raman features representing those folding processes were located as parameters sensing biological age, although their age-sensing precision was yet unknown to us.
The aim of this study was to statistically validate the Raman spectroscopic parameters of protein folding in human skin as age-predicting parameters. The Raman spectra of abdominal skin sample were measured in 132 autopsy cases and the protein-folding intensity ratio (R PF ) was calculated. And four models, based on linear (R PF ), squared (R PF 2 ), sex, and R PF by sex interaction terms, were examined.

Results
The data from a total of 132 subjects (92 males and 40 females) were analyzed. Table 1 shows the summary statistics of age and R PF . The R PF of our samples ranged from 0.16 to 0.90. The mean years of age and R PF were 53.4 and 0.41, respectively. On the other hand, the median years of age and R PF were 53.5 and 0.39, respectively. The estimated regression coefficients for four models were shown in Table 2. The results of cross validation suggested that Model 2 including linear and squared terms was the best model with the lowest root mean squared error (11.3 years of age) and the highest coefficient of determination (0.743). Both estimates for R PF and R PF 2 were statistically significant at the 5% significance level in Model 2. There was a strong negative correlation between age and R PF with a Pearson correlation coefficient of r = − 0.878. Complete plots for all age intervals are given in Fig. 2 for the R PF . Figure 2 also shows the predicted values of age and the 95% prediction intervals. Table 3 shows the 95%, 80%, and 50% prediction intervals. For instance, if the R PF is 0.3, the 95%, 80%, and 50% prediction intervals are 46.7-91.3 years of age, 54.5-83.5 years of age and 61.4-76.6 years of age, respectively. If the R PF would be 0.5, the 95%, 80%, and 50% prediction intervals are 15.9-60.6 years of age, 23.7-52.8 years of age and 30.6-45.9 years of age, respectively. There was no statistically significant correlation between the R PF and BMI. There was no statistically significant difference between the male and female samples. Moreover, the specific cause of death was not seen in the outliers.  Figure 2. Prediction curve using Model 2 (R PF ). Complete plots for all age intervals are given for the proteinfolding intensity ratio (R PF ) and the 95% prediction intervals are also shown. There was a strong negative correlation between age and the R PF with a Pearson correlation coefficient of r = 0.878.

Discussion
To quickly identify unidentified cadavers, age estimation methods that can be used by non-experts at the places where bodies are found must be developed Recently, biomarkers, such as prostate specific antigen and thyroid hormones, skin imaging, and ultrasonography have been reported as alternative age estimation methods to the conventional subjective methods [38][39][40][41][42] . Skin imaging and ultrasonography are particularly elementary and can be used by non-experts on the scene 38,41 . Unfortunately, their estimation accuracy is relatively low. In this study, a large number of analytical data showed that R PF , related to the folding in peptides and proteins, was a reliable marker for the estimation of human age. The reason for the observed precision of the age-predictor, R PF , might reside in the fact that protein oxidation and misfolding mainly affect the frequency shift of the subbands used for age assessment, while they minimally alter the Raman intensity. Although the outputs of this study were validated with a statistically analyzable number of patients and the shown equations are presently used at the Kyoto Prefectural University of Medicine for estimating the age of unidentifiable bodies, our presented method still requires a substantial amount of time and skill.
Recently, hand-held Raman spectrometric devices were developed 43 . This device is now available in each prefectural police department in Japan for drug detection at crime scenes and requires less than a minute for detection, no expensive reagents, and technical skills. Considering our results, this equipment appears a promising choice for age estimation of unidentified cadavers in instantaneous assessments at the crime scene by simply detecting R PF related bands.
There are certain limitations in our study. First, although there was no statistically significant difference between the male and female samples, this may be related to our sample size. Therefore, we need to continue increasing the sample number. Second, although the specific causes of death were not related to the outliers, we could not perform a statistical analysis due to the diverse causes of death. Third, the confidence intervals of age prediction were relatively wide especially in young people. Many methods that use growing and development characteristics are reported as reliable for individuals under 20 years of age 44,45 . Therefore, we consider that age estimation should be done by the combination of many methods to obtain better accuracy and avoid erroneous judgments. Finally, our sampling was restricted to Japanese donors living  Table S1 for the selected age intervals (shown in inset). Spectroscopic related to the proteins are shown. In the protein spectral zone, the labels α and d refer to α-helix and disordered structures, respectively. The vibrational origins of all labeled bands are listed in Table S2  www.nature.com/scientificreports/ in the same area and climate; nevertheless the results are evocative and must next be confirmed with future experiments regarding whether (or to what extent) the Raman biological clock of protein folding is common to all human beings.

Materials.
A total of 132 cadaveric skin samples were obtained from forensic autopsies for cadavers at different ages spanning from a few months to 93 y old (Table S1 in the Supplementary Materials). Relatives of all the subjects gave their informed consent before inclusion in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Institutional Review Board of Kyoto Prefectural University of Medicine, Japan (ERB C-191). Samples were taken during the autopsies conducted within 48 h. Several samples for each patient were taken from the sub-umbilical region of abdominal skin in the body, which typically remains unexposed to solar irradiation. The sampling included only Japanese subjects. A list of the donors with information on their age, height and weight at death, sex is given in Table S1. In the statistical characterization, no distinction was made regarding the sampling; therefore, we included in the Raman analysis any available sample irrespective of the cause of death. The skin samples were 10 × 10 mm 2 , and encompassed the full thickness of the skin from the stratum corneum to the hypodermis.
Spectroscopy. The samples were stored in a humid and evacuated environment at approximately 0 °C until use. Raman spectroscopy was conducted within 5 days after the autopsy. The Raman excitation source in this experiments used a 532 nm Nd:YVO4 diode-pumped solid state laser operating with a power of 200 mW (SOC JUNO, Showa Optronics Co. Ltd., Tokyo, Japan). An objective lens with a numerical aperture of 0.5 was used both to focus the laser beam on the sample surface and to collect the scattered Raman light. A pinhole aperture of 100 μm was adopted employing an objective lens with a magnification of 100x. For each studied sample, several tens of spectra were collected from the epidermis to dermis (100-600 µm depth from the skin surface) in the cross-section and we obtained an average spectrum whose sub-bands were best-fitted to Gaussian-Lorentzian bells 46 . Table S2 shows a complete description of the sub-bands labeled in Fig. 3 for the protein spectral zone, which includes the frequency range at which these specific sub-bands were located and their vibrational origins.
To have a reliable sampling of the bands per each donor, a number of Raman spectra in the order of > 100 needed to be averaged. Although the spectral acquisition procedure could be automatized, scanning of wide spectral range through the dermis to epidermis resulted in time consuming (about 20 h of measurement per donor).
Statistics. We evaluated the correlation between age or the BMI and the R PF using the Pearson correlation coefficient. To build a prediction model, we performed regression analysis with age as the dependent variable for four candidate models. The independent variables of those models were follows: R PF in Model 1; R PF and R PF