Urinary proteome signature of Renal Cysts and Diabetes syndrome in children

Renal Cysts and Diabetes Syndrome (RCAD) is an autosomal dominant disorder caused by mutations in the HNF1B gene encoding the transcription factor hepatocyte nuclear factor-1B. RCAD is characterized as a multi-organ disease, with a broad spectrum of symptoms including kidney abnormalities (renal cysts, renal hypodysplasia, single kidney, horseshoe kidneys, hydronephrosis), early-onset diabetes mellitus, abnormal liver function, pancreatic hypoplasia and genital tract malformations. In the present study, using capillary electrophoresis coupled to mass spectrometry (CE-MS), we investigated the urinary proteome of a pediatric cohort of RCAD patients and different controls to identify peptide biomarkers and obtain further insights into the pathophysiology of this disorder. As a result, 146 peptides were found to be associated with RCAD in 22 pediatric patients when compared to 22 healthy age-matched controls. A classifier based on these peptides was generated and further tested on an independent cohort, clearly discriminating RCAD patients from different groups of controls. This study demonstrates that the urinary proteome of pediatric RCAD patients differs from that of autosomal dominant polycystic kidney disease (PKD1, PKD2) and congenital nephrotic syndrome (NPHS1, NPHS2, NPHS4, NPHS9), as well as from that of chronic kidney disease conditions, suggesting differences between the pathophysiology behind these disorders.

The most prominent clinical feature in HNF1B-associated syndrome is the renal disease, usually characterized by renal cysts, renal dysplasia, solitary or horseshoe kidney, hydronephrosis, and hyperuricaemic nephropathy 11 . The renal abnormalities of HNF1B mutant carriers have also been related to the congenital anomalies of the kidney and the urinary tract (CAKUT) 11 . Moreover, HNF1B mutations have recently been associated in some patients with autosomal dominant tubulointerstitial kidney disease (ADTKD-HNF1B) 12 . In addition, several cases of chronic kidney disease (CKD) of unknown origin have been reported both in children 13 and adults 14,15 . Tubular dysfunction manifesting as hypomagnesemia, hypocalciuria 16-18 , and hyperuricemia 14,19 has also been described. Extrarenal features comprise maturity-onset diabetes of the young, pancreatic hypoplasia, abnormal liver function, and genital tract malformations. The phenotype of HNF1B mutant carriers is indeed highly variable within and between families 20 . These observations led to the hypothesis that non-allelic factors, as well as stochastic variation in temporal HNF1B gene expression and environmental factors, could cause the strong intrafamilial variability of RCAD patients 3,21 .
Urinary proteomics is increasingly being employed in kidney disease research. Several studies have demonstrated that capillary electrophoresis coupled to mass spectrometry (CE-MS) enables the identification and validation of biomarkers or peptide signatures that support the diagnosis and prognosis of various kidney diseases 22-25 . In addition to their diagnostic and prognostic usefulness, proteomics-derived biomarkers may advance the understanding of the molecular pathways involved in the pathogenesis of a specific disorder or condition.
In this study, we aimed to obtain more insights into the renal pathophysiology of the RCAD syndrome by applying a proteomic approach to investigate changes at the urinary peptide level that can be used to characterize RCAD patients.
Results
Study setup and patient data. In total, 244 urine samples were included in this study: 44 samples were used for discovery and 200 samples were used as a validation set (Fig. 1A). In the discovery set, we included 22 RCAD patients and 22 healthy controls. Subsequently, we used a wider population comprising healthy controls (n = 20), RCAD patients (n = 24), autosomal dominant polycystic kidney disease (ADPKD) patients (n = 55), CKD patients (n = 55), and patients with nephrotic syndrome (n = 46) as a validation set. The urinary proteome data for all the samples were previously measured and derived from the Human Urinary Proteome Database 26-28 , with the exception of all RCAD urine samples, which were analyzed by CE-MS for this specific study. The 46 RCAD patients were divided into a discovery and a validation set. The RCAD samples used in the discovery set were matched with healthy controls based on age and gender. Furthermore, we divided both sets considering similar phenotypes. The clinical features of the RCAD patients are described in Table 1 and Supplementary Table 1.
Figure 1. (A) Study setup. The discovery set (22 healthy, 22 RCAD) was analyzed, leading to the identification of 146 sequenced urinary peptides that were modelled in an SVM classifier called RCAD146. In the next step, the validation phase, we studied the discriminatory ability of the RCAD146 panel in new RCAD patients (n = 24) and individuals with CKD or patients carrying monogenic mutations associated with different renal diseases. (B) Representation of the 146 urinary peptides significantly modified between RCAD and healthy controls. Normalized molecular mass (kDa) was plotted against normalized capillary electrophoresis (CE) migration time (min). Mean signal intensity is given in 3-dimensional depiction. (C) Cross-validation score of the RCAD146 model from the analysis of the discovery cohort along with the definition of the cut-off 0.3 (dashed line).
Identification of RCAD-related urinary peptides and development of a urinary peptide-based classifier. For the identification of significant urinary peptides related to the RCAD syndrome, we compared the urinary proteome profiles of 22 patients carrying HNF1B heterozygous mutations with 22 age- and gender-matched healthy controls (Table 2A). The statistical analysis was adjusted for multiple testing following the concept described by Benjamini and Hochberg 29 and defined in the clinical proteomics guidelines 30 . This led to the identification of 294 differentially excreted peptides (corrected p < 0.05) between these two groups. For 146 out of the 294, high-confidence sequence information could be assigned. Fragments of uromodulin (UMOD), protein unc-119 homolog A (UNC119), and mucin (MUC1), as well as a large number of collagen fragments, were identified. Moreover, peptides associated with calcium binding were also detected. Among them, fragments of sarcalumenin (SRL) and annexin A1 (ANXA1) were downregulated. In contrast, fragments of gelsolin (GSN), short transient receptor potential channel 4-associated protein (TRPC4AP) and osteopontin (SPP1), a direct target of HNF1B 31 , were upregulated. All relevant details, encompassing the sequence information as well as the fold-change, are described in Supplementary Table 2. The difference in abundance of these 146 peptides between RCAD patients and healthy controls is shown in Fig. 1B and Table 3. These proteome plots show the mean abundance of the significant peptide biomarkers in the urine of RCAD patients and healthy individuals. The 146 sequenced peptides were combined into a classifier termed "RCAD146" using a support vector machine (SVM), which was optimized for the classification of patients in the discovery cohort. Based on a cut-off score of 0.30 (Fig. 1C), the RCAD classifier discriminated RCAD from healthy controls with 90.9% sensitivity and 100% specificity and an AUC of 0.99 in the discovery cohort.
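To illustrate how sensitivity and specificity follow from classification scores and a fixed cut-off, the following minimal Python sketch applies a threshold of 0.30 to invented scores. The function name, the scores, and the convention that a score above the cut-off is called RCAD are assumptions made for illustration and are not taken from the study. The counts used (20 of 22 cases above the cut-off, no control above it) are simply one combination consistent with the reported 90.9% sensitivity and 100% specificity.

```python
def sensitivity_specificity(scores, labels, cutoff=0.30):
    """Return (sensitivity, specificity) for scores thresholded at `cutoff`.

    labels: True for RCAD cases, False for controls.
    Assumption: a score strictly above the cut-off is called positive.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores: 22 cases (2 of them below the cut-off) and 22 controls.
case_scores = [0.9] * 20 + [0.1] * 2
control_scores = [-0.8] * 22
scores = case_scores + control_scores
labels = [True] * 22 + [False] * 22

sens, spec = sensitivity_specificity(scores, labels)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Because the cut-off is fixed before validation, the same function could be applied unchanged to an independent cohort.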
Validation of the RCAD146 classifier in an independent group. The RCAD146 classifier was validated in an independent group of samples (Table 2B), consisting of 24 RCAD patients and 20 healthy controls. The analysis revealed an AUC of 1.00 [0.92 to 1.00 (95% CI); p < 0.0001]. At the pre-defined cut-off level of 0.30 based on the discovery cohort, the classifier displayed a sensitivity of 91.67% and a specificity of 100%. To assess how well RCAD146 differentiates the pediatric RCAD urinary proteome from other kidney diseases, we further selected a group of patients with chronic kidney disease (CKD) (n = 55). This control group of children was particularly interesting as (i) 40% of adults with HNF1B mutations develop CKD 14 , (ii) it represents a condition with severe chronic kidney damage, and (iii) it allows us to confirm that the performance of the RCAD146 classifier is independent of proteinuria in RCAD patients. This analysis showed a specificity of 98.18% and an AUC of 0.987 [0.931 to 1.00 (95% CI); p < 0.0001] for the classification of children with CKD as non-RCAD. To further evaluate the specificity and validity of the pediatric RCAD urinary proteomic pattern, the classifier was When evaluating all data sets combined, the overall sensitivity and specificity were 91.67% and 94.32%, respectively. Furthermore, this ROC analysis including all datasets revealed an AUC of 0.975 [0.943 to 0.992 (95% CI); p < 0.0001] (Fig. 2B).
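The AUC values reported above can be read as the probability that a randomly chosen RCAD sample receives a higher classification score than a randomly chosen non-RCAD sample. The sketch below computes this quantity directly from pairwise comparisons (the Mann-Whitney formulation of the AUC). It is an illustration only: the function name and the scores are invented, and the study itself used R and MedCalc rather than this code.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher; ties count as one half (Mann-Whitney U / (n1 * n2))."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented scores for four cases and four controls.
cases = [0.9, 0.7, 0.4, 0.2]
controls = [0.3, 0.1, -0.5, -0.9]
auc = roc_auc(cases, controls)
print(auc)  # 15 of the 16 pairs are ordered correctly -> 0.9375
```

The AUC does not depend on any particular cut-off. An AUC close to 1.00 together with a sensitivity below 100% at the pre-defined cut-off of 0.30 is therefore not contradictory: a few cases may score below 0.30 while still scoring above every control.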

Discussion
This is the first study showing a unique proteome profile that distinguishes children with RCAD from healthy controls and patients suffering from different renal diseases.
The most prominent finding of the study was the identification of 294 differentially regulated peptides potentially related to RCAD syndrome, where sequence information was obtained for 146 peptides. Similar to a previous study on ADPKD and urinary peptides 32 , the majority of peptides enriched in the urine of RCAD patients were collagen type I or type III fragments. This may reflect active extracellular matrix (ECM) remodelling, which could be related to ECM modifications due to cyst expansion 33 . The abundance of collagen and osteopontin fragments in RCAD children displayed an opposite tendency to previous findings described in several studies for different kidney diseases 22,23,34 . Collagens are characterized as the most abundant urinary peptides as well as the main elements of the interstitial ECM, being involved in different biological functions such as cell adhesion, tissue development and tensile strength 35,36 . Along these lines, osteopontin-derived peptides were also found to be increased in RCAD patients, consistent with the involvement of osteopontin in the remodelling of the ECM 34 . Therefore, the increased abundance of collagen and osteopontin fragments in the RCAD urinary samples may reflect the cystic phenotype and the still non-fibrotic status of patients' kidneys, whereas tubulointerstitial fibrosis determines the peptide excretion pattern in CKD 37 . Additionally, an early decline in kidney function may be predicted by the decreased excretion of uromodulin (UMOD) fragments 38 , which have also been found to be reduced in cases with tubular atrophy and fibrosis 39 . Another peptide fragment deregulated in the present study was derived from mucin 1 (MUC1), an extracellular protein expressed in the renal tubular epithelium. Recently, MUC1 was described as a predictor of renal impairment 40 , and its increase in mice and humans was correlated with the development of fibrosis 41 .
It is important to note that mutations in UMOD, MUC1, and HNF1B are responsible for ADTKD, showing a strict correlation between these proteins and RCAD along with ADTKD phenotypes 42 . Considering the acknowledged role of HNF1B in regulating kidney transporters and the calcium-sensing receptor (CaSR) 16,43 , it was interesting to note that several peptides associated with calcium-binding or calcium-regulating properties appear to be altered in RCAD patients. The disruption of multiple calcium regulators may be one of the bases of renal cyst formation, as observed previously 44,45 . Furthermore, the protein unc-119 homolog A (UNC119), which plays a crucial role in the proper ciliary targeting of the cystic gene nephrocystin-3 46 , was decreased.
The RCAD146 classifier correctly identified most CKD patients as non-RCAD. Since CKD is a rare condition in RCAD children 13 , it would be of interest to test a cohort of adult RCAD patients suffering from CKD, in order to investigate the performance of the RCAD146 classifier.
The other disorders used as disease controls in this study were biologically related (e.g. ADPKD) or non-related (e.g. nephrotic syndrome) to the RCAD syndrome. A group of patients affected by ADPKD appears relevant as they display phenotypic correlations with the RCAD syndrome. HNF1B was shown to regulate Pkd2 in the mouse 47 , and mutations in HNF1B can mimic polycystic kidney disease, especially in the prenatal setting and early childhood 48,49 . Notably, the RCAD146 classifier precisely discriminated RCAD from ADPKD. When tested on patients with monogenic mutations sharing a common nephrotic syndrome phenotype, the RCAD146 classifier also identified subjects carrying mutations in NPHS1, NPHS2, WT1 (NPHS4), and ADCK4 (NPHS9) as non-RCAD. This group of patients, like the CKD group, supports the view that the RCAD urinary proteome does not merely reflect proteinuria.

Interestingly, in a parallel test, the urinary proteome of a two-year-old PAX2 mutant carrier was misclassified by RCAD146 (data not shown). This observation suggests that the RCAD pediatric proteome could potentially be closer to that of patients with mutations in the gene encoding the transcription factor PAX2, known to cooperate with HNF1B in kidney morphogenesis and ureter differentiation 50 , than to that of patients with either polycystic kidney disease or nephrotic syndrome. Additional samples are required to further validate this common feature.
A limitation of the current study is that the recruited ADPKD patients were not children, but young adults. This is due to the difficulty in recruiting children with ADPKD, because the average age at diagnosis is 30 years 51 . Another shortcoming is that no information was available on the albuminuria/proteinuria values of the ADPKD patients. Moreover, this study included a post hoc analysis, due to the selection of the diseased control population from previous studies 37,52-57 . However, all the samples were analysed according to the same rules and under identical conditions (sample preparation and proteomic platform). No discrepancy between the data of the measured RCAD samples and the stored data is expected, because the normalization procedure protects the data from aberrations in peptide signal intensity. Furthermore, we controlled all measurements with a urine standard sample to detect unforeseen technical variation over time 58 .
Overall, the study, performed in agreement with the guidelines of clinical proteomics, demonstrates a significant value of urinary proteome analysis in the detection of RCAD, highlighting some proteins that potentially participate in the development of cysts and that may be useful for early diagnosis.
The urinary peptide signature of pediatric RCAD patients is mainly characterized by an increase in collagen peptides (especially type I or type III fragments) and osteopontin, along with a decrease in uromodulin. By including the 146 peptides differentially excreted between RCAD patients and healthy controls in a diagnostic biomarker classifier, we demonstrated that the pediatric RCAD urinary proteome differs from that of patients with PKD1, PKD2, NPHS1, NPHS2, NPHS4 and NPHS9 mutations, as well as from that of CKD patients. Future studies will be conducted to evaluate the performance of the RCAD146 panel in additional pediatric cohorts of disorders more closely related to RCAD, such as autosomal recessive polycystic kidney disease (ARPKD), ADTKD or diabetes. Moreover, follow-up clinical data of the patients described in this study will be analysed to estimate the ability of this classifier to predict the progression of RCAD. These analyses together are expected to provide further insights into the pathophysiology and disease evolution of RCAD patients.
Methods
Patient recruitment. RCAD urine samples were collected from three different clinical centres: Children's Hospital, CHU-Toulouse (France, n = 33), University Children Hospital, Heidelberg (Germany, n = 11), and Clinical Research Center for Rare Diseases Aldo e Cele Daccò, Ranica (Italy, n = 2). RCAD patients' average age was 8.4 years. Furthermore, 56.5% of the RCAD patients had normal renal function (estimated glomerular filtration rate (eGFR) > 90 mL/min/1.73 m²). For patients under 20 years, baseline eGFR (mL/min/1.73 m²) was estimated using the creatinine-based "Bedside Schwartz" equation 59 . For patients over 20 years (e.g. the ADPKD cohort), the CKD-EPI formula was used to calculate the eGFR values 60 . After collection, urine samples were stored at −20 °C and shipped frozen for subsequent proteome analysis. In addition, all non-RCAD samples were retrieved from the Human Urinary Proteome database 26-28 . This group of samples included healthy individuals (n = 42) and patients suffering from kidney diseases and carrying different genetic mutations: PKD1 (n = 46), PKD2 (n = 9), NPHS1 (n = 2), NPHS2 (n = 35), WT1 (n = 6), and ADCK4 (n = 3). Additionally, a group of samples from a pediatric cohort with chronic kidney disease of different etiologies was tested, including focal segmental glomerulosclerosis, IgA nephropathy, membranous glomerulonephritis, mesangioproliferative glomerulonephritis, diabetic nephropathy, vasculitis, and Henoch-Schönlein purpura nephritis (n = 55). This wider group of negative controls (non-RCAD) had an average age of 16.5 years. RCAD patients and healthy controls were matched for age and gender; the CKD and nephrotic syndrome cohorts were age-matched, and ADPKD patients were phenotype-matched for the presence of cysts. Characteristics of all individuals included in this study are detailed in Supplementary Table 1.
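The two eGFR estimates mentioned in the patient recruitment paragraph can be sketched as follows. The Bedside Schwartz equation is the published creatinine-based formula for children (0.413 × height / serum creatinine). For adults, the text only states that "the CKD-EPI formula" was used; the sketch assumes the 2009 CKD-EPI creatinine equation, which may differ from the exact variant applied in the study. Function and parameter names are illustrative.

```python
def egfr_bedside_schwartz(height_cm, scr_mg_dl):
    """Bedside Schwartz eGFR (mL/min/1.73 m^2) for children:
    0.413 * height (cm) / serum creatinine (mg/dL)."""
    return 0.413 * height_cm / scr_mg_dl


def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """CKD-EPI 2009 creatinine equation (mL/min/1.73 m^2).
    Assumption: this is the CKD-EPI variant meant in the text."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Hypothetical patients, for illustration only.
child = egfr_bedside_schwartz(height_cm=120.0, scr_mg_dl=0.5)
adult = egfr_ckd_epi_2009(scr_mg_dl=0.9, age_years=30, female=False)
print(f"child: {child:.1f}, adult: {adult:.1f} mL/min/1.73 m^2")
print("child has normal renal function (> 90):", child > 90)
```

The threshold of 90 mL/min/1.73 m² used in the last line is the definition of normal renal function given in the text.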
This study was designed and performed in compliance with all the regulations regarding the protection of subjects participating in medical research. Collection, storage and analysis of urine samples have been approved by the local ethics committees of the three participating centres (Comité de Protection des Personnes Sud-Ouest et Outre Mer III, Ethikkommission der Medizinischen Fakultät Heidelberg, and Comitato Etico di Bergamo respectively). All participating subjects or legal guardians of patients provided written informed consent to the use of urine samples. This study was performed in accordance with the Helsinki Declaration.

Urine sample preparation and CE-MS analysis. Urine sample collection and CE-MS analysis were performed as reported previously 61,62 . Briefly, immediately before preparation, urine sample aliquots stored at −20 °C were thawed and 700 μl were diluted with the same volume of 2 M urea, 10 mM NH4OH containing 0.02% SDS. Then, samples were filtered using a Centrisart 20-kDa cut-off centrifugal filter device (Sartorius, Goettingen, Germany) at 2,600 g for one hour at 4 °C in order to remove high molecular weight compounds. The obtained filtrate was desalted using a PD-10 column (GE Healthcare, Sweden) equilibrated in 0.01% aqueous NH4OH to eliminate urea, electrolytes and salts. Finally, samples were lyophilized and stored at 4 °C prior to CE-MS analysis. The samples were re-suspended in 10 µL of HPLC-grade H2O shortly before CE-MS analysis, as described 62 . CE-MS analyses were performed using the P/ACE MDQ capillary electrophoresis system (Beckman Coulter, Fullerton, USA) coupled online to a MicroTOF MS (Bruker Daltonic, Bremen, Germany) 62 .
The electro-ionization sprayer (Agilent Technologies) was grounded, and the ion spray interface potential was set between −4 and −4.5 kV. Spectra were accumulated every 3 s over an m/z range of 350-3000. Detailed information on the accuracy, precision, selectivity, sensitivity, reproducibility and stability of the CE-MS method has been described previously 62 .
CE-MS data processing. Proprietary software (MosaiquesVisu) was used to deconvolute mass spectral ion peaks representing identical molecules at different charge states into single masses 63 . The resulting peak list characterizes each polypeptide by its CE-migration time (in minutes), molecular mass (in Daltons), and ion signal intensity. Subsequently, the amplitudes of the urinary peptides were normalized to twenty-nine 'housekeeping' peptides (peptides that vary only slightly between samples and are generally present in at least 90% of all urine samples), similarly to previous studies 64 . These 29 'housekeeping' peptides are commonly used for normalization; they are consistently detected in urine and, to date, do not appear to be significantly associated with any of the diseases investigated 64 . All detected peptides were deposited, clustered, matched and annotated in a Microsoft SQL database 26-28 , allowing further statistical analysis. All normalized amplitudes of the analysed samples are included in Supplementary Table 3.
Peptide sequencing. Candidate peptides for the RCAD classifier were identified and sequenced by tandem mass spectrometry (MS/MS) analysis and searched against human entries in the UniProt database, as previously described 65,66 . Briefly, to acquire the sequence information, urine samples were separated on a Dionex Ultimate 3000 RSLC nano flow system (Dionex, Camberley, UK) or a Beckman CE system (P/ACE MDQ) coupled to an Orbitrap Velos MS instrument (Thermo Fisher Scientific) 65,66 . Thereafter, data files were searched against the UniProt human non-redundant database using Proteome Discoverer 1.2 (Thermo) and the SEQUEST search engine. No fixed modifications were selected; hydroxylation of proline and lysine and oxidation of methionine were enabled as optional modifications; and no enzyme specificity was specified in the settings 65 .
The matching of the peptide sequence obtained by MS/MS analysis to the CE-MS peaks was based on molecular mass [Da] and theoretical migration time, calculated using the number of basic amino acids 67 . Peptides were accepted only if they had a mass deviation below ±5 ppm and <50 mDa for the fragment ions.
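The mass-deviation criteria stated above can be written as a simple acceptance check. The sketch assumes that the ±5 ppm limit applies to the peptide (precursor) mass and the 50 mDa limit to each fragment ion, which is the usual reading but is not spelled out in the text. All function names and example masses are invented for illustration.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Relative mass deviation in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def precursor_ok(observed_mass, theoretical_mass, max_ppm=5.0):
    """Assumed precursor criterion: absolute deviation below 5 ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) < max_ppm


def fragment_ok(observed_mass, theoretical_mass, max_da=0.050):
    """Assumed fragment criterion: absolute deviation below 50 mDa."""
    return abs(observed_mass - theoretical_mass) < max_da


# Invented example: theoretical peptide mass of 1500.000 Da.
print(ppm_error(1500.006, 1500.0))     # about 4 ppm
print(precursor_ok(1500.006, 1500.0))  # True
print(precursor_ok(1500.009, 1500.0))  # False (about 6 ppm)
print(fragment_ok(500.030, 500.0))     # True (30 mDa)
```

Note that a fixed ppm window corresponds to a larger absolute tolerance for heavier peptides: 5 ppm is 7.5 mDa at 1500 Da but 15 mDa at 3000 Da.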
Peptide identification and statistical analysis. For the identification of potential HNF1B-related urinary peptide biomarkers, a comparison between RCAD cases and healthy controls was performed. Only peptides that were detected in at least 70% (frequency threshold) of the samples in at least one of the two groups were further considered for statistical analysis. P-values for the comparison between RCAD cases and healthy controls were calculated using the Wilcoxon rank-sum test and adjusted for multiple testing with the false-discovery rate method presented by Benjamini and Hochberg 29 . Only peptides with an adjusted P-value below 0.05 were considered statistically significant. The RCAD146 classifier was developed as an SVM classification model 68,69 based on the amplitudes of the significant urinary peptides related to RCAD, which allows the calculation of specific classification scores. These classification scores were further used for statistical analysis, e.g. ROC curves. In more detail, the sensitivity and specificity of the RCAD146 classifier were calculated based on the number of correctly classified subjects. The receiver operating characteristic (ROC) plots and the confidence intervals (95% CI) were based on exact binomial calculations. The area under the curve (AUC) and the sensitivity and specificity values of the ROC of the classifier were determined using R-based statistical software (version 3.3.3) and confirmed using MedCalc version 12.7.5.0 (MedCalc Software bvba, Ostend, Belgium). ROC curves and box-and-whisker plots were generated with R-based statistical software (packages ggplot2, plotly).
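Two steps of the peptide selection described above, the 70% frequency threshold and the Benjamini-Hochberg adjustment, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the software used in the study: the function names are invented, a peptide is treated as "detected" when its amplitude is greater than zero, and the raw P-values in the example are made up (in practice they would come from a rank-sum test such as `scipy.stats.mannwhitneyu`).

```python
def passes_frequency_threshold(group_a, group_b, threshold=0.70):
    """True if the peptide is detected in at least `threshold` of the
    samples of at least one group. Assumption: amplitude > 0 = detected."""
    def frequency(amplitudes):
        return sum(1 for a in amplitudes if a > 0) / len(amplitudes)
    return frequency(group_a) >= threshold or frequency(group_b) >= threshold


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that the adjusted
    # values stay monotone: adj(rank) = min(p * m / rank, adj(rank + 1)).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw P-values for five peptides.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p)
print(significant)
```

In this invented example, the third and fourth peptides have raw P-values below 0.05 but adjusted values of about 0.051, so they would not be retained. This shows why the significance criterion has to be applied to the adjusted rather than the raw P-values.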

Data Availability
The raw data generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.