Human coelomic fluid investigation: A MS-based analytical approach to prenatal screening

Coelomic fluid (CF) is the earliest dynamic and complex fluid of the gestational sac. CF contains maternal cells and proteins produced by embryonic cells, tissues and excretions. The biochemical composition of CF changes throughout the first trimester of pregnancy, and its protein profile reflects both physiological and pathological changes affecting the embryo and the mother. Identification of variations in the balance of proteins might indicate particular types of pathologies, or ascertain specific genetic disorders. A platform utilizing protein enrichment procedures coupled with shotgun identification and iTRAQ differentiation provided the identification and quantitation of 88 unique embryonic proteins. Notably, the chromosome X protein CXorf23 was detected, and its expression level suggested the sex of the embryo. Foetal sex was determined by Quantitative Fluorescent Polymerase Chain Reaction (QF-PCR) on coelomic cells, foetal tissues and maternal white blood cells, with a 100% concordance rate between iTRAQ-MS/MS and QF-PCR data. The functional associations among the identified proteins were investigated using the STRING database. The Open Targets Platform reported the following therapeutic areas as significant: nervous system, respiratory system, eye and head disease.


Results
Coelomic fluid (CF) is a yellow, viscous fluid with a lower protein concentration than maternal serum. Jauniaux et al. found that the total protein content is 18 times lower in CF than in maternal serum, and 54 times higher in CF than in amniotic fluid (AF) (Supporting Information, Table 1S) 3 . Giambona and co-workers adopted a direct micromanipulator pickup of embryo-foetal cells, selected on the basis of their morphology 34,35 , to remove maternal cell contamination and to obtain an early prenatal diagnosis of gene disorders. The CF proteome also shows maternal contamination, and protein quantification has so far been reported in the literature with no distinction between embryonic and maternal proteins 3 . CF shares many proteins with maternal plasma, trophoblastic tissue, and yolk sac 33 . Cindrova-Davies et al. 33 identified 165 proteins from CF using a gel electrophoresis liquid chromatography (GELC)-MS/MS approach. Serum and common circulating blood proteins accounted for 30% and 10%, respectively, of the total number of identified and categorized proteins. Therefore, depletion of these proteins is a prerequisite for the detection of the low-abundance components. Overcoming the maternal contamination also requires the removal of IgG and IgA, which belong to the maternal immune system since the foetus is not immunologically competent, and of HSA, which plays a nutritional role during foetal growth. It is now well accepted that no single proteomic workflow comes close to identifying all major proteins in action 37 . The introduction of LC-MS/MS or LC-MALDI MS/MS based shotgun proteomics approaches, in conjunction with several pre-fractionation schemes, has proven to be a valid and complementary alternative to 2-DE gel-based analysis 38,39 .
The aim was to establish a differential proteomic expression profile of CF via a shotgun proteomic workflow. The practicality of the workflow in the rapid detection of low-abundance classes of protein families was used to critically evaluate the pitfalls and strengths of the approach. Three different analytical strategies for embryonic protein enrichment from CF were designed, in which the readiness with which a given protein entry was detected by any one approach was used as an indicator of specificity. The first step was to establish an efficient methodology to perform, monitor, and compare different pre-fractionation schemes, for the development of a distinctive approach in which efforts were not directed primarily towards identifying markers, but rather towards establishing a proteomic expression profile of CF. The protein profile of CF obtained by direct MALDI mass spectrometry in the linear mode showed the presence of multicharged species of HSA (Fig. 1) 4+ , respectively. IgG and serotransferrin gave peaks at m/z 52675, 98704, and 39493 respectively. The spectrum also showed other, less intense peaks, whose assignment was not straightforward at this level. Three different protocols were used for sample preparation, and variations were "tried-and-tested" within each procedure (Scheme 1, Experimental Section). Protocol I involved a chemical fractionation of proteins carried out by a simple procedure based on their different physicochemical properties. The acidic isoelectric point of HSA (pI 5.2) suggested that the protein would be soluble under basic conditions. The experimental design was planned in order to obtain three fractions: the supernatant fraction (S) and two hydrosoluble fractions (basic H 1 and acidic H 2 ). Analyses showed the presence of albumin and a similar protein profile in all fractions, indicating the non-specificity of the method.
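The assignment of multicharged species in a linear MALDI spectrum rests on the relation m/z = (M + z·m_H)/z for an [M + zH]^z+ ion. The following is a minimal sketch of that arithmetic, not the assignment procedure used in this work. The HSA mass of about 66.4 kDa is an approximate literature value that is not stated in the text, and the function names are illustrative.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in Da


def mz(neutral_mass_da: float, charge: int) -> float:
    """Return m/z of an [M + zH]^z+ ion for a neutral mass M and charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def neutral_mass(mz_value: float, charge: int) -> float:
    """Invert the relation: recover M from an observed m/z and an assumed z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz_value - PROTON_MASS_DA)


# ASSUMPTION: approximate average mass of mature HSA (not given in the text).
HSA_MASS_DA = 66_437.0

if __name__ == "__main__":
    # Expected positions of the 1+ to 4+ species of HSA.
    for z in range(1, 5):
        print(f"[M+{z}H]{z}+  m/z = {mz(HSA_MASS_DA, z):.1f}")
```

Given an observed peak and a candidate charge state, `neutral_mass` returns the mass that the assignment would imply, which can then be compared with the expected mass of the protein. At the mass accuracy of linear-mode spectra such a comparison is only indicative.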
Finally, the three fractions were subjected to in-solution protein digestion and chromatographic fractionation, then analyzed by MALDI MS/MS. The Protein Pilot software allowed the identification of 104 proteins from fractions H 1 and H 2 (Supporting Information, Table 2S). Protocol II was based on depletion by a home-made hydroxyapatite (HTP) spin column [40][41][42][43] . HTP is a crystalline form of calcium phosphate which is widely used in biochemistry because of its specificity in the fractionation and purification of monoclonal antibodies and proteins. HTP chromatography was performed by applying a salt and pH gradient (Fig. 6S, Protocol II) in order to obtain an efficient separation according to the different protein isoelectric points 44 . All steps were monitored by SDS-PAGE and mass spectrometry. MALDI and SDS-PAGE protein profiles of HTP fractions revealed the ubiquitous presence of HSA (Supporting Information, Figs 3S and 4S), demonstrating the non-specificity of the method. In order to check all fractions by direct mass spectrometry, a novel MS-compatible experimental procedure based on the use of immunoaffinity devices was designed (Experimental Section) 18 . The MARS cartridge (III a ) can selectively remove high-abundance proteins from human serum, plasma, and cerebrospinal fluid, offering the opportunity to analyze up to 200 samples with no memory effect. The selective immunodepletion provides an enriched pool of low-abundance proteins for downstream proteomics analysis. The PROT-BA device (III b ) is specific for albumin and IgG depletion from human serum (25-50 μL). The immunoaffinity medium in the prepacked spin column is a mixture of two beaded media containing recombinantly expressed small single-chain antibody ligands, resulting in low non-specific binding and high capacity.
The depletion efficiency of protocols III a and III b was compared by linear MALDI mass spectrometry. MS/MS analysis was used to further evaluate which setup delivered the highest number of identified low-abundance proteins. Linear MALDI spectra of fractions collected from PROT-BA showed residual HSA (Supporting Information, Fig. 5SA,B). MS/MS analysis of tryptic peptides from the PROT-BA depleted fraction allowed the identification of 48 proteins and several different isoforms (Supporting Information, Table 3S). MARS led to an excellent HSA depletion. Figure 2A shows the linear MALDI spectrum of the depleted fraction. In this case, 95 proteins were identified by MS/MS analysis (Table 4S, Supporting Information). Therefore, MARS was considered the best analytical device for CF protein quantitation.
Shotgun identification and iTRAQ differentiation of CF were carried out on twelve individual samples (DCF A-N ). To improve the sequence coverage and the number of identified peptides, a deglycosylation step was performed after application of the MARS protocol. The iTRAQ-MS/MS analysis of DCF A-N revealed 88 differentially expressed proteins using an ion ratio ≥2 or ≤0.5. Only 49 UniProtKB validated sequences were evaluated (Table 1). At least three peptides were used to identify and quantify CF proteins. Table 1 lists iTRAQ-MARS differentiation data for individual samples A-C. All experiments were performed in triplicate.

Quantitative Fluorescent Polymerase Chain Reaction (QF-PCR). Evaluation of foetal sex. Embryo-foetal nucleated red cells in CF were identified by optical phase contrast microscopy. These are roundish cells with a diameter of 12-16 μm, a high cytoplasmic nuclear ratio, and the nucleus polarized to one side of the cell near the wall. Coelomic cells isolated from CF were successfully analyzed by QF-PCR to obtain information on foetal sex. Specific short tandem repeats (STR) of highly variable chromosomal markers located on the X and Y chromosomes (AMXY, HPRT, SRY, DXS1187, DXS8377, DXS6803, DXS6809) were used. Figures 3 and 4 display parts of electropherograms of QF-PCR polymorphic STR markers for samples from a female and a male foetus. Specific markers of the X and Y chromosomes are shown. Patterns of DNA derived from coelomic cells, foetal tissues and maternal white blood cells are reported (Figs 3 and 4).
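The iTRAQ cut-off quoted in the Results (ion ratio ≥2 or ≤0.5) can be expressed as a simple classification rule. The sketch below illustrates only that rule. The protein names and ratio values are hypothetical placeholders rather than data from Table 1, and the function is not part of the Protein Pilot workflow.

```python
def classify_ratio(ratio: float, up: float = 2.0, down: float = 0.5) -> str:
    """Classify an iTRAQ reporter-ion ratio against the stated cut-offs."""
    if ratio <= 0:
        raise ValueError("an ion ratio must be positive")
    if ratio >= up:
        return "over-expressed"
    if ratio <= down:
        return "down-regulated"
    return "unchanged"


# HYPOTHETICAL ratios (sample / pooled reference), for illustration only.
example_ratios = {"protein_1": 2.4, "protein_2": 0.45, "protein_3": 1.1}

calls = {name: classify_ratio(r) for name, r in example_ratios.items()}

if __name__ == "__main__":
    for name, call in calls.items():
        print(f"{name}: {call}")
```

Both boundaries are inclusive, matching the "≥2 or ≤0.5" wording, so a ratio of exactly 2 or exactly 0.5 counts as differentially expressed.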

Discussion
Prenatal screening tests play an important role in the diagnosis of affected pregnancies. Pregnancy progression and birth involve foetal/maternal biochemical processes that depend on complex interactions at multiple levels. The balance among these interactions is disturbed at more than one level when a major problem arises. Proteins represent the functional complements of genes; therefore disorders, as well as changes in the number of gene copies and/or gene regulatory mechanisms, are reflected at the level of protein production and expression. The complex nature of biological fluids requires well-established analytical methodologies for the enrichment of low-abundance proteins, or efficient sample fractionation steps, before proteomics analysis. There are currently no preferred or standardized protocols to separate proteins from body fluid proteomes, and gel electrophoresis and chromatography are considered to be complementary. Characterization of the CF proteome might be a key point for an innovative prenatal screening. CF is quite different from the other biological fluids that are usually adopted for clinical screening (plasma, serum, urine). The protein content of CF is largely due to maternal contamination, which includes serum albumin and immunoglobulins in high concentrations (1.7 g/L and 35 mg/L, respectively) 3 . Therefore, immunodepletion avoids the masking effect of high-abundance proteins and makes it possible to obtain information about the whole CF proteome. Protocols I and II were planned taking into account the physico-chemical properties of HSA. Protocol I led to the identification of 104 proteins, distributed between acidic and basic proteins. Most proteins were identified from only one or three peptides, and the method showed a low specificity towards HSA. Moreover, several isoforms and thirty peptides of HSA were also detected, confirming the masking effect exerted by high-abundance proteins.
In protocol II, based on the use of HTP resin, a series of gradients was applied in order to achieve an efficient protein separation. The elution of basic and acidic proteins required the use of KCl, TRIS and EDTA solutions, which made it difficult to monitor all fractions by MALDI mass spectrometry. The fractions collected from the column were pooled according to the elution profile and analyzed by SDS-PAGE. The last fractions, eluted with water, were checked by mass spectrometry. Spectra revealed the HSA isolation to be unsuccessful, confirming the non-specificity of the method (Figs 3S and 4S, Supporting Information). The other two protocols (III a and III b ), based on the use of immunoaffinity devices, yielded the best results. Nevertheless, the PROT-BA method was characterized by low recovery of the total protein content, and its nature as a single-use device limited its practicality. Under the adopted experimental conditions, incomplete depletion of HSA was observed and only 48 proteins were identified (Fig. 5S and Table 3S, Supporting Information). HSA removal was strongly improved by the MARS protocol, and this method provided the identification of 94 proteins. The reusability of the same device without memory effects was a notable advantage. The MARS protocol was then adopted for the quantitative analysis. The deglycosylating enzyme PNGase F was used to increase the number of detectable peptides and thus the protein sequence coverage. Figure 2B shows several small mass shifts related to sugar removal from proteins. The CF develops during the 4th week of gestation, and it can be aspirated starting from the 5th week, making coelocentesis the earliest possible method of prenatal diagnosis. It is reasonable to consider coelocentesis to be a source of foetal progenitor and stem cells, and CF to be associated with the foetal system 45      reported to be correlated with pregnancy-related complications, such as pre-eclampsia and preterm birth 47 .
Since the analyzed samples were from patients with no karyotype abnormalities, at the moment it is not possible to establish strict relationships with diseases. However, it is important to note that the chromosome X protein CXorf23 (Uncharacterized protein CXorf23, CX023_HUMAN; Table 1, row 45) was found over-expressed in samples A 116 and C 117 , suggesting a female embryo. The same protein was found down-regulated in sample B 115 . Genomic DNA analysis of chorionic villi and maternal blood confirmed the sex of the foetus: female for A 116 and C 117 , and male for B 115 , indicating that CF can be used in prenatal screening. The proteomic approach used here led to a differential identification of the catalogue of CF proteins. These data can further be used for bioinformatics elaboration. The biological associations among the identified proteins were investigated using the STRING database. The predicted protein-protein associations were queried through a vast number of databases derived in different ways (e.g., experimentally determined interactions, protein neighborhood data, or data acquired via text mining) 48 . As shown in Fig. 5, three main networks of cellular components (GO) were identified: proteasome core complex (GO:0019773, blue), blood microparticle (GO:0072562, red), and extracellular region (GO:0005576, green). For the 49 differentially expressed proteins, functional enrichment analysis showed two major entries in the InterPro protein domains and features networks: proteasome, subunit alpha/beta (IPR001353) and nucleophile aminohydrolases, N-terminal (IPR029055) (Table 5S, Supporting Information). The relevance of the proteasome in the GO and InterPro networks is not surprising, because the proteasome constitutes the central proteolytic machinery of the highly conserved ubiquitin/proteasome system, the major cellular tool for extralysosomal protein degradation.
The proteasome can play opposite roles in the regulation of cell proliferation and apoptosis; these roles are apparently defined by the cell environment and proliferative state. During early embryogenesis, proteasomes perform proteolytic functions and are also stored as a maternal supply of proteasomes for the developing embryo. Changes in proteasome distribution during fertilization and further stages of development could be associated with the replacement of maternal proteasomes by proteasomes expressed by the embryo itself 49 . Furthermore, we adopted Open Targets (http://www.opentargets.org) as a tool for verifying the congruency of the results. Open Targets is a public-private partnership to establish an informatics platform, the Target Validation Platform. The Open Targets Platform (OTP) provides comprehensive and robust data integration for access to and visualisation of potential drug targets associated with diseases 50 . A drug target can be a protein, protein complex or RNA molecule, and it is displayed by its gene name according to the HUGO Gene Nomenclature Committee. OTP links multiple data types and assists users in identifying and prioritizing targets, in our case proteins, for further investigation. We checked a list of 49 targets (Table 1), and the OTP output gave back a summary report in which therapeutic areas of interest were sorted by relevance to our list. Part of the OTP output is reported in Fig. 6, where panel A displays the summary page for the 49 targets together with the corresponding therapeutic areas. Some areas did not show correlations with the targets, for example "neoplasm" (p-value 0.2), "liver disease" (p-value 0.05) or "metabolic disease" (p-value 0.06). Nevertheless, other areas showed good correlations with the submitted proteins, for example "nervous system disease" (p-value 0.0004), "respiratory system disease" (p-value 0.00002), "eye disease" (p-value 0.0007) and "head disease" (p-value 0.0004).
It is not surprising to observe the relevance of targets related to nervous system disease, since humans have considerably more prenatal maturation of their nervous systems.
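The split between correlated and uncorrelated therapeutic areas can be reproduced from the p-values quoted above. The sketch below is illustrative only and makes no use of the Open Targets interface. The significance threshold is an assumption: the text gives no explicit cut-off, and a strict p < 0.05 rule is used here because it is consistent with "liver disease" (p-value 0.05) being listed among the uncorrelated areas.

```python
# p-values as quoted in the text for the 49 submitted targets.
P_VALUES = {
    "neoplasm": 0.2,
    "liver disease": 0.05,
    "metabolic disease": 0.06,
    "nervous system disease": 0.0004,
    "respiratory system disease": 0.00002,
    "eye disease": 0.0007,
    "head disease": 0.0004,
}

# ASSUMPTION: strict threshold, not stated in the text.
ALPHA = 0.05


def significant_areas(p_values, alpha=ALPHA):
    """Return (area, p-value) pairs with p < alpha, most significant first."""
    hits = [(area, p) for area, p in p_values.items() if p < alpha]
    # sorted() is stable, so areas with equal p-values keep their input order.
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    for area, p in significant_areas(P_VALUES):
        print(f"{area}: p = {p:g}")
```

With these inputs, the four areas named as well correlated in the text pass the filter, while "neoplasm", "liver disease" and "metabolic disease" do not.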
Panel B in Fig. 6 displays the summary report obtained for 38 targets related to "nervous system disease". Colour variations are directly connected with scores. Values of 0.5 and 1 indicate that genes are weakly or strongly involved in diseases, respectively. Several proteins involved in neuronal diseases (Fig. 6, Panel B) were detected, although they may simply reflect the normal embryonic development of the central nervous system (CNS). In the first month of gestation, specific areas of the CNS begin to form following a sequence of developmental processes including proliferation, migration, differentiation, synaptogenesis, apoptosis, and myelination 51 . In particular, cell precursors of the brain and spinal cord in humans start to develop early in embryogenesis, at approximately two weeks of gestation, through the process called neurulation. At the end of the third week of gestation, the neural folds begin to move together and fuse, forming the neural tube, with complete neural tube formation occurring approximately from gestation day 26 to 28. Interruption of neural development during this early period can result in severe abnormalities of the brain and spinal cord.
A quantitative shotgun proteomics analysis strategy was successfully used for the identification and differentiation of embryo proteins from CF. Several putative AF/maternal serum markers were identified. The proposed experimental approach furnished a powerful tool for achieving deeper insight into the CF protein composition in the early stages of pregnancy, offering a novel perspective in the investigation of the molecular constitution and dynamics of this gestational fluid. However, it should be noted that the data reported are specific for gestational ages of 8 weeks and for samples from chromosomally normal pregnancies. In all cases, a concordance rate of 100% was found for sex determination between iTRAQ-MARS and genomic DNA analysis of chorionic villi, or amniotic fluid and maternal blood.
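The concordance rate quoted above is the fraction of samples for which two methods give the same sex call. The following sketch applies that definition to the three samples discussed individually (A 116 , B 115 and C 117 ). Mapping CXorf23 over-expression to "female" and down-regulation to "male" follows the interpretation given in the Discussion, and the function names are illustrative.

```python
def sex_from_cxorf23(regulation: str) -> str:
    """Map the CXorf23 iTRAQ call to a sex call, as interpreted in the text."""
    mapping = {"over-expressed": "female", "down-regulated": "male"}
    if regulation not in mapping:
        raise ValueError(f"no sex call for regulation state: {regulation!r}")
    return mapping[regulation]


def concordance_rate(calls_a: dict, calls_b: dict) -> float:
    """Fraction of shared samples on which the two methods agree."""
    shared = sorted(calls_a.keys() & calls_b.keys())
    if not shared:
        raise ValueError("the two methods have no samples in common")
    agreeing = sum(calls_a[s] == calls_b[s] for s in shared)
    return agreeing / len(shared)


# CXorf23 regulation per sample, as reported in the Discussion.
cxorf23 = {"A116": "over-expressed", "B115": "down-regulated", "C117": "over-expressed"}
itraq_calls = {sample: sex_from_cxorf23(state) for sample, state in cxorf23.items()}

# Sex from genomic DNA analysis, as reported in the Discussion.
genomic_calls = {"A116": "female", "B115": "male", "C117": "female"}

if __name__ == "__main__":
    rate = concordance_rate(itraq_calls, genomic_calls)
    print(f"concordance: {rate:.0%}")
```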

Methods
Collection of coelomic fluid. This study is part of an on-going investigation examining the feasibility of analysis on DNA extracted from CF for earlier prenatal diagnosis of foetal diseases. The study was conducted in accordance with the Declaration of Helsinki (Hospital Ethical Committee authorization), protocol number 26-01-2005, No 80, approved by the Institutional Review Board of "Ospedali Riuniti Villa Sofia-Cervello", Palermo, Italy. Informed consent forms were obtained from all the study participants, and all methods were performed in accordance with relevant guidelines and regulations. The details of such protocols have been previously described 33,34 . Following written consent, women were recruited between 7 and 10 weeks of gestation. CF from pregnancies with a chromosomally normal foetus was obtained by ultrasound-guided transvaginal puncture, as reported 33,34 . Twenty-two samples of CF were selected for proteomic studies. CF cells were used for morphological and genetic analysis 33,34 , while the fluid was used for proteomic experiments.

Sample preparation of embryo-foetal cells and Quantitative Fluorescent Polymerase Chain Reaction (QF-PCR).
Embryo-foetal cells were aspirated one by one by a micromanipulator using a 45 μm glass micropipette (BioCare Europe). Cells were placed into a drop of 0.9% NaCl in the same Petri dish. Each drop containing a group of embryo-foetal erythroid precursor cells was centrifuged at 10000 rpm for 7 min and the supernatants were discarded. All samples were subjected to DNA extraction by the alkaline method. QF-PCR was used to evaluate the foetal sex. Specific short tandem repeats (STR) of highly variable chromosomal markers located on the X and Y chromosomes (AMXY, HPRT, SRY, DXS1187, DXS8377, DXS6803, DXS6809) were used to obtain information on foetal sex. Each primer was labelled with fluorescent dyes (

Protocol III a . The cartridge was treated four times with 400 μl of 50 mM NH 4 HCO 3 (pH 8). CF (200 μl) was applied on the column, centrifuged for 2 min at 3000 rpm, then collected. The cartridge was washed with 400 μl of 50 mM NH 4 HCO 3 , and the obtained flow-through fractions were collected and concentrated (50 μl). High-abundance proteins were eluted with buffer B (5 times, pH 2.5). The collected fraction was dried, then solubilized with 100 μl of buffer (0.375 M TRIS, 0.1% SDS, pH 8.8).
Protocol III b . The cartridge PROT-BA was treated with 200 μl of 50 mM NH 4 HCO 3 , (pH 8). CF (200 μl) was applied on column and incubated for 10 min. After centrifugation at 3000 rpm for 1 min, the flow-through fraction was collected. The cartridge was washed with 200 μl of 50 mM NH 4 HCO 3 and the collected flow-through fractions were combined. The retained proteins were eluted with (NH 4 ) 2 CO 3 (pH 10), after 10 min of incubation and centrifugation at 3000 rpm for 2 min.
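The centrifugation steps in these protocols are given in rpm, which depends on the rotor. The standard conversion to relative centrifugal force is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The sketch below shows that conversion. The 7 cm radius is a purely hypothetical value chosen for illustration, since the rotor is not specified in the text.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius in cm."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius positive")
    return 1.118e-5 * radius_cm * rpm ** 2


# HYPOTHETICAL rotor radius; the rotor used is not reported.
ASSUMED_RADIUS_CM = 7.0

if __name__ == "__main__":
    # Speeds quoted in the Methods: 3000 rpm (spin columns), 10000 rpm (cells).
    for speed in (3000, 10000):
        g = rcf_from_rpm(speed, ASSUMED_RADIUS_CM)
        print(f"{speed} rpm at r = {ASSUMED_RADIUS_CM} cm is about {g:.0f} x g")
```

Because RCF scales linearly with the radius, the same rpm value on a different rotor gives a proportionally different force, so reproducing the protocol on other equipment requires the actual rotor radius.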
Tryptic digestion. Each lyophilized fraction was solubilized with 100 µl of 50 mM NH 4 HCO 3 . Trypsin (20 pmol) was then added to the protein mixture, and the digestion step was performed in a home microwave (MWD 246 SL, Whirlpool, Italy) at 250 W irradiation power (12 treatments, each one lasting 2 min). Tryptic peptide mixtures were subjected to reversed phase chromatography fractionation 52 .
Sample preparation for comparative quantification. 200 μL from each of 10 samples of CF were pooled together (CFp) and used for comparative quantification experiments with twelve individual CFs. CFp and the twelve CFs (A-N) were depleted of high-abundance proteins using the MARS approach (Protocol III a ). The depleted fractions were collected and dried. The depleted CFp sample (DCFp) was treated with 100 μl of 50 mM NH 4 HCO 3 and incubated with 4 μL of PNGase F (0.5 unit/μL). The microwave-assisted deglycosylation step was performed at 250 W irradiation power (20 treatments, each lasting 1 min). The protein mixture was subsequently treated with 200 μl of buffer (0.375 M TRIS, 0.1% SDS, pH 8.8) and heated in a boiling water bath for 5 min. After this time, trypsin (120 pmol) was added. The microwave-assisted tryptic digestion was carried out as reported above. All tryptic peptide mixtures were purified on a SPE Strata C18-E column (Phenomenex Inc, USA) to eliminate salts 53 and interferences with the iTRAQ reagent procedure. The column was washed with CH 3 OH, and then conditioned with 2 mL of CH 3 CN/TFA 0.1% (50:50, v/v) and 2 mL of TFA 0.1%. Samples were acidified by adding 300 μl of TFA 0.5% and loaded on the column. The washing step was performed using 4 mL of TFA 0.1%, and the flow-through fractions were discarded. Collection of peptides was performed by using (a) 4 mL of CH 3 samples C 117 , F 117 , I 117 , N 117 ). Labeled samples were combined in four groups (i.e., P 114 , A 116 , B 115 and C 117 ) and dried prior to cation exchange and reversed phase chromatography fractionation. A series of 180 chromatographic fractions were collected for each group.

MALDI MS and MS/MS analysis.
Linear MALDI-TOF spectra were acquired using a 5800 MALDI-TOF/TOF analyzer (AB SCIEX, Germany). All spectra were acquired in default calibration mode averaging 2500 laser shots with a mass accuracy of 500 ppm. MS and MS/MS analyses were performed in reflectron positive-ion mode. All chromatographic fractions were solubilized in 10 μl of matrix (α-CHCA 10 mg/mL, CH 3 CN/0.3% TFA in water, 50:50, v/v). MS spectra were acquired with a laser pulse rate of 400 Hz and at least 4000 laser shots, and CID-MS/MS experiments were performed at a collision energy of 1 kV, using ambient air as the collision gas (10⁻⁶ Torr). CID-MS/MS spectra required up to 5000 laser shots and a pulse rate of 1000 Hz.