Introduction

The extracellular matrix (ECM) is a composite of cell-secreted molecules that provides biochemical and structural support to cells, tissues, and organs1. In humans, the composition of the ECM can be broadly summarized as a combination of water, protein, and polysaccharide, with the precise balance of these three components reflecting the functional requirements of each tissue. The structural requirements determine the mechanical properties of the ECM, which depend on the protein composition of the matrix, particularly the abundance of collagen and elastin1. The physiological relevance of these properties extends beyond simple structural integrity. Cells surrounded by the ECM sense its rigidity through integrin-mediated interactions with the matrix2, and the mechanical properties of the matrix in turn affect cell motility, proliferation, differentiation, and apoptosis3,4,5,6. Thus, knowledge of the precise composition of different tissues is important for understanding their structure-function relationship.

Connective tissue is one of the four basic types of human tissue and is primarily composed of fibrous ECM components7. Tendons, ligaments, adipose tissue, bone, cartilage, and the intervertebral disc (IVD) are connective tissues involved in diverse physiological roles, including nutrient storage, endocrine function, and structural support8,9,10. The physical properties of connective tissues are similarly varied, perhaps best exemplified by contrasting adipose tissue, a loose tissue populated by large, lipid-containing adipocytes, with bone, a hard tissue consisting primarily of mineralized collagen fibrils9,11. The protein composition of human connective tissues is generally described as differing substantially between tissues; however, quantitative and comparative studies on the subject are scarce.

Connective tissue pathologies are often closely associated with alterations in the composition and structure of the ECM. This association is particularly well studied for disorders that affect cartilage, such as osteoarthritis and IVD degeneration. Osteoarthritis (OA), a common disorder occurring in 60% of adults over the age of 65 in Europe and North America, leads to pathological changes in the structure and composition of the cartilage matrix that coincide with excessive ECM degradation12. Degeneration of the IVD is also linked to excessive matrix catabolism13. Reported incidence rates of disc degeneration vary widely, but some reports suggest a rate as high as 70% in certain populations, and the disorder is usually thought to be linked to back pain14,15. Adolescent idiopathic scoliosis occurs in approximately 2.5% of children aged 10–16 and has been suggested to involve abnormalities in connective tissues16. Given the matrix and tissue degradation observed in these disorders, it stands to reason that there may be quantifiable differences between the protein composition of healthy and diseased connective tissues.

The goal of this study was to perform a systematic review and meta-analytic synthesis of the published literature reporting quantitative information on the protein composition of human connective tissues. The primary objective was to obtain estimates for the absolute and relative amounts of ECM proteins. We set two secondary objectives: 1) to quantify the changes in ECM composition in distinct pathologies; and 2) to combine the outcomes for ECM proteins with reference values for other components to build quantitative estimates of whole-tissue composition.

Methods

Information retrieval

A medical librarian (MM) constructed a search strategy (provided in Supplemental Materials and Methods) and performed a computerized bibliographic search of the Medline (OVID) database. This search strategy was then adapted to search EMBASE and SCOPUS. The searches were performed on March 10, 2017 and returned 8,341 non-duplicate publications. A search of references and citations identified 7 additional publications. An updated search was performed on January 23, 2019 and returned an additional 1,256 non-duplicate publications.

Study selection

All screening and selection were performed by two co-authors (TJM and GP for the original search; TJM and SVK for the updated search). Title/abstract screening identified papers that contained information on ECM proteins derived from human tissue. During full-text screening, studies were selected if 1) they quantified proteins; 2) the quantified proteins included those found in the ECM; 3) samples were taken from human bone, adipose tissue, tendon, ligament, cartilage, or skeletal muscle; and 4) the articles reported complete statistical information, including sample sizes, mean values, and measures of spread (standard deviation or standard error). Studies were excluded if 1) no other study reported on the protein or tissue of interest, making synthesis impossible; or 2) the reported units could not be transformed to μg protein/mg dry tissue. All conflicts that arose during the screening process were discussed by the two reviewers until consensus was reached.

Data collection

The reported tissue type, ECM component name, mean, sample size, and standard deviation/standard error were extracted, together with information on subject age, sex, pathology, and sub-location within the tissue, as well as methodological information. If the data within a paper were stratified by a variable such as age, sex, or region within a given tissue, the data were entered as separate datasets. The complete data pool is available in Supplemental Table 1.

Data transformation

The data that were not presented in the form of μg protein/mg dry tissue were adjusted using the following assumptions and resources: (1) ECM proteins + glycosaminoglycans account for the entirety of the decellularized dry weight of cartilage17; (2) water content of tendon is 62.5%18; (3) water content of skeletal muscle is 75%19; (4) if publications provided direct measurements of water content, that information was used in lieu of reference values for transformations within that study; (5) molar weights were taken from genecards.org for moles to mass transformations.
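The transformations above can be illustrated with a minimal Python sketch. It assumes that a wet-weight value is converted by dividing by the dry-mass fraction (1 − water content), and that molar amounts are converted with a molar mass in g/mol (nmol × g/mol = ng). The function names, the unit strings, and any molar mass passed in are hypothetical; assumption (1) for cartilage is not modeled here.

```python
# Hypothetical helpers illustrating conversion to ug protein / mg dry tissue.

# Reference water fractions stated in the Methods (tendon 62.5%, skeletal muscle 75%).
REFERENCE_WATER_FRACTION = {"tendon": 0.625, "skeletal muscle": 0.75}


def wet_to_dry(ug_per_mg_wet, water_fraction):
    """Convert ug protein / mg wet tissue to ug protein / mg dry tissue."""
    dry_fraction = 1.0 - water_fraction  # mg dry tissue per mg wet tissue
    return ug_per_mg_wet / dry_fraction


def nmol_to_ug(nmol_per_mg, molar_mass_g_per_mol):
    """Convert nmol protein / mg tissue to ug protein / mg tissue.

    nmol * (g/mol) gives ng, so divide by 1000 to obtain ug.
    """
    return nmol_per_mg * molar_mass_g_per_mol / 1000.0


def to_ug_per_mg_dry(value, unit, tissue=None, water_fraction=None, molar_mass=None):
    """Dispatch on the reported unit; a study-specific water fraction overrides the reference."""
    if unit.startswith("nmol"):
        value = nmol_to_ug(value, molar_mass)
        unit = unit.replace("nmol", "ug", 1)
    if unit == "ug/mg wet":
        if water_fraction is None:
            water_fraction = REFERENCE_WATER_FRACTION[tissue]
        return wet_to_dry(value, water_fraction)
    if unit == "ug/mg dry":
        return value
    raise ValueError(f"unsupported unit: {unit}")
```

For example, `to_ug_per_mg_dry(56.25, "ug/mg wet", tissue="tendon")` returns 150.0, because tendon with 62.5% water is 37.5% dry mass. Passing `water_fraction` explicitly mirrors assumption (4), where a study's own water measurement replaces the reference value.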

Data synthesis

Analysis was performed using MetaLab20. We assumed a random-effects model and calculated individual dataset weights (wi) using inverse-variance weighting:

$${w}_{i}=\frac{1}{se{({\theta }_{i})}^{2}+{\tau }^{2}}$$
(1)

Where se(θi) were the standard errors of individual datasets, and τ² was the inter-study variance. When only standard deviations sd(θi) were provided, se(θi) were calculated from the dataset sample size (ni) as:

$$se({\theta }_{i})=\frac{sd({\theta }_{i})}{\sqrt{{n}_{i}}}$$
(2)

Inter-study variance ($${\tau }^{2}$$) was calculated using the DerSimonian and Laird method:

$${\tau }^{2}=\frac{Q-(N-1)}{c}$$
(3)

Where N was the number of datasets, and the Q statistic (Q) and the scaling constant (c) were calculated as follows:

$$Q=\sum _{i}(se{({\theta }_{i})}^{-2}{({\theta }_{i}-\frac{{\sum }_{i}se{({\theta }_{i})}^{-2}{\theta }_{i}}{{\sum }_{i}se{({\theta }_{i})}^{-2}})}^{2})$$
(4)
$$c=\sum _{i}se{({\theta }_{i})}^{-2}-\frac{{\sum }_{i}{(se{({\theta }_{i})}^{-2})}^{2}}{{\sum }_{i}se{({\theta }_{i})}^{-2}}$$
(5)

Outcomes reported in individual datasets ($${\theta }_{i}$$) were synthesized into a weighted outcome ($$\hat{\theta }$$), according to individual dataset weights ($${w}_{i}$$):

$$\hat{\theta }=\frac{{\sum }_{i}({\theta }_{i}\cdot {w}_{i})}{{\sum }_{i}({w}_{i})}$$
(6)

95% confidence intervals (CI) were calculated using a Z distribution:

$$\pm CI=\pm \,1.96\cdot se(\hat{\theta }).$$
(7)
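Equations (1)–(7) can be sketched as a DerSimonian–Laird random-effects pooling routine. This is a minimal illustration, not MetaLab's implementation. It assumes that negative τ² values are truncated at zero and that the standard error of the pooled estimate is se(θ̂) = 1/√(Σ wi), the usual choice under inverse-variance weighting; neither detail is stated above. The function names are hypothetical.

```python
import math


def se_from_sd(sd, n):
    """Eq. (2): standard error from a reported standard deviation and sample size."""
    return sd / math.sqrt(n)


def dersimonian_laird(theta, se):
    """Random-effects pooling following Eqs. (1) and (3)-(7).

    theta: per-dataset outcomes (e.g. ug collagen / mg dry tissue)
    se:    per-dataset standard errors
    Returns (pooled estimate, tau^2, lower 95% CI bound, upper 95% CI bound).
    """
    n = len(theta)
    v = [1.0 / s ** 2 for s in se]  # fixed-effect weights se(theta_i)^-2
    sum_v = sum(v)
    theta_fe = sum(vi * t for vi, t in zip(v, theta)) / sum_v
    q = sum(vi * (t - theta_fe) ** 2 for vi, t in zip(v, theta))  # Eq. (4)
    c = sum_v - sum(vi ** 2 for vi in v) / sum_v  # Eq. (5)
    # Eq. (3); truncation at zero is an assumed (common) convention.
    tau2 = max(0.0, (q - (n - 1)) / c) if c > 0 else 0.0
    w = [1.0 / (s ** 2 + tau2) for s in se]  # Eq. (1)
    pooled = sum(wi * t for wi, t in zip(w, theta)) / sum(w)  # Eq. (6)
    se_pooled = math.sqrt(1.0 / sum(w))  # assumed se of the pooled estimate
    half_width = 1.96 * se_pooled  # Eq. (7)
    return pooled, tau2, pooled - half_width, pooled + half_width
```

With two datasets reporting 10 and 20 (each with se = 1), the sketch gives τ² = 49 and a pooled estimate of 15 with a 95% CI of 5.2 to 24.8, showing how a large inter-study variance flattens the weights and widens the interval.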

Heterogeneity assessment

We report the heterogeneity measures H² and I², which in turn depend on the total variation within our overall data pool (Qtotal):

$${Q}_{total}=\sum _{i=1}^{N}({w}_{i}\cdot {({\theta }_{i}-{\hat{\theta }}_{FE})}^{2})$$
(8)

where

$${\hat{\theta }}_{FE}=\frac{{\sum }_{i}se{({\theta }_{i})}^{-2}{\theta }_{i}}{{\sum }_{i}se{({\theta }_{i})}^{-2}}\quad \text{and}\quad {w}_{i}=se{({\theta }_{i})}^{-2}$$
$${H}^{2}=\frac{{Q}_{total}}{df},\quad df=N-1$$
(9)
$${I}^{2}=\frac{{H}^{2}-1}{{H}^{2}}\cdot 100 \% .$$
(10)
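A minimal sketch of Eqs. (8)–(10) follows. It assumes df = N − 1 and truncates I² at zero when H² ≤ 1, a common convention that the equations above do not state; the function names are hypothetical.

```python
def q_total(theta, se):
    """Eq. (8): fixed-effect Q using w_i = se(theta_i)^-2 and the fixed-effect mean."""
    w = [1.0 / s ** 2 for s in se]
    theta_fe = sum(wi * t for wi, t in zip(w, theta)) / sum(w)
    return sum(wi * (t - theta_fe) ** 2 for wi, t in zip(w, theta))


def i2_from_h2(h2):
    """Eq. (10), in percent; truncated at 0 when H^2 <= 1 (assumed convention)."""
    if h2 <= 1.0:
        return 0.0
    return (h2 - 1.0) / h2 * 100.0


def heterogeneity(theta, se):
    """Return (Q_total, H^2, I^2) for a data pool of N >= 2 datasets."""
    q = q_total(theta, se)
    df = len(theta) - 1  # assumed: df = N - 1
    h2 = q / df  # Eq. (9)
    return q, h2, i2_from_h2(h2)
```

As a consistency check, `i2_from_h2(9.6)` gives about 89.6% and `i2_from_h2(708)` about 99.9%, matching the I² values reported for the articular cartilage and IVD data pools in the Results.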

Cumulative and single-study exclusion plots

Cumulative exclusion plots were generated by iteratively removing the largest contributors to overall heterogeneity until a predefined homogeneity threshold was reached, assessed using a χ² distribution (p = 0.05). Single-study exclusion plots assessed the effect of removing each individual dataset. Importantly, both cumulative and single-study exclusion plots were constructed for the analysis of heterogeneity only, and all studies remained incorporated in our overall estimates.
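The cumulative exclusion procedure can be sketched as below. The sketch assumes that each dataset's contribution to heterogeneity is its term in Eq. (8), that df = N − 1, and that the pool is considered homogeneous once the χ² test of Qtotal is no longer significant at p = 0.05. The function names are hypothetical, the exact χ² tail function for integer degrees of freedom is included only to keep the example free of third-party dependencies, and MetaLab's own procedure may differ in detail.

```python
import math


def chi2_sf(x, k):
    """Exact P(X > x) for a chi-square variable with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even df: e^(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    sf = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (k - 1) // 2 + 1):
        sf += term
        term *= x / (2 * r + 1)
    return sf


def cumulative_exclusion(theta, se, alpha=0.05):
    """Drop the largest heterogeneity contributor until the chi-square test is non-significant.

    Returns (indices removed in order, indices kept).
    """
    keep = list(range(len(theta)))
    removed = []
    while len(keep) > 1:
        w = [se[i] ** -2 for i in keep]
        th = [theta[i] for i in keep]
        theta_fe = sum(wi * t for wi, t in zip(w, th)) / sum(w)
        contrib = [wi * (t - theta_fe) ** 2 for wi, t in zip(w, th)]  # terms of Eq. (8)
        if chi2_sf(sum(contrib), len(keep) - 1) >= alpha:
            break  # pool is homogeneous at the chosen threshold
        worst = keep[max(range(len(keep)), key=contrib.__getitem__)]
        keep.remove(worst)
        removed.append(worst)
    return removed, keep
```

The fraction `len(removed) / len(theta)` corresponds to the share of datasets that had to be removed to reach homogeneity, the quantity reported for the IVD (53%) and articular cartilage (29%) data pools.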

Funnel plots

Study-level effects (θi) were plotted against their inverse standard error (precision). Theoretical 95% CIs were included to help visualize the expected distribution of an unbiased dataset.
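The theoretical limits can be sketched as follows, assuming they are drawn as θ̂ ± 1.96·se at each precision level 1/se, the usual pseudo-confidence region of a funnel plot. The function names are hypothetical, and plotting is left to any charting library.

```python
def funnel_limits(pooled, se_values, z=1.96):
    """Return (precision, lower, upper) points tracing the pseudo-95% funnel limits."""
    return [(1.0 / s, pooled - z * s, pooled + z * s) for s in se_values]


def outside_funnel(theta, se, pooled, z=1.96):
    """Indices of datasets lying outside the pseudo-95% limits at their own precision."""
    return [i for i, (t, s) in enumerate(zip(theta, se)) if abs(t - pooled) > z * s]
```

In the absence of bias and heterogeneity, about 95% of points are expected to fall inside these limits, scattered symmetrically about θ̂; a lopsided scatter suggests possible publication bias.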

Native tissue estimates

The estimates were generated for all tissues where 1) relative proteomic information was available; 2) an estimate for the absolute amount of total collagen was calculated; and 3) total collagen was included in the relative proteomic information. The estimated mass of a protein (Mp) was calculated by multiplying the ratio of its relative amount (Rp) to the relative amount of collagen (RCollagen) by the absolute estimate for the mass of collagen in the tissue (MCollagen):

$${M}_{p}=\frac{{R}_{p}}{{R}_{Collagen}}\,\times {M}_{collagen}.$$
(11)

When the estimates for non-collagenous tissue constituents calculated during the study were based on at least 3 datasets, they were used in lieu of (11).
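Eq. (11), together with the substitution rule above, can be sketched as follows. The dictionary layout, the function name, and the numbers used to exercise it are hypothetical illustrations, not values from Tables 3–5.

```python
def native_tissue_estimates(relative, m_collagen, direct=None, min_datasets=3):
    """Estimate absolute component masses from relative proteomic amounts via Eq. (11).

    relative:     {component: relative amount}; must contain "collagen"
    m_collagen:   absolute collagen estimate (e.g. ug per mg tissue)
    direct:       optional {component: (meta-analytic estimate, number of datasets)}
    min_datasets: direct estimates based on at least this many datasets replace Eq. (11)
    """
    direct = direct or {}
    r_collagen = relative["collagen"]
    estimates = {}
    for component, r_p in relative.items():
        value, n_datasets = direct.get(component, (None, 0))
        if value is not None and n_datasets >= min_datasets:
            estimates[component] = value  # use the synthesized estimate instead
        else:
            estimates[component] = r_p / r_collagen * m_collagen  # Eq. (11)
    return estimates
```

Because RCollagen/RCollagen = 1, collagen itself is returned unchanged as MCollagen, and a component whose direct estimate rests on fewer than three datasets falls back to Eq. (11).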

Results

Study selection

The electronic search for reports containing quantification of ECM components in connective tissues identified 9,597 papers, of which 226 were included as potentially relevant after title/abstract screening. Full-text screening yielded 37 articles containing quantitative information on at least one ECM protein (Fig. 1a). Of the selected articles, 12 reported absolute quantitative information for proteins derived from articular cartilage21,22,23,24,25,26,27,28,29,30,31,32; 9 for intervertebral discs (IVD)16,33,34,35,36,37,38,39,40; 3 each for tendon41,42,43 and skeletal muscle44,45,46; 2 for ligament41,42; and 1 for adipose tissue47 (Fig. 1b). A single protein was quantified in 8 studies, multiple components in 19 studies, the entire tissue proteome in 5 studies, and tissue-level composition in 5 studies (Fig. 1c). Absolute quantification was provided in 29 papers, while 8 studies reported relative values. Several of the studies reporting absolute protein quantification stratified data by age, sex, tissue sub-location, and pathological state, and we extracted the data from these studies as 580 individual datasets. Pathology types covered in the selected papers included IVD degeneration (173 datasets), osteoarthritis (54), scoliosis (26), osteochondral lesions (12), and other pathologies, including diabetes and obesity (6) (Fig. 1d). In total, the absolute amount of 89 unique components was reported (Supplemental Table 2). After data appraisal, we chose to perform meta-analysis on the four most commonly reported proteins (collagen, elastin, fibronectin, and proteoglycans), in addition to sulfated glycosaminoglycans (sGAG). Four studies, all reporting on articular cartilage, were excluded from meta-analysis because they did not provide information on any of these 5 components.

Data distribution and heterogeneity

Collagen quantifications in healthy IVD (34 datasets) and articular cartilage (12 datasets) were the largest data pools and thus the most suited for synthesis and meta-analysis. We constructed funnel plots to investigate publication bias in the reported datasets, which would result in an asymmetric distribution of points about the pooled mean. The data points are distributed evenly about the estimated effect size for both IVD (Fig. 2a) and articular cartilage (Fig. 2b). The weighted distribution and normal probability plots indicated that values in the IVD data pool are approximately normally distributed (Fig. 2c), whereas the articular cartilage data deviated from normality (Fig. 2d). The single-study and cumulative exclusion analyses for the IVD data pool (Fig. 2e) demonstrated that individual study contributions to overall heterogeneity were fairly consistent, and removal of 53% of the datasets generated a homogeneous data pool. In the articular cartilage data pool, no individual study contributed markedly to the overall heterogeneity, and removal of 29% of the datasets generated a homogeneous data pool (Fig. 2f).

Meta-analysis of collagen abundance in connective tissues

Using a random-effects model, we estimated collagen abundance in connective tissues (Fig. 3). IVD (n = 34 datasets reporting N = 207 samples) was found to contain 385 μg collagen/mg dry tissue (95% confidence interval (CI): 350, 420); articular cartilage (n = 12 datasets reporting N = 182 samples), 708 μg collagen/mg dry tissue (95% CI: 668, 748); skeletal muscle (n = 10 datasets reporting N = 65 samples), 80 μg collagen/mg dry tissue (95% CI: 72, 88); and tendon (n = 2 datasets reporting N = 13 samples), 149 μg collagen/mg dry tissue (95% CI: 72, 226). We identified a single dataset for adipose tissue (6 samples), which reported a collagen abundance of 294 μg collagen/mg dry tissue (95% CI: 279, 309). The two cartilaginous tissues, IVD and articular cartilage, contained significantly different amounts of collagen.

Investigation of methodological and biological contributors to heterogeneity

Since the degree of heterogeneity in the two most populated cartilage data pools was high (IVD: H² = 708, I² = 99.9%; articular cartilage: H² = 9.6, I² = 89.6%), we next investigated whether methodological choices (Fig. 4) or biological factors (Fig. 5) contributed to the heterogeneity of the datasets. Different methodologies were employed to quantify collagen levels, including the original hydroxyproline quantification using the Stegemann technique (3 datasets), its Kivirikko (24 datasets) and Woessner (2 datasets) modifications, and ELISA (5 datasets). Stratification of the data demonstrated that ELISA yielded significantly lower collagen content estimates than the other techniques (Fig. 4a). Studies reported protein abundance in moles, micrograms, or as a percentage of total protein, relative to either the wet or the dry weight of tissue; therefore, dataset values were transformed to micrograms per milligram of dry tissue weight prior to meta-analysis. We examined whether data transformation systematically affected the outcome in the IVD (Fig. 4b,c) and articular cartilage (Fig. 4d,e) data pools; however, it did not significantly contribute to differences in outcome.

Several biological variables that could contribute to cartilage ECM composition were reported, including subject sex, age, and tissue sub-location. Neither subject sex nor age significantly affected collagen abundance in IVD, although an age-dependent trend was visually evident (not statistically significant by ANOVA, p = 0.25) (Fig. 5a,b). In IVD, collagen abundance was significantly higher in the annulus fibrosus than in either the endplate or the nucleus pulposus (Fig. 5c). When the IVD data were separated into annulus fibrosus (n = 4 reporting N = 113 samples, H² = 124.4, I² = 99.2%) and nucleus pulposus (n = 4 reporting N = 60 samples, H² = 43.5, I² = 97.7%) pools, heterogeneity was reduced, and both pools exhibited low bias and approximately normal distributions (Fig. S1). Collagen abundance in the lateral and medial condyles of articular knee cartilage was similar, although reports for the medial condyle were highly variable (Fig. 5d).

Estimated abundance for sGAG, fibronectin, elastin and proteoglycans

Robust analysis of non-collagenous ECM components was constrained by the limited number of reports. Nevertheless, we estimated the content of sGAG, fibronectin, elastin, and proteoglycans in tendon, skeletal muscle, articular cartilage, ligament, and IVD (Table 1).

Changes in ECM composition in pathological samples

ECM composition in a pathological state was reported in 18 studies (Table 2). Disc degeneration was associated with a significant increase in elastin abundance, while scoliosis was associated with a significant decrease in collagen (Fig. 6a). Since we had identified IVD location as a significant determinant of collagen content, we further investigated whether IVD pathologies differentially affect collagen content in different parts of the IVD. IVD degeneration resulted in a significant increase in collagen in the annulus fibrosus, but not in the nucleus pulposus or intermediate zone (Fig. 6b). Scoliosis was associated with a reduction in collagen content in the annulus fibrosus and in the endplate (Fig. 6b). In osteoarthritic articular cartilage, collagen content was significantly increased, but articular cartilage degeneration and osteochondral lesions were not associated with changes in collagen or sGAG (Fig. 6c).

Overall composition of connective tissues

Some of the selected studies reported proteomic analyses of tissue samples. Although these studies do not provide absolute quantification of the identified proteins, we compiled their relative quantification data to obtain an overall portrait of the protein composition of different connective tissues. The relative makeup of the proteomes of articular cartilage25, skeletal muscle48, tendon42,49, ligament42, and bone50 was reported (Fig. 7a). To determine the relative composition of whole tissues, we combined the proteomics data with the estimated amounts of water or unique constituents, such as lipids or mineral, for articular cartilage17,30, IVD16,51, skeletal muscle19, tendon18, ligament52, bone53, and adipose tissue54 (Fig. 7b). Finally, by combining the relative quantifications with the calculated estimates of absolute collagen abundance, we estimated the levels of all tissue constituents in articular cartilage (Table 3), skeletal muscle (Table 4), and tendon (Table 5).

Discussion

In this paper, we synthesized the existing literature and generated robust estimates of collagen content in human connective tissues. In addition, within the IVD, we quantified the regional collagen content in the annulus fibrosus, nucleus pulposus, and endplate. Analysis of methodological techniques identified systematic differences between the estimates provided by ELISA and by hydroxyproline-based assays. Collagen abundance in IVD was not significantly affected by sex or age. We demonstrated that osteoarthritis, disc degeneration, and scoliosis lead to distinct changes in ECM composition, particularly in the abundance of collagen and elastin. Finally, we synthesized known information on absolute and relative protein content, as well as water, polysaccharide, and unique tissue components such as lipid, to provide quantitative estimates of the native composition of human connective tissues.

Our primary goal was to synthesize existing research to build robust estimates of ECM component abundance. This question may appear outdated, as many publications and reviews present certain facts about tissue composition as common knowledge, such as cartilage containing between 20 and 35% protein, with the majority of that fraction being collagen and proteoglycans17,55. However, it is very difficult to follow the citation trail to find the origins of these estimates. We therefore performed a systematic review to identify the primary papers reporting direct quantification of ECM composition in human connective tissues. Based on these studies, we were able to perform meta-analysis on only a limited number of tissues and components. Importantly, for some tissues, such as bone, no study passed the inclusion criteria, while for others, such as adipose tissue and ligament, only 1–2 studies were identified, demonstrating significant gaps in the quantification of absolute amounts of ECM components in human tissues.

The datasets describing collagen quantification in articular cartilage and IVD were sufficiently large and of high enough quality to obtain robust estimates and perform more in-depth meta-analysis. Although heterogeneity was high, which may reduce the precision of the estimates, the majority of the datasets in both the IVD and articular cartilage data pools were evenly distributed about the random-effects estimate within the expected 95% CI. No single study was found to account for a significant portion of overall heterogeneity. Methodologically, collagen abundance was assessed by measuring hydroxyproline content using the Stegemann, Kivirikko, or Woessner methods56,57,58 in 29 of 35 studies, or by ELISA in 6 studies. The mean of the estimates generated by ELISA was significantly lower than the overall estimate, suggesting that systematic methodological differences need further investigation. We found that the sex or age of the subjects did not significantly affect IVD collagen abundance. It is known that articular cartilage contains proportionally more total protein than IVD17,51. Our estimates suggest that, in addition, collagen accounts for different proportions of total protein in these two tissues. Within the IVD, we demonstrated that collagen abundance differs between the annulus fibrosus and nucleus pulposus, with the annulus containing significantly more collagen. This is consistent with the current literature, which asserts that the dry weight of the nucleus is shifted towards proteoglycans59. Importantly, pathological conditions were associated with significant and disease-specific changes in cartilage collagen content. Degeneration of the IVD was associated with a significant increase in collagen abundance in the annulus fibrosus.
Of interest, collagen content in the nucleus pulposus of degenerating discs tended to decrease, providing a biochemical basis for location-specific changes in degenerating IVD, where altered collagen distribution has been proposed to contribute to the loss of structural integrity and horizontal bulging of the disc59. Similar to disc degeneration, the articular cartilage of osteoarthritis patients was found to contain more collagen than healthy tissue. Since increased deposition of ECM proteins has previously been linked to fibrosis and injury repair of cartilage60, high collagen content in osteoarthritic and degenerating cartilage may be related to repair and regeneration. Tears in degenerating discs tend to occur in the annulus of the IVD, consistent with the higher collagen content in the annulus rather than the nucleus of degenerating discs59. In contrast to cartilage degeneration, scoliosis was associated with a reduction in collagen abundance, particularly in the annulus and the endplate, supporting the theory that abnormalities in the IVD ECM are involved in the pathophysiology of scoliosis61. Thus, although individual studies reported heterogeneous estimates, given a sufficient number of primary studies, the data can still be successfully synthesized to provide significant insights into the physiology and pathology of connective tissues.

In contrast to collagen, reports on other ECM components were much less common, which limited the potential for data synthesis. We built estimates for the abundance of sGAG in tendon, skeletal muscle, and articular cartilage; fibronectin in tendon, ligament, and articular cartilage; elastin in tendon and cartilage; and total proteoglycans in IVD and articular cartilage. The limited number of reports prevented further analysis of these data. Our results suggest that the abundance of these components can be affected in pathological conditions, as exemplified by the increase in elastin in the degenerating disc. Thus, it is important to further investigate the abundance of non-collagenous ECM components, how it varies between tissues, and how it changes in disease.

We generated relative protein breakdowns for 5 tissues (articular cartilage, skeletal muscle, tendon, ligament, and bone) and relative component proportions for 7 tissues (articular cartilage, IVD, skeletal muscle, tendon, ligament, bone, and adipose tissue). For three tissues (articular cartilage, skeletal muscle, and tendon), we had an absolute estimate of total collagen content, a proteomic breakdown that included collagen, and estimates for other components, allowing us to generate numerical values accounting for 100% of the native tissue (1000 μg constituents/1 mg wet tissue). The sum of our individual estimates for articular cartilage (1085 ± 82) was slightly larger than expected, whereas our overall estimate for skeletal muscle (913 ± 41) was slightly lower than expected. Our 95% confidence interval for tendon (926 ± 99) includes 1000 μg constituents/1 mg tissue mass. The small deviation from expected values in articular cartilage could result from systematic overestimation of collagen content by hydroxyproline-based quantification methods. With regard to skeletal muscle, the reference proteomic study was not aimed at describing the complete proteome and is missing some primary constituents of skeletal muscle, such as its most abundant proteoglycan, decorin62. Proteins and proteoglycans are thought to make up 20% of skeletal muscle, whereas we account for only 11.2%. Incorporating the remaining ~9%, which translates to ~90 μg, would bring our overall calculation to approximately 1000 μg constituents/1 mg tissue. An additional source of uncertainty is our use of static reference values for total water content, which do not account for variation between individual subjects. Overall, the fact that for these 3 tissues the numerical estimates closely account for the expected tissue mass (within 0.3–17%) strongly supports the validity of the underlying assumptions and calculations.
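The skeletal muscle adjustment described above amounts to the following arithmetic, using the 20% and 11.2% protein and proteoglycan fractions stated in the text and 1 mg of tissue = 1000 μg:

$$1000\,\mu\text{g}\times(0.200-0.112)=88\,\mu\text{g}\approx 90\,\mu\text{g},\qquad 913\,\mu\text{g}+88\,\mu\text{g}=1001\,\mu\text{g}\approx 1000\,\mu\text{g}$$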

Beyond the scope of our findings, academia faces a systematic problem whereby flawed experimental design and selective reporting give rise to data that are erroneous, irreproducible, or cherry-picked63. The synthesis of published studies is often used in the context of clinical trial evaluation and is considered the gold standard of evidence64. In principle, synthesis can also be used to generate more robust and higher-powered estimates for virtually any quantitative scientific question; however, the technique is seldom used in basic research20. The generated estimates effectively employ a much larger sample size than any single study, and through analysis of the datasets themselves, researchers can identify systematic reporting biases and contributors to inter-study variation. It is in the interest of all researchers to use these tools to cost-effectively improve upon the existing body of knowledge. Notably, the quality of these estimates, and thus the effectiveness of the technique, depends strongly on the quality of the underlying publications. In our case, absolute quantification was not available for many tissues or for proteins other than collagen. Relative quantification using proteomics techniques has been performed for a number of tissues; however, transforming relative quantification data from mass spectrometry into absolute estimates is difficult, and these studies have rarely been reproduced. In therapy development, limitations in current pre-clinical disease models contribute to the failure to translate in vitro success into clinical trial success63. It is well understood that the physical properties of the ECM, in addition to the proteins themselves, are powerful regulators of cellular behavior. Thus, failure to adequately replicate the native extracellular environment when culturing cells in vitro may alter their behavior, and as a result, what is observed may not translate to the human patient65.
Culturing on ECM-coated dishes, 3D culture on biomimetic scaffolds, and organoid culture have become increasingly commonplace65,66. By building a quantitative ingredient list for the extracellular environment, we will be able to refine and evaluate these models and further improve our repertoire of in vitro tools.