Introduction

With plans by NASA to return humans to the lunar surface by 20241 and to have the first-ever astronauts journey to Mars within the next 2 decades2, in addition to private interests in developing the first human colony on the Martian surface3, human space travel will no doubt continue if not increase in the following century. Despite these high ambitions, we still do not fully understand the cause of physiological changes we observe in astronauts who travel to space, one of which is microgravity-induced bone loss4,5.

Animals have long been used as models to assess the physiological changes observed as a result of various stimuli and to inform their impact on human health. Space-traveling animals even preceded humans, with several dogs, rodents, and primates being sent to space in the late 1940s–1960s6. Once the technology allowing mammals to survive all phases of spaceflight had been developed, animal experiments shifted, beginning in the 1970s, to focus on the physiological effects of space travel7. The information obtained in animal studies significantly augmented our knowledge regarding human adaptations during space travel. Experiments assessing skeletal changes in animals have the benefit of allowing the collection of bone biopsies, which is not possible in astronaut studies. These biopsies have allowed for an investigation into changes to cellular and molecular components of bone associated with microgravity, and thus provide further insight into the underlying mechanisms of microgravity-induced changes in bone health. These missions, however, come at a considerable price: it has been estimated that NASA spent $1.2 billion per launch over the period from 1982 to 20108. It is therefore critically important to gain as much knowledge as possible from all space experiments.

Despite the benefits of animal studies and the significant expense associated with their execution, these experiments have not yet been used for quantitative data synthesis. To overcome the problems associated with small sample sizes and a high degree of variability between individual missions, we employed meta-analysis to improve the statistical power of the combined studies. Thus, the objectives of this study were to (i) systematically identify all the published literature regarding bone health in vertebrate animals that were part of experiments performed in space; (ii) use a meta-analytic approach to quantitatively characterize space-related changes to bone architecture and turnover in animals; and (iii) identify confounding variables associated with changes in bone health.

Results

Overview of relevant studies

A systematic search for the overlap of space travel, animals, and bone, executed in Medline, Embase, Web of Science, and BIOSIS, together with the 9 reports found via manual searches of the NASA Technical Report Server and the compendium of animal and cell spaceflight experiments compiled by Ronca et al.9, resulted in the identification of 1128 candidate articles (Fig. 1a). Of these, 340 articles focused on bone, while the rest discussed a range of physiological systems potentially relevant to bone health, including skeletal muscles, metabolism, and developmental issues (Fig. 1b). The majority of studies (83%) described findings in rats (664/1128), mice (181/1128), and primates (96/1128) (Fig. 1c). The number of papers describing animals in space peaked in the 1990s (Fig. 1d). From the 1970s through the 2000s, rats were the main spacefaring animal model. Interest in primates peaked in the 2000s; in the last decade, however, mice have become the predominant animal model studied in space (Fig. 1e). Considering the available data, the full-text screen focused on 340 studies describing bone health in rodents and primates and identified 63 studies that presented quantitative measurements of trabecular and cortical bone architecture or bone turnover (Table 1)10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72. After excluding studies that reported data on treated animals, reported duplicate data, or demonstrated unclear reporting (Supplementary Table 3), 40 articles were selected for the final meta-analysis: 23 describing rats10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32, 12 describing mice33,34,35,36,37,38,39,40,41,42,43,44, 4 describing primates45,46,47,48, and 1 describing both mice and primates49.
The final dataset included a total of 95 rats, 61 mice, and 9 primates (rhesus macaque monkeys) flown to space on 22 missions (Table 2).

Fig. 1: Systematic review information flow and outcomes.

a PRISMA diagram. b–e Analysis of 1128 articles selected after the title and abstract screening. b Distribution of physiological systems mentioned in the papers. c The number of articles discussing indicated species. d The number of articles by publication decade. e The number of articles by publication decade for species of rats (solid line), mice (dashed line), and primates (dotted line).

Table 1 Bone parameters included in the meta-analysis.
Table 2 Description of articles included in the meta-analysis.

Heterogeneity, bias, and the meta-analytic model

Statistical heterogeneity was moderate to high (I2 > 46%) for all the extracted parameters for spaceflight-related changes except for bone marrow area (I2 = 14.4%) and cortical bone area (I2 = 0%). Single-mission exclusion analysis identified some mission-level outcomes whose removal reduced the overall heterogeneity; however, no single mission influenced the heterogeneity of more than one parameter or the global outcome for the Tb.BV/TV or Tb.N datasets (Fig. 2a, Supplementary Fig. 1a). Cumulative-mission exclusion analysis demonstrated that exclusion of >21% of missions led to a homogeneous (I2 ≤ 30%) dataset, and that the overall outcomes for Tb.BV/TV and Tb.N were not affected by decreased heterogeneity (Fig. 2b, Supplementary Fig. 1b). The funnel plot demonstrated a symmetrical distribution (Fig. 2c, Supplementary Fig. 1c). We assessed the quality of individual papers on a 25-point scale (Supplementary note 2) and examined whether quality score affected the reported paper-level variance (Fig. 2d, Supplementary Fig. 1d) or effect size (Fig. 2e); no significant association of quality score with reported outcomes was observed. Subgroup analysis further demonstrated no difference between papers with low (<20) and high (≥20) quality scores (Supplementary Fig. 2). We conclude that publication bias is negligible within this dataset. To account for low sample sizes, as well as heterogeneity, we used the modified sampling by size method73 for further analysis.
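To illustrate the heterogeneity statistics referred to above, the following minimal Python sketch computes Cochran's Q, H2, and I2 from mission-level effect sizes and repeats the calculation with each mission excluded in turn. It assumes conventional inverse-variance weights rather than the modified sampling by size method used in this study; the function names and input values are hypothetical.

```python
def pooled_fixed(effects, ses):
    """Inverse-variance weighted mean of mission-level effect sizes."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def heterogeneity(effects, ses):
    """Return Cochran's Q, H2, and I2 (in %) for a set of effect sizes."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two missions")
    weights = [1.0 / se ** 2 for se in ses]
    mean = pooled_fixed(effects, ses)
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    df = k - 1
    h2 = q / df
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, h2, i2


def single_exclusion(effects, ses):
    """I2 recomputed with each mission left out in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_ses = ses[:i] + ses[i + 1:]
        results.append(heterogeneity(kept_effects, kept_ses)[2])
    return results


# Hypothetical mission-level effects (% difference, SF vs GC) and standard errors.
effects = [-30.0, -25.0, -28.0, 10.0]
ses = [5.0, 5.0, 5.0, 5.0]
q, h2, i2 = heterogeneity(effects, ses)
print(f"Q = {q:.2f}, H2 = {h2:.2f}, I2 = {i2:.1f}%")
print("I2 after excluding each mission:", single_exclusion(effects, ses))
```

In this hypothetical example, the fourth mission is the only one with an effect in the opposite direction, so excluding it yields a homogeneous set, which mirrors the logic of the single-mission exclusion analysis.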

Fig. 2: Heterogeneity and sensitivity analyses for the BV/TV dataset.

a, b Heterogeneity was analyzed using single mission exclusion (a) and cumulative mission exclusion (b). Red area: 95% CI for the global effect size (left axis); line: I2 (right axis). c Funnel plot. d Article-level standard error (SEp) as a function of quality score. e Meta-regression of the Tb.BV/TV, Ob.S, and Ct.Ar paper-level outcomes as a function of quality score. The maximum quality score was 25. R2 is shown.

Changes in trabecular parameters during spaceflight

Many studies included two types of control: the vivarium control (VC), where animals lived in a standard laboratory habitat, and the ground control (GC), where some or all aspects of spaceflight other than microgravity, such as physical enclosure, diet, and lift-off and re-entry forces, were simulated. We examined the percentage difference in spaceflight compared to GC, as well as in GC compared to VC. Six parameters described trabecular bone: trabecular bone volume fraction (Tb.BV/TV), thickness (Tb.Th), number (Tb.N), separation (Tb.Sp), connective density, and Total BV/TV. Tb.BV/TV was significantly lower in spaceflight mice and rats compared to ground control, and Tb.Th was significantly reduced for mice (Fig. 3a, b left). For rodents overall, Tb.BV/TV and Tb.Th changed significantly by −24.1% [−43.4, −4.9] and −9.0% [−12.9, −5.2], respectively. Tb.N, Tb.Sp, and connective density demonstrated trends towards poor bone health in spaceflight mice and rats; however, only the change in Tb.N reached statistical significance (Fig. 4a–c). Total BV/TV, which was measured in flat bones and in one case vertebra, did not change due to spaceflight (Fig. 4d). When ground and vivarium controls were compared, Tb.BV/TV, Tb.N, and Tb.Sp were unaffected, but Tb.Th was significantly lower in GC compared to VC (Fig. 3a, b right, Supplementary Tables 6, 7), suggesting that flight conditions other than microgravity may contribute to a reduced Tb.Th. For all trabecular parameters in rodents, heterogeneity was moderate to high (I2 > 46%). Trabecular parameters were measured in 4 primates on missions Bion 10 and 11, and demonstrated significantly lower Tb.BV/TV, a trend to reduced Tb.N and Tb.Th, and a trend to higher Tb.Sp compared to GC (Table 3). Thus, there was a deficit in trabecular bone in rodents and primates after spaceflight.
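The percentage difference between spaceflight and ground control groups can be sketched as follows. This is an illustrative Python example only: it assumes independent groups and a first-order (delta-method) approximation for the standard error of a ratio of means, which may differ from the estimator used in this study, and all names and input values are hypothetical.

```python
import math


def percent_difference(mean_sf, sd_sf, n_sf, mean_gc, sd_gc, n_gc):
    """Percentage difference of SF relative to GC with an approximate 95% CI."""
    if mean_sf == 0 or mean_gc == 0:
        raise ValueError("group means must be non-zero")
    ratio = mean_sf / mean_gc
    effect = 100.0 * (ratio - 1.0)
    # Delta-method variance of a ratio of two independent sample means.
    rel_var = (sd_sf / mean_sf) ** 2 / n_sf + (sd_gc / mean_gc) ** 2 / n_gc
    se = 100.0 * abs(ratio) * math.sqrt(rel_var)
    return effect, se, (effect - 1.96 * se, effect + 1.96 * se)


# Hypothetical Tb.BV/TV values (%): SF 15 ± 3 (n = 6) vs GC 20 ± 4 (n = 6).
effect, se, ci = percent_difference(15.0, 3.0, 6, 20.0, 4.0, 6)
print(f"effect = {effect:.1f}% [{ci[0]:.1f}, {ci[1]:.1f}]")
```

The same function applies to the GC versus VC comparison by passing the GC group as the first set of arguments and the VC group as the second.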

Fig. 3: Forest plot of spaceflight and ground control-induced changes to Tb.BV/TV and trabecular thickness.

Changes in BV/TV (a) and trabecular thickness (b) of spaceflight animals (SF) compared to ground control animals (GC) (Left); and GC compared to vivarium control animals (VC) (Right). For each indicated species, missions are sorted by mission year (old to new); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI, the size of the square is proportional to nSF. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents, I2, and H2 are given for rodents. Asterisk (*) indicates missions wherein GC was not present, and SF was compared to VC.

Fig. 4: Forest plot of spaceflight induced changes to the trabecular number, trabecular separation, connective density, and total BV/TV.

Changes in trabecular number (a), trabecular separation (b), connective density (c) and total BV/TV (d) of space flight animals (SF) compared to ground control animals (GC). For each indicated species, missions are sorted by mission year (old to new); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI, the size of the square is proportional to nSF. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents, I2, and H2 are given for rodents. Asterisk (*) indicates missions wherein GC was not present, and SF was compared to VC. For mission-level effect sizes and 95% CI, refer to Supplementary Tables 6–9.

Table 3 Spaceflight-induced changes in bone parameters of primates.

Changes in trabecular bone turnover during spaceflight

We next examined if spaceflight-induced bone deficits are associated with abnormal function of osteoblasts or osteoclasts. Osteoid surface (OS) and thickness (O.Th) were significantly lower in rodents, by −29.9% [−53.9, −5.8] for OS and −28.6% [−54.5, −2.7] for O.Th; osteoblast surface (Ob.S) and osteoblast number (N.Ob) demonstrated a trend to decrease compared to GC (Fig. 5). Comparison of ground and vivarium controls was available for only two missions for all osteoblast parameters except for Ob.S, for which GC and VC were not significantly different (Supplementary Tables 10–13). Heterogeneity for osteoblast parameters was moderate to high (I2 > 50%). The osteoblast parameters were from trabecular skeletal regions, except for missions Cosmos 936 (Ob.S) and Cosmos 1667 (OS and O.Th), in which the measurements were from the endocortical surface of the tibia diaphysis and metaphysis, respectively; excluding these data resulted in homogeneous datasets for Ob.S and O.Th (I2 = 0%), but not for OS (I2 = 74.9%). When only osteoblast indices in trabecular bone are considered, spaceflight resulted in a statistically significant reduction in Ob.S of −20.1% [−35.0, −5.1], OS of −30.4% [−55.1, −5.8], and O.Th of −36.2% [−60.2, −12.2]. Thus, osteoblast formation and function in rodents were negatively affected by spaceflight.

Fig. 5: Forest plot of spaceflight induced changes in trabecular bone turnover parameters.

Changes in osteoblast surface (a), osteoblast number (b), osteoid surface (c), and osteoid thickness (d) of space flight animals (SF) compared to ground control animals (GC). For each indicated species, missions are sorted by mission year (old to new); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI, the size of the square is proportional to nSF. Overall effect size (%) and 95% CI are indicated by diamonds for rats and rodents, I2, and H2 are given for rodents.

In contrast to osteoblast parameters, changes in osteoclasts were inconsistent. Osteoclast surface (Oc.S) in SF mice demonstrated large study-level increases in 2 of 3 datasets; however, it was unaffected in SF rats (Fig. 6a left). Osteoclast number (N.Oc) was higher in the one group of SF mice where it was measured, and was not significantly affected in spaceflight rats (Fig. 6b left). Moreover, comparing ground and vivarium controls demonstrated strong (10–70%) tendencies for study-level increases (Fig. 6 right). Although the overall effect size for GC vs. VC comparisons only reached statistical significance for N.Oc, these data suggest that in rodents osteoclasts may be affected by spaceflight conditions other than microgravity. Heterogeneity was high for the Oc.S and N.Oc datasets. The osteoclast parameters were from trabecular skeletal regions, except for missions Cosmos 936 (N.Oc) and one of the bones for mission Bion M1 (Oc.S); excluding these data did not significantly change the outcome. Thus, osteoclast parameters were unaffected in rats and variably affected in mice.

Fig. 6: Forest plot of spaceflight induced changes to osteoclast parameters.

Changes in osteoclast area (a), and osteoclast number (b) of space flight animals (SF) compared to ground control animals (GC) (Left); and GC compared to vivarium control animals (VC) (Right). For each indicated species, missions are sorted by mission year (old to new); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI, the size of the square is proportional to nSF. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents, I2, and H2 are given for rodents.

Change in cortical bone parameters during spaceflight

Cortical parameters analyzed were bone marrow area (Ma.Ar, which included data on bone marrow diameter (Ma.Dm) transformed as π(d/2)²), cortical area (Ct.Ar), and thickness (Ct.Th). Ma.Ar and Ct.Th did not significantly differ between SF and GC in mice and rats, while Ct.Ar was significantly lower in spaceflight mice and rats (Fig. 7a–c). GC did not significantly differ from VC for any cortical parameter (Supplementary Tables 14–16). The heterogeneity for cortical parameters was low, I2 < 15%, except for Ct.Th, which showed high heterogeneity, I2 = 90.7%. The datasets of Ma.Ar, Ct.Ar, and Ct.Th are composed of measures taken in the diaphysis of long bones except for missions STS-131 (femoral neck), Bion M1 (animal group 1, ankle bones, and calcaneus), and SpaceX CRS-10 (rib). Removing these biological outliers did not change the effect size and resulted in a homogeneous dataset for Ma.Ar with I2 = 0%. Cortical thickness measured in 4 primates was not significantly affected by spaceflight (Table 3). Thus, spaceflight resulted in cortical bone deficits; however, cortical bone was affected to a smaller degree than trabecular bone.
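The diameter-to-area transformation mentioned above can be sketched in a few lines of Python. The sketch assumes a circular marrow cavity, and the function names are hypothetical. It also shows why a percentage change in diameter is not interchangeable with a percentage change in area, which is the reason for transforming diameters before pooling them with area measurements.

```python
import math


def marrow_area_from_diameter(diameter):
    """Ma.Ar = pi * (d / 2)^2, assuming a circular marrow cavity."""
    if diameter < 0:
        raise ValueError("diameter must be non-negative")
    return math.pi * (diameter / 2.0) ** 2


def area_change_percent(diameter_change_percent):
    """Percentage change in area implied by a percentage change in diameter."""
    scale = 1.0 + diameter_change_percent / 100.0
    return 100.0 * (scale ** 2 - 1.0)


# Hypothetical example: a marrow diameter that is 5% lower in SF than in GC.
print(f"area change = {area_change_percent(-5.0):.2f}%")
```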

Fig. 7: Forest plot of spaceflight induced changes to cortical bone parameters.

a–e Changes in bone marrow area (a), cortical bone area (b), cortical thickness (c), as well as bone formation rate (d) and mineral apposition rate (e) for the diaphyses of long bones, of space flight animals (SF) compared to ground control animals (GC). For each indicated species, missions are sorted by mission year (old to new); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI, the size of the square is proportional to nSF. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents, I2, and H2 are given for rodents. *missions where SF was compared to VC. #mission where Ma.Ar was derived from average marrow diameter (Av.Ma.Dm) as Ma.Ar = π(Av.Ma.Dm/2)². f Change in BFR and MAR on the periosteal and endocortical surface of long bones in SF compared to GC. The number of measurements (Nj) is indicated. Square/line: overall effect size (%) and 95% CI.

Change in cortical bone turnover during spaceflight

Only measures of bone formation rate (BFR) and mineral apposition rate (MAR) from the cortical bone surface in the diaphysis of long bones were included in the analysis. This resulted in the exclusion of measures of MAR and BFR in the pelvis and thoracic vertebrae from STS-7831, and in the humeral metaphysis from STS-52 and non-included mission STS-4150. Both BFR and MAR were lower in spaceflight rodents by −34.2% [−50.2, −12.8] and −13.5% [−27.1, 0.1], respectively (Fig. 7d, e). There were no differences between GC and VC for BFR or MAR (Supplementary Tables 17, 18). Heterogeneity was moderate to high for these parameters, I2 > 52%. When long bone measurements of MAR and BFR taken on the periosteal and endocortical surfaces were separated, we found that the reductions in MAR and BFR were only significant on the periosteal surfaces (Fig. 7f). Thus, bone formation on periosteal surfaces of cortical bone appears to be more affected by microgravity.

Effects of covariates on spaceflight related changes in animal bone health

We next examined the contribution of covariates to the overall outcomes using sub-group analysis and meta-regression. First, we examined if animal characteristics, such as age, sex, and strain, affect the overall outcome. Using linear regression analysis, we found that rodent age was weakly associated with changes in osteoblast surface and cortical area, but not with Tb.BV/TV (Fig. 8a). Subgroup analysis further demonstrated that in animals 10 weeks of age or older, larger changes were observed in Tb.N, Ob.S, and Oc.S, while Ct.Th was more affected in younger animals (Supplementary Fig. 3a). When we compared trabecular parameters in instances when both primary and secondary spongiosa of a single bone were analyzed, we observed that changes in Tb.BV/TV, Tb.N, and Tb.Th in secondary spongiosa were greater than in primary spongiosa (Fig. 8b). Animal sex or strain did not significantly affect the outcome (Supplementary Fig. 3b, c).
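A meta-regression of this kind can be illustrated with a weighted least-squares fit of mission-level effect sizes against a covariate such as animal age. The Python sketch below is a simplified example that assumes inverse-variance weights and a single covariate; it is not necessarily the exact model fitted in this study, and the function name and data are hypothetical.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares line y = intercept + slope * x, with weighted R2."""
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("covariate has no spread")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_tot = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    ss_res = sum(w * (yi - intercept - slope * xi) ** 2
                 for w, xi, yi in zip(weights, x, y))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


# Hypothetical data: age at launch (weeks), effect size (%), and standard error.
ages = [6.0, 8.0, 10.0, 12.0, 20.0]
effects = [-12.0, -9.0, -20.0, -15.0, -30.0]
ses = [4.0, 5.0, 3.0, 6.0, 5.0]
weights = [1.0 / se ** 2 for se in ses]
slope, intercept, r2 = weighted_linear_fit(ages, effects, weights)
print(f"slope = {slope:.2f} %/week, intercept = {intercept:.2f}%, R2 = {r2:.2f}")
```

Replacing the age values with mission duration in days gives the corresponding regression on flight duration.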

Fig. 8: Exploratory analysis for the effect of covariates on spaceflight-induced changes in bone parameters.

a Meta-regression of the Tb.BV/TV, Ob.S, and Ct.Ar as a function of animal age. b Subgroup analysis for Tb.BV/TV, Tb.Th, and Tb.N outcomes for primary and secondary spongiosa. c Meta-regression of the Tb.BV/TV, Ob.S, and Ct.Ar as a function of flight duration. d Forest plot of the rate of spaceflight induced change to Tb.BV/TV. e Subgroup analysis for Tb.BV/TV, Tb.Th, Tb.N, and Tb.Sp outcomes reported for individual rodent bones from region 1 (skull, vertebra, and thorax, blue), region 2 (pelvis, humerus, and femur, green), or region 3 (tibia and ankle bones, red) as illustrated on the left. For a and c, R2 is shown. For (b to e), N = number of missions, nSF = spaceflight animal sample size, and Nj = number of measurements. Square/line: overall effect size (%) and 95% CI. For (d), in each indicated species, missions are sorted by duration (shortest to longest); duration of spaceflight (Days), and the number of spaceflight animals (nSF) are indicated. Square/line: effect size (%) and 95% CI; dark blue: missions less than 14 days; dark red: missions 14 days or longer. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents (black), rodents on short duration (dark blue), and long duration (dark red) missions.

Next, we examined if mission-related differences affected the outcome. Spaceflight duration did not significantly correlate with changes in Ob.S and Ct.Ar but was weakly associated with changes in Tb.BV/TV when assessed using meta-regression analysis (Fig. 8c). Moreover, subgroup analysis by mission durations shorter or longer than 2 weeks demonstrated no significant difference for any parameter (Supplementary Fig. 4a). To estimate the rate of accumulation of bone deficits in space, we divided individual outcomes of our largest parameter dataset, Tb.BV/TV, by the mission duration. Although the difference was not statistically significant, the deficit in Tb.BV/TV per day was smaller in long spaceflights than in short spaceflights (Fig. 8d). We estimated the rate of accumulation of trabecular bone deficits as −1.7%/day [−3.5, 0.2], or −1.0%/day [−1.7, −0.4] when taking into account only long-duration missions. We also assessed if individual vs. group housing affects the outcomes; no differences were found except for Tb.N, which changed significantly more when animals were housed individually (Supplementary Fig. 4b). Comparing outcomes by space agency, we found no significant differences (Supplementary Fig. 4c).
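The per-day rate calculation described above can be sketched as follows. This Python example divides each mission-level effect and its standard error by the mission duration and then pools short (<14 days) and long (≥14 days) missions separately. It assumes inverse-variance pooling, which may differ from the weighting used in this study, and the mission data and function names are hypothetical.

```python
import math


def daily_rate(effect_pct, se_pct, days):
    """Scale a mission-level effect and its standard error to a per-day rate."""
    if days <= 0:
        raise ValueError("mission duration must be positive")
    return effect_pct / days, se_pct / days


def pool(rates):
    """Inverse-variance mean and 95% CI for a list of (rate, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in rates]
    mean = sum(w * r for w, (r, _) in zip(weights, rates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)


# Hypothetical missions: duration (days), Tb.BV/TV effect (%), standard error (%).
missions = [
    {"days": 7, "effect": -21.0, "se": 7.0},
    {"days": 14, "effect": -14.0, "se": 7.0},
    {"days": 30, "effect": -30.0, "se": 6.0},
]
short = [daily_rate(m["effect"], m["se"], m["days"])
         for m in missions if m["days"] < 14]
long_ = [daily_rate(m["effect"], m["se"], m["days"])
         for m in missions if m["days"] >= 14]
for label, group in (("short", short), ("long", long_)):
    mean, (low, high) = pool(group)
    print(f"{label}: {mean:.2f} %/day [{low:.2f}, {high:.2f}]")
```

Dividing by duration treats the deficit as accumulating linearly over the mission, which is a simplification when the rate of loss slows in longer flights.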

Study-related differences included measurement techniques, presence of sham operation, sacrifice delay, and ground control conditions. For all trabecular and cortical architectural parameters, the division of measurement technique (Histology vs. μCT) coincided with the species difference of rats and mice, preventing us from conducting any further meaningful subgroup analysis. In sham-operated rodents, Tb.Th was affected significantly less than in naïve animals (Supplementary Fig. 5a). The sacrifice delay did not significantly affect the outcomes in subgroup analysis, although the change in Ob.S was associated with prolonged sacrifice delay in meta-regression analysis (Supplementary Figs. 5b, 6). When ground control groups were divided by the degree to which they mimic the environmental conditions of spaceflight other than microgravity, we observed no association between the fidelity of the GC and spaceflight-induced changes, suggesting that they were primarily driven by microgravity (Supplementary Fig. 6c).

In astronauts, bone loss is strongly affected by the position of the bone in relation to the gravitational vector5,74. To assess if a similar trend is present in rodents, we performed a sub-group analysis of bones from different regions: region 1, which included calvaria, vertebrae, ribs, and sternum; region 2, with pelvis, humerus, and femur; and region 3, with tibia and ankle bones (Fig. 8e left). Changes in trabecular parameters were larger in bones located more distal from the axial skeleton (Fig. 8e right); however, the mean effect sizes were not significantly different between the regions. Among other parameters, Ct.Th, N.Ob, OS, and BFR demonstrated significant changes only in regions 2 and/or 3, while changes in Ob.S were only significant in regions 1 and 2 (Supplementary Fig. 7). These data suggest that bone position in relation to the gravitational vector may be important for rodents; however, targeted studies investigating these relationships would be required.

Discussion

We systematically reviewed and quantitatively synthesized the literature on bone health in space-faring rodents and primates. We report that bone mass is lower in spaceflight rodents and primates, with indications that microgravity is the driving factor inducing bone deficits. Deficits in trabecular bone were larger than in cortical bone and subgroup analysis suggested that distal skeleton was affected more than axial. Osteoblast indices in rodent trabecular bone were significantly lower, however, osteoclast numbers were not affected in rats, and were variably affected in mice. Even though the degree of bone deficit was found to poorly correlate with mission duration, the rate of accumulation of trabecular bone deficit was estimated as −1.7% [−3.5, 0.2] per day, which is much higher than the estimates of bone loss available for humans. Taken together, our data indicate that microgravity induces bone deficits in rodents and primates, and the data suggest that the prevalent mechanism is suppression of bone formation.

We have found that during the 4–39-day space missions rodents accumulated a deficit of −24.1% [−43.4, −4.9] in trabecular bone tissue, which translates to a rate of 1.7% of trabecular bone deficit per day. In the much smaller dataset for primates, the bone deficit after 11.5–14-day missions was equally high, −25.2% [−35.6, −14.7] or 1.9% per day. These estimates for trabecular bone deficits in spaceflight rodents and primates are much greater than estimates of bone loss for astronauts, which have been reported as 0.7–2.7% per month4,5,75,76. Nevertheless, similar deficits of 15–50% in tibial and femoral trabecular bone volume were reported in 2–4-week-long hindlimb unloading studies in rats and mice77,78,79,80,81, which can be recalculated to 1.1–3.5% per day. We observed that no single parameter was strongly associated with mission duration. In astronauts, changes to bone were also highly variable for missions less than 30 days in duration5. Of spaceflights studying bone in rodents, only 3 missions were longer than 30 days, one of which (Mice Drawer System (MDS)) was excluded since of the 3 wild type mice aboard, only 1 returned to Earth alive70, preventing us from extracting meaningful quantitative data from it. Thus, continuous measurements of bone parameters in longer missions (>30 days) are required to determine the dynamic association between the duration of exposure to microgravity and bone health.
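To put the rodent and astronaut estimates quoted above on a common scale, the monthly astronaut rates can be converted to daily rates. The short Python sketch below is a rough comparison only: it assumes a 30-day month and linear accumulation of bone loss, and the function names are hypothetical.

```python
DAYS_PER_MONTH = 30.0  # assumption: one month is treated as 30 days


def per_month_to_per_day(rate_pct_per_month):
    """Convert a %/month rate to %/day, assuming linear accumulation."""
    return rate_pct_per_month / DAYS_PER_MONTH


def fold_difference(animal_pct_per_day, human_pct_per_month):
    """How many times faster the animal rate is than the human rate."""
    return animal_pct_per_day / per_month_to_per_day(human_pct_per_month)


rodent_rate = 1.7            # %/day, rodent estimate quoted in the text
astronaut_range = (0.7, 2.7)  # %/month, astronaut range quoted in the text

low_fold = fold_difference(rodent_rate, astronaut_range[1])
high_fold = fold_difference(rodent_rate, astronaut_range[0])
print(f"rodent rate is roughly {low_fold:.0f} to {high_fold:.0f} times the astronaut rate")
```

Under these assumptions the daily rodent rate exceeds the astronaut range by more than an order of magnitude, consistent with the qualitative statement in the text.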

We have identified several instances of notable regional differences in bone response to microgravity. First, we have found that the deficits in trabecular bone were much greater than those in cortical bone in space-traveling rodents. Similarly, higher deficits in trabecular bone compared to cortical bone were reported in studies of hindlimb unloaded rats82, as well as in astronauts75. In cortical bone, bone formation was only significantly suppressed on the periosteal surface, which is supported by the observation that Ct.Ar, but not Ma.Ar, was significantly lower in spaceflight rodents. Similar changes in cortical bone formation were observed in hindlimb unloading studies in mice82. Within trabecular bone, we found that rodents exhibited relatively greater deficits in secondary spongiosa compared to primary spongiosa. Secondary spongiosa was also found to be more affected compared to primary in rats after hindlimb unloading61,83,84. However, in the model of immobilization due to sciatic denervation, the bone loss was isolated to primary spongiosa85. Of interest, we also observed a weak association of osteoblast suppression and cortical bone loss with older age in space-traveling rodents. In contrast, extensive and well-controlled studies of the impact of age on bone health in hindlimb unloaded rats reported the opposite trend: higher bone deficits in younger animals77,80. In this regard, it is important to note that the oldest spaceflight rodents were relatively young, 20 weeks of age at the start of the mission, and therefore more studies are needed to fully understand the impact of age on bone health in space. Similarly, even though dramatic sex-related differences were reported in hindlimb unloaded rats81, the effect of sex was poorly investigated in spaceflight animals, with no data available for female rats or primates, and only some mouse studies reporting changes in females.

In humans, a significant association between bone loss and bone position relative to the gravitational vector has been identified5,74. Although it is more difficult to account for an equivalent gravitational vector in rodents, we attempted to assess the regional difference in bones of rodents assuming their quadrupedal movement. We have found that, similar to humans, distal skeletal regions in rodents exhibited a trend of increased trabecular bone deficits compared to axial skeletal regions. Furthermore, in two mouse studies that measured total BV/TV of the calvariae36,40, an increase was reported. These data suggest that local factors, including microgravity-induced redistribution of body fluid86 or a change in mechanical environment87, likely contribute to poor bone health.

We demonstrate that spaceflight is associated with strong inhibition of bone formation in rats, mice, and primates, while osteoclast indices were not affected in rats, variably affected in mice, and not reported in primates. In contrast, in astronauts, resorption was found to rise rapidly, reaching a sustained 2-fold increase for the duration of the spaceflight, while formation was decreased or unchanged at the beginning of the mission after which it gradually increased over time5. However, the direct comparison between animal and human data is difficult due to important methodological differences in data acquisition. While in animals bone turnover is predominantly assessed histologically at the end of the space mission, in humans, biochemical markers of bone formation and resorption are measured in serum or urine, allowing for assessment during the spaceflight mission. Importantly, most histological markers only indicate the change in bone cell numbers, while circulating markers reflect both changes in the number and function of bone cells. Nevertheless, we believe that the data conclusively indicate that bone formation is inhibited in animals during spaceflight, because indices related to osteoblast numbers (osteoblast numbers and surface), and histomorphometric measures of osteoblast function (osteoid surface and thickness, mineral apposition rate, and bone formation rate), were lower in spaceflight rodents or primates. In contrast, bone resorption data for spaceflight animals is less consistent and more difficult to compare to humans. Osteoclast numbers or surfaces uniformly did not change in rats, while in mice missions STS-131 and Bion M1 reported strong increases in osteoclast number and surface, but mission STS-108 demonstrated no change. 
Osteoclast function was assessed using circulating markers in two missions: in mission STS-108, which reported no change in osteoclast number, circulating TRAP5b was higher33; and in mission STS-118, a 13-day mission with mice for which no histological osteoclast data are available, circulating TRAP5b did not change34. Thus, although the data suggest that there may be a difference in the response of bone cells to microgravity between rodents and humans, and/or between mice and rats, we are limited by the different nature of measurements in animals and humans, and by the small sample size for mice. Therefore, more experiments assessing both bone cell numbers and function, especially for osteoclasts, are required to understand the spaceflight-induced changes in bone turnover.

This study has attempted to quantitatively integrate nearly 50 years of bone research in spacefaring animals. The limitations of this analysis included i) the differences in the design of experiments in individual missions, ii) inconsistent reporting, and iii) the need to meta-analytically combine data obtained using different protocols over a large time interval. Experimental design of individual missions evolved with time; notably, however, there was little data from spaceflights longer than 30 days, and there were no inflight measures of bone turnover or quality, which prevented us from assessing the long-term and dynamic effects of microgravity on animal bone. Of specific importance for animal experiments is the design of the ground control, which aimed to model the parameters of spaceflight other than microgravity and which differed vastly between missions. While this resulted in a limitation of comparing experimental groups to very different controls, it also allowed us to perform a preliminary assessment of the relative effects of stressors associated with spaceflight other than microgravity. Since the extent of modeling the stressors in ground control groups was not associated with differential bone deficits, we concluded that microgravity is the main driver of these changes. The most rigorous control for the specific effects of microgravity was in-space artificial gravity, which was performed during three missions, Cosmos 936, SpaceX CRS-9, and CRS-12. When the in-flight 1 g group was used as a “ground control”, the effect sizes for bone changes were not smaller than in missions with ground controls of lower fidelity. In addition, for Cosmos 936, which also had an associated vivarium control group, ground to vivarium control effect sizes and 95% CI were not significantly different from other ground to vivarium control comparisons, altogether suggesting that microgravity is the driving factor for bone loss in space.
Nevertheless, we did identify several parameters, including trabecular thickness and osteoclast surface and number, that appear to be specifically affected in ground control compared to vivarium control groups, suggesting that other spaceflight-associated factors may contribute to those changes. The second set of limitations was relevant to data reporting in the manuscripts. In multiple instances, we observed inconsistent reporting of animal treatment between papers describing the same mission. Rodent death was not uncommon during spaceflight; however, it was infrequently reported, even though it reflects the stressful conditions during a particular mission, and therefore could not be accounted for in our analysis. Specifications regarding the bone surfaces analyzed, as well as the treatment and housing of control and spaceflight animals, were often vague, making categorization for subgroup analysis difficult. In addition, the degree of movement, which has the potential to affect bone health4, was never reported for rodents in the included articles. This represents a significant shortcoming in the reporting of the outcomes of animal experiments in space, since animal behavior data have been collected for several missions88. Therefore, similar to human studies89, improving reporting practices of animal experiments by the Space Life Sciences Programs is critically important. The third set of limitations was related to performing a meta-analysis on studies completed over a considerable interval of time with vastly different protocols. This resulted in our dataset being moderately to highly heterogeneous for 15 out of the 17 parameters. While we attempted to identify all possible factors that may account for the high degree of heterogeneity, no single factor accounted for a major amount of variation in any of the measured outcomes. 
Since our analysis indicates low publication bias, high heterogeneity likely reflects the multifactorial nature of microgravity-induced bone changes, which can only be investigated through the analysis of larger datasets.

In conclusion, we demonstrate that meta-analysis of animal spaceflight data provides important additional information regarding the effect of microgravity on animal physiology, in particular by enabling comparative studies, which are otherwise financially and technologically challenging. Our studies on animals and humans5 demonstrate that microgravity-induced deterioration of bone health is a complex phenomenon, with strong regional and temporal differences, as well as potentially different mechanisms of adaptation in different species. In the future, longer missions with planned in-flight data collection are needed to understand the dynamics of changes in bone tissue and especially bone turnover, which appears to be different between humans and rodents. For nonhuman animals in particular, it is also important to relate the changes in bone to movement patterns and activity, which are rarely reported in studies focused on bone health. The quantitative estimates of spaceflight-related changes in bone health provided by our study will inform future studies and help in determining the underlying mechanisms of the observed effects.

Methods

This study was compliant with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. Refer to Supplementary Table 1 for the PRISMA checklist.

Search strategy, inclusion criteria, and quality assessment

A systematic search strategy using terms related to bone, space travel, and animals, including the names of individual missions, bones, and species of nonhuman vertebrates (Supplementary note 1), was constructed by a medical librarian (MM). Medline, Embase, PubMed, BIOSIS Previews, and Web of Science were searched on November 2nd, 2017. An updated search was performed on November 1st, 2019. In addition, a manual search of the NASA Technical Report Server and of articles referenced in the compendium of animal and cell spaceflight experiments compiled by Ronca et al.9 was performed. Studies in any language were considered. Title and abstract screening for the original search was performed independently by S.D.C. and S.F.C., and for the update by S.V.K. The inclusion criterion was that the article describes any vertebrate species that was taken on a space mission. Studies describing invertebrate animals, humans, or Earth-based spaceflight simulations were excluded. After intermediate analysis, only studies describing spaceflight results for mice, rats, and primates were included in full-text screening for quantitative measurements related to bone health, which was performed by S.D.C., S.F.C., and M.G. for the initial search and by S.V.K. and M.G. for the update. In the final meta-analysis, we included the studies that presented quantitative measurements of trabecular and cortical architecture or bone turnover for bones of the axial and appendicular skeleton, excluding facial bones. Animals that were pregnant, or that received surgery other than a sham, an abnormal diet, or hormone supplements, were excluded. Papers presenting average data without a measure of variation were excluded. Included papers were scored for reporting quality (Supplementary note 2); if two different species were reported in a single paper, they were scored independently.

Data extraction

For studies included after abstract/title screening, the year of publication, animal species, and physiological system studied were recorded. For studies that were included in the meta-analysis, the following data were independently extracted by M.G. and S.F.C. and verified by J.F.: name and duration of the mission; animal species; animal sample size (n) of spaceflight, ground control, vivarium control, and delayed simulation groups (when applicable); bone and bone region being measured; the mean, median, and median percent difference in the 18 bone health parameters (Table 1); standard deviations, standard errors of the mean, and/or interquartile ranges; and the day or range of days when measurements were performed. If the type of dispersion measure was not stated, it was assumed to be a standard error, which ensures a conservative estimate. If a range of sample sizes was reported, the smallest value was extracted. Extracted study characteristics for covariate analysis included: animal strain, age, sex, spaceflight group sacrifice delays, single vs. grouped spaceflight habitat, the space agency, treatment conditions of the ground control group, and the presence of sham operations. The information regarding a specific mission was pooled from all applicable articles. When different data for apparently identical samples were presented in two papers, we included the data from the study with the higher quality score. For spaceflight group sacrifice delay, if a range of time was given, the largest time interval was used. Alternate terms used for included parameters are presented in Supplementary Table 2.

Measurement-level outcomes

Three types of control group were used: the vivarium control (VC), where animals lived in a standard laboratory habitat; the ground control (GC), where some or all aspects of spaceflight excluding microgravity were modeled; and the delayed simulation (DS), only seen in primate studies, where spaceflight animals were placed in an Earth-based GC habitat several weeks following recovery. When available, we used GC as the comparison group. If multiple GC groups were used, we treated the group that most closely matched flight conditions as the GC. When GC was not available, we used VC or DS as the comparison group. For each individual measurement j, we extracted the mean spaceflight (SF) values, μSFj, and the mean comparison control (CC) values, μCCj, with the corresponding standard errors sej or standard deviations sdj. If sdj was extracted, it was converted to sej by dividing by the square root of the sample size n of the corresponding group, that is, nSF for spaceflight and nCC for comparison control. When median P and interquartile range xupper − xlower were given, μj was calculated as μj = (xupper + P + xlower) with: \({\mathrm{se}}_j = x_{{\mathrm{upper}}} - x_{{\mathrm{lower}}}/\sqrt n \times 2.7\). For each measurement, we calculated the percentage difference, θj, between μSFj and μCCj using Eq. (1).

$$\theta _j = \frac{{\mu _{{\mathrm{SF}}_j} - \mu _{{\mathrm{CC}}_j}}}{{\mu _{{\mathrm{CC}}_j}}} \times 100{\mathrm{\% }}$$
(1)

Normalized standard errors SEj were calculated as SEj = sej/μCCj. The standard deviation for the percentage difference of a single measurement, σj, was calculated assuming that the SF and CC groups were independent, using Eq. (2).

$$\sigma _j = \sqrt {{\mathrm{SE}}_{{\mathrm{SF}}_j}^2 + {\mathrm{SE}}_{{\mathrm{CC}}_j}^2} \times 100{\mathrm{\% }}$$
(2)
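
As an illustration of Eqs. (1) and (2), the following Python sketch computes the percentage difference and its standard deviation for a single measurement. The function names and the example numbers are hypothetical and are not taken from the analysis; standard deviations are first converted to standard errors as described above.

```python
import math


def sd_to_se(sd, n):
    """Convert a standard deviation to a standard error for a group of size n."""
    return sd / math.sqrt(n)


def measurement_outcome(mean_sf, mean_cc, se_sf, se_cc):
    """Return (theta_j, sigma_j) following Eqs. (1) and (2)."""
    # Eq. (1): percentage difference of spaceflight from comparison control
    theta = (mean_sf - mean_cc) / mean_cc * 100.0
    # Normalized standard errors: both are divided by the control mean
    norm_se_sf = se_sf / mean_cc
    norm_se_cc = se_cc / mean_cc
    # Eq. (2): SF and CC groups are assumed independent
    sigma = math.sqrt(norm_se_sf ** 2 + norm_se_cc ** 2) * 100.0
    return theta, sigma


# Hypothetical measurement: SF mean 16 (sd 3, n = 9), CC mean 20 (sd 4, n = 16)
theta, sigma = measurement_outcome(
    mean_sf=16.0,
    mean_cc=20.0,
    se_sf=sd_to_se(3.0, 9),
    se_cc=sd_to_se(4.0, 16),
)
print(theta, sigma)
```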

Mission-level outcomes

Data for b bones or bone regions presented in one or more studies for the same group of animals were pooled as unweighted averages \(\theta _i = \frac{{{\sum} {\theta _j} }}{b}\) to represent the outcome or effect size of a single mission i. In two instances (Bion M1 and SpaceLab 3) where the data for two animal groups on the same mission were reported separately, they were treated as two independent missions. Equation (3) was used to calculate the overall standard error for each mission.

$${\mathrm{SE}}\left( {\theta _i} \right) = \sqrt {\frac{{{\sum} {\sigma _j^2} }}{{{\sum} {\left( {n_{{\mathrm{SF}}} + n_{{\mathrm{CC}}}} \right)} }}}$$
(3)
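
The pooling step can be sketched in Python as follows. This is a minimal illustration of the unweighted average and Eq. (3) under the assumption that each measurement is supplied as a tuple of its percentage difference, its standard deviation, and the two group sizes; the function name and the example values are hypothetical.

```python
import math


def mission_outcome(measurements):
    """Pool measurement-level outcomes into one mission-level outcome.

    measurements: list of (theta_j, sigma_j, n_sf, n_cc) tuples for the
    bones or bone regions measured in the same group of animals.
    Returns (theta_i, se_theta_i).
    """
    b = len(measurements)
    # Unweighted average of the percentage differences
    theta_i = sum(m[0] for m in measurements) / b
    # Eq. (3): summed variances divided by the summed sample sizes
    sum_var = sum(m[1] ** 2 for m in measurements)
    sum_n = sum(m[2] + m[3] for m in measurements)
    se_i = math.sqrt(sum_var / sum_n)
    return theta_i, se_i


# Hypothetical mission with two bone regions, 5 SF and 5 CC animals each
example = [(-20.0, 6.0, 5, 5), (-10.0, 8.0, 5, 5)]
theta_i, se_i = mission_outcome(example)
print(theta_i, se_i)
```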

Meta-analytic model and global outcome

Since the mission-level data encompass outcomes from many spaceflights performed over a long period of time in multiple animal species, we rejected the fixed-effect model in favor of the random-effects model. However, since individual sample sizes were small (between 4 and 12), the variance is not a reliable indicator of how well the mean is estimated, making a variance-based weighting scheme biased. Therefore, to calculate the global effect size \({\it{\widehat {\uptheta}}}\), the mission-level outcomes θi were weighted by the sample size of spaceflight animals nSF using Eq. (4).

$${\it{\widehat {\uptheta}}} = \frac{{\mathop {\sum}\nolimits_i {\theta _i \times n_{{\mathrm{SF}}}} }}{{\mathop {\sum}\nolimits_i {n_{{\mathrm{SF}}}} }}$$
(4)

When combining data from multiple articles with differing sample size nSF, the smallest sample size among them was used for global outcome calculations. Global outcomes were calculated for mice, rats, primates, and rodents overall.
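
A minimal Python sketch of the sample-size weighting in Eq. (4) is given below. The function name and the three example missions are hypothetical and serve only to show the arithmetic.

```python
def global_effect_size(thetas, n_sf):
    """Eq. (4): mission-level outcomes weighted by spaceflight sample size."""
    if len(thetas) != len(n_sf):
        raise ValueError("thetas and n_sf must have the same length")
    weighted_sum = sum(t * n for t, n in zip(thetas, n_sf))
    return weighted_sum / sum(n_sf)


# Hypothetical mission-level outcomes (%) and spaceflight sample sizes
thetas = [-15.0, -5.0, -10.0]
n_sf = [5, 10, 5]
theta_hat = global_effect_size(thetas, n_sf)
print(theta_hat)
```

In this example the mission with 10 animals pulls the estimate toward its own value, which is the intended effect of weighting by nSF rather than by inverse variance.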

To account for heterogeneity between the studies, we adapted the approach developed by Stanley and Doucouliagos90, in which the pooled standard error is adjusted by a factor representing the degree of heterogeneity within the dataset. We calculated the adjusted heterogeneity estimator H2 to represent the variability of θi around the global outcome \({\it{\widehat {\uptheta}}}\) within N mission-level outcomes using Eq. (5).

$$H^2 = \frac{{\mathop {\sum}\nolimits_i {\left( {\frac{{\theta _i}}{{{\mathrm{SE}}\left( {\theta _i} \right)}} - \frac{{{\it{\widehat {\uptheta}}}}}{{{\mathrm{SE}}\left( {\theta _i} \right)}}} \right)} ^2}}{{\left( {N - 1} \right)}}$$
(5)

Equation (6) was used to calculate the standard error of the global outcome \({\it{\widehat {\uptheta}}}\).

$${\mathrm{SE}}\left( {\it{{\widehat {\uptheta}}}} \right) = \sqrt {\frac{{H^2}}{N}} \times \sqrt {\frac{{\mathop {\sum}\nolimits_i {\left( {{\mathrm{SE}}\left( {\theta _i} \right)^2 \cdot \left( {n_{{\mathrm{SF}}} - 1} \right)} \right)} }}{{\mathop {\sum}\nolimits_i {\left( {n_{{\mathrm{SF}}} - 1} \right)} }}}$$
(6)

This meta-analytic model provides an unbiased estimate of the central tendency and conservative estimates for the 95% confidence intervals (CI), which were determined as \({\mathrm{CI}} = {\it{\widehat {\uptheta}}} \pm {\mathrm{z}}_{\left( {1 - {\upalpha}/2} \right)} \times {\mathrm{SE}}( {\it{{\widehat {\uptheta}}}}) = {\it{\widehat {\uptheta}}} \pm 1.96 \times {\mathrm{SE}}( {\it{{\widehat {\uptheta}}}})\). To assess the influence of spaceflight-associated conditions other than microgravity, we similarly calculated the percentage difference of GC from VC.
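
The heterogeneity adjustment and the confidence interval can be sketched in Python as follows. This is an illustrative reading of Eqs. (5) and (6), assuming that the square in Eq. (5) applies to each summand; the function names and the example values are hypothetical, and the global outcome is passed in as an argument.

```python
import math


def heterogeneity_adjusted_se(thetas, ses, n_sf, theta_hat):
    """Return (H2, SE of the global outcome) following Eqs. (5) and (6)."""
    n_missions = len(thetas)
    # Eq. (5): squared standardized deviations from the global outcome
    h2 = sum(((t - theta_hat) / s) ** 2 for t, s in zip(thetas, ses)) / (n_missions - 1)
    # Eq. (6): pooled variance weighted by (n_SF - 1), scaled by H2 / N
    pooled_var = sum(s ** 2 * (n - 1) for s, n in zip(ses, n_sf)) / sum(n - 1 for n in n_sf)
    se_hat = math.sqrt(h2 / n_missions) * math.sqrt(pooled_var)
    return h2, se_hat


def ci95(theta_hat, se_hat):
    """95% confidence interval using z = 1.96."""
    return theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat


# Hypothetical mission-level outcomes with equal sample sizes,
# so the n_SF-weighted global outcome is simply -10
thetas = [-15.0, -5.0, -10.0]
ses = [2.5, 2.5, 2.5]
n_sf = [5, 5, 5]
theta_hat = -10.0

h2, se_hat = heterogeneity_adjusted_se(thetas, ses, n_sf, theta_hat)
lower, upper = ci95(theta_hat, se_hat)
print(h2, se_hat, lower, upper)
```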

Rate of change

To estimate the rate of change per day, we used mission-level outcomes from the parameter with the largest dataset, trabecular BV/TV. For each mission, the percentage difference in trabecular BV/TV was divided by the duration of the mission in days (Days) to calculate \(\theta _{i\,{\mathrm{per}}\,{\mathrm{day}}} = \frac{{\theta _i}}{{{\mathrm{Days}}}}\) and \({\mathrm{SE}}\left( {\theta _i} \right)_{{\mathrm{per}}\,{\mathrm{day}}} = \frac{{{\mathrm{SE}}\left( {\theta _i} \right)}}{{{\mathrm{Days}}}}\), which were then used in the meta-analytic model. Although it is unlikely that changes in bone mass in space occur linearly, with only two measurements for each group, any rate estimate other than linear would inevitably result in over-fitting.

Heterogeneity and publication bias analysis

To quantify heterogeneity, we calculated H2 as described above and I2 as \(I^2 = \frac{{H^2 - 1}}{{H^2}}\). To examine the contribution of individual datasets, we used single data exclusion analysis, in which one mission-level outcome was excluded and the heterogeneity of the remaining dataset was calculated, and cumulative data exclusion analysis, in which multiple mission-level outcomes were excluded in order of their contribution to heterogeneity. To assess publication bias, a funnel plot was used to plot the distribution of the standard errors relative to the estimated mission-level outcomes. All the studies were included in the final analysis independent of their contribution to heterogeneity or potential bias.
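
The single data exclusion analysis can be sketched in Python as follows. This is an illustration only: the function names and the example values are hypothetical, and setting I2 to zero when H2 is at most 1 is a common convention that is assumed here rather than stated in the text.

```python
def h_squared(thetas, ses, n_sf):
    """H2 around the n_SF-weighted global outcome (Eqs. 4 and 5)."""
    theta_hat = sum(t * n for t, n in zip(thetas, n_sf)) / sum(n_sf)
    return sum(((t - theta_hat) / s) ** 2 for t, s in zip(thetas, ses)) / (len(thetas) - 1)


def i_squared(h2):
    """I2 = (H2 - 1) / H2, set to 0 when H2 <= 1 (assumed convention)."""
    if h2 <= 1.0:
        return 0.0
    return (h2 - 1.0) / h2


def single_exclusion(thetas, ses, n_sf):
    """Recompute H2 and I2 with each mission-level outcome left out in turn.

    Requires at least three outcomes so that two remain after exclusion.
    Returns a list of (excluded_index, H2, I2).
    """
    results = []
    for k in range(len(thetas)):
        rest_t = thetas[:k] + thetas[k + 1:]
        rest_s = ses[:k] + ses[k + 1:]
        rest_n = n_sf[:k] + n_sf[k + 1:]
        h2 = h_squared(rest_t, rest_s, rest_n)
        results.append((k, h2, i_squared(h2)))
    return results


# Hypothetical dataset in which the third mission is an outlier
thetas = [-10.0, -12.0, -30.0]
ses = [2.0, 2.0, 2.0]
n_sf = [5, 5, 5]

full_h2 = h_squared(thetas, ses, n_sf)
results = single_exclusion(thetas, ses, n_sf)
for index, h2, i2 in results:
    print(index, h2, i2)
```

In this example, excluding the third outcome removes essentially all heterogeneity, which is the pattern the exclusion analysis is designed to reveal.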

Additional analysis

We performed subgroup analysis on 11 characteristics: age of animals, the strain of rats, sex of mice, flight duration, individual vs. grouped housing conditions, the space agency, the conditions of ground control, the delay time of SF animal sacrifice, presence of sham operation, the quality score of papers, and skeletal region of measurements. For strain, sex, the space agency, ground control, and housing condition, the subgroup analysis was performed by categorical value for each mission using the mission-level effect size and 95% CI as described above. For continuous values of age of animals, duration of flights, sacrifice delay, and quality score, the missions were divided into 2 groups of approximately equal size for subgroup analysis, or a linear regression against the continuous variable was performed for representative parameters for trabecular and cortical structure and turnover. For the quality score, measurement-level outcomes from a single article were combined to create a paper-level outcome, θp, and an associated measure of variance, SE(θp), which replaced mission-level outcomes in subgroup analysis and linear regression. For the skeletal region, measurement-level outcomes were combined. For quality score and bone region analysis, the global effect size \({\it{\widehat {\uptheta}}}\) and standard error \({\mathrm{SE}}( {\it{{\widehat {\uptheta}}}} )\) were estimated using the random-effects model with the Hedges estimator τ entering the weights \(w_i = \frac{1}{{{\mathrm{SE}}\left( {\theta _i} \right)^2 + \tau ^2}}\), so that \({\it{\widehat {\uptheta}}} = \frac{{\mathop {\sum}\nolimits_i {\left( {\theta _i \cdot w_i} \right)} }}{{\mathop {\sum}\nolimits_i {\left( {w_i} \right)} }}\) and \({\mathrm{SE}}\left( {\it{{\widehat {\uptheta}}}} \right) = \frac{1}{{\sqrt {\mathop {\sum }\nolimits_i \left( {w_i} \right)}}}\)91. Subgroup analysis was only performed on parameters with 6 or more mission-level, paper-level, or measurement-level outcomes.
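
The inverse-variance random-effects estimate used for the quality score and bone region analyses can be sketched in Python as follows. The text does not spell out the formula for τ2, so the sketch assumes one common form of the Hedges estimator (the unweighted sample variance of the outcomes minus the mean within-outcome variance, truncated at zero); the function names and the example values are hypothetical.

```python
import math


def hedges_tau2(thetas, ses):
    """Between-outcome variance, assumed Hedges form, truncated at zero."""
    k = len(thetas)
    mean_theta = sum(thetas) / k
    sample_var = sum((t - mean_theta) ** 2 for t in thetas) / (k - 1)
    mean_within_var = sum(s ** 2 for s in ses) / k
    return max(0.0, sample_var - mean_within_var)


def random_effects(thetas, ses):
    """Return (theta_hat, se_hat, tau2) with weights 1 / (SE_i^2 + tau2)."""
    tau2 = hedges_tau2(thetas, ses)
    weights = [1.0 / (s ** 2 + tau2) for s in ses]
    theta_hat = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
    se_hat = 1.0 / math.sqrt(sum(weights))
    return theta_hat, se_hat, tau2


# Hypothetical paper-level outcomes (%) and their standard errors
thetas = [-10.0, -20.0, -30.0]
ses = [3.0, 3.0, 3.0]
theta_hat, se_hat, tau2 = random_effects(thetas, ses)
print(theta_hat, se_hat, tau2)
```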

Outcome reporting

Data are presented as the effect size, that is, the percentage difference between spaceflight and ground control animals or between ground control and vivarium control animals, with lower and upper limits of the 95% CI, as: ES(%) [lower CI, upper CI].

Software

EndNote X7 and Rayyan were used for the management of references. WebPlotDigitizer was used in data extraction. Numbers (version 4.1.1) was used for data management. R (version 1.1.463) was used for meta-analysis and associated calculations. R (version 1.1.463), JASP (version 0.10), and MATLAB (MATLAB Online) were used for initial figure preparation.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.