Bone health in spacefaring rodents and primates: systematic review and meta-analysis

Animals in space exploration studies serve both as a model for human physiology and as a means to understand the physiological effects of microgravity. To quantify microgravity-induced changes to bone health in animals, we systematically searched Medline, Embase, Web of Science, BIOSIS, and the NASA Technical Reports Server. We selected 40 papers focusing on the bone health of 95 rats, 61 mice, and 9 rhesus monkeys from 22 space missions. In rodents, the percentage difference from ground control (GC) was −24.1% [95% confidence interval: −43.4, −4.9] for trabecular bone volume fraction and −5.9% [−8.0, −3.8] for cortical area. In primates, trabecular bone volume fraction in spaceflight animals differed from GC by −25.2% [−35.6, −14.7]. Bone formation indices in rodent trabecular and cortical bone were significantly lower in microgravity. In contrast, osteoclast numbers were not affected in rats and were variably affected in mice. Thus, microgravity induces bone deficits in rodents and primates, likely through the suppression of bone formation.


INTRODUCTION
With NASA planning to return humans to the lunar surface by 2024 1 and to send the first astronauts to Mars within the next 2 decades 2 , and with private interest in establishing the first human colony on the Martian surface 3 , human space travel will undoubtedly continue, if not increase, in the coming century. Despite these ambitions, we still do not fully understand the causes of the physiological changes observed in astronauts who travel to space, one of which is microgravity-induced bone loss 4,5 .
Animals have long been used as models to assess the physiological changes caused by various stimuli and to inform their impact on human health. Space-traveling animals even preceded humans, with several dogs, rodents, and primates sent to space from the late 1940s to the 1960s 6 . Once the technology allowing mammals to survive all phases of spaceflight had been developed, animal experiments from the 1970s onward shifted their focus to the physiological effects of space travel 7 . The information obtained in animal studies has significantly augmented our knowledge of human adaptation during space travel. Experiments assessing skeletal changes in animals have the advantage that bone biopsies can be collected, which is not possible in astronaut studies. These biopsies have allowed investigation of the cellular and molecular components of bone affected by microgravity, and thus provide further insight into the mechanisms underlying microgravity-induced changes in bone health. These missions, however, come at a considerable price: NASA has been estimated to have spent $1.2 billion per launch over the period from 1982 to 2010 8 . It is therefore critically important to gain as much knowledge as possible from every space experiment.
Despite the benefits of animal studies and the considerable expense of conducting them, these experiments have not yet been used for quantitative data synthesis.
To overcome the problems associated with small sample sizes and the high degree of variability between individual missions, we employed meta-analysis to improve statistical power by combining all available studies. Thus, the objectives of this study were to (i) systematically identify all published literature regarding bone health in vertebrate animals that were part of experiments performed in space; (ii) use a meta-analytic approach to quantitatively characterize space-related changes to bone architecture and turnover in animals; and (iii) identify confounding variables associated with changes in bone health.

Overview of relevant studies
The systematic search for the overlap of space travel, animals, and bone in Medline, Embase, Web of Science, and BIOSIS, together with 9 reports found via manual searches of the NASA Technical Reports Server and the compendium of animal and cell spaceflight experiments compiled by Ronca et al. 9 , identified 1128 candidate articles (Fig. 1a). Of these, 340 articles focused on bone, while the rest discussed a range of physiological systems potentially relevant to bone health, including skeletal muscle, metabolism, and development (Fig. 1b). The majority of studies (83%) described findings in rats (664/1128), mice (181/1128), and primates (96/1128) (Fig. 1c). The number of papers describing animals in space peaked in the 1990s (Fig. 1d). From the 1970s through the 2000s, rats were the main spacefaring animal model. Interest in primates peaked in the 2000s; in the last decade, however, mice have become the predominant animal model studied in space (Fig. 1e). Given the available data, the full-text screen focused on 340 studies describing bone health in rodents and primates and identified 63 studies that presented quantitative measurements of trabecular and cortical bone architecture or bone turnover (Table 1). After excluding studies that reported data on treated animals, reported duplicate data, or showed unclear reporting (Supplementary Table 3), 40 articles were selected for the final meta-analysis: 23 describing rats 10-32 , 12 describing mice [33][34][35][36][37][38][39][40][41][42][43][44] , 4 describing primates [45][46][47][48] , and 1 describing both mice and primates 49 . The final dataset included a total of 95 rats, 61 mice, and 9 primates (rhesus macaque monkeys) flown to space on 22 missions (Table 2).
Heterogeneity, bias, and the meta-analytic model
Statistical heterogeneity was moderate to high (I² > 46%) for all extracted parameters of spaceflight-related change except bone marrow area (I² = 14.4%) and cortical bone area (I² = 0%). Single-mission exclusion analysis identified some mission-level outcomes whose removal reduced the overall heterogeneity; however, no single mission influenced the heterogeneity of more than one parameter or the global outcome for the Tb.BV/TV or Tb.N datasets (Fig. 2a, Supplementary Fig. 1a). Cumulative mission exclusion analysis demonstrated that excluding >21% of missions led to a homogeneous (I² ≤ 30%) dataset, and that the overall outcomes for Tb.BV/TV and Tb.N were not affected by the decreased heterogeneity (Fig. 2b, Supplementary Fig. 1b). The funnel plot showed a symmetrical distribution (Fig. 2c, Supplementary Fig. 1c). We assessed the quality of individual papers on a 25-point scale (Supplementary note 2) and examined whether the quality score affected the reported paper-level variance (Fig. 2d, Supplementary Fig. 1d) or effect size (Fig. 2e); no significant association of quality score with reported outcomes was observed. Subgroup analysis further demonstrated no difference between papers with low (<20) and high (≥20) quality scores (Supplementary Fig. 2). We conclude that publication bias is negligible within this dataset. To account for low sample sizes, as well as heterogeneity, we used the modified sampling by size method 73 for further analysis.
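The heterogeneity statistics and exclusion analyses described above can be illustrated with a minimal sketch. It assumes the standard Higgins-Thompson definitions computed from Cochran's Q with inverse-variance (fixed-effect) weights, I² = max(0, (Q − df)/Q) × 100 and H² = Q/df; the software used for this study may derive them differently (for example, from a random-effects fit), and the modified sampling by size method (ref. 73) is not reproduced here. The helper names (`heterogeneity`, `leave_one_out`, `cumulative_exclusion`) and all example numbers are hypothetical.

```python
"""Cochran's Q, I2, H2, and mission-exclusion analyses (illustrative sketch)."""


def cochran_q(effects, variances):
    """Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def heterogeneity(effects, variances):
    """Return (I2 in percent, H2) using the Higgins-Thompson definitions."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two missions are required")
    q = cochran_q(effects, variances)
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return i2, q / df


def leave_one_out(effects, variances, labels):
    """I2 recomputed with each mission excluded in turn (single-mission exclusion)."""
    result = {}
    for i, label in enumerate(labels):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        result[label] = heterogeneity(rest_e, rest_v)[0]
    return result


def cumulative_exclusion(effects, variances, labels, target_i2=30.0):
    """Repeatedly drop the mission whose removal lowers I2 most, until I2 <= target.

    This greedy ordering is an assumption, not the study's documented procedure.
    Returns the removed mission labels and the final I2.
    """
    e, v, lab = list(effects), list(variances), list(labels)
    removed = []
    while len(e) > 2 and heterogeneity(e, v)[0] > target_i2:
        scores = leave_one_out(e, v, lab)
        drop = min(scores, key=scores.get)
        i = lab.index(drop)
        removed.append(lab.pop(i))
        e.pop(i)
        v.pop(i)
    return removed, heterogeneity(e, v)[0]


if __name__ == "__main__":
    # Hypothetical percentage differences (%) and their variances (%^2).
    eff = [-30.0, -12.0, -45.0, -20.0]
    var = [40.0, 25.0, 90.0, 30.0]
    missions = ["M1", "M2", "M3", "M4"]
    print(heterogeneity(eff, var))
    print(leave_one_out(eff, var, missions))
    print(cumulative_exclusion(eff, var, missions))
```

Reporting the fraction of missions removed by `cumulative_exclusion` is one way to express a threshold such as the ">21% of missions" figure, under the stated ordering assumption.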

Changes in trabecular parameters during spaceflight
Many studies included two types of control: a vivarium control (VC), in which animals lived in a standard laboratory habitat, and a ground control (GC), in which some or all aspects of spaceflight other than microgravity, such as physical enclosure, diet, and lift-off and re-entry forces, were simulated. We examined the percentage difference in spaceflight (SF) compared to GC, as well as in GC compared to VC. Of the 6 parameters describing trabecular bone (trabecular bone volume fraction (Tb.BV/TV), thickness (Tb.Th), number (Tb.N), separation (Tb.Sp), connectivity density, and total BV/TV), Tb.BV/TV was significantly lower in spaceflight mice and rats compared to ground control, and Tb.Th was significantly reduced in mice (Fig. 3a, b left). For rodents overall, Tb.BV/TV and Tb.Th changed significantly, by −24.1% [−43.4, −4.9] and −9.0% [−12.9, −5.2], respectively. Tb.N, Tb.Sp, and connectivity density showed trends toward poorer bone health in spaceflight mice and rats; however, only the change in Tb.N reached statistical significance (Fig. 4a-c). Total BV/TV, which was measured in flat bones and, in one case, a vertebra, did not change with spaceflight (Fig. 4d). When ground and vivarium controls were compared, Tb.BV/TV, Tb.N, and Tb.Sp were unaffected, but Tb.Th was significantly lower in GC than in VC (Fig. 3a, b right, Supplementary Tables 6, 7), suggesting that flight conditions other than microgravity may contribute to reduced Tb.Th. For all trabecular parameters in rodents, heterogeneity was moderate to high (I² > 46%). Trabecular parameters were measured in 4 primates on missions Bion 10 and 11, which showed significantly lower Tb.BV/TV, trends toward reduced Tb.N and Tb.Th, and a trend toward higher Tb.Sp compared to GC (Table 3). Thus, trabecular bone deficits were present in both rodents and primates after spaceflight.
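The percentage differences used for the SF vs. GC and GC vs. VC comparisons can be sketched as follows. This is a minimal illustration that assumes a first-order delta-method approximation for the variance of a ratio of two independent group means and a normal 95% CI (z = 1.96); it is a common textbook approach, not necessarily the computation behind the reported estimates, which used the modified sampling by size method (ref. 73). The function name `percent_difference` and the example values are hypothetical.

```python
import math


def percent_difference(mean_x, sd_x, n_x, mean_ref, sd_ref, n_ref, z=1.96):
    """Percentage difference of group x relative to a reference group.

    effect = 100 * (mean_x / mean_ref - 1). The standard error uses a
    delta-method approximation for the ratio of two independent means
    (an assumption). Returns (effect %, SE %, (lower, upper) CI bounds).
    """
    ratio = mean_x / mean_ref
    var_ratio = ratio ** 2 * (
        sd_x ** 2 / (n_x * mean_x ** 2) + sd_ref ** 2 / (n_ref * mean_ref ** 2)
    )
    effect = 100.0 * (ratio - 1.0)
    se = 100.0 * math.sqrt(var_ratio)
    return effect, se, (effect - z * se, effect + z * se)


if __name__ == "__main__":
    # Hypothetical Tb.BV/TV (%) values: spaceflight vs ground control.
    print(percent_difference(12.0, 3.0, 6, 16.0, 3.5, 6))
    # The same function covers the GC vs. VC comparison.
    print(percent_difference(16.0, 3.5, 6, 17.0, 3.0, 6))
```

Passing the ground control as `mean_x` and the vivarium control as `mean_ref` gives the GC vs. VC effect, so both comparisons share one definition.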
Comparison of ground and vivarium controls was available for only two missions for all osteoblast parameters except Ob.S, for which GC and VC were not significantly different (Supplementary Tables 10-13). Heterogeneity for osteoblast parameters was moderate to high (I² > 50%). The osteoblast parameters were measured in trabecular skeletal regions, except for missions Cosmos 936 (Ob.S) and Cosmos 1667 (OS and O.Th), in which measurements were taken from the endocortical surface of the tibial diaphysis and metaphysis, respectively; excluding these data resulted in homogeneous datasets for Ob.S and O.Th (I² = 0), but not for OS (I² = 74.9%). In contrast to osteoblast parameters, changes in osteoclasts were inconsistent. Osteoclast surface (Oc.S) in SF mice showed large study-level increases in 2 of 3 datasets but was unaffected in SF rats (Fig. 6a left). Osteoclast number (N.Oc) was higher in the one group of SF mice in which it was measured and was not significantly affected in spaceflight rats (Fig. 6b left). Moreover, comparing ground and vivarium controls revealed strong (10-70%) tendencies toward study-level increases (Fig. 6 right). Although the overall effect size for GC vs. VC comparisons reached statistical significance only for N.Oc, these data suggest that in rodents osteoclasts may be affected by spaceflight conditions other than microgravity. Heterogeneity was high for the Oc.S and N.Oc datasets. The osteoclast parameters were measured in trabecular skeletal regions, except for missions Cosmos 936 (N.Oc) and one of the bones from mission Bion M1 (Oc.S); excluding these data did not significantly change the outcome. Thus, osteoclast parameters were unaffected in rats and variably affected in mice.

Change in cortical bone parameters during spaceflight
Cortical parameters analyzed were bone marrow area (Ma.Ar, which included data on bone marrow diameter (Ma.Dm) transformed as π(d/2)²), cortical area (Ct.Ar), and cortical thickness (Ct.Th). Ma.Ar and Ct.Th did not differ significantly between SF and GC in mice and rats, while Ct.Ar was significantly lower in spaceflight mice and rats (Fig. 7a-c). GC did not differ significantly from VC for any cortical parameter (Supplementary Tables 14-16). Heterogeneity for cortical parameters was low (I² < 15%), except for Ct.Th, which showed high heterogeneity (I² = 90.7%). The Ma.Ar, Ct.Ar, and Ct.Th datasets are composed of measurements taken in the diaphysis of long bones, except for missions STS-131 (femoral neck), Bion M1 (animal group 1, ankle bones, and calcaneus), and SpaceX CRS-10 (rib). Removing these biological outliers did not change the effect size and resulted in a homogeneous dataset for Ma.Ar (I² = 0%). Cortical thickness measured in 4 primates was not significantly affected by spaceflight (Table 3). Thus, spaceflight resulted in cortical bone deficits, although cortical bone was affected to a smaller degree than trabecular bone.

Change in cortical bone turnover during spaceflight
Only measurements of bone formation rate (BFR) and mineral apposition rate (MAR) from the cortical bone surface in the diaphysis of long bones were included in the analysis. This resulted in the exclusion of MAR and BFR measurements in the pelvis and thoracic vertebrae from STS-78 31 , and in the humeral metaphysis from STS-52 and the non-included mission STS-41 50 (Supplementary Tables 17, 18). Heterogeneity was moderate to high for these parameters (I² > 52%). When long-bone measurements of MAR and BFR taken on the periosteal and endocortical surfaces were separated, the reductions in MAR and BFR were significant only on the periosteal surfaces (Fig. 7f). Thus, bone formation on periosteal surfaces of cortical bone appears to be more affected by microgravity.
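Two data-handling steps from the cortical analyses can be sketched directly: converting a reported marrow diameter to marrow area as Ma.Ar = π(Ma.Dm/2)², and restricting MAR/BFR records to long-bone diaphyses before splitting them by periosteal and endocortical surface. Propagating the diameter SD to an area SD with a first-order delta method is an assumption, since the text only specifies how the mean is transformed; the function names and record keys (`site`, `surface`, `parameter`) are hypothetical.

```python
import math


def marrow_area_from_diameter(mean_d, sd_d=None):
    """Ma.Ar = pi * (Ma.Dm / 2) ** 2 for a reported mean marrow diameter.

    The SD propagation (dA/dd = pi * d / 2, first-order delta method) is an
    assumption; returns (area, SD of area or None).
    """
    area = math.pi * (mean_d / 2.0) ** 2
    sd_area = None if sd_d is None else math.pi * mean_d / 2.0 * sd_d
    return area, sd_area


def select_cortical_turnover(records):
    """Keep diaphyseal MAR/BFR records and group them by bone surface.

    Record keys ('site', 'surface', 'parameter') are hypothetical labels.
    """
    by_surface = {}
    for r in records:
        if r["site"] == "diaphysis" and r["parameter"] in {"MAR", "BFR"}:
            by_surface.setdefault(r["surface"], []).append(r)
    return by_surface


if __name__ == "__main__":
    # Hypothetical marrow diameter of 1.2 mm with SD 0.1 mm.
    print(marrow_area_from_diameter(1.2, 0.1))
```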
Effects of covariates on spaceflight-related changes in animal bone health
We next examined the contribution of covariates to the overall outcomes using subgroup analysis and meta-regression. First, we examined whether animal characteristics, such as age, sex, and strain, affected the overall outcome. Using linear regression analysis, we found that rodent age was weakly associated with changes in osteoblast surface and cortical area, but not with Tb.BV/TV (Fig. 8a). Subgroup analysis further demonstrated that in animals 10 weeks of age or older, larger changes were observed in Tb.N, Ob.S, and Oc.S, while Ct.Th was more affected in younger animals (Supplementary Fig. 3a). When both primary and secondary spongiosa of a single bone were analyzed, changes in Tb.BV/TV, Tb.N, and Tb.Th were greater in secondary spongiosa than in primary spongiosa (Fig. 8b). Animal sex and strain did not significantly affect the outcome (Supplementary Fig. 3b, c). Next, we examined whether mission-related differences affected the outcome. In meta-regression analysis, spaceflight duration did not significantly correlate with changes in Ob.S and Ct.Ar but was weakly associated with changes in Tb.BV/TV (Fig. 8c). Moreover, subgroup analysis by mission duration (shorter or longer than 2 weeks) demonstrated no significant difference for any parameter (Supplementary Fig. 4a).
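The meta-regression step can be sketched as an inverse-variance weighted least-squares fit of mission-level effect sizes on a single covariate, such as rodent age in weeks or mission duration in days. This is a simplified fixed-effect version and an assumption about the model: the reported analysis may have used a random-effects meta-regression that also estimates a between-mission variance (τ²). The function name `meta_regression` and the example values are hypothetical.

```python
import math


def meta_regression(x, effects, variances):
    """Fixed-effect (inverse-variance weighted) meta-regression on one covariate.

    Returns (slope, intercept, SE of slope, two-sided normal p-value).
    A random-effects version would add tau^2 to each variance before
    weighting; that step is omitted in this sketch.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # valid when the variances are treated as known
    p_value = math.erfc(abs(slope / se_slope) / math.sqrt(2.0))
    return slope, intercept, se_slope, p_value


if __name__ == "__main__":
    # Hypothetical: Tb.BV/TV change (%) vs mission duration (days).
    durations = [7.0, 12.0, 14.0, 22.0, 30.0]
    changes = [-18.0, -30.0, -25.0, -35.0, -28.0]
    variances = [60.0, 45.0, 80.0, 50.0, 120.0]
    print(meta_regression(durations, changes, variances))
```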
To estimate the rate of accumulation of bone deficits in space, we divided individual outcomes of our largest parameter dataset, Tb.BV/TV, by mission duration. Although the difference was not statistically significant, the Tb.BV/TV deficit per day was smaller in long spaceflights than in short spaceflights (Fig. 8d). We estimated the rate of accumulation of trabecular bone deficits as −1.7%/day [−3.5, 0.2], or −1.0%/day [−1.7, −0.4] when only long-duration missions were considered. We also assessed whether individual vs. group housing affected the outcomes; no differences were found except for Tb.N, which changed significantly more when animals were housed individually (Supplementary Fig. 4b).
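The per-day normalization described above can be sketched as dividing each mission's effect size by its duration and then pooling. Scaling each variance by 1/duration² follows from dividing by a constant; the fixed-effect inverse-variance pooling and the use of ">14 days" to define long-duration missions (borrowed from the 2-week subgroup split) are assumptions, and the function names and example data are hypothetical.

```python
def daily_rates(records):
    """Convert (effect %, variance %^2, duration days) to per-day rates.

    Dividing an effect by a constant d divides its variance by d**2.
    """
    return [(e / d, v / d ** 2, d) for e, v, d in records]


def pooled_mean(effects, variances):
    """Fixed-effect inverse-variance weighted mean and its variance."""
    w = [1.0 / v for v in variances]
    total = sum(w)
    return sum(wi * ei for wi, ei in zip(w, effects)) / total, 1.0 / total


def pooled_daily_rate(records, min_days=0):
    """Pooled %/day over missions lasting more than min_days (hypothetical helper)."""
    rates = [(r, v) for r, v, d in daily_rates(records) if d > min_days]
    return pooled_mean([r for r, _ in rates], [v for _, v in rates])


if __name__ == "__main__":
    # Hypothetical Tb.BV/TV missions: (effect %, variance %^2, duration days).
    missions = [(-28.0, 40.0, 14), (-30.0, 90.0, 30), (-12.0, 25.0, 7)]
    print(pooled_daily_rate(missions))               # all missions
    print(pooled_daily_rate(missions, min_days=14))  # long-duration only
```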
Comparing outcomes by space agency, we found no significant differences between agencies (Supplementary Fig. 4c).
Study-related differences included measurement technique, presence of sham operation, sacrifice delay, and ground control conditions. For all trabecular and cortical architectural parameters, the division by measurement technique (histology vs. μCT) coincided with the species difference between rats and mice, preventing any further meaningful subgroup analysis. In sham-operated rodents, Tb.Th was affected significantly less than in naïve animals (Supplementary Fig. 5a). Sacrifice delay did not significantly affect the outcomes in subgroup analysis, although the change in Ob.S was associated with prolonged sacrifice delay in meta-regression analysis (Supplementary Figs. 5b, 6). When ground control groups were divided by the degree to which they mimicked the environmental conditions of spaceflight other than microgravity, we observed no association between GC fidelity and spaceflight-induced changes, suggesting that these changes were primarily driven by microgravity (Supplementary Fig. 6c).
In astronauts, bone loss is strongly affected by the position of the bone relative to the gravitational vector 5,74 . To assess whether a similar trend is present in rodents, we performed a subgroup analysis of bones from different regions: region 1 included the calvaria, vertebrae, ribs, and sternum; region 2 the pelvis, humerus, and femur; and region 3 the tibia and ankle bones (Fig. 8e left). Changes in trabecular parameters were larger in bones located more distally from the axial skeleton (Fig. 8e right); however, the mean effect sizes were not significantly different between regions. Among the other parameters, Ct.Th, Ob.N, OS, and BFR showed significant changes only in regions 2 and/or 3, while changes in Ob.S were significant only in regions 1 and 2 (Supplementary Fig. 7). These data suggest that bone position relative to the gravitational vector may be important in rodents; however, targeted studies would be required to investigate these relationships.
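The regional subgroup analysis can be sketched by assigning each measured bone to region 1, 2, or 3 as defined above, pooling effects within each region, and testing whether the regional means differ with the between-subgroup statistic Q_between (compared to a χ² distribution with G − 1 degrees of freedom). Fixed-effect pooling within regions is an assumption, and the bone labels used as dictionary keys, the helper names, and the example data are hypothetical.

```python
from collections import defaultdict

# Region assignment described in the text; the string keys are hypothetical labels.
REGION_BY_BONE = {
    "calvaria": 1, "vertebra": 1, "rib": 1, "sternum": 1,
    "pelvis": 2, "humerus": 2, "femur": 2,
    "tibia": 3, "ankle": 3, "calcaneus": 3,
}


def pooled(effects, variances):
    """Fixed-effect inverse-variance pooled mean and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, effects)) / total, 1.0 / total


def subgroup_by_region(records):
    """records: iterable of (bone, effect %, variance).

    Returns ({region: (mean, variance)}, Q_between, degrees of freedom).
    """
    groups = defaultdict(lambda: ([], []))
    for bone, effect, variance in records:
        effs, vars_ = groups[REGION_BY_BONE[bone]]
        effs.append(effect)
        vars_.append(variance)
    summaries = {region: pooled(e, v) for region, (e, v) in sorted(groups.items())}
    means = [m for m, _ in summaries.values()]
    variances = [v for _, v in summaries.values()]
    overall, _ = pooled(means, variances)  # equals the pooled mean of all records
    q_between = sum((m - overall) ** 2 / v for m, v in zip(means, variances))
    return summaries, q_between, len(summaries) - 1


if __name__ == "__main__":
    data = [("femur", -22.0, 30.0), ("tibia", -35.0, 40.0),
            ("vertebra", -12.0, 25.0), ("tibia", -28.0, 60.0)]
    print(subgroup_by_region(data))
```

A p-value for Q_between can then be obtained from the χ² survival function, for example `scipy.stats.chi2.sf(q_between, df)` if SciPy is available.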

DISCUSSION
We systematically reviewed and quantitatively synthesized the literature on bone health in spacefaring rodents and primates. We report that bone mass is lower in spaceflight rodents and primates, with indications that microgravity is the driving factor inducing bone deficits. Deficits in trabecular bone were larger than in cortical bone, and subgroup analysis suggested that the distal skeleton was affected more than the axial skeleton. Osteoblast indices in rodent trabecular bone were significantly lower; however, osteoclast numbers were not affected in rats and were variably affected in mice. Even though the degree of bone deficit correlated poorly with mission duration, the rate of accumulation of trabecular bone deficit was estimated as −1.7% [−3.5, 0.2] per day, which is much higher than the available estimates of bone loss in humans. Taken together, our data indicate that microgravity induces bone deficits in rodents and primates and suggest that the prevalent mechanism is suppression of bone formation.
We found that during space missions of 4-39 days, rodents accumulated a deficit of −24.1% [−43.4, −4.9] in trabecular bone tissue, which translates to a rate of 1.7% of trabecular bone deficit per day. In the much smaller dataset for primates, the bone deficit after 11.5-14-day missions was similarly high, −25.2% [−35.6, −14.7], or 1.9% per day. These estimates of trabecular bone deficits in spaceflight rodents and primates are much greater than estimates of bone loss in astronauts, which have been reported as 0.7-2.7% per month 4,5,75,76 . Nevertheless, similar deficits of 15-50% in tibial and femoral trabecular bone volume were reported in 2-4-week hindlimb unloading studies in rats and mice [77][78][79][80][81] , which can be recalculated to 1.1-3.5% per day. We observed that no single parameter was strongly associated with mission duration. In astronauts, changes to bone were also highly variable for missions shorter than 30 days 5 . Of the spaceflights studying bone in rodents, only 3 missions were longer than 30 days, one of which (Mice Drawer System (MDS)) was excluded because, of the 3 wild-type mice aboard, only 1 returned to Earth alive 70 , preventing us from extracting meaningful quantitative data. Thus, continuous measurements of bone parameters in longer missions (>30 days) are required to determine the dynamic association between the duration of exposure to microgravity and bone health.
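The per-day comparison above rests on simple linear scaling, which assumes that deficits accumulate at a constant rate, an assumption the weak association with mission duration does not strongly support. The sketch below converts the reported astronaut range (0.7-2.7% per month) to a daily rate, taking a month as 30 days (an assumption), and compares it with the pooled rodent estimate of 1.7% per day; the helper names are hypothetical.

```python
def per_day(deficit_pct, duration_days):
    """Average daily rate, assuming linear accumulation over the mission."""
    return deficit_pct / duration_days


def monthly_to_daily(rate_pct_per_month, days_per_month=30.0):
    """Convert a %/month rate to %/day (a 30-day month is an assumption)."""
    return rate_pct_per_month / days_per_month


if __name__ == "__main__":
    rodent_daily = 1.7  # %/day, pooled rodent Tb.BV/TV estimate from the text
    astronaut_daily = [monthly_to_daily(r) for r in (0.7, 2.7)]  # ~0.023 and 0.09 %/day
    fold = [rodent_daily / a for a in astronaut_daily]  # ~73-fold and ~19-fold
    print(astronaut_daily, fold)
```

Under these assumptions, the rodent daily rate is roughly 19 to 73 times the astronaut range, consistent with the qualitative statement that the animal estimates are much greater.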
We identified several notable regional differences in the bone response to microgravity. First, the deficits in trabecular bone were much greater than those in cortical bone in space-traveling rodents. Similarly, greater deficits in trabecular than in cortical bone were reported in studies of hindlimb-unloaded rats 82 , as well as in astronauts 75 . In cortical bone, bone formation was significantly suppressed only on the periosteal surface, which is supported by the observation that Ct.Ar, but not Ma.Ar, was significantly lower in spaceflight rodents. Similar changes in cortical bone formation were observed in hindlimb unloading studies in mice 82 . Within trabecular bone, we found that rodents exhibited relatively greater deficits in secondary spongiosa than in primary spongiosa. Secondary spongiosa was also found to be more affected than primary spongiosa in rats after hindlimb unloading 61,83,84 . However, in the model of immobilization by sciatic denervation, bone loss was confined to the primary spongiosa 85 . Of interest, we also observed a weak association of osteoblast suppression and cortical bone loss with older age in space-traveling rodents. In contrast, extensive and well-controlled studies of the impact of age on bone health in hindlimb-unloaded rats reported the opposite trend: greater bone deficits in younger animals 77,80 .
In this regard, it is important to note that the oldest spaceflight rodents were relatively young, 20 weeks of age at the start of the mission, and therefore more studies are needed to fully understand the impact of age on bone health in space. Similarly, even though dramatic sex-related differences were reported in hindlimb unloaded rats 81 , the effect of sex was poorly investigated in spaceflight animals, with no data available for female rats or primates, and only some mouse studies reporting changes in females.
In humans, a significant association between bone loss and bone position relative to the gravitational vector has been identified 5,74 . Although it is more difficult to define an equivalent gravitational vector in rodents, we attempted to assess regional differences in rodent bones assuming quadrupedal movement. We found that, similar to humans, distal skeletal regions in rodents exhibited a trend toward greater trabecular bone deficits than axial skeletal regions. Furthermore, in two mouse studies that measured total BV/TV of the calvariae 36,40 , an increase was reported. These data suggest that local factors, including microgravity-induced redistribution of body fluids 86 or changes in the mechanical environment 87 , likely contribute to poor bone health.
We demonstrate that spaceflight is associated with strong inhibition of bone formation in rats, mice, and primates, while osteoclast indices were not affected in rats, were variably affected in mice, and were not reported in primates. In contrast, in astronauts, resorption was found to rise rapidly, reaching a sustained 2-fold increase for the duration of spaceflight, while formation was decreased or unchanged at the beginning of the mission and gradually increased thereafter 5 . However, direct comparison of animal and human data is difficult because of important methodological differences in data acquisition. In animals, bone turnover is predominantly assessed histologically at the end of the space mission, whereas in humans, biochemical markers of bone formation and resorption are measured in serum or urine, allowing assessment during the mission. Importantly, most histological markers indicate only changes in bone cell numbers, while circulating markers reflect changes in both the number and function of bone cells. Nevertheless, we believe that the data conclusively indicate that bone formation is inhibited in animals during spaceflight, because indices related to osteoblast numbers (osteoblast number and surface) and histomorphometric measures of osteoblast function (osteoid surface and thickness, mineral apposition rate, and bone formation rate) were lower in spaceflight rodents or primates. In contrast, bone resorption data for spaceflight animals are less consistent and more difficult to compare to humans. Osteoclast numbers or surfaces uniformly did not change in rats, while in mice, missions STS-131 and Bion M1 reported strong increases in osteoclast number and surface, but mission STS-108 showed no change. Osteoclast function was assessed using circulating markers in two missions: in mission STS-108, which reported no change in osteoclast number, circulating TRAP5b was higher 33 ; and in mission STS-118, a 13-day mission with mice for which no histological osteoclast data are available, circulating TRAP5b did not change 34 . Thus, although the data suggest that the response of bone cells to microgravity may differ between rodents and humans, and/or between mice and rats, we are limited by the different nature of measurements in animals and humans and by the small sample size for mice. Therefore, more experiments assessing both bone cell numbers and function, especially for osteoclasts, are required to understand spaceflight-induced changes in bone turnover.
[Figure caption, partial: Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents; I² and H² are given for rodents. Asterisk (*) indicates missions wherein GC was not present and SF was compared to VC. For mission-level effect sizes and 95% CI, refer to Supplementary Tables 6-9.]
This study attempted to quantitatively integrate nearly 50 years of bone research in spacefaring animals. The limitations of this analysis included (i) differences in the design of experiments in individual missions, (ii) inconsistent reporting, and (iii) the need to meta-analytically combine data obtained using different protocols over a long time interval. The experimental design of individual missions evolved over time; notably, however, there were few data from spaceflights longer than 30 days, and there were no in-flight measures of bone turnover or quality, which prevented us from assessing the long-term and dynamic effects of microgravity on animal bone. Of specific importance for animal experiments is the design of the ground control, which aims to model the parameters of spaceflight other than microgravity and which varied vastly between missions. While this limited our analysis by pairing experimental groups with very different controls, it also allowed us to perform a preliminary assessment of the relative effects of spaceflight-associated stressors other than microgravity. Since the extent to which ground control groups modeled these stressors was not associated with differential bone deficits, we concluded that microgravity is the main driver of these changes. The most rigorous control for the specific effects of microgravity was in-space artificial gravity, which was used during three missions: Cosmos 936, SpaceX CRS-9, and CRS-12. When the in-flight 1 g group was used as a "ground control", the effect sizes for bone changes were not smaller than in missions with lower-fidelity ground controls. In addition, for Cosmos 936, which also had a vivarium control group, the ground-to-vivarium control effect sizes and 95% CI were not significantly different from other ground-to-vivarium control comparisons, altogether suggesting that microgravity is the driving factor for bone loss in space.
Nevertheless, we did identify several parameters, including trabecular thickness and osteoclast surface and number, that appear to be specifically affected in ground control compared to vivarium control groups, suggesting that other spaceflight-associated factors may contribute to those changes. The second set of limitations related to data reporting in the manuscripts. In multiple instances, we observed inconsistent reporting of animal treatment between papers describing the same mission. Rodent death was not uncommon during spaceflight but was infrequently reported, even though it reflects the stressful conditions during a particular mission; consequently, it could not be accounted for in our analysis. Specifications of the bone surfaces analyzed, as well as of control and spaceflight animal treatment and housing, were often vague, making categorization for subgroup analysis difficult. In addition, the degree of movement, which has the potential to affect bone health 4 , was never reported for rodents in the included articles. This represents a significant shortcoming in the reporting of animal experiments in space, since animal behavior data have been collected for several missions 88 . Therefore, as for human studies 89 , improving the reporting practices of animal experiments in the Space Life Sciences Programs is critically important. The third set of limitations related to performing a meta-analysis on studies completed over a considerable interval of time with vastly different protocols. As a result, our dataset was moderately to highly heterogeneous for 15 of the 17 parameters. While we attempted to identify all possible factors that may account for the high degree of heterogeneity, no single factor accounted for a major amount of variation in any of the measured outcomes. Since our analysis indicates low publication bias, the high heterogeneity likely reflects the multifactorial nature of microgravity-induced bone changes, which can only be investigated through the analysis of larger datasets.
In conclusion, we demonstrate that meta-analysis of animal spaceflight data provides important additional information regarding the effect of microgravity on animal physiology, in particular by enabling comparative studies that would otherwise be financially and technologically challenging. Our studies in animals and humans 5 demonstrate that microgravity-induced deterioration of bone health is a complex phenomenon, with strong regional and temporal differences, as well as potentially different mechanisms of adaptation in different species. In the future, longer missions with planned in-flight data collection are needed to understand the dynamics of changes in bone tissue and especially bone turnover, which appears to differ between humans and rodents. For nonhuman animals in particular, it is also important to relate changes in bone to movement patterns and activity, which are rarely provided in studies focused on bone health. The quantitative estimates of spaceflight-related changes in bone health provided by our study will inform future studies and help determine the mechanisms underlying the observed effects.

METHODS
This study was compliant with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. Refer to Supplementary Table 1 for the PRISMA checklist.

Search strategy, inclusion criteria, and quality assessment
NASA Technical Reports Server and articles referenced in the compendium of animal and cell spaceflight experiments compiled by Ronca et al. 9 was performed. Studies in any language were considered. Title and abstract screening for the original search was performed independently by S.D.C. and S.F.C., and for the update by S.V.K. The inclusion criterion was that the article describes any vertebrate species taken on a space mission. Studies describing invertebrate animals, humans, or Earth-based spaceflight simulations were excluded. After intermediate analysis, only studies describing spaceflight results for mice, rats, and primates were included in the full-text screening for quantitative measurements related to bone health, which was performed by S.D.C., S.F.C., and M.G. for the initial search and by S.V.K. and M.G. for the update. In the final meta-analysis, we included studies that presented quantitative measurements of trabecular and cortical architecture or bone turnover for bones of the axial and appendicular skeleton, excluding facial bones. Animals that were pregnant or that received surgery other than a sham operation, an abnormal diet, or hormone supplements were excluded. Papers presenting average data without a measure of variation were excluded. Included papers were scored for reporting quality (Supplementary note 2); if two different species were reported in a single paper, they were scored independently.
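The final inclusion criteria can be summarized as a single eligibility predicate applied to each candidate dataset. This is an illustrative sketch only: the `Candidate` fields, the condition labels, and the facial-bone set are hypothetical stand-ins, not the coding scheme used during screening, and the requirement for quantitative architecture or turnover outcomes is assumed to have been checked upstream.

```python
from dataclasses import dataclass, field

# Hypothetical labels; the review analyzed mice, rats, and rhesus macaques.
INCLUDED_SPECIES = {"mouse", "rat", "rhesus macaque"}
# Conditions that led to exclusion; a sham operation did not.
EXCLUDING_CONDITIONS = {"pregnant", "surgery", "abnormal diet", "hormone supplement"}
# Illustrative subset of facial bones, not an exhaustive list from the source.
FACIAL_BONES = {"mandible", "maxilla", "zygomatic"}


@dataclass
class Candidate:
    species: str
    flown_in_space: bool                  # Earth-based simulations are excluded
    bone: str
    conditions: set = field(default_factory=set)
    reports_dispersion: bool = True       # means without SD/SE/IQR are excluded


def eligible(c: Candidate) -> bool:
    """Return True if a dataset meets the (sketched) inclusion criteria."""
    return (
        c.flown_in_space
        and c.species in INCLUDED_SPECIES
        and c.bone not in FACIAL_BONES
        and not (c.conditions & EXCLUDING_CONDITIONS)
        and c.reports_dispersion
    )


if __name__ == "__main__":
    print(eligible(Candidate("rat", True, "tibia", {"sham"})))  # sham allowed
    print(eligible(Candidate("mouse", True, "femur", {"hormone supplement"})))
```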

Data extraction
For studies included after title/abstract screening, the year of publication, animal species, and physiological system studied were recorded. For studies included in the meta-analysis, the following data were independently extracted by M.G. and S.F.C. and verified by J.F.: name and duration of the mission; animal species; animal sample size (n) of the spaceflight, ground control, vivarium control, and delayed simulation groups (when applicable); bone and bone region measured; the mean, median, and/or median percent difference for the 18 bone health parameters (Table 1); standard deviations, standard errors of the mean, and/or interquartile ranges; and the day or range of days when measurements were performed. If the type of dispersion measure was not stated, it was assumed to be a standard error, which yields a conservative estimate. If a range of sample sizes was reported, the smallest value was extracted. Extracted study characteristics for covariate analysis included animal strain, age, sex, spaceflight group sacrifice delays, single vs. grouped spaceflight habitat, the space agency, treatment conditions of the ground control group, and the presence of sham operations. Information regarding a specific mission was pooled from all applicable articles. When different data for apparently identical samples were presented in two papers, we included the data from the study with the higher quality score. For spaceflight group sacrifice delay, if a range of time was given, the largest time interval was used. Alternate terms used for included parameters are presented in Supplementary Table 2.

Fig. 7 Forest plot of spaceflight-induced changes to cortical bone parameters. a-e Changes in bone marrow area (a), cortical bone area (b), cortical thickness (c), bone formation rate (d), and mineral apposition rate (e) in the diaphyses of long bones of spaceflight (SF) animals compared with ground control (GC) animals. For each indicated species, missions are sorted by mission year (old to new); the duration of spaceflight (Days) and the number of spaceflight animals ($n_{SF}$) are indicated. Square/line: effect size (%) and 95% CI; the size of the square is proportional to $n_{SF}$. Overall effect size (%) and 95% CI are indicated by diamonds for mice, rats, and rodents; $I^2$ and $H^2$ are given for rodents. *Missions where SF was compared to VC. #Mission where Ma.Ar was derived from the average marrow diameter (Av.Ma.Dm) as $\text{Ma.Ar} = \pi(\text{Av.Ma.Dm}/2)^2$. f Change in BFR and MAR on the periosteal and endocortical surfaces of long bones in SF compared with GC. The number of measurements ($N_j$) is indicated. Square/line: overall effect size (%) and 95% CI.

Measurement-level outcomes
Three types of control group were used: the vivarium control (VC), in which animals lived in a standard laboratory habitat; the ground control (GC), in which some or all aspects of spaceflight other than microgravity were modeled; and the delayed simulation (DS), used only in primate studies, in which spaceflight animals were placed in an Earth-based GC habitat several weeks after recovery. When available, GC was used as the comparison group; if multiple GC groups were used, the group that most closely matched flight conditions was treated as the GC. When GC was not available, VC or DS was used as the comparison group. For each individual measurement $j$, we extracted the mean spaceflight (SF) value, $\mu_{SF,j}$, and the mean comparison control (CC) value, $\mu_{CC,j}$, with the corresponding standard errors $se_j$ or standard deviations $sd_j$. If $sd_j$ was extracted, it was converted to $se_j$ by dividing by the square root of the sample size of the corresponding group ($n_{SF}$ for spaceflight and $n_{CC}$ for comparison control). When the median $P$ and interquartile range $x_{upper} - x_{lower}$ were given, $\mu_j$ and $se_j$ were calculated as:

$$\mu_j = \frac{x_{upper} + P + x_{lower}}{3}, \qquad se_j = \frac{x_{upper} - x_{lower}}{2.7\sqrt{n}}$$

For each measurement, we calculated the percentage difference, $\theta_j$, between $\mu_{SF,j}$ and $\mu_{CC,j}$ using Eq. (1).
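The conversions of reported dispersion measures into standard errors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the median/IQR step assumes $\mu_j = (x_{upper} + P + x_{lower})/3$ and $se_j = (x_{upper} - x_{lower})/(2.7\sqrt{n})$, and an unstated dispersion type is kept as a standard error, following the rule given under Data extraction.

```python
import math
from typing import Optional, Tuple


def se_from_sd(sd: float, n: int) -> float:
    """Convert a standard deviation to a standard error of the mean."""
    return sd / math.sqrt(n)


def mean_se_from_iqr(median: float, x_lower: float, x_upper: float,
                     n: int) -> Tuple[float, float]:
    """Estimate the mean and standard error from a median and IQR.

    Assumed forms: mean = (x_upper + P + x_lower) / 3 and
    se = (x_upper - x_lower) / (2.7 * sqrt(n)).
    """
    mean = (x_upper + median + x_lower) / 3.0
    se = (x_upper - x_lower) / (2.7 * math.sqrt(n))
    return mean, se


def to_se(value: float, n: int, kind: Optional[str]) -> float:
    """Return a standard error for a reported dispersion value.

    kind is "sd", "se", or None when the paper did not say which was used.
    An unstated measure is kept as a standard error: if it was really an SD,
    this overstates the uncertainty, i.e. it errs on the conservative side.
    """
    if kind == "sd":
        return se_from_sd(value, n)
    if kind in ("se", None):
        return value
    raise ValueError(f"unknown dispersion type: {kind!r}")


# Example: an SD of 2.0 from 4 animals corresponds to an SE of 1.0.
print(se_from_sd(2.0, 4))  # 1.0
```

Keeping an unlabeled value as an SE, rather than dividing it by $\sqrt{n}$, is what makes the choice conservative: it can only widen, never narrow, the resulting confidence interval.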
Normalized standard errors $SE_j$ were calculated as $SE_j = se_j/\mu_{CC,j}$. The standard deviation of the percentage difference for a single measurement, $\sigma_j$, was calculated using Eq. (2), assuming that the SF and CC groups were independent.
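The exact forms of Eq. (1) and Eq. (2) are assumed in the sketch below: $\theta_j = 100\,(\mu_{SF,j} - \mu_{CC,j})/\mu_{CC,j}$, and $\sigma_j$ obtained by adding the two control-normalized standard errors in quadrature, which is what independence of the SF and CC groups implies to first order. This simple form drops the $\mu_{SF}/\mu_{CC}$ factor that a full delta-method expansion would place on the control term, so it may differ from the authors' Eq. (2); the function names are hypothetical.

```python
import math


def percent_difference(mu_sf: float, mu_cc: float) -> float:
    """Assumed Eq. (1): percentage difference of SF from the comparison control."""
    return 100.0 * (mu_sf - mu_cc) / mu_cc


def sigma_percent_difference(se_sf: float, se_cc: float, mu_cc: float) -> float:
    """Assumed Eq. (2): standard deviation of theta_j, in percent.

    Both standard errors are normalized by the control mean (SE = se / mu_cc)
    and added in quadrature, treating the SF and CC groups as independent.
    """
    norm_se_sf = se_sf / mu_cc
    norm_se_cc = se_cc / mu_cc
    return 100.0 * math.sqrt(norm_se_sf ** 2 + norm_se_cc ** 2)


# Illustrative values only: SF mean 76, CC mean 100.
print(percent_difference(76.0, 100.0))            # -24.0
print(sigma_percent_difference(3.0, 4.0, 100.0))  # approximately 5.0
```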

Mission-level outcomes
Data for multiple ($b$) bones or bone regions presented in one or more studies for the same group of animals were pooled without weighting to represent the outcome, or effect size, $\theta_i$ of a single mission $i$. In two instances (Bion M1 and SpaceLab 3), where the data for two animal groups on the same mission were reported separately, the groups were treated as two independent missions. Equation (3) was used to calculate the overall standard error for each mission.
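A sketch of the unweighted mission-level pooling follows. It assumes that Eq. (3) is the standard error of an unweighted mean of $b$ independent measurements, $se(\theta_i) = \sqrt{\sum_j \sigma_j^2}/b$; measurements taken from the same animals are correlated in practice, so this assumed form may understate the mission-level error. The function name `pool_mission` is hypothetical.

```python
import math
from statistics import fmean
from typing import List, Tuple


def pool_mission(thetas: List[float], sigmas: List[float]) -> Tuple[float, float]:
    """Pool b measurement-level outcomes from one mission without weighting.

    Returns (theta_i, se_i). The SE is that of an unweighted mean of b
    independent values, an assumed form of Eq. (3).
    """
    if not thetas or len(thetas) != len(sigmas):
        raise ValueError("need one sigma per theta and at least one value")
    b = len(thetas)
    theta_i = fmean(thetas)
    se_i = math.sqrt(sum(s ** 2 for s in sigmas)) / b
    return theta_i, se_i


# Illustrative: two bone sites measured in the same flight group.
print(pool_mission([-20.0, -30.0], [6.0, 8.0]))  # (-25.0, 5.0)
```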

Meta-analytic model and global outcome
Because the mission-level data encompass outcomes from many spaceflights performed over a long period of time in multiple animal species, we rejected the fixed-effect model in favor of the random-effects model. However, because individual sample sizes were small (between 4 and 12), the reported variance is not a reliable indicator of which mean estimate is more precise, which makes inverse-variance weighting biased. Therefore, to calculate the global effect size $\hat{\theta}$, the mission-level outcomes $\theta_i$ were weighted by the number of spaceflight animals, $n_{SF}$, using Eq. (4).
When data from multiple articles with differing sample sizes $n_{SF}$ were combined, the smallest sample size among them was used for global outcome calculations. Global outcomes were calculated for mice, rats, primates, and rodents overall.
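The sample-size weighting described above, presumably the content of Eq. (4), can be sketched as $\hat{\theta} = \sum_i n_{SF,i}\,\theta_i / \sum_i n_{SF,i}$. The helper names are hypothetical; `combined_n` applies the stated rule of keeping the smallest $n_{SF}$ when several articles report the same mission.

```python
from typing import List


def combined_n(reported_ns: List[int]) -> int:
    """Mission sample size when several articles report it: keep the smallest."""
    return min(reported_ns)


def global_effect(thetas: List[float], n_sf: List[int]) -> float:
    """n_SF-weighted mean of mission-level outcomes (assumed form of Eq. 4)."""
    if len(thetas) != len(n_sf) or sum(n_sf) <= 0:
        raise ValueError("need one positive sample size per mission")
    return sum(n * t for n, t in zip(n_sf, thetas)) / sum(n_sf)


# Illustrative: a 6-animal mission at -10% and a 2-animal mission at -30%.
print(global_effect([-10.0, -30.0], [6, 2]))  # -15.0
```

Weighting by $n_{SF}$ rather than by inverse variance means a mission with an unusually small reported variance cannot dominate the pooled estimate, which is the concern raised about small samples.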
To account for heterogeneity between studies, we adapted the approach developed by Stanley and Doucouliagos 90 , in which the pooled standard error is adjusted by a factor representing the degree of heterogeneity within the dataset. We calculated the adjusted heterogeneity estimator $H^2$, which represents the variability of the $N$ mission-level outcomes $\theta_i$ around the global outcome $\hat{\theta}$, using Eq. (5).
Equation (6) was used to calculate the standard error of the global outcome $\hat{\theta}$.
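The exact forms of Eqs. (5) and (6) are assumed in the following sketch, which is only one plausible reading in the spirit of Stanley and Doucouliagos' unrestricted weighted least squares, where a pooled standard error is scaled by the square root of a dispersion estimate. It assumes $H^2 = \frac{1}{N-1}\sum_i \left(\frac{\theta_i - \hat{\theta}}{se(\theta_i)}\right)^2$ and an adjusted standard error equal to the standard error of the $n_{SF}$-weighted mean (treating missions as independent) multiplied by $H$. Both formulas and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def weighted_mean(thetas: List[float], weights: List[float]) -> float:
    """Weighted mean of mission-level outcomes."""
    return sum(w * t for w, t in zip(weights, thetas)) / sum(weights)


def h_squared(thetas: List[float], ses: List[float], weights: List[float]) -> float:
    """Assumed Eq. (5): mean squared standardized deviation from theta_hat."""
    n = len(thetas)
    if n < 2:
        raise ValueError("need at least two mission-level outcomes")
    theta_hat = weighted_mean(thetas, weights)
    q = sum(((t - theta_hat) / s) ** 2 for t, s in zip(thetas, ses))
    return q / (n - 1)


def adjusted_global(thetas: List[float], ses: List[float],
                    n_sf: List[float]) -> Tuple[float, float, float]:
    """Return (theta_hat, adjusted SE, H^2) under the assumptions above.

    The unadjusted SE is that of an n_SF-weighted mean of independent
    outcomes; it is then multiplied by H = sqrt(H^2) (assumed Eq. 6).
    """
    theta_hat = weighted_mean(thetas, n_sf)
    se_raw = math.sqrt(sum((w * s) ** 2 for w, s in zip(n_sf, ses))) / sum(n_sf)
    h2 = h_squared(thetas, ses, n_sf)
    return theta_hat, se_raw * math.sqrt(h2), h2


# Illustrative: two equally sized missions that disagree strongly.
# theta_hat is -20.0, H^2 is 8.0, and the adjusted SE is about 10.0.
print(adjusted_global([-10.0, -30.0], [5.0, 5.0], [1.0, 1.0]))
```

When the outcomes scatter more than their standard errors predict ($H^2 > 1$), the interval widens; when they scatter less, this unrestricted form narrows it, which is a known property of the weighted least squares approach.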
To assess the influence of spaceflight-associated conditions other than microgravity, we calculated the percentage difference of GC from VC in the same way.

Rate of change
To estimate the rate of change per day, we used mission-level outcomes from the parameter with the largest dataset, trabecular BV/TV. For each mission, the percentage difference in trabecular BV/TV and its standard error were divided by the mission duration in days, $\mathrm{Days}_i$:

$$\theta_i^{\text{per day}} = \frac{\theta_i}{\mathrm{Days}_i}, \qquad se(\theta_i)^{\text{per day}} = \frac{se(\theta_i)}{\mathrm{Days}_i}$$

These per-day values were then used in the meta-analytic model. Although changes in bone mass in space are unlikely to be linear, with only two measurements for each group, any rate estimate other than linear would inevitably result in overfitting.
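The per-day rescaling is a plain division of each mission-level outcome and its standard error by the flight duration; the sketch below (hypothetical function names) applies it to a list of missions before any pooling. Dividing both quantities by the same constant keeps each mission's relative precision unchanged.

```python
from typing import List, Tuple


def per_day(theta_i: float, se_i: float, days: float) -> Tuple[float, float]:
    """Rescale a mission-level outcome and its SE to percent change per day."""
    if days <= 0:
        raise ValueError("mission duration must be positive")
    return theta_i / days, se_i / days


def per_day_all(missions: List[Tuple[float, float, float]]) -> List[Tuple[float, float]]:
    """Apply per_day to (theta_i, se_i, days) tuples, one per mission."""
    return [per_day(t, s, d) for t, s, d in missions]


# Illustrative: -14% over a 14-day flight is -1% per day.
print(per_day(-14.0, 7.0, 14))  # (-1.0, 0.5)
```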

Heterogeneity and publication bias analysis
To quantify heterogeneity, we calculated $H^2$ as described above and $I^2$ as $I^2 = (H^2 - 1)/H^2$. To examine the contribution of individual datasets, we used single data exclusion analysis, in which one mission-level outcome was excluded and the heterogeneity of the remaining dataset was calculated, and cumulative data exclusion analysis, in which multiple mission-level outcomes were excluded in order of their contribution to heterogeneity. To assess publication bias, a funnel plot was used to show the distribution of the standard errors relative to the estimated mission-level outcomes. All studies were included in the final analysis regardless of their contribution to heterogeneity or potential bias.
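The single and cumulative exclusion analyses can be sketched as follows. The sketch assumes $H^2$ is the mean squared standardized deviation of the outcomes from their weighted mean, and uses the standard relation $I^2 = (H^2 - 1)/H^2$ truncated at zero. Cumulative exclusion is implemented greedily, removing at each step the outcome whose exclusion lowers $H^2$ the most; that is one reasonable reading of "in order of their contribution to heterogeneity", not necessarily the authors' exact procedure. All function names are hypothetical.

```python
from typing import List


def h_squared(thetas: List[float], ses: List[float], weights: List[float]) -> float:
    """H^2 as the mean squared standardized deviation from the weighted mean (assumed)."""
    theta_hat = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    q = sum(((t - theta_hat) / s) ** 2 for t, s in zip(thetas, ses))
    return q / (len(thetas) - 1)


def i_squared(h2: float) -> float:
    """Standard relation I^2 = (H^2 - 1) / H^2, truncated at zero."""
    return max(0.0, (h2 - 1.0) / h2)


def _h2_of(thetas, ses, weights, keep: List[int]) -> float:
    """H^2 restricted to the outcomes whose indices are in keep."""
    return h_squared([thetas[i] for i in keep], [ses[i] for i in keep],
                     [weights[i] for i in keep])


def leave_one_out(thetas, ses, weights) -> List[float]:
    """Single data exclusion: H^2 of the remaining data after dropping each outcome."""
    n = len(thetas)
    return [_h2_of(thetas, ses, weights, [i for i in range(n) if i != k])
            for k in range(n)]


def cumulative_exclusion(thetas, ses, weights, steps: int) -> List[int]:
    """Greedily drop the outcome whose removal lowers H^2 most; return dropped indices."""
    remaining = list(range(len(thetas)))
    order: List[int] = []
    for _ in range(steps):
        if len(remaining) <= 2:
            break  # removing another outcome would leave H^2 undefined
        drop = min(remaining, key=lambda k: _h2_of(
            thetas, ses, weights, [i for i in remaining if i != k]))
        remaining.remove(drop)
        order.append(drop)
    return order


# Illustrative: the fourth mission is an outlier, so excluding it helps most.
data = ([-20.0, -22.0, -18.0, -60.0], [5.0] * 4, [1.0] * 4)
print(cumulative_exclusion(*data, steps=1))  # [3]
```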

Additional analysis
We performed subgroup analysis on 11 characteristics: age of animals, strain of rats, sex of mice, flight duration, individual vs. grouped housing, space agency, ground control conditions, delay of SF animal sacrifice, presence of sham operation, quality score of papers, and skeletal region of measurements. For strain, sex, space agency, ground control, and housing condition, the subgroup analysis was performed by categorical value for each mission, using the mission-level effect size and 95% CI as described above. For the continuous variables (age of animals, flight duration, sacrifice delay, and quality score), the missions were divided into 2 groups of approximately equal size for subgroup analysis, or a linear regression against the continuous variable was performed for representative parameters of trabecular and cortical structure and turnover. For the quality score, measurement-level outcomes from a single article were combined to create a paper-level outcome, $\theta_p$, with an associated measure of variance, $SE(\theta_p)$, which replaced mission-level outcomes in the subgroup analysis and linear regression. For the skeletal region, measurement-level outcomes were combined. For the quality score and bone region analyses, the global effect size $\hat{\theta}$ and its standard error $SE(\hat{\theta})$ were estimated using the random-effects model, with the Hedges estimator of $\tau^2$ used for the unit weights $w_i$. The analysis was only performed on parameters with 6 or more mission-level, paper-level, or measurement-level outcomes.
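The random-effects step can be sketched with the Hedges (method-of-moments) estimator as it is usually defined, $\hat{\tau}^2 = \max(0,\, s_y^2 - \bar{v})$, where $s_y^2$ is the sample variance of the outcomes and $\bar{v}$ the mean of their sampling variances, combined with the conventional random-effects weights $w_i = 1/(v_i + \hat{\tau}^2)$. The weight formula is an assumption here, as are all function names; `MIN_OUTCOMES` encodes the stated minimum of 6 outcomes, and `median_split` illustrates dividing missions into two groups of approximately equal size by a continuous covariate such as flight duration.

```python
import math
from typing import List, Tuple

MIN_OUTCOMES = 6  # analyses were run only with 6 or more outcomes


def hedges_tau2(y: List[float], v: List[float]) -> float:
    """Hedges (method-of-moments) estimator of the between-study variance tau^2."""
    k = len(y)
    y_bar = sum(y) / k
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (k - 1)
    return max(0.0, s2 - sum(v) / k)


def random_effects(y: List[float], se: List[float]) -> Tuple[float, float, float]:
    """Random-effects estimate, its SE, and tau^2 using w_i = 1 / (v_i + tau^2)."""
    if len(y) < MIN_OUTCOMES:
        raise ValueError(f"need at least {MIN_OUTCOMES} outcomes, got {len(y)}")
    v = [s ** 2 for s in se]
    tau2 = hedges_tau2(y, v)
    w = [1.0 / (vi + tau2) for vi in v]
    estimate = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return estimate, math.sqrt(1.0 / sum(w)), tau2


def median_split(covariate: List[float]) -> Tuple[List[int], List[int]]:
    """Split indices into low and high groups of approximately equal size."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    half = len(order) // 2
    return order[:half], order[half:]
```

With few outcomes per subgroup, $\hat{\tau}^2$ is itself imprecise, which is one reason to require a minimum number of outcomes before estimating it.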

Outcome reporting
Data are presented as the effect size, i.e. the percentage difference between spaceflight and ground control animals (or between ground control and vivarium control animals), with the lower and upper limits of the 95% CI, as ES (%) [lower CI, upper CI].
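The reporting format can be produced with a small helper such as the one below. It assumes a normal-approximation 95% CI of $\hat{\theta} \pm 1.96\,SE(\hat{\theta})$; the function name is hypothetical, and the numbers in the usage example are illustrative rather than results from this study.

```python
def format_effect(theta_hat: float, se: float, z: float = 1.96) -> str:
    """Format an effect size as 'ES% [lower, upper]' with a normal-approximation CI."""
    lower = theta_hat - z * se
    upper = theta_hat + z * se
    return f"{theta_hat:.1f}% [{lower:.1f}, {upper:.1f}]"


print(format_effect(-10.0, 5.0))  # -10.0% [-19.8, -0.2]
```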