Main

Space biology research focuses on answering fundamental mechanistic questions about how molecular, cellular, tissue and whole organismal life responds to the space environment. Biological stressors of spaceflight include ionizing radiation, altered gravitational fields, accelerated day–night cycles, confined isolation, hostile closed environments, distance–duration from Earth1, planetary dust regolith2,3, and extreme temperatures and atmospheres4,5. Moreover, spaceflight stressors are probably compounded and amplified with increasing time in space and distance from Earth1,6. Understanding, predicting and mitigating these changes at all levels of biology is increasingly important, given the deep space exploration goals of the National Aeronautics and Space Administration (NASA) towards cis-lunar and Mars missions. Ultimately, the goal of space biology research is to extend beyond an understanding of how extraterrestrial conditions affect life, to enable bioengineered solutions for sustained life on the Moon, Mars and during deep space missions beyond low Earth orbit (LEO)7.

In this Review, we present findings from the ‘Workshop on Artificial Intelligence and Modeling for Space Biology’ organized by NASA in June 2021, which sought to map the roles of artificial intelligence (AI), machine learning (ML) and biological computational modelling in the field of space biology research over the next decade8. Building on mathematical principles and computer science, AI and ML methods train algorithms to predict outcomes and probabilities of interest9,10,11. A companion review article summarizes the workshop participants’ recommendations regarding the roles of AI and ML for astronaut, ecosystem and precision space health12.

Workshop participants highlighted three main near-term focus areas, which will be discussed in the following sections. First, fundamentally transformative approaches leveraging AI and ML will be needed to automate biology experiments in settings beyond LEO. These approaches must facilitate the generation and analysis of reproducible datasets that incorporate multiple types of measurement to achieve a comprehensive characterization of organismal responses to a variety of extraterrestrial conditions. Such datasets can then be used for robust predictive modelling of spaceflight responses at every biological level. Second, discussion centred on the need for data management standards to ensure AI readiness and open-source data availability and organization. Workshop participants emphasized the importance of supporting an open-science culture and approach in space biology, which aims to promote transparency, inclusivity, data sharing and data access for reproducibility13, as well as ensuring FAIR data management practices14,15 (that is, findable, accessible, interoperable and reusable). Finally, workshop participants agreed on a set of existing AI and ML methods with tremendous promise for space biology applications, as well as near-term development approaches for novel AI and ML methods designed specifically for space biology challenges.

On the basis of the workshop discussions, this report proposes a widespread implementation of AI and ML methods at every level of space biology research (from ground to spaceborne research). This effort has the potential to revolutionize the breadth and depth of our knowledge in two central ways: (1) by enabling self-driving labs that perform efficient, automated and maximally autonomous experimentation and data collection in space research environments; and (2) by assisting the management, analysis, modelling and interpretation of current and future space biology datasets.

Space biology research data

Space biology research leverages spaceflown and ground-analogue experiments using model organisms to understand space impacts on increasingly complex life. Experimental models include unicellular organisms (for example, prokaryotic, eukaryotic, yeast, fungi), tissue-on-a-chip models16, invertebrates (for example, Drosophila melanogaster, Caenorhabditis elegans, tardigrades), simple model plants (for example, Arabidopsis thaliana), vertebrates (for example, mice, rats, fish), and crops and edible plants1,16,17. Model organism research is key to translational science, with the resulting evidence influencing the direction of human health research and driving the design of life support systems18.

At the molecular and cellular levels, space biology experiments seek to characterize all possible spaceflight-induced changes in cell morphology19, development and differentiation20, protein regulation21, epigenetic processes22, and gene expression23, among others24. Organ-level modelling systems such as tissue-on-a-chip models are used to study shifts in cellular organization and communication25,26. The current understanding of biological responses to spaceflight incorporates experimental evidence from a variety of data types along hierarchical biological levels, from molecular to single cell to whole organism (Fig. 1). At the cellular level in both humans and rodents, six fundamental responses to spaceflight have been well characterized, including increased oxidative stress, DNA damage, mitochondrial dysregulation, telomere length, and epigenetic, metabolic and microbiome changes1. These responses have been linked to a variety of physiological effects, including cardiovascular dysregulation, central nervous system impairments, bone loss and immune dysfunction1.

Fig. 1: Multi-hierarchical levels of space biological research and data.

Space biology research seeks to characterize the effects of spaceflight on living systems across hierarchical biological levels. Our current understanding of the biological responses to spaceflight incorporates multiple types of evidence at the cellular, tissue and whole organism level.

The majority of current space biology knowledge originated from ground-analogue experiments27,28, satellites in LEO17, and experiments on the Space Shuttle29 or the International Space Station (ISS)30,31. To facilitate NASA’s goals of human exploration beyond LEO to the Moon and Mars, space biological research is now focusing on characterizing the risks of deep space travel for mammalian, plant and microbial life. For example, the successful Artemis 1 launch in 2022 saw the deployment of NASA’s BioSentinel experiment, which sent yeast cells to heliocentric orbit in an automated microfluidic culturing device aboard a CubeSat to measure the effects of deep space radiation32,33. Experimental platforms such as BioSentinel that are sent beyond LEO must be robust to several limitations, including long transit times, extreme environmental conditions, limited crew availability and limited sample return. Small experimental platforms such as CubeSats must also have the capability to generate their own power and control their internal environment for temperature, carbon dioxide levels and so on. Conducting biological research beyond LEO will require advanced technology that is not yet fully developed: it must be resilient to space conditions and able to operate with limited communication with Earth. Such technology needs to enable partly or fully automated experiments, together with continuous environmental monitoring, and in situ data processing and analysis. Although aspects of these capabilities exist on Earth, there are major technology gaps that must be resolved before routine experimentation with relevant biological models can take place beyond LEO. Due to the necessarily automated and in situ nature of future biological experimentation beyond LEO, it follows that AI and ML will have essential roles in enabling these platforms.

Automated experiments in space supported by AI

Space biology research and data analysis have benefited from innovations in increasingly efficient and sensitive research technologies34,35. In the broader field of biological research, next-generation sequencing platforms36, big data frameworks and computational libraries for data storage, processing and analysis have led to the ability to conduct groundbreaking clinical studies with multi-omics data collected across thousands of samples37,38. Recent innovations in technologies such as single-cell sequencing, exosome sequencing, cell-free nucleic acid sequencing, spatial transcriptomics39,40 and nanopore sequencing41 have greatly broadened the potential for longitudinal characterization of cellular and genomic dynamics39,40,41,42,43,44,45. Findings derived from spaceflown experiments leveraging these technologies include ocular/retinal alterations23,46, liver dysfunction47,48, microRNA signatures49, mitochondrial stress50, gut microbiome alterations51 and alternative splicing in space-grown plants52.

However, it is difficult to leverage these technologies to their full potential in space, where workforce and resources are extremely limited. Most experimentation in space is expensive, time-consuming and not automated. This results in small experiments with few samples and replicates, and high levels of variability due to differing sample-handling procedures53,54,55. This makes AI and ML analysis of current space biological data difficult, as the models become under-determined and overfitted to the training data due to high dimensionality (tens of thousands of variables compared with tens or hundreds of data points), and technical batch effects make it challenging to combine datasets to gain higher sample numbers.
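The high-dimensionality problem described above can be illustrated with a minimal sketch. It assumes purely random binary "features" and random labels (no real spaceflight data), and the function name and default sizes are hypothetical choices for illustration. With 10 training samples and 20,000 noise features, some features match every training label by chance, yet they predict held-out labels no better than a coin flip.

```python
import random


def spurious_feature_demo(n_train=10, n_test=200, n_features=20000, seed=0):
    """Count pure-noise features that perfectly 'predict' a tiny training set.

    All data here are random bits, so any apparent signal is spurious.
    """
    rng = random.Random(seed)
    train_labels = [rng.randint(0, 1) for _ in range(n_train)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    n_perfect = 0   # noise features matching every training label
    test_hits = 0   # correct held-out predictions made by those features
    for _ in range(n_features):
        train_column = [rng.randint(0, 1) for _ in range(n_train)]
        if train_column != train_labels:
            continue
        n_perfect += 1
        # Only now draw held-out values for the "selected" feature.
        test_column = [rng.randint(0, 1) for _ in range(n_test)]
        test_hits += sum(f == y for f, y in zip(test_column, test_labels))

    test_accuracy = test_hits / (n_perfect * n_test) if n_perfect else None
    return n_perfect, test_accuracy


if __name__ == "__main__":
    n_perfect, accuracy = spurious_feature_demo()
    print(f"{n_perfect} noise features fit the training labels perfectly")
    print(f"their mean held-out accuracy is {accuracy:.2f}")
```

In expectation about 20,000 / 2^10, or roughly 20, noise features fit the training set exactly, which is why a model selected on so few samples can look accurate while carrying no information.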

Workshop participants agreed that a comprehensive effort to streamline and automate biological experimentation in space is needed to generate large-scale, high-quality, AI-ready, reproducible datasets required to meaningfully expand and validate our scientific understanding and knowledge base. Here we define ‘AI-ready’ to mean a dataset that can be used to train an AI and ML model without further preprocessing except that which may be uniquely required for model architecture.

Current terrestrial automated science

On Earth, basic molecular biology tasks such as pipetting, sequencing library preparation, cell culture maintenance, microscopy, quantitative phenotyping and behavioural change detection have already been automated in a variety of platforms56,57,58,59,60,61. Biofoundries apply high-throughput laboratory automation to generate thousands of strain constructs and DNA assemblies per week62. These advances now enable robust technical reproducibility across experiments, allowing researchers to isolate only the effects of relevant biological independent variables. However, these platforms still require a great deal of personnel operation and hands-on time. Ideally, a fully automated experimental system for spaceborne research will integrate multiple robotic functions (for example, pipetting + cell culture + microscopy photo capture and analysis + phenotyping + cell lysis and nucleic acid isolation + library preparation + sequencing + data analysis). The only human input required should be the initial set-up of experimental parameters and the command to begin experimentation, and system-requested input when unexpected experimental outcomes are observed. The new domain of cloud laboratories for automated science, such as Emerald Cloud Lab, provides facilities to researchers who design and run experiments through an application programming interface61,63,64.

Current and potential spaceflight automated science

At present, there is limited automation for biological data collection and analysis in spaceflight although progress has been made. This is particularly seen in automating spaceborne biological image acquisition. For example, a real-time multi-fluorescence cell culture microscope was established on the ISS65, and Arabidopsis response to microgravity was live-imaged by confocal microscopy66. A recent deep learning approach for automated cell segmentation based on crowdsourced annotation libraries could be leveraged to greatly expedite in situ deep space knowledge discovery67. One possibility for AI and ML and automation in space would be to move from the current, manual analysis of ISS rodent behavioural video68, to an ML-based analysis of ambulatory, sensorimotor and behavioural spaceflight effects69,70,71,72. Another possibility would be to leverage natural language processing with vision-transformer models to develop platforms for automatic, real-time image descriptions and labelling73,74.

Another area for expanding automated AI and ML data capture and analysis in spaceflight is ocular/retinal imaging. IDx-DR, an AI-enabled analysis platform for detection of diabetic retinopathy in retinal images, is a Food and Drug Administration-approved AI-based method75. This indicates potential feasibility of AI-based methods to detect space-related pathologies such as spaceflight-associated neuro-ocular syndrome (SANS; a high-priority ocular/visual risk for long-duration microgravity missions76,77). At present, informative changes in vascular branching of the retina and other tissues can be mapped and quantified by NASA’s AI-enhanced Vessel Generation Analysis (VESGEN) software77. Real-time detection of experimental results and pathologies in spaceflight could be enabled by full integration of VESGEN with computer-supported ophthalmic optical coherence tomography (OCT) and OCT-angiography (OCT-A), which have recently been updated on the ISS for monitoring SANS78 (a technology that is increasingly miniaturized79 and AI-integrated). Fundoscopy, OCT and OCT-A are now available for real-time, longitudinal imaging of small animals80, which would greatly expand experimental capabilities81.

Recent years have seen successful sequencing of nucleic acids aboard the ISS, facilitated by a long-read sequencer (Oxford Nanopore Technologies)41,82,83,84. Predictably, testing and adjustment were required for the sample loading and sequencing procedure due to the effects of microgravity on liquid dynamics83, illustrating the investment required to automate complete experimental procedures in space, but providing a powerful example for transitioning state-of-the-art research capabilities to space.

Self-driving labs

Automated science in space should be aimed at enabling partially or fully autonomous deep-spaceflight-ready ‘self-driving labs’85,86 that employ AI and ML in a closed-loop system to produce new knowledge and optimize experimental design based on data collected in previous experiments (Fig. 2). In a closed-loop self-driving lab, the AI and ML system has the capability to choose the hypothesis to be tested and the parameters for the next experiment87. In the past decade, advances in several research areas have made such self-driving labs possible on Earth88,89,90,91,92,93. We now have the ability to automate many biological processes using state-of-the-art microfluidics chips for optics, imaging and robotics94,95,96,97,98.
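The closed-loop idea can be sketched in a few lines. This is a toy under stated assumptions, not a description of any existing platform: `run_experiment` is a hypothetical stand-in for an automated assay, the quadratic growth response peaking at 30 °C is invented for illustration, and the planner only chooses the next parameter settings (it does not choose hypotheses, which a full self-driving lab would also do).

```python
import random


def run_experiment(temperature_c, rng):
    """Hypothetical automated assay returning a noisy growth readout.

    The response curve (peak at 30 C) is invented for this sketch.
    """
    true_growth = 1.0 - ((temperature_c - 30.0) / 15.0) ** 2
    return true_growth + rng.gauss(0.0, 0.005)


def closed_loop(n_rounds=6, low=15.0, high=45.0, seed=1):
    """Plan, execute, learn: narrow the search window around the best result."""
    rng = random.Random(seed)
    history = []  # every (setting, measured response) pair collected so far
    for _ in range(n_rounds):
        # Plan: propose five evenly spaced settings in the current window.
        step = (high - low) / 4
        candidates = [low + i * step for i in range(5)]
        # Execute: run the (simulated) experiments and record the results.
        history.extend((t, run_experiment(t, rng)) for t in candidates)
        # Learn: centre a window of half the width on the best setting so far.
        best_t, _ = max(history, key=lambda pair: pair[1])
        half_width = (high - low) / 4
        low, high = best_t - half_width, best_t + half_width
    best_setting = max(history, key=lambda pair: pair[1])[0]
    return best_setting, history


if __name__ == "__main__":
    best, history = closed_loop()
    print(f"best temperature after {len(history)} runs: {best:.2f} C")
```

The design choice worth noting is that each round reuses all earlier measurements, so no human decision is needed between rounds; a real system would replace the window-narrowing rule with a model-based planner such as Bayesian optimization.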

Fig. 2: Self-driving labs are autonomous experimental platforms with AI and ML closed-loop control for knowledge gain and experimental design.

In spaceflown research programmes, implementation of self-driving labs will aid comprehensive characterization of the effects of spaceflight on living systems, ultimately feeding research findings into applications such as in situ analytics, Earth-based open-science research programmes and precision astronaut health systems.

A central goal of developing and employing autonomous, AI-supported bioexperimentation systems such as self-driving laboratories should be to generate data that can inform autonomous precision space health systems that provide decision support for crew health management during LEO, cis-lunar and Mars missions12. As automated experimentation becomes more widely available, the space biology field should shift to conducting longitudinal studies99, characterizing physiological changes over the duration of an entire mission. These longitudinal data will help identify biomarkers from various physiological, molecular and microbial systems that can be integrated to create individualized baseline models for humans and other organisms. Monitoring in-flight changes to these biomarker signals will help predict and prevent adverse organismal health outcomes, and predict how different organisms will react to spaceflight conditions.
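As a minimal sketch of monitoring against an individualized baseline, the snippet below flags in-flight measurements that deviate strongly from one subject's own preflight values. The function name, the z-score rule and the threshold of 3 are assumptions chosen for illustration, and the numbers in the usage example are made up, not astronaut data.

```python
from statistics import mean, stdev


def flag_deviations(baseline, in_flight, z_threshold=3.0):
    """Return (mission_day, z_score) for readings far from a personal baseline.

    baseline:  preflight values for one subject (at least two, not all equal)
    in_flight: list of (mission_day, value) pairs for the same biomarker
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    flagged = []
    for day, value in in_flight:
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            flagged.append((day, round(z, 2)))
    return flagged


if __name__ == "__main__":
    # Invented resting heart-rate values (beats per minute).
    preflight = [61, 63, 60, 62, 64]
    mission = [(5, 63), (30, 66), (90, 70)]
    print(flag_deviations(preflight, mission))
```

A fixed preflight baseline is the simplest possible choice; during long missions the baseline itself would probably need to be updated as the organism adapts.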

Data standards and management

A large portion of the workshop discussion centred on the importance of establishing data standards and increasing support for data management to generate maximally AI-ready datasets in space biology research.

Data management for AI readiness

Raw biological data can be complex, sparse and heterogeneous, and therefore not typically ready for AI and ML applications. Biological measurements relevant to a single scientific question may be discrete or continuous, qualitative or quantitative, single- or multi-dimensional, incomplete, highly descriptive (for example, the appearance of cells in culture), and unstructured (particularly for phenotypic and behavioural data). Different experimental practices between facilities and researchers manifest as biases in the data, complicating integration of data from various experiments into a unified platform. These issues are amplified in the space biology field for several reasons. Each spaceflight experiment is conducted by an astronaut, while the ground control studies are conducted by Earth-based researchers, introducing a notable source of variability. Further, biological datasets from different missions have environmental variables (for example, duration, temperature, radiation, carbon dioxide) associated with them that differ across missions and need to be integrated with biological-results data. For space biological data to become AI-ready, we need harmonization standards that incorporate space-specific data and all metadata.

For space biology and health, the NASA GeneLab repository provides open-source, uniformly processed multi-omics data from spaceflight and ground-analogue studies, making space biology multi-omics data as AI-ready as possible100,101. The success of GeneLab in managing its data and metadata led to more efficient collection, curation and management of other spaceflight-relevant data (phenotypic, physiological, biospecimens, environmental telemetry, imaging, microscopy, behavioural; tabular, imaging, video)102 now all part of a unified NASA Open Science Data Repository. Additional work is needed to establish widely adopted standards for AI readiness in these research domains103,104. To best leverage all available data, space biology needs to invest in tools to perform automated conversion from existing, non-AI-ready formats into AI-ready formats. To facilitate this, the community needs a set of standardized ontologies and data formatting guidelines specifically for space biology (for example, the inclusion of a datasheet to describe each dataset105). These standards can then inform experimental design to ensure that data from future missions are generated in an AI-ready format.

A key part of data standards is the establishment of uniformly used vocabularies that are grounded in common conceptualizations (that is, ontologies), which increase data discovery and reuse. Biomedical ontologies have existed for over 50 years and many are in widespread use106,107,108, but no single ontology includes foundational concepts in the space biology/aerospace medicine domain (for example, specially developed experimentation hardware types, space environment types, parameters and so on). The space biology community should focus efforts on developing one or more such ontologies to standardize metadata with respect to space-relevant concepts and data structures, and across microbial, plant, animal and humans. An early effort is the Radiation Biology Ontology produced by NASA GeneLab and STOREDB109.
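A controlled vocabulary can drive automated metadata harmonization, as in the sketch below. Everything here is hypothetical: the field names, the synonym table and the unit conversions are illustrative assumptions and do not reproduce any NASA or GeneLab schema.

```python
# Hypothetical controlled vocabulary: free-text synonym -> canonical term.
ORGANISM_TERMS = {
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
    "arabidopsis": "Arabidopsis thaliana",
    "a. thaliana": "Arabidopsis thaliana",
}

# Hypothetical minimum metadata for a record to count as harmonized.
REQUIRED_FIELDS = ("organism", "duration_days", "temperature_c")


def harmonize(record):
    """Map one raw metadata record onto canonical terms and units.

    Returns the cleaned record and a list of required fields still missing.
    """
    clean = {}

    organism = str(record.get("organism", "")).strip().lower()
    if organism in ORGANISM_TERMS:
        clean["organism"] = ORGANISM_TERMS[organism]

    # Accept duration in days or hours; store days.
    if "duration_days" in record:
        clean["duration_days"] = float(record["duration_days"])
    elif "duration_hours" in record:
        clean["duration_days"] = float(record["duration_hours"]) / 24.0

    # Accept temperature in Celsius or Fahrenheit; store Celsius.
    if "temperature_c" in record:
        clean["temperature_c"] = float(record["temperature_c"])
    elif "temperature_f" in record:
        clean["temperature_c"] = (float(record["temperature_f"]) - 32.0) * 5.0 / 9.0

    missing = [name for name in REQUIRED_FIELDS if name not in clean]
    return clean, missing


if __name__ == "__main__":
    raw = {"organism": " Mouse ", "duration_hours": 720, "temperature_f": 71.6}
    print(harmonize(raw))
```

Returning the list of missing fields, rather than raising an error, lets a curation pipeline queue incomplete records for human review instead of discarding them.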

Automated, AI-assisted data harmonization and dataset curation will be a critical part of advanced space biology research architectures like the one shown in Fig. 3. Such architectures must be designed to support the entire experimental process from investigation management, to experiment execution, to data publication, through to open-science13 data repository submission (with appropriate security and governance measures to guarantee protection of private data resources). Investigator data can be effectively integrated into the NASA Open Science Data Repository through embedded digital experiment notebooks to preserve experimental parameters and analyses, with link-out capability to approved, external data resources for seamless integration with research data110,111. Use of space biology metadata ontologies can support automated harmonization across the wide spectrum of organisms studied, equipment used and experimental designs. Because space biology datasets cover a range of modalities (each requiring distinct data processing), such advanced architectures must include a suite of metadata, data acquisition and data processing tools. The proposed environment is similar to the successful virtual observatory paradigm in the planetary sciences112,113. Moreover, effective methodology transfer from planetary science to biomedical research has already occurred between NASA’s Jet Propulsion Laboratory and the National Cancer Institute’s Early Detection Research Network114.

Fig. 3: Deep space biological and biomedical data collection and transfer.

The diagram shows the data and information flow in which a cloud-based data management environment serves as the nexus between space-based data and research and Earth-based researchers and analysts, enabling open-science access to data and analytics and facilitating preparation of AI-ready datasets.

Full utilization of such an environment would ensure that all newly generated space biology datasets are AI-ready, and facilitate conversion of previously generated datasets into AI-ready formats. In addition, embedded open-science capabilities will enable broad data sharing and reuse, and avoid metadata decay and long-term data maintenance issues. A similar data management tool was implemented recently by the National Institute of Standards and Technology, to address data harmonization and standards for their principal investigator and research community115. A unique aspect of the space biology data management system is that ultimately, such an architecture must be cloud-based and linked to in-flight data acquisition systems, and eventually deep space communication for critical data downlinks116,117.

Finally, it will be important to establish and adopt robust dataset readiness metrics to aid AI-modelling researchers in understanding the applicability of various datasets. Technology readiness levels have already been proposed for ML methods118. Such metrics, if applied to datasets, could be useful for understanding the AI readiness of space biological data. Moreover, a bronze, silver and gold reproducibility standard has been proposed for life science AI and ML workflows119. A similar standard could be implemented for AI and ML analysis of space biological data to ensure reproducibility and confidence in results. These standards would be tailored for different AI and ML methods.

Organizing data

Standardized space biology ontologies will make it easier to construct knowledge graphs (KGs)120,121 compatible with the unique experimental outcomes of space biology research. These KGs will incorporate and model causal relationships using ontological content and space data, enabling the inference of physiological responses to experimental perturbations from multi-omic, phenotypic, imaging and environmental telemetry data. An existing relevant KG is the National Science Foundation-funded, University of California, San Francisco-developed SPOKE (Scalable Precision Medicine Oriented Knowledge Engine)122, which is linked to about 30 biomedical, chemical, molecular and pharmaceutical databases123,124. Analysis of transcriptomic spaceflown mouse data using SPOKE identified spaceflight-induced physiological changes similar to terrestrial clinical conditions, consistent across multiple tissue types, demonstrating the utility of KG-based systems for furthering our understanding of space biology122. A notable resource for data mining to model causal relationships in a space health context is the set of directed acyclic graphs produced and maintained by the Human Systems Risk Board125,126.
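To make the KG idea concrete, the sketch below stores relationships as subject-relation-object triples and searches for a chain linking a perturbation to an outcome. The triples are placeholders written for illustration only; they are not curated findings and are not taken from SPOKE or any other database, and the relation names are assumptions.

```python
from collections import deque

# Placeholder triples for illustration only (not curated biological findings).
TRIPLES = [
    ("spaceflight", "induces", "oxidative stress"),
    ("spaceflight", "induces", "mitochondrial dysregulation"),
    ("oxidative stress", "linked_to", "DNA damage"),
    ("mitochondrial dysregulation", "linked_to", "bone loss"),
]


def build_graph(triples):
    """Index triples as: subject -> list of (relation, object)."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest directed chain of nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no directed chain connects start to goal


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(find_path(graph, "spaceflight", "bone loss"))
```

Production KGs add typed nodes, edge provenance and weights, but the same traversal principle underlies queries that connect experimental perturbations to physiological responses.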

AI and ML methods for space research

Space biology combines the complexity of the biological and medical fields with an entirely new dimension: extended spaceflight in environments not truly known, or very different from Earth. Therefore, a portion of the workshop discussion focused on development of AI and ML algorithms specifically designed for data collected in novel space constraints and environments. A parallel problem is limited computational resources in spaceflight, and there is a detailed discussion of recommendations for this problem in the companion review article from this workshop12.

Interpreting biological data

Explainable AI (xAI) provides a human-readable explanation of the evidence and rationale for predictions and recommendations, particularly important in biomedical research127,128,129,130,131. As a central goal of space biological research is to establish predictive characterization of spaceflight effects on human astronaut health through translation from model organisms, all aspects of AI and ML development in this field should embrace and incorporate a degree of xAI practices as well as post hoc explainability and model interpretability with tools such as LIME132 and SHAP133.
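Post hoc explainability can be illustrated without any external library. The sketch below uses permutation importance, a simpler model-agnostic technique than LIME or SHAP and named here plainly as a substitute: a feature matters to a model if shuffling its values increases the prediction error. The function names and the toy model in the usage example are assumptions made for illustration.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn.

    X is a list of rows, each row a list of feature values.
    """
    rng = random.Random(seed)
    baseline_error = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline_error)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy black box that uses feature 0 and ignores feature 1.
    model = lambda row: 3.0 * row[0]
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [3.0 * row[0] for row in X]
    print(permutation_importance(model, X, y))
```

In the usage example the ignored feature receives an importance of exactly zero, which is the kind of human-readable evidence that helps a researcher check whether a model relies on biologically plausible inputs.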

Generating new data

Workshop participants recommended the creation of a collection of generative models (model zoo) that have been pre-trained for each of the main types of space biology data. These models, which typically use generative adversarial networks (GANs)134 or variational autoencoder architectures (VAEs)135, can be used to produce synthetic data to validate new and existing space biology AI and ML methods. For example, the ECG Generator of Representative Encoding of Style and Symptoms model generates synthetic electrocardiogram (ECG) signals after training on data from an astronaut wearable device, providing a large dataset of realistic synthetic data on which to train models for astronaut health monitoring136; and GANs have been used to generate synthetic DNA-sequencing137 and RNA-sequencing138 data.

Generative models can also provide powerful solutions for data mapping: generating data based on source data that are often dimensionally smaller than the target. For example, VAEs can translate ECG readings into an activation map that re-creates the electrical activity of the heart139.

Next-generation models

Workshop participants recommended looking beyond established AI and ML techniques and classic deep learning architectures to investigate potential next-generation AI and ML models and related computing hardware. Three opportunities in particular were considered promising because their capabilities could help solve specific challenges of space biology: one-shot learning, advanced transfer learning and alternative hardware architectures. One-shot and transfer learning have potential to help address small sample sizes, or space data collected in different settings than training datasets; and neuromorphic computing can support in situ data collection and analysis.

First, one-shot (or few-shot) learning is a technique for developing an AI and ML model with limited training examples, which is a valuable characteristic for space biosciences due to the sparse availability of biological data gathered in a spaceflight context, especially from astronauts. This technique has been primarily applied to image similarity and classification, which implies that analysis of spaceborne histological data may be particularly well suited to established implementation architectures140,141.
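A minimal few-shot classifier can be sketched as nearest-prototype classification: each class is summarized by the mean of its handful of labelled examples, and a query is assigned to the closest mean. This is an illustrative simplification; published few-shot methods typically learn an embedding first, whereas this sketch assumes the raw feature vectors are already informative. The class names and values in the usage example are invented.

```python
def class_prototypes(support):
    """Average the few labelled examples of each class into one prototype.

    support: dict mapping class label -> list of equal-length feature vectors
    """
    prototypes = {}
    for label, examples in support.items():
        n_dims = len(examples[0])
        prototypes[label] = [
            sum(example[d] for example in examples) / len(examples)
            for d in range(n_dims)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


if __name__ == "__main__":
    # Two invented examples per class stand in for image-derived features.
    support = {
        "ground_control": [[0.1, 0.2], [0.0, 0.3]],
        "spaceflight": [[0.9, 0.8], [1.0, 0.7]],
    }
    prototypes = class_prototypes(support)
    print(classify([0.8, 0.9], prototypes))
```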

Second, transfer learning aims to mimic the manner in which biological intelligence can leverage expertise in one area to more quickly and effectively learn to tackle problems in an adjacent domain. ML models can be trained using one dataset, and then rapidly adapted to a function in an adjacent problem space using a second dataset. Most commonly, this is done when there are limited or no data available for the target problem space, such as predicting human health risks in a deep space context. The result of such transfer learning, using a large amount of data from a related field with subsequent adaptation using a limited amount of data from the actual problem space, can be more effective than attempting to use only the smaller actual dataset, even when used with data boosting techniques142,143.
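The benefit of pretraining followed by adaptation can be shown with a deliberately small sketch. It assumes a one-variable linear model trained by gradient descent and two synthetic, noise-free datasets whose underlying lines are invented for illustration: a large "source" set and a three-point "target" set from a related but different line. With the same small fine-tuning budget, starting from the pretrained parameters gives a much lower target error than starting from zero.

```python
def fit_line(points, w=0.0, b=0.0, steps=20, lr=0.1):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(points)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in points) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(points, w, b):
    """Mean squared error of the line (w, b) on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


def transfer_demo():
    """Compare training from scratch with fine-tuning a pretrained model."""
    # Synthetic data: the source and target lines are invented for this sketch.
    source = [(i / 99, 2.0 * (i / 99) + 1.0) for i in range(100)]
    target_train = [(x, 2.2 * x + 1.3) for x in (0.2, 0.5, 0.8)]
    target_eval = [(x, 2.2 * x + 1.3) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]

    # Pretrain on the large source set, then adapt with a small step budget.
    w_pre, b_pre = fit_line(source, steps=2000)
    w_tuned, b_tuned = fit_line(target_train, w=w_pre, b=b_pre, steps=20)

    # Baseline: the same small budget on the target set alone, from zero.
    w_scratch, b_scratch = fit_line(target_train, steps=20)

    return mse(target_eval, w_scratch, b_scratch), mse(target_eval, w_tuned, b_tuned)


if __name__ == "__main__":
    scratch_error, tuned_error = transfer_demo()
    print(f"from scratch: {scratch_error:.4f}  fine-tuned: {tuned_error:.6f}")
```

The advantage here comes entirely from the source and target tasks being similar; if the two lines were very different, pretraining could offer little or even hurt, which is why the choice of source domain matters.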

Third, there are emerging chipsets and hardware architectures that offer promising capabilities for AI and ML applications with minimal SWaP-C characteristics (size, weight, power and cost)144. One particularly important example is neuromorphic computing, which represents a dramatic departure from the classical von Neumann architecture. The term neuromorphic emerged in the late 1980s, and at the time primarily referred to analogue or hybrid analogue–digital implementations of brain-inspired computing, with research that included chips containing miniaturized analogue circuits to represent neural network architectures. However, hardwired analogue chips are impractical for contemporary large-scale neural nets and have limited reconfigurability for the requirements of adaptive systems. For this reason, neuromorphic innovations shifted towards digital solutions, with chips containing many thousands of specially designed cores, each with an arithmetic logic circuit, memory and queue register that can represent one or more neurons. This approach retains the benefits of neuromorphic computing, most notably low power consumption and the biologically inspired nature of spiking neural networks, rather than the clock-synchronized von Neumann architecture145,146. The appeal of neuromorphic computing for AI and ML problems is further amplified in a spaceflight context by the resilience of neuromorphic hardware to ionizing radiation144,147,148. All these benefits are particularly important when in situ spaceborne AI and ML systems must continually learn, as opposed to only performing inferencing with a static model. This need for ‘continual learning’ is anticipated to be a likely scenario for long-duration human space exploration during which human biological systems adapt to spaceflight, thereby establishing a ‘new normal’ from which indications of disease must be detected.

In addition, workshop discussions acknowledged that other advances in AI and ML and computing hardware may soon emerge that are highly applicable to operations in the deep space environment and biosciences domain, such as (1) compute-in-memory solutions based on resistive random-access memory149, (2) advances in the reliability of space-grade graphic processing units and field programmable gate array devices150, (3) self-repairing neural networks, and (4) quantum computing151. In the case of quantum computing, the properties of quantum states, such as superposition and entanglement, are used to represent the entire search space of an optimization problem, which can then be observed to trigger a collapse of the quantum state down to a single eigenstate that defines the optimal result. For certain classes of problems, this can offer computing performance that is many orders of magnitude faster than algorithms that run on binary computing hardware. This is relevant to AI and ML research because models typically ‘learn’ through iterative adjustments in their parameters to minimize a cost function, which makes quantum optimization and AI and ML a promising combination152,153. There are still technology gaps before quantum computing can be fully implemented on spacecraft, but the first successful in-flight training and inference of quantum machine learning for Earth observation was recently completed154.

In the same way that traditional neural networks are inspired by human neural programming, self-repairing neural networks are inspired by organisms such as immortal jellyfish, which continuously regenerate their neural networks in anoxic or very-low-oxygen environments in the deep ocean155. These self-repairing models may be useful for robust training and inference on deep space expeditions, as they are capable of graceful degradation and self-repair even when interrupted by a radiation-induced single-event upset. Some of these model designs focus on emulating the biological self-repair role of astroglial cells, whereas others focus on adding a ‘safe layer’ to the model architecture that imposes self-learned constraints on outputs and can trigger any necessary self-repair156,157.
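As a heavily simplified illustration of the ‘safe layer’ idea, the sketch below wraps a tiny linear model in a guard that learns plausible output bounds from calibration data and restores the weights from a redundant copy when an output falls outside them. It is a stand-in under stated assumptions, not the designs of the cited works: the class `GuardedLinear`, the helper `flip_bit`, the margin and the backup-copy repair strategy are all hypothetical, and a real system would also have to protect the backup itself.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of a float32 encoding (simulated single-event upset)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


class GuardedLinear:
    """Linear model with an output guard and a redundant weight copy."""

    def __init__(self, weights, calibration_inputs, margin=0.5):
        self.weights = list(weights)
        self._backup = list(weights)   # redundant copy used for repair
        outputs = [self._raw(x) for x in calibration_inputs]
        span = max(outputs) - min(outputs)
        # "Self-learned" constraint: bounds derived from calibration outputs.
        self.low = min(outputs) - margin * span
        self.high = max(outputs) + margin * span

    def _raw(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def __call__(self, x):
        y = self._raw(x)
        if not (self.low <= y <= self.high):   # also true when y is NaN
            self.weights = list(self._backup)  # trigger repair
            y = self._raw(x)
        return y


if __name__ == "__main__":
    calib = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
    layer = GuardedLinear([0.5, -0.25], calib)
    layer.weights[0] = flip_bit(layer.weights[0], 30)  # corrupt one weight
    print("corrupted weight:", layer.weights[0])
    print("guarded output:", layer([1.0, 2.0]))
    print("weights after repair:", layer.weights)
```

Flipping a high exponent bit turns a small weight into an enormous one, which the output guard detects on the next inference before restoring the stored copy.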

Predictive systems

The ultimate goal of space biological research is to predict the effects of spaceflight at all physiological levels within diverse living systems, then develop the building blocks to support life and bioengineer the foundations for sustained life beyond Earth. Such predictive modelling and bioengineering will only be possible once we are able to model all parts of living systems, introduce perturbations, and measure genetic, cellular and physiological outcomes longitudinally.

Building on automated, robotic and longitudinal data capture capabilities, workshop participants emphasized that space biology research will benefit from the development of digital twins: predictive models of whole organisms. Digital twins integrate multi-scale mechanistic mathematical modelling of an entire complex organism, from genes to cells to tissues to organs158,159,160. There now exist whole-cell computational models of the microbes Mycoplasma genitalium161 and Escherichia coli162 for cellular predictions, and the ongoing Physiome Project develops mathematical models of the human body, from cells to tissues to organs, integrating chemical, metabolic, cellular and anatomical information163,164. Such models could integrate microbial–host cell interactions and environmental coupling data to predict responses to microbial population change or environmental perturbations (which are typical of spaceflight). It will be important to identify an appropriate set of reference organisms (bacteria, eukaryotes, archaea, viruses) as targets for high-fidelity digital twin models, which could be used to predict biological responses under diverse extraterrestrial environments. Ultimately, this technology will enable the development of predictive models that can be personalized to individual human astronauts based on unique differences in genetics or physiology.

Discussion

Our current understanding of the multi-tiered effects of spaceflight stressors on a mammalian organism is derived from a small amount of human astronaut data and hundreds of small, expensive, model organism biological experiments performed manually during a variety of spaceflight missions (and ground-analogue experiments). Workshop participants agreed that to advance space biology research as a field, a shift is necessary from the current manual, single-experiment paradigm to a new era of in-space biological research facilitated by robotic automation and AI-driven experimental design and analysis. Workshop participants envisioned an AI and ML space biology research lifecycle, with a data management environment and appropriate AI and ML methods facilitating the acceleration of research findings and ultimately powering widespread flight data acquisition and precision support for astronaut and ecosystem health.

Make AI and ML space-ready

Although much of the automation discussed is already in use terrestrially, it is important to note that this hardware and software is not immediately suitable for spaceflight research. Steps must be taken to convert and develop these automated systems for use in-flight, following existing processes for creating space-ready hardware. Many types of scientific equipment have already been cleared for spaceflight and are currently in use on the ISS165, but anticipated future challenges include, but are not limited to: (1) known difficulties in microfluidic processing in microgravity83, (2) enabling processes beyond LEO in higher radiation exposure and altered/partial gravity1, and (3) deploying effective edge computing for diverse locations such as the Lunar Gateway, lunar surface, Mars transit and the Martian surface. Spaceflight-ready automated systems will enable cost-effective collection of vast biological data in difficult or constrained conditions. Moreover, workshop participants agreed that the next step is to couple automation with AI-assisted or AI-driven hypothesis generation and experimental design to facilitate the automatic generation of biological insights over time without the need for human input and expertise (self-driving labs). By benefiting from the high reproducibility of machines, we envision a future where automatic data and metadata acquisition will be complete and unambiguous, such that AI and ML methods will be able to accumulate such information and constantly learn.

Data standards and fields of research

The future of space biology experimentation envisioned during the workshop will only be possible through widespread adoption and implementation of standards for generating and maintaining data and metadata from automated and AI-driven systems. It will be vitally important to develop a set of guidelines for generating AI-ready, machine-readable data from every space biology experiment, to facilitate open-access AI- and ML-assisted data analysis and reuse. This must include concerted efforts to develop and maintain space biology vocabularies, ontologies and data dictionaries that can be leveraged for automated reasoning, as well as top-down motivation to adopt standardized data management facilities.
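As a small illustration of what a machine-readable, checkable metadata record might look like, the sketch below validates an experiment record against a list of required fields. The field names, the `REQUIRED_FIELDS` table and the `validate_record` helper are hypothetical and are not drawn from any existing space biology standard; an actual data dictionary would be defined by the community vocabularies and ontologies discussed above.

```python
import json

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {
    "experiment_id": str,
    "organism": str,
    "assay_type": str,
    "gravity_condition": str,
    "collected_utc": str,
}


def validate_record(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}"
            )
    return problems


if __name__ == "__main__":
    record = {
        "experiment_id": "EXAMPLE-001",          # illustrative values only
        "organism": "Mus musculus",
        "assay_type": "RNA-seq",
        "gravity_condition": "microgravity",
        "collected_utc": "2021-06-01T12:00:00Z",
    }
    # Round-trip through JSON to confirm the record is machine-readable.
    restored = json.loads(json.dumps(record))
    print(validate_record(restored))
```

An automated platform could run a check of this kind at acquisition time, so that incomplete or ambiguous records are flagged before they reach a shared repository.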

Adapting and developing AI and ML methods for space biology

Workshop participants discussed existing AI and ML methods, models and algorithms, and agreed that the next decade of investment must include a focus on adaptation and implementation of existing methods with a specific focus on space biology. Approaches such as transfer learning and generative modelling hold great promise for space biology research, but care must be taken to adapt these methods within spaceflight constraints. Further, novel AI and ML approaches with high potential for space biology were discussed, including neuromorphic computing. Due to limited bandwidth in space, our efforts should focus on developing multi-faceted solutions including pre-training lightweight models on larger Earth-bound datasets, federated training166, edge computing167 and onboard processing. A more in-depth discussion of these solutions is in the accompanying workshop review article12.
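To illustrate why federated training suits bandwidth-limited settings, the sketch below implements a bare-bones form of federated averaging for a one-parameter model: each site trains locally on its own data, and only the resulting parameter and sample count are shared and averaged. The function names, the toy datasets and the hyperparameters are assumptions for illustration, and practical deployments would add considerations such as communication scheduling and privacy that are omitted here.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient steps of the 1-D regression y = w*x on one site."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(global_w, site_datasets, rounds=20):
    """Average locally trained parameters, weighted by each site's data size."""
    for _ in range(rounds):
        # Each site starts from the current global parameter and trains
        # locally; raw data never leaves the site.
        updates = [(local_update(global_w, data), len(data))
                   for data in site_datasets]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w


if __name__ == "__main__":
    # Two hypothetical sites whose data both follow y = 3x.
    site_a = [(1.0, 3.0), (2.0, 6.0)]
    site_b = [(3.0, 9.0), (1.0, 3.0), (2.0, 6.0)]
    w = federated_average(0.0, [site_a, site_b])
    print(f"global parameter after training: {w:.6f}")
```

Only a single number per site crosses the link in each round, regardless of how much data each site holds, which is the property that makes the approach attractive when downlink capacity is scarce.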

In the next decade, investment in AI and ML research design and analysis promises to revolutionize the way biological research is performed in space. Integration of automated and self-training systems will enable hands-off and reproducible generation of substantial cutting-edge imaging, video and multi-omics datasets, ready to be mined by next-generation AI and ML space biology tools. Workshop participants agreed that the key to this future involves the creation of multidisciplinary teams with statisticians, biologists, modelling experts and hardware developers. Such cross-cutting and interdisciplinary teams are able to facilitate the experimentation and data analysis necessary to fully understand and begin to predict and mitigate stressors of spaceflight, and enable life to thrive in deep space.

Recommendations and conclusion

The goal of the ‘Workshop on Artificial Intelligence and Modeling for Space Biology’ was to develop a vision for optimal usage of AI and ML in both spaceflight health support and space biological research over the next decade. Workshop participants identified several areas of space biological experimentation that would benefit from AI- and ML-based analysis or automation, and developed a set of fundamental action items for the next ten years. It was challenging to identify precise development strategies, as the current use of AI and ML in space biology is limited. The outcome of the workshop took the form of several broad focus areas rather than a detailed roadmap. These focus areas are as follows.

(1) Ensure all space biological data and information are generated with strong data stewardship standards embracing FAIR and open science to enable public access and scientific reuse
(2) Develop self-driving labs for spaceflight, using data management standards to inform the data output and organization from automated experimental platforms
(3) Adapt relevant existing AI, ML and modelling methods best suited for space biology research implementation (and when necessary lead development of new methods)

Adoption of AI, ML and modelling methods in space biological research is an endeavour that will span the next decade, but will ultimately revolutionize the way we perform experiments and analyse data. The developments discussed in this paper will enable us to gather the necessary data and tools to build a comprehensive characterization of the biological responses of living systems to diverse spaceflight environments. This knowledge base will be essential to facilitate NASA’s goals of lunar, Martian and deep space missions, as we will be able to predict and mitigate adverse effects at all biological levels.