A workflow to process 3D+time microscopy images of developing organisms and reconstruct their cell lineage

The quantitative and systematic analysis of embryonic cell dynamics from in vivo 3D+time image data sets is a major challenge at the forefront of developmental biology. Despite recent breakthroughs in the microscopy imaging of living systems, producing an accurate cell lineage tree for any developing organism remains a difficult task. We present here the BioEmergences workflow integrating all reconstruction steps from image acquisition and processing to the interactive visualization of reconstructed data. Original mathematical methods and algorithms underlie image filtering, nucleus centre detection, nucleus and membrane segmentation, and cell tracking. They are demonstrated on zebrafish, ascidian and sea urchin embryos with stained nuclei and membranes. Subsequent validation and annotations are carried out using Mov-IT, a custom-made graphical interface. Compared with eight other software tools, our workflow achieved the best lineage score. Delivered in standalone or web service mode, BioEmergences and Mov-IT offer a unique set of tools for in silico experimental embryology.

The successive cell division cycles in the developing ascidian or sea urchin embryos were revealed by plotting (a) the total cell number and (b) the cell cycle length (time between two consecutive divisions) as functions of time, in hours post fertilization (hpf). In blue: all detected and tracked cells; in red: cells validated by eye inspection. Cell division synchrony was more prominent in the sea urchin. For the zebrafish embryo, analyzed here at late blastula stages, the interpretation of the increase in cell number was highlighted by the quantification of cell density and internuclear distance (see next row). Whereas the sea urchin and ascidian reconstructed specimens were extensively corrected by experts to define gold standards, we only validated and corrected subsets of the zebrafish clones. The characteristics of the validated cells (in red) were consistent with a linear progression of cycle length for most cells throughout the blastula and gastrula stages. However, the dispersion of cycle length around the average, beside suggesting a certain number of errors in the detection of cell divisions (especially at late developmental stages), also highlighted a greater diversity of behaviors. In this respect, the ascidian Phallusia mammillata showed the largest variety of specific proliferation rates along the lineages. (c) Average distance between pairs of neighboring cell nuclei through divisions as a function of hpf (error margin in gray). Cells' neighborhoods are calculated by 3D Delaunay triangulation using the Computational Geometry Algorithms Library (CGAL) 2 . In all three species, this average distance decreased during early embryogenesis and converged to a minimal value around 10µm, corresponding to an average cell diameter. This observation fits with the plot of cell density, obtained by segmentation of the global imaged volume (Fig. 3h), which increased throughout gastrulation and plateaued at the end of gastrulation (10 hpf). It is also consistent with an estimate of the average proliferation rate during the same developmental period 3 . (d) Average number of neighbors as a function of hpf. In both the zebrafish and the ascidian, this value is near 12, i.e. in the range of the perfect 3D hexagonal close packing 4  This diagram summarizes our general methodology for the integrative modeling of living systems' morphogenesis. Ultimately, imaging data revealing the characteristic scales of biological processes should lead to models coupling the different organization levels of developing embryos (molecular, cellular and tissular). The methods and tools provided in this publication are specific to the cellular level of organization. The architecture of the workflow accessible through our webservice, however, is ready for the integration of new algorithms operating on other types of data. The three bottom pictures illustrate our concepts with the biomechanical modeling of the sea urchin early embryogenesis (blastula stages): imaged live specimen (bottom-left panel), reconstructed specimen (bottom-center panel), and simulated specimen with the MecaGen platform 5 (bottom-right panel, image courtesy of Julien Delile). We call phenomenological multilevel model, or "reconstruction" (middle-left panel and underlying triangle graph) the set of operations based on algorithmic methods to extract measurements from 3D+time images and provide the most accurate quantitative information relevant to the detected components: cell numbers, positions, shapes, interactions, trajectories, and so on. The design and implementation of image processing algorithms, data management, and analysis tools are key steps toward high-throughput 3D+time image analysis. This phenomenological reconstruction feeds a database of raw and reconstructed data used for statistical analysis. Quantitative results are then used to set the parameters of a theoretical model (top panel) and its derived numerical simulations, leading to a simulated multilevel model, or "virtual morphogenesis" (middle-right panel). This scheme creates a virtuous cycle of experimental validation of the models (top triangle graph) by questioning both the biological system and the simulation. Measuring the differences ('s) between the multiscale raw data (imaged specimen), the phenomenological data (reconstructed specimen) and the virtual data (simulated specimen), constitutes a major challenge. So far, the  between the raw and reconstructed data is estimated by a manual procedure using our custom-made visualization interface Mov-IT.     Developmental stages for the different species as described in (Danio rerio) 6 , (Ciona intestinalis) 7 and (Strongylocentrotus sp.) 8 .  The choice of 1 or 2 for embryo_type is related to the initial condition of the SubSurf segmentation process. The usual procedure is to construct a certain initial shape around the cell center, which is then evolved by the SubSurf method. This initial condition should be chosen according to the size of the cells and their expected shape. For different types of data and species, one can either try choices 1 or 2 or then customize the code of the procedure used to construct the initial condition. For user support, please contact: mikula@math.sk Supplementary Note 1 | Computational speed and scalability. Typical image datasets processed here contain 512512120 voxels and 2 channels (e.g. Dr2 dataset). Approximate computational cost of the algorithms:

Supplementary
 The filtering step with the GMCF method and 10 iterations takes 5 minutes for each 3D volume using 8 processors (communicating by MPI).  The cell center detection step by FBLS runs for 20 iterations and takes 200 seconds for one 3D volume using 8 processors (communicating by MPI).  The cell center detection step by DoG (combining filtering and detection processes) takes on average 5.5 seconds per time step, while time steps can be processed in parallel on several processors.  For nucleus segmentation using SubSurf, it takes on average 0.75 second to process each nucleus. With 6,000 nuclei, a single 3D volume is segmented in 1.25 hours  The computation time for the membrane segmentation is 6 seconds per cell (10 hours for a 3D volume containing 6,000 cells).  Whole embryo shape segmentation can also be operated in parallel.  The cell-tracking algorithm SimAnn takes 4 hours on 8 processors to process a dataset with 360 time steps and an average of 6,000 cells per 3D volume. In sum, reconstructing the cell lineage tree (e.g. with DoG and SimAnn) from a typical zebrafish dataset (512512120 voxels with in average 6,000 cells over 360 time steps) with the standalone version of the BioEmergences workflow takes less than 5 hours on a local computer with 8 cores. Performing the complete reconstruction (e.g. lineage tree and shapes from a two channels dataset) in the web service mode with computation on EGI (European Grid Infrastructure) can take 48 hours including data transfer and job queuing until execution. But it should be noted that a number of datasets can be processed in parallel during this period.
Supplementary Note 2 | In silico fate mapping can be performed in three different ways. For the ascidian embryos, the state-of-the-art fate map proposed at the 110-cell stage 13 is implemented by defining distinct cell populations with the Mov-IT visualization software, and assigning them specific colors. The fate map is then propagated along the reconstructed cell lineage. It is this propagation across cohorts of specimens that can tell us whether the lineage is invariant, as it is traditionally assumed. It should be noted that a similar strategy led our colleagues working with C. elegans to revise their ideas about lineage invariance in this species 14 .
Alternatively, and without a priori, cell fate can be assessed as in classical embryology studies, i.e. by following cell clonal history long enough to be able to conclude about the contribution of progenitors to organs, and also about cell differentiation in specific cell types defined by their shape, position and neighborhood. The limitation here is the duration of the time lapse and the evolution of image quality, hence of tracking accuracy. We now know that beyond 15 hpf, it becomes very difficult to resolve individual nuclei in ubiquitously stained zebrafish embryos, even when zooming in on a specific compartment. In this case, mosaic or rainbow type staining is required to decrease image complexity and improve tracking accuracy. The methods provided here are expected to perform well at any stage of development with mosaic staining of nuclei.
Finally, there is also the possibility to backtrack cells from their location at a late stage when compartments or presumptive organs are morphologically recognizable. This is achieved with Mov-It by using the "cell selection" function and back-propagating along the cell tracking.