Detecting abnormal cell behaviors from dry mass time series

Bailly, Romain; Malfante, Marielle; Allier, Cédric; Paviolo, Chiara; Ghenim, Lamya; Padmanabhan, Kiran; Bardin, Sabine; Mars, Jérôme

doi:10.1038/s41598-024-57684-w

Article
Open access
Published: 25 March 2024

Scientific Reports volume 14, Article number: 7053 (2024)

Abstract

The prediction of pathological changes in single-cell behaviour is a challenging task for deep learning models. In self-supervised learning methods, no prior labels are used for training, and all of the information for event prediction is extracted from the data themselves. We present here a novel self-supervised learning model for the detection of anomalies in a given cell population, StArDusTS. Cells are monitored over time and analysed to extract time series of dry mass values. We assessed the performance of the model on different cell lines, showing a precision of 96% in the automatic detection of anomalies. Additionally, anomaly detection was also associated with cell measurement errors inherent to the acquisition or analysis pipelines, which could lead to an improvement of the upstream methods for feature extraction. Our results pave the way to novel architectures for the continuous monitoring of cell cultures in applied research or bioproduction applications, and for the prediction of pathological cellular changes.

Introduction

Predicting the future behavior of single cells within a large population is an interesting task for deep learning models. To this end, different approaches have already been developed to predict cell trajectories from transcriptomics datasets¹, detect cells that deviate from normal behavior in time-lapse imaging², and learn cell competition rules³. Once a complete behavioral pattern is known within a temporal interval, any deviation from “normality” can be classified as “abnormal”. This is of crucial importance when monitoring a cell population, as identifying cells that harbor cancerous mutations based on deviation from a “typical” trajectory would allow for predictions about pathological changes.

Recently, novel techniques in bioimaging coupled to advanced deep learning models have allowed the visualization, quantification, and monitoring of important cellular features over time on large populations and in label-free conditions⁴. In particular, lensfree phase microscopy allows the extraction of important cellular features over time, such as the dry mass (i.e. the weight of the cell’s content other than water). This parameter is a critical biological feature, as it remains invariant over different cell generations and reflects the growth and division stages of a cell. Dry mass is therefore key to the understanding of cellular behavior, including cell size, cycle, state, and homeostasis^5,6,7. Monitoring the dry mass of a cell over time is thus a proxy for the overall cellular status, and could allow the prediction of possible abnormal deviations.

We propose a method using self-supervised learning to detect abnormalities or anomalous cell behavior on temporal trajectories of dry mass. Self-supervised learning is a recent training paradigm that does not need any label to train a machine learning model. It uses information extracted from the data themselves, called pseudo-labels. These pseudo-labels are used to train a neural network on a pretext task; the trained network can then be used in a downstream pipeline, in our case anomaly detection. Forecasting the future dry mass of a cell is the pretext task used in this paper.

Unsupervised (i.e. no labels are manually provided to the model) anomaly detection has been used lately in a wide variety of domains⁸, including but not limited to astronomy⁹, earth science¹⁰, neuroscience¹¹, oceanography¹² and physics^13,14. We focused on anomaly detection on time series without any prior labels, as presented by Gupta et al.¹⁵. Here, the detection is achieved with a two-stage anomaly detector, relying on the comparison of the measured and predicted nominal trajectories.

We call the proposed model StArDusTS, for Self-supervised Anomaly Detection on Time Series, and use it on distinct datasets of cellular dry mass time series. To assess its performance, we designed an experimental validation using different cell lines, i.e. a human cancer HeLa cell line and murine fibroblast cells, cultured and imaged in different laboratories. Overall, we report a precision of 96% in the automatic detection of anomalies present in these different datasets. Several types of biological anomalies were detected in the different time-lapses from the measurement of cell dry mass alone and without any human priors, e.g. cells dividing into three cells, very large cells and cell fusion. Additionally, anomaly detection was associated not only with abnormal cell behavior but also with cell measurement errors inherent to the acquisition or analysis pipelines, such as segmentation and tracking. This could lead to an improvement of the upstream methods for cell imaging and analysis. Our results pave the way to novel architectures for the continuous monitoring of cell cultures in applied research or bioproduction applications.

Method

Here we describe the cell imaging and analysis pipeline in section 2.1, the architecture of the StArDusTS model in section 2.2 and the different experiments conducted to validate the algorithms. Importantly, we designed an experimental plan featuring three sets of live cell acquisitions conducted in different laboratories.

Lensfree microscopy for cell dry mass measurement

Lens-free microscopy is a technique providing a large field of view, i.e. several $\text {mm}^2$. It was first developed by Ozcan et al.¹⁶ and later applied to the live imaging of cell cultures¹⁷ and to their analysis through machine learning methods⁴. Using this method, thousands of cells are simultaneously imaged over several days. The resulting dataset allows the tracking of thousands of individual cells over several tens of hours. Furthermore, it allows the computation of the cell dry mass through the measurement of the optical path difference (OPD) introduced by the sample^17,18,19. The OPD is obtained from the measured phase shift and equals the integral, along the optical path, of the difference between the refractive index of the sample and that of the surrounding medium:

$$\begin{aligned} \varphi _{\text {shift}}(x,y) = \varphi (x,y) - \varphi _{\text {medium}} \end{aligned}$$

(1)

$$\begin{aligned} \text {OPD}(x,y) = \lambda \frac{\varphi _{\text {shift}}(x,y)}{2 \pi } = \int _{0}^{h} \left[ n(x,y,z)-n_{\text {medium}} \right] \,dz \end{aligned}$$

(2)

where n is the local sample refractive index, $n_{\text {medium}}$ the refractive index of the surrounding medium, z the position along the optical axis, h the thickness of the sample and $\lambda$ the illumination wavelength¹⁸. The optical volume difference (OVD) is obtained by integrating the OPD over the total projected area S of the cell, as in Eq. (3). The OVD can then be converted into cell dry mass (CDM) according to Eq. (4). In our notation, it is a function of $\alpha$, the specific refractive increment, which relates the refractive index change to the increase in mass density²⁰. The specific refractive increment of the different intracellular substances falls within a narrow range, allowing the definition of a constant $\alpha$ of 0.18 $\mu \text {m}^{3}\cdot \text {pg}^{-1}$ for most eukaryotic cells²⁰.

$$\begin{aligned} \text {OVD} = \int _{S} \text {OPD}(x,y) \,dx \,dy \end{aligned}$$

(3)

$$\begin{aligned} \text {CDM} = \frac{\text {OVD}}{\alpha } \end{aligned}$$

(4)
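
As an illustration of Eqs. (1) to (4), the following Python sketch converts a phase shift into an OPD and integrates an OPD map over a segmented cell to obtain a dry mass. The function names, the list-of-rows image layout, the pixel area argument and the toy values are assumptions made for this example; they are not taken from the authors' pipeline.

```python
import math

ALPHA_UM3_PER_PG = 0.18  # specific refractive increment quoted in the text


def opd_from_phase(phase_shift_rad, wavelength_um):
    """Eq. (2): OPD = lambda * phase_shift / (2 * pi), in micrometres."""
    return wavelength_um * phase_shift_rad / (2.0 * math.pi)


def cell_dry_mass_pg(opd_map_um, mask, pixel_area_um2, alpha=ALPHA_UM3_PER_PG):
    """Eqs. (3)-(4): sum the OPD over the cell area, then divide by alpha.

    opd_map_um and mask are lists of rows with the same shape (an assumed,
    hypothetical layout); mask entries are truthy inside the segmented cell.
    """
    ovd_um3 = 0.0
    for opd_row, mask_row in zip(opd_map_um, mask):
        for opd, inside in zip(opd_row, mask_row):
            if inside:
                # Discrete version of the surface integral of Eq. (3).
                ovd_um3 += opd * pixel_area_um2
    return ovd_um3 / alpha


# Toy cell: 20 x 20 pixels of 1 um^2 with a uniform OPD of 0.15 um.
toy_opd = [[0.15] * 20 for _ in range(20)]
toy_mask = [[True] * 20 for _ in range(20)]
toy_mass = cell_dry_mass_pg(toy_opd, toy_mask, pixel_area_um2=1.0)
print(f"dry mass of the toy cell: {toy_mass:.1f} pg")
```

With these toy values the OVD is 60 µm³, which gives a dry mass of a few hundred picograms, the order of magnitude reported for the measured cells.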

The dry mass measurements were obtained using a previously described cell imaging analysis pipeline⁴. This pipeline includes the acquisition of raw images with a lensfree microscope at a rate of one frame every 10 min, the reconstruction of OPD images with a previously described algorithm²¹, the detection of the cells with a dedicated 2D-CNN, the tracking of each individual cell by means of the Fiji plugin TrackMate²², and a cell segmentation performed by a watershed algorithm. Several sources of noise affect the cell dry mass measurement, i.e. the acquisition, the reconstruction of the OPD, the cell detection, and the segmentation. The measured cell dry mass values are on the order of a few hundred picograms (pg), while the precision of our measurements was estimated to be about 35 pg¹⁹. Figure 1 shows an example of the cell imaging analysis pipeline in terms of segmentation (a) and cell dry mass time series (b). Figure A in the supplementary materials shows an example of a lensfree microscopy acquisition before the segmentation and tracking of cells.

StArDusTS model

To eliminate any potential human bias when identifying abnormal cells, we introduce the StArDusTS model, which learns directly from cellular data and identifies abnormal patterns within it. StArDusTS, an acronym for “Self-supervised Anomaly Detection on Time Series”, comprises two independent algorithmic components. The first component extracts a representation (i.e. feature vector, embeddings) from the raw time series to facilitate the detection of abnormal cells. This learned representation is then passed to the model’s second component, the anomaly detection module, detailed below.

Representation learning block

Self-supervised learning: Traditionally, supervised learning^23,24 has been the dominant paradigm in machine learning, where models are trained on meticulously labeled datasets. However, this labeling process is often labor-intensive, time-consuming, expensive, and can induce human biases. Self-supervised learning, on the other hand, seeks to alleviate these limitations by enabling models to learn directly from the data itself.

At its core, self-supervised learning operates on the principle of leveraging inherent structures and relationships within the data^25,26. It does so by creating surrogate tasks, also called pretext tasks, that generate pseudo-labels or objectives from the input data. These pretext tasks require the model to predict missing parts of the data, reorder sequences, or otherwise make sense of the information it encounters. By solving these tasks, the model learns to capture meaningful features and representations that are useful for downstream tasks, such as image classification, language understanding, recommendation systems, or here, anomaly detection.

Examples of pretext tasks in computer vision include the reconstruction of an image^27,28, namely auto-encoders, the prediction of rotation of an image²⁹, the coloring of black and white images³⁰, image inpainting³¹ or even artifact detection³².

Pretext task adapted to time series: In the context of StArDusTS, we propose the use of time series prediction as a pretext task. Specifically, we partition 30-h (180-point) windows of cell dry mass data into two segments: an input segment and a corresponding label. The initial 20 h (120 points) of data serve as the input for the proposed neural network. It is tasked with forecasting the subsequent 10 h (60 points) of cell dry mass, knowing those first 20 h.
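
The split described above can be sketched in a few lines of Python. The helper names are hypothetical, and the only facts taken from the text are the 10-min sampling period and the 20 h / 10 h partition of each 30-h window.

```python
SAMPLING_MIN = 10  # one acquisition every 10 min, as stated in the text


def hours_to_points(hours, sampling_min=SAMPLING_MIN):
    """Number of samples covering a duration given in whole hours."""
    return hours * 60 // sampling_min


def split_window(window, input_hours=20, target_hours=10):
    """Split one 30-h window into (input segment, pseudo-label segment)."""
    n_in = hours_to_points(input_hours)
    n_out = hours_to_points(target_hours)
    if len(window) != n_in + n_out:
        raise ValueError(f"expected {n_in + n_out} points, got {len(window)}")
    return window[:n_in], window[n_in:]


# Toy window: 180 increasing values standing in for dry mass measurements.
toy_window = [float(i) for i in range(hours_to_points(30))]
toy_input, toy_target = split_window(toy_window)
print(len(toy_input), len(toy_target))
```

The second segment plays the role of the pseudo-label: it comes from the data themselves, so no manual annotation is needed to train the forecaster.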

The length of the window has been chosen in order to always include a cell division inside the input window. If a smaller input window had been chosen, the cell division would have been a rare event in an input and would therefore not have been learned well, since it would have been subject to catastrophic forgetting³³. It should also be noted that the decision to exclude mother-daughter cell acquisitions lasting less than 30 h may introduce a potential bias into our algorithm. This bias arises from the restriction to studying only cells with sufficiently extended lifespans. Nevertheless, this selection was imperative to enable the model to acquire a meaningful representation of the dataset.

Evaluation metric for prediction: To evaluate the predictive performance, we use the Mean Squared Error ($\mathcal {M}\mathcal {S}\mathcal {E}$) metric, as expressed in Eq. (5). In this equation, y is the actual time series, $\widehat{y}$ denotes the predicted time series, $y_t$ and $\widehat{y_t}$ are the actual and predicted values at each time step t, and $N=60$ corresponds to the number of time steps in a prediction. A lower value of $\mathcal {M}\mathcal {S}\mathcal {E}$ indicates better predictive performance.

$$\begin{aligned} \mathcal {M}\mathcal {S}\mathcal {E}(y, \widehat{y}) = \frac{1}{N} \sum _{t=1}^{N} \left( y_t - \widehat{y_t} \right) ^ 2 \end{aligned}$$

(5)
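
For reference, Eq. (5) corresponds to the following plain Python function. The function name and the toy values are illustrative assumptions.

```python
def mse(y_true, y_pred):
    """Mean squared error of Eq. (5) between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy prediction over N = 60 time steps with a constant error of 0.5.
toy_truth = [0.0] * 60
toy_forecast = [0.5] * 60
toy_mse = mse(toy_truth, toy_forecast)
print(toy_mse)
```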

Neural network architecture: In conjunction with our choice of a pretext task, we must carefully define the architecture of the model that is assigned this task. The architecture dictates the network’s structure, layer configurations, and parameter settings, all of which play a pivotal role in the model’s ability to extract meaningful patterns and make accurate predictions.

A one-dimensional convolutional neural network (1D-CNN) is a deep learning architecture designed specifically for processing one-dimensional data, such as time series. Unlike traditional convolutional neural networks (2D-CNN) that operate on two-dimensional grids, 1D-CNNs convolve filters over a single axis, typically time or sequence steps. In a 1D-CNN, convolutional layers are responsible for sliding a small set of learnable filters over the input data, capturing local patterns and features. These filters can detect characteristics like edges, gradients, or more complex temporal patterns in the data. Subsequent layers, such as pooling and fully connected layers, help consolidate these features and enable the network to learn high-level representations³⁴.

The choice of the 1D-CNN was motivated by its use in a wide variety of time series applications such as ECG classification^35,36, fault detection^{37,38,39,40,41,42}, or speech recognition^43,44. 1D-CNN for time series processing is beneficial because it can capture local patterns and dependencies in the data through its convolutional filters, making it effective for tasks like feature extraction and anomaly detection.

The 1D-CNN architecture used in this study is presented in Fig. 2a and contains 3 blocks of 3 Conv1D layers with 64 kernels of size 3, paired with tanh activation functions. The blocks of 3 Conv1D layers are separated by max-pooling layers of size 2. The features are then fed into dense layers of size 64 and 32 with ReLU activation functions and finally into an output layer of size 60. This architecture was selected after a manual optimization of the hyperparameters of the 1D-CNN⁴⁵.
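
To make the layer arithmetic of this description concrete, the sketch below traces the sequence length and counts trainable parameters layer by layer. It does not reproduce the authors' implementation: the single input channel, the “same” padding (sequence length preserved by each convolution), the flattening step before the dense layers, and the presence of bias terms are all assumptions that the text does not specify.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one Conv1D layer."""
    return (kernel_size * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return (n_in + 1) * n_out


def trace_architecture(input_length=120, in_channels=1):
    """Return (length after the conv blocks, output size, parameter count)."""
    length, channels, total = input_length, in_channels, 0
    for block in range(3):                 # 3 blocks ...
        for _ in range(3):                 # ... of 3 Conv1D layers each
            total += conv1d_params(channels, 64, 3)
            channels = 64                  # "same" padding assumed: length kept
        if block < 2:
            length //= 2                   # max pooling of size 2 between blocks
    features = length * channels           # flattening assumed before dense layers
    for units in (64, 32, 60):             # two dense layers, then the output layer
        total += dense_params(features, units)
        features = units
    return length, features, total


conv_length, output_size, n_params = trace_architecture()
print(conv_length, output_size, n_params)
```

Under these assumptions the 120-point input is reduced to 30 time steps before the dense layers, and the output layer returns the 60 forecast points; a different padding or pooling placement would change the intermediate sizes and the parameter count.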

Anomaly detection block

The representation learned by the 1D-CNN is then fed to the anomaly detection block. For this application of detection of abnormal cells, we propose two complementary detectors, the second one working on top of the first one.

Window level anomaly detection: A first threshold detector is used to detect the anomalies within a single predicted window. It is based on the value of the prediction metric. The $\mathcal {M}\mathcal {S}\mathcal {E}$ is the mean squared $l_2$ distance between the prediction and the actual value of the future cell dry mass. The larger the $\mathcal {M}\mathcal {S}\mathcal {E}$, the larger the error of the predictor. Selecting the windows with the largest values of the metric therefore amounts to selecting the most anomalous windows.

The threshold $\tau _w$, which determines which prediction windows are considered anomalous, is set so that windows whose metric value lies outside the 95% confidence interval are anomalous. It is computed as in Eq. (6), where $\mu _{\text {training}}$ and $\sigma _{\text {training}}$ are respectively the mean and standard deviation of the $\mathcal {M}\mathcal {S}\mathcal {E}$ over the training set.

$$\begin{aligned} \tau _w = \mu _{\text {training}} + 2\cdot \sigma _{\text {training}} \end{aligned}$$

(6)
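
A minimal sketch of this window-level detector is given below. The function names and the toy error values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def window_threshold(training_mse):
    """Eq. (6): tau_w = mean + 2 * standard deviation of the training MSEs."""
    mu = statistics.mean(training_mse)
    sigma = statistics.pstdev(training_mse)  # population std: an assumption
    return mu + 2.0 * sigma


def flag_windows(mse_values, tau_w):
    """A window is flagged as anomalous when its MSE exceeds tau_w."""
    return [value > tau_w for value in mse_values]


# Toy per-window errors measured on a training set and on new windows.
toy_training_mse = [0.1, 0.2, 0.3, 0.2]
toy_tau = window_threshold(toy_training_mse)
toy_flags = flag_windows([0.25, 0.5], toy_tau)
print(round(toy_tau, 4), toy_flags)
```

Because the threshold is derived from the training errors only, no anomaly label is needed to set it.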

Cellular level anomaly detection: The detection of abnormal cells consists in aggregating multiple window-wise anomalies. For instance, suppose that the dry masses of a mother cell and its daughter cells are observed for 50 h (300 points). The threshold detector sees this full-length time series independently through 121 overlapping 30-h windows (180 points each, shifted by one point, so that 300 − 180 + 1 = 121). In order to aggregate all the window-level detections at the cellular level, we build a second detector on top of the first one.

The anomaly score $\mathbb {A}$, defined in Eq. (7), combines the detection results of all the windows extracted from the dry mass time series of a cell into a single score. This score is the ratio of abnormal windows to the total number of windows of this full-length time series. Cells with a higher score are expected to be more abnormal than those with a lower score.

$$\begin{aligned} \mathbb {A} = \frac{\# \text { of abnormal windows}}{\text {total } \#\text { of windows}} \end{aligned}$$

(7)
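
The aggregation of Eq. (7) and the window count of the 50-h example can be checked with the short sketch below. The function names and the toy flags are assumptions for illustration.

```python
def n_windows(series_len, window_len=180, stride=1):
    """Number of overlapping windows extracted from one time series."""
    if series_len < window_len:
        return 0
    return (series_len - window_len) // stride + 1


def anomaly_score(window_flags):
    """Eq. (7): fraction of windows flagged as abnormal for one cell."""
    if not window_flags:
        raise ValueError("at least one window is required")
    return sum(1 for flag in window_flags if flag) / len(window_flags)


# A 50-h track sampled every 10 min contains 300 points.
toy_count = n_windows(300)
# Toy detection result: 30 of these windows were flagged by the threshold detector.
toy_score = anomaly_score([True] * 30 + [False] * (toy_count - 30))
print(toy_count, round(toy_score, 3))
```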

Figure 2a shows the window-wise anomaly detection based on the representation extracted with a 1D-CNN and Fig. 2b shows the StArDusTS model for a whole time series, including the 1D-CNN representation learning block, the window-wise anomaly detection with the threshold detector and the aggregation of window-wise results with the anomaly score.

Experimental plan and constructed datasets

Cell culture acquisition

We designed a set of three live cell time-lapses to validate the models. To obtain a first assessment, a set of acquisitions a was performed with HeLa cells cultured and imaged in the laboratory. To test generalization, the model trained on acquisitions a was applied to another set of acquisitions b. The latter was obtained with HeLa cells cultured and imaged in a second laboratory. This is a strong generalization test since there are differences between the HeLa cell lines but also between the cell culture protocols used in the two different laboratories.

Finally, with a last set of acquisitions c, we could train a deep learning model with a wild type murine fibroblast cell line and test it on abnormal mutated fibroblasts d. Acquisitions c and d were conducted in the same laboratory.

HeLa cell culture (dataset a): cells came from the ATCC catalog. HeLa cells were cultured in high glucose Dulbecco’s Modified Eagle Medium (DMEM) supplemented with GlutaMAX, pyruvate, and 10% (v/v) calf serum (Gibco). Cells were grown onto 35 mm glass bottom (0.17 mm) dishes and imaged every 10 min on a Cytonote 1W (Iprasense) for 24–48 h at 37 $^{\circ }\hbox {C}$ and 5% $\text {CO}_2$.
HeLa cell culture (dataset b): cells were donated by Dr Dimitrios Skoufias of the Institut de biologie structurale in Grenoble. These cells were grown in DMEM supplemented with GlutaMAX, 10% (v/v) heat-inactivated fetal calf serum (FCS), and 1% penicillin and streptomycin. For imaging, 6-well glass bottom culture plates were coated with fibronectin (25 $\upmu \hbox {g}/\hbox {mL}$) for 1 h. Cells were seeded at a concentration of $2 \cdot 10^4$ cells per well and imaged on a Cytonote 6W (Iprasense) every 10 min at 37 $^{\circ }\hbox {C}$ and 5% $\text {CO}_2$.
For acquisitions c and d, wild type mouse fibroblasts were isolated from C57BL/6 mice (acquisition c) while Per0 fibroblasts were isolated from $\textit{Period1}$ ($\textit{mPer1}^{ldc-/-}$); $\textit{Period2}$ ($\textit{Per2}^{ldc-/-}$); $\textit{Period3}$ ($\textit{mPer3}^{-/-}$) triple knock-out mice (acquisition d; PMID: 35606517)⁴⁶. All cell lines were cultured in standard DMEM (high glucose) supplemented with 10% FCS (Thermo Fisher), penicillin (25 units/mL, Thermo Fisher) and streptomycin (25 units/mL, Thermo Fisher). Cells were passaged following trypsinization at low density ($2-5 \cdot 10^4$) onto 35 mm glass bottom (0.17 mm) dishes (FD-35, Fluoro-dish WPI) and imaged every 10 min for 24–96 h after attachment on a Cytonote 1W (Iprasense) housed inside a standard cell culture incubator at a controlled temperature and humidity.

Construction of the datasets

The dry mass time series obtained from the different acquisitions mentioned above were post-processed to train and validate the models. The time series are split into training, validation and test datasets containing respectively 80%, 10% and 10% of the data points²⁴. These sub-datasets were independently normalized such that their mean value is 0 and their standard deviation is 1. Mother-daughter tracks shorter than 30 h are discarded. We used a sliding window, 30 h wide, moved along the full-length time series to generate numerous slightly shifted versions of the same time series, each offset by 10 min from the previous one. This augmentation technique⁴⁷ is used to maximize the amount of data fed to the neural networks in order to better learn the phenomenon. Finally, each 30-h window is split into two segments: the first 20 h, which are given as inputs to the neural network, and the last 10 h, which are to be predicted and are therefore unknown to the neural network. The algorithms must predict the value of the dry mass during these next 10 h. A 1-h sliding window is used to smooth the 10-h signals to be predicted. A summary of the respective number of time series in the splits of each dataset is available in Table 1.
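
The post-processing steps above can be sketched as follows. This is only an illustration: the function names are hypothetical, the normalization is shown on a single list of values, and the smoothing is implemented as a centred 6-point moving average truncated at the edges, which is one possible reading of the “1-h sliding window” and not necessarily the authors' choice.

```python
import statistics


def normalize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - mu) / sigma for v in values]


def sliding_windows(series, window_len=180, stride=1):
    """All 30-h windows of a track; tracks shorter than 30 h yield none."""
    last_start = len(series) - window_len
    return [series[i:i + window_len] for i in range(0, last_start + 1, stride)]


def moving_average(values, width=6):
    """Centred moving average (assumed smoothing), same length as the input."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i - half + width]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy 50-h track: 300 points standing in for dry mass measurements.
toy_track = normalize([float(i) for i in range(300)])
toy_windows = sliding_windows(toy_track)
toy_inputs = [w[:120] for w in toy_windows]                    # first 20 h
toy_targets = [moving_average(w[120:]) for w in toy_windows]   # smoothed last 10 h
print(len(toy_windows), len(toy_inputs[0]), len(toy_targets[0]))
```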

Table 1 Train/Validation/test distribution for all four datasets.

Results and discussion

Experiment 1 : Anomaly detection

The purpose of this first experiment is to assess the anomaly detection performance of the StArDusTS model. Therefore, the algorithm is trained, validated and tested on the same dataset. The aim is to check whether the StArDusTS model is capable of learning a representation and detecting anomalies within the same cell culture. This experiment is run on both datasets $\mathscr {A}$ and $\mathscr {B}$. In order to assess the performance of the detection, some of the videos were manually analyzed to detect abnormal cells. These labels allowed the assessment of the precision (i.e. the ratio of true anomalies to the total number of detections) of the model.

After training, the StArDusTS models flagged respectively 104 and 198 abnormal cells from datasets $\mathscr {A}$ and $\mathscr {B}$. These anomalies are manually annotated using the original videos from which the dry mass time series were extracted. All cells detected in dataset $\mathscr {A}$ are labeled. On dataset $\mathscr {B}$, 104 cells are randomly drawn from the 198 detected as abnormal by the model. Only those cells are annotated, so that the results on both datasets can be compared more easily. We identified three main causes for the model to raise an anomaly:

(i) The cell flagged as abnormal by StArDusTS has an abnormal behavior. Such behaviors are discussed in detail below and are referred to as cellular anomalies or biological anomalies.
(ii) The acquisition system, including the lensfree microscope, the reconstruction algorithm, the segmentation algorithm and the tracking algorithm, resulted in an error in the input fed into StArDusTS. Such anomalies are referred to as acquisition anomalies.
(iii) The cell seems to have a normal behavior, from the video and available time series perspectives. Such anomalies are False Positives (FP) in the detection.

Two more classes emerged from the manual annotation of anomalies. On the one hand, some abnormal cell behaviors misled the acquisition system, therefore leading to erroneous time series. Such detected anomalies are both acquisition AND biologic anomalies. On the other hand, some anomalies are impossible for us to classify. As it is impossible for us to say whether they are acquisition anomalies or biologic ones, they are called acquisition OR biologic anomalies.

Table 2 gives the details of the manual annotation of this experiment on both datasets $\mathscr {A}$ and $\mathscr {B}$. The precision is the ratio of the number of good detections to the total number of detections. The StArDusTS algorithm was able to detect anomalies with a precision of up to 96.2% on cells from Grenoble CEA and 83.5% on cells from Curie’s Institute.

Table 2 Distribution of annotated anomalies for experiment 1 on datasets $\mathscr {A}$ and $\mathscr {B}$. Underlined anomalies are biologic ones which are detailed in Table 3.

Table 3 Manual annotation of biologic anomalies for experiment 1 on datasets $\mathscr {A}$ and $\mathscr {B}$.

Most biological anomalies differ from one another. However, we propose here to group them into five major classes, identified during the manual labeling of abnormal cells, plus a sixth class gathering all other cellular anomalies:

(a) Cells with an abnormal division, including cells that do not divide and stagnate on a plateau of dry mass, cells that divide asymmetrically or cells that divide in more than two daughter cells,
(b) Cells with an abnormal growth, which may include growth that is too long, too short, non-linear or with unexplained dry mass loss,
(c) Cells that merge with each other,
(d) Cells that are too big,
(e) Dead cells,
(f) All other cellular anomalies.

Table 3 shows the class distribution of these biological anomalies, both on cells from Grenoble CEA and Curie’s Institute. Figures 3a–f show the original acquisition time-lapses (i.e. series of snapshots) of one example for each cellular anomaly class. For each time-lapse, we display the dry mass time series from which the anomalies were detected. Each instant corresponding to the images shown above is framed in red. Figure 3g focuses on an acquisition anomaly. More examples of all anomalies are available in the supplementary materials, figure B. The videos from which these snapshots were extracted are also available as supplementary material.

Experiment 2 : Generalization

We showed in the previous experiment that the StArDusTS model is able to detect anomalies in the dataset on which it has been trained. To test its generalization capabilities, the model trained on $\mathscr {A}$ has been applied to a different set of HeLa cell acquisitions $\mathscr {B}$. HeLa cells were considered a good model for detecting anomalies, as they often present one extra version of most chromosomes, with up to five copies detected in a single cell⁴⁸. Moreover, cells in a and in b were cultured and imaged under different protocols and in different laboratories, therefore increasing the differences between the phenotypes.

When conducting this experiment, the StArDusTS model identified 939 abnormal cells from the 7283 cells of dataset $\mathscr {B}$. Among those cells, 152 were already detected as abnormal in the previous experiment. This shows a strong consistency of the models in the detection of abnormal cells, since 152 of the 198 cells detected in the first experiment (76.8%) were also detected as abnormal in this experiment. Tables 4 and 5 respectively show the distribution of the 104 manually annotated cells and the manual classification of biologic anomalies.

When the model was trained and tested on the same cells from dataset ($\mathscr {A}$), it detected 49% of acquisition anomalies and 36.5% of biological anomalies. During the generalization tests, StArDusTS detected 42.3% of acquisition anomalies and 43.3% of biological anomalies, proving the good generalization of the representation.

Table 4 Distribution of annotated anomalies for experiment 2.

Table 5 Manual classification of the biologic anomalies of experiment 2.

The StArDusTS model obtained a better performance on the anomaly detection for dataset $\mathscr {B}$ when trained on dataset $\mathscr {A}$ (precision of 96.2%) than when it was trained on dataset $\mathscr {B}$ itself (83.5% precision). A plausible explanation is the difference in image quality between the 2 datasets. This experiment also shows that the model trained on dataset $\mathscr {A}$ can be transferred to other datasets. Indeed, it kept the exact same precision of 96.2% regardless of whether it was tested on dataset $\mathscr {A}$ or $\mathscr {B}$.

Two conclusions can be drawn from this experiment. First, we showed that the StArDusTS model is general enough to detect anomalies in datasets it has not been trained on. Second, the performance of the model, and especially its precision, is highly impacted by the quality of the dataset used for training.

Experiment 3: Controlled experiment

The previous experiments have shown the capability of StArDusTS for anomaly detection. However, datasets $\mathscr {A}$ and $\mathscr {B}$ could not be used for the recall evaluation. For this purpose, wild type and genetically modified fibroblasts were analysed to quantify the number of undetected anomalies and completely characterize the StArDusTS model.

To address this problem, we propose to artificially create a “labeled” dataset. The model is trained on dataset $\mathscr {C}$ and tested on both datasets $\mathscr {C}$ and $\mathscr {D}$. Cells from $\mathscr {D}$ are expected to be detected as abnormal ones since they are genetically modified. Thus, we can label cells from $\mathscr {C}$ as “normal” and cells from $\mathscr {D}$ as “abnormal” such that both precision and recall of the StArDusTS model can be measured.
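
The labeling convention described above translates into the following sketch, in which windows from $\mathscr {C}$ carry the label “normal” and windows from $\mathscr {D}$ the label “abnormal”. The function name and the toy flags are hypothetical and do not reproduce the reported results.

```python
def precision_recall(flags, labels):
    """Precision and recall of boolean detections against boolean labels."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    fn = sum(1 for f, l in zip(flags, labels) if not f and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: 5 windows from C (labeled normal) and 5 from D (labeled abnormal).
toy_labels = [False] * 5 + [True] * 5
toy_detections = [False, False, False, False, True,   # one false positive in C
                  True, True, True, False, False]     # three of five found in D
toy_precision, toy_recall = precision_recall(toy_detections, toy_labels)
print(toy_precision, toy_recall)
```

Recall can only be computed here because every window of $\mathscr {D}$ is assumed to be abnormal, which is the strong hypothesis of this experiment.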

This experiment yields a recall of 0.68 at the threshold $\tau _w$.

Figure 4 shows the ROC (receiver operating characteristic) curve for the detection of abnormal windows on datasets $\mathscr {C}_{\text {test}}$ and $\mathscr {D}$. The orange reference is the ROC curve of a random classifier. Each point corresponds to a threshold value of $\tau _w$. The red cross is the $\tau _w$ value computed from the training dataset $\mathscr {C}$ such that the time series outside the 95% confidence interval are abnormal. It is however important to underline that the absolute values of precision and recall for the detections cannot be taken at face value, due to the strong hypothesis used for this experiment. Indeed, the assumption that all cells from dataset $\mathscr {C}$ are normal and all cells from $\mathscr {D}$ are abnormal might be biologically unrealistic.

The ROC curve shows that choosing the $\tau _w$ value from the 95% confidence interval of the training dataset gives a good balance between the true and false positive rates. This is one of the best choices of threshold that could have been made, since it is one of the points closest to the top left-hand corner of the graph.
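
The threshold sweep behind such a ROC curve, and the “closest to the top left-hand corner” criterion, can be sketched as follows. The function names, scores, labels and candidate thresholds are toy assumptions; they are unrelated to the values of Fig. 4.

```python
import math


def roc_points(scores, labels, thresholds):
    """One (threshold, false positive rate, true positive rate) per threshold."""
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    points = []
    for tau in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > tau and l)
        fp = sum(1 for s, l in zip(scores, labels) if s > tau and not l)
        points.append((tau, fp / n_neg, tp / n_pos))
    return points


def closest_to_top_left(points):
    """Point minimizing the Euclidean distance to the corner (FPR=0, TPR=1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))


# Toy per-window MSE values: four "normal" windows, then four "abnormal" ones.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8]
toy_truth = [False] * 4 + [True] * 4
toy_points = roc_points(toy_scores, toy_truth, [0.0, 0.25, 0.45, 0.9])
toy_best = closest_to_top_left(toy_points)
print(toy_best)
```

Comparing the threshold selected this way with the one obtained from Eq. (6) is one way to check, after the fact, that a label-free threshold is a reasonable operating point.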

Conclusion

In this paper, we propose a model for Self-supervised Anomaly Detection on Time Series called StArDusTS, which we applied to the detection of abnormal cells from their dry mass time series.

StArDusTS relies on the learning of a representation of normal cells with a 1D convolutional neural network trained to predict the future cell dry mass. Thanks to self-supervised learning, the detection is carried out without any human-induced bias during training.

In a first experiment, we validate the anomaly detection abilities of the StArDusTS model by successfully detecting abnormal time series on two datasets with a precision up to 96.2%. We were able to manually identify 2 causes of anomalies, either being cellular anomalies or acquisition anomalies. Biological anomalies were then classified into 6 sub-classes. The acquisition anomalies that we report can be used to compare and improve acquisition pipelines if needed. A second experiment validated that the representation learned from one dataset is general enough to be able to detect anomalies from cells grown in another lab. Moreover, it shows that the results are even better than those obtained with a model trained on the same data. Finally, a third experiment with dummy labels of known anomalies was set up to validate the choice of the anomaly detector.

While the representation is learned only from the dry mass time series, the StArDusTS model could be extended to predict multiple features such as cell area, thickness or speed. Adding such modalities could bring more information about the cell population being analysed and thus allow the prediction of pathological changes.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.

References

Yeo, G. H. T., Saksena, S. D. & Gifford, D. K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 12, 3222. https://doi.org/10.1038/s41467-021-23518-w (2021).
Soelistyo, C. J., Vallardi, G., Charras, G. & Lowe, A. R. Learning biophysical determinants of cell fate with deep neural networks. Nature Mach. Intell. 4, 636–644. https://doi.org/10.1038/s42256-022-00503-6 (2022).
Soelistyo, C. J., Vallardi, G., Charras, G. & Lowe, A. R. Learning the Rules of Cell Competition Without Prior Scientific Knowledge. https://doi.org/10.1101/2021.11.24.469554 (2021).
Allier, C. et al. CNN-based cell analysis: From image to quantitative representation. Front. Phys. 9 (2022).
Tzur, A., Kafri, R., LeBleu, V. S., Lahav, G. & Kirschner, M. W. Cell growth and size homeostasis in proliferating animal cells. Science 325, 167–171. https://doi.org/10.1126/science.1174294 (2009).
Liu, X., Yan, J. & Kirschner, M. W. Beyond G1/S regulation: How cell size homeostasis is tightly controlled throughout the cell cycle? https://doi.org/10.1101/2022.02.03.478996 (2022).
Ghenim, L. et al. A new ultradian rhythm in mammalian cell dry mass observed by holography. Sci. Rep. 11, 1290. https://doi.org/10.1038/s41598-020-79661-9 (2021).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60. https://doi.org/10.1038/s41586-023-06221-2 (2023).
Naul, B., Bloom, J. S., Pérez, F. & van der Walt, S. A recurrent neural network for classification of unevenly sampled variable stars. Nature Astronomy 2, 151–155. https://doi.org/10.1038/s41550-017-0321-z (2018).
Rafique, M. et al. Delegated regressor, a robust approach for automated anomaly detection in the soil radon time series data. Sci. Rep. 10, 3004. https://doi.org/10.1038/s41598-020-59881-9 (2020).
Chamberland, M. et al. Detecting microstructural deviations in individuals with deep diffusion MRI tractometry. Nature Comput. Sci. 1, 598–606. https://doi.org/10.1038/s43588-021-00126-8 (2021).
Pastore, V. P., Zimmerman, T. G., Biswas, S. K. & Bianco, S. Annotation-free learning of plankton for classification and anomaly detection. Sci. Rep. 10, 12142. https://doi.org/10.1038/s41598-020-68662-3 (2020).
Kasieczka, G. et al. The LHC olympics 2020 a community challenge for anomaly detection in high energy physics. Rep. Prog. Phys. 84, 124201. https://doi.org/10.1088/1361-6633/ac36b9 (2021).
Govorkova, E. et al. Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 MHz at the Large Hadron Collider. Nature Mach. Intell. 4, 154–161. https://doi.org/10.1038/s42256-022-00441-3 (2022).
Outlier Detection for Temporal Data: A Survey IEEE Transactions on Knowledge and Data Engineering 26(9), 2250–2267. https://doi.org/10.1109/TKDE.2013.184 (2014).
Ozcan, A. & Demirci, U. Ultra wide-field lens-free monitoring of cells on-chip. Lab on a Chip 8(1), 98–106. https://doi.org/10.1039/b713695a (2008).
Allier, C. et al. Lens-free video microscopy for the dynamic and quantitative analysis of adherent cell culture. J. Visualized Exp. JoVE 56580. https://doi.org/10.3791/56580 (2018).
Allier, C. et al. Imaging of dense cell cultures by multiwavelength lens-free video microscopy. Cytometry A 91, 433–442. https://doi.org/10.1002/cyto.a.23079 (2017).
Allier, C. et al. Quantitative phase imaging of adherent mammalian cells: A comparative study. Biomed. Opt. Express 10, 2768–2783. https://doi.org/10.1364/BOE.10.002768 (2019).
Barer, R. Interference microscopy and mass determination. Nature 169, 366–367. https://doi.org/10.1038/169366b0 (1952).
Hervé, L. et al. Alternation of inverse problem approach and deep learning for lens-free microscopy image reconstruction. Sci. Rep. 10, 20207. https://doi.org/10.1038/s41598-020-76411-9 (2020).
Tinevez, J.-Y. et al. TrackMate: An open and extensible platform for single-particle tracking. Methods (San Diego, CA) 115, 80–90. https://doi.org/10.1016/j.ymeth.2016.09.016 (2017).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, Cambridge, 2016).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer Series in Statistics (Springer, New York, NY, 2009).
Kolesnikov, A., Zhai, X. & Beyer, L. Revisiting Self-Supervised Visual Representation Learning. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1920–1929, https://doi.org/10.1109/CVPR.2019.00202 (IEEE, Long Beach, CA, USA, 2019).
Chen, T., Kornblith, S., Swersky, K., Norouzi, M. & Hinton, G. E. Big self-supervised models are strong semi-supervised learners. Adv. Neural. Inf. Process. Syst. 33, 22243–22255 (2020).
Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, 1096–1103, https://doi.org/10.1145/1390156.1390294 (Association for Computing Machinery, New York, NY, USA, 2008).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P.-A. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
Gidaris, S., Singh, P. & Komodakis, N. Unsupervised representation learning by predicting image rotations. In ICLR 2018 (Vancouver, Canada, 2018).
Larsson, G., Maire, M. & Shakhnarovich, G. Learning Representations for Automatic Colorization. arXiv:1603.06668 [cs] (2017).
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. & Efros, A. A. Context Encoders: Feature Learning by Inpainting. arXiv:1604.07379 [cs] (2016).
Jenni, S. & Favaro, P. Self-supervised feature learning by learning to spot artifacts. arXiv:1806.05024 [cs] (2018).
McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation Vol. 24 (ed. Bower, G. H.) 109–165 (Academic Press, 1989). https://doi.org/10.1016/S0079-7421(08)60536-8.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. In Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q. (eds.) Advances in Neural Information Processing Systems 25, 1097–1105 (Curran Associates, Inc., 2012).
Kiranyaz, S., Ince, T. & Gabbouj, M. Personalized monitoring and advance warning system for cardiac arrhythmias. Sci. Rep. 7, 1–8. https://doi.org/10.1038/s41598-017-09544-z (2017).
Li, D., Zhang, J., Zhang, Q. & Wei, X. Classification of ECG signals based on 1D convolution neural network. In 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom), 1–6, https://doi.org/10.1109/HealthCom.2017.8210784 (2017).
Abdeljaber, O. et al. 1-D CNNs for structural damage detection: Verification on a structural health monitoring benchmark data. Neurocomputing 275, 1308–1317. https://doi.org/10.1016/j.neucom.2017.09.069 (2018).
Avci, O., Abdeljaber, O., Kiranyaz, S. & Inman, D. Structural Damage Detection in Real Time: Implementation of 1D Convolutional Neural Networks for SHM Applications. In Niezrecki, C. (ed.) Structural Health Monitoring and Damage Detection, Volume 7, Conference Proceedings of the Society for Experimental Mechanics Series, 49–54, https://doi.org/10.1007/978-3-319-54109-9_6 (Springer International Publishing, Cham, 2017).
Eren, L., Ince, T. & Kiranyaz, S. A Generic intelligent bearing fault diagnosis system using compact adaptive 1D CNN classifier. J. Signal Process. Syst. 91, 179–189. https://doi.org/10.1007/s11265-018-1378-3 (2019).
Ince, T., Kiranyaz, S., Eren, L., Askar, M. & Gabbouj, M. Real-time motor fault detection by 1-D convolutional neural networks. IEEE Trans. Ind. Electron. 63, 7067–7075. https://doi.org/10.1109/TIE.2016.2582729 (2016).
Khan, A., Ko, D.-K., Lim, S. C. & Kim, H. S. Structural vibration-based classification and prediction of delamination in smart composite laminates using deep learning neural network. Compos. B Eng. 161, 586–594. https://doi.org/10.1016/j.compositesb.2018.12.118 (2019).
Zhang, W., Li, C., Peng, G., Chen, Y. & Zhang, Z. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mech. Syst. Signal Process. 100, 439–453. https://doi.org/10.1016/j.ymssp.2017.06.022 (2018).
van den Oord, A. et al. WaveNet: A generative model for raw audio. arXiv:1609.03499 [cs] (2016).
van den Oord, A. et al. Conditional image generation with PixelCNN decoders. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, 4790–4798 (Curran Associates, Inc., 2016).
Bailly, R., Malfante, M., Allier, C., Ghenim, L. & Mars, J. Deep anomaly detection using self-supervised learning: Application to time series of cellular data. In ASPAI 2021 - 3rd International Conference on Advances in Signal Processing and Artificial Intelligence (2021).
Tartour, K. et al. Mammalian PERIOD2 regulates H2A.Z incorporation in chromatin to orchestrate circadian negative feedback. Nature Struct. Mol. Biol. 29, 549–562. https://doi.org/10.1038/s41594-022-00777-9 (2022).
Cui, Z., Chen, W. & Chen, Y. Multi-Scale Convolutional Neural Networks for Time Series Classification. arXiv:1603.06995 [cs] (2016).
Landry, J. J. M. et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 Genes|Genomes|Genetics 3, 1213–1224. https://doi.org/10.1534/g3.113.005777 (2013).

Acknowledgements

This work was supported by the French ANR via Carnot funding. This work has partially received funding from the European Union’s Horizon 2020 research program under grant agreement no. 101016726.

Author information

Authors and Affiliations

Univ. Grenoble Alpes, CEA, List, F-38000, Grenoble, France
Romain Bailly & Marielle Malfante
Univ. Grenoble Alpes, CEA, Leti, F-38000, Grenoble, France
Cédric Allier & Chiara Paviolo
Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
Cédric Allier
Univ. Grenoble Alpes, INSERM, CEA-IRIG, BGE, Biomics, F-38000, Grenoble, France
Lamya Ghenim
Institut de Génomique Fonctionnelle de Lyon, Univ. Lyon, CNRS/ENS, UMR 5242, Lyon, France
Kiran Padmanabhan
Institut Curie, PSL Research University, CNRS, UMR 144, Molecular Mechanisms of Intracellular Transport, F-75005, Paris, France
Sabine Bardin
Univ. Grenoble Alpes, CNRS, Grenoble-INP, GIPSA-Lab, 38000, Grenoble, France
Romain Bailly & Jérôme Mars

Authors

Romain Bailly
Marielle Malfante
Cédric Allier
Chiara Paviolo
Lamya Ghenim
Kiran Padmanabhan
Sabine Bardin
Jérôme Mars

Contributions

RB contributed to the conception and the development of StArDusTS and to the writing of this paper. MM contributed to the conception of StArDusTS and the revision of this paper. CA contributed to the acquisition of the data used in this paper, the manual annotation of the raised anomalies and the writing of this paper. CP contributed to the generation of cellular figures and the writing of this paper. LG, KP and SB contributed to the acquisition of cellular data and the revision of this paper. JM contributed to the conception of StArDusTS and the revision of this paper.

Corresponding author

Correspondence to Marielle Malfante.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bailly, R., Malfante, M., Allier, C. et al. Detecting abnormal cell behaviors from dry mass time series. Sci Rep 14, 7053 (2024). https://doi.org/10.1038/s41598-024-57684-w

Received: 29 September 2023
Accepted: 20 March 2024
Published: 25 March 2024
DOI: https://doi.org/10.1038/s41598-024-57684-w
