Natural language processing of multi-hospital electronic health records for public health surveillance of suicidality

There is an urgent need to monitor the mental health of large populations, especially during crises such as the COVID-19 pandemic, in order to identify the most at-risk subgroups in a timely manner and to design targeted prevention campaigns. We therefore developed and validated surveillance indicators related to suicidality: the monthly number of hospitalisations caused by suicide attempts and the prevalence among them of five known risk factors. They were automatically computed by analysing the electronic health records of fifteen university hospitals of the Paris area, France, using natural language processing algorithms based on artificial intelligence. We evaluated the relevance of these indicators by conducting a retrospective cohort study. Considering 2,911,920 records contained in a common data warehouse, we tested for changes after the pandemic outbreak in the slope of the monthly number of suicide attempts by conducting an interrupted time-series analysis. We segmented the assessment time into two sub-periods: before (August 1, 2017, to February 29, 2020) and during (March 1, 2020, to June 30, 2022) the COVID-19 pandemic. We detected 14,023 hospitalisations caused by suicide attempts. Their monthly number accelerated after the COVID-19 outbreak, with an estimated trend variation reaching 3.7 (95%CI 2.1–5.3), mainly driven by an increase among girls aged 8–17 (trend variation 1.8, 95%CI 1.2–2.5). After the pandemic outbreak, acts of domestic, physical and sexual violence were more often reported (prevalence ratios: 1.3, 95%CI 1.16–1.48; 1.3, 95%CI 1.10–1.64; and 1.7, 95%CI 1.48–1.98), fewer patients died (p = 0.007) and stays were shorter (p < 0.001). Our study demonstrates that textual clinical data collected in multiple hospitals can be jointly analysed to compute timely indicators describing the mental health conditions of populations.
Our findings also highlight the need to better take into account the violence imposed on women, especially at early ages and in the aftermath of the COVID-19 pandemic.


Sensitivity analysis
We assessed the robustness of our conclusions by conducting four sensitivity analyses.

Rule-based algorithm
A first sensitivity analysis consisted in replacing the machine-learning-based entity-classification algorithm with a simpler rule-based algorithm (see the Details on the algorithms section). Although the performances of this algorithm were lower than those of the machine learning algorithm, the results presented in the main article were robust to this modification (Supplementary Table 1).

Claim-based algorithm
In another sensitivity analysis we selected, among the total database, hospitalisations that had at least one claim code related to self-harm (codes X60–X84 of the International Classification of Diseases, 10th revision, as in previous studies,1 extracted from the French PMSI database, Programme de Médicalisation des Systèmes d'Information). These codes are not restricted to suicide attempts, but they are commonly used by clinicians to report them. We underline the limits of this approach, as codes X62 (self-intoxication by narcotics and hallucinogens) and X65 (self-intoxication by alcohol) in particular may not correspond to our definition of SA.
Another limitation was that claim data were only available for stays in medicine, surgery and obstetrics hospital departments. The period of claim data availability in the database equals the period of administrative data availability, and we consequently considered the same hospitals as in the main analysis. Considering all the hospitalisation stays that were labelled as SA-caused using claim data, we obtained the following results. The variation of trend for the overall population appeared smaller than the one obtained with the NLP algorithm, and no significant variation was observed for young women (aged 18–25), but we still observed a positive variation for girls (aged 8–17).
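For illustration, the claim-based selection rule can be sketched in a few lines of pandas. The table layout and column names (`visit_id`, `icd10`) are hypothetical and do not reflect the actual PMSI schema.

```python
import pandas as pd

# Toy claim table; "visit_id" and "icd10" are illustrative column names.
claims = pd.DataFrame({
    "visit_id": [1, 1, 2, 3],
    "icd10":    ["X64", "J45", "I10", "X800"],
})

# ICD-10 self-harm range X60-X84 (allowing for possible code extensions).
self_harm = claims["icd10"].str.fullmatch(r"X(6\d|7\d|8[0-4])\w*")
sa_stays = sorted(claims.loc[self_harm, "visit_id"].unique())
```

A stay is retained as SA-caused as soon as one of its claim codes falls in the X60–X84 range.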

Per-hospital subgroup analysis
Another sensitivity analysis consisted in conducting per-hospital subgroup analyses. We considered successively each of the 15 hospitals (see Supplementary Table 4 for abbreviations) and reproduced the same analysis. We focused on the trend variations after COVID-19 for the overall population and for the 8–17 female population.

Modalities of suicide attempt
Supplementary Figure 3 shows the variation with time of the proportion of each modality of suicide attempt. We observed a stable proportion of each modality with respect to time, and a larger amount of intentional drug intoxications for females compared to males. If several different positive mentions of modalities were detected in the discharge summary of a SA-caused stay, we weighted each modality proportionally to its number of mentions (i.e., if a stay mentioned drug intoxication 3 times and defenestration 2 times, we counted 3/(3+2) drug intoxication and 2/(3+2) defenestration in the aggregate result shown in Supplementary Figure 3). Generic mentions (e.g., "suicide attempt") were not counted if other positive modalities were found. Otherwise, if there were only generic mentions, the stay was labelled as "Unknown & other forms".
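This weighting rule can be sketched as follows; this is a minimal illustration, and the function and label names are ours, not part of the study's pipeline.

```python
from collections import Counter

def weighted_modalities(mentions):
    # mentions: list of modality labels detected in one discharge summary.
    # Generic mentions ("generic") are ignored when specific ones exist;
    # otherwise the stay counts once as "Unknown & other forms".
    specific = [m for m in mentions if m != "generic"]
    if not specific:
        return {"Unknown & other forms": 1.0}
    counts = Counter(specific)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}

# Example from the text: 3 drug-intoxication and 2 defenestration mentions.
weights = weighted_modalities(["drug intoxication"] * 3 + ["defenestration"] * 2)
```

Each stay therefore contributes a total weight of 1 to the aggregate, however many modality mentions it contains.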

Details on cohort creation
The source population was restricted to 15 of the 38 AP-HP hospitals in order to limit data completeness issues. Indeed, only the main and most recent software applications of the clinical information system have had their data integrated into the database. In particular, the deployment of the main electronic health record (EHR) software, ORBIS Dedalus Healthcare, is an ongoing process that started in 2012 and is not yet complete. Depending on the state of the deployment at a given date in a given medical unit, data may or may not be available in the database for secondary use. In this study we therefore considered only the 15 of the 38 AP-HP hospitals for which the deployment of the EHR was considered advanced at the beginning of the study (Supplementary Table 4). Supplementary Figure 16 shows, for each of the 15 included hospitals, the proportion of all hospital stays (i.e., not specific to SA) that have at least one discharge summary available in the research database. We observed a slow variation of data completeness with time that depended on the hospital. A dedicated sensitivity analysis was therefore conducted in order to test the robustness of our study's results with respect to this issue (see Supplementary Figure 15 and Supplementary Table 2).

Supplementary Table 4. Trigram abbreviations of the included hospitals (columns: Trigram, Hospital).

Document-classification algorithm
The document-classification algorithm consisted in first detecting all entities (i.e., terms) relative to SA in the discharge summary. The text around these entities was pre-processed (tokeniser, sentenciser). These entities were then passed to an entity-classification algorithm that used the context of the entity to determine whether it corresponded to the mention of a SA-caused hospitalisation or to something else. Indeed, a purely dictionary-based approach would have led to many false positive detections, as it would not consider the context of SA mentions. In particular, mentions may be negated, formulated as a hypothesis, not relative to the patient, or expressed as reported speech. The document-classification algorithm classified the document as a true SA-caused hospitalisation if at least one entity was validated by the entity-classification algorithm.
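The overall two-step logic (entity detection, then contextual validation, with the document positive as soon as one entity is validated) can be sketched as below. The pattern, window sizes and the toy validator are illustrative assumptions, not the study's actual rules.

```python
import re

# Minimal dictionary pattern standing in for the full regular expressions.
SA_PATTERN = re.compile(r"tentatives? de suicide|\bts\b", re.IGNORECASE)

def is_sa_document(text, validate_entity):
    # validate_entity(entity, left, right) returns True when the mention,
    # in context, denotes a SA-caused hospitalisation (stub passed by caller).
    for match in SA_PATTERN.finditer(text):
        left = text[max(0, match.start() - 200):match.start()]
        right = text[match.end():match.end() + 60]
        if validate_entity(match.group(), left, right):
            return True   # one validated entity suffices
    return False

def toy_validator(entity, left, right):
    # Crude negation check, for demonstration only.
    return "pas de" not in left.lower()
```

In the actual pipeline the validator is either the machine learning classifier (case 1) or the rule-based modifier detection (case 2) described below.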

Entity-classification algorithm: case 1 - machine learning
The entity-classification algorithm classified each retrieved entity as a true or a false detection. To this end, a machine learning model was used that consisted of a single RoBERTa binary mono-label classification head on top of the eds-CamemBERT model, itself fed by the output of a CamemBERT tokeniser.4 eds-CamemBERT is a word embedding model that had been previously fine-tuned on 21 million French clinical documents of the clinical data warehouse.5 The entity to be classified was fed to the model along with its context (a window of 35 words before the first word of the entity and 10 words after the last word of the entity).
The eds-CamemBERT model provided an embedding vector of dimension 512 for each token.
The embedding of the first token of each entity was fed into a classification layer. Our method differed from the original RoBERTa method in that we classified the token of interest (i.e., not the <s> token as in the original implementation for sentence-level classification tasks). Using a machine learning approach for entity classification in addition to the rule-based approach for entity recognition (regular expressions) led to an overall hybrid approach for text processing.
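A minimal PyTorch sketch of such a head is given below: a dropout layer and a linear layer applied to the embedding of the entity's first token. The class name is ours, and the random tensor stands in for eds-CamemBERT output; this is not the authors' code.

```python
import torch
import torch.nn as nn

class EntityClassifier(nn.Module):
    # Binary mono-label head applied to the embedding of the first token
    # of each detected SA entity, as described in the text.
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.dropout = nn.Dropout(0.1)               # dropout rate from the text
        self.classifier = nn.Linear(hidden_size, 2)  # 2 classes: true / false detection

    def forward(self, token_embeddings, entity_start):
        # token_embeddings: (batch, seq_len, hidden); entity_start: (batch,)
        batch_idx = torch.arange(token_embeddings.size(0))
        first_token = token_embeddings[batch_idx, entity_start]
        return self.classifier(self.dropout(first_token))  # (batch, 2) logits

# Toy usage with random embeddings standing in for the encoder output:
emb = torch.randn(4, 48, 512)
starts = torch.tensor([5, 12, 7, 30])
logits = EntityClassifier()(emb, starts)
```

Classifying the entity's own token, rather than a sentence-level `<s>` token, is what distinguishes this head from the original RoBERTa sequence-classification setup.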

Entity-classification algorithm: case 2 - rule-based
As part of the sensitivity analysis, an alternative, purely rule-based approach was also implemented for entity classification. It relied on the open-source EDS-NLP library dedicated to the development of rule-based NLP algorithms for the analysis of French clinical documents.6 This library includes dedicated detection pipelines for term modifiers (family, patient's history, reported speech, negation, hypothesis) and for dates. In our case, a detected date was linked to a detected entity if they were in the same sentence and if the date did not correspond to the patient's birth date. If the mentioned date was at least 15 days before the start of the stay, the entity was classified as being part of the patient's history.
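The date-based history rule can be written explicitly; the function below is a minimal sketch of that rule only (the function name is ours, and linking a date to an entity within the same sentence is assumed to have been done upstream).

```python
from datetime import date

def is_history_mention(mention_date, birth_date, admission_date):
    # A date linked to the entity is ignored if it is the patient's birth
    # date; otherwise, a date at least 15 days before the start of the
    # stay marks the entity as part of the patient's history.
    if mention_date is None or mention_date == birth_date:
        return False
    return (admission_date - mention_date).days >= 15
```

For example, a SA dated January 1st, 2022 for a stay starting January 24th, 2022 (23 days earlier) is classified as history, matching the guideline example given below.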

Risk factors detection
Each stay was classified as mentioning or not the five risk factors (RF) considered in this study.
- Social isolation, domestic violence, sexual violence and physical violence: the stay-classification algorithm for these RF followed the architecture of the rule-based stay-classification algorithm for SA, but used another dictionary (Supplementary Table 6) and considered only the negation and hypothesis modifiers to discard false positive detections.
- Suicide attempt history: a stay was classified as positive if at least one SA entity was detected in its discharge summary that was neither negated, relative to another experiencer, nor expressed in a hypothetical sentence, and that was qualified as being part of the patient's history. We therefore used the same text-processing architecture as the rule-based algorithm for the detection of SA-caused stays (see Supplementary Figure 3).

Dictionary development
The dictionaries (Supplementary Tables 5-6) and the annotation guideline were initialised by asking a college of junior and senior psychiatrists, coming from both pediatric and adult psychiatric units, about their a priori knowledge of the synonyms used to mention suicide attempts in clinical documents. This initial lexicon was expanded into families of keywords by data scientists to cover usual abbreviations and some typographic errors. It was then translated to the syntax of the preselection query engine. For example, "tentative de suicide" was expanded into {"tentative de suicide", "tentatives de suicide"} and then translated to {(tentative & de & suicide), (tentatives & de & suicide)}. The query engine was not case-sensitive, therefore all terms were expressed in lowercase. We also considered possible spelling errors regarding accent marks, points and other syntactic markers. For instance, for the lexical variant "tentative d'auto-strangulation" we also considered "tentative d auto strangulation".
The textual criteria of Supplementary Table 5 were finally concatenated with the OR (|) logical operator to form the final query. The objective was to increase the sensitivity of the first selection step (i.e., screening, see Supplementary Figure 16).
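The expansion and query-building steps can be sketched as follows. This is a simplified illustration under our own assumptions (a naive plural rule on the head word and a generic accent/punctuation-stripping pass), not the actual expansion rules used in the study.

```python
import re
import unicodedata

def expand_term(term):
    # Lowercase, add a naive plural variant of the head word, and add an
    # accent- and punctuation-stripped variant, e.g.
    # "tentative d'auto-strangulation" -> "tentative d auto strangulation".
    term = term.lower()
    variants = {term}
    head, *rest = term.split(" ", 1)
    variants.add(" ".join([head + "s"] + rest))
    for v in list(variants):
        stripped = unicodedata.normalize("NFD", v)
        stripped = "".join(c for c in stripped if not unicodedata.combining(c))
        variants.add(re.sub(r"[-']", " ", stripped))
    return variants

def build_query(terms):
    # Each variant becomes a conjunction of its words; variants are then
    # joined with the OR operator, mirroring the preselection query syntax.
    clauses = {"(" + " & ".join(v.split()) + ")" for t in terms for v in expand_term(t)}
    return " | ".join(sorted(clauses))

variants = expand_term("tentative d'auto-strangulation")
query = build_query(["tentative de suicide"])
```

The resulting query combines all variants of all terms with OR, maximising the sensitivity of the screening step.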
We improved the dictionary using the results of the first annotation campaign on documents that had been pre-selected using the initial dictionary. In particular, we developed regular expressions in addition to the simple keywords used in the preselection query engine. These regular expressions were applied in the stay-classification algorithm and could discard some frequently encountered false positive cases (e.g., "ts en regression", where "ts" stands for "tissus sains", healthy tissue). The first annotation campaign, dedicated to the algorithm training, consisted, for each eligible document of the training set (i.e., any discharge summary containing at least one term of the dictionary), in annotating all the detected entities of SA or RF and in collecting keywords that were not already in the dictionary. All the documents were automatically pre-annotated with the rule-based algorithm in order to facilitate the annotators' task.
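A disambiguation rule of this kind can be illustrated with a single context regular expression; the rule below covers only the "ts en regression" example from the text and is not the full production rule set.

```python
import re

# "ts" may abbreviate "tentative de suicide" but also "tissus sains"
# (healthy tissue); this context pattern discards the known false
# positive "ts en regression" / "ts en régression".
FALSE_POSITIVE = re.compile(r"\bts\s+en\s+r[eé]gression\b", re.IGNORECASE)

def keep_ts_mention(sentence):
    # Keep a "ts" mention only if it is not part of a known false positive.
    has_ts = re.search(r"\bts\b", sentence, re.IGNORECASE) is not None
    return has_ts and FALSE_POSITIVE.search(sentence) is None
```

In practice such patterns were accumulated as annotators reported recurring false positives.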

Machine learning algorithm
The annotated dataset of 1571 SA entities was randomly split into an ML-training and an ML-development dataset (containing 1216 and 355 entities, respectively). The SA entity to be classified was fed into the model along with a context window (containing 35 words before the entity and 10 words after it). The objective of the model was to label the first token of each SA entity with a binary value: 1 if the entity corresponded to a SA-caused hospitalisation, 0 otherwise. We optimised the cross-entropy loss function. The dropout rate was set to 0.1 during the training of the machine learning algorithm. A simple hyperparameter search was done on the development set for the learning rate and the batch size. The search space for the learning rate was {1e-5, 2e-5, 3e-5, 5e-5} and for the batch size {16, 32}. We used a grid search method and kept the combination with the best F1-score: the learning rate was set to 3e-5 and the batch size to 16. We used the Adam optimiser with the learning rate set to 3e-5, the weight decay to 0.1, β1 to 0.9, β2 to 0.98 and ϵ to 1e-6. We trained the ML model for 10 epochs with a warmup of the learning rate during the first 2 epochs, using the LambdaLR scheduler. The best epoch checkpoint regarding the F1-score on the development set was kept.
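The optimisation settings above can be assembled as follows. This is a sketch under stated assumptions: the model is a stand-in linear layer, and the warmup is implemented as a per-epoch linear LambdaLR schedule, which is one plausible reading of "warmup during 2 epochs".

```python
import torch

# Stand-in model; the actual head is the entity classifier described above.
model = torch.nn.Linear(512, 2)
optimiser = torch.optim.Adam(model.parameters(), lr=3e-5, betas=(0.9, 0.98),
                             eps=1e-6, weight_decay=0.1)

warmup_epochs, total_epochs = 2, 10
# Linear warmup of the learning rate over the first 2 of 10 epochs,
# then a constant learning rate of 3e-5.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimiser, lambda epoch: min(1.0, (epoch + 1) / warmup_epochs))

lrs = []
for epoch in range(total_epochs):
    lrs.append(optimiser.param_groups[0]["lr"])
    # ... one training epoch over batches of size 16 would run here ...
    scheduler.step()
```

With this schedule the learning rate is 1.5e-5 during the first epoch and reaches the target 3e-5 from the second epoch onwards.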

Validation
When the algorithms were deemed satisfactory, they were frozen and their performances were assessed on the validation set.
- Validation of the main (hybrid) SA-detection algorithm: 162 stays detected as being caused by SA (85 and 77 for the pre- and post-pandemic periods, respectively) and that took place in one of the hospitals of the validation set were drawn randomly. Two clinicians conducted a chart review and labelled each stay as a true positive or a false positive detection of a SA-caused hospitalisation.
- Validation of the RF-detection algorithms: for each RF, at least 40 stays (at least 20 for each period, pre- and post-pandemic) were drawn randomly among the set of stays detected as being caused by SA, that took place in one of the hospitals of the validation set and with a detected positive mention of the RF.
- Validation of the alternative (rule-based) SA-detection algorithm: we first applied the rule-based SA-detection algorithm to the stays that were classified as being caused by SA by the main hybrid algorithm and that had already been manually annotated. We computed the positive predictive value of the rule-based algorithm on this dataset, noted PPV_overlap. This allowed us to reuse already-annotated data, but was biased towards stays detected by the hybrid algorithm. In order to remove this bias and complete the dataset with stays not detected by the hybrid algorithm, we drew randomly in the total dataset 40 additional records (24 and 16 for the pre- and post-pandemic periods, respectively) among those that were classified as being caused by SA only by the alternative rule-based algorithm. The PPV of the rule-based algorithm on this second dataset, noted PPV_only, was estimated by conducting a chart review. The overall PPV of the alternative algorithm was then estimated as PPV = p · PPV_overlap + (1 − p) · PPV_only, with p the probability that a record drawn randomly among all those classified as SA-caused by the rule-based algorithm in the total dataset was also classified as SA-caused by the main, hybrid algorithm. In that case the 95% confidence interval was not computed, as the Wilson method could not be applied.
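The combination described above, a weighted average between the records also detected by the hybrid algorithm and those detected only by the rule-based one, reduces to a one-liner; the names are ours.

```python
def overall_ppv(p_overlap, ppv_overlap, ppv_only):
    # Weighted combination: p_overlap is the probability that a
    # rule-based-positive record is also hybrid-positive; ppv_overlap and
    # ppv_only are the PPVs measured on each of the two strata.
    return p_overlap * ppv_overlap + (1 - p_overlap) * ppv_only
```

For example, with 80% overlap, a PPV of 0.9 on the overlapping stratum and 0.5 on the rule-based-only stratum (illustrative figures, not the study's results), the overall PPV would be 0.82.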
The annotator accessed the last-edited discharge summary of the annotated visit. The inter-annotator agreement was measured by an annotation of approximately 10% of the stays by two annotators. The inter-annotator positive and negative agreements were [0.92; 0.5] for SA detection, [1.0; 1.0] for history of SA, physical violence, sexual violence and domestic violence, and [1.0; -] for social isolation (i.e., no false positive detection was observed by the clinicians in the doubly annotated dataset). Three stays with annotator disagreement were re-annotated and corrected in the validation dataset.
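Positive and negative agreement between two annotators can be computed as sketched below; we assume here the usual specific-agreement definitions (2 × joint positives over the sum of positives, and symmetrically for negatives), with an undefined value returned as `None`, as for the social isolation RF above.

```python
def positive_negative_agreement(a, b):
    # a, b: parallel lists of 0/1 labels from the two annotators.
    both_pos = sum(x and y for x, y in zip(a, b))
    both_neg = sum((not x) and (not y) for x, y in zip(a, b))
    pos_den = sum(a) + sum(b)                         # positives by A + by B
    neg_den = (len(a) - sum(a)) + (len(b) - sum(b))   # negatives by A + by B
    ppos = 2 * both_pos / pos_den if pos_den else None
    pneg = 2 * both_neg / neg_den if neg_den else None
    return ppos, pneg

pa, na = positive_negative_agreement([1, 1, 0, 0], [1, 0, 0, 1])
```

When one class is never assigned by either annotator, the corresponding agreement is undefined rather than 0 or 1.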

Annotation guidelines - Suicide attempt

Definition
We defined a suicide attempt (SA) as a self-directed, potentially injurious behavior with any intent to die as a result of the behavior.7

General annotation guidelines
We applied the following rules to annotate SA in clinical documents:
• We discarded both the mentions of self-harm that did not explicitly indicate the intent to die (e.g., scarifications) and the mentions of suicidal ideation;
• Suspicions of SA were not considered as SA;
• Intentional drug overdoses or defenestrations were considered as SA even if the intent to die was not always explicitly stated, as in these cases the intent is often implicit.

Entity-level annotation guidelines
We completed the general guidelines to realise the entity-level annotation of the training dataset. We distinguished the annotation of each mention of the SA concept from its characterisation through the following attributes:
• Negation: the patient or the clinical staff deny or negate the suicide attempt (e.g., "she never attempted suicide");
• Family: the patient or the clinical staff refer to a member of the patient's close circle who attempted suicide (e.g., "the patient's father attempted suicide");
• Hypothesis: the mention of the suicide attempt is expressed in a hypothetical sentence (e.g., "the patient may have attempted suicide");
• History: the mention is related to a previous suicide attempt that did not directly cause the hospitalisation. When dates are available, we consider that a suicide attempt occurring more than 15 days before hospitalisation did not directly cause it (e.g., "the patient attempted suicide on January 1st, 2022" for a hospitalisation on January 24th, 2022);
• Reported speech: the mention of the suicide attempt is expressed by someone other than the clinician (e.g., "the patient indicated that he attempted suicide").
We provide some examples to illustrate these guidelines:
• Example 1: a negative sentence mentioning SA
○ French: "il ne s'agit pas d'une TS car la blessure est involontaire"
○ English: "it is not a SA because the injury is not voluntary"
○ Annotation: "positive" SA concept with the negation attribute
• Example 2: a mention detected by the algorithm that did not correspond to a SA
○ French: "le patient prend ses médicaments ts les 2 jours"
○ English: "the patient takes his medication every 2 days"
○ Annotation: "negative" SA concept
• Example 3: a patient reporting a SA
○ French: "Le patient nie que ce geste était une tentative de suicide."
○ English: "The patient denies that he attempted suicide"
○ Annotation: "positive" SA concept with the reported speech attribute.

Stay-level annotation guidelines
We completed the general guidelines to realise the stay-level annotation of the validation dataset:
• A stay was labelled as a "positive" SA if the stay was caused by a suicide attempt realised by the patient less than 15 days before hospitalisation;
• A stay was labelled as a "positive" SA if a SA occurred during hospitalisation. In that case the admission to hospital is not caused by a SA, but the duration of the hospitalisation is augmented because of the SA, and for the sake of simplicity we therefore chose to label the overall hospitalisation as SA-caused;
• When there was a contradiction in the annotated clinical document between the reason for hospitalisation and its conclusion, we considered the conclusion as the truth.

Risk factors

Definitions
We started by defining the first two risk factors:
• Social isolation: defined as a lack of social contact. We underline that this definition is not equivalent to loneliness, which is a feeling; social isolation is usually an observation made by the clinician.
• History of suicide attempt: defined as a confirmed previous suicide attempt of the patient that did not directly lead to her hospitalisation.
We completed these with three additional risk factors related to violence:
• Sexual violence: we followed the World Health Organization (WHO) definition of sexual violence as "any sexual act, attempt to obtain a sexual act, unwanted sexual comments or advances, or acts to traffic, or otherwise directed, against a person's sexuality using coercion, by any person regardless of their relationship to the victim, in any setting, including but not limited to home and work".8
• Physical violence: we restricted the WHO definition of violence to its physical aspect and discarded violence against oneself, leading to the definition of physical violence as "the intentional use of actual physical force against the patient that either results in or has a high likelihood of resulting in injury, death, psychological harm, maldevelopment or deprivation".8
• Domestic violence: we restricted the WHO definition of violence to violence exerted on the patient by members of her domestic environment: "the intentional use of physical force or power, threatened or actual, exerted on the patient by a member of her family or household that either results in or has a high likelihood of resulting in injury, death, psychological harm, maldevelopment or deprivation".8

Annotation guidelines
We applied the following rules to annotate risk factors at the stay level during the validation campaign:
• Social isolation: clinical documents sometimes mentioned aspects that could be interpreted as social isolation even if their qualification as a risk factor was not explicitly stated. In that case we nevertheless annotated the stay as mentioning social isolation (e.g., when the clinician mentioned the absence of domestic animals).
• History of suicide attempt: when the date of a previous SA was available, we considered that it was part of the patient's history if it occurred at least 15 days before the hospitalisation. When the date of the previous SA was not explicitly mentioned, if it was clearly stated that the SA was not the direct cause of the hospitalisation, we nevertheless annotated it as a history of suicide attempt.
• Sexual violence: some ambiguous expressions that mostly refer to sexual violence were labelled as positive sexual violence (e.g., French: "gestes déplacés", English: "inappropriate gestures").
• Physical violence: we did not label mentions of sexual violence as also being mentions of physical violence. A clinical document mentioning a rape could for instance be labelled as "positive" for the sexual violence risk factor but "negative" for the physical violence risk factor.
• Domestic violence: if violence was exerted on the patient by her family, we labelled it as domestic violence even if we did not know whether they lived in the same place.
We underline that a stay could be labelled "positive" for more than one risk factor, as they were not mutually exclusive.
Adjusting for potential deployment bias
Another sensitivity analysis consisted in adjusting for a potential bias induced by the temporally unequal availability of discharge summaries (see Supplementary Figure 17). For each hospital and each month, the number of detected SA-caused hospitalisations was divided by the proportion of hospitalisations having at least one discharge summary available in the research database (i.e., not restricted to SA-caused stays). Assuming that discharge summaries were missing completely at random (MCAR assumption), dividing by data completeness indeed provides an estimate of the true number of SA-caused hospitalisations. Modifying this single aspect, we obtained results that were coherent with the main analysis.
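The MCAR-based adjustment amounts to a simple element-wise division; the sketch below uses illustrative figures, not the study's data.

```python
def adjust_for_completeness(observed_counts, completeness):
    # Under an MCAR assumption, dividing the observed monthly count of
    # detected SA-caused stays by the proportion of stays with at least
    # one available discharge summary estimates the true monthly count.
    return [n / c for n, c in zip(observed_counts, completeness)]

# Two illustrative months: 50 detected stays at 50% completeness,
# 40 detected stays at 25% completeness.
adjusted = adjust_for_completeness([50, 40], [0.5, 0.25])
```

The lower the completeness in a given hospital-month, the larger the upward correction applied to its observed count.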