Longitudinal symptom dynamics of COVID-19 infection

As the COVID-19 pandemic progresses, obtaining information on symptom dynamics is of the essence. Here, we extracted data from primary-care electronic health records and nationwide distributed surveys to assess the longitudinal dynamics of symptoms prior to and throughout SARS-CoV-2 infection. Information was available for 206,377 individuals, including 2,471 positive cases. The two data sources were discordant, with survey data capturing most of the symptoms more sensitively. The most prevalent symptoms included fever, cough and fatigue. Loss of taste and smell in the 3 weeks prior to testing, whether self-reported or recorded by physicians, was the most discriminative symptom for COVID-19. Additional discriminative symptoms included self-reported headache and fatigue, and a documentation of syncope, rhinorrhea and fever. Children had a significantly shorter disease duration. Several symptoms were reported weeks after recovery. Through a unique integration of two data sources, our study sheds light on the longitudinal course of symptoms experienced by cases in primary care.

Reviewer #2 (Remarks to the Author):

1. The manuscript sets the stage that many of the reports to date use hospital data (though that is similarly used here with the complement of self-collected data), retrospectively collected vs. prospectively collected data, and cross-sectional vs. longitudinal datasets. Of course, there have been several other reports that have used prospective data. While some of this data has been cited, more could be done to more fully characterize the contributions in this area. While this is certainly a weakness of the majority of reports, the authors should more comprehensively highlight/discuss the contributions that have provided data and findings similar to those discussed here.

2. Overall the manuscript lacks statistical rigor. The statistical methods do not describe the "Odds-Ratio" models. Presumably these are logistic regression, but there is no discussion of the approach, the models employed, or the statistical software used.

3. As highlighted by the authors, the strength of this dataset is its prospective and longitudinal collection of data. Thus, it is unclear why the authors have not attempted to perform some type of time-dependent outcome analysis. This is important given the disease course of COVID-19, where symptoms may be differentially present and may bias reports of symptoms, i.e. minor symptoms early in the disease course may be less likely to be reported or result in presentation to the hospital, whereas more serious symptoms, e.g. fever, may result in enough concern to report the full spectrum of symptoms / seek treatment. I would prefer to see a hazards regression model that accounts for time to COVID-19 testing results related to symptoms to more sensitively discern the "longitudinal symptom dynamics of COVID-19" as the study is described in the title. Similarly, in the statistical methods section, the authors plot the percentages of reported symptoms across different days relative to the time of the COVID-19 test; since the data is already structured this way, these models would provide a novel aspect upon revision that would greatly improve the strength and impact of the paper.

4. It is not clear in the Odds-Ratio models what, if any, covariates have been adjusted for. Nonetheless, little attempt has been made to adjust for potential confounders of the observed associations, significantly limiting the interpretation of the findings. The authors describe in the methods that comorbid condition data was collected; these should be considered as potential adjustment variables, as they may influence the ability to receive testing or the propensity to seek care.

5. Inverse probability weighting or other adjustment methods should be considered to help address biases related to those receiving testing.

6. Given the shifts in testing trends as denoted by the authors, the authors should additionally adjust or stratify for entrance into the cohort according to different time periods.

7. Time to resolution of symptoms could be modeled using a Cox regression model to assess differences rather than the non-parametric Mann-Whitney test. A slew of factors (i.e. comorbidities, age, etc.) that can contribute to recovery time seem to be in the dataset, but are not used.

8. How were individuals who had not yet recovered considered in the recovery time analysis? How was recovery defined? Were all individuals given the same opportunity to contribute time to "recovery" in terms of follow-up time?

9. It is not clear why the most robust statistical test, the odds-ratio analysis, was sent to the supplement in favor of largely descriptive figures and tables with no statistical analyses to demonstrate true differences.

10. How was the survey distributed? Was this done using an electronic reporting system? More detail related to survey implementation and the limitations according to survey access should be discussed.

11. A real strength and missed opportunity here is to show validity of self-reported symptoms vs. those reported in the EHR, yet no attempt was made to demonstrate strong data capture for the 5,083 participants who were similarly recorded in the EHR and via the survey. The authors should take the opportunity in a revision to address this and either validate self-reports or clearly demonstrate how self-reports and EHR-captured data may differ.

Minor Comments

1. Figure 1 isn't completely clear. "Responses that did not meet quality control" seems to overlap with the methods, which state that individuals who didn't complete the survey were excluded. Is this the only QC metric?

2. There are a few typographical errors throughout an otherwise well-written manuscript. Most notably, there are additional spaces preceding commas in lists in the introduction. Please check a revised manuscript closely for potential errors.
We have addressed the reviewers' comments below, with references to the parts of the paper that have been modified pursuant to the reviewers' suggestions. Our responses are denoted by "Response".

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): In this manuscript, Mizrahi and collaborators investigate the longitudinal prevalence of clinical symptoms in COVID-19 infection diagnosed by PCR testing for SARS-CoV-2 from nasopharyngeal swabs. The authors included data from electronic health records linked to longitudinal self-reported surveys. This allowed them to capture data on the symptoms of mostly mild COVID-19 cases from both the patient and the physician perspective. The other big strengths of the paper are the inclusion of children and the analysis of symptom dynamics and temporal trends.

Comment 1:
However, the manuscript is far too long and difficult to follow, and the main novel message of the paper (which I believe is summarized in Figures 2 and 3) gets diluted and somehow lost among many secondary confirmatory analyses (e.g. Figure 4). I would suggest shortening the manuscript, merging Tables 1 and 2 into one, moving Figures 1 and 4 to the supplements and concentrating on the longitudinal element of the study.

Response 1:
Thank you for this comment. We have made attempts to shorten the manuscript as suggested and revised the text to make the main message clearer. We have moved Figure 1 to the Supplementary appendix. We have merged the two tables and removed less relevant information (symptoms which appeared in fewer than 10 positive individuals, reasons for isolation), but kept the information on the two populations separate, as the population who responded to the survey and the population who attended primary care visits are different and have different types of data available. For example, children were not included in the population who filled in the surveys, and several symptoms were only available for individuals who attended primary care. We therefore believe it is important to show the characteristics of the populations separately, to aid readers in interpreting the results, which are also presented separately for each population. We believe that presenting the results of the odds ratio analysis in Figure 4 is important and therefore chose to include it in the main text. First, this analysis highlights the most discriminative symptoms for COVID-19 according to our data. Second, as other studies worldwide have performed similar analyses, this allows comparison of our findings with other countries, for example, Menni et al. 2020 in the UK and USA. Third, to our knowledge, our analysis shows, for the first time, the different ORs obtained for different symptoms when recorded by physicians versus self-reported.
In order not to make this section longer than necessary, we included all the expanded results of this analysis in three tables (revised Tables S3, S4 and S5) in section 5 of the Supplementary appendix.

Comment 2:
Also, did the authors adjust for multiple testing when performing multiple comparisons?

Response 2:
Thank you for this comment. For the comparison between recovery times of different subpopulations, only 3 tests were performed: children versus adults, individuals with chronic medical conditions versus without, and males versus females. For the odds ratio analysis of symptoms, we initially did not perform multiple testing corrections. Following your suggestion, we have performed a False Discovery Rate (FDR) analysis. This analysis revealed that all self-reported symptoms and primary-care-recorded symptoms, in both children and adults, remained significantly associated with positive COVID-19 cases when the FDR was controlled at the rate of 0.1. The results are presented in the tables below.

Reviewer #2 (Remarks to the Author):

Response 1:
Thank you for this comment. First, we would like to clarify that the electronic health records (EHR) analysed in our study originate from the second largest health maintenance organization (HMO) in Israel and contain information from primary care clinics, not hospital records. We believe that this is a major strength of our study, as most of the reports thus far originated from hospitalized patients. We have highlighted this point in the discussion accordingly. In addition, we have added a more comprehensive discussion of the contributions of previous studies on the symptoms of COVID-19 infection.
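For reference, the Benjamini-Hochberg procedure underlying the FDR analysis in Response 2 above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's code, and the p-values are invented for demonstration:

```python
def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg FDR control: return a list of booleans marking
    which p-values remain significant at FDR level q."""
    m = len(pvals)
    # Sort p-values while remembering their original positions
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            max_k = rank
    # Declare significant the hypotheses with ranks 1..max_k
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant

# Invented p-values for six symptom tests
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.6], q=0.1))
# -> [True, True, True, True, False, False]
```

Note that p-values 0.039 and 0.041 survive here even though a Bonferroni cutoff (0.05/6) would reject them; this is the sense in which FDR control is less conservative for a family of symptom tests.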

2.
Overall the manuscript lacks statistical rigor. The statistical methods do not describe the "Odds-Ratio" models. Presuming these are logistic regression, but there is no discussion of the approach, the models employed, or the statistical software used.

Response 2:
We apologize for not describing the method used for calculating the odds ratios. They were indeed calculated with logistic regression models. All statistical analyses were done using Python 3, and logistic regression was done using the statsmodels package. We added a description of this model to the Methods section.
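As a small illustration of the link between the reported odds ratios and logistic regression: with a single binary predictor, the exponentiated logistic regression coefficient equals the classical cross-product odds ratio of the 2x2 table. The sketch below, with invented counts rather than the study's data, computes this odds ratio and a Woolf 95% confidence interval:

```python
import math

# Hypothetical 2x2 table (invented counts, not the study's data):
#                 positive test   negative test
# symptom              a=40           b=60
# no symptom           c=10           d=90

def odds_ratio(a, b, c, d):
    """Odds ratio with a Woolf 95% confidence interval. exp(coefficient)
    from a univariate logistic regression with a binary predictor equals
    this same cross-product odds ratio."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, lower, upper

or_, lower, upper = odds_ratio(40, 60, 10, 90)
print(f"OR = {or_:.1f}, 95% CI [{lower:.2f}, {upper:.2f}]")  # OR = 6.0
```

The adjusted analyses in the manuscript add covariates (age, gender, chronic conditions, time), which this unadjusted sketch does not capture; that is what the statsmodels logistic regression models provide.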

3.
As highlighted by the authors, the strength of this dataset is its prospective and longitudinal collection of data. Thus, it is unclear why the authors have not attempted to perform some type of time-dependent outcome analysis. This is important given the disease course of COVID-19, where symptoms may be differentially present and may bias reports of symptoms, i.e. minor symptoms early in the disease course may be less likely to be reported or result in presentation to the hospital, whereas more serious symptoms, e.g. fever, may result in enough concern to report the full spectrum of symptoms / seek treatment. I would prefer to see a hazards regression model that accounts for time to COVID-19 testing results related to symptoms to more sensitively discern the "longitudinal symptom dynamics of COVID-19" as the study is described in the title. Similarly, in the statistical methods section, the authors plot the percentages of reported symptoms across different days relative to the time of the COVID-19 test; since the data is already structured this way, these models would provide a novel aspect upon revision that would greatly improve the strength and impact of the paper.

Response 3:
We thank you for this important comment. First, we would like to clarify that information on symptoms was obtained in our study from two sources: EHR of primary care visits and self-reported symptom surveys. Therefore, we are less exposed to the potential bias mentioned, which may result from minor symptoms early in the disease course being less likely to be reported at hospital admission, as we are not analyzing symptoms at this specific time point. Nonetheless, we agree with the reviewer on the major importance of performing time-to-event analyses and, as suggested, we have now added time-to-event models to the analysis. We present these analyses by constructing Kaplan-Meier curves from the time at which a symptom is self-reported or recorded in the EHR to a positive PCR result. Hazard ratios for each symptom, calculated by Cox proportional hazards models and adjusted for age, gender, presence of a chronic medical condition and time (number of days since study initiation), are presented to account for the time from symptom onset to COVID-19 testing results.
We present these results separately for self-reported symptoms and for EHR recorded symptoms. Following the fifth comment given by the reviewer, for self-reported symptoms, Inverse probability weighting (IPW) analysis was performed to help address biases related to those receiving testing (see response 5).
For both sources of information, two analysis schemes were conducted. In the first scheme, all surveys / primary care visits are analysed together, regardless of patient ID (i.e. a patient with 4 different surveys contributes 4 rows to the analysis). In the second scheme, we weighted surveys / primary care visits by their number per individual (i.e. a patient with 4 different surveys contributes 4 rows, each with weight ¼). Time-to-event models were constructed using Python and the lifelines package. Results are presented in the figures and tables below for the different types of analysis.
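As a rough sketch of the estimator behind these curves, a from-scratch Kaplan-Meier (product-limit) implementation is shown below. The durations and event indicators are invented for illustration; the actual analysis used the lifelines package, with Cox models supplying the adjusted HRs:

```python
# durations: days from symptom report/record to the PCR result;
# events: 1 = first positive test observed, 0 = censored
# (negative result or more than 21 days of follow-up). Data invented.

def kaplan_meier(durations, events):
    """Return the Kaplan-Meier curve as a list of (time, survival
    probability) pairs, one per distinct event time."""
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        surv *= 1 - n_events / at_risk  # product-limit update
        curve.append((t, surv))
    return curve

durations = [2, 3, 3, 5, 8, 8, 12, 21]
events = [1, 1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(f"day {t}: S(t) = {s:.3f}")
```

The figures in the response plot 1 - S(t) per symptom stratum (reported vs. not reported), i.e. the cumulative percentage testing positive, rather than S(t) itself.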
Overall, these results highlight the importance of loss of taste or smell throughout the disease course, with HRs of 22.5 and 13.3, followed by fever, with HRs of 9.7 and 4.3, for self-reported and EHR-recorded symptoms respectively. For several symptoms, the percentage of positive tests among individuals who reported the symptom or had a record of it in the EHR gradually increased with time, while for others, such as self-reported fever and shortness of breath, a steep increase occurred in the first few days following the symptom. This is most probably because the latter were part of the testing policy for COVID-19, but also partly because a relatively small number of individuals reported these symptoms. In addition, several symptoms, such as nausea and vomiting, muscle pain, headache, and shortness of breath, reveal different patterns between the two sources of information: individuals who self-reported these symptoms had an increasingly higher percentage of positive tests, in contrast to individuals with a record of these symptoms in the EHR. We added these results to the results section.
1. Self-reported symptoms among individuals who were tested for COVID-19, over time.
The outcome was the first positive PCR test for COVID-19. Negative test results, or follow-up beyond 21 days, were censored.

a. All surveys
Kaplan-Meier curves for self-reported symptoms of individuals who were tested for COVID-19, over time.
The curves present the percentage of individuals who tested positive for COVID-19, among those who reported a specific symptom (red) versus those who did not (blue), over time. Hazard ratios (HR) adjusted for gender, age, prior conditions and time (number of days since study initiation) are marked in the figure. All surveys were included in this analysis.

b. Weighted surveys
Kaplan-Meier curves for self-reported symptoms of individuals who were tested for COVID-19, over time.
The curves present the percentage of individuals who tested positive for COVID-19, among those who reported a specific symptom (red) versus those who did not (blue), over time. Hazard ratios (HR) adjusted for gender, age, prior conditions and time (number of days since study initiation) are marked in the figure text. Surveys were weighted according to the number of surveys filled in by each individual.
The results of the two analysis schemes for self-reported symptoms are summarized in the tables below. We added a section to the results describing these findings. Kaplan-Meier curves with the results obtained for all surveys and all EHR records, together with adjusted HR values, were added to the revised Figure 1. Full results were added to section 6 of the Supplementary appendix (Tables S6 and S7). These results are also presented in the revised Figure 3, along with the results of the IPW analysis (see Response 5), and in supplementary Figure S3.

5.
Inverse probability weighting or other adjustment methods should be considered to help address biases related to those receiving testing.

Response 5:
Thank you for this excellent suggestion; we have now included an IPW analysis to help address biases related to those receiving testing. Inverse probability weighting was applied by fitting a logistic regression model for the probability of being tested (regardless of the result). Covariates used for this model were gender, age, chronic medical condition, time (number of days since study initiation) and reported symptoms. The individual probabilities were then used to inversely weight individuals in the logistic regression model. The results of this analysis are now described in the manuscript and below, alongside the ORs obtained by the basic and adjusted logistic regression models. When applying IPW and the adjusted IPW, all self-reported symptoms which were significant under the basic and adjusted OR models remained significant, with the exception of fatigue. Of note, IPW was applied only for self-reported symptoms, as information from primary care visits was available only for individuals tested for COVID-19.

6.
Given the shifts in testing trends as denoted by the authors, the authors should additionally adjust or stratify for entrance into the cohort according to different time periods.

Response 6:
We fully agree with your comment and adjusted our OR analysis also by the number of days since the initiation of the study.

7.
Time to resolution of symptoms could be modeled using a Cox regression model to assess differences rather than the non-parametric Mann-Whitney test. A slew of factors (i.e. comorbidities, age, etc.) that can contribute to recovery time seem to be in the dataset, but are not used.

Response 7:
Thank you for this comment. We wish to clarify that we used the Mann-Whitney test to compare recovery time between different groups of individuals. Recovery time was defined by the date on which a second consecutive negative PCR test result for COVID-19 was recorded, in line with the Israeli Ministry of Health policy during the study period. Unfortunately, modeling the time to symptom resolution was not feasible with our data, as the majority of individuals did not fill in the survey every day, and it was therefore not possible to infer the exact timing of the resolution of a specific symptom.
Following your suggestion, we modeled the recovery time by Cox regression models.
We agree that many factors may contribute to recovery time and thus used these models to further analyse the effect of age, gender and the presence of comorbidities on recovery time. These analyses revealed that children have a significantly shorter recovery time compared to adults (p=0.04). Gender and the presence of a chronic medical condition did not affect recovery time significantly (p=0.46 and p=0.87, respectively). These analyses are presented in the figures and tables below.

8.
How were individuals who had not yet recovered considered in the recovery time analysis? How was recovery defined? Were all individuals given the same opportunity to contribute time to "recovery" in terms of follow-up time?

Response 8:
We agree with the reviewer's comment that a better modeling approach is to analyse recovery time using Cox regression models, which can handle censoring properly. We therefore performed these analyses, which are presented above. In this analysis, all positive patients were included; patients who had not yet recovered because they did not have enough follow-up time were censored at the last time of available data.

9.
It is not clear why the most robust statistical test, the odds-ratio analysis, was sent to the supplement in favor of largely descriptive figures and tables with no statistical analyses to demonstrate true differences.

Response 9:
We fully agree that the odds-ratio analysis is of major importance and therefore included the results of this analysis in the main text, presented visually in Figure 4. In addition, we added 3 tables (revised Tables S3, S4 and S5) to section 5 of the Supplementary appendix.

11.
A real strength and missed opportunity here is to show validity of self-reported symptoms vs. those reported in the EHR, yet no attempt was made to demonstrate strong data capture for the 5,083 participants who were similarly recorded in the EHR and via the survey. The authors should take the opportunity in a revision to address this and either validate self-reports or clearly demonstrate how self-reports and EHR-captured data may differ.

Response 11:
In order to check the validity of our survey, we directly compared self-reported answers on COVID-19 diagnoses in our survey to the documented tests in the medical charts and found very high agreement between the two sources; this point is addressed in the discussion. For example, of those who reported not being diagnosed with COVID-19 in the survey, only 0.03% had a record of a positive COVID-19 test in the EHR, further validating our survey. We agree with the reviewer that this is also an opportunity to validate self-reports of symptoms or demonstrate how self-reports and EHR-captured data may differ. However, the comparison between self-reported symptoms and EHR-recorded symptoms is more problematic, as symptoms may be dynamic and the timing of the clinic visits and the survey filling are not identical. To further investigate this issue, we searched for individuals among the 5,083 participants who filled in a symptom survey on the same date as a clinic visit. In total, we identified 915 different events in which the same person filled in a self-reported survey and had a physician visit documented in the EHR on the same day, for a total of 706 different individuals. When comparing these events, we found that the overall agreement between the two sources was generally low. Most of the symptoms, with the exception of fever and myalgia, were self-reported at a higher percentage than they were recorded by physicians in the EHR. The results of the comparison are presented in the table below. As expected, symptoms which are part of the Israeli testing policy had higher agreement between the two sources, since they were more likely to be asked about by a physician during the visit.
These included cough, which had a 52% agreement between the two sources, and fever, which had a 34% agreement. Diarrhea also had a relatively high agreement of 35%. Other symptoms had lower agreement, of up to 16%. Disturbance of the sensation of smell and taste had no agreement at all between the two sources and was mostly self-reported, potentially because early in the course of the pandemic the evidence for this symptom in individuals infected with COVID-19 was not strong, so it may have been asked about and recorded by physicians less often. The low agreement between the two data sources may be due to several reasons, including local practices of diagnosis documentation by physicians, who usually document only the main diagnoses as ICD-9 codes given the limited time dedicated to the clinic visit. Moreover, previous studies comparing individuals' self-reports with clinicians' reports of other diagnoses have found that physicians are more likely to record diagnoses for more severe, complex cases (Morgan, Maria A., et al. General Hospital Psychiatry, 2019) and that self-report is more sensitive in identifying symptom-based conditions (Violán, Concepción, et al. BMC Public Health, 2013). Altogether, we believe that this point highlights the strength of our study in integrating two different data sources, each with its own pros and cons, to obtain a more complete picture of symptoms as reflected by both perspectives. We added a section on this analysis to the results and discussed it in the discussion. The full results were added as Table S1 to section 1 of the supplementary appendix.

Minor Comments

1.
Figure 1 isn't completely clear. "Responses that did not meet quality control" seems to overlap with the methods, which state that individuals who didn't complete the survey were excluded. Is this the only QC metric?
Response 12: Indeed, the only QC metric was the exclusion of surveys that were not fully completed by participants. For clarity, we added this to the legend of the figure, which was moved to the supplementary appendix in response to a comment by the first reviewer and is now termed Figure S1.

2.
There are a few typographical errors throughout an otherwise well-written manuscript. Most notably there are additional spaces preceding commas in lists in the introduction. Please check a revised manuscript closely for potential errors.

Response 13:
Thank you very much, we apologize for the typographical errors. We have erased the additional spaces, revised the manuscript again and corrected additional errors.