COVID-19 patient accounts of illness severity, treatments and lasting symptoms

First-person accounts of COVID-19 illness and treatment complement and enrich data derived from electronic medical or public health records. Patient-reported data uniquely capture in-depth contextual information as well as behavioral and emotional responses to illness. The Novel Coronavirus Illness Patient Report (NCIPR) dataset includes complete survey responses from 1,592 confirmed COVID-19 patients aged 18 to 98. NCIPR survey questions address symptoms, medical complications, home and hospital treatments, lasting effects, anxiety about illness, employment impacts, quarantine behaviors, vaccine-related behaviors and effects, and illness of other family/household members. Additional questions address financial security, perceived discrimination, pandemic impacts (relationship, social, stress, sleep), health history and coping strategies. Detailed patient reports of illness, environment and psychosocial impact, collected close to the time of infection and accounting for demographic variation, are valuable for understanding pandemic-related public health from the perspective of those who contracted the disease.

including duration of symptoms, categories of mood symptoms, and two questions about lasting cognitive complaints. Additionally, five questions were added about blood type, height and weight, history of tonsillectomy, and the MacArthur Ladder. [12] The survey was closed to potential respondents on April 7, 2021.
The primary goal motivating collection of the New York NCIPR dataset was to obtain a comprehensive record of the subjective experiences of those ill with COVID-19, close to the time of illness. We also set out to address a number of specific questions circulating in scientific, media and word-of-mouth channels. For example, there have been anecdotal accounts of individuals with pets becoming less ill, of unexpected side effects (e.g., hair loss), of lasting illness sequelae, and of underlying vulnerabilities that make certain individuals more susceptible. The NCIPR dataset can be used to address many still-unanswered questions about COVID-19 illness, human behavior and environmental determinants of health.
Rapid placement of the data in the public domain helps ensure that investigation of these and other topics will begin quickly and that findings will be communicated rapidly to wide audiences.
(medRxiv preprint, not certified by peer review. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.)

Methods
University. [15,16] The measure was administered in English.
(This version posted June 2, 2021.)

(medRxiv preprint doi: https://doi.org/10.1101/2021.05.26.21257743)
Sample description. The NCIPR dataset contains data from 2,212 individual respondents. Of these, 2,147 confirmed having been ill with COVID-19 and having a COVID-19 diagnosis in their medical record. However, the descriptions of illness severity and demographics provided here are restricted to the 1,584 cases that passed the Technical Validation steps described below. The timing of COVID-19 illness in the sample reflects the peaks in prevalence in March 2020 and January 2021 (Fig. 2). Illness severity varied across the sample, as seen in length of illness, fever duration, peak fever, hospitalizations and self-reported illness severity ratings (Fig. 2). Sample demographic data are provided in Fig. 3. Respondent ages range from 18 to 98 years. Due to a survey administration error described below, complete data are available at a ratio of approximately 2:1, females to males.

Geo-positioning of COVID-19 survey respondents
Geographical information about survey respondents was derived from a subset of patients (N = 697) who consented to future contact on the online consent form. Those who made this selection were asked to provide contact information and a zip code. Zip codes were converted to the corresponding Federal Information Processing Standards (FIPS) codes. The distribution of patient FIPS codes is displayed in Fig. 4. The majority of these respondents reside in Manhattan, Brooklyn and Long Island. A small number (N = 9) provided zip codes in states other than New York, New Jersey and Connecticut.
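The text does not say which crosswalk or software was used to convert zip codes to FIPS codes. The following is a minimal sketch, assuming a lookup table from 5-digit ZIP codes to 5-digit county FIPS codes (for example, one built from the HUD USPS ZIP Code Crosswalk files). The table name `ZIP_TO_COUNTY_FIPS`, its four illustrative entries and the helper functions are hypothetical. The sketch also restores leading zeros, which New Jersey and Connecticut ZIP codes lose when a spreadsheet stores them as numbers, and it uses the first two FIPS digits (the state code) to count respondents outside the tri-state area.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical, illustrative subset of a ZIP -> county FIPS crosswalk.
# A real analysis would load a complete crosswalk file; ZIP codes that span
# several counties would also need a rule for choosing a single county.
ZIP_TO_COUNTY_FIPS: Dict[str, str] = {
    "10027": "36061",  # New York County (Manhattan), NY
    "11201": "36047",  # Kings County (Brooklyn), NY
    "11530": "36059",  # Nassau County (Long Island), NY
    "07030": "34017",  # Hudson County, NJ
}

# The first two digits of a county FIPS code are the state FIPS code.
TRISTATE_STATE_FIPS = {"36", "34", "09"}  # NY, NJ, CT


def normalize_zip(raw) -> Optional[str]:
    """Return a 5-digit ZIP string, restoring leading zeros lost by CSV/Excel."""
    if raw is None:
        return None
    text = str(raw).strip().split("-")[0]  # drop any ZIP+4 suffix
    if not text.isdigit() or len(text) > 5:
        return None
    return text.zfill(5)


def summarize_zips(zips: Iterable) -> Tuple[Counter, int, int]:
    """Count respondents per county FIPS code, plus out-of-area and unmatched ZIPs."""
    by_county: Counter = Counter()
    outside_tristate = 0
    unmatched = 0
    for raw in zips:
        z = normalize_zip(raw)
        fips = ZIP_TO_COUNTY_FIPS.get(z) if z else None
        if fips is None:
            unmatched += 1  # malformed ZIP or not in the crosswalk
            continue
        by_county[fips] += 1
        if fips[:2] not in TRISTATE_STATE_FIPS:
            outside_tristate += 1
    return by_county, outside_tristate, unmatched
```

For example, `summarize_zips(["10027", 7030, "11201-1234", "abc"])` counts one respondent each in 36061, 34017 and 36047, none outside the tri-state area, and one unmatched entry. Reporting unmatched ZIPs separately, rather than dropping them silently, keeps the denominator (here, the 697 consenting patients) auditable.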

Data Records
The dataset resulting from the NCIPR survey is stored in CSV format on the Open Science Framework (OSF), an open access platform (https://osf.io/82rkj/). Each row represents one respondent and each column represents a variable. The file includes every survey respondent except those who completed only the consent form (N = 68). Date of birth was converted to age in years (variable [age_calculated]). A second variable, [db_52], is the age in years reported by the participant. Both age fields were included intentionally, as together they provide a means of data validation, described in more detail below. The order of the variables in the CSV file reflects the order in which items were administered. During data preparation and validation, 10 variables were added to aid future data processing. Table 2 summarizes the variables added to the raw dataset during quality assessment and data validation.
Table 2. Summary of variables added to the dataset during preparation and validation steps

Dates in the dataset were converted to Month-Year format (e.g., Mar-21), and the ages of individuals aged 90 or older were recoded as 89+ to reduce the risk of re-identification.
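The two de-identification steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and which columns hold dates is not specified here. Month names are taken from a fixed English list rather than `strftime("%b")`, whose output depends on the system locale. The age rule follows the text as stated (90 or older becomes the label "89+").

```python
from datetime import date

# Fixed English month abbreviations, so output does not depend on locale.
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_month_year(d: date) -> str:
    """Coarsen a full date to Month-Year with a two-digit year, e.g. 'Mar-21'."""
    return f"{MONTH_ABBR[d.month - 1]}-{d.year % 100:02d}"


def top_code_age(age: int) -> str:
    """Replace ages of 90 or older with the '89+' label; other ages become text."""
    return "89+" if age >= 90 else str(age)
```

Because `top_code_age` returns strings, a recoded age column becomes text; analyses that need numeric ages would have to treat "89+" as a censored value.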

Technical Validation
Data assurance and quality checks were performed using R version 4.0.2 and Excel. Table 2 summarizes the variables added to the dataset during quality validation, including the QA/QC codes assigned to survey respondents. The criteria used to judge the quality of patient responses focused on isolating implausible and/or inconsistent responses. Respondents were flagged [quality_check_flag] as (1) "implausible" if they provided a height-in-feet value greater than 7 or a height-in-inches value greater than 12; (2) "inconsistent" if the self-reported date of birth (DOB) and current age were incongruent (defined as differing by more than 1 year); or (3) "inconclusive" if DOB or age in years was not provided.
On review, 5 respondents had given their full height in inches (e.g., 5.2 entered as feet and 62 entered as inches), and 2 had typed a decimal point before a self-reported age that otherwise matched the date of birth provided (e.g., born in 1997 and reported age .24). For these 7 cases, [quality_check_flag = 1] was changed to [quality_check_flag = 0] and the respondents were included in the final sample [final_sample = 1]; the raw data that triggered the flag were not changed. Patient age was computed from DOB and inserted as a new variable in the dataset [age_calculated]. Findings from these preparation and validation steps guided selection of a final sample, coded as [excluded_sample] = '0' in the released data; these are the 1,584 respondents described above as passing technical validation, for whom group-level demographics are provided.
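The three flagging rules can be expressed compactly. The checks were run in R and Excel, so the following Python sketch is only an illustration of the logic as described. Several details are assumptions not stated in the text: age is counted in completed years relative to a reference date (taken here to be the survey completion date), the flag is 1 whenever any rule fires, and when several rules apply the first match in the order implausible, inconclusive, inconsistent is reported. The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def age_in_years(dob: date, ref: date) -> int:
    """Completed years between a date of birth and a reference date."""
    years = ref.year - dob.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (ref.month, ref.day) < (dob.month, dob.day):
        years -= 1
    return years


def quality_check(
    height_ft: Optional[float],
    height_in: Optional[float],
    dob: Optional[date],
    reported_age: Optional[float],
    ref: date,
) -> Tuple[int, Optional[str]]:
    """Return (flag, reason) for one respondent under the rules in the text.

    Assumed encoding: flag = 1 if any rule fires, else 0; rules are checked
    in the order implausible -> inconclusive -> inconsistent.
    """
    # (1) Implausible height: feet > 7 or inches > 12.
    if (height_ft is not None and height_ft > 7) or (
        height_in is not None and height_in > 12
    ):
        return 1, "implausible"
    # (3) Inconclusive: DOB or self-reported age missing.
    if dob is None or reported_age is None:
        return 1, "inconclusive"
    # (2) Inconsistent: DOB-derived age and reported age differ by > 1 year.
    if abs(age_in_years(dob, ref) - reported_age) > 1:
        return 1, "inconsistent"
    return 0, None
```

With a reference date of March 1, 2021, a respondent born June 15, 1997 has a computed age of 23, so a reported age of 24 passes, while the mistyped ".24" (read as 0.24) is flagged "inconsistent" and a 62-inch entry is flagged "implausible". These are the kinds of cases the authors then reviewed by hand and reset to 0; the sketch does not automate that manual step.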