Introduction

Electronic healthcare records (EHR) have been widely adopted across different healthcare settings globally and have become an integral part of health infrastructure: saving time, improving communication and record keeping, and supporting learning1. As the scale and breadth of EHR data increases, so does its ability to fulfil secondary functions including quality improvement, product development, and research, contingent on appropriate regulation and transparency2. Example applications of EHR data include population-level epidemiological studies3,4,5, machine learning-based diagnostic assistants for clinicians6, screening for child maltreatment and family violence7, and detecting and tracking infectious disease outbreaks8,9.

However, conclusions rely on the reliability and accuracy of EHR data, which is not guaranteed10,11. Indeed, the use of EHR data beyond its original purpose (clinical care and billing) raises specific challenges. Data collection in the clinical environment is imperfect12 and often incomplete13; it may lack comparability or reproducibility14 or even simply be wrong15,16. Additionally, sensor-derived data such as vital signs are subject to intrinsic measurement errors arising from variation in calibration and accuracy, and from drift over time. Attempts to quantify or evaluate EHR data quality are limited, and even fewer studies have investigated causes of variability in quality17. Most studies have focused on checking the accuracy of clinical and diagnostic codes rather than numerical observations. In one notable exception, which evaluated the quality of vital sign data across multiple hospitals and EHR systems, completeness and correctness were skewed in favour of arriving patients, and fidelity was higher in wholly EHR-based systems than in systems combining paper and EHR18.

Digit preference in vital sign measurement is a well-recognized phenomenon; however, it is infrequently formally accounted for in healthcare delivery or in studies using EHR data. Additionally, systems to ensure vital signs are attributable, correct, and contemporaneously recorded can be limited. Many hospitals rely on manual transcription of readings into patient records rather than potentially more accurate automated systems, e.g. because of cost or problems with interoperability between measurement devices and EHR systems. Terminal digit preference for multiples of ten in blood pressure (BP) recordings has been shown to be extremely common19,20, to introduce systematic bias potentially affecting mortality21, and to produce inaccurate epidemiological results22. This phenomenon has also been observed in other vital sign measurements such as respiratory rate23, with attempts to rectify inaccuracy through continuous, automated monitoring24. Digit preference in vital signs may also affect derived values such as pulse pressure, and tools that depend on them, including early warning systems. However, in addition to standard digit preference, when reviewing data from our hospital group we noted an excess of a specific temperature measurement, 36.0 °C, that appeared unlikely to have arisen from rounding alone, warranting the further investigation we describe here.

We investigated observations of vital signs gathered over 3.5 years from inpatients at a large UK teaching hospital group. We assessed the frequency of recordings showing preferences for specific values (e.g. multiples of ten for BP, or temperature readings of 36.0 °C) as a marker of sub-optimal data quality. We extend prior work in this area19,20,21,22,23 by investigating associations between digit preference and patient factors, such as age and sex, and hospital factors, such as the specialty caring for a patient. The associations we describe provide insights into what may drive digit preference and may help healthcare institutions improve the quality of the data they collect and use for patient care.

Methods

Setting

We conducted a retrospective observational study at Oxford University Hospitals NHS Foundation Trust (OUH) in Oxfordshire, UK. OUH consists of four teaching hospitals with a total of 1000 beds: Hospital A (providing acute care, trauma, and neurosurgery services); Hospital B (elective cancer surgery, transplant, haematology, oncology); Hospital C (district hospital, acute medical services) and Hospital D (elective orthopaedics). OUH acts as a tertiary referral centre for the surrounding region, providing approximately 1 million patient contacts a year and serving a population of around 655,000.

Data

We used individual observations of vital signs conducted at OUH for adult inpatients (≥ 18y) between 01-January-2016 and 30-June-2019. The vital signs observed, with dates and times of collection, included respiratory rate (RR), heart rate (HR), tympanic temperature, systolic and diastolic BP (SBP and DBP) and oxygen saturations. Vital signs were included for all general wards, but those from intensive care units, operating theatre recovery areas, day case units, and OUH’s hospice were not included as these were collected using a separate system or in locations with a different care delivery focus.

Observations were collected by healthcare assistants and registered nurses using a semi-automated vital sign observation system across all 4 hospital sites. HR, SBP, DBP, and oxygen saturations were collected using an observation machine combining an electronic sphygmomanometer and pulse oximeter. RR was manually timed, typically expected to be recorded by counting the number of breaths over 60 s. Temperature was measured with a separate tympanic thermometer. All observations were then manually transcribed into a tablet computer attached to the same stand; this was usually done at the bedside, as the tablet computer allowed the patient’s wristband to be scanned to add results to their record. The tablet computer automatically uploaded results into the EHR in real time. Although the tablet computer and observation equipment were co-located on the same mobile stand, there was no automated check that the observations documented had been performed or matched those measured. We do not believe that any of the measurement devices show any intrinsic value preference. All devices produce an error rather than a default reading if measurement is unsuccessful. Supplemental oxygen devices and alertness (alert, responsive to voice, pain or unresponsive, AVPU) were also recorded; however, these non-numerical measurements are not considered further here. Additional data were obtained: hospital-level data (the hospital where the observation was made and the specialty managing the patient) and patient data (age, sex, ethnicity, index of multiple deprivation (IMD) score at home address, and Charlson comorbidity score).

Statistical analysis

Several approaches have been previously described for identifying and quantifying digit preference25. One approach jointly estimates a flexible but smooth underlying distribution and models rounding from adjacent values to the nearest number showing digit preference (e.g. from 9 or 11 to 10)26,27. Extensions of this approach allow for rounding of groups of adjacent values (e.g. to the nearest 10)28. However, here we also wanted to account for a phenomenon in temperature recordings that went beyond simple rounding, where a subset of all observations was set to 36.0 °C. We therefore used a simple maximum likelihood-based estimator to jointly estimate the underlying distribution of temperature, HR, SBP, DBP, and RR measurements, and the proportion of observations affected by digit preference. Oxygen saturation measurements had only a limited dynamic range and no clear evidence of digit preference, and so were not studied further. For all other vital signs, we assume that each vital sign follows an underlying distribution; here we fit both normal and gamma distributions. This leads to the following expression for the statistical likelihood of the observed data, given the parameters governing the underlying distribution and any digit preference (i.e. the probability of digit preference and the mean/standard deviation or shape/rate):

Pr(observation subject to digit preference) * Pr(true value lies in the interval of the source distribution that would be rounded to the observed value) + Pr(observation not subject to digit preference) * Pr(true value lies in the interval implied by the precision at which values are reported).
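Written out for n observations, the verbal likelihood above can be expressed as follows. The notation is ours, not the authors': π is the probability of digit preference, θ the parameters of the underlying distribution (mean/standard deviation or shape/rate) with cumulative distribution function F_θ (pnorm or pgamma in R), 𝓡 the set of values that digit preference can produce (e.g. multiples of 10), [l_i, u_i) the interval of underlying values that would be rounded to x_i, and δ the reporting precision (1 unit for BP, HR and RR; 0.1 °C for temperature).

```latex
\mathcal{L}(\pi, \theta)
  = \prod_{i=1}^{n} \Big\{ \pi \, \mathbb{1}[x_i \in \mathcal{R}]
      \big[ F_\theta(u_i) - F_\theta(l_i) \big]
    + (1 - \pi) \big[ F_\theta(x_i + \delta/2) - F_\theta(x_i - \delta/2) \big] \Big\}
```

Under this reading, the BP example uses [l_i, u_i) = [114.5, 124.5) and δ = 1; for temperature, 𝓡 = {36.0} and [l_i, u_i) = (−∞, ∞), so the first bracket equals 1; and for RR the first term is replaced by two such terms, one for multiples of 4 and one for multiples of 2.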

For BP and HR recordings, which are initially reported by the measurement device to the nearest whole number, we estimate the extent of rounding to the nearest 10. For example, for a BP reading of 120 mmHg the likelihood becomes:

Pr(observation rounded) * Pr(observation from the interval [114.5, 124.5)) + Pr(observation not rounded)*Pr(observation from the interval [119.5, 120.5)).
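The BP/HR likelihood can be sketched per observation as below. This is a minimal Python illustration, not the authors' R code: `norm_cdf` stands in for R's pnorm, half-up rounding of whole numbers to the nearest 10 is assumed (which reproduces the [114.5, 124.5) interval above), and the parameter values in the usage line are hypothetical.

```python
import math


def norm_cdf(x, mu, sigma):
    """Normal CDF via math.erf (a stand-in for R's pnorm)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """Probability that the underlying value lies in [lo, hi)."""
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)


def rounding_interval(x, m):
    """Underlying interval whose whole-number readings round (half up) to the
    multiple of m, x; e.g. x=120, m=10 gives [114.5, 124.5)."""
    return x - m / 2 - 0.5, x + m / 2 - 0.5


def bp_hr_likelihood(x, p_round, mu, sigma, m=10):
    """Likelihood of one whole-number BP or HR reading x."""
    # Reading reported at the device's precision: true value in [x - 0.5, x + 0.5)
    lik = (1 - p_round) * interval_prob(x - 0.5, x + 0.5, mu, sigma)
    if x % m == 0:
        # A multiple of m may also be a rounded value from the wider interval;
        # for any other reading the rounding term is zero
        lo, hi = rounding_interval(x, m)
        lik += p_round * interval_prob(lo, hi, mu, sigma)
    return lik


# Hypothetical systolic BP parameters, for illustration only
print(bp_hr_likelihood(120, 0.022, 130, 20), bp_hr_likelihood(121, 0.022, 130, 20))
```

Because the rounding intervals tile the real line, the likelihoods of all possible readings still sum to one; the extra mass at multiples of 10 is taken from their neighbours, which is the depletion effect described in the next paragraph.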

When the HR or BP is not a multiple of ten, the probability of rounding is set to zero, and only the second term of the likelihood applies. As this term includes the probability that the observation is not rounded, it accounts for the fact that rounding depletes the frequency of observed values, relative to the underlying distribution, at values that are rounded up or down. The most common way rounding occurs in RR is by timing breaths over only 15 or 30 s (rather than 60 s), and then multiplying by 4 or 2 respectively to report breaths per minute. We therefore simultaneously estimated the extent of rounding leading to multiples of 4 and 2 for RR. The form of the likelihood means that the estimated proportion of observations subject to rounding includes observations where the true value is already a multiple of 4 or 2; as such, we estimate the total proportion of RR observations that might have been measured by timing breaths over 15 or 30 s respectively. Similarly, the form of the likelihood for HR and BP rounding means we estimate the total extent of rounding behaviour, including the approximately 1 in 10 instances where the true value and the rounded value are the same.
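A sketch of the RR likelihood with two simultaneous rounding mechanisms follows. It is an assumption-laden Python illustration: the text does not give the exact intervals used, so a 15 s or 30 s count is modelled here as half-up rounding of the underlying rate to the nearest multiple of 4 or 2, and the parameters in the usage line are hypothetical.

```python
import math


def norm_cdf(x, mu, sigma):
    """Normal CDF via math.erf (a stand-in for R's pnorm)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """Probability that the underlying value lies in [lo, hi)."""
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)


def rr_likelihood(x, p4, p2, mu, sigma):
    """Likelihood of one whole-number RR reading x (breaths/min).

    p4, p2: probabilities that the reading came from a 15 s or 30 s count.
    ASSUMPTION: such counts behave like half-up rounding to a multiple of 4 or 2.
    """
    # Full 60 s count: true value in [x - 0.5, x + 0.5)
    lik = (1 - p4 - p2) * interval_prob(x - 0.5, x + 0.5, mu, sigma)
    for p, m in ((p4, 4), (p2, 2)):
        if x % m == 0:
            # Multiples of 4 are also multiples of 2, so both terms can apply
            lik += p * interval_prob(x - m / 2 - 0.5, x + m / 2 - 0.5, mu, sigma)
    return lik


# Hypothetical parameters, for illustration only
print(rr_likelihood(16, 0.025, 0.225, 18, 3), rr_likelihood(17, 0.025, 0.225, 18, 3))
```

Treating the three mechanisms as a mixture keeps the total probability over all readings equal to one, so the two rounding proportions can be estimated jointly with the underlying distribution.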

For temperature readings we assume that any true observation can lead to a documented recording of 36.0 °C, as our hypothesis is that an excess of these readings occurs when the temperature is not actually measured but simply documented as 36.0 °C instead, such that the likelihood becomes:

Pr(observation subject to preference for 36.0 °C) + Pr(observation not subject to preference for 36.0 °C) * Pr(observation from the observed interval of the source distribution).

For temperature recordings we make the simplifying assumption that any observation that is not 36.0 °C is not subject to digit preference.
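The temperature likelihood differs from the rounding models because the preference term does not depend on where the true value lies. The Python sketch below illustrates this under stated assumptions: readings are recorded to 0.1 °C, a normal underlying distribution is used, and the parameter values are hypothetical.

```python
import math


def norm_cdf(x, mu, sigma):
    """Normal CDF via math.erf (a stand-in for R's pnorm)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """Probability that the underlying value lies in [lo, hi)."""
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)


def temp_likelihood(x, p36, mu, sigma, target=36.0, precision=0.1):
    """Likelihood of one temperature reading x (°C), recorded to `precision`."""
    # Genuinely measured temperature: true value in [x - 0.05, x + 0.05)
    lik = (1 - p36) * interval_prob(x - precision / 2, x + precision / 2, mu, sigma)
    if abs(x - target) < 1e-9:
        # Any underlying temperature can be documented as 36.0, so the
        # preference term covers the whole distribution and equals p36
        lik += p36
    return lik


# Hypothetical parameters, for illustration only
print(temp_likelihood(36.0, 0.113, 36.6, 0.5), temp_likelihood(36.4, 0.113, 36.6, 0.5))
```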

Maximum likelihood estimates were obtained using R version 4.2 with the pnorm, pgamma and optim functions (see Supplement for code). Confidence intervals were estimated by non-parametric bootstrap sampling with 1000 iterations; for computational efficiency, only 10,000 observations were included in each iteration. The accuracy of the code was tested through simulation prior to use.
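A simulation check in the spirit of the one described above might look like the following. This is a simplified Python sketch, not the authors' R procedure: the underlying mean and standard deviation are held fixed at their true values (the authors estimated them jointly with optim), the proportion p36 is found by golden-section search on the log-likelihood (which is concave in p36), and the bootstrap uses 200 rather than 1000 iterations for speed. The simulation parameters and seed are hypothetical. With the mean and standard deviation fixed, the MLE also has a closed form, which gives an independent check on the numerical fit.

```python
import math
import random


def norm_cdf(x, mu, sigma):
    """Normal CDF via math.erf (plays the role of R's pnorm here)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def is_36(x):
    return abs(x - 36.0) < 1e-9


def simulate(n, p36, mu, sigma, rng):
    """n temperatures recorded to 0.1 °C; with probability p36 the reading is
    documented as 36.0 regardless of the underlying value."""
    return [36.0 if rng.random() < p36 else round(rng.gauss(mu, sigma), 1)
            for _ in range(n)]


def fit_p36(temps, mu, sigma, tol=1e-7):
    """Numerical MLE of p36 (mu, sigma fixed) by golden-section search."""
    flags = [is_36(x) for x in temps]
    q = [norm_cdf(x + 0.05, mu, sigma) - norm_cdf(x - 0.05, mu, sigma) for x in temps]

    def log_lik(p):
        return sum(math.log((p if f else 0.0) + (1 - p) * qi) for f, qi in zip(flags, q))

    invphi = (math.sqrt(5) - 1) / 2
    a, b = 0.0, 0.5  # search bracket; the endpoints themselves are never evaluated
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = log_lik(c), log_lik(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = log_lik(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = log_lik(d)
    return (a + b) / 2


def closed_form_p36(temps, mu, sigma):
    """With mu, sigma fixed the MLE is (n0/n - p0) / (1 - p0), where n0 counts
    readings of 36.0 and p0 is the underlying probability of a true 36.0."""
    p0 = norm_cdf(36.05, mu, sigma) - norm_cdf(35.95, mu, sigma)
    share = sum(is_36(x) for x in temps) / len(temps)
    return (share - p0) / (1 - p0)


def bootstrap_ci(temps, mu, sigma, n_boot, size, rng):
    """Percentile 95% CI, resampling `size` readings per iteration."""
    est = sorted(closed_form_p36(rng.choices(temps, k=size), mu, sigma)
                 for _ in range(n_boot))
    return est[int(0.025 * n_boot)], est[int(0.975 * n_boot) - 1]


rng = random.Random(2016)
temps = simulate(10_000, 0.113, 36.6, 0.5, rng)
p_num = fit_p36(temps, 36.6, 0.5)
p_exact = closed_form_p36(temps, 36.6, 0.5)
lo, hi = bootstrap_ci(temps, 36.6, 0.5, 200, 10_000, rng)
print(f"numerical {p_num:.4f}, closed form {p_exact:.4f}, 95% CI {lo:.4f} to {hi:.4f}")
```

Agreement between the numerical and closed-form estimates, and recovery of the simulated proportion, is the kind of check that would flag an error in the likelihood code before it is applied to real data.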

We used multivariable logistic regression to investigate associations between temperature recordings of 36.0 °C and several factors potentially driving value preferences. Analyses were restricted to patients with complete data, and to complete vital sign sets (i.e. all of temperature, HR, RR, SBP, DBP, and oxygen saturations recorded). We used natural cubic splines to account for non-linear relationships with continuous variables (allowing up to five knots at default positions, and selecting the final number of knots by minimising the Bayesian Information Criterion, BIC). To avoid undue influence of outlying values, continuous variables were truncated at the 1st and 99th percentiles. Pairwise interactions between model main effects were included where this improved model fit based on BIC. We used cluster-robust standard errors to account for repeated measurements obtained from the same patient.

To investigate whether associations were specific to temperature or applied to vital signs more widely, we also refitted the same model (i.e. with the same spline terms and interactions) with BP digit preference as the outcome, regarding measurements where the SBP and DBP both ended in zero as indicative of possible digit preference. We used this combined measure across both SBP and DBP as it is likely to be the most enriched for digit preference. We fitted the same models for HR and RR digit preference, regarding HR readings ending in zero and RR readings that were multiples of two, respectively, as showing possible digit preference.
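The binary outcomes for these regression models can be derived from each complete vital sign set as sketched below. The function name is hypothetical, and the sketch assumes BP, HR and RR are stored as whole numbers and temperature as a float recorded to 0.1 °C.

```python
def value_preference_flags(temp, sbp, dbp, hr, rr):
    """Hypothetical helper: outcome flags for one complete set of vital signs,
    following the definitions in the text."""
    return {
        "temp_36": abs(temp - 36.0) < 1e-9,               # temperature recorded as 36.0 °C
        "bp_both_zero": sbp % 10 == 0 and dbp % 10 == 0,  # SBP and DBP both end in zero
        "hr_zero": hr % 10 == 0,                          # HR ends in zero
        "rr_even": rr % 2 == 0,                           # RR is a multiple of two
    }


print(value_preference_flags(36.0, 120, 80, 72, 18))
```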

We also investigated whether the presence of abnormal previous readings affected subsequent digit preference. Temperatures of ≤ 35.5 °C or ≥ 37.5 °C, SBP readings of > 160 or < 90 mmHg, DBP readings of > 100 or < 60 mmHg, HR readings of < 50 or > 120 beats/min, and RR readings of < 10 or > 24 breaths/min were arbitrarily considered abnormal. For each observation with a prior observation from the same patient within the preceding 36 h, we selected the most recent prior observation for comparison. A look-back period of up to 36 h was allowed to capture vital signs measured just once a day, but at different times; where vital signs were measured more frequently, only the most recent was considered. We then refitted the regression models above, including as a covariate whether the prior reading of the same vital sign (temperature, HR, SBP, DBP, or RR) had been abnormal.
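The look-back step can be sketched for one patient and one vital sign as follows. The thresholds are taken from the text; the function name, the data layout of (time, value) pairs, and the example readings are hypothetical. For BP, the text treats an abnormal SBP or DBP as an abnormal prior reading, so the two flags would be combined with a logical OR.

```python
from datetime import datetime, timedelta

# Abnormal thresholds as stated in the text (HR in beats/min, RR in breaths/min)
ABNORMAL = {
    "temp": lambda v: v <= 35.5 or v >= 37.5,
    "sbp": lambda v: v > 160 or v < 90,
    "dbp": lambda v: v > 100 or v < 60,
    "hr": lambda v: v < 50 or v > 120,
    "rr": lambda v: v < 10 or v > 24,
}


def prior_abnormal_flags(observations, is_abnormal, lookback=timedelta(hours=36)):
    """Hypothetical helper: for one patient's (time, value) observations of a
    single vital sign, return, in time order, whether the most recent prior
    observation within `lookback` was abnormal (None if there is none)."""
    obs = sorted(observations, key=lambda tv: tv[0])
    flags = []
    for i, (t, _) in enumerate(obs):
        if i > 0 and t - obs[i - 1][0] <= lookback:
            flags.append(is_abnormal(obs[i - 1][1]))
        else:
            flags.append(None)
    return flags


temps = [
    (datetime(2017, 3, 1, 8), 37.8),   # abnormal (>= 37.5 °C)
    (datetime(2017, 3, 1, 20), 36.0),  # 12 h later: prior reading abnormal
    (datetime(2017, 3, 3, 20), 36.4),  # 48 h later: outside the look-back
    (datetime(2017, 3, 4, 8), 36.5),   # 12 h later: prior reading normal
]
print(prior_abnormal_flags(temps, ABNORMAL["temp"]))
```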

Regression analyses were conducted using R, version 4.2.

Ethical approval

Deidentified data were obtained from the Infections in Oxfordshire Research Database which has approvals from the National Research Ethics Service South Central – Oxford C Research Ethics Committee (19/SC/0403), the Health Research Authority and the national Confidentiality Advisory Group (19/CAG/0144), including provision for use of pseudonymised routinely collected data without individual patient consent. Patients who choose to opt out of their data being used in research are not included in the study. The study was carried out in accordance with all relevant guidelines and regulations.

Results

Between 01-January-2016 and 30-June-2019, a total of 5,007,650 sets of vital signs were recorded. Of these, 469,904 (9.4%) did not include temperature, 395,445 (7.9%) were missing SBP and/or DBP, 403,364 (8.1%) were missing RR, 353,083 (7.1%) were missing HR, and 326,474 (6.5%) were missing oxygen saturation recordings. Rates of missing data were similar across different patient groups, but missing data were more common near the start of a hospital admission and at times of day when vital signs were not routinely measured. Missing data were more common at Hospital D (elective orthopaedics) and in some specialties, e.g. obstetrics and gynaecology and cardiology (Table S1).

Restricting to complete sets of vital signs left 4,375,654 (87.4%) records in the final dataset from 135,173 patients. The median (IQR) patient age was 61 (42–76) years, 70,515 (52.2%) patients were female, 100,655 (74.5%) were of white ethnicity and 27,453 (20.3%) were of unstated or unknown ethnicity. The most common specialties recording vital signs were general surgery (882,956, 20.2%), acute and emergency medicine (660,530, 15.1%), and trauma and orthopaedics (597,834, 13.7%).

Prevalence of value preferences in vital signs readings

Compared with the overall distribution of temperature values, there was an excess of temperature readings of 36.0 °C (Fig. 1); readings of 36.0 °C accounted for 15.0% (658,124/4,375,654) of all values. The same pattern of excess readings of 36.0 °C was seen across all four hospitals (Fig. S1). Assuming true temperature readings followed a normal distribution (Table 1, Fig. S2), an estimated 11.3% (95% CI 10.6–12.1%) of observations were inappropriately recorded as 36.0 °C instead of the true value. Similar estimates were obtained assuming an alternative gamma distribution for temperature readings (Table S2, Fig. S2).

Figure 1

Observed distribution of temperature, systolic blood pressure (SBP), diastolic blood pressure (DBP), respiratory rate, oxygen saturation, and heart rate recordings. Readings showing possible value preferences are shown in orange/red. For SBP and DBP, readings where the SBP and DBP both end in zero are shown in red, readings where only the SBP or DBP respectively end in zero are shown in orange. Values below the 1st percentile or above the 99th percentile are omitted for visualisation purposes.

Table 1 Estimated value preference proportions and underlying distributions for temperature, blood pressure, heart rate, and respiratory rate.

Approximately 1% of all BP readings would be expected to have both an SBP and a DBP ending in zero by chance; however, 2.3% (99,209) of readings showed this pattern. Assuming SBP and DBP both followed a normal distribution, an estimated 2.2% (95% CI 1.4–2.8%) and 2.0% (1.3–5.1%) of readings respectively were rounded to the nearest 10 mmHg. Digit preference also occurred for HR readings, with 12.1% (531,219) ending in zero and an estimated 2.4% (1.7–3.1%) of readings rounded to the nearest ten. RR readings were also more likely to be multiples of 2 (62.0%, 2,711,524) or 4 (29.1%, 1,273,972) than expected by chance, with digit preference to the nearest multiple of 2 or 4 affecting an estimated 22.5% (22.2–24.8%) and 2.5% (< 0.1–3.5%) of readings respectively. Estimates were similar if SBP, DBP, HR, and RR were assumed to be gamma distributed, with the exception that rounding of RR to the nearest multiple of 4 was found to be less common, < 0.1% (< 0.1–0.1%) (Table S2). There was no clear evidence of value preference in oxygen saturation readings (Fig. 1).
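The comparison with chance in the paragraph above is simple arithmetic on the reported counts. In this sketch the chance expectations assume uniformly distributed final digits, with SBP and DBP final digits independent of each other; that assumption is ours, used only to reproduce the approximate figures quoted.

```python
N = 4_375_654  # complete sets of vital signs
counts = {
    "bp_both_end_in_0": 99_209,
    "hr_ends_in_0": 531_219,
    "rr_multiple_of_2": 2_711_524,
    "rr_multiple_of_4": 1_273_972,
}
# ASSUMPTION: uniform final digits; SBP and DBP final digits independent
by_chance = {
    "bp_both_end_in_0": 0.1 * 0.1,
    "hr_ends_in_0": 0.1,
    "rr_multiple_of_2": 0.5,
    "rr_multiple_of_4": 0.25,
}

observed = {key: count / N for key, count in counts.items()}
for key, share in observed.items():
    print(f"{key}: observed {share:.1%}, ~{by_chance[key]:.0%} by chance, "
          f"ratio {share / by_chance[key]:.2f}")
```

Note that these raw excesses are not the same quantity as the estimated rounding proportions, which also count rounding events where the true value was already a multiple.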

Value preference associations with patient demographics

Associations with value preferences for each vital sign were investigated using multivariable models (Tables 2, 3 and Figs. 2, 3). For 41,350 (0.9%) records no deprivation score was documented; these records were excluded. Complete data were available for all other hospital/patient variables. Temperature was independently more likely to be recorded as 36.0 °C with increasing age above 50 years, and BP most likely to be recorded with SBP and DBP both ending in zero for those above 80 years (Fig. 2A). Conversely, RR was less likely to be a multiple of 2 as age increased, and HR value preference was greatest in younger and older adults. Male patients were more likely than female patients to have readings with value preferences across all vital signs, with differences by sex increasing for temperature and HR as performance improved overall with passing calendar time (Fig. 2B). Temperature value preference was slightly less common in patients from less deprived areas (aOR per 10 unit change in deprivation percentile = 0.99 [0.99–0.99, higher percentiles are less deprived]), but with no evidence of a difference in other vital signs. There was no evidence for consistent differences in value preference by ethnicity; however, temperatures of 36.0 °C were more commonly recorded in patients of Asian ethnicity (aOR vs. white = 1.08 [1.03–1.13]) and BPs ending in zero were more common in those of unstated or unknown ethnicity (aOR vs. white = 1.13 [1.07–1.19]). Patients with higher Charlson scores were slightly more likely to have recorded temperatures of 36.0 °C (aOR per 5 unit increase = 1.02 [1.02–1.03]), but less likely to have BP ending in zero (aOR per 5 unit increase = 0.93 [0.92–0.94]), with only small changes by Charlson score in HR or RR value preferences.

Table 2 Value preferences in temperature, blood pressure, respiratory rate and heart rate and descriptive data for associated factors.
Table 3 Multivariable relationships between value preferences in temperature, blood pressure, respiratory rate and heart rate and associated factors.
Figure 2

Multivariable associations between age, study year, sex, and time of day and vital sign value preferences. Model predictions from a multivariable model are shown with all other factors set to the reference category or median value as shown in Table 3 and Fig. 3. The shaded ribbon indicates the 95% confidence interval. BP blood pressure, RR respiratory rate, HR heart rate, M male, F female. The minimum value on each y-axis represents the approximate expected value without any value preference.

Figure 3

Relationship between specialty and vital sign value preferences. ENT ear, nose and throat, plastics plastic surgery, maxfax maxillofacial surgery, BP blood pressure, RR respiratory rate, HR heart rate. Values plotted are provided in Table S3.

Changes over calendar time, hour of day, and by hospital and time in admission

The frequency of value preferences for all vital signs decreased during the study (Fig. 2B). Temperatures were most likely to be recorded as 36.0 °C at around 6-8am, i.e. at the first routine set of observations performed per day in most patients, whereas BP was most likely to end in zero during the late evening, and to a lesser extent between 6 and 8am. Relatively little change in HR or RR value preference was seen by time of day (Fig. 2C). Value preference for all vital signs became slightly more common the greater the prior length of stay (e.g. for temperature, aOR per 7 day increase in prior length of stay = 1.03 [95% CI 1.03–1.03]).

Differences were also seen between hospitals in temperature value preferences. After adjusting for differences in the specialties present and all other factors, compared to hospital A (acute care, trauma, and neurosurgery), value preferences in temperature measurement were more common in hospital C (district hospital) and less common in hospital B (elective cancer surgery, transplant, haematology, oncology) and particularly hospital D (elective orthopaedics) (Table 3). RR value preference was also more common in hospital C, as well as D, whereas BP value preference was more common in hospital B. Relatively little difference between hospitals was seen in HR value preference.

Variation by specialty

Value preferences in temperature, BP, and RR readings varied by specialty, whereas differences in HR value preference were more limited (Fig. 3). Compared to acute and emergency medicine, value preferences for temperature and BP were less common in most surgical specialties and medical sub-specialties, but RR value preference was more common in several surgical specialties and BP value preferences substantially more common in haematology and oncology. Value preference in temperature readings was least common in haematology and oncology. Both nephrology and trauma and orthopaedics exhibited less value preference across all vital signs than acute and emergency medicine.

Effect of previous abnormal measurements

A total of 4,125,851 sets of observations had a previous measurement from the same patient within the preceding 36 h. Temperature readings of 36.0 °C were less frequent following an abnormal prior temperature measurement, 20,224/354,837 (5.7%), than following a normal prior measurement, 601,092/3,771,014 (16.0%). Given that it may take time for temperature to normalise, those with a previously abnormal temperature would not be expected to have the same proportion of true temperatures of 36.0 °C as the overall hospital population (estimated as 4.0% from the normal distribution fitted above). However, even allowing for this, preference for recording a temperature of 36.0 °C was likely more common following a normal prior measurement. Adjusting for the same factors as in the main analysis, a previous abnormal temperature independently reduced the odds of a recording of 36.0 °C (aOR = 0.34 [95% CI 0.34–0.35]). Similarly, 25,091/1,307,048 (1.9%) of BP observations had SBP and DBP both ending in zero after an abnormal SBP or DBP, compared with 69,773/2,818,803 (2.5%) without a prior abnormal reading (aOR = 0.86 [0.84–0.88]). However, the opposite pattern was seen for HR and RR, where abnormal previous readings were associated with increased subsequent digit preference (aOR = 1.22 [1.19–1.24] and aOR = 1.12 [1.10–1.14] respectively).

Discussion

In this analysis of records from a large UK teaching hospital group, we show preferences for specific values or digits in vital sign records in EHRs. Our findings have implications for patient management, quality improvement initiatives, and research conducted using EHRs. Three potential mechanisms may underlie the value preferences seen. First, HR and BP measurements exhibit classical digit preference, with rounding occurring during human transcription of readings. Second, value preference in RR most likely arose during the measurement process, with RR values that are multiples of two arising from counting breaths over 30 s and doubling the count to give breaths per minute. Third, value preferences in temperature readings occurred due to preference for a specific value, 36.0 °C.

A key question is whether temperature value preferences simply indicate convenience rounding when transcribing values, or whether they are also a marker of observations that were not fully performed. Potentially favouring the latter, we observed differences in the relative frequency of value preferences. We found that around 2% of BP and HR measurements showed evidence of rounding to the nearest 10. In contrast, over 5 times as many temperature readings were estimated to be affected by value preference: an estimated excess of 11% of all temperature recordings were recorded as 36.0 °C. One alternative explanation to rounding is that these recordings were documented when in fact no temperature was measured, e.g. a normal temperature was presumed where the thermometer, which was separate from the rest of the vital sign measuring equipment, was missing or not working, or where patients appeared well and a normal measurement was assumed, as has also been hypothesised for respiratory rate measurement29. It is also possible that implausibly low readings, e.g. when thermometers were mis-calibrated, were recorded as 36.0 °C, but this is unlikely to have been common. Although over 20% of RR recordings were estimated to be rounded to the nearest two, this most likely reflects the measurement process described above rather than digit preference per se. Automated measurement of RR may increase accuracy depending on the setting and device used; in some instances automated RR measurements have been shown to correlate better with outcomes30, but not in others29.

We found differences between hospital specialties, even after adjustment for other factors. Generally, surgical specialities recorded vital signs with greater precision than acute and emergency medicine. However, the prevalence of value preferences also potentially reflects the culture within a speciality, where greater importance may be placed on measured values, e.g. in nephrology, or on specific vital signs, e.g. temperature in neutropenic and other immunosuppressed patients in haematology and oncology, or BP in cardiology. We also found marked differences in temperature measurement between the four hospitals in the organisation, even following adjustment for the specialties present. This may reflect systemic factors; e.g. staffing levels and the importance placed on vital signs may vary by setting. In higher acuity settings, reliance on vital signs for treatment escalation could increase vital sign fidelity compared with less acute settings focused on rehabilitation. Although patients are admitted to acute medicine as an emergency, the increased digit preference seen in this specialty may reflect that, for many longer-staying patients, rehabilitation and provision of social care are the dominant issues for much of each admission. We also found that normal prior temperature and BP measurements were more likely to be followed by digit preference in subsequent observations, with previous abnormal measurements being associated with greater accuracy in subsequent observations. However, the opposite pattern was seen for HR and RR, possibly because three-digit heart rates are more likely to be rounded and more rapid respiratory rates are more difficult to count precisely.

Older patients and male patients were more likely to have temperature and BP recordings with value preferences, whereas RR value preferences were more common in younger patients. Further work is required to better understand the reasons for this. For example, variation in temperature recording by age may reflect differences in the acuity of patients and associated culture around vital sign measurement, the relative importance placed on curative treatment vs. patient comfort, and physical barriers to temperature measurement including patient agitation. There were no systematic differences by ethnicity across the vital signs studied.

Changes over time suggest institution-wide improvement is possible, with increased precision of all vital signs seen during the study. The study builds on previous studies of vital sign recording quality31, and highlights that institutions may wish to monitor vital sign recording to identify areas of the hospital or patient groups where specific interventions to improve quality may be required.

Multiple variables representing the timing of measurements were investigated. Routine morning temperature measurements, e.g. 6–7am, were most likely to be affected by digit preference. BP measurements were also more likely to be rounded in the morning, as well as in the late evening. Vital sign precision was greatest around the time of hospital admission, with value preferences increasing as length of stay increased, likely reflecting that patients are most unwell when first presenting to hospital and so vital signs are performed and recorded carefully. However, it was also more common to have one or more vital signs missing after short prior lengths of stay, e.g. < 1 day, possibly reflecting different approaches to short-stay patients, or rechecking of specific vital signs in some acute settings. There was less temporal variation in digit preference in RR and HR measurements.

Digit preference is a well-described phenomenon19,20. However, particularly for temperature measurement, the question that arises from our findings is: if a vital sign is more difficult to measure for some reason, why does current culture potentially favour documenting an inaccurate reading instead of leaving it missing, especially within a system where safety is prioritised? There may be explicit or implied pressure to always record a complete set of vital signs but less scrutiny of their accuracy32 (although ~ 13% of observations in our study were excluded because one or more vital signs were missing), or it may be that recording an observation as unavailable is more onerous and requires entering a justification. There may also be disincentives to recording abnormal values if this requires escalation of care and additional action. Related to this point, value preferences may affect early warning scores such as NEWS233; e.g. a preference for temperatures of 36.0 °C may score a point that would not otherwise be scored for temperatures ranging from 36.1 to 38.0 °C.
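The effect on early warning scores can be illustrated with the NEWS2 temperature sub-score. This sketch encodes the published NEWS2 temperature bands as we understand them (≤ 35.0 °C: 3; 35.1–36.0 °C: 1; 36.1–38.0 °C: 0; 38.1–39.0 °C: 1; ≥ 39.1 °C: 2) and assumes readings recorded to 0.1 °C; it should be checked against the official chart before any real use.

```python
def news2_temperature_score(temp_c):
    """NEWS2 temperature sub-score for a reading recorded to 0.1 °C."""
    if temp_c <= 35.0:
        return 3
    if temp_c <= 36.0:  # 35.1-36.0 °C
        return 1
    if temp_c <= 38.0:  # 36.1-38.0 °C
        return 0
    if temp_c <= 39.0:  # 38.1-39.0 °C
        return 1
    return 2            # >= 39.1 °C


# A true 36.4 °C documented as 36.0 °C adds a point to the aggregate score
print(news2_temperature_score(36.4), news2_temperature_score(36.0))
```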

Limitations of our study include that it is based on a single organisation and data entry system for recording vital signs. Further studies are required to confirm whether our findings are replicated more widely. As this was a retrospective study, we were not able to identify the reasons behind missing or potentially inaccurate readings; future investigations could consider both practical barriers such as malfunctioning devices and behavioural factors such as perceptions around the importance of vital signs. We did not investigate more granular variation in vital sign recording by hospital ward or individual staff member, the latter because the identity of the healthcare worker recording the vital signs was not available in our data extract. We also did not investigate the downstream consequences of vital sign values (as has been done elsewhere to create early warning systems) or the consequences of value preferences. The latter could be examined in future work, e.g. considering associations with length of stay or mortality, although care would be required to avoid reverse causation where delays in discharge or a more palliative focus change value preferences.

There are also several technical limitations. Our model for estimating the proportion of vital signs affected by value preference is relatively simple. For BP and HR, we only consider rounding to the nearest 10, which was the dominant form in our data, but rounding to the nearest 5 or 2 also occurs. However, our main focus here is not the absolute quantification of value preference, but rather to explore its potential drivers and to highlight it as an issue. Our estimation framework could be extended to consider multiple types of rounding, e.g. by expanding the likelihood to simultaneously consider rounding to the nearest 2, 5 and 10. We also assume that all underlying temperatures are equally likely to be recorded as 36.0 °C; in reality, external signs of a fever, which is often accompanied by other abnormal vital signs, may prompt more accurate recording of the temperature. The underlying distributions chosen fit HR and BP well, particularly the gamma distribution. For temperature the fit is poorer, but it is a reasonable approximation, and a better-fitting distribution is unlikely to explain the substantial excess of recordings of 36.0 °C. There are relatively few unique, commonly recorded RR values, so the fitted continuous distribution is a poorer approximation. The logistic regression models fitted include both true values and recordings affected by value preferences as outcomes. Therefore, for temperature, where preference for a single specific value is common, the resulting associations may in part reflect genuine temperatures of 36.0 °C as well as value preferences. For all other vital signs, value preferences occur throughout the full range of measurements, and so the logistic models are still able to estimate factors associated with a relative increase in value preferences robustly.
Finally, inaccuracies not leading to value preferences are not assessed in the current analysis, but also need to be considered when using EHR data, e.g. miscalibration of devices or measurement error arising from failure of tympanic thermometers to accurately record low temperatures.

Our study provides evidence that vital sign measurement displays value preference to such a degree that it could affect conclusions based on unadjusted vital sign data, in both clinical and research settings. We show that hospital, speciality, admission stage and patient age all have important impacts on the accuracy of vital signs. Changes over time in our hospital suggest improvements in accuracy are possible. Ultimately, fully connected systems that automatically measure and/or record vital signs into patient records are likely to address many of the issues identified; however, these are only likely to be implemented if this is prioritised by device manufacturers and healthcare providers. Work with institutions and individuals is required to fully elucidate the mechanisms behind value preferences at system, patient and clinician levels. Greater consensus on what health information is essential and what level of accuracy is required, across different settings, would help define benchmarks for acceptable performance, which could potentially be monitored automatically. In the meantime, clinicians and researchers need to be aware that vital signs may not always be accurately documented, and to make appropriate allowances and adjustments for this in delivering care to patients and in analyses using these variables as outcomes or exposures.