Date: October 28, 2024

Reference: Verma et al. Clinical evaluation of a machine learning–based early warning system for patient deterioration. CMAJ September 2024 

Guest Skeptic: Michael Page is currently the Director of Artificial Intelligence (AI) Commercialization at Unity Health Toronto. He leads an AI team that aims to improve patient outcomes and healthcare system efficiency. Michael is a sessional lecturer within the Ivey Business School’s Executive MBA program, where he teaches a Technology and Innovation course. Previously, he held senior roles at the Vector Institute for AI and the University of Toronto. Michael has over 15 years of experience building and leading corporate strategies for innovation, social impact, and research and development for various organizations.

Case: The Chief of Emergency Medicine (EM) at a large urban hospital recently approached the AI Committee at Unity Health, intrigued by the CMAJ article describing the apparent success of CHARTWatch in detecting early signs of patient deterioration. Their hospital has struggled with a growing number of adverse events that often occur without warning. With emergency department (ED) volumes rising, administrators are eager to explore AI-driven solutions to improve patient safety and reduce staff burnout. They want to know how CHARTWatch integrates with electronic health records (EHRs), whether it can adapt to their ED patient population, and how clinicians respond to using the tool. The Chief of EM wants to be sure that any new system they introduce will enhance workflow and not add to clinicians’ cognitive burden.

Background: There are many ways to define artificial intelligence. One definition of AI is a computer system capable of performing tasks that typically require human intelligence, such as pattern recognition, decision-making, and language processing. In recent years, advancements in AI—particularly machine learning and predictive analytics—have opened new frontiers in various industries, including healthcare.

In healthcare, AI is being leveraged to enhance clinical decision-making, streamline administrative processes, and improve patient outcomes. Machine learning algorithms, a core component of AI, can process vast amounts of data to identify patterns that might elude human clinicians. This predictive capability is transforming the way hospitals manage patient care, from optimizing staffing levels to providing personalized treatment recommendations.

A promising application of AI is the development of early warning systems to detect patient deterioration. These systems use real-time data from electronic health records (EHRs) and other sources to predict which patients are at risk of adverse outcomes, such as cardiac arrest or transfer to an intensive care unit (ICU) [1,2]. By alerting clinicians to potential problems before they become critical, AI-driven systems aim to reduce unplanned ICU transfers and improve survival rates.

Despite the potential benefits, integrating AI into clinical workflows presents challenges. Some studies suggest that the effectiveness of early warning systems varies widely [3], with factors such as alarm fatigue [4] and clinician engagement influencing outcomes. Moreover, there are ongoing debates about the balance between algorithmic precision and interpretability. Transparent, evidence-based deployment is essential to build trust and ensure these tools support rather than complicate clinical decision-making​.


Clinical Question: Can the implementation of a real-time, machine learning-based early warning system (CHARTWatch) reduce adverse events and mortality in patients in the emergency department?


Reference: Verma et al. Clinical evaluation of a machine learning–based early warning system for patient deterioration. CMAJ September 2024 

  • Population: Patients admitted to the General Internal Medicine (GIM) unit of an academic medical center
    • Exclusions: Palliative patients and patients diagnosed with COVID-19 or influenza.
  • Intervention: Implementation of CHARTWatch, a machine learning-based early warning system designed to alert clinicians of patient deterioration.
  • Comparison: Pre-intervention period (Nov 2016 – June 2020) versus post-intervention period (Nov 2020 – June 2022). A control group (patients from cardiology, nephrology, and respirology units who did not receive the intervention) was also used.
  • Outcome: 
    • Primary Outcome: In-hospital deaths
    • Secondary Outcomes: Proportion of alerts for high-risk patients and changes in mortality rates among those groups.
  • Type of Study: Observational study

Authors’ Conclusions: “The implementation of a machine learning–based early warning system was associated with a reduction in non-palliative in-hospital deaths among patients admitted to general internal medicine. This suggests that CHARTWatch could improve outcomes for at-risk patients in similar settings.”

Quality Checklist for Observational Studies:

  1. Did the study address a clearly focused issue? Yes
  2. Did the authors use an appropriate method to answer their question? Yes
  3. Was the cohort recruited in an acceptable way? Yes
  4. Was the exposure accurately measured to minimize bias? Yes
  5. Was the outcome accurately measured to minimize bias? Yes
  6. Have the authors identified all-important confounding factors? Unsure 
  7. Was the follow-up of subjects complete enough? Yes
  8. How precise are the results? The results were not precise; the adjusted relative risk had a wide confidence interval with an upper bound of 1.0 (no statistical difference)
  9. Do you believe the results? Unsure 
  10. Can the results be applied to the local population? Unsure
  11. Do the results of this study fit with other available evidence? Yes
  12. Funding of the Study: “Amol Verma is supported by the Temerty Professorship of AI Research and Education in Medicine at the University of Toronto. This project was supported in-part by the Vector Institute Pathfinder Project and the AMS Healthcare Compassion and AI Fellowship. Funders played no role in the research.”
  13. Conflicts of Interest: Several authors declared conflicts of interest related to this study. Specifically, Amol Verma, Chloe Pou-Prom, Joshua Murray, and Muhammad Mamdani co-invented CHARTWatch, which was acquired by a start-up company named Signal1. These authors have the potential to acquire minority interests in this company in the future. Additionally, Amol Verma has received travel support from the Alberta Machine Intelligence Institute and holds a part-time employment position at Ontario Health, unrelated to the submitted work​.

Results: The study included 13,649 patients in the GIM unit and 8,470 in subspecialty units. The median age of patients was approximately 68 years, and 42% were female.


Key Result: Non-palliative deaths in GIM patients were statistically lower in the post-intervention (CHARTWatch) period.


  • Primary Outcome: In-hospital mortality among GIM patients was 1.6% in the post-intervention period vs 2.1% in the pre-intervention period.
    • Adjusted Relative Risk [aRR] 0.74 (95% CI 0.55–1.00)
  • Secondary Outcome: There was no observed statistical difference for in-hospital mortality in the subspecialty cohorts (1.9% v. 2.1%; aRR 0.89 [95% CI 0.63–1.28]). 
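The gap between the absolute and relative framing of this result can be made concrete with a few lines of arithmetic. A minimal sketch using the crude rates reported above (note the paper’s 26% figure comes from the adjusted RR of 0.74, not from these crude rates):

```python
# Crude risk measures from the reported GIM in-hospital mortality rates
# (1.6% post-intervention vs 2.1% pre-intervention).
control_rate = 0.021        # pre-intervention mortality
intervention_rate = 0.016   # post-intervention (CHARTWatch) mortality

arr = control_rate - intervention_rate   # absolute risk reduction
rrr = arr / control_rate                 # crude relative risk reduction
nnt = 1 / arr                            # number needed to treat

print(f"ARR = {arr:.3%}")   # 0.500% (half a percentage point)
print(f"RRR = {rrr:.1%}")   # 23.8% crude (the adjusted figure is 26%)
print(f"NNT = {nnt:.0f}")   # 200
```

Reporting the 0.5% absolute difference alongside the relative reduction, as the appraisal below does, gives a fairer picture than the relative figure alone.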

1. Financial Conflicts of Interest (fCOI): There were multiple declared fCOI by the authors of this study. While these do not automatically invalidate the evidence, they should make us more skeptical. Pointing out fCOIs is not an ad hominem attack on the authors. Industry having financial relationships with researchers is a reality. Knowing and quantifying any fCOIs is another potential bias which should be considered when evaluating the medical literature. There are several lines of evidence that fCOI can introduce potential bias in randomized controlled trials [5], systematic reviews and meta-analyses [6], guidelines [7] and medical education [8].

2. Residual Confounding: Despite using statistical techniques like propensity score overlap weighting, unmeasured confounders could still bias the results in an observational study. The authors explicitly recognize this in their limitations section, acknowledging that the study design permits conclusions about association, not causation. In their conclusions, they warn readers to be cautious in interpreting the findings due to this limitation.

3. Tiny Difference: The absolute difference in in-hospital mortality was only 0.5%. Expressing it as an adjusted relative risk reduction of 26% overstates the effect. A difference of <1% could be explained by unmeasured confounders. The 95% confidence interval around the aRR point estimate was wide, signalling the uncertainty of the data. Another measure of the uncertainty around this tiny mortality difference is the fragility index (FI). The FI for this observational study was 7. This means that reclassifying only 7 patients from non-events to events would make the result no longer statistically significant at a p-value threshold of 0.05.
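The fragility index procedure described above can be sketched in a few lines. The counts below are illustrative only (they approximate the reported 1.6% vs 2.1% rates, but the arm sizes are hypothetical), and a two-proportion z-test stands in for the Fisher's exact test conventionally used for the FI, so this sketch will not reproduce the study's reported FI of 7:

```python
import math

def two_prop_z_pvalue(events1, n1, events2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # Standard normal CDF expressed via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Count how many non-events in the lower-event arm (arm 1) must be
    reclassified as events before the result loses significance."""
    flips = 0
    while two_prop_z_pvalue(events1 + flips, n1, events2, n2) < alpha:
        flips += 1
    return flips

# Illustrative (hypothetical) counts approximating 1.6% vs 2.1%
fi = fragility_index(109, 6800, 144, 6849)
print(fi)
```

A small FI relative to the sample size is the warning sign: flipping a handful of outcomes out of thousands of patients erases the statistical significance.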

4. Hawthorne Effect: This is also called the observer effect. It is when individuals modify an aspect of their behaviour in response to their awareness of being observed. The Hawthorne effect can introduce non-specific effects in medical research that bias the results. There are ways to design studies to minimize this bias [9]. The Hawthorne effect could be responsible for the tiny difference reported in the study.

5. COVID-19: A general concern about this type of study design is the reliance on administrative data that may have inaccuracies in capturing all instances of patient deterioration. Adding to this limitation is that part of the data for this study was collected during a global pandemic. There was a lot of uncertainty during this time which could have increased coding errors. The ICD-10 code for COVID-19 did not exist during the control period because the disease had not yet emerged. In addition, there were nearly no influenza admissions during the intervention period. The authors did take steps to try to mitigate this potential bias.

Comment on Authors’ Conclusion Compared to SGEM Conclusion: We generally agree with the authors’ conclusions. 


SGEM Bottom Line: AI in healthcare represents a potential tool to improve patient care in the ED.


Case Resolution: The Chief of EM is impressed with CHARTWatch. However, they are not prepared to implement this AI in their department until there is evidence of its efficacy in the ED.


Clinical Application: We are boarding more and more in-patients in the ED due to a lack of in-patient beds. However, to have confidence that CHARTWatch can reduce adverse events, including death, a multi-centred randomized controlled trial would need to be conducted in the ED.

What Do I Tell the ED Chief? Let’s do some research in the ED and see if CHARTWatch can be applied in this setting. 

Keener Kontest: Last week’s winner was Steven Stelts. He knew Geoffrey Hinton is the Canadian who won the most recent Nobel Prize in Physics. He shared the Prize with John Hopfield.

Listen to the SGEM podcast for this week’s question. If you know, then send an email to thesgem@gmail.com with “keener” in the subject line. The first correct answer will receive a shoutout on the next episode.


Remember to be skeptical of anything you learn, even if you heard it on the Skeptics’ Guide to Emergency Medicine.


References:

  1. Muralitharan S, Nelson W, Di S, et al. Machine learning-based early warning systems for clinical deterioration: systematic scoping review. J Med Internet Res 2021.
  2. Verma AA, Pou-Prom C, McCoy LG, et al. Developing and validating a prediction model for death or critical illness in hospitalized adults, an opportunity for human-computer collaboration. Crit Care Explor 2023;5:e0897. doi: 10.1097/CCE.0000000000000897.
  3. Blythe R, Parsons R, White NM, et al. A scoping review of real-time automated clinical deterioration alerts and evidence of impacts on hospitalised patient outcomes. BMJ Qual Saf 2022;31:725-34.
  4. van der Vegt AH, Campbell V, Mitchell I, et al. Systematic review and longitudinal analysis of implementing artificial intelligence to predict clinical deterioration in adult hospitals: what is known and what remains uncertain. J Am Med Inform Assoc 2024;31:509-24.
  5. Ahn R, Woodbridge A, Abraham A, Saba S, Korenstein D, Madden E, Boscardin WJ, Keyhani S. Financial ties of principal investigators and randomized controlled trial outcomes: cross sectional study. BMJ. 2017 Jan 17;356:i6770. doi: 10.1136/bmj.i6770. PMID: 28096109; PMCID: PMC5241252.
  6. Lundh A, Lexchin J, Mintzes B, Schroll JB, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev. 2017 Feb 16;2(2):MR000033. doi: 10.1002/14651858.MR000033.pub3. PMID: 28207928; PMCID: PMC8132492.
  7. Tabatabavakili S, Khan R, Scaffidi MA, Gimpaya N, Lightfoot D, Grover SC. Financial Conflicts of Interest in Clinical Practice Guidelines: A Systematic Review. Mayo Clin Proc Innov Qual Outcomes. 2021 Jan 19;5(2):466-475. doi: 10.1016/j.mayocpiqo.2020.09.016. PMID: 33997642; PMCID: PMC8105509.
  8. Fugh-Berman A. Industry-funded medical education is always promotion-an essay by Adriane Fugh-Berman. BMJ. 2021 Jun 4;373:n1273. doi: 10.1136/bmj.n1273. PMID: 34088736.
  9. McCambridge J, Witton J, Elbourne DR. Systematic review of the Hawthorne effect: new concepts are needed to study research participation effects. J Clin Epidemiol. 2014;67(3):267-277