Date: January 3, 2026
Reference: Shroyer et al. Accuracy of cath lab activation decisions for STEMI-equivalent and mimic ECGs: Physicians vs. AI (Queen of Hearts by PMcardio). Am J Emerg Med. 2025 Nov.
Guest Skeptic: Dr. Amal Mattu has been on the faculty at the University of Maryland since 1996. He has developed an academic niche in emergency cardiology and electrocardiography, and he also enjoys teaching and writing on other topics, including emergency geriatrics, faculty development, and risk management. Amal is currently a tenured professor and Vice Chair of Emergency Medicine at the University of Maryland School of Medicine, and a Distinguished Professor of the University of Maryland-Baltimore.
Case: A 58-year-old man with diabetes and hypertension arrives at the emergency department (ED) 30 minutes after the sudden onset of substernal chest pressure radiating to the left arm, now improved to 3/10. His vital signs are BP 146/88, HR 92, RR 18, O2 sat 98% on room air. The initial 12-lead ECG shows RBBB with left anterior fascicular block and subtle anterior ST-depression with proportionally tall, broad T waves in V2 to V4, an appearance that can be seen with hyperacute T-wave occlusive myocardial infarction (HATW-OMI) or with an ST-Elevation Myocardial Infarction (STEMI) mimic in conduction disease. A debate ensues between emergency medicine and cardiology on whether to activate the cath lab now or to obtain troponins and serial ECGs.
Background: Emergency physicians need to be experts at interpreting ECGs. For decades, we’ve been taught STEMI criteria, only to learn repeatedly that important exceptions exist (posterior OMI, de Winter, hyperacute T waves, modified Sgarbossa in LBBB, etc.). Those exceptions have coalesced into two distinct categories: STEMI-equivalents (OMI without classic ST-elevation) and STEMI-mimics (ST-elevation without OMI). That expanding exception list increases diagnostic complexity and uncertainty, and it is exactly the area where artificial intelligence (AI), using computer vision and machine learning, could provide a benefit.
ECG-specific AI models now aim squarely at this problem. The study we are reviewing today evaluated the Queen of Hearts (QoH) AI. It is a deep neural network trained to detect occlusive myocardial infarction (OMI) on 12-lead ECGs. The model is described as “91% accurate” in prior work and is undergoing FDA review as of March 24, 2025, but whether it outperforms practicing clinicians on the hardest cases (STEMI‑equivalents and mimics) remained unclear.
ECG diagnostic accuracy is important in emergency medicine because misclassification cuts both ways. Missed OMI delays reperfusion, while overcalls send patients and teams to the cath lab unnecessarily, putting patients at risk and consuming valuable resources. A diagnostic aid that catches true OMIs while reducing false activations could improve outcomes and team throughput.
Clinical Question: Among EM physicians and cardiologists interpreting STEMI-equivalent and STEMI-mimic ECGs, how does their diagnostic accuracy compare with that of a machine-learning ECG algorithm?
Reference: Shroyer et al. Accuracy of cath lab activation decisions for STEMI-equivalent and mimic ECGs: Physicians vs. AI (Queen of Hearts by PMcardio). Am J Emerg Med. 2025 Nov.
- Population: 53 emergency physicians and 42 cardiologists from a community system.
- Intervention: Human interpretation and the QoH AI algorithm, each classifying every ECG as OMI requiring immediate cath lab activation (CLA) vs. not
- Comparison (Reference Standard):
- OMI Present: Angiographic culprit with ≤TIMI II flow and elevated troponin, or culprit with TIMI III flow and significantly elevated troponin.
- OMI Absent: No culprit ≥50% stenosis on angiography or, when no angiography was performed, negative serial troponins, no new echo wall-motion abnormality, and negative clinical follow-up.
- Outcome: Diagnostic accuracy of ECG-based CLA decisions. CLA‑positive was defined a priori for STEMI/STEMI‑equivalents and for “reperfused OMI” (Wellens, transient STEMI).
- Type of Study: A cross-sectional diagnostic accuracy study using a fixed case‑set, with comparisons to a reference standard.
Authors’ Conclusions: “Physicians frequently misinterpret STEMI-equivalent and STEMI-mimic ECGs, potentially impacting CLA decisions. QoH AI demonstrated superior accuracy, suggesting a potential to reduce missed OMIs and unnecessary catheterization laboratory activations. Prospective studies are needed to validate these findings in clinical practice.”
Quality Checklist for a Diagnostic Study:
- The clinical problem is well-defined. Yes
- The study population represents the target population that would normally be tested for the condition (ie no spectrum bias). No
- The study population included or focused on those in the ED. No
- The study participants were recruited consecutively (i.e. no selection bias). No
- The diagnostic evaluation was sufficiently comprehensive and applied equally to all patients (i.e. no evidence of verification bias). No
- All diagnostic criteria were explicit, valid and reproducible (i.e. no incorporation bias). Unsure
- The reference standard was appropriate (i.e. no imperfect gold-standard bias). Yes/No
- All undiagnosed patients underwent sufficiently long and comprehensive follow-up (i.e. no double gold-standard bias). No
- The likelihood ratio(s) of the test(s) in question are presented or can be calculated from the information provided. Yes (see the worked sketch after this checklist)
- The precision of the measure of diagnostic performance is satisfactory. Reasonable
- Funding and Conflicts of Interest. No external funding. Several authors report stock ownership/consulting with Powerful Medical (QoH developer), and other authors reported no conflicts.
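Since the checklist credits the paper with calculable likelihood ratios, here is a minimal sketch of how LRs fall out of sensitivity and specificity. The numbers below are placeholders for illustration, not values from Shroyer et al.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity/specificity."""
    lr_pos = sens / (1 - spec)   # how much a positive call raises the odds of OMI
    lr_neg = (1 - sens) / spec   # how much a negative call lowers the odds of OMI
    return lr_pos, lr_neg

# Placeholder values, not taken from the study:
lr_pos, lr_neg = likelihood_ratios(sens=0.90, spec=0.88)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 7.5, LR- = 0.11
```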
Results: They recruited 95 physicians to interpret the ECGs: 53 EM physicians and 42 cardiologists (23 general, 15 interventional, 4 electrophysiologists). Median experience was 7 years for the EM physicians (IQR 3 to 15) vs. 15 years for the cardiologists (IQR 9.2 to 21).
Key Result: QoH AI had significantly higher accuracy than the physicians, and there was no significant difference between EM physicians and cardiologists.
- Primary Outcome (accuracy of ECG-based CLA decisions):
- EM Physicians 65.6% (95% CI ~51 to 78)
- Cardiologists 65.5% (95% CI ~51 to 77)
- QoH AI 88.9% (95% CI 82 to 93)
The ECGs most frequently misclassified by humans were LBBB (±OMI), transient STEMI, HATW-OMI, and de Winter. QoH AI missed LBBB-OMI and LV aneurysm. RBBB + fascicular block and HATW-OMI produced the largest disagreement between emergency physicians and cardiologists.
Talk Nerdy to Me: We have five nerdy points about the methodology of this study.
1) Spectrum Bias: The investigators intentionally selected “ambiguous” STEMI‑equivalent and STEMI‑mimic ECGs and fixed the OMI prevalence at 50% for the reader study. That design improves efficiency in comparing readers and the AI, but it does not reflect the spectrum or prevalence we see in day-to-day ED practice and therefore threatens external validity. In diagnostic accuracy research, spectrum bias occurs when the distribution of disease/non-disease, disease severity, or look-alikes in the sample differs from that in the clinical population in which the test will be used. It can change sensitivity and specificity in either direction. Selecting borderline cases may deflate both compared with routine practice, and it will certainly distort PPV/NPV because predictive values are prevalence‑dependent. The authors acknowledge this by noting the 50% OMI prevalence and the deliberate use of ambiguous ECGs “may not accurately reflect predictive values observed in real-world settings.”
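To make the prevalence point concrete, here is a minimal sketch using Bayes’ theorem with assumed (not study-reported) sensitivity and specificity, comparing the trial’s fixed 50% OMI prevalence with a more ED-typical 5%:

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """PPV and NPV for a test at a given disease prevalence."""
    tp = sens * prev                # true positives per patient screened
    fp = (1 - spec) * (1 - prev)    # false positives
    fn = (1 - sens) * prev          # missed disease
    tn = spec * (1 - prev)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.90, 0.88             # assumed for illustration only
for prev in (0.50, 0.05):           # study case-mix vs. an ED-like prevalence
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With identical test characteristics, PPV falls from roughly 88% at 50% prevalence to under 30% at 5%, while NPV climbs toward 99%, which is exactly why the study’s predictive values will not transfer to real-world settings.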
2) Differential Verification & Imperfect Gold Standard: Not every patient had the same reference standard. While most OMI determinations used angiography, some mimic cases without angiography were adjudicated by serial troponins, echocardiography, and clinical follow-up. Using different reference standards in different subgroups constitutes differential verification (double gold‑standard) bias and can bias sensitivity and specificity up or down, depending on whether the disease can resolve or only become detectable over time. In addition, any composite or clinical adjudication process is an imperfect gold standard, which can either inflate or deflate the index test’s performance depending on how errors correlate across tests. The authors explicitly note these issues in their discussion.
3) Incorporation/Review Bias: The paper reports that cardiologists performing angiography were not masked to the ECG. When the result of (or information from) the index test helps determine the reference diagnosis, that is incorporation (review) bias. This typically inflates both sensitivity and specificity of the index test because the gold standard classification is partially “contaminated” by the test under study. In this context, seeing a concerning ECG may tilt the invasive assessment and adjudication toward “culprit” lesion labelling or influence borderline calls, making ECG-based classification look better than it truly is.
4) Unit‑of‑analysis & Precision Limitations: This was a reader study with 95 clinicians classifying the same small set of 18 ECGs. Even with appropriate statistics, the small number of cases means performance estimates can be fragile, and the 95% confidence intervals reflect that imprecision. To their credit, the authors modelled accuracy with multi-level models and robust variance estimates to account for clustering (multiple readers rating the same cases), but the design still limits precision and generalizability across the full morphologic spectrum of each category. The authors themselves state that “one representative ECG per type…cannot represent all ST‑T variants”, and that asking physicians to read far more than 18 tracings was impractical. This imprecision should raise our skeptical radar, and we should factor it into our interpretation of the study.
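For readers curious what “accounting for clustering” looks like in practice, here is a minimal sketch with simulated data (our illustration, not the authors’ code) of a reader-level analysis using a logistic GEE with an exchangeable working correlation, so that the 18 repeated readings per clinician are not treated as independent observations:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.special import expit

rng = np.random.default_rng(42)
n_readers, n_cases = 95, 18  # mirrors the study design

# One row per reader x ECG decision; a latent per-reader skill term
# induces the within-reader correlation the variance estimator must handle.
df = pd.DataFrame({
    "reader": np.repeat(np.arange(n_readers), n_cases),
    "case": np.tile(np.arange(n_cases), n_readers),
})
reader_skill = rng.normal(0.6, 0.5, n_readers)
df["correct"] = rng.binomial(1, expit(reader_skill[df["reader"]]))

# Logistic GEE clustered on reader with a robust (sandwich) variance,
# a simplified stand-in for the paper's multi-level approach.
model = smf.gee("correct ~ 1", groups="reader", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print("estimated accuracy:", expit(result.params["Intercept"]))
```

Ignoring the clustering would artificially narrow the confidence intervals; the robust variance keeps them honest, but it cannot manufacture precision that an 18-case design never had.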
5) External Validity: The study is single-center and uses an online survey without the interruptions, time pressure, serial ECGs, bedside echo, or troponin trends that influence ED decision-making. The authors explicitly caution that the controlled survey conditions do not replicate real clinical environments and could over- or under-estimate real-world accuracy. AI performance can also be domain-dependent (ECG device and processing, patient mix). Showing superiority in a simulation does not guarantee clinical utility until confirmed in prospective practice studies. This limitation is well described in the diagnostic literature (including the AI diagnostic literature), which emphasizes the potential gap between an artificial scenario and prospective bedside clinical application. In other words, until the tool is released into the wild across other hospitals and workflows, we don’t know whether it will deliver a net benefit on patient-oriented outcomes (POOs).
Comment on the Authors’ Conclusion Compared to the SGEM Conclusion: We generally agree with the authors’ conclusions.
SGEM Bottom Line: A second set of artificial eyes on the ECG may help us catch occlusions we miss and avoid some unnecessary cath lab activations, but the QoH AI needs to be tested in a real-world trial before it is used as a decision adjunct.
Case Resolution: Given the concerning RBBB + LAFB with anterior repolarization changes and ongoing symptoms, we activate the cath lab. If available, an AI read supporting OMI would reinforce the call. If it disagreed, we would not delay for the algorithm. In the lab, the patient is found to have a proximal LAD culprit and undergoes PCI.

Dr. Amal Mattu
Clinical Application: The most significant finding that immediately makes me doubt the utility of the study is the chart that shows sensitivity and specificity. Even without all of the nerdy details, it is obvious that a 65% sensitivity for picking up OMIs (or my long-preferred term: acute coronary occlusion, ACO) is not possible. Any emergency physician or cardiologist who is missing 35% of ACOs is going to be fired, sued many times over, and driven out of medicine! Without reading any of the paper, as soon as I see that chart, I know it can’t be right unless the clinicians at the study institution are incompetent.
No one has discussed the costs associated with integrating AI systems into all those ECG machines.
This QoH AI system is not ready for implementation in clinical practice outside a research study. AI has the potential to augment clinical decision-making, but it should not run on autopilot without a human in the loop.
What Do I Tell the Patient? Your ECG shows changes that could mean a blocked heart artery. We think the best course of action is to take you to the cath lab now to restore blood flow if needed. We also use a computer tool to double-check ECGs. It supports our concern, but it doesn’t replace us as doctors. The goal is to act fast and safely to get you the care you need.
Keener Kontest: Last week’s winner was Dr. Steven Stelts from Auckland, NZ. He knew the enzyme inhibited by etomidate to decrease cortisol and aldosterone is 11-beta-hydroxylase.

