Date: February 24th, 2022

Reference: Parish et al. An umbrella review of effect size, bias, and power across meta-analyses in emergency medicine. AEM 2021

Guest Skeptic: Professor Daniel Fatovich is an emergency physician and clinical researcher based at Royal Perth Hospital, Western Australia. He is Head of the Centre for Clinical Research in Emergency Medicine, Harry Perkins Institute of Medical Research; Professor of Emergency Medicine, University of Western Australia; and Director of Research for East Metropolitan Health Service.

Case: A resident has been following the literature over their four years of training. They have already seen several things come into fashion and go out of fashion during this short time. This includes therapeutic hypothermia for out-of-hospital cardiac arrest (OHCA), tranexamic acid (TXA) for epistaxis and electrolyte solutions for mild pediatric gastroenteritis. They wonder how strong the evidence is for much of what we do in emergency medicine.

Background: There are many things in medicine that could be considered myth or dogma. We have covered some of these over the last 10 years.

  • Topical anesthetic use for 24-48 hours for mild corneal abrasions will cause blindness – No (SGEM#315)
  • Epinephrine for adult out-of-hospital cardiac arrests (OHCAs) results in better neurologic outcomes – No (SGEM#238)
  • TXA for intracranial hemorrhage, isolated traumatic brain injury, post-partum hemorrhage or gastrointestinal bleed results in better primary outcomes – No (SGEM#236, SGEM#270, SGEM#214, and SGEM#301)
  • Therapeutic hypothermia in adult OHCA saves lives – No (SGEM#336)
  • Electrolyte solutions are needed in mild pediatric gastroenteritis – No (SGEM#158)

A lot of medical practice is based on low-quality research. Tricoci et al. JAMA Feb 2009 looked at the ACC/AHA guidelines from 1984 to 2008. They found 53 guidelines with 7,196 recommendations. Only 11% of recommendations were considered Level A, 39% were Level B and 50% were Level C.

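The definitions used for each level of evidence are as follows:

  • Level A: Data derived from multiple randomized clinical trials or meta-analyses
  • Level B: Data derived from a single randomized trial or non-randomized studies
  • Level C: Only consensus opinion of experts, case studies, or standard of care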

An update was published by Fanaroff et al in JAMA 2019. The level of high-quality evidence had not changed much when looking at the ACC/AHA guidelines from 2008-2018. There were 26 guidelines with 2,930 recommendations. Now Level A recommendations were down to 9%, Level B 50% and Level C 41%.

This lack of evidence is not isolated to cardiology. A recent study looked at the top ten elective orthopaedic procedures. It was an umbrella review of meta-analyses of randomized controlled trials (RCTs) or other study designs if no RCTs existed (Blom et al BMJ 2021). The comparison was the clinical efficacy of the most common orthopaedic procedures with no treatment, placebo, or non-operative care. The primary outcome was the quality of the evidence for each procedure. Only two out of ten common procedures, carpal tunnel decompression and total knee replacement, showed superiority over non-operative care.

Clinical Question: What is the effect of faults such as underpowered studies, flawed studies (i.e. methodologic and statistical errors, poorly designed studies) and biases in the field of therapeutic interventions in the emergency medicine literature?

Reference: Parish et al. An umbrella review of effect size, bias, and power across meta-analyses in emergency medicine. AEM 2021

  • Population: SRMAs published 1990-2020 in the top 20 journals under the Google Scholar subcategory: emergency medicine; emergency medicine meta-analyses from JAMA, NEJM, BMJ, The Lancet, and the Cochrane Database of Systematic Reviews; emergency medicine topics across all PubMed journals; and an extraction of all studies from the Annals of Emergency Medicine Systematic Review Snapshots (SRS) series.
    • Exclusions: Articles were excluded if they did not include a quantitative synthesis (meta-analysis); did not contain at least two summarized studies; did not make a comparison between two groups to assess an effect size; did not report an effect size as at least one of mean difference or standardized mean difference (SMD; Cohen’s d), odds ratio (OR), risk ratio (RR), hazard ratio (HR), or transformations of these effect sizes; were meta-analyses of diagnostic accuracy studies; or were not related to the practice of emergency medicine.
  • Intervention: Data supplement 1 lists all 431 MAs derived from 332 published SRMAs.
  • Comparison: Includes placebo, usual care, nothing.
  • Outcomes: Identify broad patterns in study parameters (effect size, power, mortality benefit and potential bias).

Dr. Austin Parish

We are fortunate to have the lead author on this episode even though it is not an SGEMHOP. Dr. Austin Parish is the Chief Resident in Emergency Medicine at the Lincoln Medical Center, Bronx NY. He is also a researcher for the Meta Research Innovation Center at Stanford (METRICS).

Authors’ Conclusions: “Few interventions studied within SRMAs relevant to emergency medicine seem to have strong and unbiased evidence for improving outcomes. The field would benefit from more optimally powered trials.”

Quality Checklist for Therapeutic Systematic Reviews:

  1. The clinical question is sensible and answerable. Yes
  2. The search for studies was detailed and exhaustive. Yes
  3. The primary studies were of high methodological quality. No
  4. The assessment of studies was reproducible. Yes
  5. The outcomes were clinically relevant. Yes
  6. There was low statistical heterogeneity for the primary outcomes. No
  7. The treatment effect was large enough and precise enough to be clinically significant. Sometimes

Results: The systematic review identified 431 eligible meta-analyses (MAs) relevant to emergency medicine. The MAs included a total of 3,129 individual study outcomes of which 2,593 (83%) were from randomized controlled trials.

Key Result: A minority of interventions published in SRMA and relevant to emergency medicine have unbiased and strong evidence for improved outcomes.

  • Primary Outcome: Broad patterns in study parameters
    • Effect Size: The median Odds Ratio (OR) across all studies was 0.70. Within each MA, the earliest study effect on average demonstrated larger benefit compared to the overall summary effect. Only 57 of 431 meta-analyses (13%) both favored the experimental intervention and did not show any signal of small study effects or excess significance.
    • Power: Only 12 of 431 MAs had at least one study with 80% or higher power to detect an OR of 0.70.
    • Mortality: Zero out of 431 MAs reported the interventions significantly decreased mortality in well-powered trials. Although the power of studies increased somewhat over time, most studies were underpowered.
    • Biases: 92 of the SRMAs included 10 or more studies that could be analyzed with a funnel plot for asymmetry. 25% (23/92) showed evidence of asymmetry suggesting excess significance. 85 (20%) of the SRMAs reported statistical significance in favor of the intervention. Of these, 1/3 showed a signal of small study effect and/or excess significance while 2/3 (57/85) did not. Of the 57, only 36 (63%) had a GRADE assessment reported. Half were rated as low-quality evidence and only 11% rated as high-quality evidence.
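To give a feel for what "80% power to detect an OR of 0.70" demands of a single trial, here is a minimal Python sketch. It is not the method used by Parish et al. It assumes a simple two-arm trial with equal arm sizes, a two-sided alpha of 0.05, the standard normal-approximation formula for comparing two proportions, and a hypothetical control-group event rate of 20%. The function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def treated_risk(control_risk, odds_ratio):
    """Convert a control-group risk and an odds ratio into the treated-group risk."""
    odds = control_risk / (1 - control_risk) * odds_ratio
    return odds / (1 + odds)


def _standard_errors(control_risk, odds_ratio):
    """Return (p1, p2, per-patient SE under the null, per-patient SE under the alternative)."""
    p1 = control_risk
    p2 = treated_risk(control_risk, odds_ratio)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return p1, p2, se_null, se_alt


def power_two_arm(n_per_arm, control_risk, odds_ratio, alpha=0.05):
    """Approximate power of a two-arm trial with n_per_arm patients in each arm."""
    p1, p2, se_null, se_alt = _standard_errors(control_risk, odds_ratio)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z = (abs(p1 - p2) * sqrt(n_per_arm) - z_alpha * se_null) / se_alt
    return _norm.cdf(z)


def n_per_arm_for_power(control_risk, odds_ratio, power=0.80, alpha=0.05):
    """Approximate patients needed per arm to reach the requested power."""
    p1, p2, se_null, se_alt = _standard_errors(control_risk, odds_ratio)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    return ceil(((z_alpha * se_null + z_beta * se_alt) / abs(p1 - p2)) ** 2)


if __name__ == "__main__":
    control_risk = 0.20  # hypothetical assumption, not taken from the paper
    n = n_per_arm_for_power(control_risk, 0.70)
    print(f"Patients per arm for 80% power: {n}")
    print(f"Power with 100 patients per arm: {power_two_arm(100, control_risk, 0.70):.2f}")
```

Under these assumptions a trial needs several hundred patients per arm, while a trial with 100 patients per arm has well under 50% power. Lower event rates push the required sample size higher still.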

1. How Good is the Evidence? I’ve often posed the question: what proportion of our EM clinical practice is backed up by high-level evidence? After speaking with thought leaders, the answer I got to was less than 10%. This umbrella review quantifies the answer in more detail: 12/431 = 2.8%. There is not a large amount of high-level evidence supporting most EM practices. The results demonstrate that very few interventions meet the highest evidence standards, and most of the SRMAs are significantly flawed and may overstate true treatment effects. So, we need to advance our knowledge and practice through never-ending questioning of it. That means a research culture in which clinical trials and clinical research are a routine part of everyday EM work, with research that engages clinicians and patients on clinically useful questions, so that we become a learning health system. What proportion of our EM clinical practice is backed up by high-level evidence?
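The proportions quoted in the results and in this point can be re-derived from the raw counts. The Python sketch below recomputes them and, as an illustration that is not part of the paper, attaches a Wilson 95% confidence interval to the 12/431 figure. The helper name `wilson_interval` and the labels in `counts` are our own.

```python
from math import sqrt


def wilson_interval(events, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = events / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the show notes (events, total).
counts = {
    "study outcomes from RCTs": (2593, 3129),
    "MAs favouring intervention with no bias signal": (57, 431),
    "funnel plots with asymmetry": (23, 92),
    "MAs with significant benefit": (85, 431),
    "unbiased MAs with a GRADE assessment": (36, 57),
    "MAs meeting the strictest criteria": (12, 431),
}

if __name__ == "__main__":
    for label, (events, total) in counts.items():
        print(f"{label}: {events}/{total} = {100 * events / total:.1f}%")

    low, high = wilson_interval(12, 431)
    print(f"Wilson 95% CI for 12/431: {100 * low:.1f}% to {100 * high:.1f}%")
```

Even at the upper end of that interval, the share of meta-analyses meeting the strictest criteria stays well below the "less than 10%" estimate.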

2. The Best Evidence: Table 1 in the paper lists the 12 MAs in EM that have statistically significant results (p < 0.05 by random effects), based on data with no signal for small study effects or excess significance and at least one RCT and at least one study with 80% power to detect a small effect (d = 0.2). The biggest effect of an intervention was the rate of haemolysis using straight needle venepuncture vs an IV; OR 0.11 (95% CI; 0.05-0.23).

Of the 12 MAs, only another three had 95% confidence intervals that did not cross 1 (the line of no statistical difference) for well-powered studies (fixed effect): senior doctor vs no senior doctor in triage for preventing patients leaving without being seen (OR 0.74, 95% CI; 0.70-0.77); clopidogrel pre-treatment vs no clopidogrel pre-treatment in acute coronary syndrome patients to receive percutaneous intervention (OR 0.79, 95% CI; 0.73-0.85) for a major coronary event; glucocorticoids and usual care vs usual care for croup (OR 0.44, 95% CI; 0.27-0.72) on rate of return visits.

While there were no mortality benefits listed under fixed effect, well-powered studies, there were some mortality benefits under the heading of random effects, all studies: mechanical CPR, transfer for angioplasty, thrombolysis for PE and vasopressin + catecholamines. The details are listed below.

  • Mechanical CPR vs manual CPR for OHCA on mortality by arrival to hospital (OR 0.80, 95% CI; 0.68-0.94);
  • Transfer for angioplasty vs on-site thrombolysis for ST-elevation myocardial infarction on 30-day mortality (OR 0.78, 95% CI; 0.61-0.99);
  • Thrombolysis vs conventional anticoagulation for pulmonary embolism (OR 0.42, 95% CI; 0.19-0.93);
  • Vasopressin and catecholamines vs catecholamines alone on 30-day mortality (OR 0.74, 95% CI; 0.58-0.).

What should we make of the lack of high-quality evidence for what we do in EM?

3. Robust: The statistical results were not robust: most of the statistically significant results were near the P < 0.05 threshold and using a more stringent type 1 error acceptance rate of P < 0.005 would make <10% of all MAs “positive”. Among studies with lower risk of bias, the effect sizes further decreased and/or disappeared. Furthermore, most of these MAs were grossly underpowered, thus leading to continued ambiguity.
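To illustrate how close to the threshold such results can sit, the Python sketch below back-calculates an approximate two-sided p-value from a reported odds ratio and its 95% confidence interval. It uses three of the random-effects mortality results listed in the previous point. This is our own rough approximation, not the authors' analysis: it assumes the interval is symmetric on the log-odds scale and works from rounded published figures, so the p-values are indicative only.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def p_value_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided p-value from an odds ratio and its 95% CI."""
    # Standard error of log(OR), recovered from the width of the interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# (odds ratio, lower bound, upper bound) as quoted in the show notes.
results = {
    "Mechanical vs manual CPR": (0.80, 0.68, 0.94),
    "Transfer for angioplasty vs on-site thrombolysis": (0.78, 0.61, 0.99),
    "Thrombolysis vs anticoagulation for PE": (0.42, 0.19, 0.93),
}

if __name__ == "__main__":
    for label, (odds_ratio, low, high) in results.items():
        p = p_value_from_or_ci(odds_ratio, low, high)
        verdict = "passes" if p < 0.005 else "fails"
        print(f"{label}: p ~ {p:.3f}, {verdict} the P < 0.005 threshold")
```

On this approximation all three results are significant at P < 0.05, yet none would survive the stricter P < 0.005 threshold.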

We would expect the p-values to cluster just below 0.05 due to publication bias. Studies reaching this low bar are more likely to be published than those that do not reach statistical significance (Hopewell et al 2008, Sune et al 2013 and Dwan et al 2013). How do you think we could make results in EM research more “robust”?

4. Thrombolysis: One of our “favourite” topics is thrombolysis for stroke, so I was interested to see what was reported in the umbrella review. Appendix S2 lists the topics of redundant MAs found in the EM literature. The total number of MAs on this subject was 3. The data supplement 1 on this subject only lists the Donaldson et al 2016 SRMA.

The conclusion from that SRMA was: “The available data are unlikely to resolve the controversy regarding the use of intravenous thrombolysis in this population, and further randomised controlled trials are urgently required.” This topic of thrombolysis vs no thrombolysis for stroke did not make it into table 1 (the adequately powered studies). What are your thoughts on thrombolysis for AIS?

Dr. Jerome Hoffman

5. Philosophical Approach to the Literature: While we think the methods and results are the most important elements of a paper, sometimes we come across a discussion that articulates the subject so emphatically well that it’s worth highlighting. Many of these concepts have been promoted by our mentor Dr. Jerry Hoffman for years.

Here are some of the concepts mentioned in your paper that we would like you to comment further upon:

“early results need to be seen with caution, as the postulated treatment benefits may diminish with additional evidence.”

“given that some harms are also recognised only after substantial time has elapsed, a vigilant approach to early evidence about new interventions is warranted.”

“Claims of significance dependent on statistical thresholds depend on what threshold is chosen and it should be remembered that statistical and clinical significance have some overlap but may be different entities.”

“Few emergency medicine interventions seem to have convincingly strong evidence and interventions that save lives in randomized trials. Some interventions apparently save lives and have such dramatic effects that they are never subjected to randomized trials. (These interventions may include in the acute setting insulin for diabetic ketoacidosis, blood transfusion for severe hemorrhagic shock, defibrillation for ventricular fibrillation, neostigmine for myasthenia gravis, tracheostomy for tracheal obstruction, suturing for repair of large wounds, pressure or suturing for stopping hemorrhage, and one-way valve or underwater seal drainage for pneumothorax and hemothorax). However, these interventions are very few and the vast majority of emergency medicine interventions do require randomized trial evaluations.” 

Most medical interventions are not parachutes (Hayes et al CMAJ 2018). It is ethical to perform proper RCTs to ensure patients get the best care, based on the best evidence. Remember that bloodletting used to be the standard of care until an RCT in 1809 challenged that practice and demonstrated an NNT for death with bloodletting of 4 (SGEM#200).

Comment on Authors’ Conclusion Compared to SGEM Conclusion: We agree with the authors of the paper that we need to promote further research on interventions in EM. 

SGEM Bottom Line: Many interventions in emergency medicine are not supported by high-quality, unbiased evidence.

Case Resolution: You discuss scientific skepticism with the resident. Remind them that each claim needs to be supported by evidence and logical arguments. Without high-quality evidence we should usually accept the null hypothesis. That does not mean an intervention could not work. Rather, we do not have good evidence that it does work. This is an important distinction.

Dr. Daniel Fatovich

Clinical Application: The literature guides and informs our care but should not dictate our care. There are few interventions in EM with high-quality, unbiased evidence. We still need to apply our clinical judgement and ask the patients about their values and preferences. “It is instructive to note that most people make patient-centred decisions every day without high-quality (eg RCT) evidence, and these decisions are not always wrong. Furthermore, foundational papers in EBM make it explicitly clear that EBM was never meant to exclude information derived from experience and intuition.” (Braithwaite RS. JAMA 2013).

What Do I Tell the Resident: Much of what we do in emergency medicine is based upon low-quality, biased evidence. We are often standing on pillars of salt and sand. Stay skeptical, develop your critical appraisal skills and try to avoid nihilism.

Keener Kontest: Last week’s winner was Daniel Walter from the UK. This is a repeat win for Daniel. He knew the name of the empath in the Star Trek: The Original Series episode was Gem.

Listen to the SGEM podcast for this week’s question. If you know the answer, send an email with “keener” in the subject line. The first correct answer will receive a cool skeptical prize.

Remember to be skeptical of anything you learn, even if you heard it on the Skeptics Guide to Emergency Medicine.