Date: August 18th, 2022

Reference: Munn et al. Fragility Index Meta-Analysis of Randomized Controlled Trials Shows Highly Robust Evidential Strength for Benefit of <3 Hour Intravenous Alteplase. Stroke 2022

Dr. Jeff Saver

Guest: Dr. Jeff Saver is a Professor and SA Vice Chair for Clinical Research, Carol and James Collins Chair, Department of Neurology, Director of the UCLA Comprehensive Stroke and Vascular Neurology Program at the David Geffen School of Medicine at UCLA.

This is an SGEM Xtra. Jeff and I have an interesting back story to how we met. I knew about Jeff from his multiple publications in the stroke literature. I did not know he knew about me until an EM physician sent me a video of a presentation given at an international stroke meeting. On one of the slides, Professor Daniel Fatovich and I were referred to as “Non-Expert EM Contrarians”.

I reached out to Jeff and we had a very good conversation. He clarified what he meant by “non-experts”: that we were not stroke neurologists or emergency physicians with subspecialty neuro expertise, such as having completed fellowship training in neurologic critical care. He did acknowledge that both Dr. Fatovich and I had expertise on critical appraisal of the medical literature.

The conversation ended well with Jeff requesting one of the t-shirts I planned to make with the title of non-expert ER contrarian on the chest.

Jeff recently reached out to me about his new publication, Fragility Index Meta-Analysis of Randomized Controlled Trials Shows Highly Robust Evidential Strength for Benefit of <3 Hour Intravenous Alteplase, asking for my thoughts. I thought this would be a great opportunity to dig deeper into the fragility index and have another expert in stroke neurology on the SGEM.

Dr. Eddy Lang

We have had a couple of individuals previously on the SGEM who strongly support the use of tPA in acute ischemic stroke (AIS). One was Dr. Eddy Lang who is a well-known Canadian researcher and emergency physician in Calgary, Alberta. Eddy appeared on the SGEM Xtra episode called the Walk of Life discussing AIS.  We had a debate on the issue of tPA for stroke published in CJEM 2020 as part of their debate series. Eddy is also the senior author on the CJEM article summarizing the Canadian Stroke Best Practice (CSBP) 2018 Guidelines. This Canadian guideline gives a level “A” recommendation for the use of tPA in AIS in patients last seen normal within 4.5 hours.

  • “All eligible patients with disabling ischemic stroke should be offered intravenous alteplase (tPA). Eligible patients are those who can receive intravenous alteplase (tPA) within 4.5 hours” of symptom onset time or last seen normal (Evidence Level A; Section 5.3.i).

We also had a neurology resident on to critically appraise a systematic review and meta-analysis of endovascular therapy plus/minus tPA as a bridging therapy (SGEM#349). A few more publications have come out since that podcast and the European Stroke Organization (ESO) recommends intravenous thrombolysis before mechanical thrombectomy in patients with acute ischemic stroke and anterior circulation large vessel occlusion.

There have been several tPA skeptics on the SGEM including Dr. Hoffman, Dr. Fatovich, and Dr. Morgenstern. However, not until now have we had a stroke neurologist who is very much in support of using tPA in AIS. I think it is very important to guard against echo chambers and our own biases, and to listen carefully to other points of view.


Fragility Index Meta-Analysis of Randomized Controlled Trials Shows Highly Robust Evidential Strength for Benefit of <3 Hour Intravenous Alteplase.


Jeff was asked a number of questions about his new publication. Some of the answers are listed as bullet points, but most of his responses can be heard in full by listening to the SGEM podcast:

Who were your co-authors on this publication? Why did you decide to write this article? What is the fragility index (FI)?

  • The FI is the minimum number of nonevents that when changed to events in one arm of an interventional trial or meta-analysis of trials converts the result to statistical nonsignificance. Lower FIs indicate greater fragility, higher FIs more robust results.

This definition of the FI is slightly different from the one provided by Walsh et al (J Clin Epidemiol 2014), which looked only at individual RCTs and did not extend to systematic reviews and meta-analyses (SRMAs). A minimal sketch of how an FI can be calculated for a single trial is shown below.
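To make the mechanics concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that computes an FI for one two-arm trial using SciPy's Fisher exact test. The convention of converting non-events to events in the arm with fewer events follows Walsh et al; the event counts are invented.

```python
from scipy.stats import fisher_exact

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Minimal FI sketch for a single two-arm trial with a significant result.

    Non-events in the arm with fewer events are switched to events, one at a
    time, until Fisher's exact p-value is no longer below alpha.
    """
    # Work on the arm with fewer events (one common convention)
    if events_a > events_b:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a

    flips = 0
    e = events_a
    p = fisher_exact([[e, n_a - e], [events_b, n_b - events_b]])[1]
    while p < alpha and e < n_a:
        e += 1      # convert one non-event to an event
        flips += 1
        p = fisher_exact([[e, n_a - e], [events_b, n_b - events_b]])[1]
    return flips    # 0 means the result was non-significant to begin with

# Invented counts: 40/200 events in one arm versus 60/200 in the other
print(fragility_index(40, 200, 60, 200))
```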

There are critics of the FI who say, among other things, that it could be viewed as just restating the p-value in a different way (Dr. Ed Palmer). Medicine has a very low bar of a p-value of 0.05 (roughly two sigma) to clear before something is considered “statistically significant”. In contrast, particle physics uses five sigma, a p-value of about 3×10⁻⁷, or roughly a 1 in 3.5 million chance of seeing data at least as extreme as what was observed if the null hypothesis were true.
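To put the two-sigma versus five-sigma comparison in concrete terms, here is a small Python/SciPy sketch (my own illustration) converting between a significance threshold expressed in standard deviations and the corresponding p-value:

```python
from scipy.stats import norm

def two_sided_p(k_sigma):
    """Two-sided p-value for a result k standard deviations from the null."""
    return 2 * norm.sf(k_sigma)

print(two_sided_p(1.96))  # ~0.05, the usual threshold in medicine
print(norm.sf(5))         # one-sided five sigma: ~2.9e-7
print(1 / norm.sf(5))     # roughly 3.5 million to one
```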

A lot of ink has been spilled about the problems with p-values. Over 800 scientists have called for the abandonment of statistical significance. What are your thoughts on the use/misuse of the p-value?

Others have said let’s raise the bar by lowering the threshold for statistical significance from 0.05 to 0.005, to be more certain and to mitigate against things like p-hacking. Do you think we should change what is considered statistically significant by an order of magnitude to 0.005?

Does the fragility index convey different information than the p-value of the test statistic? If not, how would your analysis using the cumulative FI change our confidence in the tPA evidence for acute ischemic stroke that we could not obtain from the gold standard SRMA of individual patient data like the 2014 Emberson et al publication?

The FI is a summary statistic not unlike the Number Needed to Treat (NNT), with both strengths and weaknesses. A major strength of the FI is its simplicity, making complex research easier to understand. Its weakness, however, is also its simplicity: it hides the complexity of research, ignores confidence intervals, and obscures potential biases. Do you think the FI is a useful metric?

Often people will criticize a trial because the FI is low. However, studies are generally powered for their primary efficacy outcome. To be efficient, researchers estimate how many participants are needed for the anticipated magnitude of effect to reach statistical significance. This power calculation should be done a priori, based upon the “delta”, or difference, expected between the treatment and control cohorts.

If a study is done this way and the anticipated effect is observed, the result will often cluster around a p-value of 0.05. The study is, in effect, designed to have a low FI. It could be considered a circular argument to then criticize the study as being “fragile”.

Another way to interpret a low FI is that the researchers did a great job estimating the number of participants necessary to test their hypothesis. They could be congratulated for conducting a very efficient trial that was not overpowered, since overpowering wastes time, resources, and patients.
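To illustrate the a priori calculation described above, here is a rough Python/SciPy sketch (my own illustration, using the normal-approximation formula and invented proportions, not the assumptions of any actual stroke trial) of how a per-arm sample size follows from the anticipated delta:

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for comparing two proportions
    (normal approximation, two-sided alpha, equal allocation)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # significance threshold
    z_beta = norm.ppf(power)            # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_treatment - p_control     # the anticipated "delta"
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Invented example: expecting 20% good outcomes with control vs 30% with treatment
print(n_per_arm(0.20, 0.30))
```

Shrinking the expected delta, or demanding a smaller alpha or greater power, rapidly inflates the required sample size, which is the efficiency trade-off discussed above.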

The introduction of the fragility index paper starts by saying the era of performing randomized placebo-controlled trials of tPA for AIS within 3 hours in patients with small to medium level occlusions is over. This is because it is now the standard of care, making it unethical to randomize patients to placebo. What evidence do you provide to support your position?

ACEP updated their policy on stroke in 2015 with lead author Dr. Michael Brown and gave no Level “A” recommendations in their policy statement.

  • < 3 hours: Level B recommendations. With a goal to improve functional outcomes, IV tPA should be offered and may be given to selected patients with acute ischemic stroke within 3 hours after symptom onset at institutions where systems are in place to safely administer the medication. The increased risk of symptomatic intracerebral hemorrhage (sICH) should be considered when deciding whether to administer IV tPA to patients with acute ischemic stroke.

Do you have any ideas why the ACEP policy statement seems to differ from the AHA, ESO and CSBP guidelines?

The ethics of conducting a placebo-controlled tPA trial is an interesting question. Stroke neurologist Dr. Peter Appelros and colleagues wrote an editorial called: Ethical issues in stroke thrombolysis revisited. It was a follow-up to a bioethics paper written in 1997 by Furlan and Kanoti. The original article identified five areas of concern. Appelros’ position is that the ethical issues raised over two decades ago have not been satisfactorily answered. Have you read that editorial and what are your thoughts?

Standard of care is also an interesting topic. It is a legal term that has a specific definition.

  • the reasonable degree of care a person should provide to another person, typically in a professional or medical setting. 

SGEM#200

Standard of care is often discussed by  emergency physicians (Moffett and Moore WestJEM 2011). Standard of care does not necessarily mean the best care. There are many examples in the medical literature where the standard of care was not the best care. The classic story I’ve often told is about bloodletting (SGEM#200). Standard of care could be considered an argument from popularity, and I think it is better for us as scientists to look at the evidence.  Do you agree?

Can you briefly describe the methods used for your fragility index study?

How many studies did you find and how big was the included cohort?

You use the following definition of the FI: the minimum number of nonevents that, when changed to events in one arm of an interventional trial or meta-analysis of trials, converts the result to statistical nonsignificance. In other words, the FI is the minimum number of patients who would need to have a different outcome to change the p-value from <0.05 to >0.05.

How many patients would be required to flip a statistically significant result to a nonsignificant one (a “positive” result to a “negative” result)? There is only one RCT that had a “positive” result for its primary outcome (NINDS part 2). Is it reasonable to include the other seven RCTs that did not have a statistically positive result for their primary outcome? All those individual RCTs would have an FI of zero because their p-value was greater than 0.05.

One quality metric we look for when authors are meta-analyzing data is to do a Risk of Bias (RoB) assessment. What tool did you use to assess potential bias in the included studies of your SRMA?

You rated all the studies as having low risk of bias in the randomization process and missing outcome data. There are two recently published papers that differ in this assessment (Garg and Mickenautsch BMC Med Res Methodol 2022 and Garg R. Acta Neurol Scand 2022). Why do you think there is a difference between your RoB assessment and theirs?

You also rated the IST-3 trial as having a low risk of bias from deviations from the intended interventions. The trial was mostly open-label (unblinded) and most participants were aware of their assigned intervention during the trial. Could you explain why your assessment of low risk deviates from the Cochrane guidance of high risk?

Only one of the eight RCTs reported a statistical benefit for its primary outcome (NINDS part 2). The number of missing outcomes in that study was 11, which is greater than the fragility index of 5. How can readers be certain about the robustness of the results without an analysis using multiple different imputation methods for managing missing data, as GRADE recommends?

The primary outcome for your SRMA of FI for tPA in AIS for patients treated within 3 hours of last seen normal was disability freedom. You defined this as a modified Rankin Scale (mRS) score of 0-1. What did you find?

  • For disability freedom (mRS score 0–1), data were available from 8 trials enrolling 1960 patients. Alteplase treatment was associated with increased disability-free outcome, 31.0% versus 22.3%; relative risk 1.39 (95% CI, 1.20–1.61); P<0.00001. There was no evidence of heterogeneity across studies: I²=2%, P heterogeneity=0.42.

The secondary outcome was functional independence. This was defined as a mRS score of 0-2. What did you find for the secondary outcome?

  • Functional independence (mRS score 0–2) outcome, 39.7% versus 31.2%; relative risk, 1.29 (95% CI, 1.14–1.45); P<0.0001. There was no evidence of heterogeneity across studies: I²=0%, P heterogeneity=0.95.

What did you find for the other secondary safety endpoint of mortality?

  • Alteplase treatment was not associated with a statistically significant difference in mortality, 24.1% versus 26.1%; relative risk, 0.91 (95% CI, 0.78–1.06); P=0.23.
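For readers who want to see where numbers like these come from, here is a minimal Python sketch (my own illustration, using the standard log relative risk formula with rounded counts that approximate the pooled disability-freedom percentages above, not the exact patient-level data or the stratified method used in the paper) of computing a relative risk and its 95% confidence interval:

```python
from math import exp, log, sqrt

def relative_risk(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with a 95% CI from the standard error of log(RR)."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# Rounded illustrative counts: roughly 31% of 980 vs 22% of 980 patients
print(relative_risk(304, 980, 219, 980))  # RR and CI land close to the pooled figures
```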

As mentioned earlier, with summary statistics we need to be mindful of the biases that may be in the original studies. How do you think the likely error in randomization identified by Dr. Garg and the lack of blinding in IST-3 would potentially impact your analysis?

What was the FI for disability freedom (mRS 0-1), functional independence (mRS 0-2) and mortality?

  • Disability freedom: mRS score 0–1, FIs were 42 and 40 for the study-level and individual participant data-level meta-analyses, respectively, placing the evidential strength in the highly robust category (FI>33).
  • Functional Independence: mRS score 0–2, FIs were 40 for both the study-level and individual participant data-level meta-analyses, again placing the evidential strength in the highly robust category.
  • Mortality: For safety, the individual participant-level meta-analysis for mortality showed an overall reverse fragility index (RFI) of 30, indicating a robust evidential foundation.

What scale or metric are you using to characterize this result as “highly robust”? Is there some agreed upon definition or categories?

  • FI values were categorized based on a prior FI quantification of 906 study-level Cochrane Systematic Reviews from 2011 to 2014, encompassing 6,625 trials for diverse medical conditions. Among 400 statistically significant meta-analyses, the median FI was 12 (interquartile range 4 to 33).

The mRS is known to have substantial interobserver variability, even among experienced researchers. The lowest kappa values are for mRS scores of 1 and 2, reported by Quinn et al (Stroke 2009) as 0.43 and 0.51, respectively. This is considered only “moderate” reliability. How do you think that could impact the stroke literature?
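To show what a kappa in that range means in practice, here is a small Python sketch (my own illustration, using scikit-learn's unweighted Cohen's kappa and made-up paired mRS ratings, not data from Quinn et al) of computing interobserver agreement between two raters:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical mRS scores assigned to the same 12 patients by two raters
rater_a = [0, 1, 1, 2, 2, 3, 1, 2, 4, 0, 1, 2]
rater_b = [0, 1, 2, 2, 1, 3, 2, 2, 4, 1, 1, 3]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values of 0.41-0.60 are often labelled "moderate"
```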

You mentioned a couple of limitations of your study. One was that IST-3 used the uncertainty principle for inclusion, which would bias the trial towards the null hypothesis and underestimate the robustness of the study. Why did you not mention the other limitation, that IST-3 was largely (91%) unblinded, which would have biased the study away from the null hypothesis?

Walsh et al 2014 mention inadequate blinding and loss to follow-up (missing data) as a limitation to statistical significance testing (p-value). What do you think the impact of IST-3 being largely unblinded and the issue of missing outcome data described by Garg 2022 would have on your cumulative FI calculation?

I have not seen a cumulative fragility index reported. Have other authors reported this type of statistical analysis or was this a new method you have developed?

We all have biases (me, you, everyone) and there are many forms of bias that can impact research. I define bias as something that systematically moves us away from the “truth” (best point estimate of an observed effect size with a confidence interval around the point estimate).

Conflicts of interest (COIs) are a reality of modern research. COIs do not invalidate studies, but they have been identified as another potential source of bias in RCTs, SRMAs, guidelines and medical education. You have been open about your COIs, including on this publication and other publications. How do you think readers should interpret your COIs and those of others who have similar COIs in the stroke literature?

There has been a vocal minority who have expressed concerns about the stroke literature. Most of these voices seem to come from emergency physicians. Do you think your publication will change any hearts and minds?

Dr. Ravi Garg

Dr. Garg did an analysis of the NINDS data and published his findings in BMC Medical Research Methodology. He was recently a guest on an SGEM Xtra episode discussing this paper. Dr. Garg specifically mentioned you and your response to the Hoffman and Schriger graphic reanalysis. Do you have any general thoughts about Dr. Garg’s publication?

Do you think Dr. Garg’s concerns about randomization errors likely resulting in selection bias are reasonable and should it have any impact on how we interpret the stroke literature?

Any final thoughts?


The SGEM will be back next episode doing a structured critical appraisal of a recent publication, trying to cut the knowledge translation (KT) window down from over 10 years to less than 1 year using the power of social media. So, patients get the best care, based on the best evidence.


Remember to be skeptical of anything you learn, even if you heard it on the Skeptics Guide to Emergency Medicine.