Date: December 15, 2024

Guest Skeptic: Dr. Chris Carpenter, Vice Chair of Emergency Medicine at Mayo Clinic.

Today, we’re sleighing through the holiday season with a special episode filled with statistical cheer, a dash of skepticism, and a hint of eggnog-flavoured nerdiness.

This is an SGEM Xtra like the one we did on What I Learned from Top Gun. It’s fun to mix it up, skip the structured critical appraisal of a recent publication, and have a more philosophical chat.

The inspiration for this SGEM Xtra episode is the BMJ 2022 holiday article called “The 12 Days of Christmas, the Statistician Gave to Me.” The BMJ statistical editors gifted us a publication highlighting common statistical faux pas, laid out in true holiday spirit. We came up with our own 12 nerdy SGEM days.


The 12 Days of Christmas the SGEM Gave to Me


Day 1: A P Value in a Pear Tree

Ah, the P value of frequentist statistics—often misunderstood and frequently abused. The key here is remembering that a P value isn’t proof of the “truth”. We define truth as the best point estimate of an observed effect size with a confidence interval around that point estimate. We should not hang our clinical decisions on a single P value. When do we ever base our care on one data point?
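
For the code-curious, here is a minimal sketch (with made-up numbers, not from any study) of why a point estimate with a confidence interval tells you more than a lone P value:

```python
# Sketch: a P value alone versus a point estimate with a 95% CI.
# All numbers below are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1225)
treatment = rng.normal(loc=4.0, scale=10.0, size=120)  # hypothetical outcome scores
control = rng.normal(loc=0.0, scale=10.0, size=120)

t_stat, p_value = stats.ttest_ind(treatment, control)

# Normal-approximation 95% CI for the difference in means
diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment) + control.var(ddof=1) / len(control))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print(f"P value: {p_value:.3f}")  # one number, easy to over-interpret
print(f"Effect estimate: {diff:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```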

Day 2: Two Confidence Intervals

Confidence intervals (CIs) tell us the range of plausible values for our estimate. If they’re too wide, it’s like a holiday sweater that’s too loose—unflattering and not very useful. Reconsider that sample size or effect size to get clinically impactful intervals you would want to share with Santa Claus. And remember, CIs don’t mean certainty!
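
As a quick illustration, here is a small sketch (using a simple Wald approximation and an assumed 30% event rate) of how a 95% CI tightens as the sample size grows:

```python
# Sketch: the width of a 95% CI shrinks with sample size (made-up proportions).
import math

def ci_95_for_proportion(events: int, n: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

for n in (20, 200, 2000):
    events = round(0.30 * n)  # assume ~30% event rate
    low, high = ci_95_for_proportion(events, n)
    print(f"n={n:4d}: 30% (95% CI {low:.1%} to {high:.1%})")
```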

Day 3: Three Missing Values

Missing data is the Grinch of research. Ignoring it or using improper methods to handle it can bias your results in unpredictable directions. Methods like multiple imputation or sensitivity analysis can salvage your data without sacrificing rigour. We often forget that missing data extends beyond interventional studies. For example, in diagnostic research like emergency ultrasound, indeterminate results are often swept under the rug and either excluded or not reported, even though methods like 3×2 tables exist to report these indeterminate results transparently. Those indeterminate results reflect real life when deploying diagnostics that require technical expertise and subjective interpretation.
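
Here is a toy simulation (purely illustrative, not from the episode) of how simply dropping missing values can bias an estimate when the sickest patients are the ones most likely to be missing:

```python
# Toy simulation: when data go missing more often for sicker patients,
# a complete-case analysis (dropping the missing rows) biases the estimate.
import numpy as np

rng = np.random.default_rng(42)
true_scores = rng.normal(loc=50.0, scale=10.0, size=10_000)  # hypothetical severity scores

# Higher scores are more likely to be missing (missing-not-at-random).
p_missing = 1 / (1 + np.exp(-(true_scores - 60) / 5))
observed = true_scores[rng.random(len(true_scores)) > p_missing]

print(f"True mean:          {true_scores.mean():.1f}")
print(f"Complete-case mean: {observed.mean():.1f}  <- biased downward")
```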

Day 4: Four Overfit Models

Overfitting is like over-decorating your Christmas tree—too many ornaments and it collapses. Overfit models describe the noise, not the signal, making them poor predictors in new datasets. Keeping your models simple and robust can pay downstream dividends for widespread usability and replicability.
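
A quick sketch with simulated data: an overfit 15-degree polynomial looks great on the training data and falls apart on new data, while the simple linear model holds up:

```python
# Sketch: an overfit model memorizes noise in the training data and
# performs worse on new data (all data simulated).
import numpy as np

rng = np.random.default_rng(7)
x_train, x_test = rng.uniform(0, 1, 30), rng.uniform(0, 1, 30)
y_train = 2 * x_train + rng.normal(0, 0.3, 30)  # true signal is linear
y_test = 2 * x_test + rng.normal(0, 0.3, 30)

for degree in (1, 15):
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```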

Day 5: Five Golden Rules

Here are the five statistical golden rules for a randomized controlled trial:

  1. Power Calculation: Do an a priori power calculation: define the expected effect size, choose a significance level (α, commonly 0.05 and two-sided), specify the desired power (commonly 80–90%), and account for anticipated dropouts (a worked sketch follows this list).
  2. Randomization: Use proper randomization techniques (simple, stratified, or block randomization). Ensure allocation concealment to prevent prediction of group assignment.
  3. Outcomes: Clearly define primary and secondary endpoints. Pick patient-oriented outcomes (POOs) rather than disease-oriented outcomes (DOOs), surrogate-oriented outcomes (SOOs), or monitor-oriented outcomes (MOOs).
  4. Statistical Analysis Plan: Provide a roadmap for how the data will be analyzed to reduce the risk of data dredging or p-hacking. Specify which tests will be used, pre-plan subgroup and sensitivity analyses, and define how missing data will be handled.
  5. Control of Bias and Confounding: Bias and confounding can distort study results and lead to incorrect conclusions. Use blinding (single, double, or triple) to reduce performance and detection biases. Collect baseline characteristics to assess balance between groups and, if needed, adjust for imbalances (e.g., with multivariable regression).

And of course, citing and transparently adhering to applicable EQUATOR Network reporting standards that are appropriate for the research design is a holly-jolly path to fulfilling these five golden rules.
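
To make Golden Rule #1 concrete, here is a minimal sketch of an a priori power calculation using statsmodels; the effect size, alpha, power, and dropout rate are illustrative assumptions, not recommendations:

```python
# Sketch of an a priori sample size / power calculation for a two-arm trial.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(
    effect_size=0.5,        # expected standardized effect size (Cohen's d), assumed
    alpha=0.05,             # two-sided significance level
    power=0.80,             # desired power
    alternative="two-sided",
)

dropout = 0.10              # anticipated 10% dropout, assumed
n_enrol = n_per_arm / (1 - dropout)
print(f"~{n_per_arm:.0f} analyzable patients per arm; "
      f"enrol ~{n_enrol:.0f} per arm to allow for dropout")
```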

Day 6: Six Sensitivity Analyses

Sensitivity analyses test the robustness of your results under different assumptions. It’s like asking, “If we change the cut-off for naughty versus nice, does our conclusion hold?”
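
Here is a small sketch (simulated data, hypothetical age cut-offs) of what that looks like in practice: rerun the analysis across a range of cut-offs and see whether the direction of the effect holds:

```python
# Sketch: vary the cut-off and check whether the conclusion is robust (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
age = rng.normal(70, 10, 500)                                  # hypothetical predictor
outcome = (rng.random(500) < 0.2 + 0.004 * (age - 70)).astype(int)

for cutoff in (60, 65, 70, 75, 80):
    older, younger = outcome[age >= cutoff], outcome[age < cutoff]
    diff = older.mean() - younger.mean()
    _, p = stats.ttest_ind(older, younger)
    print(f"cut-off {cutoff}: risk difference {diff:+.1%}, p={p:.3f}")
```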

Day 7: Seven Skewed Distributions

Skewed distributions can throw off your statistical assumptions, like trying to fit a rectangular present into a circular box. Many common tests assume normality, so when you’re dealing with skewness, consider transformations or non-parametric tests. And always plot your data first—it’s the gift of clarity.
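
A quick sketch with simulated right-skewed data (think length of stay), comparing a t-test on the raw data, a Mann-Whitney U test, and a t-test after a log transform:

```python
# Sketch: skewed data handled three ways (simulated, right-skewed "hours" data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.lognormal(mean=1.0, sigma=0.8, size=80)
group_b = rng.lognormal(mean=1.3, sigma=0.8, size=80)

print("t-test on raw data:  p =", round(stats.ttest_ind(group_a, group_b).pvalue, 4))
print("Mann-Whitney U test: p =", round(stats.mannwhitneyu(group_a, group_b).pvalue, 4))
print("t-test on log(data): p =", round(stats.ttest_ind(np.log(group_a), np.log(group_b)).pvalue, 4))
```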

Day 8: Eight Non-Significant Findings

Non-significant findings are like getting socks for Christmas—not exciting but often useful. They remind us that the absence of evidence isn’t evidence of absence. Report them transparently, as they contribute to the broader scientific conversation. If you asked a good question and designed the study to answer that question, it does not matter what the result was. We need to move away from saying a study was “positive” or “negative”.

Day 9: Nine Misleading Graphs

Misleading graphs are the Christmas fruit cake of statistics—they draw attention for all the wrong reasons. Sure, it looks like a cake, but it does not taste great. You need to watch out for truncated axes, distorted scales, or cherry-picked data. Remember, a good graph tells the truth, even if it’s not festive. And never use pie charts unless you’re talking actual pies.
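
Here is a small sketch (made-up numbers) showing how the same data look with an honest y-axis versus a truncated one:

```python
# Sketch: the same hypothetical data with a full y-axis versus a truncated one.
import matplotlib.pyplot as plt

groups = ["Control", "Treatment"]
rates = [82.0, 84.0]  # hypothetical success rates (%)

fig, (ax_honest, ax_misleading) = plt.subplots(1, 2, figsize=(8, 3))

ax_honest.bar(groups, rates)
ax_honest.set_ylim(0, 100)
ax_honest.set_title("Full axis (honest)")

ax_misleading.bar(groups, rates)
ax_misleading.set_ylim(81.5, 84.5)  # truncated axis exaggerates a trivial gap
ax_misleading.set_title("Truncated axis (misleading)")

plt.tight_layout()
plt.show()
```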

Day 10: Ten Multivariable Models

Multivariable models are powerful tools but easy to misuse. Too many variables, and you risk overfitting. Too few, and you might miss important confounders. Think of it like setting up a holiday light display—careful planning ensures it’s both effective and not overwhelming.
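
One common rule of thumb, which is not in the BMJ article but is worth a mention, is roughly ten outcome events per candidate predictor. A quick back-of-the-envelope sketch:

```python
# Sketch of the ~10 events-per-variable heuristic for multivariable models.
def max_predictors(n_events: int, events_per_variable: int = 10) -> int:
    """Rough ceiling on the number of candidate predictors."""
    return n_events // events_per_variable

n_patients, event_rate = 500, 0.08          # hypothetical cohort
n_events = int(n_patients * event_rate)     # 40 events
print(f"{n_events} events -> at most ~{max_predictors(n_events)} candidate predictors")
```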

Day 11: Eleven Spurious Correlations

Spurious correlations are statistical mirages—two variables that seem linked but aren’t causally related. As the saying goes, correlation is not causation. It’s like saying ice cream sales cause sunburns. If it is an observational study and not a randomized controlled trial (RCT), it is very difficult to conclude causation. Yes, there are the Bradford Hill criteria, but that is a whole other podcast. Always ask yourself: Is there a plausible mechanism, or are you just seeing Rudolph’s red nose herring?
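
A quick simulation of the ice cream and sunburn example: two series that both drift upward over time will correlate strongly even though neither causes the other:

```python
# Sketch: two unrelated series that share a trend show a strong correlation
# with no causal link (all data simulated).
import numpy as np

rng = np.random.default_rng(25)
years = np.arange(2000, 2025)
ice_cream_sales = 100 + 3 * (years - 2000) + rng.normal(0, 5, len(years))
sunburn_visits = 50 + 2 * (years - 2000) + rng.normal(0, 5, len(years))

r = np.corrcoef(ice_cream_sales, sunburn_visits)[0, 1]
print(f"Correlation: r = {r:.2f}  (driven by the shared trend, not causation)")
```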

Day 12: Twelve Underpowered Studies

Underpowered studies are like a holiday dinner without dessert—unsatisfying and inconclusive. They often stem from small sample sizes or unrealistic effect sizes. Plan your study like you plan your holiday shopping—early, thoroughly, and with enough resources to get meaningful results. Also, remember that RCTs are almost always powered for efficacy and not harm/adverse events. This means we often cannot claim an intervention is “safe”. A more accurate statement from RCTs is that we did not observe an increase in harm.
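
Here is a small simulation (assuming a true drop in event rate from 20% to 15%) showing how often trials of different sizes would actually detect that benefit:

```python
# Sketch: a small trial usually misses a true but modest benefit (simulated).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1224)

def simulated_trial(n_per_arm: int) -> bool:
    """Simulate one two-arm trial and return True if p < 0.05."""
    control = rng.random(n_per_arm) < 0.20   # 20% event rate
    treated = rng.random(n_per_arm) < 0.15   # 15% event rate (true benefit)
    table = [[control.sum(), n_per_arm - control.sum()],
             [treated.sum(), n_per_arm - treated.sum()]]
    _, p, _, _ = stats.chi2_contingency(table)
    return p < 0.05

for n in (50, 200, 1000):
    hits = sum(simulated_trial(n) for _ in range(2000))
    print(f"n={n:4d} per arm: 'significant' in {hits / 2000:.0%} of simulated trials")
```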


Thanks for listening to The Skeptics’ Guide to Emergency Medicine this year. Wishing all of you a happy holiday season. Don’t forget to follow us on social media, now including Bluesky. The SGEM will be back next episode doing a structured critical appraisal of a recent publication. We will continue to try and cut the knowledge translation window down from over ten years to less than one year using the power of social media, so patients get the best care based on the best evidence.


Remember to be skeptical of anything you learn, even if you learned it from the Skeptics’ Guide to Emergency Medicine.