Same p-Value, Different Certainty: How Bayesian Reanalysis Is Quietly Rewriting Critical Care Practice in 2026
The closing installment of our 3-week evidence appraisal trilogy. Why two ICU trials with identical p-values can tell you completely different stories, the priors and posteriors every clinician should be able to read, and the trials Bayesian methods have already re-interpreted.
Imagine two ICU randomized trials cross your desk on the same morning. Both report a p-value of 0.04 for mortality. Both were published in high-impact journals. Both compare interventions you have used in real patients in the last week.
A frequentist reader treats them as equivalent. Both crossed the line. Both deserve the same epistemic weight.
A Bayesian reader looks at them and reaches a completely different conclusion. The first trial has a 95 percent posterior probability that the intervention reduces mortality. The second trial has a 76 percent posterior probability. Same p-value. Different certainty. Different conversation at the bedside.
That distinction is no longer theoretical. In 2025 alone, two separate Bayesian reanalyses of the DanGer Shock trial in cardiogenic shock arrived at this exact answer. One found a posterior probability of mortality benefit ranging from 76 to 98 percent depending on the prior used. EMPULSE, a trial with a nearly identical frequentist result, sat at 90 to 99 percent. Both trials sat at the conventional threshold of statistical significance. Their Bayesian profiles differed substantially.
This Thursday closes the three-week evidence appraisal trilogy. Two weeks ago we addressed how to read observational ICU evidence through target trial emulation. Last week we addressed how to test the stability of RCT significance through the Fragility Index. This week we address the lens that puts a probability on the question every clinician actually wants to ask. Given the data, what is the chance this treatment helps my patient?
ICCN Update
The new ICCN website is live at iccn.io. Every article published in the past two weeks is now archived in one place, and our new Research section pulls recent published data from 26 major critical care and medical journals into a single curated feed for subscribers. Bookmark iccn.io.
Why This Matters
For 100 years, the dominant statistical framework in medicine has been frequentist. Trials are powered to detect a specific effect size. Results are interpreted through p-values and confidence intervals. A p-value below 0.05 becomes a green light. A p-value above 0.05 becomes a stop sign. The probability that the treatment helps the patient sitting in front of you is rarely calculated directly.
The Bayesian framework asks a different question. Given the trial data and what we knew before the trial, what is the probability that the treatment reduces mortality by any clinically meaningful amount? That probability is called the posterior probability, and it can be expressed as a single number that maps directly onto the way clinicians actually think.
In critical care, where effect sizes are usually small, sample sizes are usually limited, and the underlying mortality is usually high, the Bayesian framework has special value. Trials that the frequentist literature labeled as negative (EOLIA, ANDROMEDA-SHOCK, parts of the ECLS-SHOCK program) have all been subjected to Bayesian reanalyses that produced posterior probabilities of benefit well above 80 percent. Trials that the frequentist literature celebrated as positive (DanGer Shock, the early ART signals) have been subjected to Bayesian reanalyses that produced more cautious posterior probability distributions than the headlines suggested.
For the interprofessional team, the implications are direct. A posterior probability of benefit gives the team a single number to bring to multidisciplinary rounds. A credible interval gives the team a range of plausible effect sizes consistent with the data and prior knowledge. A region of practical equivalence (ROPE) gives the team a transparent way to ask whether the effect, even if real, is large enough to matter clinically.
These tools are now standard literacy for anyone who reads ICU literature in 2026.
A p-value tells you whether the data are surprising under the null hypothesis. A posterior probability tells you what to believe about the treatment effect. Critical care has spent a century interpreting the first as if it were the second.
The Study / Evidence in Context
The Bayesian reanalysis literature in critical care has grown rapidly since 2018, and a small number of landmark papers anchor the conversation.
ANDROMEDA-SHOCK Bayesian reanalysis (Zampieri et al, AJRCCM 2020). The original ANDROMEDA-SHOCK trial compared peripheral perfusion targeted resuscitation against lactate-guided resuscitation in septic shock. The frequentist analysis reported a p-value of 0.06 for 28-day mortality, conventionally interpreted as negative. The Bayesian reanalysis, using neutral, optimistic, and pessimistic priors, found posterior probabilities of mortality benefit ranging from approximately 90 to 97 percent depending on the prior. The trial moved from “no benefit detected” to “high probability of benefit, magnitude uncertain” purely through a change in statistical lens.
EOLIA Bayesian reanalysis (Goligher et al, JAMA 2018). The EOLIA trial of venovenous ECMO in severe ARDS was halted early and reported a frequentist p-value of 0.07 for 60-day mortality. The post hoc Bayesian reanalysis found a posterior probability of mortality benefit ranging from 88 to 99 percent depending on prior assumptions. The interpretation of EOLIA shifted measurably. Many ECMO programs that were waiting for a formally “positive” trial reconsidered.
ART Bayesian reanalysis (Zampieri et al, AJRCCM 2021). This paper served two purposes. It reanalyzed the Alveolar Recruitment for Acute Respiratory Distress Syndrome (ART) trial, and it laid out a proposed framework for standardized priors in future critical care Bayesian reanalyses. The methodological proposal, suggesting a minimum set of skeptical, neutral, and optimistic priors, has become foundational to subsequent work.
DanGer Shock Bayesian reanalyses (Tomasino et al, CJC Open 2026; Aubdool et al, J Cardiovasc Transl Res 2025). Two separate Bayesian reanalyses of the DanGer Shock trial of microaxial flow pump support in STEMI-related cardiogenic shock have now appeared. Both confirm a high probability of mortality benefit at relative risk less than 1 (76 to 98 percent across priors), with lower posterior probability of large clinical effects (15 to 72 percent at relative risk less than 0.85). The frequentist result of borderline significance is given a richer probabilistic interpretation.
Tracheostomy Bayesian meta-analysis (Quinn et al, BJA 2022). Early versus late tracheostomy in ICU patients has been studied repeatedly with inconclusive frequentist results. The Bayesian meta-analysis produced a probability distribution for the mortality effect that respected the uncertainty in the data while quantifying the probability of clinically meaningful benefit.
SUP-ICU heterogeneity of treatment effect (Granholm et al, Intensive Care Medicine 2020). This Bayesian post hoc analysis of the SUP-ICU trial of stress ulcer prophylaxis examined how the treatment effect of pantoprazole varied across patient subgroups. The work demonstrated how Bayesian methods can illuminate heterogeneity that frequentist subgroup analyses tend to mishandle.
The methodological foundation. Yarnell, Granton, and Tomlinson published an editorial in AJRCCM in 2020 that framed the rise of Bayesian methods in critical care as a long-overdue correction to a century of frequentist overreach. Their argument was simple. The frequentist comfort most clinicians feel is partly a false comfort, given the well-documented problems with p-value-based inference catalogued by the American Statistical Association in 2016.
Recent ICU integration. A 2025 Critical Care Science overview by Taylor and colleagues provides one of the clearest clinician-facing introductions to Bayesian methods now in print. The same year, a Bayesian analysis of nonagenarian ICU outcomes across European cohorts (Dankl et al, Annals of Intensive Care 2025) demonstrated how the framework supports prognostication as well as treatment effect estimation.
ANDROMEDA-SHOCK was a “negative” trial with a 90 percent posterior probability of mortality benefit. EOLIA was a “negative” trial with an 88 to 99 percent posterior probability of mortality benefit. The frequentist verdict and the Bayesian verdict were the same data, read with different tools.
What Stood Out
Several elements of this body of work have changed how I now read critical care evidence.
Two trials with identical p-values can carry very different certainty. The DanGer Shock versus EMPULSE comparison published in the Journal of Cardiovascular Translational Research in 2025 is the clearest illustration to date. Both trials reported p-values near 0.04. Both crossed conventional significance. Their Bayesian posterior probability profiles, however, differed substantially. EMPULSE landed at 90 to 99 percent posterior probability of mortality benefit at relative risk less than 1. DanGer Shock landed at 76 to 98 percent. Same threshold. Different certainty. Different clinical conversation.
The prior selection problem is real but tractable. Critics of Bayesian reanalysis correctly point out that priors are chosen by investigators and can be tuned to produce a desired result. Zampieri’s 2021 framework proposed a minimum set of standardized priors (skeptical, neutral, optimistic) that should be reported in every critical care Bayesian reanalysis. When all three are reported transparently, the reader can decide which most closely reflects their own clinical belief and read the corresponding posterior. The framework converts the prior selection problem from a hidden vulnerability into an explicit transparency requirement.
Credible intervals communicate uncertainty more naturally than confidence intervals. A 95 percent credible interval can be read as “given the data and the prior, there is a 95 percent probability that the true effect lies within this range.” A 95 percent confidence interval cannot be read that way without methodological error, despite the fact that most clinicians do read it that way. The Bayesian framework aligns the natural reading with the mathematical reality.
The Region of Practical Equivalence reframes the bedside question. A ROPE is a range of effect sizes the clinician considers practically equivalent to no effect. The Bayesian framework calculates the posterior probability that the true effect lies outside that range. This is the question every interprofessional team actually wants to answer. Not “is the effect non-zero?” but “is the effect large enough to matter to the patient in front of us?”
Bayesian methods support trial design, not only reanalysis. Adaptive platform trials like REMAP-CAP, ATTACC, ACTIV, and the new generation of ICU trial networks are increasingly Bayesian by design. The frequentist-Bayesian conversation is no longer about which framework should win. It is about which framework best answers a specific question.
Clinical, Research, and Leadership Interpretation
For the interprofessional ICU team, this matters in concrete and discipline-specific ways.
For intensivists and APPs, posterior probability of benefit is a number that can enter rounds and protocol discussions in a way p-values cannot. When a colleague says “the trial was negative,” the next question is no longer “what was the p-value?” but “what was the posterior probability of benefit under a neutral prior?” That single question elevates the level of every multidisciplinary clinical conversation.
For respiratory therapists, the ART and EOLIA Bayesian reanalyses are directly relevant. ARDS ventilation strategy literature has been subjected to multiple Bayesian reanalyses because the trials are exactly the kind for which Bayesian methods shine. Borderline frequentist significance. Plausible biological mechanism. Substantial prior data. RTs who can articulate the Bayesian posterior probability of benefit for prone positioning, lung recruitment, and ECMO timing operate at a higher methodological level than the frequentist headlines suggest.
For ICU nurses, the SUP-ICU Bayesian heterogeneity analysis is the most directly relevant example. Bayesian methods are particularly useful for understanding how treatment effects vary across patient subgroups. Many of the practices nurses execute most rigorously (stress ulcer prophylaxis, sedation interruption, mobility) have been studied with Bayesian methods that reveal patterns the frequentist literature missed.
For pharmacists, drug-efficacy trials are a natural home for Bayesian reanalysis. Pantoprazole. Hydrocortisone. Vasopressin. Dexmedetomidine. Pharmacists who can read a posterior probability of benefit alongside a p-value add real protective and advocacy value to their teams.
For perfusionists, the EOLIA and DanGer Shock Bayesian reanalyses are required reading. ECMO and mechanical circulatory support literature is exactly the literature for which Bayesian methods produce the most clinically actionable reframing. Perfusionists who can articulate posterior probability profiles for ECMO and microaxial pump support speak the language of the modern cardiothoracic and shock literature.
For ICU leaders and educators, this is a curriculum opportunity. A journal club template that asks for the posterior probability of benefit (where available) and the credible interval alongside the p-value and confidence interval will change the texture of every evidence-based discussion within a month. Add it once. Use it weekly. Watch the level rise.
Bedside and Workplace Takeaways
Five reflexes to install this week.
Translate every “negative” trial into a posterior probability question. When a colleague calls a trial “negative” based on a p-value just above 0.05, ask what the Bayesian posterior probability of benefit would be under a neutral prior. ANDROMEDA-SHOCK and EOLIA are the canonical examples. Both were “negative.” Both had posterior probabilities of mortality benefit above 88 percent.
Demand the three-prior transparency standard. When a Bayesian reanalysis is cited in a clinical discussion, ask whether skeptical, neutral, and optimistic priors were all reported. If only the optimistic prior was used, the result deserves more skepticism, not less.
Read credible intervals as probability statements. A 95 percent credible interval can be read as “95 percent probability the true effect is in this range.” Use that interpretation. It aligns the natural clinical reading with the mathematical reality.
Define a ROPE before discussing trial relevance. Before debating a trial’s clinical significance, ask the team to specify the range of effect sizes that would be practically equivalent to no effect. Then evaluate the posterior probability of benefit outside that range. The conversation sharpens immediately.
Pair Bayesian and frequentist analyses, do not replace one with the other. Use the frequentist p-value, confidence interval, and Fragility Index alongside the Bayesian posterior probability and credible interval. Each tool answers a different question. Together they provide a richer picture than either alone.
These five reflexes will compound across every clinical discussion on your unit for the remainder of 2026.
Listen to the following podcast:
Teaching Pearl
The frequentist framework asks how surprising the data are if the treatment did not work. The Bayesian framework asks how probable it is that the treatment works, given the data and what we knew before. The first is a statement about the data. The second is a statement about the patient.
What We Should Not Over-Assume
Bayesian methods are not a verdict. A few cautions.
The choice of prior is consequential, and a Bayesian reanalysis conducted with a single optimistic prior can mislead just as readily as a frequentist analysis with a marginal p-value. The Zampieri 2021 framework calling for standardized priors is the right correction. Until the field universally adopts that standard, the careful reader checks which priors were used.
A high posterior probability is not the same as clinical truth. It is a probability statement conditional on the data and the prior. If the data are biased, the posterior will be biased. Bayesian methods do not rescue an underpowered, confounded, or selection-biased dataset.
A posterior probability of benefit at relative risk less than 1 is not the same as a posterior probability of clinically meaningful benefit. The DanGer Shock reanalyses illustrate this directly. The posterior probability of any benefit was high. The posterior probability of large benefit (relative risk less than 0.85) was substantially lower. A sophisticated reader looks at both.
And finally, Bayesian methods do not eliminate the need for replication. A single trial with a 90 percent posterior probability of benefit is more informative than a single trial with a frequentist p-value of 0.06, but neither replaces a successful replication. The first time anyone reanalyzes a single ICU trial and concludes that a practice should change, ask for the replication evidence.
Limitations
The body of literature reviewed here has real limitations worth naming.
Most Bayesian reanalyses in critical care are post hoc. The priors are selected after the trial result is known. Even with the Zampieri three-prior framework, the post hoc nature of the analysis introduces a degree of subjectivity that is not always acknowledged by enthusiastic readers.
Bayesian meta-analyses depend on the quality of the included trials. If the underlying trials are heterogeneous in design, population, or outcome definition, the posterior probability distribution reflects that heterogeneity. The Quinn tracheostomy meta-analysis is honest about this limitation. Many subsequent meta-analyses are less so.
Bayesian computation is more complex than frequentist computation, and the choice of software, model structure, and convergence diagnostics can affect results. Reproducibility of Bayesian reanalyses is not yet at the level the field needs.
The clinical interpretation of posterior probabilities depends on the reader’s familiarity with the framework. A 76 percent posterior probability of benefit means different things to different clinicians. Until Bayesian thinking is genuinely standard, communication of these numbers carries unavoidable risk of misinterpretation.
Bottom Line
Bayesian reanalysis is no longer an academic curiosity in critical care. It is a routinely applied lens that has already changed how we interpret ANDROMEDA-SHOCK, EOLIA, ART, DanGer Shock, EMPULSE, and a growing list of ICU trials. The posterior probability of benefit, the credible interval, and the region of practical equivalence are now standard vocabulary for any clinician serious about reading the modern critical care literature.
For ICCN readers, the integration is straightforward. When evaluating any trial that will change practice, ask three questions in sequence. What does the p-value tell us about how surprising the data are under the null? What does the Fragility Index tell us about how stable that significance is to small perturbations? What does the Bayesian posterior probability tell us about how confident we should be that the treatment helps? Each tool answers a different question. Together they produce a far richer picture than any one of them alone.
That is the close of our three-week evidence appraisal trilogy. Target trial emulation for observational evidence. Fragility Index for the stability of randomized significance. Bayesian reanalysis for the probability of benefit. Three lenses. One sharper reader. One safer patient.
References
Zampieri FG, Casey JD, Shankar-Hari M, Harrell FE Jr, Harhay MO. Using Bayesian methods to augment the interpretation of critical care trials. An overview of theory and example reanalysis of the alveolar recruitment for acute respiratory distress syndrome trial. Am J Respir Crit Care Med. 2021;203(5):543-552. doi:10.1164/rccm.202006-2381CP
Zampieri FG, Damiani LP, Bakker J, et al. Effects of a resuscitation strategy targeting peripheral perfusion status versus serum lactate levels among patients with septic shock. A Bayesian reanalysis of the ANDROMEDA-SHOCK trial. Am J Respir Crit Care Med. 2020;201(4):423-429. doi:10.1164/rccm.201905-0968OC
Yarnell CJ, Granton JT, Tomlinson G. Bayesian analysis in critical care medicine. Am J Respir Crit Care Med. 2020;201(4):396-398. doi:10.1164/rccm.201910-2019ED
Goligher EC, Tomlinson G, Hajage D, et al. Extracorporeal membrane oxygenation for severe acute respiratory distress syndrome and posterior probability of mortality benefit in a post hoc Bayesian analysis of a randomized clinical trial. JAMA. 2018;320(21):2251-2259. doi:10.1001/jama.2018.14276
Tomasino M, Urio-Garmendia G, Ródenas-Alesina E, Uribarri A, Ferreira-González I. Mortality benefit of microaxial flow pump use in infarct-related cardiogenic shock: a Bayesian reanalysis of the DanGer Shock trial. CJC Open. 2026;8(4):448-456. doi:10.1016/j.cjco.2025.11.017
Taylor C, Puxty K, Quasim T, Shaw M. Understanding Bayesian analysis of clinical trials: an overview for clinicians. Crit Care Sci. 2025;37:e20250267. doi:10.62675/2965-2774.20250267
Quinn L, Veenith T, Bion J, Hemming K, Whitehouse T, Lilford R. Bayesian analysis of a systematic review of early versus late tracheostomy in ICU patients. Br J Anaesth. 2022;129(5):693-702. doi:10.1016/j.bja.2022.08.012
Granholm A, Marker S, Krag M, et al. Heterogeneity of treatment effect of prophylactic pantoprazole in adult ICU patients: a post hoc Bayesian analysis of the SUP-ICU trial. Intensive Care Med. 2020;46(4):717-726. doi:10.1007/s00134-019-05903-8
Dankl D, Bruno RR, Beil M, et al. Prognosis of nonagenarian ICU patients: a Bayesian analysis of prospective European studies. Ann Intensive Care. 2025;15:62. doi:10.1186/s13613-025-01496-2
Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129-133. doi:10.1080/00031305.2016.1154108
Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337-350. doi:10.1007/s10654-016-0149-8
Goodman SN. Toward evidence-based medical statistics. 1: the P value fallacy. Ann Intern Med. 1999;130(12):995-1004. doi:10.7326/0003-4819-130-12-199906150-00008
Walsh M, Srinathan SK, McAuley DF, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. J Clin Epidemiol. 2014;67(6):622-628. doi:10.1016/j.jclinepi.2013.10.019
Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The Fragility Index in multicenter randomized controlled critical care trials. Crit Care Med. 2016;44(7):1278-1284. doi:10.1097/CCM.0000000000001670
Reep CAT, Wils EJ, Heunks L. Opportunities, challenges and future perspectives for target trial emulation in critical care clinical research. Crit Care. 2025;29(1). doi:10.1186/s13054-025-05723-x
Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758-764. doi:10.1093/aje/kwv254
Clinical Disclaimer
The content above is for educational purposes only and is not intended to replace clinical judgment, institutional protocols, or care delivered by qualified healthcare professionals. Patient care decisions should always be individualized, made in collaboration with the full interprofessional team, and aligned with current local guidelines, regulatory standards, and the patient’s clinical context. ICCN is not responsible for clinical actions taken solely on the basis of this article.
Javier Amador-Castaneda, BHS, RRT, FCCM | Founder & CEO, ICCN



