After running a causal analysis in CausoAI, one of the first things you see is the Causal Readiness Score — a number between 0 and 100. But what does it actually mean, and how should it change what you do with the results?
The CRS is a weighted aggregate of five validation layers, each of which probes a different dimension of your analysis. A score of 82 doesn't just mean "good" — it means something specific about where your analysis is strong and where it has gaps.
The Four Confidence Bands
- 80–100: High confidence. Suitable for budget allocation, strategy decisions, and executive reporting
- 60–79: Directionally reliable. Use to inform hypotheses and prioritize further investigation, not to act immediately
- 40–59: Use with caution. Significant assumptions are at play. Share caveats explicitly
- Below 40: Do not act. The data or causal structure has critical gaps that need addressing first
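The aggregation and band thresholds described above can be sketched in a few lines. This is an illustration using the weights and cutoffs stated in this article; the function names and layer keys are assumptions, not CausoAI's actual API.

```python
# Combine the five layer scores (0-100 each) into a CRS using the
# weights described in this article, then map the CRS to its band.
# Names here are illustrative, not CausoAI internals.

LAYER_WEIGHTS = {
    "data_coverage": 0.25,
    "identifiability": 0.25,
    "estimation_feasibility": 0.20,
    "power_and_overlap": 0.20,
    "robustness": 0.10,
}

def causal_readiness_score(layer_scores: dict) -> float:
    """Weighted aggregate of the five validation layers."""
    return sum(LAYER_WEIGHTS[k] * layer_scores[k] for k in LAYER_WEIGHTS)

def confidence_band(crs: float) -> str:
    """Map a CRS to the confidence bands listed above."""
    if crs >= 80:
        return "High confidence"
    if crs >= 60:
        return "Directionally reliable"
    if crs >= 40:
        return "Use with caution"
    return "Do not act"
```

Note that a strong overall score can hide a weak layer: four high layer scores easily outweigh one low one, which is why the gap list discussed later matters more than the headline number.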
Layer 1: Data Coverage (25%)
This layer checks whether your dataset can support a reliable estimate. It examines sample size, missingness rates, and variance across key variables.
Common gaps: fewer than 100 rows overall, treatment or outcome variable missing in more than 20% of rows, outcome variable with near-zero variance (everyone converted, or no one did).
Data coverage gaps are the most fixable. They usually mean you need more data, better data collection, or a narrower analysis scope.
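The three common gaps above can be screened for with straightforward checks. A minimal sketch, using the thresholds stated in this article (100 rows, 20% missingness); the gap format and function name are assumptions:

```python
# Screen a list of row dicts for the Layer 1 gaps described above:
# too few rows, excessive missingness in treatment/outcome, and an
# outcome with near-zero variance. Illustrative, not CausoAI code.

def data_coverage_gaps(rows, treatment, outcome):
    gaps = []
    if len(rows) < 100:
        gaps.append(f"fewer than 100 rows (n={len(rows)})")
    for col in (treatment, outcome):
        missing = sum(1 for r in rows if r.get(col) is None)
        if rows and missing / len(rows) > 0.20:
            gaps.append(f"{col} missing in more than 20% of rows")
    values = [r[outcome] for r in rows if r.get(outcome) is not None]
    if values and len(set(values)) <= 1:
        gaps.append(f"{outcome} has near-zero variance")
    return gaps
```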
Layer 2: Identifiability (25%)
Identifiability asks whether the causal effect is mathematically recoverable from the data you have, given the DAG structure. This layer checks whether the backdoor criterion is satisfied — meaning all backdoor paths from treatment to outcome are blocked by your adjustment set.
It also detects collider bias (conditioning on a variable that is caused by both treatment and outcome), temporal ordering violations (the treatment recorded after the outcome), and cycles in the DAG.
Identifiability gaps are the most conceptually serious. If the causal effect is not identified, no amount of additional data will help — you need to revisit the DAG structure.
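Two of the structural checks above, collider detection and cycle detection, can be sketched against a plain adjacency-list DAG. This is a toy illustration, not CausoAI's implementation; a full backdoor-criterion check requires d-separation, which is omitted here.

```python
# The DAG is a dict mapping each node to its list of children.
# A collider problem here: an adjustment variable that is a
# descendant of both treatment and outcome.

def descendants(dag, node):
    """All nodes reachable from `node` by following edges forward."""
    seen, stack = set(), list(dag.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))
    return seen

def collider_bias(dag, treatment, outcome, adjustment_set):
    """Adjustment variables caused by both treatment and outcome."""
    downstream = descendants(dag, treatment) & descendants(dag, outcome)
    return sorted(set(adjustment_set) & downstream)

def has_cycle(dag):
    """A graph with any node among its own descendants is not a DAG."""
    return any(node in descendants(dag, node) for node in dag)
```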
Layer 3: Estimation Feasibility (20%)
This layer checks whether the auto-selected estimator can run reliably on your data. It evaluates treatment prevalence (a binary treatment with 2% treated units will have poor estimates), minimum sample size requirements for the chosen estimator, and whether the selected covariates include the necessary confounders.
A common feasibility issue: the analysis is technically identified (Layer 2 passes) but practically infeasible because there are only 20 treated observations — not enough for AIPW to fit stable models.
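A prevalence screen catches this kind of issue before the estimator runs. The thresholds below (30 treated units, 5% prevalence) are illustrative assumptions, not CausoAI's documented cutoffs:

```python
# Flag the practical-infeasibility pattern described above: an analysis
# that is identified but has too few treated units for a doubly-robust
# estimator like AIPW to fit stable models.

def feasibility_warnings(treatment_flags, min_treated=30, min_prevalence=0.05):
    n = len(treatment_flags)
    treated = sum(treatment_flags)
    warnings = []
    if treated < min_treated:
        warnings.append(f"only {treated} treated units")
    if n and treated / n < min_prevalence:
        warnings.append(f"treatment prevalence {treated / n:.1%} is very low")
    return warnings
```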
Layer 4: Statistical Power and Overlap (20%)
Power analysis checks whether you have enough observations to detect the effect size you're interested in. CausoAI computes effective sample sizes after weighting or matching and flags when the confidence intervals will be too wide to be actionable.
Overlap analysis checks the propensity score distribution — whether the treated and control groups have comparable covariate distributions. Poor overlap means that for some subgroups, you're extrapolating rather than estimating, and your ATE includes groups where the estimate is unreliable.
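One standard diagnostic behind the "effective sample size after weighting" idea above is Kish's formula, ESS = (Σw)² / Σw². Whether CausoAI computes it exactly this way is an assumption, but the intuition holds: a few extreme inverse-propensity weights shrink the effective sample far below the raw row count.

```python
# Effective sample size after weighting: equal weights give back the
# raw count; a handful of dominant weights collapse it toward 1.

def effective_sample_size(weights):
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

For example, 100 equally weighted rows have an ESS of 100, while 100 rows where one unit carries most of the weight can have an ESS barely above 1, which is why poor overlap produces unreliable estimates even on a large dataset.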
Layer 5: Robustness (10%)
Robustness checks run sensitivity analyses — refutation tests that ask: "How much would an unmeasured confounder need to influence both treatment and outcome to explain away the estimated effect?" A result that requires a large unmeasured confounder to be explained away is more robust than one that collapses with a small perturbation.
This layer carries the least weight (10%) because it's diagnostic: it helps you understand how sensitive your result is, but it doesn't invalidate the analysis on its own.
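One widely used way to quantify the question this layer asks is the E-value (VanderWeele and Ding): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to explain away the estimate. Whether CausoAI uses this particular measure is an assumption; the sketch shows the standard formula.

```python
import math

# E-value for a risk ratio: RR + sqrt(RR * (RR - 1)).
# Risk ratios below 1 are inverted first, since the formula
# assumes a risk-increasing direction. Larger E-values mean
# a more robust result, as described above.

def e_value(risk_ratio):
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))
```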
Reading the Gap List
More useful than the overall score is the gap list: CausoAI returns each individual gap with a severity (blocking vs. warning), the layer it belongs to, and a description of what to fix. Blocking gaps are the ones that should stop you from acting on results. Warning gaps reduce confidence but don't invalidate the analysis.
A common pattern: an analysis might score 71 overall but have a single blocking gap in Layer 2 (identifiability). The right response is to fix the DAG, not to proceed because "71 is decent." Always read the gap list, not just the total score.
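The decision rule above reduces to a one-line filter. The gap fields (severity, layer, description) follow the structure described in this article; the exact field names are assumptions:

```python
# Act-or-not should hinge on blocking gaps, not the headline score:
# a 71 with one blocking identifiability gap still means "fix the DAG".

def blocking_gaps(gaps):
    """Gaps that should stop you from acting, regardless of the total CRS."""
    return [g for g in gaps if g["severity"] == "blocking"]
```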
What to Do When CRS is Low
- Layer 1 low: collect more data, fix missing values, broaden the date range of your dataset
- Layer 2 low: revisit the DAG by removing colliders from the adjustment set, checking temporal ordering, and looking for unmeasured common causes
- Layer 3 low: increase the proportion of treated observations, or reduce the covariates to the most important confounders
- Layer 4 low: collect data from a more balanced population, or restrict the analysis to the region of common support
- Layer 5 low: run sensitivity analysis manually, or collect data on the suspected unmeasured confounder