Tutorial · 9 min read

The 5 Most Common Causal Graph Mistakes

Including a collider as a confounder. Reversing a causal arrow. Omitting a common cause. These mistakes invalidate your estimates — learn how to avoid them.

A causal graph (DAG) is only as good as the assumptions behind it. Get the structure wrong, and your causal estimates will be wrong — even if the statistical machinery is perfect. The bad news: DAG mistakes are easy to make. The good news: there are only a handful of patterns responsible for most errors.

CausoAI's 5-layer CRS validation catches many of these automatically. But understanding why they matter helps you build better graphs in the first place — and interpret validation warnings more confidently.

Mistake 1: Conditioning on a Collider

A collider is a variable that is caused by two other variables — it has two arrows pointing into it. If you include a collider as a confounder (add it to your adjustment set), you open a spurious non-causal path between its causes that was previously blocked, biasing your estimate.

Example: You're estimating the effect of ad spend on sales. You include "lead quality" as a confounder. But lead quality is caused by both ad spend (ads attract leads) and sales effort (sales team qualifies leads). Lead quality is a collider. Including it creates a spurious correlation between ad spend and sales, inflating your estimated effect.

How to detect it: if a variable has arrows coming from both your treatment and your outcome (or their ancestors), it's a collider. Never include it in your adjustment set.
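The collider check can be run mechanically on an edge list. The sketch below is illustrative, not CausoAI's API: the graph is a plain list of (cause, effect) pairs, and the variable names come from the ad-spend example above. The check excludes the treatment's own downstream effects from the "outcome side" so that mediators are not accidentally flagged.

```python
def parents(node, edges):
    """Direct causes of `node`."""
    return {a for a, b in edges if b == node}

def children(node, edges):
    """Direct effects of `node`."""
    return {b for a, b in edges if a == node}

def _closure(node, step):
    """All nodes reachable by repeatedly applying `step`."""
    found, frontier = set(), set(step(node))
    while frontier:
        n = frontier.pop()
        if n not in found:
            found.add(n)
            frontier |= set(step(n))
    return found

def ancestors(node, edges):
    return _closure(node, lambda n: parents(n, edges))

def descendants(node, edges):
    return _closure(node, lambda n: children(n, edges))

def is_collider(candidate, treatment, outcome, edges):
    """True if `candidate` receives an arrow from the treatment side AND
    from the outcome side (excluding the treatment's own descendants, so
    mediators are not flagged). If True, keep it OUT of the adjustment set."""
    ps = parents(candidate, edges)
    treat_side = {treatment} | ancestors(treatment, edges)
    out_side = ({outcome} | ancestors(outcome, edges)) \
        - ({treatment} | descendants(treatment, edges))
    return bool(ps & treat_side) and bool(ps & out_side)

# The ad-spend example: lead quality is caused by ad spend and sales effort.
edges = [
    ("ad_spend", "lead_quality"),
    ("sales_effort", "lead_quality"),
    ("sales_effort", "sales"),
    ("ad_spend", "sales"),
]
print(is_collider("lead_quality", "ad_spend", "sales", edges))  # → True
```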

Mistake 2: Reversing a Causal Arrow

Causal direction matters. An arrow from A to B means A causes B. Reversing it — drawing the arrow from B to A — changes everything about which paths are open, which confounders need adjustment, and whether the effect is identified.

This mistake most often happens with variables that are correlated but where the direction is ambiguous from data alone. Customer satisfaction and retention are correlated — but does satisfaction cause retention, or does retention cause satisfaction (customers who stay reinterpret their experience positively)?

CausoAI's discovery algorithms can suggest a direction based on statistical independence tests, but they cannot guarantee causal direction from observational data alone. You need domain knowledge to orient edges — especially for contemporaneous variables measured at the same time point.
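When variables are measured at different times, temporal ordering gives you a hard constraint: a later measurement cannot cause an earlier one. A minimal sketch of that sanity check, assuming you can attach a measurement time to each variable (the names and timestamps are illustrative only):

```python
def backwards_edges(edges, measured_at):
    """Flag edges whose cause was measured AFTER its effect --
    a later event cannot cause an earlier one, so these edges
    are either reversed or need domain-knowledge review."""
    return [(a, b) for a, b in edges
            if measured_at[a] > measured_at[b]]

measured_at = {"satisfaction_q1": 1, "retention_q2": 2}

# Suspicious: Q2 retention drawn as a cause of Q1 satisfaction.
print(backwards_edges([("retention_q2", "satisfaction_q1")], measured_at))
# → [('retention_q2', 'satisfaction_q1')]
```

Note this only catches reversed edges across time points; for contemporaneous variables the check is silent, and domain knowledge remains the only way to orient the edge.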

Mistake 3: Omitting a Common Cause

This is the most common source of confounding bias. A common cause (confounder) is a variable that causes both the treatment and the outcome. If you leave it out of the DAG and the adjustment set, the backdoor path it creates remains open and your effect estimate is biased.

Example: You're measuring the effect of email frequency on purchases. Customer engagement level causes both — engaged customers opt into more frequent emails AND are more likely to purchase regardless. If you don't include engagement as a confounder, you overestimate the effect of email frequency.

  • Ask yourself: "What would cause someone to be more or less likely to receive the treatment I'm studying?" Those variables are candidate confounders
  • Then ask: "Does that variable also affect the outcome independently?" If yes, include it
  • Use CausoAI's automatic confounder suggestion (based on data profiling) as a starting point, not a complete list
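The two checklist questions above amount to one graph query: a candidate confounder is any variable upstream of both the treatment and the outcome. A minimal self-contained sketch (illustrative helpers, not CausoAI's API), using the email-frequency example:

```python
def parents(node, edges):
    """Direct causes of `node`."""
    return {a for a, b in edges if b == node}

def ancestors(node, edges):
    """All direct and indirect causes of `node`."""
    found, frontier = set(), parents(node, edges)
    while frontier:
        n = frontier.pop()
        if n not in found:
            found.add(n)
            frontier |= parents(n, edges)
    return found

def confounder_candidates(treatment, outcome, edges):
    """Common causes: variables upstream of BOTH treatment and outcome.
    Mediators are excluded automatically, since they are never
    ancestors of the treatment."""
    return ancestors(treatment, edges) & ancestors(outcome, edges)

# Engagement drives both email opt-in frequency and purchases.
edges = [
    ("engagement", "email_frequency"),
    ("engagement", "purchases"),
    ("email_frequency", "purchases"),
]
print(confounder_candidates("email_frequency", "purchases", edges))
# → {'engagement'}
```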

Mistake 4: Including a Mediator in the Adjustment Set

A mediator is a variable on the causal path from treatment to outcome — it's how (or part of how) the treatment produces its effect. If you include a mediator as a confounder, you block the path you're trying to measure.

Example: You're measuring the effect of a new onboarding email on 90-day retention. "Feature adoption" is on the causal path — the email causes feature adoption, which causes retention. If you include feature adoption as a confounder, you estimate the effect of the email holding feature adoption constant, which is not the total effect you care about.

The correct approach: to estimate the total effect, do not include mediators. To decompose the effect into direct and mediated components, use mediation analysis (available in CausoAI's advanced analysis features).
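Mediators can also be found mechanically: they are exactly the variables downstream of the treatment and upstream of the outcome. A minimal sketch under the same illustrative edge-list representation (the names come from the onboarding example above):

```python
def parents(node, edges):
    return {a for a, b in edges if b == node}

def children(node, edges):
    return {b for a, b in edges if a == node}

def _closure(node, step):
    found, frontier = set(), set(step(node))
    while frontier:
        n = frontier.pop()
        if n not in found:
            found.add(n)
            frontier |= set(step(n))
    return found

def mediators(treatment, outcome, edges):
    """Variables on a directed path from treatment to outcome:
    downstream of the treatment AND upstream of the outcome.
    Exclude these from the adjustment set when estimating
    the total effect."""
    down = _closure(treatment, lambda n: children(n, edges))
    up = _closure(outcome, lambda n: parents(n, edges))
    return down & up

edges = [
    ("onboarding_email", "feature_adoption"),
    ("feature_adoption", "retention_90d"),
    ("onboarding_email", "retention_90d"),
]
print(mediators("onboarding_email", "retention_90d", edges))
# → {'feature_adoption'}
```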

Mistake 5: Building a Graph with Cycles

A DAG must be acyclic — no variable can be its own ancestor. Cycles create logical contradictions: if A causes B and B causes A simultaneously, the causal effect of A on B is undefined.

In practice, cycles appear when analysts try to model feedback loops — pricing causes demand, which causes pricing adjustments. These are real dynamics, but they cannot be represented in a static DAG. The correct approach is to introduce time: price at time T causes demand at time T+1, which causes price adjustment at time T+2.
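Acyclicity itself is cheap to verify. One standard approach is Kahn's algorithm: if a topological order cannot cover every node, the graph contains a cycle. A minimal sketch (this is the general algorithm, not a claim about how CausoAI validates internally), including the pricing feedback loop and its time-unrolled fix:

```python
from collections import defaultdict, deque

def has_cycle(edges):
    """Kahn's algorithm: peel off zero-in-degree nodes; if some nodes
    are never reached, they sit on a cycle and this is not a DAG."""
    indeg, out, nodes = defaultdict(int), defaultdict(list), set()
    for a, b in edges:
        out[a].append(b)
        indeg[b] += 1
        nodes |= {a, b}
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in out[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen < len(nodes)

# The pricing feedback loop is rejected as a cycle...
print(has_cycle([("price", "demand"), ("demand", "price")]))  # → True

# ...but unrolled in time it becomes a valid DAG:
print(has_cycle([("price_t", "demand_t1"),
                 ("demand_t1", "price_t2")]))  # → False
```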

CausoAI automatically validates acyclicity when you save a graph. If a cycle is detected, the graph is rejected and you'll see an error pointing to the problematic edge.

Panel data with time-indexed variables is the standard solution for feedback loops. CausoAI's Difference-in-Differences analysis module is designed for exactly this case.

A Practical Checklist

  • For every variable in your adjustment set: check that it is not caused by both treatment and outcome (collider check)
  • For every edge: verify the causal direction using temporal ordering or domain knowledge
  • For the treatment variable: list all variables that determine who receives it, and add them as confounders if they also affect the outcome
  • For the treatment-outcome path: identify all intermediate variables and exclude them from the adjustment set unless doing mediation analysis
  • Check the CRS Layer 2 (identifiability) output for specific collider and backdoor violations

Ready to apply causal inference to your data?

CausoAI takes you from CSV to causal insights in minutes — no data science background required.

Start Free Trial