Tutorial · 15 min read

From CSV to Counterfactual: A Step-by-Step Walkthrough

Upload a marketing attribution dataset, let CausoAI discover the causal graph, validate readiness, estimate effects, and run a what-if simulation — in under 5 minutes.

This walkthrough uses a simulated marketing attribution dataset to show you the full CausoAI workflow from raw CSV to AI-powered causal insights. The dataset has 2,400 rows with columns for customer segment, email frequency, ad spend, prior purchase history, seasonality index, and 30-day revenue.

Our question: does increasing email frequency from 2x to 4x per week causally increase 30-day revenue — and for which customers?

Step 1: Upload and Profile Your Data

After logging in, click "New Analysis" and upload your CSV. CausoAI immediately profiles the dataset: it computes summary statistics for every column, checks for missing values, detects column types (numeric, binary, categorical), and infers likely roles.

For our dataset, the profiler automatically identifies:

  • email_frequency as a likely treatment variable (binary split with ~45% treated at 4x)
  • revenue_30d as a likely outcome variable (numeric, right-skewed)
  • prior_purchases, customer_segment, and seasonality_index as likely confounders
  • customer_id as an ID column to exclude from analysis

You can accept the inferred roles or override them. In this case, the profiler got everything right. Click "Save Schema" to continue.
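To make the profiling step concrete, here is a minimal sketch of type detection and missing-value counting using only the Python standard library. It is not CausoAI's profiler: the function names `infer_column_type` and `profile`, the "exactly two distinct values means binary" rule, and the four-row sample are all assumptions made for illustration.

```python
import csv
import io

def infer_column_type(values):
    """Return 'binary', 'numeric', or 'categorical' for a list of raw CSV strings."""
    present = [v for v in values if v.strip() != ""]
    # Assumed heuristic: exactly two distinct non-missing values means binary.
    if len(set(present)) == 2:
        return "binary"
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        return "categorical"

def profile(csv_text):
    """Map each column name to its inferred type and missing-value count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        report[name] = {
            "type": infer_column_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
        }
    return report

# Tiny made-up sample with the same kinds of columns as the walkthrough dataset.
sample = """customer_id,customer_segment,email_frequency,prior_purchases,revenue_30d
c1,retail,0,2,120.50
c2,wholesale,1,5,310.00
c3,retail,1,0,
c4,online,0,1,0.00
"""

for column, info in profile(sample).items():
    print(column, info)
```

Role inference (treatment, outcome, confounder, ID) needs more context than type detection alone, which is why the inferred roles are worth reviewing before you save the schema.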

Step 2: Discover the Causal Graph

Navigate to the DAG tab and click "Discover Graph." You'll choose an algorithm:

  • PC Algorithm: constraint-based, uses conditional independence tests. Good general-purpose choice
  • GES: score-based, optimizes BIC. Slightly more reliable on smaller datasets
  • NOTEARS: continuous optimization, faster on wider datasets with many columns
  • Granger: for time-series data with a time column

We select PC. The algorithm runs (typically 10–30 seconds for a 2,400-row dataset) and returns a causal graph. The discovered graph shows: prior_purchases → email_frequency (engaged customers receive more emails), seasonality_index → revenue_30d, prior_purchases → revenue_30d, and email_frequency → revenue_30d.

Notice the direct edge from email_frequency to revenue_30d: this is the effect we want to estimate. The variable prior_purchases is a confounder because it affects both who receives high-frequency emails and how much they spend.

Always review the discovered graph with domain knowledge before running the analysis. Algorithms can suggest incorrect edge directions. In this case, the edges all make intuitive sense: past purchases predict future spend, and the platform assigns higher email frequency to more engaged customers.
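One way to sanity-check a discovered graph is to write the edges down and query them. The sketch below stores the four edges from this step as a plain dictionary and looks for direct common causes of the treatment and the outcome. The helper names (`parents`, `common_causes`, `is_acyclic`) are hypothetical, and checking direct parents only is a simplification of a full backdoor analysis.

```python
# Edges reported in this step, stored as parent -> list of children.
dag = {
    "prior_purchases": ["email_frequency", "revenue_30d"],
    "seasonality_index": ["revenue_30d"],
    "email_frequency": ["revenue_30d"],
    "revenue_30d": [],
}

def parents(graph, node):
    """Set of nodes with an edge pointing into `node`."""
    return {p for p, children in graph.items() if node in children}

def common_causes(graph, treatment, outcome):
    """Direct parents shared by treatment and outcome (a simple confounder check)."""
    return parents(graph, treatment) & parents(graph, outcome)

def is_acyclic(graph):
    """Depth-first search that returns False if any directed cycle exists."""
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # reached a node already on the current path
        state[node] = "visiting"
        ok = all(visit(child) for child in graph.get(node, []))
        state[node] = "done"
        return ok

    return all(visit(node) for node in graph)

print("acyclic:", is_acyclic(dag))
print("common causes:", common_causes(dag, "email_frequency", "revenue_30d"))
```

Here seasonality_index is a parent of the outcome only, so it does not show up as a common cause, while prior_purchases does.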

Step 3: Check the Causal Readiness Score

After saving the graph, CausoAI computes the CRS. Our dataset scores 84/100:

  • Layer 1 (Coverage): 22/25 — Good sample size, low missingness. Slight penalty because outcome is right-skewed with 8% of values at exactly $0
  • Layer 2 (Identifiability): 25/25 — Backdoor criterion satisfied. No colliders in adjustment set. Temporal ordering looks correct
  • Layer 3 (Feasibility): 18/20 — Estimator (AIPW) can run reliably. Slight penalty for minor treatment imbalance (45% vs 55%)
  • Layer 4 (Power/Overlap): 16/20 — Propensity score overlap is good for most of the sample. Small region of poor overlap for customers with 10+ prior purchases
  • Layer 5 (Robustness): 3/10 — Sensitivity analysis suggests a moderate-sized unmeasured confounder (e.g., customer acquisition channel) could attenuate the estimated effect
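The layer scores above add up to the reported total. The short check below assumes the CRS is a plain sum of the five layers, which matches the numbers shown here; the dictionary layout is just for illustration.

```python
# (points earned, points available) per layer, copied from the list above.
layers = {
    "Coverage": (22, 25),
    "Identifiability": (25, 25),
    "Feasibility": (18, 20),
    "Power/Overlap": (16, 20),
    "Robustness": (3, 10),
}

total = sum(earned for earned, _ in layers.values())
maximum = sum(available for _, available in layers.values())

# The layer with the lowest share of its available points deserves the closest look.
weakest = min(layers, key=lambda name: layers[name][0] / layers[name][1])

print(f"CRS: {total}/{maximum}")
print(f"Weakest layer: {weakest}")
```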

A score of 84 puts us in the "high confidence" band. The Layer 5 warning about unmeasured confounders is worth noting — if you have acquisition channel data, add it to the schema before the final run.
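The Layer 4 overlap penalty can be illustrated with a simple per-stratum check: if almost everyone in a stratum is treated (or almost no one is), there are too few comparable customers to estimate an effect there. The sketch below uses made-up counts, and the 0.1 to 0.9 acceptance band is an assumed rule of thumb, not CausoAI's actual threshold.

```python
from collections import defaultdict

def overlap_by_stratum(records, low=0.1, high=0.9):
    """Return {stratum: (treated_share, within_band)} for (stratum, treated) pairs."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [treated count, total count]
    for stratum, treated in records:
        counts[stratum][0] += treated
        counts[stratum][1] += 1
    result = {}
    for stratum, (n_treated, n_total) in counts.items():
        share = n_treated / n_total
        result[stratum] = (share, low <= share <= high)
    return result

# Made-up toy data: (prior-purchase bucket, treated flag).
records = (
    [("0-3", 1)] * 40 + [("0-3", 0)] * 60
    + [("4-9", 1)] * 55 + [("4-9", 0)] * 45
    + [("10+", 1)] * 19 + [("10+", 0)] * 1
)

for stratum, (share, ok) in overlap_by_stratum(records).items():
    flag = "ok" if ok else "poor overlap"
    print(f"{stratum}: treated share {share:.2f} ({flag})")
```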

Step 4: Run the Causal Effect Estimation

Click "Run Analysis." CausoAI auto-selects AIPW (N = 2,400 ≥ 500, binary treatment). The pipeline runs in the background — for this dataset, it completes in about 45 seconds.
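If you are curious what an AIPW (augmented inverse probability weighting) estimate looks like in code, here is a minimal sketch on synthetic data. It is not the CausoAI pipeline: the data generator, the single discrete confounder standing in for prior purchases, and the per-stratum nuisance models are all assumptions chosen to keep the example dependency-free.

```python
import random

def aipw_ate(rows):
    """AIPW estimate of the ATE using per-stratum nuisance models.

    rows: list of (stratum, treated, outcome) with treated in {0, 1}.
    """
    by_stratum = {}
    for x, t, y in rows:
        by_stratum.setdefault(x, {0: [], 1: []})[t].append(y)

    # Nuisance models: propensity e(x) and outcome means m0(x), m1(x).
    e, m0, m1 = {}, {}, {}
    for x, groups in by_stratum.items():
        n0, n1 = len(groups[0]), len(groups[1])
        e[x] = n1 / (n0 + n1)
        m0[x] = sum(groups[0]) / n0
        m1[x] = sum(groups[1]) / n1

    # Doubly robust score per row: outcome-model contrast plus weighted residuals.
    scores = []
    for x, t, y in rows:
        score = (m1[x] - m0[x]
                 + t * (y - m1[x]) / e[x]
                 - (1 - t) * (y - m0[x]) / (1 - e[x]))
        scores.append(score)
    return sum(scores) / len(scores)

def simulate(n=2400, true_effect=34.2, seed=0):
    """Synthetic data where the stratum drives both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2, 3])  # stand-in for prior purchases
        t = 1 if rng.random() < 0.25 + 0.15 * x else 0
        y = 100 + 30 * x + true_effect * t + rng.gauss(0, 20)
        rows.append((x, t, y))
    return rows

rows = simulate()
n_treated = sum(t for _, t, _ in rows)
naive = (sum(y for _, t, y in rows if t) / n_treated
         - sum(y for _, t, y in rows if not t) / (len(rows) - n_treated))

print(f"naive difference in means: {naive:.2f}")
print(f"AIPW estimate:             {aipw_ate(rows):.2f}")
```

Because the confounder raises both the chance of treatment and the outcome, the naive difference in means overstates the effect, while the AIPW estimate lands near the simulated true effect. With these saturated per-stratum models the residual terms cancel within each stratum, so the result matches a stratified adjustment; with fitted regression models they generally would not cancel.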

Results:

  • Average Treatment Effect (ATE): +$34.20 per customer per month (95% CI: $22.80 – $45.60)
  • Relative effect: +18.3% increase in 30-day revenue
  • Effect is statistically significant (p < 0.001)
  • Feature importance: prior_purchases accounts for 61% of feature importance in the propensity model, which is consistent with it being the dominant confounder
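The reported numbers can be cross-checked against each other. The sketch below assumes the interval is a symmetric normal-approximation 95% CI and that the relative effect is the ATE divided by the control-arm mean; neither assumption is stated by CausoAI, so treat this as a back-of-the-envelope check.

```python
import math

ate = 34.20
ci_low, ci_high = 22.80, 45.60

# A symmetric 95% normal interval is estimate +/- 1.96 * SE.
std_error = (ci_high - ci_low) / (2 * 1.96)
z = ate / std_error

# Two-sided p-value under the standard normal: P(|Z| > z) = erfc(z / sqrt(2)).
p_value = math.erfc(z / math.sqrt(2))

# Assumption: relative effect = ATE / baseline, so baseline = ATE / 0.183.
implied_baseline = ate / 0.183

print(f"standard error:   {std_error:.2f}")
print(f"z statistic:      {z:.2f}")
print(f"p < 0.001:        {p_value < 0.001}")
print(f"implied baseline: ${implied_baseline:.2f}")
```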

Heterogeneous Effects

With "Subgroup Discovery" enabled, CausoAI also fits a DR-Learner and runs a CATE (conditional average treatment effect) decision tree. The most important finding: the effect is concentrated in customers with 1–3 prior purchases (estimated effect of +$51). Customers with 0 prior purchases show no significant effect (+$3, with a confidence interval that includes zero). High-value customers with 4+ prior purchases show a moderate effect (+$28).

This means blanket high-frequency email sends to your full list are likely wasteful for customers with no prior purchases. The results point to targeting customers with 1–3 prior purchases first.

Step 5: Run the What-If Counterfactual Simulator

Navigate to the Simulator tab. The question: "If we increase email frequency from 2x to 4x per week for all customers in the 1–3 prior purchases segment, what is the expected revenue impact?"

Set the treatment value to 1 (4x frequency) and restrict the population to that segment. The simulator returns:

  • Predicted revenue change: +$51.20 per customer per month
  • 95% confidence interval: +$38.40 – +$64.00
  • Segment size: 740 customers
  • Total monthly revenue impact: +$37,888 (95% CI: +$28,416 – +$47,360)
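The total is simply the per-customer figures scaled by the segment size, treating the segment size as fixed. A quick check with exact decimal arithmetic (the variable names are illustrative):

```python
from decimal import Decimal

segment_size = 740
per_customer = {
    "estimate": Decimal("51.20"),
    "ci_low": Decimal("38.40"),
    "ci_high": Decimal("64.00"),
}

# Scale the point estimate and both interval endpoints by the number of customers.
totals = {name: value * segment_size for name, value in per_customer.items()}

for name, value in totals.items():
    print(f"{name}: ${value:,.0f}")
```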

This is the number you can take to leadership: "Increasing email frequency for customers with 1–3 prior purchases adds approximately $38K in monthly revenue (95% CI: $28K–$47K)."

Step 6: Get AI-Powered Insights

Navigate to the Insights tab. Claude generates three artifacts:

Executive Summary

Example output: "Increasing email send frequency from twice to four times weekly causally increases 30-day revenue by an average of $34.20 per customer (95% CI: $22.80–$45.60). The effect is strongest for customers with 1–3 prior purchases, where the expected lift is $51.20/customer/month. There is no significant causal effect for customers with no prior purchases, and customers with 4+ prior purchases show a more moderate lift of about $28."

Assumptions

Claude explains that the estimate assumes all confounders (prior purchases, segment, seasonality) have been controlled for, and that no unmeasured variables — such as acquisition channel — simultaneously drive email assignment and revenue. It flags the Layer 5 sensitivity warning and recommends adding acquisition channel to the schema if available.

Recommendations

  • Increase email frequency to 4x/week for all customers with 1–3 prior purchases. Expected monthly incremental revenue: ~$38K
  • Do not increase frequency for customers with 0 prior purchases — no causal effect detected and risk of unsubscribes
  • Run a follow-up analysis adding acquisition channel as a covariate to tighten the estimate and improve Layer 5 score
  • Test a 3x/week frequency for high-value customers (4+ purchases) to balance lift and email fatigue

What You Just Did

In under 5 minutes, you went from a CSV to a validated causal estimate with confidence intervals, a segmented treatment effect, a revenue impact forecast, and actionable recommendations — all automatically. The same workflow would have taken a data scientist several days using DoWhy, EconML, and custom validation scripts.

That's the core value of CausoAI: not replacing statistical rigor, but making it fast enough to be part of a regular analytics workflow rather than a one-off research project.
