Research · 12 min read

AIPW vs PSM: When to Use Each Estimator

Doubly robust estimation (AIPW) and propensity score matching (PSM) both estimate causal effects, but they make different trade-offs. Learn when to choose each.

When you run a causal analysis on binary treatment data — did the customer receive the campaign or not, did they get the discount or not — CausoAI automatically selects between two estimators: AIPW (Augmented Inverse Propensity Weighting) and PSM (Propensity Score Matching). The rule is simple: N ≥ 500 uses AIPW; N < 500 uses PSM.

But why? Understanding the logic behind this decision helps you interpret results, spot when something looks off, and know when to override the default.

What Both Estimators Have in Common

Both AIPW and PSM are designed to estimate the Average Treatment Effect (ATE) — the expected change in outcome from moving someone from the control condition to the treatment condition. Both are based on the propensity score: the probability that a unit receives treatment, given its observed covariates.

The propensity score is used to balance the treatment and control groups, making them comparable on observed confounders. The key assumption both methods share is unconfoundedness: there are no unmeasured variables that influence both treatment assignment and the outcome.
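To make the propensity score concrete, here is a minimal sketch of fitting one with a hand-rolled logistic regression on synthetic data. The covariates, coefficients, and data-generating process are invented for illustration; this is not CausoAI's implementation, just the standard idea of modeling P(treatment | covariates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: two observed confounders per customer.
n = 1000
X = rng.normal(size=(n, 2))
# Treatment assignment depends on the covariates (i.e. it is confounded).
true_logits = 0.8 * X[:, 0] - 0.5 * X[:, 1]
T = rng.binomial(1, 1 / (1 + np.exp(-true_logits)))

def fit_propensity(X, T, lr=0.1, steps=2000):
    """Fit a logistic-regression propensity model by gradient ascent
    and return the fitted propensity score e(x) for each unit."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (T - p) / len(T)  # average log-likelihood gradient
    return 1 / (1 + np.exp(-Xb @ w))

e = fit_propensity(X, T)
print(e.min(), e.max())  # every score lies strictly inside (0, 1)
```

In practice you would use a library implementation (e.g. scikit-learn's `LogisticRegression`), but the output is the same object: one estimated probability of treatment per unit, which both PSM and AIPW then consume.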

Propensity Score Matching (PSM)

PSM works by pairing each treated unit with one or more control units that have a similar propensity score. The treatment effect is estimated as the average difference in outcomes between matched pairs.

The intuition is straightforward: if a treated customer and an untreated customer were equally likely to receive treatment (based on all observed covariates), then the difference in their outcomes is attributable to the treatment itself.
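That matching intuition can be sketched in a few lines. The helper below does simple 1:1 nearest-neighbor matching on the propensity score and averages the matched outcome differences (which, strictly speaking, targets the effect on the treated); the data-generating process is synthetic and the true effect of 3.0 is chosen for illustration:

```python
import numpy as np

def nearest_neighbor_att(y, t, e):
    """1:1 nearest-neighbor matching on the propensity score.
    Returns the average outcome difference across matched pairs
    (an estimate of the average treatment effect on the treated)."""
    treated = np.where(t == 1)[0]
    control = np.where(t == 0)[0]
    diffs = []
    for i in treated:
        # Closest control unit by propensity-score distance.
        j = control[np.argmin(np.abs(e[control] - e[i]))]
        diffs.append(y[i] - y[j])
    return np.mean(diffs)

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(size=n)            # a single observed confounder
e = 0.25 + 0.5 * x                 # propensity score, assumed known here
t = rng.binomial(1, e)
y = 3.0 * t + 2.0 * x + rng.normal(scale=0.5, size=n)  # true effect = 3

print(round(nearest_neighbor_att(y, t, e), 2))  # close to 3
```

Real matching implementations add refinements the sketch omits — calipers, matching with replacement, bias adjustment — but the core operation is exactly this pairing step.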

Strengths of PSM

  • Intuitive and interpretable — matched pairs are easy to explain to non-technical stakeholders
  • Works well with small samples where more complex models would overfit
  • Transparent: you can inspect the matched pairs and verify balance
  • Makes no use of an outcome model, so there is no outcome model to misspecify — only the propensity model needs to be correct

Weaknesses of PSM

  • Discards unmatched units, reducing effective sample size and statistical power
  • Sensitive to choice of matching algorithm (nearest neighbor, caliper, kernel)
  • Can be inefficient — throwing away data that contains information about the treatment effect
  • Relies entirely on a well-specified propensity model — if it is wrong, estimates are biased, with no outcome model to fall back on

AIPW (Doubly Robust Estimation)

AIPW combines two models: a propensity score model (probability of treatment given covariates) and an outcome model (expected outcome given treatment and covariates). The "augmented" part adds a correction term based on the outcome model residuals.

The key property that gives AIPW its name — "doubly robust" — is this: the estimator produces unbiased estimates if either the propensity model OR the outcome model is correctly specified. You don't need both to be correct, just one of them.

Double robustness is valuable because real-world data is messy. Your propensity model might miss some confounders, or your outcome model might misspecify the functional form. AIPW gives you two chances to get the estimate right.
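The AIPW estimator itself is compact. Below is a minimal sketch on synthetic data: the outcome model is deliberately broken (it predicts zero everywhere), yet because the propensity scores are correct, the estimate still recovers the true effect of 3.0 — the doubly robust property in action. All names and the data-generating process are illustrative:

```python
import numpy as np

def aipw_ate(y, t, e, mu0, mu1):
    """AIPW estimate of the ATE.
    y: outcomes; t: binary treatment; e: propensity scores;
    mu0/mu1: outcome-model predictions under control/treatment."""
    # Outcome-model prediction plus the inverse-propensity-weighted
    # correction term based on the outcome-model residuals.
    correction = t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return np.mean(mu1 - mu0 + correction)

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(size=n)
e = 0.3 + 0.4 * x                  # true propensity, assumed known here
t = rng.binomial(1, e)
y = 3.0 * t + 2.0 * x + rng.normal(scale=0.5, size=n)  # true ATE = 3

# Deliberately misspecified outcome model: predicts 0 for everyone.
zeros = np.zeros(n)
print(round(aipw_ate(y, t, e, zeros, zeros), 2))  # still close to 3
```

The symmetric case also holds: with a correct outcome model and a wrong propensity model, the residuals average out and the estimate remains consistent.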

Strengths of AIPW

  • Doubly robust: consistent if either the propensity or outcome model is correct
  • More statistically efficient than PSM — uses all the data, not just matched pairs
  • Achieves the semiparametric efficiency bound — optimal variance when both models are correctly specified
  • Naturally handles continuous and high-dimensional covariates without manually tuning a matching algorithm

Weaknesses of AIPW

  • Requires enough data to reliably estimate two models — unreliable with N < 500
  • If both models are misspecified, the estimate can be worse than PSM
  • Less interpretable to stakeholders unfamiliar with weighting estimators
  • Sensitive to extreme propensity scores near 0 or 1 (poor overlap)

The N < 500 Threshold

CausoAI uses N = 500 as the cutoff because this is where the trade-offs flip. With small samples:

  • AIPW's two-model approach becomes unstable — propensity and outcome models overfit, producing high-variance estimates
  • PSM's matching approach is less affected by model instability because it operates on the propensity score directly
  • The sample size penalty from discarding unmatched units in PSM is less severe when total N is small

Above N = 500, AIPW's efficiency advantage dominates. It uses all the data, and the larger sample stabilizes both underlying models. The doubly robust property also becomes more valuable: with more data, both models are estimated more precisely, and the correction term adds meaningful variance reduction.
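The selection rule described above reduces to a one-line decision; the function name and signature here are illustrative, not CausoAI's API:

```python
def select_estimator(n: int, threshold: int = 500) -> str:
    """Default estimator choice for binary treatments,
    following the N >= 500 rule described above."""
    return "AIPW" if n >= threshold else "PSM"

print(select_estimator(200))   # -> PSM
print(select_estimator(1200))  # -> AIPW
```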

Practical Implications for Analysts

In practice, the choice between AIPW and PSM matters most in three situations:

1. You have poor overlap

Overlap means that for every combination of covariate values, there are both treated and untreated units. Poor overlap — where certain subgroups only appear in treatment or control — hurts both estimators. AIPW can produce extreme weights; PSM fails to find matches. When CRS Layer 4 (power and overlap) flags this, consider restricting your analysis to the region of common support.
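Restricting to the region of common support is usually done by trimming on the propensity score. Here is one minimal, illustrative way to do it (this helper is a sketch, not the trimming rule CausoAI's overlap check applies): keep only units whose score falls inside the range where both groups have support.

```python
import numpy as np

def common_support_mask(e, t):
    """Boolean mask for units inside the overlap region:
    [max of the two group minimums, min of the two group maximums]."""
    lo = max(e[t == 1].min(), e[t == 0].min())
    hi = min(e[t == 1].max(), e[t == 0].max())
    return (e >= lo) & (e <= hi)

e = np.array([0.10, 0.20, 0.60, 0.70, 0.40, 0.50, 0.90, 0.95])
t = np.array([0,    0,    0,    0,    1,    1,    1,    1   ])
print(common_support_mask(e, t))  # keeps only scores in [0.40, 0.70]
```

A common refinement is to also trim extreme scores (e.g. outside [0.05, 0.95]) regardless of support, since weights near 0 or 1 dominate AIPW's variance.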

2. You're analyzing a small experiment or pilot

If you ran a pilot campaign with 200 treated users and 300 controls, PSM is likely the right choice. The estimates will have wide confidence intervals either way — that's unavoidable with small samples — but PSM will be more stable.

3. You want heterogeneous effects

Neither AIPW nor PSM natively produces Conditional Average Treatment Effects (CATE) — effects for subgroups. If you want to know "what's the effect for high-LTV customers vs. low-LTV customers," you need a different estimator (T-Learner, X-Learner, or DR-Learner). CausoAI handles this automatically when you enable heterogeneous effects in the analysis settings.

Summary

  • Both AIPW and PSM estimate the ATE for binary treatments using the propensity score
  • PSM is better for small samples (N < 500): more stable, more interpretable
  • AIPW is better for large samples (N ≥ 500): more efficient, doubly robust
  • Both assume unconfoundedness — all confounders are measured and included
  • CausoAI selects automatically, but you can see which estimator was used in the analysis results
