Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies

Am J Epidemiol. 2017 Jan 1;185(1):65-73. doi: 10.1093/aje/kww165. Epub 2016 Dec 9.

Abstract

Estimation of causal effects using observational data continues to grow in popularity in the epidemiologic literature. While many applications of causal effect estimation use propensity score methods or G-computation, targeted maximum likelihood estimation (TMLE) is a well-established alternative method with desirable statistical properties. TMLE is a doubly robust maximum-likelihood-based approach that includes a secondary "targeting" step that optimizes the bias-variance tradeoff for the target parameter. Under standard causal assumptions, estimates can be interpreted as causal effects. Because TMLE has not been as widely implemented in epidemiologic research, we aim to provide an accessible presentation of TMLE for applied researchers. We give step-by-step instructions for using TMLE to estimate the average treatment effect in the context of an observational study. We discuss conceptual similarities and differences between TMLE and 2 common estimation approaches (G-computation and inverse probability weighting) and present findings on their relative performance using simulated data. Our simulation study compares methods under parametric regression misspecification; our results highlight TMLE's property of double robustness. Additionally, we discuss best practices for TMLE implementation, particularly the use of ensembled machine learning algorithms. Our simulation study demonstrates all methods using super learning, highlighting that methods incorporating machine learning may outperform those relying on parametric regression in observational data settings.
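The step-by-step TMLE procedure for the average treatment effect (ATE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a binary outcome, uses main-terms logistic regression fit by Newton-Raphson in place of the super learner the abstract recommends, bounds the propensity score at 0.01, and runs on a hypothetical simulated data set in which the outcome model deliberately omits a confounder (W2). The names `fit_logistic` and `estimate_ate` are illustrative only.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson with an optional fixed offset.

    Assumption: a main-terms parametric model stands in for super learner.
    """
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(offset + X @ beta)
        grad = X.T @ (y - mu)                              # score
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])      # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def estimate_ate(y, a, w_q, w_g, bound=0.01):
    """G-computation, IPW and TMLE estimates of the ATE (binary outcome).

    w_q: covariates for the outcome model Q(A, W); w_g: covariates for g(W).
    """
    n = len(y)
    ones = np.ones((n, 1))
    # Step 1: initial outcome regression Q(A, W) = P(Y = 1 | A, W).
    beta_q = fit_logistic(np.hstack([ones, a[:, None], w_q]), y)

    def predict_q(a_val):
        return expit(np.hstack([ones, np.full((n, 1), a_val), w_q]) @ beta_q)

    q1, q0 = predict_q(1.0), predict_q(0.0)
    qa = np.where(a == 1, q1, q0)
    # Step 2: propensity score g(W) = P(A = 1 | W), bounded away from 0 and 1.
    gx = np.hstack([ones, w_g])
    g = np.clip(expit(gx @ fit_logistic(gx, a)), bound, 1.0 - bound)
    # Step 3: clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
    h1, h0 = 1.0 / g, -1.0 / (1.0 - g)
    ha = np.where(a == 1, h1, h0)
    # Step 4: fluctuation; regress Y on H with offset logit(Q(A, W)), no intercept.
    eps = fit_logistic(ha[:, None], y, offset=logit(qa))[0]
    # Step 5: update the counterfactual predictions on the logit scale.
    q1s, q0s = expit(logit(q1) + eps * h1), expit(logit(q0) + eps * h0)
    qas = np.where(a == 1, q1s, q0s)
    tmle = np.mean(q1s - q0s)
    # Step 6: influence-curve-based standard error for the TMLE estimate.
    ic = ha * (y - qas) + (q1s - q0s) - tmle
    return {
        "gcomp": np.mean(q1 - q0),
        "ipw": np.mean(a * y / g - (1 - a) * y / (1.0 - g)),
        "tmle": tmle,
        "tmle_se": np.std(ic, ddof=1) / np.sqrt(n),
    }


# Hypothetical simulation: W2 confounds the A-Y relationship.
rng = np.random.default_rng(2017)
n = 20000
w1 = rng.binomial(1, 0.5, n).astype(float)
w2 = rng.normal(0.0, 1.0, n)
a = rng.binomial(1, expit(-0.4 + 0.5 * w1 + 0.8 * w2))
y = rng.binomial(1, expit(-1.0 + 1.0 * a + 0.5 * w1 + 1.2 * w2))
# Sample-average "true" ATE computed from the known outcome model.
true_ate = np.mean(expit(0.5 * w1 + 1.2 * w2) - expit(-1.0 + 0.5 * w1 + 1.2 * w2))

# Misspecified outcome model (omits W2); correctly specified propensity model.
results = estimate_ate(y, a, w_q=w1[:, None], w_g=np.column_stack([w1, w2]))
print(f"true ATE ~ {true_ate:.3f}")
for name, est in results.items():
    print(f"{name}: {est:.3f}")
```

Because only the propensity model is correctly specified here, TMLE and IPW should land near the true ATE while G-computation, which relies solely on the misspecified outcome model, should not; this mirrors the double-robustness point made above. Replacing the two `fit_logistic` calls for Q and g with an ensemble learner would move the sketch closer to the super learning approach the abstract describes, while the parametric fluctuation step would stay the same.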

Keywords: causal inference; machine learning; observational studies; super learner; targeted maximum likelihood estimation.

MeSH terms

  • Bias*
  • Causality*
  • Computer Simulation
  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical*
  • Depression / psychology
  • Depression / therapy
  • Epidemiologic Research Design*
  • Exercise / psychology
  • Humans
  • Likelihood Functions*
  • Machine Learning*
  • Observational Studies as Topic / methods
  • Observational Studies as Topic / standards*
  • Observational Studies as Topic / statistics & numerical data
  • Propensity Score