Importance sampling, inverse probability weighting, and off-policy reinforcement learning

Much work in medical decision making (like the paper I mentioned last week) is based on an idea that spans disciplines.

In the statistical analysis of Monte Carlo experiments, it’s called importance sampling (e.g., Kloek and van Dijk), where the trick amounts to multiplying the integrand by a carefully chosen form of 1: a ratio of densities.

In the causal inference / dynamic treatment regimes / epidemiology literature, it’s called inverse probability weighting¹ (e.g., Horvitz–Thompson, Robins, Murphy, Pearl).

In reinforcement learning, it’s called “batch-mode,” “offline,” or “off-policy” learning (e.g., Precup, Ernst, Riedmiller, Lange).

All of these approaches pursue the same question:

What would have happened if things had been different?
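
At a mechanical level, all three names refer to the same re-weighting trick: multiply the quantity of interest by a ratio of densities (the "form of 1" above) so that data collected under one distribution can answer questions about another. Here is a minimal numerical sketch of that idea, with a toy integrand and a pair of normal distributions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma), written out to keep the sketch self-contained."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Target distribution p: standard normal.  Proposal q: a wider normal,
# standing in for "the distribution the data actually came from".
x = rng.normal(loc=0.0, scale=2.0, size=100_000)        # samples from q, not p
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # density ratio p(x) / q(x)

# E_p[x^2] = 1 for a standard normal; the weighted mean recovers it
# even though no sample was ever drawn from p.
print(np.mean(w * x ** 2))
```

The same ratio reappears as the propensity-score weight in causal inference and as the importance weight between the behavior policy and the target policy in off-policy reinforcement learning.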

Applications in healthcare abound. For example:

What would have happened if one had given drug X instead of drug Y?

What would have happened if one had tested for disease Z?

What would have happened if one had followed guideline A instead of guideline B?

These might seem like hypotheticals – if a patient already received a med, why do we want to know the counterfactual? However, when combined with data, firm answers to these types of questions could revolutionize evidence-based medicine.

Such answers could allow researchers to treat observational data the way they treat data from randomized controlled trials.

This would be game changing, because observational data is routinely collected, comprehensive, and available for many treatments/exposures that can’t be randomized.
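
To make that concrete, here is a small simulated sketch (entirely made-up data, not from any study) of how inverse probability weighting can recover a treatment effect from observational data in which treatment assignment depends on a confounder:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Confounder: sicker patients are more likely to receive the treatment...
severity = rng.uniform(size=n)
propensity = 0.2 + 0.6 * severity            # P(treated | severity), assumed known here
treated = rng.binomial(1, propensity)

# ...and sicker patients also have worse outcomes, so a raw treated-vs-untreated
# comparison is confounded.  The true treatment effect is +1.0.
outcome = 1.0 * treated - 2.0 * severity + rng.normal(scale=0.5, size=n)

naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Horvitz–Thompson style inverse probability weights: each patient stands in
# for all the similar patients who could have received the other treatment.
w = treated / propensity - (1 - treated) / (1 - propensity)
ipw = np.mean(w * outcome)

print(f"naive difference: {naive:.2f}   IPW estimate: {ipw:.2f}   true effect: 1.00")
```

The naive difference is biased because sicker patients are both more likely to be treated and more likely to do poorly; the weights undo that imbalance, which is exactly the sense in which observational data can be made to mimic a randomized trial.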

  1. Note that G-computation and inverse probability weighting are not the same; I had originally included G-computation in this list. Thanks to @Baersighting for his sharp eye!
