A discussion here about causality and clinical trials came up a few months ago, and now a similar discussion is taking place on causality and artificial intelligence. In particular, here, Pearl notes that any aim to build models that “understand the world” requires a causal component. Here, in response, LeCun said that world models in optimal control (i.e., reinforcement learning (RL)) are “action conditioned and hence causal.” I think it is possible to agree with LeCun and also with Pearl more broadly.
RL is usually expressed in probability notation without causal notation, because the causal model is innate in the structure of RL. In other words, RL probability distributions are interventional by default (i.e., although they are written P(Y | A = a), they actually represent P(Y | do(A = a)), because we know the agent is taking actions). The same applies to clinical trials, which are often single-stage RL environments. I.e., if one estimates a treatment effect from a clinical trial, it is implicit in the idea of a “trial” that the treatment was assigned to participants rather than observed. The causal inference literature, however, considers the default probability distribution to be observational (i.e., P(Y | A = a)), and hence Pearl considers both RL and clinical trials to lack the causal language necessary to describe interventions (and their extensions, counterfactuals). It is correct that causal notation is missing from these research areas, but RL and trials do embed causality in their models.
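To make the observational-versus-interventional distinction concrete, here is a minimal simulation sketch. The structure and all numbers are hypothetical (not from any study mentioned here): a binary confounder U influences both treatment A and outcome Y, so conditioning on A in logged data differs from assigning A by coin flip, as a trial or an acting RL agent would.

```python
import random

random.seed(0)
N = 200_000

def simulate(randomize):
    """Draw (A, Y) pairs from a system where a confounder U
    affects both treatment A and outcome Y.

    randomize=True assigns A by coin flip (an intervention,
    i.e., do(A)); randomize=False lets A depend on U, as in
    observational data."""
    data = []
    for _ in range(N):
        u = random.random() < 0.5
        if randomize:
            a = random.random() < 0.5                      # do(A): coin flip
        else:
            a = random.random() < (0.8 if u else 0.2)      # A influenced by U
        y = random.random() < (0.2 + 0.3 * a + 0.4 * u)    # Y depends on A and U
        data.append((a, y))
    return data

def p_y_given_a1(data):
    """Empirical P(Y=1 | A=1)."""
    treated = [y for a, y in data if a]
    return sum(treated) / len(treated)

obs = p_y_given_a1(simulate(randomize=False))   # estimates P(Y=1 | A=1)
intv = p_y_given_a1(simulate(randomize=True))   # estimates P(Y=1 | do(A=1))
print(f"observational  P(Y=1 | A=1)     ~ {obs:.3f}")
print(f"interventional P(Y=1 | do(A=1)) ~ {intv:.3f}")
```

In this toy system the observational estimate is inflated (treated units are disproportionately the U = 1 units, who do better anyway), while the randomized run recovers the interventional quantity. The same written expression, P(Y = 1 | A = 1), means different things depending on how A was generated, which is exactly why the notation matters.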
In summary, I agree with LeCun that RL has interventional distributions despite the lack of causal notation, and I think that trials do as well. Overall, though, I recommend formally distinguishing observational and interventional distributions, and hence agree with Pearl more generally that causal structure must be defined with some formal representation (e.g., directed acyclic graphs) and described with specific notation (e.g., the do-operator, potential outcomes).
How does this apply to patient care?
Overall, a formal causal language leads to more rigor in the description and interpretation of studies in the medical literature. Consider this recent study describing an “association” between childhood caries and heart disease, which was then described in a Medscape article as a “link,” and finally arrived in my email inbox as “Childhood cavities may affect adult heart disease risk.”
Further, causal notation becomes important when we begin to talk about treatment models. Online reinforcement learning is similar to a clinical trial, except that we are trying to find an optimal treatment policy rather than estimate a treatment effect. Offline reinforcement learning (e.g., see Murphy's work) is similar to observational data analysis, except that we are estimating a treatment policy instead of an observational treatment effect. The distinction matters because offline reinforcement learning, like the analysis of observational data for treatment effects, depends upon causal assumptions, which must be stated.
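As a toy illustration of why offline estimation leans on stated causal assumptions (a hypothetical sketch in a single-stage setting, not a description of Murphy's methods): inverse propensity weighting can recover the value of a “treat everyone” policy from logged data, but only under the assumption that the logged propensities capture all confounding, i.e., that there are no unmeasured confounders.

```python
import random

random.seed(1)
N = 100_000

# Hypothetical single-stage offline setting: context X, action A, reward R.
# The logged (behavior) policy treats mostly when X = 1, and X also raises
# the reward, so X confounds the naive comparison.
logs = []
for _ in range(N):
    x = random.random() < 0.5
    p_treat = 0.9 if x else 0.1                            # behavior policy propensity
    a = random.random() < p_treat
    r = 1.0 if random.random() < (0.3 + 0.4 * a + 0.2 * x) else 0.0
    logs.append((a, r, p_treat))

# Naive estimate of the "treat everyone" policy's value:
# average reward among treated units in the logs.
treated = [r for a, r, p in logs if a]
naive = sum(treated) / len(treated)

# Inverse-propensity-weighted (IPW) estimate of the same value.
# Valid only under the causal assumption that all confounders (here, X)
# are reflected in the logged propensities.
ipw = sum(r / p for a, r, p in logs if a) / N

print(f"naive ~ {naive:.3f}, IPW ~ {ipw:.3f}")
```

In this toy system the true value of treating everyone is 0.8, which the IPW estimate approaches, while the naive average is biased upward because treated units are disproportionately the favorable X = 1 contexts. If the propensities were wrong, or a confounder were missing from them, the IPW estimate would be biased too, which is why those assumptions must be made explicit.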