The explore/exploit dilemma in human reinforcement learning: Computation, behavior, and neural substrates

Date and Time
Thursday, March 30, 2006 - 4:30pm to 6:00pm
Fine Hall 101
Nathaniel Daw, Gatsby Computational Neuroscience Unit
Robert Schapire
We have rather detailed, if tentative, information about how organisms learn from experience to choose better actions. But it is much less clear how they arrange to obtain this experience. The problem of sampling unfamiliar options is a classic theoretical dilemma in reinforcement learning, because the costs and benefits of exploring unfamiliar options (which are substantial and difficult to quantify) must be balanced against those of exploiting the options that appear best on current knowledge.

Using behavioral analysis and functional neuroimaging in a bandit task, we study how humans approach this dilemma. We assess the fit to participants' trial-by-trial choices of different exploratory strategies from reinforcement learning, and, having validated an algorithmic account of behavior, use it to infer subjective factors such as when subjects are exploring versus exploiting. These estimates are then used to search for neural signals related to these phenomena. The results support the hypothesis that exploration is encouraged by the active override of an exploitative choice system, rather than an alternative, computationally motivated hypothesis under which a single (putatively dopaminergic) choice system integrates information about both the exploitative and exploratory ("uncertainty bonus") values of candidate actions. Although exploration is ubiquitous, it is also difficult to study in a controlled manner: We seize it only through the tight integration of computational, behavioral, and neural methods.
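As a rough illustration of what fitting an exploratory strategy to trial-by-trial choices can involve, the sketch below scores a sequence of bandit choices under a softmax rule with an optional uncertainty bonus. It is a minimal sketch, not the model used in the talk: the function names, the parameters `beta` (choice determinism) and `bonus` (weight on uncertainty), and the per-trial value and uncertainty estimates are all hypothetical and assumed to be supplied by some separate learning rule.

```python
import math


def softmax_choice_probs(values, uncertainties, beta=1.0, bonus=0.0):
    """Choice probabilities from a softmax over value plus an uncertainty bonus.

    With bonus=0 this is a plain softmax over estimated values; a positive
    bonus raises the score of arms whose payoff is more uncertain.
    All parameter names here are illustrative assumptions.
    """
    scores = [beta * (v + bonus * u) for v, u in zip(values, uncertainties)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(choices, trial_values, trial_uncertainties,
                            beta, bonus):
    """Sum of -log P(observed choice) across trials for one parameter setting."""
    nll = 0.0
    for choice, values, uncertainties in zip(
            choices, trial_values, trial_uncertainties):
        probs = softmax_choice_probs(values, uncertainties, beta, bonus)
        nll -= math.log(probs[choice])
    return nll
```

Competing strategies could then be compared by minimizing this quantity over `beta` and `bonus` for each participant: a fitted `bonus` near zero would be consistent with choices that do not track uncertainty, though any such conclusion depends on the assumed learning rule.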
