Andrew Barto: Reinforcement Learning

Projects with Andrew Barto will focus on pure reinforcement learning. They will start by exploring the strengths and weaknesses of different reinforcement learning algorithms, and then proceed to more advanced topics. Examples can be found here.
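
For a flavour of this kind of comparison, here is a toy sketch contrasting sample-average and constant-step-size value updates on a two-armed bandit (the task and all parameters are invented for illustration):

```python
import random

def run_bandit(update, n_trials=2000, eps=0.1, seed=0):
    """Epsilon-greedy two-armed bandit: arm 0 pays 1 with p=0.8, arm 1 with p=0.2."""
    rng = random.Random(seed)
    q = [0.0, 0.0]   # action-value estimates
    n = [0, 0]       # pull counts per arm
    for _ in range(n_trials):
        a = rng.randrange(2) if rng.random() < eps else max(range(2), key=lambda i: q[i])
        r = 1.0 if rng.random() < (0.8 if a == 0 else 0.2) else 0.0
        n[a] += 1
        q[a] += update(q[a], r, n[a])
    return q

# sample-average update: converges on stationary problems
avg = run_bandit(lambda q, r, n: (r - q) / n)
# constant step size: tracks nonstationary payoffs, but never stops fluctuating
const = run_bandit(lambda q, r, n: 0.1 * (r - q))
```

The trade-off already visible here (convergence versus adaptability) is one example of the strengths and weaknesses the projects would explore.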

Lab Website

Nathaniel Daw: Multiplayer games

Students will collect data from pairs of participants repeatedly playing rounds of "work-or-shirk" (a game reminiscent of rock-paper-scissors; see Dorris & Glimcher, Neuron 2005). By varying the payoffs participants expect for different outcomes, students can test whether the participants' overall strategy adjustments follow those predicted by classical game theory. Students may also examine trial-by-trial strategic adjustments by fitting reinforcement learning models to the data (as in Hampton et al., PNAS 2008; Camerer & Ho, Econometrica 1999).
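
As a rough illustration of the model-fitting step (using a generic two-choice task rather than the actual work-or-shirk game; all parameters are invented), one can score a softmax Q-learning model by the likelihood it assigns to each observed choice:

```python
import math, random

def softmax_p(q, beta):
    """Probability of choosing action 0 under a softmax over two Q-values."""
    e0, e1 = math.exp(beta * q[0]), math.exp(beta * q[1])
    return e0 / (e0 + e1)

def simulate(alpha, beta, n_trials=2000, seed=1):
    """Synthetic two-choice data: action 0 rewarded with p=0.7, action 1 with p=0.3."""
    rng = random.Random(seed)
    q, data = [0.0, 0.0], []
    for _ in range(n_trials):
        a = 0 if rng.random() < softmax_p(q, beta) else 1
        r = 1.0 if rng.random() < (0.7 if a == 0 else 0.3) else 0.0
        data.append((a, r))
        q[a] += alpha * (r - q[a])   # prediction-error update
    return data

def neg_log_lik(alpha, beta, data):
    """Negative log-likelihood of the observed choices under the Q-learning model."""
    q, nll = [0.0, 0.0], 0.0
    for a, r in data:
        p0 = softmax_p(q, beta)
        nll -= math.log(p0 if a == 0 else 1.0 - p0)
        q[a] += alpha * (r - q[a])
    return nll

data = simulate(alpha=0.4, beta=5.0)
```

Minimising the negative log-likelihood over the learning rate and softmax temperature recovers the parameters that best describe a participant's trial-by-trial adjustments.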

Lab Website

Anthony Dickinson and Simon Killcross: Habits and actions

Students will run a learning experiment and gather data on the difference between habitual and goal-directed learning in humans. Time permitting, these data will be fitted with RL models, and the quality of the fits to the goal-directed and habitual parts of the task compared.
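
A minimal sketch of the habitual/goal-directed distinction (a hypothetical lever-press task, not the actual experiment): a model-free learner caches action values, while a model-based learner recomputes them from an action-outcome model, so only the latter reacts immediately to outcome devaluation:

```python
def train(n_trials=100, alpha=0.1):
    """Pressing a lever deterministically earns 'food', currently worth 1."""
    q_mf = 0.0
    for _ in range(n_trials):
        q_mf += alpha * (1.0 - q_mf)       # cached, habit-like value
    transition = {"press": "food"}          # learned action -> outcome model
    utility = {"food": 1.0}                 # current value of each outcome
    return q_mf, transition, utility

q_mf, T, U = train()
U["food"] = 0.0                             # outcome devaluation (e.g. specific satiety)
q_mb = U[T["press"]]                        # goal-directed value, recomputed from the model
```

After devaluation the goal-directed value drops to zero immediately, while the cached habitual value remains high until it is unlearned through further experience, which is the behavioural signature the experiment probes.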

Lab Website

Michael Frank: Dopamine receptors

Projects with Michael Frank will concentrate on fitting reinforcement learning models to human data from probabilistic learning tasks. The data come from both healthy controls and people with Parkinson's disease, and allow an investigation of the consequences of dopamine dysfunction. In addition, students will explore the effects of D1 and D2 receptors in the context of these data and a detailed, biophysically realistic model.
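
One simple way to capture receptor-specific asymmetries in an abstract model (a generic sketch, not necessarily the exact model used in these projects) is to give positive and negative prediction errors different learning rates:

```python
def run(alpha_gain, alpha_loss, rewards):
    """Q update with separate learning rates for positive vs negative prediction errors."""
    q = 0.0
    for r in rewards:
        delta = r - q
        q += (alpha_gain if delta > 0 else alpha_loss) * delta
    return q

rewards = [1.0, 0.0] * 100        # alternating wins and losses
q_gain = run(0.3, 0.1, rewards)   # learns more from gains ("Go"/D1-like)
q_loss = run(0.1, 0.3, rewards)   # learns more from losses ("NoGo"/D2-like)
```

Identical experience produces different asymptotic values depending on the asymmetry, which is the kind of signature that can be compared between patient and control groups.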

Lab Website

Adam Kepecs: How uncertainty boosts learning

According to statistical learning theory, the rate of learning should depend on the current estimate of uncertainty: learn more when uncertain, and less when certain. Students will explore various aspects of these theories by running psychophysical experiments and comparing the results to rat behavioral data, focusing on the trial-by-trial updating of decision strategies. Additionally, students can examine how simple extensions of reinforcement learning models can account for their data.
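
The Kalman filter is the standard statistical example of this principle: its gain, which plays the role of a learning rate, is large when the current estimate is uncertain and shrinks as certainty grows. A minimal sketch (parameter values are arbitrary):

```python
def kalman_step(mean, var, obs, obs_noise=1.0, drift=0.1):
    """One Kalman-filter update; the gain acts as an uncertainty-dependent learning rate."""
    var += drift                       # uncertainty grows between observations
    gain = var / (var + obs_noise)     # learn a lot when uncertain, little when certain
    mean += gain * (obs - mean)
    var *= 1.0 - gain
    return mean, var, gain

mean, var = 0.0, 10.0                  # start out very uncertain
gains = []
for _ in range(50):
    mean, var, g = kalman_step(mean, var, obs=1.0)
    gains.append(g)
```

The effective learning rate starts near 1 and settles at a small steady-state value set by the balance between environmental drift and observation noise.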

Lab Website

Yael Niv: The effects of noise on temporal difference learning

Temporal difference (TD) learning is by now well ingrained in our thinking about the role of dopamine in learning. However, TD models usually assume a fully observable state space, which is known to be an unrealistic simplification. In this project we will examine the effects of different sources of noise on TD learning. We will consider external noise (probabilistic rewards, as in Nakahara et al. (2004), Morris et al. (2004) and Fiorillo et al. (2003)), internal noise resulting from a noisy representation, and, most importantly, timing noise, which is inherent in most learning scenarios (see the first part of Gallistel & Gibbon (2000)'s "Time, rate and conditioning" for a comprehensive review). Suggested directions:

  • Investigate robustness of tapped-delay line TD to each source of noise and compare to available data.
    Niv Y., Duff M.O. and Dayan P. -- The effects of uncertainty on TD learning [pdf]; Niv Y., Duff M.O. and Dayan P. (2005) -- Dopamine, uncertainty and TD learning [pdf]
  • (More advanced) Incorporate a semi-Markov framework and investigate scalar timing noise.
    Daw N.D., Courville A.C. and Touretzky D.S. (2003) -- Timing and partial observability in the dopamine system [pdf]; Daw N.D., Courville A.C. and Touretzky D.S. (2002) -- Dopamine and inference about timing [pdf]; Gibbon J. (1977) -- Scalar expectancy theory and Weber's law in animal timing -- Psychological Review, 84, 279-325.
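
As a starting point for the first direction, here is a minimal tapped-delay-line TD(0) learner with probabilistic (externally noisy) rewards; the task layout and parameters are invented for illustration. The weights converge to the expected reward, but prediction errors at reward time never vanish:

```python
import random

def train_noisy_td(n_trials=2000, T=10, reward_t=5, p_reward=0.5, alpha=0.05, seed=0):
    """Tapped-delay-line TD(0): one weight per time step since cue onset;
    reward is delivered at reward_t with probability p_reward."""
    rng = random.Random(seed)
    w = [0.0] * T
    deltas = []
    for _ in range(n_trials):
        rewarded = rng.random() < p_reward
        trial = []
        for t in range(T):
            r = 1.0 if (t == reward_t and rewarded) else 0.0
            v_next = w[t + 1] if t + 1 < T else 0.0
            delta = r + v_next - w[t]     # TD prediction error
            w[t] += alpha * delta
            trial.append(delta)
        deltas.append(trial)
    return w, deltas

w, deltas = train_noisy_td()
```

Under reward noise the value at the reward time hovers around the expected reward of 0.5, so every trial ends with a residual positive or negative prediction error, mirroring the sustained dopamine signals reported for probabilistic rewards.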

Lab Website

Angela Yu: Neuromodulation and uncertainty

This project will concentrate on how the neurophysiological effects of ACh/NE at the cellular level can carry out computations at the systems level (signaling different kinds of uncertainty: expected and unexpected uncertainty, see Yu and Dayan 2005). Angela Yu's recent work on adaptive responding to a changing world provides a context in which one can think about this concretely. It turns out that the necessary computations can be implemented approximately by leaky integrator neurons. In this scheme, ACh and NE adjust the relative weights of the feedforward and recurrent terms, which corresponds at a systems level to their distinct uncertainty roles, and at a computational level to changing the time constant of integration over time.
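
A schematic of the integration step (the weights are purely illustrative):

```python
def integrate(inputs, w_ff, w_rec):
    """Discrete-time leaky integrator: x <- w_rec * x + w_ff * u.
    Boosting the feedforward weight (as ACh/NE might under uncertainty)
    shortens the effective time constant, favouring new evidence over memory."""
    x, trace = 0.0, []
    for u in inputs:
        x = w_rec * x + w_ff * u
        trace.append(x)
    return trace

step = [1.0] * 10
fast = integrate(step, w_ff=0.5, w_rec=0.5)   # high "uncertainty": trust the input
slow = integrate(step, w_ff=0.1, w_rec=0.9)   # low "uncertainty": trust the memory
```

With a larger feedforward weight the unit tracks a step input within a few time steps; with a larger recurrent weight it responds sluggishly but averages out noise, which is the trade-off the neuromodulators are proposed to regulate.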

Lab Website

Course content / syllabus

Monday :: Introduction to Animal Learning

Anthony Dickinson, Cambridge University: The psychology of animal learning

Lecture 1

  • Pavlovian Conditioning
  • Instrumental (operant) conditioning
  • Associative learning processes

Lecture 2

  • Distinguishing actions from habits
  • Pavlovian-Instrumental transfer
  • Associative-cybernetic model
  • Dual-process theories

Slides for Conditioning and Learning class and for Conditioning and Behaviour class.

Simon Killcross, Cardiff University: The neurobiology of animal learning

Lecture 1: Pavlovian learning

  • Amygdala: aversive and appetitive
  • Neurobiology of omission schedules
  • Preparatory and consummatory conditioning
  • Neurobiology of associative processes

Lecture 2: Instrumental learning

  • Neurobiology of habits and goal-directed actions
  • Neural dissociations of outcome specific and general PIT

Tuesday :: Introduction to Reinforcement Learning (RL)

Andrew Barto, UMass Amherst

Lecture 1: basics

  • The reinforcement learning problem
  • Value functions / Policies
  • Bellman equation
  • Evaluating a decision tree: strengths and challenges
  • Dynamic programming: value of a fixed policy
  • Sample-based approaches: Monte Carlo methods, TD learning and Q-learning
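
The dynamic-programming part of the lecture can be illustrated with value iteration on a tiny made-up two-state MDP, repeatedly applying the Bellman optimality backup:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve the Bellman optimality equation by repeated backup:
    V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Two states: from A you can 'stay' (reward 0.5) or 'go' to B (reward 0);
# in B, 'stay' pays 1 forever.
P = {"A": {"stay": {"A": 1.0}, "go": {"B": 1.0}}, "B": {"stay": {"B": 1.0}}}
R = {"A": {"stay": 0.5, "go": 0.0}, "B": {"stay": 1.0}}
V = value_iteration(P, R)
```

With discount 0.9, V(B) = 1/(1-0.9) = 10 and the optimal choice in A is to forgo the immediate 0.5 and move to B, giving V(A) = 0.9 x 10 = 9.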

Lecture 2: advanced topics

  • Generalisation and the use of neural networks
  • Representation
  • Multiple controllers
  • Direct policy methods

Slides for Lecture 1 and Lecture 2.

Wednesday :: RL and neuromodulation

Yael Niv, Princeton University

Dopamine: from anhedonia to motivation

  • Rewards: The TD story
  • Dopamine and punishments
  • Wanting vs liking
  • Motivation
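
The classic TD account of phasic dopamine can be reproduced in a few lines (a schematic tapped-delay-line simulation with invented parameters): early in training the prediction error occurs at reward delivery, after training it has moved to the cue, and omitting an expected reward produces a negative error, the dopamine "dip":

```python
def td_trial(w, reward_t, alpha=0.1):
    """One Pavlovian trial over a tapped delay line (cue at t=0); returns the TD errors."""
    deltas = []
    for t in range(len(w)):
        r = 1.0 if t == reward_t else 0.0
        v_next = w[t + 1] if t + 1 < len(w) else 0.0
        delta = r + v_next - w[t]      # TD prediction error
        w[t] += alpha * delta
        deltas.append(delta)
    return deltas

w = [0.0] * 8
first = td_trial(w, reward_t=4)            # naive animal: big error at reward time
for _ in range(300):
    last = td_trial(w, reward_t=4)         # well-trained animal: error near zero
omission = td_trial(list(w), reward_t=-1)  # probe trial: expected reward withheld
```

After training, the learned value at cue onset (w[0]) carries the response that the reward itself no longer elicits.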

Slides for the lecture.

Angela Yu, Princeton University

Acetylcholine and Norepinephrine

  • Attention and Learning: the role of uncertainty
  • ACh & NE: expected and unexpected uncertainty

Quentin Huys, Columbia University

Serotonin

  • 5HT and reflexive actions
  • 5HT and temporal discounting
  • 5HT and DA in psychiatry

Slides for the lecture.

Thursday :: Neuroeconomics and multiplayer games

Nathaniel Daw, NYU

Lecture 1: The neurobiology of reinforcement learning

  • values from revealed preference
  • actors and critics in the brain
  • a single value function?
  • habits vs goal-directed behaviour
  • beyond habits: model-based decisions
  • beyond TD: non-normative choices
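
A minimal actor-critic sketch (a single-state bandit with invented parameters): the critic learns a value estimate from TD errors, and the very same errors train the actor's softmax policy:

```python
import math, random

def actor_critic(n_trials=3000, alpha_v=0.1, alpha_p=0.1, seed=2):
    """One-state actor-critic: arm 0 pays 1 with p=0.8, arm 1 with p=0.2."""
    rng = random.Random(seed)
    v = 0.0                      # critic's value estimate
    prefs = [0.0, 0.0]           # actor's action preferences (softmax policy)
    for _ in range(n_trials):
        p0 = math.exp(prefs[0]) / (math.exp(prefs[0]) + math.exp(prefs[1]))
        a = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < (0.8 if a == 0 else 0.2) else 0.0
        delta = r - v                         # TD error (single state, no successor)
        v += alpha_v * delta                  # critic update
        grad = (1 - p0) if a == 0 else -p0    # d log pi(a) / d prefs[0]
        prefs[0] += alpha_p * delta * grad    # actor update (policy-gradient style)
        prefs[1] -= alpha_p * delta * grad
    return prefs, v

prefs, v = actor_critic()
```

The division of labour, a critic broadcasting a scalar error and an actor consuming it, is the architecture that has been mapped onto ventral and dorsal striatum respectively.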

Lecture 2: Multiplayer games and social effects

  • Introduction to game theory
  • Modelling the world
  • Reinforcement learning in a social context
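
Fictitious play is one of the simplest learning models in game theory: each player best-responds to the opponent's empirical action frequencies. On matching pennies (a made-up illustration, not the course's actual games), the empirical frequencies approach the 50/50 mixed-strategy equilibrium:

```python
def fictitious_play(n_rounds=10000):
    """Matching pennies: row wins (+1) if the choices match, column wins otherwise.
    Each player best-responds to the opponent's empirical frequencies;
    the unique Nash equilibrium is for both to mix 50/50."""
    counts_row = [1, 1]   # row's counts of column's past actions (smoothed start)
    counts_col = [1, 1]   # column's counts of row's past actions
    row_plays = [0, 0]
    for _ in range(n_rounds):
        # row best-responds by matching the opponent's more frequent action
        a_row = 0 if counts_row[0] >= counts_row[1] else 1
        # column best-responds by mismatching the opponent's more frequent action
        a_col = 1 if counts_col[0] >= counts_col[1] else 0
        counts_row[a_col] += 1
        counts_col[a_row] += 1
        row_plays[a_row] += 1
    return row_plays[0] / n_rounds

freq = fictitious_play()
```

Play cycles in ever-longer runs, but the long-run frequencies settle at the mixed equilibrium, one benchmark against which human trial-by-trial adjustments can be compared.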

Friday :: RL and psychiatry mini-symposium

Quentin Huys, Columbia University



Adam Kepecs, Cold Spring Harbor Laboratory

Anxiety: the role of uncertainty

  • Animal models of anxiety
  • Fear conditioning
  • The effects of uncertainty

Quentin Huys, Columbia University

Introduction: computational psychiatry

  • Anhedonia
  • Learned helplessness
  • Modelling and testing learned helplessness in humans

Slides for the lecture.

Michael Moutoussis, SW London & St. George's Mental Health NHS Trust


  • Severe mental illness, including Schizophrenia
  • Explanatory theories of psychosis
  • Paranoid psychosis
  • Psychological and Computational models of psychosis.

Jonathan Williams, Institute of Psychiatry


  • impulsivity / discounting: DA and 5HT
  • modelling in psychiatry...?
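
The standard modelling contrast behind impulsivity is hyperbolic versus exponential discounting: only the hyperbolic form predicts preference reversals as delays shrink. A small illustration (amounts and discount rates are arbitrary):

```python
import math

def exp_value(amount, delay, k=0.1):
    """Exponential discounting: V = A * exp(-k * D) (time-consistent)."""
    return amount * math.exp(-k * delay)

def hyp_value(amount, delay, k=0.5):
    """Hyperbolic discounting: V = A / (1 + k * D) (steeper near the present)."""
    return amount / (1 + k * delay)

# small-soon option: 4 units at delay D; large-late option: 10 units at delay D + 10
def prefers_small(value, horizon):
    return value(4, horizon) > value(10, horizon + 10)
```

A hyperbolic discounter patiently prefers the larger reward when both are far away, then flips to the smaller-sooner option as it draws near; an exponential discounter's preference never reverses.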

Michael Frank, University of Arizona

Parkinson's Disease

  • DA in Parkinson's
  • Receptor-specific effects of dopamine
  • Learning from rewards and punishments

Slides for the lecture.