• Tuesday, September 20th 2016 at 15:00 - 16:00 UK (Other timezones)
  • General participation info   |   Participate online
  • Phone in:
      United States: +1 (571) 317-3129
      Australia (toll-free): 1 800 193 385
      Australia: +61 2 8355 1020
      Austria (toll-free): 0 800 202148
      Belgium (toll-free): 0 800 78884
      Canada (toll-free): 1 888 455 1389
      Denmark (toll-free): 8090 1924
      France (toll-free): 0 805 541 047
      Germany (toll-free): 0 800 184 4222
      Israel (toll-free): 1 809 454 830
      Italy (toll-free): 800 793887
      Netherlands (toll-free): 0 800 020 0182
      New Zealand (toll-free): 0 800 47 0011
      Norway (toll-free): 800 69 046
      Poland (toll-free): 00 800 1213979
      Portugal (toll-free): 800 819 575
      Spain (toll-free): 800 900 582
      Switzerland (toll-free): 0 800 740 393
      United Kingdom (toll-free): 0 800 169 0432
      United States (toll-free): 1 877 309 2073
  • Access Code: 731-636-357
  • Audio PIN: Shown after joining the meeting
  • Meeting ID: 731-636-357

Optimal response timing is often uncertain, but it can be learned by responding at different moments in time and evaluating the outcome (reinforcement-based timing). In this process, one is confronted with a large action space. The dimensionality of this space, however, can be reduced through temporal generalization. One very efficient approach, validated for Pavlovian learning, is to use a set of temporal basis functions to approximate expected value as a function of time. We employed this approach to understand how humans deal with cognitive constraints and uncertainty when searching for optimal response timing. Humans' sampling trajectories were best explained by a simple memory constraint: action values for infrequently chosen response times decayed, reverting toward the prior expectation and erasing the reinforcement history. Contrary to our expectations, this cognitively constrained model foraged successfully in novel simulated environments. An agent thus constrained maintains one or more hypotheses about the best response times rather than a high-fidelity representation of all possible rewards. In simulations, uncertainty-driven exploration yielded a more precise representation of the environment early in learning but conferred no foraging advantage over random exploration. Humans generally avoided uncertain areas of the time interval to a greater extent than their reward value warranted. Neither value decay nor uncertainty aversion was explained by choice autocorrelation.
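The basis-function value representation and the selective decay described above can be sketched in a few lines. The Python snippet below is a minimal illustration, not the authors' SCEPTIC implementation: the parameter values, the Gaussian basis, the decay rule, and the toy reward function are all assumptions introduced here for concreteness.

```python
import numpy as np

# Sketch (assumed, not the authors' code): expected value over a response
# interval is approximated by Gaussian temporal basis functions. Learning
# updates weights near the chosen response time; weights elsewhere decay
# back toward the prior, erasing reinforcement history for rarely sampled times.

T = 4.0                      # length of the response interval in seconds (assumed)
n_basis = 24                 # number of temporal basis functions (assumed)
centers = np.linspace(0, T, n_basis)
width = T / n_basis          # SD of each Gaussian basis function (assumed)

def basis(t):
    """Activation of each Gaussian basis function at response time t."""
    return np.exp(-0.5 * ((t - centers) / width) ** 2)

def expected_value(weights, t):
    """Basis-function approximation of expected value at time t."""
    return basis(t) @ weights

def update(weights, t_chosen, reward, alpha=0.1, gamma=0.05, prior=0.0):
    """Delta-rule update weighted by basis activation at the chosen time,
    plus decay of weakly activated weights toward the prior expectation."""
    e = basis(t_chosen)                                  # eligibility per basis
    pe = reward - expected_value(weights, t_chosen)      # prediction error
    weights = weights + alpha * e * pe                   # reinforce near chosen time
    weights = weights - gamma * (1 - e) * (weights - prior)  # selective decay
    return weights

# Toy usage: rewards peak at t = 2.5 s; the agent samples response times at random.
rng = np.random.default_rng(0)
w = np.zeros(n_basis)
for _ in range(200):
    t = rng.uniform(0, T)
    r = np.exp(-0.5 * ((t - 2.5) / 0.5) ** 2) + 0.1 * rng.standard_normal()
    w = update(w, t, r)

print(np.round([expected_value(w, t) for t in np.linspace(0, T, 9)], 2))
```

In this sketch, the learned value curve peaks near the rewarded time while estimates for unsampled times drift back toward the prior, which is one way to read the memory constraint described in the abstract.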


Michael Hallquist and Alexandre Dombrovski: Strategic Exploration/Exploitation of a Temporal Instrumental Contingency (SCEPTIC) under uncertainty and cognitive constraints
