Fitting behavioural data (Nov 10th 2015)

This topic contains 10 replies, has 6 voices, and was last updated by  Thomas Wiecki 2 years, 7 months ago.

  • #151

    Quentin Huys

    How to best fit behavioural data?

  • #442

    John Murray

    I have a couple of general questions on this topic:

    1. Can one use Bayesian methods for model fitting when it’s not feasible/possible to calculate a likelihood function?

    2. How can one use multi-level models to fit individual-level data while taking advantage of data from the whole group? How can these methods reveal clusters within a group?

    • #455

      Quentin Huys

      Thank you for the questions!

      1. If you can’t specify the likelihood function at all then I’m not sure what model fitting would mean. But maybe you are thinking of the scenario where the likelihood function is too expensive to evaluate?

      2. We’ll attempt to provide some ways of doing just that today. There is some MATLAB code at http://www.quentinhuys.com/pub/emfit_151110.zip.

  • #452

    What are the best ways to combine choice and RT data into a single likelihood function when fitting a model?

    • #457

      Quentin Huys

      This comes down to specifying the link function, i.e. how your observables are related to the internal ‘model’ you are evaluating and fitting. One option is to have two separate ‘link’ functions which describe how your internal variable of interest is related to each of your two observables. This makes sense if you think that the RT of a particular choice is independent of the choice’s identity. If they are dependent, then you’d need to write a link function that describes this dependency, possibly with its own additional parameters to be fitted.

    • #460

      Thanks to both of you for the nice talks. Sorry, we didn’t quite unmute the microphone at the MPS-UCL centre at the end 🙂 But the implementation of DDM in Stan sounds like it might be a useful place to look for this kind of fitting.

      The DDM fitting procedure in Wiecki/Sofer/Frank 2013 seems to use a likelihood function from Navarro & Fuss 2009. Is this the same as is used in the Stan implementation? Is there a reference for the Stan-DDM?

      Am I also correct in assuming (from John’s question, above) that a likelihood function for a neural model is currently too computationally expensive to fit to behavioural data?

    • #463

      A couple of notes about HDDM in relation to Stan-DDM (note that I haven’t used it and don’t know which implementation they use):

      * A lot of work on HDDM went into tuning the samplers (we’re using Slice) to give good convergence (and it indeed is quite good even for complex models),
      * choosing good priors to give optimal parameter recoverability (this could be replicated in Stan),
      * Speed: The likelihood is optimized heavily for speed as well as the numerical integration for variability parameters st and sz (which I’d be surprised if they were included in Stan-DDM). In addition, NUTS requires evaluation of the gradient for each step which will be slower (it helps with convergence of course, but we get good convergence),
      * We err on the side of usability which makes multiple conditions and trial-by-trial regressions e.g. with neurodata very easy to do. That comes at the cost of flexibility though, so if you want to build a more complex model than what you can do with HDDM, you’re better off using Stan-DDM.

      HDDM uses PyMC2 which has a lot of API quirks we had to deal with (that’s what Nathaniel referred to in his talk). However, the interface provided by HDDM requires no knowledge of PyMC so that’s all invisible to the user. PyMC is certainly fast though as all the likelihoods are written in Fortran (which is hellacious to compile but with anaconda you can get binary packages).

      Separately, PyMC3 is a complete rewrite with a completely different API which also offers autodiff and NUTS like Stan does. In addition, it allows you to declare discrete random variables, which Stan does not support (and there are no plans to do so), as well as sampling on the GPU. For more information, see here: http://pymc-devs.github.io/pymc3/getting_started/

  • #464

    Re: likelihood function for a neural model (or other complex model), the issue is that there may not be an explicit (analytic) closed form likelihood. I think this is what John is referring to above. Likelihood-free methods are available, in which synthetic likelihoods can be generated using approximate Bayesian computation (though this may indeed be computationally costly for complex models). See for example Brandon Turner’s work.
    A deeper problem though is that regardless of the estimation method, the model needs to be identifiable – if you fit a complex nonlinear multi-parameter model to behavioral data, you will likely not be able to recover known generative parameters, given multicollinearity etc. Statistically, simpler models are better for quantitative fitting, so you are better off if you can summarize the properties of the complex model with a more minimal description.
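
To make the likelihood-free idea and the recovery concern above concrete, here is a minimal sketch of ABC rejection sampling in plain Python. It is not taken from any of the packages discussed in this thread. The toy simulator (a softmax choice between two options with a fixed value difference), the flat prior, the summary statistic and the tolerance are all assumptions chosen for illustration. Because the "observed" data are simulated with a known parameter, the sketch also doubles as a simple parameter recovery check.

```python
import math
import random


def simulate_choices(beta, n_trials, rng, value_diff=1.0):
    """Toy simulator (assumption): softmax choices with inverse temperature beta."""
    p_a = 1.0 / (1.0 + math.exp(-beta * value_diff))
    return [1 if rng.random() < p_a else 0 for _ in range(n_trials)]


def abc_rejection(observed, n_draws, tolerance, rng, prior_max=5.0):
    """Keep prior draws whose simulated summary statistic lies close to the observed one."""
    obs_stat = sum(observed) / len(observed)  # summary statistic: proportion choosing A
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.0, prior_max)  # flat prior on beta (assumption)
        sim = simulate_choices(beta, len(observed), rng)
        sim_stat = sum(sim) / len(sim)
        if abs(sim_stat - obs_stat) <= tolerance:
            accepted.append(beta)
    return accepted


rng = random.Random(0)
true_beta = 1.5
observed = simulate_choices(true_beta, 500, rng)

# Accepted draws approximate the posterior; no likelihood is ever evaluated.
posterior = abc_rejection(observed, n_draws=2000, tolerance=0.02, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean beta = {posterior_mean:.2f}")
```

With a richer simulator the same loop applies unchanged, but the cost grows with every simulation, and a single summary statistic can only identify a single parameter, which is one way the identifiability problem shows up in practice.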

    Re: packages for DDM, I echo all of Thomas’ comments (surprise surprise), including the tradeoff between HDDM and Stan. Thomas has got HDDM well greased for applications using the DDM including neural regressors, within-subjects effects, and support for posterior predictive checks (which is important for model evaluation in complement to any statistic of relative model fit), and convergence is good. But if you want to implement other non-DDM models that are not already supported, you might as well code it in Stan (or PyMC3; I haven’t experimented with that myself but it looks promising and I trust Thomas).

  • #468

    Thanks to both of you for the feedback – very helpful!

  • #469

    I think I oversold the DDM capability of Stan in my presentation. It’s new in the latest version or two, reasonably rudimentary and also not really documented yet. (There are a few references in the manual and if you google around you can find some sample code.) It’s not really comparable to HDDM for fitting DDMs. But for doing anything else, or implementing anything new, my experience is that Stan is vastly better than PyMC (v2) as a general purpose modeling engine; it doesn’t really require the sorts of tweaking and optimization that Thomas describes. For the case of HDDM, Thomas has already done that, but given my own experiences getting even reasonably simple multilevel RL models to mix in PyMC 2 I would have a hard time imagining jointly modeling anything else alongside the DDM using PyMC.

    Depending on the level of detail you are interested in, another way to go at least to get started is to model (log) RTs alongside choices using something more rudimentary like a regression model, eg with RT ~ Normal(w * x , sigma) for some trial/subject-wise feature vector x. (In our experience with this kind of simple model it is important to “clean up” RTs a little bit, e.g. by omitting unreasonably slow or fast ones.) Then you might assume x contains features derived from the same model as the choices, eg chosen value or abs(chosen minus unchosen value) or whatever. Then you can swap this out for more detailed models that unpack the effects if you want, like DDM, or perhaps LBA which should be pretty easy to implement from scratch in Stan.
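
As a rough illustration of the clean-up and regression step just described, the sketch below trims implausible RTs and fits log(RT) = w0 + w1 * x by least squares. The cutoffs (0.2 s and 5 s), the single scalar feature x and the function names are assumptions made for this example, not values recommended in the thread.

```python
import math


def clean_rts(rts, features, lo=0.2, hi=5.0):
    """Drop trials with implausibly fast or slow RTs (cutoffs in seconds are assumptions)."""
    kept = [(rt, x) for rt, x in zip(rts, features) if lo <= rt <= hi]
    return [rt for rt, _ in kept], [x for _, x in kept]


def fit_log_rt(rts, features):
    """Least-squares fit of log(RT) = w0 + w1 * x; returns (w0, w1, sigma)."""
    y = [math.log(rt) for rt in rts]
    n = len(y)
    mean_x = sum(features) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(features, y))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    residuals = [yi - (w0 + w1 * x) for x, yi in zip(features, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)  # maximum-likelihood noise estimate
    return w0, w1, sigma


# Hypothetical data: x could be abs(chosen minus unchosen value) on each trial.
features = [0.0, 0.5, 1.0, 1.5, 0.7, 0.3]
rts = [math.exp(0.2 - 0.5 * x) for x in features]
rts[4] = 0.05   # anticipatory response, removed by the clean-up
rts[5] = 30.0   # lapse, removed by the clean-up

clean, clean_x = clean_rts(rts, features)
w0, w1, sigma = fit_log_rt(clean, clean_x)
print(f"kept {len(clean)} trials, w0 = {w0:.2f}, w1 = {w1:.2f}")
```

In a full model the feature x would come from the same latent variables that drive the choices, so this regression would be estimated jointly with the choice model rather than in a separate step.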

    The nice thing is, if you assume (as Quentin points out) that RTs are conditionally independent of choices given the shared latent variables, then this just amounts to adding more observable terms to the same RL model of choices (i.e. just summing log-likelihoods across a softmax choice rule and a Gaussian RT model and estimating as before). Note that you don’t have to (and shouldn’t) explicitly choose a relative weighting for either data source: the inference over softmax temperature and RT noise and the magic of Bayesian cue combination will take care of that based on the variability in either observable. You could also compare models (which couple vs uncouple the underlying latent variables Q) to test the assumption that the two observables are windows on the same latents.
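
A minimal sketch of what this joint likelihood could look like for a two-armed bandit, assuming a simple Q-learning rule, a softmax choice rule and a Gaussian model of log RT whose mean depends on the absolute Q difference. The parameter names (alpha, beta, w0, w1, sigma) and the specific RT feature are assumptions for illustration only.

```python
import math


def joint_nll(params, choices, rewards, log_rts):
    """Negative log-likelihood of choices and log RTs given shared Q values."""
    alpha, beta, w0, w1, sigma = params
    q = [0.0, 0.0]  # action values for the two options
    nll = 0.0
    for c, r, log_rt in zip(choices, rewards, log_rts):
        # Choice term: softmax over the current Q values.
        log_z = math.log(math.exp(beta * q[0]) + math.exp(beta * q[1]))
        nll -= beta * q[c] - log_z
        # RT term: Gaussian on log RT, mean set by the same Q values.
        mu = w0 + w1 * abs(q[0] - q[1])
        nll -= -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((log_rt - mu) / sigma) ** 2
        # Learning: update the chosen option's value with the prediction error.
        q[c] += alpha * (r - q[c])
    return nll


# Hypothetical data for four trials.
choices = [0, 1, 0, 1]
rewards = [1.0, 0.0, 1.0, 1.0]
log_rts = [0.0, 0.0, 0.0, 0.0]

nll = joint_nll((0.3, 0.0, 0.0, 0.0, 1.0), choices, rewards, log_rts)
print(f"joint negative log-likelihood: {nll:.3f}")
```

The two terms are simply added on the log scale with no hand-set weight: beta and sigma determine how much each observable constrains the shared Q values. The function can be handed to any optimizer or sampler in place of a choice-only likelihood.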

    Finally, I’m not sure what you have in mind for a neural model, but FWIW: in principle you could implement a DDM in Stan from first principles (i.e. model the accumulating variable and threshold explicitly; I think this approach actually works for the LBA), but in practice that is too heavyweight, and the actual implementation, like HDDM, relies on analytic results about the distribution of crossing times from Navarro & Fuss. So some more detailed neural version of a DDM is probably a no-go.
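
For readers wondering what modeling the accumulating variable and threshold explicitly involves, here is a hedged sketch of a single-trial DDM simulation in plain Python using a simple Euler scheme. It is not how Stan or HDDM compute the likelihood (both rely on analytic results, as noted above); the parameter names v, a, z, t0, the unit diffusion coefficient and the step size are assumptions.

```python
import math
import random


def simulate_ddm_trial(v, a, z, t0, rng, dt=0.001, max_t=5.0):
    """Simulate one trial; returns (choice, rt) with choice 1 = upper, 0 = lower boundary.

    Returns (None, None) if no boundary is reached before max_t seconds.
    """
    x = z * a  # starting point; z is the relative bias in (0, 1)
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_t:
        # Euler step: drift plus Gaussian noise with unit diffusion coefficient (assumption).
        x += v * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return 1, t0 + t
        if x <= 0.0:
            return 0, t0 + t
    return None, None


rng = random.Random(42)
trials = [simulate_ddm_trial(v=2.0, a=1.5, z=0.5, t0=0.3, rng=rng) for _ in range(200)]
finished = [(c, rt) for c, rt in trials if c is not None]
p_upper = sum(c for c, _ in finished) / len(finished)
mean_rt = sum(rt for _, rt in finished) / len(finished)
print(f"P(upper) = {p_upper:.2f}, mean RT = {mean_rt:.2f} s")
```

Each trial needs hundreds of noisy steps, and every step would be a latent variable if this were written as an explicit generative model, which gives a sense of why the analytic first-passage-time density is used in practice.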

  • #470

    As an FYI to the last point: I have used a likelihood-free model for the Flanker and antisaccade task before (see http://ski.clps.brown.edu/papers/Dillon_PM_InPress_wSuppl.pdf). Yes, it’s pretty slow but worked well enough for MAP estimation and there are certain tricks you can do to speed things up. I used PDA by Turner et al (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3704297/) but there are also fancier options (see e.g. Ted Meeds’ work, also open source: https://github.com/tedmeeds/abcpy). My unusable and undocumented research code for the Flanker model lives here: https://github.com/twiecki/accumodel
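
The core idea behind a density-approximation approach such as PDA can be sketched as follows: simulate many RTs from the model at a candidate parameter, smooth them with a kernel density estimate, and evaluate the observed RTs under that estimate. This is only an illustration of the general idea, not the code used in the paper or repositories linked above. The toy log-normal simulator, the normal-reference bandwidth rule and the density floor are all assumptions.

```python
import math
import random


def simulate_rts(drift, n, rng):
    """Toy stand-in for a model without closed-form likelihood (assumption):
    log-normal RTs whose median shrinks as drift grows."""
    return [math.exp(rng.gauss(-0.5 * drift, 0.3)) for _ in range(n)]


def pda_log_likelihood(observed, simulated):
    """Approximate log-likelihood of observed RTs from a Gaussian KDE of simulated RTs."""
    n = len(simulated)
    mean = sum(simulated) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in simulated) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # normal-reference rule-of-thumb bandwidth
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    total = 0.0
    for y in observed:
        density = norm * sum(math.exp(-0.5 * ((y - s) / h) ** 2) for s in simulated)
        total += math.log(max(density, 1e-10))  # floor avoids log(0) in the tails
    return total


rng = random.Random(1)
observed = simulate_rts(1.0, 100, rng)  # "data" generated with drift = 1.0

ll_true = pda_log_likelihood(observed, simulate_rts(1.0, 2000, rng))
ll_wrong = pda_log_likelihood(observed, simulate_rts(3.0, 2000, rng))
print(f"approx. log-likelihood at drift=1.0: {ll_true:.1f}, at drift=3.0: {ll_wrong:.1f}")
```

The approximate log-likelihood can then be maximised (for MAP or ML estimates) or plugged into a sampler. The cost is dominated by the simulations needed at every candidate parameter, which is why the approach is slow.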
