TRiRL is a half-day interdisciplinary workshop that aims to push the boundaries of our understanding of the brain’s ability to represent time, and to explore the potential benefits for reinforcement learning.

About

The ability to perceive and estimate temporal dynamics can be considered one of the central elements of intelligent biological agents equipped with a model of their environment. Similarly, if one takes the view that an agent’s (internal) model is its primary guide to behaviour, then the ability to learn appropriate temporal representations and to employ them for action selection is a crucial consideration in reinforcement learning (RL).

Currently, RL agents have matured to the point where model-based, unsupervised approaches achieve competitive and even state-of-the-art behaviours (for instance, MuZero, Schrittwieser et al. 2020, or DreamerV2, Hafner et al. 2020). However, these models tend to operate over a physical timescale aligned with changes in the environment’s dynamics. Consequently, further work is required to mimic the kinds of spatio-temporal representations observed in neuronal responses, which operate at both subjective and objective (physical) timescales. Indeed, a large number of neuroimaging and modeling studies in cognitive science have focused on explaining temporal representations and how they influence human behaviour (using neural networks and Bayesian inference), e.g., Jazayeri & Shadlen (2010), Roseboom et al. (2019), Deverett et al. (2019), Fountas et al. (2022).

TRiRL will bring together experts in model-based RL and neuroscientists working on the brain’s ability to represent time, in order to exchange insights, brainstorm, and encourage a multi-angle discussion on this important topic.

Speakers

Richard Sutton

DeepMind, Amii and University of Alberta

"Some Foundations of Temporal Representations"

Marc Howard

Boston University

"Temporal memory in the brain and reinforcement learning"


A large body of modeling work has focused on temporal representations in memory, including working memory and episodic memory. There is extensive evidence that neurons in a number of brain regions, including hippocampus, mPFC, lPFC and striatum, fire sequentially, forming a temporal memory of what happened when in the recent past. More recent evidence from the entorhinal cortex shows neurons that, rather than firing sequentially, are perturbed by a stimulus and then relax back to baseline at different rates. In the context of reinforcement learning, this population behaves as an eligibility trace, but with a spectrum of decay rates. At a deeper level, the graded heterogeneity of decay rates lets us identify this population with the real Laplace transform of the recent past. We propose a simple associative model to use this temporal memory to store and retrieve temporal predictions of the future. This form of association could form the core of a new generation of RL models that take temporal representations of what happened when to predict what will happen when over an extended future.
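The core of this mechanism can be sketched in a few lines. In the minimal example below, each unit is a leaky integrator obeying dF/dt = -s·F + f(t), so the population holds the real Laplace transform of the input’s recent history, while each unit on its own behaves like an ordinary eligibility trace with its own decay rate. The input pulse, time step, and spectrum of decay rates are illustrative choices, not taken from the talk.

```python
import numpy as np

# A population of leaky integrators with a heterogeneous spectrum of
# decay rates s. Each unit obeys dF/dt = -s * F + f(t), so at time t
# it holds F(s, t) = integral of f(t') * exp(-s * (t - t')) dt', i.e.
# the real Laplace transform of the input's recent history. Any single
# unit is an ordinary eligibility trace; the spectrum of rates is what
# turns the population into a temporal memory.

dt = 0.01                                  # simulation step (s)
s = np.logspace(-1, 1.5, 50)               # illustrative spectrum of decay rates
F = np.zeros_like(s)                       # Laplace-domain memory

def step(F, f_t):
    """Advance the memory one Euler step given the input sample f_t."""
    return F + dt * (-s * F + f_t)

# Drive the population with a brief stimulus, then let it relax.
for t in np.arange(0.0, 5.0, dt):
    f_t = 1.0 if 0.5 <= t < 0.6 else 0.0   # pulse around t = 0.5 s
    F = step(F, f_t)

# Fast-decaying units (large s) have already relaxed back to baseline,
# while slow units (small s) still carry a trace of the pulse.
print("slowest units:", F[:3], "fastest units:", F[-3:])
```

Recovering “what happened when” from such a memory amounts to approximately inverting the transform, which in Howard’s framework is done with a fixed linear readout; the sketch omits that stage.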

Ida Momennejad

Microsoft Research

"Temporal Abstraction in Biological and Artificial RL"


Biological and artificial reinforcement learning rely on varieties of temporal abstraction. Common examples of temporal abstraction in computer science and cognitive neuroscience include associative inference (AB, BC -> AC), retrospective revaluation (ABC, BD -> AD), chunking, event segmentation, and related notions of transfer. I will first show behavioral, neural, and modeling results using RL approaches to temporal abstraction. These studies combine associative learning with multiscale predictive representations and varieties of replay. I will then show examples of temporal abstraction for which current approaches are insufficient, and discuss ongoing and future RL solutions for capturing temporal abstraction in brains, behavior, and machines.
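To make one of these ingredients concrete, the toy sketch below shows how multiscale predictive representations support associative inference (AB, BC -> AC); the three-state graph and discount values are my own illustrative choices, not taken from the talk. With a successor representation M = (I - γT)^(-1) computed at several discounts γ, the experienced transitions A→B and B→C yield a predictive link from A to C that strengthens with the horizon.

```python
import numpy as np

# Toy associative-inference example with a multiscale successor
# representation (SR). States: A -> B -> C, with C absorbing. Row i of
# M = (I - gamma * T)^{-1} gives the expected discounted future
# occupancy of every state when starting from state i.

states = ["A", "B", "C"]
T = np.array([[0.0, 1.0, 0.0],   # A -> B  (the experienced "AB" pair)
              [0.0, 0.0, 1.0],   # B -> C  (the experienced "BC" pair)
              [0.0, 0.0, 1.0]])  # C absorbs

for gamma in (0.1, 0.5, 0.9):    # illustrative spectrum of horizons
    M = np.linalg.inv(np.eye(len(states)) - gamma * T)
    # A and C were never experienced together, yet the SR links them.
    print(f"gamma={gamma}: M[A -> C] = {M[0, 2]:.3f}")
```

Running this, M[A -> C] grows from about 0.011 at γ = 0.1 to 8.1 at γ = 0.9 (it equals γ²/(1 - γ) here): the never-experienced A-C association emerges only at longer predictive horizons, which is one way a spectrum of timescales supports temporal abstraction.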

Schedule

Time         | Agenda                   | Details
1:00 - 1:15  | Introduction             | A short introduction to the topic and the workshop structure by the organisers.
1:15 - 3:00  | Keynote speakers         | Three keynote presentations, each lasting approximately 35 minutes including questions.
3:00 - 3:15  | Break / Group allocation | During a coffee break, participants will be divided into moderated groups, aiming to maintain diversity in levels of seniority and fields of expertise.
3:15 - 4:20  | Group discussions        | Groups will be given a list of open problems in the field to discuss and propose solutions to.
4:25 - 4:55  | Panel with questions     | Each group will nominate a representative to present the outcome of the discussion and defend the group’s position to the rest of the workshop participants, including online participants.
4:55 - 5:00  | Closing remarks          | The results of the panel discussion will be summarised in the closing remarks.
5:00         | Social event             | Participants who are physically present are encouraged to attend a social event organised by the workshop to continue the discussion.

RLDM requires speakers and active participants to be physically present at the workshop. However, we plan to stream the whole programme, and we encourage online participants to submit questions, which we will try to convey during the group and panel discussions. Finally, the most important outcomes of TRiRL will be formally described in opinion papers organised by the group moderators and written by willing participants after the workshop.

Organisers

Noor Sajid
UCL & Huawei

Warrick Roseboom
University of Sussex

Panagiotis Tigas
University of Oxford

Mailing list

Sign up to receive the latest updates on the event (programme announcement and livestream).

References

  1. Deverett, B., Faulkner, R., Fortunato, M., Wayne, G. & Leibo, J. Z. (2019), ‘Interval timing in deep reinforcement learning agents’, Advances in Neural Information Processing Systems 32.
  2. Fountas, Z., Sylaidi, A., Nikiforou, K., Seth, A. K., Shanahan, M. & Roseboom, W. (2022), ‘A predictive processing model of episodic memory and time perception’, Neural Computation (in press).
  3. Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H. & Davidson, J. (2019), ‘Learning latent dynamics for planning from pixels’, in International Conference on Machine Learning, PMLR, pp. 2555–2565.
  4. Jazayeri, M. & Shadlen, M. N. (2010), ‘Temporal context calibrates interval timing’, Nature Neuroscience 13(8), 1020–1026.
  5. Roseboom, W., Fountas, Z., Nikiforou, K., Bhowmik, D., Shanahan, M. & Seth, A. K. (2019), ‘Activity in perceptual classification networks as a basis for human subjective time perception’, Nature Communications 10(1), 1–9.