
PhD Thesis Proposal: Gregory Hyde

Thursday, July 25, 9:00am–11:00am ET

Rm 201, MacLean ESC (Rett’s Room)/Online

Optional ZOOM LINK
Meeting ID: 976 5521 2279
Passcode: 712400

"The Role of Non-Markov Rewards in Learning Representation"

Abstract

In this work, we challenge the misconception in Reinforcement Learning (RL) that the reward function ought to be defined after the state space representation has been chosen. Instead, we view the state as a representation informed by reward. We introduce representational learning algorithms that address scenarios in which reward observations appear non-Markov, yet are revealed to be Markov upon identifying hidden aspects of the environment. The state representation, $S$, is informed by identifying \emph{hidden triggers} over histories of observations, actions, and rewards, $H_R = \{ \Omega \times A \times \mathbb{R}\}^*$, where rewards explicitly serve as memory. These \emph{hidden triggers} necessitate a shift in the agent's perspective, driven by the need to render reward observations Markov. We model the search for \emph{hidden triggers} as an Integer Linear Program (ILP), which we solve by leveraging powerful discrete optimizers.
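
To make the idea concrete, the following is a minimal toy sketch of how such a search might be posed as an ILP; it is an illustration, not the formulation used in the thesis. It assumes the PuLP package, and the histories, rewards, and state bound `K_MAX` are made-up placeholders: each history is assigned to one of a small number of latent states so that reward becomes Markov in the latent state, using as few states as possible.

```python
# Toy sketch (assumption, not the thesis formulation): find the smallest
# latent-state assignment under which the observed rewards become Markov.
import pulp

# Toy data: histories and the reward that followed each one. Histories that
# look alike in raw observations can still yield different rewards, so reward
# appears non-Markov until a hidden distinction is introduced.
histories = ["a", "b", "c", "d"]
reward = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}
K_MAX = len(histories)           # upper bound on the number of latent states
K = range(K_MAX)

prob = pulp.LpProblem("hidden_trigger_search", pulp.LpMinimize)

# x[h][k] = 1 iff history h is mapped to latent state k
x = pulp.LpVariable.dicts("x", (histories, K), cat="Binary")
# y[k] = 1 iff latent state k is used at all
y = pulp.LpVariable.dicts("y", K, cat="Binary")

# Objective: use as few latent states as possible
prob += pulp.lpSum(y[k] for k in K)

for h in histories:
    # every history belongs to exactly one latent state
    prob += pulp.lpSum(x[h][k] for k in K) == 1
    for k in K:
        prob += x[h][k] <= y[k]  # assigning to state k marks it as used

# Markov-reward consistency: histories with different observed rewards
# may not share a latent state
for i, h1 in enumerate(histories):
    for h2 in histories[i + 1:]:
        if reward[h1] != reward[h2]:
            for k in K:
                prob += x[h1][k] + x[h2][k] <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = {h: next(k for k in K if pulp.value(x[h][k]) > 0.5) for h in histories}
print(assignment)  # e.g. {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```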

We implement representational learning algorithms over weakly revealing observations and rewards in both RL and Inverse RL (IRL) scenarios. In our RL scenarios, we train agents in an online learning fashion, where the necessary representation must be acquired dynamically in order to build Markov reward-prediction models. We empirically validate agent performance in these scenarios and demonstrate the effectiveness of learning reward-informed representations in instances of reward dependency. In our IRL scenarios, we relax the assumption that inferred reward functions must be Markov with respect to the observation space. Instead, we learn latent representations of the decision-maker in tandem with the reward function that motivates their behavior. We empirically validate our approach by inferring the hidden ground-truth representational knowledge that decision-makers leverage when exhibiting behavior.

We propose future efforts motivated by our preliminary findings on incorporating rewards into representational learning. Specifically, we propose to investigate Linear Program (LP) relaxations of our ILP formulation to improve scalability, and to explore how these representations might be leveraged for efficient training in RL scenarios. We also propose to model $S$ hierarchically, in order to capture long-range dependencies, identify multi-scale relationships, and promote efficient reasoning.
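
As a generic illustration of the relaxation idea (the standard technique, not the specific formulation proposed above): an LP relaxation of a 0-1 ILP replaces the integrality constraints with box constraints,
$$\min_{x}\; c^\top x \ \text{ s.t. } Ax \le b,\ x \in \{0,1\}^n \quad\longrightarrow\quad \min_{x}\; c^\top x \ \text{ s.t. } Ax \le b,\ 0 \le x \le 1.$$
The relaxed problem is solvable in polynomial time, its optimum lower-bounds the ILP optimum for minimization, and fractional solutions can be rounded or used to guide branch-and-bound search.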

Thesis Committee

  • Eugene Santos Jr. (Chair)
  • George Cybenko
  • Vikrant Vaze
  • Prithviraj Dasgupta (Naval Research Laboratory, Washington DC)

Contact

For more information, contact Thayer Registrar at thayer.registrar@dartmouth.edu.