Hire an Illini

Tanmay Gangwani

  • Advisor:
      • Jian Peng
  • Departments:
  • Areas of Expertise:
      • Deep Learning
      • Reinforcement Learning
  • Thesis Title:
      • Methods for Efficient Imitation and Exploration in Reinforcement Learning Algorithms
  • Thesis abstract:
      • Reinforcement Learning (RL) agents are tasked with maximizing long-term environmental reward. Since the reward function is the sole source of external supervision, its characteristics critically impact the performance of RL algorithms. If rewards are only sparsely available, strong exploration is needed to discover useful learning signals and avoid local optima during policy search. If rewards are delayed within an episode, temporal credit assignment becomes a challenge due to large bias or high variance in the RL gradients. If rewards are misspecified, the resulting agents may exhibit unintended behaviors. This thesis proposes algorithms that accommodate these complexities via techniques such as population-based training (inspired by Neuroevolution), exploration using an ensemble of diverse interacting agents, and better credit assignment by capitalizing on past trajectories (self-imitation). In contrast to RL, imitation learning (IL) replaces reward-function design with the collection of expert demonstrations of the task, which is arguably easier in many environments. To accelerate the adoption of IL algorithms, we present approaches that work under weaker assumptions than usual, including learning in partially observable environments and tolerance to discrepancies between the transition dynamics of the expert and the imitator.
  • Downloads:

    Contact information:
    gangwan2@illinois.edu