Jens Tuyls will present his FPO "From Expert Imitation to Autonomous Discovery in Intelligent Agents" on Tuesday, April 21, 2026 at 2 PM in CS 105.
The members of Jens’ committee are as follows:
Examiners: Karthik Narasimhan (Adviser), Benjamin Eysenbach, Tom Silver
Readers: Danqi Chen, Chi Jin
A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu to request one.
Everyone is invited to attend his talk.
Abstract:
Learning from demonstrations and learning from experience are two learning paradigms that lie at the heart of today’s most capable AI systems. This thesis focuses on questions related to both of these paradigms: (1) how to recover expert behavior from demonstration data, (2) how to explore an environment without access to demonstration data, and (3) how to leverage expert knowledge to guide exploration.
Chapter 1 focuses on learning from expert demonstrations and asks the following question: What role do model and data size play in recovering the expert’s performance when training policies with behavioral cloning? We demonstrate that the same empirical scaling laws used in language modeling accurately describe the loss and return functions in behavioral cloning on single-agent games, allowing practitioners to predict the compute-optimal model and data size to reach a certain level of performance.
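To make the compute-optimal prediction concrete, here is a minimal sketch of allocating a fixed compute budget between model size and data size under a Chinchilla-style power law. The functional form, the coefficients, and the C ≈ 6·N·D compute approximation are illustrative assumptions, not the fitted values from the thesis:

```python
# Illustrative power-law loss (all coefficients are made-up placeholders):
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N is model size (parameters) and D is dataset size (samples).
E, A, B, ALPHA, BETA = 1.7, 400.0, 4000.0, 0.34, 0.28

def loss(n_params, n_samples):
    """Predicted loss for a given model size and data size."""
    return E + A / n_params**ALPHA + B / n_samples**BETA

def compute_optimal(compute, grid=400):
    """Grid-search the (N, D) split that minimizes predicted loss
    at a fixed training-compute budget C ~ 6 * N * D."""
    best = None
    for i in range(1, grid):
        n = (compute / 6.0) ** (i / grid)   # sweep N in log space
        d = compute / (6.0 * n)             # D is then fixed by the budget
        candidate = (loss(n, d), n, d)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best  # (predicted loss, model size, data size)
```

Under any such law, a larger budget yields a lower achievable loss, and the search returns the model/data split that reaches it, which is the kind of prediction the fitted scaling laws enable in practice.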
Chapters 2 and 3 introduce two novel reinforcement learning-based methods for exploring an environment without access to expert demonstrations. The first method uses explicitly disentangled exploration and exploitation phases combined with curiosity-based intrinsic rewards to explore the state space, setting a new state-of-the-art in challenging text game environments. The second method learns a large set of policies (or “skills”) through mutual information skill learning that, taken together, cover the state space of the MDP. In continuous control tasks, the method is competitive with current state-of-the-art skill-learning methods.
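For readers unfamiliar with the second approach, mutual-information skill learning typically maximizes the mutual information between a skill variable and the states it visits. A generic DIAYN-style variational bound (a standard formulation, not necessarily the exact objective in the thesis) is:

```latex
I(S;Z) \;=\; \mathcal{H}(Z) - \mathcal{H}(Z \mid S)
\;\ge\; \mathbb{E}_{z \sim p(z),\; s \sim \pi_z}\!\left[\, \log q_\phi(z \mid s) - \log p(z) \,\right],
```

so each skill $z$ can be trained with the intrinsic reward $r(s, z) = \log q_\phi(z \mid s) - \log p(z)$, where $q_\phi$ is a learned discriminator that infers the skill from the visited state; maximizing this reward pushes skills to occupy distinguishable regions of the state space.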
Finally, Chapter 4 studies the interplay between learning from expert data and autonomous exploration by asking the following question: Can the knowledge in language model representations, which is mostly obtained from expert data, help drive exploration toward learning novel behaviors? Our empirical results suggest that deliberate exploration with reinforcement learning on top of pre-trained language models can be a promising path toward learning novel behaviors beyond those of human experts.
Taken together, this thesis provides foundational empirical insight for learning from expert data as well as novel empirical methods for learning from experience through exploration.
Date and Time: Tuesday, April 21, 2026, 2:00pm - 4:00pm
Location: Computer Science Small Auditorium (Room 105)