02-24
The Value Alignment Problem in Artificial Intelligence

Much of our success in artificial intelligence stems from the adoption of a simple paradigm: specify an objective or goal, and then use optimization algorithms to identify a behavior (or predictor) that optimally achieves this goal. This has been true since the early days of AI (e.g., search algorithms such as A* that aim to find the optimal path to a goal state), and this paradigm is common to AI, statistics, control theory, operations research, and economics. Loosely speaking, the field has evaluated the intelligence of an AI system by how efficiently and effectively it optimizes for its objective. This talk will provide an overview of my thesis work, which proposes and explores the consequences of a simple, but consequential, shift in perspective: we should measure the intelligence of an AI system by its ability to optimize for our objectives.

In an ideal world, these measurements would be the same -- all we have to do is write down the correct objective! This is easier said than done: misalignment between the behavior a system designer actually wants and the behavior incentivized by the reward or loss functions they specify is routine, it is commonly observed in a wide variety of practical applications, and fundamental, as a consequence of limited human cognitive capacity. This talk will build up a formal model of this value alignment problem as a cooperative human-robot interaction: an assistance game of partial information between a human principal and an autonomous agent. It will begin with a discussion of a simple instantiation of this game where the human designer takes one action, write down a proxy objective, and the robot attempts to optimize for the true objective by treating the observed proxy as evidence about the intended goal. Next, I will generalize this model to introduce Cooperative Inverse Reinforcement Learning, a general and formal model of this assistance game, and discuss the design of efficient algorithms to solve it. The talk will conclude with a discussion of directions for further research including applications to content recommendation and home robotics, the development of reliable and robust design environments for AI objectives, and the theoretical study of AI regulation by society as a value alignment problem with multiple human principals.

Bio: Dylan is a final year Ph.D. student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. His goal is to design algorithms that learn about and pursue the intended goal of their users, designers, and society in general. His recent work has focused on algorithms for human-robot interaction with unknown preferences and reliability engineering for learning systems.

*Please note, this event is only open to the Princeton University community.

Lunch for talk attendees will be available at 12:00pm.
To request accommodations for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, 609-258-4624 at least one week prior to the event.

Date and Time

Monday February 24, 2020 12:30pm - 1:30pm

Location

Computer Science Small Auditorium (Room 105)

Event Type

CS Department Colloquium Series

Speaker

Dylan Hadfield-Menell, from University of California, Berkeley

Host

Tom Griffiths

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.

CS Talks Mailing List

02-24 The Value Alignment Problem in Artificial Intelligence

02-24
The Value Alignment Problem in Artificial Intelligence