05-20
Jimmy Wu FPO

Jimmy Wu will present his FPO, "Spatial Representations for Learning Robotic Mobile Manipulation," on Tuesday, May 20, 2025 at 1:30 PM in CS 401 and on Zoom.

Zoom link: https://princeton.zoom.us/j/96975727674

The members of Jimmy’s committee are as follows:
Examiners: Szymon Rusinkiewicz (Adviser), Thomas Funkhouser (Adviser), Anirudha Majumdar
Readers: Jia Deng, Jeannette Bohg (Stanford University)

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like one.

Everyone is invited to attend his talk. 

Abstract follows below:
The development of intelligent mobile manipulators has long been a goal of robotics research, promising transformative applications. However, enabling robots to operate outside of controlled laboratory settings remains a formidable challenge. This dissertation explores the role of spatial representations in learning-based systems for mobile manipulation. We develop a sequence of spatially grounded frameworks that progressively increase robot capabilities. First, we introduce spatial action maps, an action representation that is spatially aligned with the robot’s state representation, allowing a policy to predict actions on a 2D map of possible target locations. This approach simplifies the perception-to-control mapping, yielding more sample-efficient reinforcement learning and improved task performance. We then extend this idea to multi-robot systems with spatial intention maps, in which the intentions of other robots are spatially encoded in the state representation. This enables implicit coordination between robots, leading to improved multi-agent task success and emergent cooperative strategies such as collision avoidance and task specialization. Next, we tackle dynamic manipulation tasks with multi-frequency spatial action maps, which operate on multiple temporal scales, allowing a single policy to exhibit both high-level deliberative planning and low-level reactive behaviors. Finally, we present TidyBot, a mobile manipulation system for open-world household cleanup that leverages foundation models in vision and language. We show that spatial grounding allows the robot to tap into the broad semantic understanding embedded in foundation models, enabling generalization to novel objects and scenarios. Together, these contributions demonstrate that spatially grounded learning is a powerful paradigm for robotic mobile manipulation, bringing us closer to practical, intelligent robotic systems capable of operating in the real world.

Date and Time
Tuesday, May 20, 2025, 1:30pm - 3:30pm
Location
Computer Science 401

Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers, or views presented.
