Abstract:
As robotics progresses, the need for more adaptable and generalizable learning approaches becomes critical. In this talk, we trace the trajectory of robotic learning methodologies, from data generation and imitation learning to the integration of multimodal systems. We begin by examining approaches that leverage simulation to generate training data, in which robots imitate optimal behaviors to acquire skills that generalize effectively to real-world environments. These methods lay the groundwork for transformer-based policies, which demonstrate significant improvements in robotic navigation and decision-making tasks. To address the inherent limitations of isolated learning paradigms, we then explore the synergy between imitation learning and reinforcement learning. This hybrid approach enables the fine-tuning of learned behaviors, allowing adaptation to new tasks and environments where sparse rewards challenge traditional reinforcement learning methods. Moreover, we consider the challenge of embodiment, presenting techniques that allow a single policy to generalize across multiple robot configurations, pushing the boundaries of what unified policies can achieve. We conclude by addressing the limitations of templated task descriptions in robotics and the role of vision-language models (VLMs) in advancing toward true open-world understanding. By training and leveraging VLMs designed with robotics tasks in mind, we enable robots to interpret and act upon flexible, open-ended instructions, moving beyond constrained environments and objects toward real-world applications involving novel objects and tasks. Together, these contributions represent a step toward robots capable of more intuitive and versatile interaction in dynamic, unstructured environments.
Biography:
Kiana Ehsani is a Senior Research Scientist at the Allen Institute for AI (PRIOR), specializing in Embodied AI and robotics. She earned her PhD in Computer Science from the University of Washington under the guidance of Ali Farhadi. Kiana’s research spans computer vision, machine learning, and AI, with a particular focus on enabling robots to perceive and manipulate their environments using visual and multimodal inputs. Her contributions have been presented at leading conferences such as CVPR, NeurIPS, CoRL, and ICRA. Her ongoing work is dedicated to pushing the boundaries of robotic manipulation and multimodal AI.