AI Meets Autonomy: Vision, Language, and Autonomous Systems
An IROS 2024 Workshop

The workshop aims to create value for the AI and robotics community by sharing knowledge and resources to realize embodied AI, with an emphasis on the intersection of vision, language, and autonomous systems. With the fast-growing interest in embodied AI, an increasing number of researchers have argued that embodiment is essential to developing robust and generalizable computer vision algorithms, because it empowers algorithms not only to observe but also to interact with the real world. Recent conferences have hosted embodied AI workshops and tutorials with increasing frequency, such as the "CVPR 2022 Tutorial on Building and Working in Environments for Embodied AI" and the "CVPR 2023 Workshops on Synthetic Data for Autonomous Systems, 3D Vision and Robotics, embodied AI". This workshop is not aimed at "reinventing the wheel" but at filling the gap left by the existing ones: bringing everyone's work a step closer to real-world deployment.

As of today, the community still lacks the knowledge and resources to build an agent that is robust and reliable in real-world tasks. Researchers rely heavily on datasets and simulations. Although the latest simulators are becoming increasingly photorealistic, they fall short of capturing the complexity of real-world physical dynamics and interactions, as well as the noise and errors of real sensors, all of which contribute to the sim-to-real gap.

The workshop aims to provide the knowledge and resources for the community to realize embodied AI, moving the study from datasets and simulations to real-world deployments. To this end, we plan to review materials on several aspects, including robot autonomy stacks, simulation and real-world tools, vision-language-action datasets and models, and embodied intelligence. We expect to establish links among these materials and build bridges for AI and robotics researchers to connect their work with real-world systems. At the end of the workshop, we will hold an open discussion on bridging the gap and deploying AI algorithms in the real world.

We plan to showcase the results of the CMU Vision-Language-Autonomy Challenge at the workshop. The workshop will leverage the challenge systems and provide an example autonomous navigation system to the audience. Specifically, we will use a real robot equipped with a 3D LiDAR and a 360° camera. The system has base autonomy onboard for vehicle state estimation, collision avoidance, terrain traversability analysis, and path following. Users can send waypoints to guide the autonomous navigation. In addition, we provide two simulation setups running the same autonomy stack, which allows users to develop AI algorithms in simulation and seamlessly migrate them to the real-world system. All of our systems are publicly available so that users can reproduce them to support their AI algorithm development. The robustness of the system has been validated: the underlying work won both the Best Paper Award and the Best System Paper Award at RSS 2021 as well as the Best Student Paper Award at IROS 2022, and helped the CMU-OSU Team win a "Most Sectors Explored Award" in the DARPA Subterranean Challenge.
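To give a feel for how AI algorithms can drive the base autonomy, below is a minimal sketch of sending a single waypoint over ROS. The topic name /way_point, the geometry_msgs/PointStamped message type, and the "map" frame are illustrative assumptions, not the confirmed challenge interface; please refer to the released challenge systems for the actual API.

```python
# Minimal sketch: sending one waypoint to the base autonomy stack over ROS.
# Assumptions (hypothetical, for illustration only): waypoints are published
# as geometry_msgs/PointStamped messages on a topic named /way_point in the
# "map" frame; the onboard autonomy then handles collision avoidance, terrain
# traversability analysis, and path following toward that point.
import rospy
from geometry_msgs.msg import PointStamped

def send_waypoint(x, y, z=0.0):
    """Publish a single waypoint for the autonomy stack to follow."""
    pub = rospy.Publisher('/way_point', PointStamped, queue_size=1)
    rospy.sleep(1.0)  # allow the publisher to connect before sending

    waypoint = PointStamped()
    waypoint.header.stamp = rospy.Time.now()
    waypoint.header.frame_id = 'map'  # assumed world frame
    waypoint.point.x = x
    waypoint.point.y = y
    waypoint.point.z = z
    pub.publish(waypoint)

if __name__ == '__main__':
    rospy.init_node('waypoint_sender')
    send_waypoint(5.0, 2.0)  # guide the robot toward (5 m, 2 m) in the map frame
```

Because the same autonomy stack runs in both simulation setups and on the real robot, a higher-level algorithm developed against this kind of waypoint interface in simulation should carry over to the real-world system with little or no change.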

Content

The workshop will cover the following units:

Robot autonomy stacks
Simulation and real-world tools
Vision-language-action datasets and models
Embodied intelligence
Open discussion on real-world deployment

Organizers and Speakers

Wenshan Wang
CMU Robotics Institute

Ji Zhang
CMU NREC & Robotics Institute

Shibo Zhao
CMU Robotics Institute

Haochen Zhang
CMU Robotics Institute

Abhinav Gupta
CMU Robotics Institute

Deva Ramanan
CMU Robotics Institute

Katerina Fragkiadaki
CMU Machine Learning Department

Xiaodan Liang
Sun Yat-sen University & MBZUAI

Andy Zeng
Google DeepMind

Dhruv Batra
Georgia Tech & Meta FAIR

Ayoung Kim
Seoul National University

Carlos Nieto-Granda
Army Research Laboratory