AI Meets Autonomy: Vision, Language, and Autonomous Systems
An IROS 2024 Workshop

With the rising popularity of Large Language Models (LLMs), Vision-Language Models (VLMs), and other general foundation models, this workshop aims to explore their synergy with robotics. We would like to discuss how recent advances from the AI and CV communities could benefit robotics research: incorporating LLMs and VLMs into robotic systems could make those systems more explainable, instructable, and generalizable. However, seamlessly integrating the two remains an open problem. Existing LLMs and VLMs fall short on the real-world knowledge that roboticists care about, such as physical properties, spatial relations, and temporal causal relations. Robotic systems may help overcome these issues: real-world experiences that contain rich sensations and physical feedback can potentially bring AI to the next level.

The workshop aims to create an inclusive and collaborative platform for professionals, researchers, and enthusiasts alike to exchange ideas, share experiences, and foster meaningful connections within the AI and robotics community. It will feature a mix of presentations, open panel discussions, and a session on our vision-language-autonomy competition. Four invited speakers will discuss their research, thoughts, and experiences at the intersection of AI and autonomous systems, covering public datasets, benchmarks, software stacks, vision-language navigation, spatial reasoning, robot foundation models, and more.


There are a variety of ongoing efforts to improve the state of the art in robot autonomy with the help of recent advances in AI, including new developments in areas such as human-robot interaction, vision-language navigation, multi-task planning, and knowledge acquisition and reasoning. This workshop will provide a common meeting point for the different researchers and stakeholders working on these problems to communicate about how their philosophies are shaping current solutions. In doing so, it will not only provide new discussion points and insights for established researchers but also motivate less-experienced researchers to get more involved in the field. This, in turn, will benefit the numerous applications that AI-powered autonomy can enable, such as healthcare and rehabilitation, human-robot collaboration, and industrial automation.

Although integrating AI with autonomous systems has many advantages, it also raises challenges. First, language and visual foundation models have been shown to be insufficient at understanding physics, spatial relationships, and causal relationships, all of which are critical to most robotic tasks. Second, deploying such large models on robotic systems demands substantial GPU resources, especially when processing must run in real time. Further, most existing work combining LLMs/VLMs and robotics uses them solely as a tool to translate human instructions into executable plans. In reality, humans can provide more than instructions, such as explanations of the scene and corrections of execution errors, in richer contexts such as multi-turn dialogues. The speakers and organizers of this workshop are leading researchers from diverse backgrounds. The talks and discussions will not only give us a concrete picture of the current research landscape but also promote new insights and directions.

Another key barrier to entry for AI and robotics research is the resources required for acquiring data and testing algorithms. Robust robot platforms with rich sensing modalities are needed to collect data and test algorithms. To lower this barrier, we plan to host the CMU Vision-Language-Autonomy Challenge over the span of a few months before IROS 2024 and present the first stage of results at IROS. We will release a large vision-language navigation dataset together with a navigation autonomy system. Specifically, the challenge will use a robot with a 3D LiDAR and a 360° camera. The system has base autonomy onboard (e.g., state estimation, terrain traversability analysis, collision avoidance, waypoint following) to support the high-level AI tasks. In addition, we provide two simulation setups running the same autonomy system. This allows users to develop and integrate AI algorithms in simulation and seamlessly migrate them to real-world systems for deployment. All of our systems are made public so that users can reproduce them to support their AI algorithm development. The robustness of our system has been validated: the work won both the Best Paper Award and the Best System Paper Award at RSS 2021 and the Best Student Paper Award at IROS 2022, and helped the CMU-OSU team win a "Most Sectors Explored Award" at the DARPA Subterranean Challenge.
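The division of labor described above, where the base autonomy handles state estimation, traversability, collision avoidance, and waypoint following so that high-level AI code only has to emit waypoints, can be sketched in miniature. This is an illustrative sketch only: the class and function names (`BaseAutonomy`, `run_mission`, etc.) are hypothetical stand-ins, not the actual challenge API.

```python
# Hypothetical sketch of the layered architecture: high-level AI emits
# waypoints; a base-autonomy layer follows them. All names are illustrative.
from dataclasses import dataclass
import math


@dataclass
class Pose:
    x: float
    y: float


class BaseAutonomy:
    """Stand-in for the onboard autonomy stack: follows one waypoint at a time.

    A real stack would also run state estimation, terrain traversability
    analysis, and collision avoidance underneath this interface.
    """

    def __init__(self, start: Pose, step: float = 0.5):
        self.pose = start
        self.step = step  # max distance moved per control cycle

    def follow(self, waypoint: Pose, tol: float = 0.1) -> Pose:
        # Step straight toward the waypoint until within tolerance.
        while math.dist((self.pose.x, self.pose.y), (waypoint.x, waypoint.y)) > tol:
            dx = waypoint.x - self.pose.x
            dy = waypoint.y - self.pose.y
            d = math.hypot(dx, dy)
            s = min(self.step, d)
            self.pose = Pose(self.pose.x + s * dx / d, self.pose.y + s * dy / d)
        return self.pose


def run_mission(autonomy: BaseAutonomy, waypoints: list[Pose]) -> Pose:
    """High-level AI layer: hand waypoints to the base autonomy one by one."""
    for wp in waypoints:
        autonomy.follow(wp)
    return autonomy.pose
```

Because the simulation setups run the same autonomy system as the real robot, code written against such a waypoint interface in simulation can, in principle, migrate to the physical platform unchanged.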

We have invited four speakers to present their work in these areas. We will also invite the challenge teams to present their work in poster sessions and spotlight talks. At the end of the workshop, the speakers will come together for a panel session where attendees can pose questions and engage in discussion.

Topics of interest and open questions include but are not limited to the following:

These topics span a diverse range across the field of robotics while also intersecting with many other disciplines. This workshop will significantly expand the diversity of content at IROS by introducing an interdisciplinary perspective on cutting-edge technologies. The integration of robotics, language, and vision will demonstrate how this fusion of technologies can enhance robotics research in a variety of real-world application areas.

Tentative Program

Organizers and Speakers

Wenshan Wang
CMU Robotics Institute

Ji Zhang
CMU NREC & Robotics Institute

Haochen Zhang
CMU Robotics Institute

Shibo Zhao
CMU Robotics Institute

Abhinav Gupta
CMU Robotics Institute

Deva Ramanan
CMU Robotics Institute

Katerina Fragkiadaki
CMU Machine Learning Department

Xiaodan Liang
Sun Yat-sen University & MBZUAI

Andy Zeng
Google DeepMind

Dhruv Batra
Georgia Tech & Meta FAIR

Siyuan Huang
Beijing Institute for General Artificial Intelligence

Ayoung Kim
Seoul National University

Carlos Nieto-Granda
Army Research Laboratory