AI Meets Autonomy: Vision, Language, and Autonomous Systems
An IROS 2024 Workshop

With the rising popularity of Large Language Models (LLMs), Vision-Language Models (VLMs), and other general foundation models, this workshop aims to explore the synergy between them and robotics. We would like to discuss how recent advances from the AI and CV communities could benefit robotics research: incorporating LLMs and VLMs into robotic systems could make these systems more explainable, instructable, and generalizable. However, seamlessly integrating the two remains an open problem. Existing LLMs and VLMs lack the real-world knowledge that roboticists must consider, such as physical properties, spatial relations, and temporal and causal relations in an environment. These gaps may be alleviated by integration with robotic systems, where real-world experiences containing rich sensory input and physical feedback can potentially bring AI to the next level.

The workshop aims to create an inclusive and collaborative platform for professionals, researchers, and enthusiasts alike to exchange ideas, share experiences, and foster meaningful connections within the AI and robotics community. It will feature a mix of presentations, open panel discussions, and results and demos from our Vision-Language-Autonomy competition challenge. Four invited speakers will discuss their research, thoughts, and experiences across the intersection of AI and autonomous systems, covering topics such as available datasets, benchmarks, software stacks, vision-language navigation, spatial reasoning, robot foundation models, and more.

Content

There are a variety of ongoing efforts to improve the state of the art in robot autonomy with the help of recent advances in AI, including new developments in areas such as human-robot interaction, vision-language navigation, multi-task planning, and knowledge acquisition and reasoning. This workshop will provide a common meeting point for researchers and stakeholders working on the same problems to communicate about how their philosophies are shaping current solutions. In doing so, it will not only provide new discussion points and insights for established researchers but also motivate less-experienced researchers to get more involved in the field. This, in turn, will benefit the many applications that AI-powered autonomy can enable, such as healthcare and rehabilitation, human-robot collaboration, and industrial automation.

Although integrating AI and autonomy systems offers multiple advantages, it also raises issues and challenges. First, language and visual foundation models have been shown to be insufficient at understanding actual physics, spatial relationships, and causal relationships, all of which are critical to most robotic tasks. Second, deploying these large models on robotic systems demands substantial GPU resources, especially when processing must run in real time. Further, most existing work combining LLMs/VLMs and robotics uses them solely as a tool to translate human instructions into executable plans. In reality, humans can provide more than instructions, such as explanations of the scene and corrections of execution errors, in richer contexts such as multi-turn dialogues. The speakers and organizers of this workshop are leading researchers from diverse backgrounds, and the talks and discussions will not only give a concrete picture of the current research landscape but also promote new insights and directions.
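The "instruction-to-plan" usage pattern critiqued above can be illustrated with a minimal sketch. Everything here is a hypothetical placeholder rather than part of any system discussed at the workshop: query_llm stands in for whichever chat-completion API is available, and the primitive names and JSON plan format are illustrative only.

import json

# Illustrative robot primitives the LLM is allowed to compose into a plan.
PRIMITIVES = ["go_to(location)", "pick(object)", "place(object, location)"]

PROMPT_TEMPLATE = (
    "You control a mobile robot. Available primitives: {prims}.\n"
    "Translate the instruction into a JSON list of primitive calls,\n"
    'e.g. [{{"primitive": "go_to", "args": ["kitchen"]}}].\n'
    "Instruction: {instruction}\nPlan:"
)

def query_llm(prompt: str) -> str:
    """Placeholder: call whichever LLM API is available and return its text reply."""
    raise NotImplementedError

def instruction_to_plan(instruction: str) -> list:
    """Turn a natural-language instruction into a list of primitive calls."""
    prompt = PROMPT_TEMPLATE.format(prims=", ".join(PRIMITIVES),
                                    instruction=instruction)
    return json.loads(query_llm(prompt))

The point of the critique is that this one-shot, one-way translation discards the richer interaction humans can offer, such as scene explanations or mid-execution corrections over multiple dialogue turns.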

Another key barrier to entry for AI and robotics research is the resources required to acquire data and test algorithms: robust robot platforms with rich sensing modalities are needed for both. To lower this barrier, we plan to host the CMU Vision-Language-Autonomy Challenge over the span of a few months before IROS 2024 and present the first stage of results at the workshop. We will release a novel vision-language grounding dataset together with an award-winning navigation autonomy system. Specifically, the challenge will use a robot with a 3D LiDAR and a 360° camera. The system has base autonomy onboard (e.g., state estimation, terrain traversability analysis, collision avoidance, and waypoint following) to support the high-level AI tasks. Users can conduct development and integration in the provided simulation environments and eventually deploy on the real robot, which will be demoed at the workshop. Further, all of our autonomy stacks have been made public so that users can reproduce the systems to support their CV and AI algorithm development. The challenge will be open to anyone interested in the field, and we will summarize the results and invite the winning teams to present their work at the workshop. More information can be found on the challenge website.
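As a rough illustration of how a high-level AI module might drive the base autonomy, the sketch below publishes a single waypoint for a waypoint-following module to track. It assumes a ROS 1 setup and a waypoint topic named /way_point carrying geometry_msgs/PointStamped messages; the actual topic names, coordinate frames, and message types should be taken from the released autonomy stack.

import rospy
from geometry_msgs.msg import PointStamped

def send_waypoint(x: float, y: float, z: float = 0.0) -> None:
    """Publish one waypoint for the base autonomy's waypoint follower to track."""
    # Topic name and message type are assumptions; verify against the stack.
    pub = rospy.Publisher("/way_point", PointStamped, queue_size=1)
    rospy.sleep(1.0)  # give the publisher time to connect to subscribers
    msg = PointStamped()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = "map"
    msg.point.x, msg.point.y, msg.point.z = x, y, z
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("waypoint_sender")
    send_waypoint(5.0, 2.0)

The same interface can be exercised first in the provided simulation environments and later against the real robot, since the base autonomy handles state estimation, traversability analysis, and collision avoidance underneath.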

As a result, the workshop will feature not only talks by four invited leading researchers on their relevant work, but also presentations by the top challenge teams in the form of poster sessions and spotlight talks. The workshop will conclude with the speakers coming together for a panel session where attendees can ask them questions and engage in discussion.

Topics of interest span a diverse range across robotics and related disciplines and include, but are not limited to, the following:

Tentative Program

Organizers and Speakers

Wenshan Wang
CMU Robotics Institute

Ji Zhang
CMU NREC & Robotics Institute

Haochen Zhang
CMU Robotics Institute

Shibo Zhao
CMU Robotics Institute

Abhinav Gupta
CMU Robotics Institute

Deva Ramanan
CMU Robotics Institute

Katerina Fragkiadaki
CMU Machine Learning Department

Xiaodan Liang
Sun Yat-sen University & MBZUAI

Andy Zeng
Google DeepMind

Dhruv Batra
Georgia Tech & Meta FAIR

Siyuan Huang
Beijing Institute for General Artificial Intelligence

Ayoung Kim
Seoul National University

Carlos Nieto-Granda
Army Research Laboratory