AI Meets Autonomy: Vision, Language, and Autonomous Systems
An IROS 2024 Workshop
Event Details
This workshop will be held at the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024) in Abu Dhabi, UAE, on the morning of Oct. 14, 2024, from 9:00 AM to 1:00 PM GST, in Room 17.
With the rising popularity of Large Language Models (LLMs), Vision-Language Models (VLMs), and other general foundation models, this workshop aims to explore the synergy between them and robotics. We would like to discuss how recent advances from the AI and CV communities could benefit robotics research: incorporating LLMs and VLMs into robotic systems could make those systems more explainable, instructable, and generalizable, yet seamlessly integrating the two remains an open problem. Existing LLMs and VLMs lack the real-world knowledge that roboticists must consider, such as the physical properties, spatial relations, and temporal-causal relations in an environment. These gaps may be alleviated by integration with robotic systems, where real-world experience containing rich sensory input and physical feedback can potentially bring AI to the next level.
The workshop aims to create an inclusive and collaborative platform for professionals, researchers, and enthusiasts alike to exchange ideas, share experiences, and foster meaningful connections within the AI and robotics community. It will feature a mix of presentations, open panel discussions, and results and demos from our Vision-Language-Autonomy Challenge. Five invited speakers will discuss their research, thoughts, and experiences on various aspects of the intersection of AI and autonomous systems, covering topics such as available datasets, benchmarks, software stacks, vision-language navigation, spatial reasoning, robot foundation models, and more.
Content
There are a variety of ongoing efforts to improve the state of the art in robot autonomy with the help of recent advances in AI, including new developments in areas such as human-robot interaction, vision-language navigation, multi-task planning, and knowledge acquisition and reasoning. This workshop will provide a common meeting point where researchers and stakeholders working on the same problems can discuss how their philosophies are shaping current solutions. In doing so, it will not only offer new discussion points and insights for established researchers but also motivate less-experienced researchers to get more involved in the field. This, in turn, will benefit the numerous applications that AI-powered autonomy can enable, such as healthcare and rehabilitation, human-robot collaboration, and industrial automation.
Although integrating AI and autonomous systems has multiple advantages, it also poses challenges. First, language and visual foundation models have been shown to be insufficient at understanding actual physics, spatial relationships, and causal relationships, all of which are critical to most robotic tasks. Second, deploying these large models on robotic systems demands substantial GPU resources, especially when processing must be real-time. Further, existing work combining LLMs/VLMs and robotics uses them solely as a tool to translate human instructions into executable plans. In reality, humans can provide more than instructions, such as explanations of the scene and corrections of execution errors, in richer contexts such as multi-turn dialogues. The speakers and organizers of this workshop are leading researchers from diverse backgrounds. The talks and discussions will not only give a concrete picture of the current research landscape but also promote new insights and directions.
Another key barrier to entry for AI and robotics research is the resources required for acquiring data and testing algorithms: robust robot platforms with rich sensing modalities are needed for both. To lower this barrier, we plan to host the CMU Vision-Language-Autonomy Challenge over the span of a few months before IROS 2024 and present the first stage of results at the workshop. We will release a novel vision-language grounding dataset together with an award-winning navigation autonomy system. Specifically, the challenge will use a robot equipped with a 3D LiDAR and a 360° camera. The system carries base autonomy onboard (e.g., state estimation, terrain traversability analysis, collision avoidance, and waypoint following) to support the high-level AI tasks. Participants can develop and integrate their methods in the provided simulation environments and eventually deploy them on the real robot, which will be demoed at the workshop. Further, all of our autonomy stacks have been made public so that users can reproduce the systems to support their CV and AI algorithm development. The challenge is open to anyone interested in the field; we will summarize the results and invite the winning teams to present their work at the workshop. More information can be found on the challenge website.
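To illustrate the division of labor between the base autonomy and high-level AI modules, below is a minimal sketch of how a language-grounded planner might steer the robot by publishing waypoints for the onboard waypoint-following module. It assumes a ROS 1 setup modeled on the publicly released CMU autonomy stack; the topic name (/way_point), message type, and frame are assumptions here, so verify them against the challenge documentation.

```python
#!/usr/bin/env python
# Minimal sketch: a high-level AI module steering the base autonomy by
# publishing waypoints. Topic name, message type, and frame_id are
# assumptions modeled on the released CMU autonomy stack; check the
# challenge documentation before use.
import rospy
from geometry_msgs.msg import PointStamped

def send_waypoint(pub, x, y, z=0.0):
    """Publish a single waypoint for the waypoint-following module."""
    msg = PointStamped()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = "map"  # assumed world frame
    msg.point.x, msg.point.y, msg.point.z = x, y, z
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("vla_waypoint_sender")
    pub = rospy.Publisher("/way_point", PointStamped, queue_size=1)
    rospy.sleep(1.0)  # allow the publisher connection to establish
    # Example: a vision-language grounding model would compute this goal
    # from an instruction like "go to the chair by the window".
    send_waypoint(pub, 3.5, -1.2)
    rospy.spin()
```

Under this design, the high-level module only reasons about where to go; collision avoidance and path following along the way are handled by the base autonomy system.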
As a result, the workshop will feature not only talks from invited leading researchers on their relevant work but also presentations from the top challenge teams in the form of poster sessions and spotlight talks. The workshop will conclude with a panel session where attendees can ask the speakers questions and engage in discussion.
Topics of interest span a diverse range across the field of robotics and other disciplines and include but are not limited to the following:
Vision-Language-Action datasets and models
Foundation models for navigation
Human-robot interaction for navigation
Human-robot collaborations
Social navigation
Embodied intelligence
Bridging the sim-to-real gap
Mixed-initiative autonomy
Spatial and causal reasoning
Robot autonomy stacks
Program
Talk Recordings
Invited Speakers:
Angel Chang: Towards Intelligent Agents That Understand Language in 3D
Luca Weihs: Generalizable Robotic Agents Through Large Scale Simulation
Luca Carlone: Language-Enabled Metric-Semantic World Models
Xiaodan Liang: Building World Model for Generic Robot Manipulation
Siyuan Huang: Understanding the 3D World for General Agent
Top Challenge Participants:
Organizers and Speakers
Wenshan Wang
CMU Robotics Institute
Ji Zhang
CMU NREC & Robotics Institute
Haochen Zhang
CMU Robotics Institute
Shibo Zhao
CMU Robotics Institute
Abhinav Gupta
CMU Robotics Institute
Deva Ramanan
CMU Robotics Institute
Angel Chang
Simon Fraser University
Xiaodan Liang
Sun Yat-sen University & MBZUAI
Siyuan Huang
Beijing Institute for General Artificial Intelligence
Luca Carlone
Massachusetts Institute of Technology
Luca Weihs
Allen Institute for AI
Andy Zeng
Google DeepMind