2nd AI Meets Autonomy: Vision, Language, and Autonomous Systems Workshop
IROS 2025, Hangzhou, China
Friday, Oct. 24th, 1:30PM - 5:00PM, Room 210C.
This workshop is held on the afternoon of Friday, Oct. 24, 2025, from 1:30PM to 5:00PM in Room 210C.
With the rising popularity of Large Language Models (LLMs), Vision-Language Models (VLMs), and other general foundation models, the 2nd iteration of this workshop aims to explore the synergy between these models and robotics in the context of recent developments. In the workshop, we will discuss how recent advances from the AI and CV communities could benefit robotics research: incorporating LLMs and VLMs into robotic systems could make those systems more explainable, instructable, and generalizable. However, seamlessly integrating the two remains an open problem, as existing LLMs and VLMs often lack the real-world knowledge that roboticists need to consider, such as the physical properties and the spatial and temporal relations in an environment. These issues may be alleviated by integration with robotic systems, where real-world experiences containing rich sensory input and physical feedback can help bridge the gap toward physically grounded systems.
This workshop aims to create an inclusive and collaborative platform for professionals, researchers, and enthusiasts alike to exchange ideas, share experiences, and foster meaningful connections within the AI and robotics community, with a focus on connecting early-career researchers. It will feature a mix of presentations, open panel discussions, networking, and exclusive results and demonstrations from our CMU Vision-Language-Autonomy Challenge. Four invited speakers will discuss their related research, ideas, and future plans on various topics at the intersection of AI and autonomous systems, with broad coverage of areas such as datasets and benchmarks, software stacks, vision-language navigation, situated reasoning, robotics foundation models, and more.
See AI Meets Autonomy 2024 for the previous iteration of our workshop at IROS 2024 and past talk recordings.
There are a variety of ongoing efforts to improve the state of the art in robot autonomy using recent advances in AI, particularly large foundation models, with new developments in areas such as human-robot interaction, vision-language navigation, multi-task planning, and knowledge acquisition and reasoning. Although integrating AI and autonomous systems has multiple advantages, it also raises issues and challenges. First, large language and visual foundation models have been shown to fall short in understanding actual physics, spatial relationships, and causal relationships, lacking the physical grounding that is critical to most robotic tasks. Second, deploying these large models on robotic systems requires substantial GPU resources, especially when the processing needs to be real-time. Further, existing work combining LLMs/VLMs and robotics focuses largely on using LLMs/VLMs as a tool to translate human instructions into executable plans. In reality, humans can provide more than instructions, such as explanations of the scene and corrections of execution errors, and in richer contexts such as multi-turn dialogues.

The workshop will focus on current research tackling these challenges in robotics, with a diverse line-up of speakers. The speakers will each share their recent work and experience integrating vision and language methods into various autonomous systems, along with their active research projects in this area, from the perspectives of both academia and industry. They will cover a wide range of topics in which they have expertise, such as generalist embodied agents, vision-language navigation, world modeling, robotics foundation models, and situated dialogue agents. Their talks and subsequent discussions will not only give a concrete picture of the current and rapidly developing research landscape, but also promote new insights and directions for future work. Q&A sessions after each talk, as well as a panel discussion at the end of the workshop, will allow the audience to engage with the speakers and discuss ideas.
Another key barrier to integrating vision and language methods into robotic systems is the resources required for testing new methods, including the data needed for training and evaluation and robust real-world robot platforms for deployment testing. To lower this barrier, we have been hosting the CMU Vision-Language-Autonomy Challenge. In the challenge, we provide a set of language questions, sample training environments, and a full pipeline for integrating any vision-language method for object-centric indoor navigation with a real-world ground vehicle running an autonomy stack with an onboard 3D LiDAR sensor and a 360-degree camera. Challenge participants can develop and integrate their methods in a simulator and eventually deploy them on our real robot system, which will be demoed at the workshop. All code, autonomy stacks, and resources used for the challenge are open-sourced, and the challenge is open to anyone. The winning teams will present their methods at the workshop.
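To give a rough sense of how such an integration looks, the sketch below shows, in broad strokes, how a vision-language method might plug into a pipeline of this kind: a language question and the onboard sensing (360-degree imagery, LiDAR, robot pose) go in, and a navigation waypoint for the autonomy stack comes out. All class and function names here are hypothetical illustrations, not the actual challenge API; please refer to the open-sourced challenge resources for the real interfaces.

```python
# Minimal sketch of a vision-language module feeding an autonomy stack.
# Hypothetical names only -- NOT the actual challenge API.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    """Onboard sensing available to a participant's method."""
    panorama: object                          # e.g. an HxWx3 360-degree image array
    point_cloud: object                       # e.g. an Nx3 array of LiDAR returns
    robot_pose: Tuple[float, float, float]    # (x, y, yaw) in the map frame


def answer_and_navigate(question: str, obs: Observation) -> Tuple[float, float]:
    """Toy stand-in for a vision-language method: ground the question in the
    current observation and return an (x, y) waypoint for the autonomy stack."""
    # A real method would run a VLM to localize the referenced object in the
    # panorama, lift the detection into 3D using the LiDAR point cloud, and
    # pick a reachable goal near it. Here we simply return the robot's
    # current position as a placeholder.
    x, y, _ = obs.robot_pose
    return (x, y)


if __name__ == "__main__":
    obs = Observation(panorama=None, point_cloud=None, robot_pose=(0.0, 0.0, 0.0))
    waypoint = answer_and_navigate("Find the nearest red chair.", obs)
    print("Send waypoint to autonomy stack:", waypoint)
```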
In summary, the workshop will feature not only invited talks from leading researchers on their relevant work, but also presentations from the top challenge teams, who will share their results in poster sessions and short spotlight talks, along with an interactive demo of the challenge system. Speakers will also take part in a panel session where attendees can pose questions and join the discussion. The workshop will include networking opportunities and the chance for attendees to share their profiles and resumes in a compiled contact book, encouraging connections that extend beyond the span of the workshop.
The topics we expect to cover through talks, challenge results, panel discussions, and networking span a diverse range across the field of robotics and intersect with many other disciplines. Together, they will demonstrate how the integration of robotics, language, and vision can enhance robotics research in a variety of real-world application areas.
Topics of discussion and open questions include but are not limited to the following:
Vision-Language-Action datasets and models
Foundation models for robotics
Human-robot interaction for navigation
Human-robot collaboration and dialogue agents
Social navigation
Embodiment through world models
Bridging the sim-to-real gap
Mixed-initiative autonomy
Spatial and causal reasoning
Robot autonomy stacks
Wenshan Wang
CMU Robotics Institute
Ji Zhang
CMU NREC & Robotics Institute
Haochen Zhang
CMU Robotics Institute
Deva Ramanan
CMU Robotics Institute
Ting Cao
Microsoft Research
Jiangmiao Pang
Shanghai AI Laboratory
Angel Chang
Simon Fraser University
Roozbeh Mottaghi
Meta FAIR
Siyuan Huang
Beijing Institute for General Artificial Intelligence
He Wang
Peking University
Joseph Lim
KAIST
Feishi Wang
Peking University