Autonomous Vehicles meet Multimodal Foundation Models

ECCV 2024 Workshop

September 29, 2024 | MiCo Milano

Overview

Building safe and intelligent Autonomous Vehicles (AVs) capable of human-like reasoning is a challenging problem that pushes the limits of computer vision. Current AV systems struggle with diverse and unseen driving scenarios, which calls for a shift in research focus. Recently, multimodal large language models (MLLMs) have shown great promise: they not only demonstrate strong capabilities in understanding human intent and solving complex, unstructured problems, but also scale gracefully with data and compute. This workshop explores how MLLMs can be leveraged to tackle key challenges in AVs.


Invited Speakers

Marco Pavone

Stanford University & NVIDIA

Hongyang Li

The University of Hong Kong & Shanghai AI Lab

Raquel Urtasun

Waabi & University of Toronto

Long Chen

Wayve

Katerina Fragkiadaki

Carnegie Mellon University


Schedule

Time Event
13:00 - 13:30 Marco Pavone
13:30 - 14:00 Hongyang Li
14:00 - 14:30 Raquel Urtasun
14:30 - 15:00 Break
15:00 - 15:30 Long Chen
15:30 - 16:00 Katerina Fragkiadaki
16:00 - 16:30 Oral Session
16:30 - 17:30 Poster Session

Call for Papers

We welcome submissions in two formats: full papers (4-8 pages) and short abstracts (2 pages). Full papers should describe work that has not been previously published or accepted. Short abstracts may highlight significant work that has been recently published or accepted. Please use the ECCV 2024 paper template and follow the ECCV submission guidelines. Accepted papers will be posted on the website, but there will be no archival proceedings.

Papers should be submitted through the CMT system: https://cmt3.research.microsoft.com/MLLMAV2024.

Topics

  • General system design of MLLMs for AV: The integration of MLLMs into AVs necessitates a reevaluation of data collection and usage, training and evaluation methodologies, and the overall system architecture.
  • Perception: How can we leverage MLLMs to build more robust and powerful perception models in AV? Can we have a systematic way to deal with "tail" examples that are hard for traditional methods but easy for a human driver?
  • Motion prediction: Can we use MLLMs to better understand the intents of other traffic participants and accurately forecast their movements?
  • Trajectory planning: Can we use MLLMs to enable more sophisticated and adaptable planning algorithms that account for a wider range of variables and scenarios, leading to safer and more efficient navigation?
  • Simulation and world models: Can MLLMs help generate more realistic and comprehensive simulation environments or build world models?
  • End-to-end solutions: Can MLLMs play a crucial role in end-to-end AV solutions?
  • Testing and safety: Can MLLMs make AV systems safer?

Important Dates

  • Submission Open: June 25, 2024
  • Submission Deadline: August 15, 2024, 11:59 PM Pacific Time
  • Acceptance Decision: September 3, 2024
  • Camera Ready Deadline: September 20, 2024

Organizers

Yan Wang

NVIDIA

Yurong You

NVIDIA

Kashyap Chitta

University of Tübingen

Yue Wang

NVIDIA & University of Southern California

Yiyi Liao

Zhejiang University

Li Erran Li

AWS AI, Amazon & Columbia University

Deva Ramanan

Carnegie Mellon University

Kilian Q. Weinberger

Cornell University

Laura Leal-Taixé

NVIDIA & Technical University of Munich