ECCV 2024 Workshop
Room Brown 2 (ID W22)
September 29, 2024 | MiCo Milano
Building safe and intelligent Autonomous Vehicles (AVs) capable of human-like reasoning is a challenging problem that pushes the limits of computer vision. Current AV systems struggle with diverse and unseen driving scenarios, necessitating a shift in research focus. Recently, multimodal large language models (MLLMs) have shown great promise: they not only demonstrate remarkable capabilities in understanding human intent and solving complex, unstructured problems, but also scale gracefully with data and compute. This workshop explores how MLLMs can be leveraged to tackle key challenges in autonomous driving.
Boris Ivanovic, NVIDIA
Hongyang Li, The University of Hong Kong & Shanghai AI Lab
Hang Zhao, Tsinghua University
Long Chen, Wayve
Katerina Fragkiadaki, Carnegie Mellon University
Time | Event |
---|---|
13:50 - 14:00 | Opening Remarks |
14:00 - 14:30 | Boris Ivanovic |
14:30 - 15:00 | Hongyang Li |
15:00 - 15:30 | Hang Zhao |
15:30 - 16:00 | Break |
16:00 - 16:30 | Long Chen |
16:30 - 17:00 | Katerina Fragkiadaki |
17:00 - 17:30 | Oral Session |
17:30 - 18:00 | Poster Session |
We welcome authors to submit their papers in two formats: full-paper (8-14 pages) or short-abstract (4 pages). A full paper should describe work that has not been published or accepted at another venue. A short abstract can highlight work that has recently been published or accepted. Please use the ECCV 2024 paper template and follow the ECCV submission guidelines. Accepted papers will be posted on the website, but there will be no archival proceedings for this workshop.
Papers should be submitted through the CMT system: https://cmt3.research.microsoft.com/MLLMAV2024.
Title | Authors |
---|---|
Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection | Mehar Khurana, Neehar Peri, James Hays, Deva Ramanan |
Think-Driver: From Driving-Scene Understanding to Decision-Making with Vision Language Models | Qiming Zhang, Meixin Zhu, Frank Yang |
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning | Weijie Wei, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald |
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models | Yi Yang, Qingwen Zhang, Kei IKEMURA, Nazre Batool, John Folkesson |
Distillation of Vision Language Models for Enhancing End-to-End Autonomous Driving | Feng Tao, Abhirup Mallik, Chenbin Pan, Xin Ye, Yuliang Guo, Burhaneddin Yaman, Liu Ren |
New York University
NVIDIA & University of Southern California
Zhejiang University
AWS AI, Amazon & Columbia University
Carnegie Mellon University
Cornell University
NVIDIA & Technical University of Munich