Welcome to the IROS 2025 RoboGen Workshop on World Understanding and Generation!
RoboGen focuses on two tightly coupled aspects: multimodal world understanding and generation. We believe generation can only be reasonable when understanding is done right. Recent advances in 3D vision (NeRF, Gaussian Splatting) and multimodal foundation models (LLMs, VLMs, diffusion and flow-based models) are transforming robotics by enabling the creation of high-quality data for training, testing, and validation. Despite the impressive capabilities of learning-based robotics in embodied AI, autonomous driving, and unmanned aerial navigation, progress toward generalizable systems remains constrained by the fundamental challenge of data acquisition.
Building towards AGI, RoboGen brings together researchers and industry experts to address the data bottleneck that limits scaling. The goal is to develop innovative approaches that leverage recent advances in 3D scene generation and understanding methods to . Our workshop emphasizes practical applications and solutions to long-tail data problems in challenging robotics scenarios, aiming to empower 3D deep learning systems for real-world deployment.
The RoboGen workshop aims to advance 3D world generation for robotics with four key objectives:
- (1) Understanding: advance multimodal world understanding and spatial-temporal reasoning through vision-language models,
- (2) Representation: explore video diffusion and flow-based models conditioned on sensor geometry cues for physical 3D world representation,
- (3) Action: democratize access to high-quality synthetic data generation to train robust vision-language-action models, and
- (4) Scaling: enable practical deployment of generalizable robot learning systems that overcome the data bottleneck imposed by scaling laws.