Abstract
This research proposes a Multi-Agent Dynamic 3D Open-Vocabulary SLAM framework thatleverages advanced vision-language models (VLMs) and large language models (LLMs) toenhance robotic scene understanding and task planning in dynamic, unstructuredenvironments. By integrating 2D open-vocabulary segmentation with depth and multi-sensordata, the system builds a hierarchical 3D scene graph for …