Some Details on the Li Auto Driver Agent

Core Viewpoint
- The article discusses advances in the AD Max driver agent product, focusing on its capabilities in enclosed campus and underground garage scenarios and emphasizing the integration of multi-modal information for decision-making.

Group 1: Product Definition and Capabilities
- The AD Max driver agent now produces its trajectory output entirely from a model. This sets it apart from earlier AVP (automated valet parking) product experiences and gives a driving experience that closely resembles a human driver's in these specific environments [1]
- The agent can understand road signs and respond to voice interaction. It uses a local multi-modal LLM for simple commands and a large cloud-based LLM for complex instructions [1][2]
- The agent builds associative points rather than a precise map, so it navigates by the general structure of the drivable space, much as a human driver does in an underground garage [2]

Group 2: Perception and Reasoning Abilities
- The AD Max agent fuses data from multiple sensors, including cameras and LiDAR, to perceive its environment comprehensively [2]
- The agent remembers the associative points it has built, so it can navigate an area without roaming through it again, and it can adapt when that memory turns out to be incorrect [3]

Group 3: Industry Comparison
- The article singles out the AD Max driver agent and NIO AD's NWM as the only two applications that currently integrate multi-modal perception information into a single model for complex reasoning [3]