VLM (Vision-Language Model) - filings, earnings calls, financial reports, news

Jinqiu Select | Physical Intelligence Co-founder: Real Data Is Irreplaceable for AI Training

Jinqiu Ji (锦秋集) · 2025-07-22 15:04

Core Viewpoint
- Over-reliance on alternative data sources can severely limit the ultimate capabilities of models, and true breakthroughs must be built on real data [1][10]

Group 1: The Dilemma of Alternative Data
- Researchers in robotics often seek cheaper alternatives to real data due to high collection costs, leading to a compromise in model performance [2][3]
- Common alternative methods include simulation training, learning from human videos, and using handheld devices to mimic robotic actions, but each method ultimately weakens the model's true potential [3][4]

Group 2: Intersection Dilemma
- The collection of data inevitably involves human judgment, which can limit the problem-solving approach when avoiding real data [4][6]
- As models grow stronger, they can better distinguish between alternative and real data, leading to a smaller intersection of effective behaviors [6][7]

Group 3: The Importance of Real Data
- Attempting to bypass real data results in a "spork" scenario, where neither alternative data nor real data is effectively utilized [10][11]
- To build robust robotic models that generalize well, real data is essential, but it can be complemented with diverse data sources [11][12]

Group 4: The "Spork" Phenomenon
- The concept of "spork" applies to various AI research areas, where attempts to combine manual design with learning systems ultimately create performance bottlenecks [13]

Li Auto Bets Heavily on VLA; "End-to-End" Model Lead Xia Zhongpu to Depart | 36Kr Exclusive

36Kr · 2025-05-21 11:18

Core Viewpoint
- The article discusses the recent departure of Xia Zhongpu, the head of the end-to-end model for assisted driving at Li Auto, and the implications of this change on the company's strategic direction towards the VLA (Vision-Language-Action) model for autonomous driving technology [3][7][14].

Summary by Sections

Departure of Xia Zhongpu
- Xia Zhongpu, who joined Li Auto in 2023 and was responsible for the planning and control model of the assisted driving system, is set to leave the company. His departure may be linked to a shift in Li Auto's technology strategy [5][7].
- Xia's rapid promotion from P9 to 21st level within two years is noted as unusual within the company [6].

Shift in Technology Strategy
- Li Auto has transitioned its assisted driving technology from a reliance on high-precision maps and rule-based systems to an end-to-end model, and now to the VLA model [9][10].
- The VLA model, which Li Auto is now fully committed to, is seen as a more advanced approach that incorporates action capabilities, allowing for interaction with the physical world [12][14].

VLA Model Advantages
- The VLA model is positioned as superior to the previous end-to-end model, as it combines 3D and 2D visual understanding with the ability to execute actions, aligning more closely with human operational methods [12].
- This model is part of a broader industry trend towards enhancing the world knowledge and reasoning capabilities of assisted driving systems, as seen in recent developments from competitors like NIO and XPeng [12].

Internal Changes and Future Outlook
- The leadership within Li Auto's assisted driving team has also seen changes, with the head of the team, Lang Xianpeng, being promoted to a higher level, indicating a strengthening of resources towards the VLA model [9].
- Despite the enthusiasm for the VLA model, industry insiders caution that it is still in its early stages and has not undergone extensive practical application [13].

ICML Spotlight | MCU: The World's First Generative Open-World Benchmark, Reshaping the Evaluation Paradigm for General AI

Jiqizhixin (机器之心) · 2025-05-13 07:08

Core Insights
- The article discusses the development of the Minecraft Universe (MCU), a generative open-world platform designed to evaluate general AI agents in dynamic and non-predefined environments, addressing the limitations of existing assessment frameworks [1][2][6].

Group 1: Challenges in Current AI Assessment
- Traditional testing benchmarks are limited to tasks with standard answers, which do not reflect the complexities of open-world environments like Minecraft [2].
- Existing Minecraft testing benchmarks face three major bottlenecks: limited task diversity, reliance on manual evaluation, and a lack of real-world complexity [3][6].

Group 2: Innovations of the Minecraft Universe (MCU)
- MCU features 3,452 atomic tasks that can be infinitely combined, creating a vast task space that reflects real-world complexities [6].
- The platform supports fully automated task generation and multimodal intelligent assessment, significantly improving evaluation efficiency, with a scoring accuracy of 91.5% and an 8.1 times increase in assessment speed compared to manual methods [11][14].
- MCU includes high-difficulty and high-freedom "litmus test" tasks that deeply examine the generalization and adaptability of AI agents [16].

Group 3: Performance of Current AI Models
- Current state-of-the-art (SOTA) models like GROOT, STEVE-I, and VPT show acceptable performance on simple tasks but struggle significantly with combinatorial tasks and unfamiliar configurations, revealing weaknesses in their spatial understanding and generalization capabilities [17][21].
- The evaluation results highlight a gap in the core abilities of AI agents in terms of generalization, adaptability, and creativity, indicating that they lack the autonomous problem-solving awareness seen in humans [22].
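
To give a rough sense of why combining atomic tasks produces such a large task space, the following is a minimal Python sketch. It assumes composite tasks are unordered sets of distinct atomic tasks, which is an illustrative simplification and not a description of MCU's actual task generator. The function names and sample task names are hypothetical; only the figure of 3,452 atomic tasks comes from the summary above.

```python
from itertools import combinations
from math import comb


def count_composite_tasks(num_atomic: int, max_size: int) -> int:
    """Count unordered combinations of 1..max_size distinct atomic tasks.

    Assumption: a composite task is a set of distinct atomic tasks.
    """
    return sum(comb(num_atomic, k) for k in range(1, max_size + 1))


def compose_tasks(atomic_tasks, size):
    """Yield a text description for every combination of `size` atomic tasks."""
    for combo in combinations(atomic_tasks, size):
        yield " and ".join(combo)


# Hypothetical atomic task names, used only for illustration.
sample_tasks = ["mine stone", "craft pickaxe", "build shelter"]
pairs = list(compose_tasks(sample_tasks, 2))

# Task-space size for the 3,452 atomic tasks cited in the summary,
# allowing composites of up to three atomic tasks.
space_up_to_three = count_composite_tasks(3452, 3)
```

Even with composites capped at a handful of atomic tasks, the count grows combinatorially, which is consistent with the summary's point that manual evaluation does not scale to this kind of benchmark.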

Artificial General Intelligence

Open-World AI

Minecraft Universe (MCU)

MineStudio

GROOT

STEVE-I