Vision-Language Models (VLM)
Yet another student we helped has landed an autonomous driving algorithm offer...
自动驾驶之心· 2025-08-23 14:44
Recently, a student about to enter the third year of his master's program came to 柱哥 to vent: his labmates are all switching to embodied intelligence, or planning to target large-model and Agent roles at the big internet companies, while he is still working on autonomous driving algorithms. Layoff news has kept surfacing across the industry since last year, and with autumn recruitment coming up next year he feels a bit lost. Everyone started out doing perception, but their paths have gradually diverged. He wanted to know whether to keep betting on the intelligent driving industry or consider switching tracks. He had only just found our 自动驾驶之心 community, thought its curriculum looked very complete, and worried that he might already be too late.

"It is never too late." Besides, you still have time to focus on directions with higher technical barriers, such as VLA or end-to-end; from there it is also easier to move into large models or embodied AI later, so there is no need to worry too much. Expanding and consolidating your technical stack as soon as possible is what matters most. If you do not yet have strong skills in independent learning and tracking down answers, come join our community, currently the largest and most complete autonomous driving learning platform in China: the "自动驾驶之心" Knowledge Planet.

The "自动驾驶之心" Knowledge Planet combines videos + articles + learning routes + Q&A + job-hunting exchange into a single comprehensive autonomous driving community, now with more than 4,000 members. We aim to grow to nearly 10,000 within the next two years, building a hub for exchange and technical sharing that both beginners and more advanced learners visit regularly. The community also routinely answers practical questions for members: How do I get started with end-to-end? How should I learn multimodal large models for autonomous driving? How to study autonomous driving VLA ...
Is Li Auto's VLA really a true VLA?
自动驾驶之心· 2025-08-21 23:34
Core Viewpoint
- The article discusses the capabilities of the MindVLA model in autonomous driving, emphasizing its advanced scene understanding and decision-making abilities compared to traditional E2E models.

Group 1: VLA Capabilities
- The VLA model demonstrates effective defensive driving, particularly in scenarios with obstructed views, by smoothly adjusting speed based on remaining distance [4][5].
- In congested traffic situations, VLA shows improved decision-making by choosing to change lanes rather than following the typical detour logic of E2E models [7].
- The VLA model exhibits enhanced lane-centering abilities in non-standard lane widths, significantly reducing the occurrence of erratic driving patterns [9][10].

Group 2: Scene Understanding
- VLA's decision-making process reflects a deeper understanding of traffic scenarios, allowing it to make more efficient lane changes and route selections [11].
- The model's ability to maintain stability in trajectory generation is attributed to its use of diffusion models, which enhances its performance in various driving conditions [10] (see the sampling sketch after this summary).

Group 3: Comparison with E2E Models
- E2E models struggle with nuanced driving behaviors, often resulting in abrupt maneuvers, while VLA provides smoother and more context-aware driving responses [3][4].
- VLA's architecture allows for parallel optimization across different scenarios, leading to faster iterations and improvements compared to E2E models [12].

Group 4: Limitations and Future Considerations
- Despite its advancements, VLA is still classified as an assistive driving technology rather than fully autonomous driving, requiring human intervention in certain situations [12].
- The article raises questions about the model's performance in specific scenarios, indicating areas for further development and refinement [12].
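To make the diffusion claim above concrete, here is a minimal sketch of what DDPM-style trajectory sampling can look like. Everything in it is an illustrative assumption rather than a detail of MindVLA: the `denoiser` network, the 20-waypoint horizon, the 50-step linear noise schedule, and the scene-embedding conditioning are all placeholders.

```python
import torch

# Hedged sketch: DDPM-style reverse diffusion over a planned trajectory.
# The denoiser, horizon, step count, and schedule are illustrative only.

HORIZON = 20   # number of future (x, y) waypoints (assumed)
STEPS = 50     # diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, STEPS)   # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def sample_trajectory(denoiser, scene_embedding):
    """Draw one (HORIZON, 2) trajectory by iteratively denoising Gaussian
    noise, conditioned on a fixed scene embedding."""
    traj = torch.randn(HORIZON, 2)                     # start from pure noise
    for t in reversed(range(STEPS)):
        eps_hat = denoiser(traj, t, scene_embedding)   # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (traj - coef * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise     # one reverse step
    return traj
```

Because the output is reached through many small, conditioned denoising steps rather than a single regression, consecutive plans tend to vary smoothly, which is one plausible mechanism behind the stability the article describes.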
The tech-obsessed "Whampoa Military Academy" of autonomous driving has hit 4,000 members!
自动驾驶之心· 2025-08-15 14:23
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community focused on autonomous driving, aiming to bridge the gap between academia and industry while providing valuable resources for learning and career opportunities in the field [2][16].

Group 1: Community and Resources
- The community has created a closed-loop system covering industry, academia, job seeking, and Q&A exchanges, enhancing the learning experience for participants [2][3].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, significantly reducing the time needed for research [3][16].
- Members can access nearly 40 technical routes, including industry applications, VLA benchmarks, and entry-level learning paths, catering to both beginners and advanced researchers [3][16].

Group 2: Learning and Development
- The community provides a well-structured learning path for beginners, including foundational knowledge in mathematics, computer vision, deep learning, and programming [10][12].
- For those already engaged in research, valuable industry frameworks and project proposals are available to deepen their understanding and application of autonomous driving technologies [12][14].
- Continuous job sharing and career opportunities are promoted within the community, fostering a complete ecosystem for autonomous driving [14][16].

Group 3: Technical Focus Areas
- The community has compiled extensive resources on various technical aspects of autonomous driving, including perception, simulation, planning, and control [16][17].
- Specific learning routes are available for topics such as end-to-end learning, 3DGS principles, and multi-modal large models, ensuring comprehensive coverage of the field [16][17].
- The platform also features a collection of open-source projects and datasets relevant to autonomous driving, facilitating hands-on experience and practical application [32][34].
A roundup of autonomous driving VLA work (modular / end-to-end / reasoning-enhanced)
自动驾驶之心· 2025-08-12 11:42
VLA precursor work: the VLM as an interpreter (a minimal sketch of this pattern follows the paper list)

- Paper: DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
  Link: https://arxiv.org/abs/2310.01412
  Project page: https://tonyxuqaq.github.io/projects/DriveGPT4/
- Paper: TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
  Link: https://arxiv.org/abs/2505.12670
  Project page: https://github.com/AiX-Lab-UWO/TS-VLM
- Paper: DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
  ...
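The sketch below shows the general shape of the "VLM as interpreter" pattern: feed camera frames plus a question to a vision-language model and parse back an explanation alongside control values. The `vlm.generate` interface, the response format, and the `DrivingOutput` fields are all hypothetical; DriveGPT4's actual handling of control signals differs in detail.

```python
from dataclasses import dataclass

# Hypothetical interface for the "VLM as interpreter" pattern. `vlm` is any
# object exposing a generate(images=..., prompt=...) -> str method; this is
# a placeholder, not a real library API.

@dataclass
class DrivingOutput:
    explanation: str   # natural-language justification of the maneuver
    steering: float    # normalized steering command in [-1, 1]
    throttle: float    # normalized throttle in [0, 1]

def interpret_scene(vlm, frames, question="What should the ego vehicle do, and why?"):
    """Query a VLM with camera frames and parse a structured reply.
    Assumes the model was instruction-tuned to answer as:
    '<explanation> | steering=<float> throttle=<float>'."""
    raw = vlm.generate(images=frames, prompt=question)
    text, _, controls = raw.partition("|")
    fields = dict(kv.split("=") for kv in controls.split())
    return DrivingOutput(text.strip(),
                         float(fields["steering"]),
                         float(fields["throttle"]))
```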
I had made up my mind to go into embodied AI, but now I'm having second thoughts...
自动驾驶之心· 2025-08-11 12:17
Core Insights
- Embodied intelligence is a hot topic this year, transitioning from previous years' silence to last year's frenzy, and now gradually cooling down as the industry realizes that embodied robots are far from being productive [1]

Group 1: Industry Trends
- The demand for multi-sensor fusion and positioning in robotics is significant, with a focus on SLAM and ROS technologies [3]
- Many robotics companies are rapidly developing and have secured considerable funding, indicating a promising future for the sector [3]
- Traditional robotics remains the main product line, despite the excitement around embodied intelligence [3]

Group 2: Community and Resources
- The community has established a closed loop across various fields including industry, academia, and job seeking, aiming to create a valuable exchange platform [4][6]
- The community offers access to over 40 technical routes and invites industry leaders for discussions, enhancing learning and networking opportunities [6][20]
- Members can freely ask questions regarding job choices or research directions, receiving guidance from experienced professionals [83]

Group 3: Educational Content
- Comprehensive resources for beginners and advanced learners are available, including technical stacks and learning roadmaps for autonomous driving and robotics [13][16]
- The community has compiled a list of notable domestic and international research labs and companies in the autonomous driving and robotics sectors, aiding members in their academic and career pursuits [27][29]
"How many fingers does a hand have?" Did your GPT-5 get it right?
机器之心· 2025-08-11 10:40
Core Viewpoint
- The article discusses the limitations of advanced language models like GPT-5 in understanding basic visual concepts, highlighting the need for vision-centric models to improve visual comprehension and reasoning capabilities [2][26].

Group 1
- Tairan He points out that while language is a powerful tool, it struggles to fully meet the needs of the vision and robotics fields [2].
- There is a call for the development of vision-centric language models (VLM) and vision-language-action (VLA) models to address these shortcomings [3].
- The ambiguity in the definition of "fingers" illustrates the challenges language models face in interpreting visual information accurately [4][6].

Group 2
- Even top models like Gemini 2.5 Pro have failed to provide correct answers to basic questions, indicating a lack of robust visual understanding [10][24].
- Tairan He references a paper by the Sseynin team that proposes a rigorous evaluation method for assessing the visual capabilities of multimodal large language models (MLLM) [28].
- The new benchmark test, CV-Bench, focuses on evaluating models' abilities in object counting, spatial reasoning, and depth perception, establishing stricter assessment standards [31].

Group 3
- Research shows that while advanced VLMs can achieve 100% accuracy in recognizing common objects, their performance drops to about 17% when dealing with counterfactual images [33] (a toy evaluation illustrating this gap follows this summary).
- VLMs rely on memorized knowledge rather than true visual analysis, which limits their effectiveness [34].
- Martin Ziqiao Ma argues that initializing VLA models with large language models is a tempting but misleading approach, as it does not address fundamental perception issues [36].
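The memorization-versus-perception gap described in Group 3 is easy to demonstrate with a toy harness. The item format and the stub model below are assumptions for illustration only; CV-Bench's real data format is richer.

```python
# Toy CV-Bench-style counting evaluation. The stub model "memorizes" the
# canonical answer instead of looking at the image, reproducing the failure
# mode described above. The item format is an assumption for illustration.

def accuracy(ask, items):
    """items: (image_path, question, expected_answer) triples;
    `ask` is any callable wrapping a model."""
    correct = sum(ask(img, q).strip() == exp for img, q, exp in items)
    return correct / len(items)

def stub_vlm(image_path, question):
    return "5"   # always answers from prior knowledge, never from pixels

standard = [("hand_normal.jpg", "How many fingers does this hand have?", "5")]
counterfactual = [("hand_six_fingers.jpg", "How many fingers does this hand have?", "6")]

print(accuracy(stub_vlm, standard))        # 1.0: looks perfect on ordinary images
print(accuracy(stub_vlm, counterfactual))  # 0.0: collapses when the count is edited
```

Pairing each ordinary image with an edited counterfactual variant is what separates genuine counting from recitation, matching the roughly 100% versus 17% gap the article cites.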
Twenty years of autonomous driving, and this "Whampoa Military Academy" of the field has never stopped refining its craft...
自动驾驶之心· 2025-08-09 16:03
Core Viewpoint
- The article emphasizes the ongoing evolution and critical phase of the autonomous driving industry, highlighting the transition from modular approaches to end-to-end/VLA methods, and the community's commitment to fostering knowledge and collaboration in this field [2][4].

Group 1: Industry Development
- Since Google began researching autonomous driving technology in 2009, the industry has progressed significantly and is now entering a crucial phase of development [2].
- The community aims to integrate intelligent driving into daily transportation, reflecting a growing expectation for advancements in autonomous driving capabilities [2].

Group 2: Community Initiatives
- The community has established a knowledge-sharing platform, offering resources across various domains such as industry insights, academic research, and job opportunities [2][4].
- Plans to enhance community engagement include monthly online discussions and roundtable interviews with industry and academic leaders [2].

Group 3: Educational Resources
- The community has compiled over 40 technical routes to assist individuals at different levels, from beginners to those seeking advanced knowledge in autonomous driving [4][16].
- A comprehensive entry-level technical stack and roadmap have been developed for newcomers to the field [9].

Group 4: Job Opportunities and Networking
- The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [7][14].
- Continuous job sharing and networking opportunities are provided to create a complete ecosystem for autonomous driving professionals [14][80].

Group 5: Research and Technical Focus
- The community has gathered extensive resources on research areas including 3D object detection, BEV perception, and multi-sensor fusion, to support practical applications in autonomous driving [16][30][32].
- Detailed summaries of cutting-edge topics such as end-to-end driving, world models, and vision-language models (VLM) have been compiled to keep members informed about the latest advancements [34][40][42].
Large-model solutions for autonomous driving: an overview of vision-language model (VLM) work, for both mass production and research
自动驾驶之心· 2025-08-06 23:34
Core Insights
- The article emphasizes the transformative potential of Vision-Language Models (VLMs) in enhancing the perception and cognitive capabilities of autonomous driving systems, enabling them not only to "see" but also to "understand" complex driving environments [2][3].

Group 1: VLM Applications in Autonomous Driving
- VLMs can surpass traditional visual models by integrating camera images or video streams to comprehend semantic information in traffic scenes, such as recognizing complex scenarios like "a pedestrian waving to cross the street" [6].
- VLMs facilitate the conversion of intricate visual scenes into clear natural language descriptions, enhancing the interpretability of decisions made by autonomous systems, which aids in debugging and increases trust among passengers and regulators [6].
- VLMs are crucial for natural language interactions in future smart cabins, allowing passengers to communicate intentions to vehicles through spoken commands [6].

Group 2: Scenario Generation and Testing
- CrashAgent is a multi-agent framework that utilizes multi-modal large language models to convert accident reports into structured scenarios for simulation environments, addressing the long-tail distribution issue in existing datasets [7] (a minimal sketch of this report-to-scenario step follows this summary).
- CurricuVLM is a personalized curriculum learning framework that leverages VLMs to analyze agent behavior and dynamically generate tailored training scenarios, improving safety in autonomous driving [13].
- TRACE is a framework that generates key test cases from real accident reports, significantly enhancing the efficiency of defect detection in autonomous driving systems [17].

Group 3: Out-of-Distribution (OOD) Scenario Generation
- A framework utilizing large language models is proposed to generate diverse OOD driving scenarios, addressing the challenges posed by the sparsity of such scenarios in urban driving datasets [21][22].
- The article also covers a method to automatically convert real-world driving videos into detailed simulation scenarios, enhancing the testing of autonomous driving systems [26].

Group 4: Enhancing Safety and Robustness
- WEDGE is a synthetic dataset created from generative vision-language models, aimed at improving the robustness of perception systems in extreme weather conditions [39][40].
- LKAlert is a predictive alert system that utilizes VLMs to forecast potential lane-keeping assist (LKA) risks, enhancing driver situational awareness and trust [54][55].

Group 5: Advancements in Decision-Making Frameworks
- The CBR-LLM framework combines semantic scene understanding with case retrieval to enhance decision-making in complex driving scenarios, improving accuracy and reasoning consistency [44][45].
- ORION is a holistic end-to-end autonomous driving framework that integrates visual-language instructed action generation, achieving superior performance in closed-loop evaluations [69][70].
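To ground the CrashAgent item in Group 2, here is a minimal sketch of the report-to-scenario step: prompt a language model for a machine-readable scenario and sanity-check the result. The JSON schema and the `llm` callable are assumptions; the paper's actual pipeline is multi-agent and considerably more elaborate.

```python
import json

# Hedged sketch of turning a free-text accident report into a structured
# simulation scenario. Schema and `llm` interface are illustrative only.

SCHEMA_HINT = """Return only JSON with keys:
road_type (str), weather (str), ego_action (str),
actors (list of {type, position, behavior})."""

def report_to_scenario(llm, report_text):
    """`llm` is any callable mapping a prompt string to a completion string."""
    prompt = f"{SCHEMA_HINT}\n\nAccident report:\n{report_text}"
    scenario = json.loads(llm(prompt))
    # Fail fast if the model omitted required fields.
    missing = {"road_type", "weather", "ego_action", "actors"} - scenario.keys()
    if missing:
        raise ValueError(f"scenario missing keys: {missing}")
    return scenario   # next step: hand off to a simulator as a concrete scene
```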
4,000 members strong: what exactly has the tech-obsessed "Whampoa Military Academy" of autonomous driving been doing?
自动驾驶之心· 2025-07-31 06:19
Core Viewpoint
- The article emphasizes the importance of creating an engaging learning environment in the field of autonomous driving and AI, aiming to bridge the gap between industry and academia while providing valuable resources for students and professionals [1].

Group 1: Community and Resources
- The community has established a closed loop across industry, academia, job seeking, and Q&A exchanges, built around the question of what kind of community is actually needed [1][2].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, streamlining the search for resources [2][3].
- A comprehensive technical roadmap with over 40 technical routes has been organized, covering everything from production applications to the latest VLA benchmarks [2][14].

Group 2: Educational Content
- The community provides a series of original live courses and video tutorials covering topics such as automatic labeling, data processing, and simulation engineering [4][10].
- Various learning paths are available for beginners, alongside advanced resources for those already engaged in research, ensuring a supportive environment for all levels [8][10].
- The community has compiled a wealth of open-source projects and datasets related to autonomous driving, facilitating quick access to essential materials [25][27].

Group 3: Job Opportunities and Networking
- The platform has established a job referral mechanism with multiple autonomous driving companies, allowing members to submit their resumes directly to desired employers [4][11].
- Continuous job sharing and position updates contribute to a complete ecosystem for autonomous driving professionals [11][14].
- Members can freely ask questions about career choices and research directions, receiving guidance from industry experts [75].

Group 4: Technical Focus Areas
- The community covers a wide range of technical focus areas including perception, simulation, planning, and control, with detailed learning routes for each [15][29].
- Specific topics such as 3D object detection, BEV perception, and online high-definition mapping are thoroughly organized, reflecting current industry trends and research hotspots [42][48].
- The platform also addresses emerging technologies like vision-language models (VLM) and diffusion models, providing insights into their applications in autonomous driving [35][40].
From the Institute of Automation, Chinese Academy of Sciences: sharing a Vision-Tactile-Language-Action model design and dataset construction
具身智能之心· 2025-07-30 00:02
Core Viewpoint
- The article discusses the development of a Vision-Tactile-Language-Action (VTLA) model aimed at enhancing robot manipulation in contact-rich tasks by integrating visual and tactile inputs with language instructions [2].

Group 1: Model Development
- The VTLA framework addresses the gap in applying vision-language models (VLM) to language-conditioned robotic manipulation, especially beyond visually dominated tasks [2].
- A low-cost multimodal dataset was created in a simulated environment, specifically designed for fingertip insertion tasks, consisting of visual-tactile-action-instruction pairs [2] (a sketch of what one such pair might look like follows this summary).

Group 2: Performance and Results
- The VTLA model achieved over a 90% success rate on unseen hole types, significantly outperforming traditional imitation learning methods and existing multimodal baselines [2].
- The model's capability was validated through real-world peg-in-hole assembly experiments, demonstrating strong simulation-to-reality (Sim2Real) transfer [2].
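For a sense of what a visual-tactile-action-instruction pair might contain, here is a hedged sketch of one training sample for a fingertip insertion task. Every field name and shape below is an assumption; the article does not specify the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical structure of one VTLA training pair. Shapes, field names,
# and the action parameterization are assumptions, not the released format.

@dataclass
class VTLASample:
    rgb: np.ndarray       # wrist-camera image, e.g., (224, 224, 3) uint8
    tactile: np.ndarray   # fingertip pressure map, e.g., (16, 16) float32
    instruction: str      # language goal for the episode
    action: np.ndarray    # commanded end-effector delta (dx, dy, dz, d_yaw)

sample = VTLASample(
    rgb=np.zeros((224, 224, 3), dtype=np.uint8),
    tactile=np.zeros((16, 16), dtype=np.float32),
    instruction="insert the peg into the round hole",
    action=np.array([0.0, 0.001, -0.002, 0.0]),
)
```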