Multimodal Large Model
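The core demand-side logic of AI officially extends to multimodal large models: conviction in domestic computing power strengthens! Tokens consumption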
Soochow Securities· 2025-10-08 01:27
Investment Rating
- The report maintains an "Overweight" rating for the electronic industry [1]

Core Insights
- The investment logic for domestic computing power is evolving from the supply side to the demand side, with AI application demand becoming a new engine for "domestic computing power" [1]
- The release of multi-modal large models marks a significant breakthrough, driving the growth of AI applications and consequently the demand for domestic computing power [5]
- Key companies such as DeepSeek, Zhiyuan, and Alibaba are demonstrating advancements in AI model compatibility and performance, showcasing the collaborative capabilities of domestic computing power [1][5]

Summary by Sections

Industry Trends
- The electronic industry is experiencing a growth trajectory, with significant advancements in AI capabilities and multi-modal applications [5]
- The competition is shifting from single-language intelligence to multi-modal generation and understanding capabilities, with domestic companies rapidly catching up to international standards [5]

Investment Recommendations
- Recommended companies for cloud computing power include Cambrian, Haiguang Information, Chipone, Shengke Communication, and Zhaoyi Innovation, with a focus on companies like Aojie Technology and Yutai Micro [2]
- For edge computing power, recommended companies include Amlogic, Rockchip, and Hengxuan Technology, with attention to companies like Lexin Technology [2]

Key Company Valuations
- Cambrian (688256) has a market cap of 554.31 billion, with a projected PE ratio of 325.55 for 2025E [7]
- Haiguang Information (688041) has a market cap of 587.13 billion, with a projected PE ratio of 205.37 for 2025E [7]
- Chipone (688521) has a market cap of 96.21 billion, with a projected PE ratio of -963.16 for 2025E [7]
- Zhaoyi Innovation (603986) has a market cap of 142.33 billion, with a projected PE ratio of 86.01 for 2025E [7]
- Amlogic (688099) has a market cap of 46.82 billion, with a projected PE ratio of 44.12 for 2025E [7]
- Rockchip (603893) has a market cap of 94.89 billion, with a projected PE ratio of 89.15 for 2025E [7]
- Hengxuan Technology (688608) has a market cap of 50.09 billion, with a projected PE ratio of 57.88 for 2025E [7]
Partner recruitment! 4D annotation / world models / VLA / model deployment and other directions
自动驾驶之心· 2025-09-27 23:33
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5]
- The recruitment targets individuals with expertise in various advanced models and technologies related to autonomous driving, such as large models, multimodal models, and 3D target detection [3]
- Candidates are preferred to have a master's degree or higher from universities ranked within the QS200, with priority given to those with significant conference contributions [4]

Group 2
- The benefits for partners include resource sharing related to job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives [5]
- Opportunities for collaboration on entrepreneurial projects are also highlighted [5]
- Interested parties are encouraged to contact via WeChat for further inquiries regarding collaboration in the autonomous driving field [6]
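Looking to recruit several experts to co-build the platform (4D annotation / world models / VLA / model deployment and other directions)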
自动驾驶之心· 2025-09-25 07:36
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5]
- The recruitment targets individuals with expertise in various advanced technologies such as large models, multimodal models, and 3D target detection [3][4]
- The article highlights the benefits of joining, including resource sharing for job seeking, PhD recommendations, and substantial cash incentives [5][6]
Recruiting several experts to co-build the platform (model deployment / VLA / end-to-end)
自动驾驶之心· 2025-09-04 08:42
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, research guidance, and hardware development [2][5]
- The main areas of expertise sought include large models, multimodal models, diffusion models, SLAM, 3D object detection, and closed-loop simulation [3]
- Candidates from QS200 universities with a master's degree or higher are preferred, especially those with significant conference contributions [4]

Group 2
- The benefits for partners include resource sharing for job seeking, PhD recommendations, and overseas study opportunities, along with substantial cash incentives [5]
- There are opportunities for collaboration on entrepreneurial projects [5]
- Interested parties are encouraged to contact via WeChat for further inquiries [6]
Many more autonomous driving papers have been accepted to ICCV 2025, and we have noticed some new trends...
自动驾驶之心· 2025-08-16 00:03
Core Insights
- The article discusses the latest trends and research directions in the field of autonomous driving, highlighting the integration of multimodal large models and vision-language action generation as key areas of focus for both academia and industry [2][5].

Group 1: Research Directions
- The research community is concentrating on several key areas, including the combination of MoE (Mixture of Experts) with autonomous driving, benchmark development for autonomous driving, and trajectory generation using diffusion models [2].
- Closed-loop simulation and world models are emerging as critical needs in autonomous driving, driven by the limitations of real-world open-loop testing. This approach aims to reduce costs and improve model iteration efficiency [5].
- There is a notable emphasis on performance improvement in object detection and OCC (occupancy prediction), with many ongoing projects exploring specific pain points and challenges in these areas [5].

Group 2: Notable Projects and Publications
- "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation" is a significant project from Huazhong University of Science and Technology and Xiaomi, focusing on integrating vision and language for action generation in autonomous driving [5].
- "All-in-One Large Multimodal Model for Autonomous Driving" is another important work from Sun Yat-sen University and Meituan, contributing to the development of comprehensive models for autonomous driving [6].
- "MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding" from Chongqing University aims to enhance understanding of driving scenarios through multimodal analysis [8].

Group 3: Simulation and Reconstruction
- The project "Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images" from TUM focuses on advanced reconstruction techniques for autonomous driving [14].
- "CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving" from Fraunhofer IVI and TU Munich is another notable work that addresses dynamic scene reconstruction [16].

Group 4: Trajectory Prediction and World Models
- "Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics" from Hong Kong University of Science and Technology and Didi emphasizes the importance of trajectory prediction in autonomous driving [29].
- "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model" from the Chinese Academy of Sciences focuses on developing a comprehensive world model for autonomous driving [32].
The autonomous driving "Multimodal Large Model" discussion group has been launched!
自动驾驶之心· 2025-06-26 12:56
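自动驾驶之心 is a leading technical exchange platform in China, covering cutting-edge autonomous driving technology, the industry, and career growth. If your area is embodied AI, vision large language models, world models, end-to-end autonomous driving, diffusion models, lane detection, 2D/3D object tracking, 2D/3D object detection, BEV perception, multimodal perception, Occupancy, multi-sensor fusion, transformer, large models, point cloud processing, online mapping, SLAM, optical flow estimation, depth estimation, trajectory prediction, HD maps, NeRF, Gaussian Splatting, planning and control, model deployment, autonomous driving simulation and testing, product management, hardware configuration, AI job seeking, or similar, you are welcome to join the 自动驾驶之心 community to discuss and exchange ideas! Add the assistant on WeChat to join the group, with a note giving your company/school + nickname + research direction ...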
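A breakthrough in open-world mobile manipulation! The first multimodal agent for indoor mobile grasping debuts, with the fine-tuned model reaching 90% zero-shot action accuracy in real environments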
机器之心· 2025-06-20 11:59
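In the field of home service robots, enabling a robot to understand natural-language instructions in open environments, dynamically plan its path of action, and execute manipulation precisely has long been a core challenge for academia and industry.

Recently, a research team from Shanghai Artificial Intelligence Laboratory, together with the National University of Singapore, the University of Hong Kong, and other institutions, proposed the "OWMM-Agent" embodied agent: the first multimodal agent (VLM Agent) architecture designed specifically for open-world mobile manipulation (OWMM), and the first to unify the modeling of global scene understanding, robot state tracking, and multimodal action generation. The work also synthesizes agent trajectory data in a simulator and uses it to fine-tune OWMM-VLM, a multimodal large model for this task. In real-environment tests, the model's zero-shot single-step action prediction accuracy reaches 90%.

Paper link: https://arxiv.org/pdf/2506.04217
Github page: https://github.com/HHYHRHY/OWMM-Agent

1. Background: mobile grasping tasks under open semantics

When handling open-ended instructions such as "clear the dining table and put the fruit back in the bowl" in a home setting, traditional mobile grasping robots often rely on a pre-built 3D reconstruction or semantic map of the scene, which is time-consuming and copes poorly with dynamic environments. The core difficulties of the OWMM task are:

2. OWMM-Agent: using a VLM to rebuild the robot "brain ...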
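Development status of the global multimodal large model industry in 2025: growth in AI servers and computing power drives explosive market growth [Charts]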
Qian Zhan Wang· 2025-04-22 07:44
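Major listed companies in the industry: Alibaba (09988.HK, BABA.US); Baidu (09888.HK, BIDU.US); Tencent (00700.HK, TCEHY); iFLYTEK (002230.SZ); Wondershare Technology (300624.SZ); 360 Security Technology (601360.SH); Kunlun Tech (300418.SZ); CloudWalk Technology (688327.SH); TRS (300229.SZ), etc.

Core data in this article: market size; computing power; rankings

Development history of the global multimodal large model industry

The global large model industry began with an early exploration period (1956-2005), during which the discipline of artificial intelligence was born and neural network models started to develop. It then entered a rapid growth period (2006-2019), when the concept of deep learning was reintroduced and models such as Transformer drove the industry forward. The years 2020 to 2022 marked the rise of large models, with parameter counts expanding rapidly, and 2022 is widely regarded as the first year of the large model era. Since 2023, large models have entered a period of broad application, with deep applications continuing to expand across fields. These stages are not strict divisions; they reflect both the continuity and the phased nature of large model development.

Source: Qianzhan Industry Research Institute (前瞻产业研究院)

Current state of global AI servers

Because large models place heavy demands on computing power and data, the server infrastructure they require will take an increasingly large share of the AI infrastructure market. ID ...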