Multimodal Large Model
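Core AI Demand-Side Logic Formally Extends to Multimodal Large Models: Recognition of Domestic Computing Power Strengthens! Tokens Consumption
Soochow Securities · 2025-10-08 01:27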
Investment Rating
- The report maintains an "Overweight" rating on the electronics industry [1]
Core Insights
- The investment logic for domestic computing power is shifting from the supply side to the demand side, with demand from AI applications becoming a new engine for domestic computing power [1]
- The release of multimodal large models marks a significant breakthrough, driving growth in AI applications and, in turn, demand for domestic computing power [5]
- Companies such as DeepSeek, Zhiyuan, and Alibaba are showing advances in model compatibility and performance, demonstrating how their models work together with domestic computing power [1][5]
Summary by Sections
Industry Trends
- The electronics industry is on a growth trajectory, with significant advances in AI capabilities and multimodal applications [5]
- Competition is shifting from single-modality language intelligence to multimodal generation and understanding, with domestic companies rapidly catching up to international standards [5]
Investment Recommendations
- Cloud computing power: recommended companies include Cambrian, Haiguang Information, Chipone, Shengke Communication, and Zhaoyi Innovation, with additional attention to Aojie Technology and Yutai Micro [2]
- Edge computing power: recommended companies include Amlogic, Rockchip, and Hengxuan Technology, with additional attention to Lexin Technology [2]
Key Company Valuations
- Cambrian (688256): market cap 554.31 billion, projected 2025E P/E 325.55 [7]
- Haiguang Information (688041): market cap 587.13 billion, projected 2025E P/E 205.37 [7]
- Chipone (688521): market cap 96.21 billion, projected 2025E P/E -963.16 [7]
- Zhaoyi Innovation (603986): market cap 142.33 billion, projected 2025E P/E 86.01 [7]
- Amlogic (688099): market cap 46.82 billion, projected 2025E P/E 44.12 [7]
- Rockchip (603893): market cap 94.89 billion, projected 2025E P/E 89.15 [7]
- Hengxuan Technology (688608): market cap 50.09 billion, projected 2025E P/E 57.88 [7]
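The valuation figures above can be turned into implied earnings with one division. The sketch below assumes the P/E ratio is defined as market cap divided by projected 2025 net profit; the summary states neither the definition nor the currency, so the function name and the unit handling are illustrative only. A negative P/E then simply corresponds to a projected loss.

```python
def implied_net_profit(market_cap: float, pe_ratio: float) -> float:
    """Earnings implied by a market cap and a P/E ratio (same unit as market_cap).

    Assumes P/E = market cap / net profit, which the source does not confirm.
    """
    if pe_ratio == 0:
        raise ValueError("P/E ratio must be non-zero")
    return market_cap / pe_ratio


# (market cap in billions, projected 2025E P/E) copied from the list above
valuations = {
    "Cambrian": (554.31, 325.55),
    "Haiguang Information": (587.13, 205.37),
    "Chipone": (96.21, -963.16),
    "Zhaoyi Innovation": (142.33, 86.01),
    "Amlogic": (46.82, 44.12),
    "Rockchip": (94.89, 89.15),
    "Hengxuan Technology": (50.09, 57.88),
}

for name, (cap, pe) in valuations.items():
    profit = implied_net_profit(cap, pe)
    label = "loss" if profit < 0 else "profit"
    print(f"{name}: implied 2025E net {label} of about {abs(profit):.2f} billion")
```

Read this way, a very high P/E such as Cambrian's reflects small projected earnings relative to market cap, not a different scale of company.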
Partner Recruitment! 4D Annotation, World Models, VLA, Model Deployment, and Other Directions
自动驾驶之心 · 2025-09-27 23:33
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5]
- The recruitment targets people with expertise in advanced models and technologies related to autonomous driving, such as large models, multimodal models, and 3D object detection [3]
- Candidates with a master's degree or higher from a QS top-200 university are preferred, with priority given to those with significant conference publications [4]
Group 2
- Benefits for partners include shared resources for job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives [5]
- Opportunities to collaborate on entrepreneurial projects are also highlighted [5]
- Interested parties are encouraged to get in touch via WeChat about collaboration in the autonomous driving field [6]
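Looking to Recruit Several Experts to Co-Build the Platform (4D Annotation, World Models, VLA, Model Deployment, and Other Directions)
自动驾驶之心 · 2025-09-25 07:36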
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5]
- The recruitment targets people with expertise in advanced technologies such as large models, multimodal models, and 3D object detection [3][4]
- The article highlights the benefits of joining, including shared resources for job seeking, PhD recommendations, and substantial cash incentives [5][6]
Recruiting Several Experts to Co-Build the Platform (Model Deployment, VLA, End-to-End)
自动驾驶之心 · 2025-09-04 08:42
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, research guidance, and hardware development [2][5]
- The main areas of expertise sought include large models, multimodal models, diffusion models, SLAM, 3D object detection, and closed-loop simulation [3]
- Candidates from QS top-200 universities with a master's degree or higher are preferred, especially those with significant conference publications [4]
Group 2
- Benefits for partners include shared resources for job seeking, PhD recommendations, and overseas study opportunities, along with substantial cash incentives [5]
- There are opportunities to collaborate on entrepreneurial projects [5]
- Interested parties are encouraged to get in touch via WeChat for further inquiries [6]
Many More Autonomous Driving Papers Accepted to ICCV 2025: We Noticed Some New Trends...
自动驾驶之心 · 2025-08-16 00:03
Core Insights
- The article discusses the latest trends and research directions in autonomous driving, highlighting multimodal large models and vision-language action generation as key areas of focus for both academia and industry [2][5].
Group 1: Research Directions
- The research community is concentrating on several areas, including combining MoE (Mixture of Experts) with autonomous driving, benchmark development for autonomous driving, and trajectory generation using diffusion models [2].
- Closed-loop simulation and world models are emerging as critical needs, driven by the limitations of real-world open-loop testing. The approach aims to reduce costs and improve model iteration efficiency [5].
- There is a notable emphasis on improving performance in object detection and OCC (occupancy prediction), with many ongoing projects addressing specific pain points in these areas [5].
Group 2: Notable Projects and Publications
- "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation", from Huazhong University of Science and Technology and Xiaomi, integrates vision and language for action generation in autonomous driving [5].
- "All-in-One Large Multimodal Model for Autonomous Driving", from Sun Yat-sen University and Meituan, contributes to the development of comprehensive models for autonomous driving [6].
- "MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding", from Chongqing University, aims to improve understanding of driving scenarios through multimodal analysis [8].
Group 3: Simulation and Reconstruction
- "Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images", from TUM, focuses on advanced reconstruction techniques for autonomous driving [14].
- "CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving", from Fraunhofer IVI and TU Munich, addresses dynamic scene reconstruction [16].
Group 4: Trajectory Prediction and World Models
- "Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics", from Hong Kong University of Science and Technology and Didi, targets trajectory prediction for autonomous driving [29].
- "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model", from the Chinese Academy of Sciences, develops a world model for end-to-end autonomous driving [32].
The Autonomous Driving "Multimodal Large Model" Discussion Group Has Been Launched!
自动驾驶之心 · 2025-06-26 12:56
Core Viewpoint
- The article presents a technology exchange platform for the autonomous driving field, focused on cutting-edge technologies and career development opportunities in the industry [1].
Group 1: Technologies and Research Areas
- The platform covers a wide range of topics, including embodied intelligence, visual large language models, world models, end-to-end autonomous driving, diffusion models, lane detection, and 2D/3D object tracking [1].
- It also covers perception techniques such as BEV perception, multimodal perception, occupancy detection, and multi-sensor fusion [1].
- Other areas include transformer models, large models, point cloud processing, online mapping, SLAM, optical flow estimation, depth estimation, trajectory prediction, high-precision maps, NeRF, and Gaussian Splatting [1].
Group 2: Career Development and Community Engagement
- The platform encourages discussion among professionals and students interested in autonomous driving, AI job opportunities, and hardware configuration [1].
- To join the community, readers are invited to add a WeChat assistant and provide their company or school, nickname, and research direction [1].
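A Breakthrough in Open-World Mobile Manipulation! First Multimodal Agent for Indoor Mobile Grasping Debuts, with the Fine-Tuned Model Reaching 90% Zero-Shot Action Accuracy in Real Environments
机器之心 · 2025-06-20 11:59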
Core Insights
- The article discusses "OWMM-Agent", a multimodal agent architecture designed specifically for Open World Mobile Manipulation (OWMM), which unifies global scene understanding, robot state tracking, and multimodal action generation in a single model [1][5].
Background
- Traditional mobile manipulation robots struggle with open-ended tasks in dynamic environments because they rely on pre-built 3D reconstructions or semantic maps, which are time-consuming and inefficient to produce [5].
- The challenges of the OWMM task include global scene reasoning, embodied decision-making, and the system integration needed to derive low-level control targets from a VLM base model [5].
Multimodal Agent Architecture
- The OWMM problem is modeled as a multi-round, multi-image reasoning and grounding task, allowing the multimodal large model to perform perception, reasoning, decision-making, and state updating end to end [6].
- To address VLM "hallucination", the team designed a data synthesis scheme based on the Habitat simulation platform, incorporating long-term environmental memory and transient state memory [8][9].
Experimental Validation
- The OWMM-VLM model showed significant advantages in simulated environments and achieved a 90% zero-shot action generation success rate in real-world tests with the Fetch robot [12].
- The model successfully executed tasks such as moving a soy milk box from a desk to a conference table, showing strong generalization [12].
Future Outlook
- The research shows that a VLM fine-tuned on large-scale simulated data can serve as a general-purpose foundation model for open-world mobile manipulation [14].
- The advances made by OWMM-Agent lay the groundwork for general-purpose household robots, potentially enabling voice-commanded home assistance in the near future [15].
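The multi-round loop described above (observe, reason over long-term scene memory plus transient state, emit an action, update the state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OWMM-Agent implementation: every name here (`AgentContext`, `Action`, `run_episode`, `scripted_policy`) is hypothetical, and the VLM is replaced by a scripted stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str       # hypothetical action vocabulary: navigate, pick, place, done
    target: str     # grounded target, here just a label
    new_state: str  # state summary written back after the action


@dataclass
class AgentContext:
    instruction: str
    scene_memory: List[str]      # long-term memory: pre-captured scene frames
    robot_state: str = "start"   # transient memory: current robot state summary
    history: List[str] = field(default_factory=list)


Policy = Callable[[AgentContext, str], Action]


def run_episode(ctx: AgentContext, policy: Policy,
                observe: Callable[[], str], max_rounds: int = 10) -> List[str]:
    """One round = observe, decide from memory + state + frame, act, update state."""
    for _ in range(max_rounds):
        frame = observe()
        action = policy(ctx, frame)
        ctx.history.append(f"{action.name}:{action.target}")
        ctx.robot_state = action.new_state
        if action.name == "done":
            break
    return ctx.history


def scripted_policy(ctx: AgentContext, frame: str) -> Action:
    """Stand-in for the fine-tuned VLM: a fixed plan keyed on the transient state."""
    plan = {
        "start": Action("navigate", "desk", "at_desk"),
        "at_desk": Action("pick", "soy_milk_box", "holding_box"),
        "holding_box": Action("navigate", "conference_table", "at_table"),
        "at_table": Action("place", "conference_table", "placed"),
        "placed": Action("done", "", "placed"),
    }
    return plan[ctx.robot_state]


ctx = AgentContext(
    instruction="Move the soy milk box from the desk to the conference table",
    scene_memory=["frame_desk", "frame_conference_table"],
)
history = run_episode(ctx, scripted_policy, observe=lambda: "current_camera_frame")
print(history)
```

Keeping the transient state as a value the policy both reads and rewrites each round is what lets a single model handle state tracking alongside decision-making, which is the design point the summary attributes to the architecture.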
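Development Status of the Global Multimodal Large Model Industry in 2025: AI Server and Computing Power Development Drives Explosive Market Growth [Image Gallery]
Qian Zhan Wang · 2025-04-22 07:44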
Core Insights
- The global multimodal large model industry has evolved through distinct phases: early exploration (1956-2005), rapid growth (2006-2019), the rise of large models (2020-2022), and, from 2023, widespread application [1]
Market Size and Growth
- The global AI hardware market, particularly servers, is projected to grow from $19.5 billion in 2022 to $34.7 billion by 2026, with a stated compound annual growth rate (CAGR) of 17.3% [3][4]
- Servers used specifically for generative AI are expected to increase their share of that market from 11.9% in 2023 to 31.7% by 2026 [4]
Computational Demand
- Demand for computational power in AI is increasing, with models like ChatGPT requiring significant resources; for instance, the GPT-3 model has 175 billion parameters, and its training consumed 3,640 PF-days of compute [5]
- A tenfold increase in model parameters can lead to more than a tenfold increase in computational requirements, depending on model architecture and hardware capabilities [5]
Large Model Market Dynamics
- The global large model market is growing rapidly, with an estimated size of $21 billion in 2023 and a projected $28 billion in 2024, a year-on-year increase of 33% [7]
Competitive Landscape
- According to the SuperCLUE benchmark report, GPT-4o leads the global model rankings with a score of 81, while six Chinese models have surpassed GPT-4-Turbo-0409, indicating a strong competitive presence in the market [10]
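The growth and compute figures in this summary can be checked with a few lines of arithmetic. The sketch below uses the standard CAGR and year-on-year formulas and the definition of a PF-day as one petaFLOP/s sustained for one day; the function names are illustrative. With the stated endpoints ($19.5 billion in 2022, $34.7 billion in 2026) the four-year CAGR comes out at roughly 15.5%, so the quoted 17.3% presumably refers to a different base year or period, which the summary does not specify.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def yoy_growth(previous: float, current: float) -> float:
    """Year-on-year growth as a fraction of the previous value."""
    return current / previous - 1


def pf_days_to_flops(pf_days: float) -> float:
    """Convert petaFLOP/s-days to total floating-point operations."""
    seconds_per_day = 86_400
    return pf_days * 1e15 * seconds_per_day


# AI server market: $19.5B (2022) to $34.7B (2026), figures from the summary
print(f"Implied 2022-2026 CAGR: {cagr(19.5, 34.7, 4):.1%}")

# Large model market: $21B (2023) to $28B (2024)
print(f"2023-2024 growth: {yoy_growth(21, 28):.1%}")

# GPT-3 compute: 3,640 PF-days expressed as total operations
print(f"3,640 PF-days = {pf_days_to_flops(3640):.3e} FLOPs")
```

The year-on-year figure agrees with the 33% stated in the summary, and 3,640 PF-days corresponds to roughly 3.1 × 10^23 floating-point operations.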