Multimodal Large Models
Soochow Securities: How Far Are We from a True Embodied-Intelligence Large Model?
智通财经网· 2025-08-09 14:20
Core Viewpoint
- Embodied large models will continue to evolve along three axes: modality expansion, reasoning mechanisms, and data composition [1][4]

Group 1: Importance of High-Intelligence Large Models for Humanoid Robots
- The key to industrializing humanoid robots lies in overcoming the limitations of traditional industrial robots, which rely on deterministic control logic and lack perception, decision-making, and feedback capabilities [2]
- Humanoid robots aim to be "general intelligent agents" with a complete chain of perception, reasoning, and execution, which requires large models that provide multi-modal understanding and generalization [2]
- The rise of multi-modal large models gives humanoid robots a "primary brain" and starts their intelligent evolution from 0 to 1, although overall intelligence is still at the initial L2 stage [2]

Group 2: Progress of Large Models in Robotics from Architecture and Data Perspectives
- The rapid evolution of large models in robotics is driven by breakthroughs in both architecture and data [3]
- Models have developed from early language planning models to end-to-end action output, integrating multi-modal perception into a unified model space [3]
- A structured data system supporting pre-training and practical capabilities has emerged, with real-world data collection relying heavily on high-precision motion capture equipment [3]

Group 3: Future Development Directions of Large Models
- Future embodied large models are expected to expand their modalities by adding tactile and temperature perception channels [4]
- Architectures like Cosmos aim to endow robots with "imagination" through state prediction, enhancing environmental modeling and reasoning [4]
- Training on a mix of simulation and real data is becoming mainstream, and high-standard, scalable training environments are crucial for a general robot training system [4]

Group 4: Investment Recommendations
- In the model sector, companies to watch include Galaxy General, Star Motion Era, and Zhiyuan Robotics [5]
- In data collection, companies to watch include Qingtong Vision, Lingyun Light, and Aobi Zhongguang [5]
- For data training environments, Tianqi Co., Ltd. is recommended [5]
Robot Large Model In-Depth Report: How Far Are We from a True Embodied-Intelligence Large Model?
Xin Lang Cai Jing· 2025-08-09 10:32
Core Insights
- The key to industrializing humanoid robots lies in overcoming the limitations of traditional industrial robots, which rely on deterministic control logic and lack perception, decision-making, and feedback capabilities [1]
- The rise of multimodal large models provides humanoid robots with an "initial brain," enabling intelligent evolution and continuous improvement in model capabilities and product performance through a data flywheel [1]
- Current models are still at the initial L2 stage and face challenges in modeling methods, data scale, and training paradigms; high-intelligence large models are a core variable on the path to general humanoid robots [1]

Progress in Robot Large Models
- The rapid evolution of robot large models is driven by breakthroughs in architecture and data [2]
- Architecturally, models have progressed from early language planning models to end-to-end action output, integrating multimodal perception capabilities [2]
- In 2024, the π0 model introduced an action expert model with an output frequency of 50Hz, and in 2025 the Helix model reached a control frequency of 200Hz, improving operational fluidity and response speed [2]
- The data structure now combines internet data, simulation data, and real-machine action data, with real-machine data collection relying heavily on high-precision motion capture equipment [2]
- The mainstream training paradigm is shifting from "low-quality pre-training + high-quality fine-tuning" to "data pile optimization," marking a change in how leaps in model intelligence are achieved [2]

Future Development Directions of Large Models
- Future embodied large models will evolve in three areas: modality expansion, reasoning mechanisms, and data composition [3]
- The next phase is expected to introduce additional sensory channels such as touch and temperature, enhancing the robot's perception capabilities [3]
- Architectures like Cosmos aim to provide robots with "imagination" through state prediction, creating a closed loop of perception, modeling, and decision-making [3]
- Training on a mix of simulation and real data is becoming the mainstream direction, and high-standard, scalable training environments are crucial for general robot training systems [3]

Investment Recommendations
- In the model sector, companies to watch include Galaxy General, Star Motion Era, and Zhiyuan Robotics [4]
- In data collection, companies to watch include Qingtong Vision, Lingyun Light, and Aobi Zhongguang [4]
- For data training environments, Tianqi Co., Ltd. is recommended [4]
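As a rough illustration of what the control frequencies cited in this report mean in practice, the sketch below converts a frequency into the time between consecutive action outputs. The function name `control_period_ms` and the dictionary labels are hypothetical, chosen only for this example; the 50Hz and 200Hz figures are the ones reported above, and nothing here describes how either model is actually implemented.

```python
def control_period_ms(frequency_hz: float) -> float:
    """Return the time between consecutive action outputs, in milliseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


# Frequencies as reported in the article; the labels are illustrative only.
reported_hz = {"pi0 (2024)": 50.0, "Helix (2025)": 200.0}
periods_ms = {name: control_period_ms(hz) for name, hz in reported_hz.items()}

for name, ms in periods_ms.items():
    print(f"{name}: one action output every {ms:g} ms")
```

Under these figures, moving from 50Hz to 200Hz cuts the interval between action outputs from 20 ms to 5 ms, which is one way to read the report's claim about improved response speed.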
A Survey of China's "Robot Cities": Shenzhen, Guangzhou, and Shanghai Lead, with Beijing and Suzhou Close Behind
21世纪经济报道· 2025-08-08 15:21
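Reporter: Zheng Wei; Intern: Wang Shuo
Editor: Chen Jie

On August 8, the 2025 World Robot Conference opened in Beijing, with more than 200 robotics companies from around the world once again competing on the same stage. Since a humanoid robot dance at the Spring Festival Gala went viral early this year, the robotics industry has repeatedly topped trending-topic lists and has drawn tailwinds from capital, policy, and other directions. Which cities have seized the opportunity?

According to statistics compiled by Southern Finance (Nanfang Caijing) reporters on the Tianyancha platform, as of August 4, 2025, 22 cities nationwide each host more than 10,000 robotics companies, with cities in the eastern, central, and western regions all on the list. Eastern cities have a clear advantage in scale: Shenzhen, Guangzhou, and Shanghai host 65,291, 53,288, and 45,801 robotics companies respectively, leading the country. Beijing and Suzhou follow closely, each with more than 30,000 robotics companies.

As the industry enters a period of rapid growth, local governments are accelerating their plans. By the reporters' incomplete count, 16 cities including Shenzhen, Shanghai, and Beijing have issued dedicated policies supporting the robotics industry. Beijing and Shanghai have established, respectively, the National-Local Co-built Embodied Intelligent Robot Innovation Center and the National-Local Co-built Humanoid Robot Innovation Center, and Zhejiang, Anhui, Hubei, Guangdong, Sichuan, and other provinces have set up provincial-level robot innovation centers.

Ren Yutong, executive president of the Guangdong Robot Association, told Southern Finance reporters that since the start of this year, driven by both policy and capital, robotics industry clusters in different regions and cities, in their technical paths, application scenarios ...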
Tencent Research Institute AI Express 20250808
腾讯研究院· 2025-08-07 16:01
Group 1: GPT-5 and MiniMax Voice Model
- OpenAI has disclosed four versions of GPT-5: standard, mini, nano, and chat, with varying capabilities for different user tiers [1]
- Community testing shows GPT-5 achieves 90% accuracy in SimpleBench reasoning tests, with improvements in programming and visual performance [1]
- MiniMax has launched a new voice generation model, Speech 2.5, supporting 40 languages and enabling natural switching between languages while preserving voice characteristics [2]

Group 2: Xiaohongshu and MiniCPM Models
- Xiaohongshu has open-sourced its first multimodal large model, dots.vlm1, which closely rivals leading closed-source models in visual understanding and reasoning [3]
- The MiniCPM-V 4.0 model has been released with only 4 billion parameters, achieving state-of-the-art results while being optimized for mobile use [4]
- MiniCPM-V 4.0 shows significant throughput advantages under increased concurrent user loads, reaching 13,856 tokens per second [4]

Group 3: Qwen Models and Chess Competition
- Qwen has introduced two smaller models, Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507, both suitable for edge deployment and achieving high performance in reasoning tasks [6]
- The first round of the inaugural large model chess competition saw OpenAI's o3 achieve a perfect score against o4-mini, while Grok 4 advanced after a tie with Gemini 2.5 Pro [7]

Group 4: Gemini's Guided Learning and Skild AI
- Google has launched a "Guided Learning" tool for Gemini, designed to help users build deep understanding through interactive learning [8]
- Skild AI has developed an end-to-end visual perception control strategy that allows robots to navigate complex environments with unprecedented adaptability [9]

Group 5: Li Auto and a16z Insights
- Li Auto has introduced the VLA model, which integrates visual, language, and action components to enhance vehicle decision-making [10]
- a16z analysts predict that the AI application generation platform market will move towards specialization rather than a winner-takes-all scenario, with over 70% of users active on a single platform [12]
A 60-Billion AI Giant Raises Nearly HKD 5.3 Billion Within a Year
Sou Hu Cai Jing· 2025-08-07 11:29
Financing and Capital Structure
- In July, the company completed a financing round of HKD 2.5 billion, bringing the total raised in less than a year to nearly HKD 5.3 billion [1][3][7]
- The recent placement involved issuing 1.667 billion new B shares at HKD 1.5 per share, representing 4.31% of the total issued shares [3][7]
- Since its establishment, the company has raised a total of USD 5.225 billion across 12 financing rounds from investors including IDG Capital and Alibaba [7]

Financial Performance
- The company has not been profitable since its inception. Losses over the last three years were CNY 6.045 billion, CNY 6.44 billion, and CNY 4.278 billion, narrowing in the most recent year but still significant [9][12]
- Revenue for 2022 to 2024 was CNY 3.809 billion, CNY 3.406 billion, and CNY 3.772 billion, declining in 2023 before growing 10.75% in 2024 [9][11]
- The core revenue driver has shifted to generative AI, which brought in CNY 1.184 billion and CNY 2.404 billion in the last two years, a growth of 103.1% [9][11]

Organizational Changes
- The company has undergone significant organizational restructuring, including the appointment of two new executive directors and the transition of a co-founder to lead the AI chip business [1][15][20]
- Employee numbers have decreased from 5,098 to 3,756 over the past three years, reducing employee welfare expenses, which has been one factor in narrowing losses [17][18]

Strategic Focus
- The company plans to allocate 30% of the recent funds to core business development, another 30% to generative AI research, and 20% to exploring AI technology integration in innovative verticals [7]
- The company aims to improve organizational efficiency and focus on strategic growth areas, particularly AI infrastructure and applications [15][17]
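The figures in this summary can be cross-checked with simple arithmetic. The sketch below is a minimal check, assuming the rounded inputs quoted above; the helper `growth_pct` and the variable names are hypothetical and introduced only for this illustration.

```python
def growth_pct(previous: float, current: float) -> float:
    """Year-over-year growth, as a percentage of the previous value."""
    return (current - previous) / previous * 100.0


# Placement proceeds: 1.667 billion shares at HKD 1.5 per share (HKD billions).
placement_proceeds_hkd_bn = 1.667 * 1.5

# Total revenue, 2023 -> 2024 (CNY billions, as quoted in the summary).
revenue_growth_2024 = growth_pct(3.406, 3.772)

# Generative AI revenue over the last two years (CNY billions, as quoted).
gen_ai_growth = growth_pct(1.184, 2.404)

print(f"Placement proceeds: about HKD {placement_proceeds_hkd_bn:.2f} billion")
print(f"Revenue growth: {revenue_growth_2024:.2f}%")
print(f"Generative AI revenue growth: {gen_ai_growth:.2f}%")
```

With these rounded inputs, the placement works out to roughly HKD 2.5 billion and revenue growth to about 10.75%, matching the summary. The generative AI growth comes out slightly below the quoted 103.1%, presumably because the summary's percentage was computed from unrounded revenue figures.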
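Xiaohongshu Open-Sources Multimodal Large Model dots.vlm1: Unlocking New Capabilities in Image-Text Understanding and Math Problem Solving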
Sou Hu Cai Jing· 2025-08-07 10:31
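Xiaohongshu's Humane Intelligence Lab (hi lab) recently announced that it has open-sourced its latest multimodal large model, dots.vlm1. The model is built on DeepSeek V3 and equipped with NaViT, a 1.2-billion-parameter vision encoder developed in-house by Xiaohongshu, and it shows strong multimodal understanding and reasoning capabilities.

According to hi lab, dots.vlm1's performance on multiple visual evaluation sets is already close to that of current leading models such as Gemini 2.5 Pro and Seed-VL1.5 thinking. On benchmarks such as MMMU, MathVision, and OCR Reasoning in particular, dots.vlm1 shows excellent image-text understanding and reasoning. It can understand complex charts with interleaved images and text, interpret the meaning behind memes, analyze differences between product ingredient lists, and accurately identify the names and background of artifacts and paintings in museums.

On text reasoning tasks, dots.vlm1 performs roughly on par with DeepSeek-R1-0528, showing a degree of general capability in math and code. On more diverse reasoning tasks such as GPQA, however, dots.vlm1 still has room for improvement. Even so, its overall performance is already considerable, and its visual multimodal capabilities in particular are close to state-of-the-art (SOTA) level.

[Benchmark comparison table truncated in the source; surviving column headers: Qwen2.5VL-72B, Gemini2.5 ...]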
Qianli Technology (601777.SH): Forms Strategic Synergy with Jieyue Xingchen in the Smart Cockpit Field
Ge Long Hui APP· 2025-08-07 08:13
Core Viewpoint
- Qianli Technology (601777.SH) has formed a strategic collaboration with Jieyue Xingchen in the smart cockpit sector, focusing on developing next-generation smart cockpit products that use AI capabilities [1]

Group 1: Strategic Collaboration
- The partnership aims to leverage multi-modal large models and end-to-end voice large models to enhance product offerings [1]
- The collaboration will include the development of a large-model-native operating system, referred to as Agent OS, and AI smart assistants [1]

Group 2: Product Development Focus
- The goal is to create industry-leading Natural UI products for natural interaction [1]
The Embodied Intelligence Heart Technology Exchange Group Has Been Established!
具身智能之心· 2025-08-07 02:38
Group 1
- The newly established Embodied Intelligence Heart Technology Exchange Group covers a range of advanced topics, including VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1]
- Interested readers can add the assistant's WeChat account, AIDriver005, to join the community [2]
- To speed up the joining process, include a note with your institution or school, name, and research direction [3]
SenseTime CFO Wang Zheng Speaks: More Than 200 Days After "Re-CoFound", What Results Has "1+X" Delivered?
第一财经· 2025-08-06 12:53
Core Viewpoint
- The article discusses the transformation of SenseTime through its "1+X" organizational restructuring, emphasizing the emergence of a new entrepreneurial spirit among its young leaders and the financial accountability they now embrace [6][11][15]

Group 1: Organizational Changes
- SenseTime's "1+X" strategy was officially announced on December 3, 2024, marking a significant restructuring aimed at fostering a new entrepreneurial collective to seize opportunities in the AI 2.0 era [6][11]
- The restructuring has led to the establishment of a five-member executive committee, enhancing decision-making efficiency and fostering a collaborative environment [7][11]
- The new structure encourages a focus on core business areas while allowing for flexibility and rapid adaptation in emerging vertical markets [12][14]

Group 2: Financial Accountability and Performance
- The restructuring has resulted in a noticeable increase in financial oversight among the CEOs of the "X" businesses, who are now more proactive in managing their financial situations [15][16]
- Each of the six "X" enterprises has successfully raised over 2 billion yuan in cumulative financing, indicating strong investor interest and market validation [17][18]
- The establishment of the "X" businesses has positively impacted the parent company's cash flow, allowing more resources to be allocated to core operations [17]

Group 3: Strategic Focus and Market Position
- SenseTime is focusing on a "three-in-one" strategy that integrates large devices, large models, and applications, while still maintaining its core competency in computer vision (CV) [21][22]
- The company has seen significant growth in its CV business, with increased willingness from clients to invest, particularly in Hong Kong and overseas markets [22][23]
- SenseTime's extensive experience in CV is viewed as a competitive advantage in developing multi-modal large models, which are essential for future AI advancements [24][25]

Group 4: Technological Advancements
- The latest model, released on July 27, 2025, showcases significant improvements in multi-modal reasoning capabilities, reflecting the company's commitment to innovation [27]
- SenseTime's strategic focus on integrating visual data with AI applications positions it well for future growth in the rapidly evolving AI landscape [24][25]
The Eye of "AI": An Evolution of Visual Intelligence | 2025 ITValue Summit Preview, WAIC On-Site Edition: AI Implementation Guide Series
Tai Mei Ti APP· 2025-08-06 11:39
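The exhibition floor of WAIC, the World Artificial Intelligence Conference, was bustling. DeepGlint (格灵深瞳) CEO Wu Yizhou found the venue livelier than in previous years, with a more varied mix of attendees and products, and was struck by the depth of the single-point AI applications shown by many large companies. The path by which AI applications are truly entering industry has become clearer.

In the livestream of TMTPost's "2025 ITValue Summit Preview, WAIC On-Site Edition: AI Implementation Guide Series", Wu Yizhou and TMTPost co-founder Liu Xiangming discussed the evolution of visual intelligence and the position of technology vendors amid the current AI upgrade.

DeepGlint has long focused on research and development in vision algorithms and multimodal large models. As a technology company that lived through the previous technology era, it experiences this wave of intelligence in a markedly different way: products now have "growth potential". A point Wu stressed repeatedly in the conversation is that products must be usable, must work well, and must keep growing over time. This is both DeepGlint's requirement for its own products and the vision it shares with customers as a technology vendor building alongside them.

"In the past, we gave customers a general-purpose tool. Now, with agents, it has become a personalized tool with memory, something like a partner or an executive partner, and the applications have become more fine-grained and more mature," Wu said. After several years of evolution, DeepGlint has built an end-to-end system made up of models, algorithms, and integrated software-hardware products and services. Still, she remains very rational about how far current AI is from truly deployed applications, from deepening its use within an industry as if hyper-converged with that industry's experts, ...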