量子位
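Robots can now "use their hands"! 银河通用 (Galbot) is the first to crack in-hand rotation at any palm orientation, and handles screw-tightening and nail-hammering alike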
量子位· 2025-11-10 00:30
Core Insights
- The article covers DexNDM, a model developed by 银河通用 (Galbot) that lets dexterous hands perform complex tasks such as in-hand rotation and tool use, narrowing the gap between simulation and real-world deployment [2][4][55].
Group 1: DexNDM Model Capabilities
- DexNDM rotates objects stably in the hand regardless of their size or shape, generalizing across objects and across hand poses [5][6].
- The model works under difficult wrist postures, sustaining continuous rotation of long objects and stable manipulation of small ones [6][17].
- It extends the robot to complex tasks such as screw tightening and furniture assembly, a step up from simple grasping to dexterous manipulation [21][64].
Group 2: Technical Innovations
- DexNDM uses a joint-wise neural dynamics model in which each joint independently predicts its own next state, which improves data efficiency and generalization across tasks [8][10].
- An automated data collection strategy generates contact-rich data without manual intervention, making learning more efficient [11][14].
- A residual policy network is trained to close the gap between simulation and reality, so that strategies learned in simulation transfer to real-world scenarios [15].
Group 3: Importance of Dexterous Manipulation
- Dexterous manipulation is what lets robots move from basic capabilities to productive work, since useful tasks require both locomotion and manipulation [24][28].
- In-hand rotation and tool use are regarded as the pinnacle of dexterous manipulation and remain a major challenge in robotics research [37][38].
- Advances in dexterous manipulation are expected to yield robots that handle a wide range of tasks, moving beyond demonstrations to real productive capability [58][65].
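The joint-wise idea and the residual correction described in Group 2 can be illustrated with a minimal sketch. This is not DexNDM's implementation: the toy linear "networks", the function names (`make_joint_model`, `predict_hand`, `residual_action`), the 16-joint hand, and the clipping limit are all assumptions made for illustration. The only points taken from the summary are that each joint predicts its own next state independently and that a residual term corrects the simulation-trained policy.

```python
import random

def make_joint_model(seed):
    """Build a toy stand-in for one learned per-joint dynamics network (hypothetical)."""
    rng = random.Random(seed)
    w_state = rng.uniform(0.8, 1.0)    # placeholder "learned" weights
    w_action = rng.uniform(0.05, 0.15)

    def predict_next(joint_state, joint_action):
        # Assumption: a joint's next state depends only on its own state and command.
        return w_state * joint_state + w_action * joint_action

    return predict_next

def predict_hand(joint_models, states, actions):
    """Joint-wise factorization: every joint predicts its own next state independently."""
    return [model(s, a) for model, s, a in zip(joint_models, states, actions)]

def residual_action(base_action, residual, limit=0.1):
    """Add a bounded correction to the simulation-trained policy's action."""
    return [b + max(-limit, min(limit, r)) for b, r in zip(base_action, residual)]

models = [make_joint_model(seed) for seed in range(16)]  # 16 joints is an arbitrary choice
next_states = predict_hand(models, [0.0] * 16, [1.0] * 16)
corrected = residual_action([0.5, -0.2], [1.0, -0.03])
```

Because each per-joint model sees only a low-dimensional input, the same amount of data covers its input space far more densely than it would for a single whole-hand model, which is one plausible reading of the data-efficiency claim above.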
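量子位 2025 annual rankings: application countdown! Five award categories across three dimensions (companies, products, people) are about to close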
量子位· 2025-11-09 07:01
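From the organizing committee, 凹非寺
量子位 | WeChat official account QbitAI
To let more practitioners feel the leap of the AI wave, and to offer applause and encouragement to more fellow travelers, we are officially opening registration for the "2025 AI Annual Rankings" (2025人工智能年度榜单).
The selection covers three dimensions (companies, products, people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.
Company list: 2025 AI Leading Company of the Year; 2025 AI High-Potential Startup of the Year
Product list: 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year
People list: 2025 AI Person in Focus of the Year
Detailed selection criteria and registration instructions follow.
2025 AI Leading Company of the Year
Open to China's AI sector; the award goes to the companies with the strongest overall capability.
Eligibility:
1. Registered in China, or with a main business that primarily serves the Chinese market;
2. The main business belongs to AI or related industries, or AI is already widely applied in the main business, and the company leads its segment.
Selection criteria:
1. Business capability | market share and revenue scale, business model and profitability, customer count and industry coverage, growth potential and sustainability, etc.;
2. Technical capability | research strength and technical output, share of R&D investment, core technical competitiveness, innovation cases and real-world deployment, etc.;
3. Capital capability | financing ...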
银河通用 (Galbot)'s new model unifies robot navigation tasks; the 7B-parameter model supports real-time deployment
量子位· 2025-11-09 07:01
Core Viewpoint
- The article covers NavFoM, a foundation model for embodied navigation that aims to unify navigation tasks across robots and scenarios, moving from specialized to general-purpose navigation [1][20].
Group 1: Unified Navigation Paradigm
- NavFoM rests on one idea: cast the navigation tasks of different robots into a common paradigm, in which streaming video from the robot plus a natural-language navigation instruction is used to predict an action trajectory [3][21].
- The model supports vision-language navigation, target search, target following, and autonomous driving, in indoor and outdoor environments, on quadrupeds, wheeled robots, humanoids, drones, and cars [3][21].
Group 2: Model Structure and Features
- The model includes TVI Tokens, a scalable way for the model to interpret images under different tasks and camera settings [5].
- NavFoM uses a Budget-Aware Token Sampling Strategy (BATS) to adaptively sample key frames during navigation, which keeps the 7B-parameter model efficient enough for real-time deployment without sacrificing performance [6][11].
Group 3: Training Data and Performance
- The team collected 8 million navigation samples spanning vision-language navigation, target navigation, target tracking, and autonomous driving, across various robot types and scenarios [12][21].
- NavFoM achieved state-of-the-art (SOTA) or SOTA-comparable results on multiple public benchmarks without task-specific fine-tuning [16][21].
Group 4: Future Implications
- NavFoM is a significant step toward general-purpose embodied navigation models and toward navigation technology that scales across industries [20][21].
- The team hopes to draw more attention to embodied navigation research and to encourage new technologies, datasets, and benchmarks that support innovation in intelligent services [21].
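A budget-aware frame sampler of the kind described under Group 2 might look like the sketch below. It is a hedged illustration, not NavFoM's BATS: the function name `sample_frames`, the fixed tokens-per-frame cost, and the quadratic spacing that favors recent frames are all assumptions. The only idea taken from the summary is that key frames are chosen so the total token count stays within a budget.

```python
def sample_frames(num_frames, tokens_per_frame, token_budget):
    """Pick frame indices whose total token cost fits the budget (illustrative only).

    Assumption: every frame costs the same number of tokens, and recent frames
    matter more, so indices are spaced densely near the present and sparsely
    in the past.
    """
    # At least one frame is kept, even if a single frame would exceed the budget.
    max_frames = max(1, token_budget // tokens_per_frame)
    if num_frames <= max_frames:
        return list(range(num_frames))   # everything fits, keep all frames
    latest = num_frames - 1
    if max_frames == 1:
        return [latest]                  # only room for the current frame
    picked = set()
    for k in range(max_frames):
        frac = k / (max_frames - 1)      # 0.0 (now) .. 1.0 (oldest frame)
        age = round(latest * frac ** 2)  # quadratic spacing: small ages are dense
        picked.add(latest - age)
    return sorted(picked)

short_clip = sample_frames(num_frames=5, tokens_per_frame=64, token_budget=1000)
long_clip = sample_frames(num_frames=100, tokens_per_frame=64, token_budget=640)
```

With this shape, the per-step cost of the model is bounded by the budget no matter how long the robot has been navigating, which is the property real-time deployment needs.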
Big Tech's new AI battleground: AQ surges as Ant Group bets on the health sector
量子位· 2025-11-09 07:01
Core Viewpoint
- Ant Group has upgraded its "Digital Healthcare Division" to the "Healthcare Business Group," aiming to accelerate healthcare services as a strategic pillar of the company [2][3].
Group 1: Strategic Adjustments
- After the restructuring, Ant Group's business matrix comprises five core segments: Ant International, Ant Digital Technology, OceanBase, Alipay Business Group, and the newly formed Healthcare Business Group [3].
- The timing is notable because it matches a broader shift in the AI industry from competing on models to focusing on practical applications and commercialization [5][7].
Group 2: AI Application in Healthcare
- Ant Group's AI strategy is taking shape around three areas: lifestyle services, financial services, and healthcare services [5].
- The AI health management app AQ has been a clear success, passing 10 million monthly active users within four months and posting a compound growth rate of 83.4% in Q3 2025, far above the industry average of 13.5% [8][10].
Group 3: Market Dynamics and Trends
- Competition among major tech companies is shifting from parameter optimization to differentiation at the application level, with the focus on creating value through AI models [13][14].
- The healthcare sector is becoming more competitive, driven by the need for specialized AI applications that can handle complex healthcare problems [6][19].
Group 4: Long-term Vision and Market Potential
- China's healthcare market is projected to exceed 20 trillion RMB by 2025, driven by an aging population and growing demand for chronic disease management and personalized health services [44][46].
- Ant Group's earlier investments in digital healthcare infrastructure position it to capture these opportunities as it moves from being a connector in the healthcare system to an active participant with its own service capabilities [22][39].
Group 5: Challenges and Future Outlook
- The move to AI-driven healthcare services brings challenges, including deep integration with existing healthcare systems and building distinctive competitive advantages in vertical markets [17][19].
- The success of Ant Group's healthcare initiatives will depend on how well it handles these challenges and applies its existing capabilities to the changing demands of the healthcare market [59][61].
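量子位 2025 annual rankings: application countdown! Five award categories across three dimensions (companies, products, people) are about to close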
量子位· 2025-11-08 04:10
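From the organizing committee, 凹非寺
量子位 | WeChat official account QbitAI
To let more practitioners feel the leap of the AI wave, and to offer applause and encouragement to more fellow travelers, we are officially opening registration for the "2025 AI Annual Rankings" (2025人工智能年度榜单).
The selection covers three dimensions (companies, products, people) with five award categories. Companies are warmly invited to apply! Let us witness the stars of the year together and light the way forward.
Company list: 2025 AI Leading Company of the Year; 2025 AI High-Potential Startup of the Year
Product list: 2025 AI Outstanding Product of the Year; 2025 AI Outstanding Solution of the Year
People list: 2025 AI Person in Focus of the Year
Detailed selection criteria and registration instructions follow.
2025 AI Leading Company of the Year
Open to China's AI sector; the award goes to the companies with the strongest overall capability.
Eligibility:
1. Registered in China, or with a main business that primarily serves the Chinese market;
2. The main business belongs to AI or related industries, or AI is already widely applied in the main business, and the company leads its segment.
Selection criteria:
2025 AI High-Potential Startup of the Year
Focused on innovative startups in China's AI sector; the award goes to the AI startups with the greatest investment value and growth potential.
Eligibility:
Selection criteria:
3. Has a mature product or service that has already won real customer adoption and market recognition;
4. Over the past year, in technology ...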
Robot training: a Beijing college guy finds a new skill-based way to play
量子位· 2025-11-08 04:10
Core Viewpoint
- The article covers COLA, a new method for human-robot collaboration that lets humanoid robots interact and cooperate with humans using proprioception alone, with no external sensors [10][17][23].
Group 1: Introduction to COLA
- The article opens with a male student working alongside a robot on various tasks, showing that the robot can assist without conventional controls [3][5].
- The student and the robot interact through simple physical cues rather than remote controls or voice commands [8][10].
Group 2: Technical Aspects of COLA
- COLA is a reinforcement learning method in which a humanoid robot performs tasks using only proprioception, meaning internal sensory data such as joint angles and force feedback [17][23].
- The method folds two roles, leader and follower, into a single policy, so the robot switches roles smoothly depending on what the human does [19][20].
Group 3: Training and Environment
- The training environment for COLA is highly dynamic and simulates a range of real-world scenarios, preparing the robot for unexpected changes during a task [21][22].
- Training runs as a feedback loop in which the robot's actions change the environment and the environment in turn affects the robot, giving a realistic interaction model [21][30].
Group 4: Performance and Validation
- COLA has been tested in simulation and in the real world, showing robust collaboration across object types and movement patterns [35][36].
- Human participants rated COLA-controlled robots higher on tracking and smoothness than the baseline methods [39][40].
Group 5: Research Team and Contributions
- The research team is from the Beijing Academy of General Artificial Intelligence, with notable contributions from Yushi Du, Yixuan Li, and Baoxiong Jia [41][46].
- The team has published multiple papers at top conferences on humanoid robotics and collaborative systems [45][47].
A new reinforcement learning framework for LLMs! UCSD's multi-agent training framework boosts LLM tool-calling ability 5.8x
量子位· 2025-11-08 04:10
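Contributed by the PettingLLMs team
量子位 | WeChat official account QbitAI
A reinforcement learning framework for large language model agents that, for the first time, achieves general-purpose multi-agent "group reinforcement."
Across the many tasks handled by large language model (LLM) agents, a large body of work has shown that multi-agent workflows in a range of domains deliver significant gains over a single agent even without any training.
Existing LLM agent training frameworks, however, all target a single agent, and multi-agent "group reinforcement" remains a problem in urgent need of a solution.
To address this pain point, researchers from UCSD and Intel propose PettingLLMs, a general multi-agent reinforcement learning framework that supports training any combination of multiple LLMs together.
Research background
LLM-driven multi-agent systems can substantially improve task performance in medicine, programming, scientific research, embodied intelligence, and other fields.
For training LLM agents, Group Relative Policy Optimization (GRPO) has been validated as a general and effective reinforcement learning algorithm. However, all current reinforcement learning training frameworks for LLMs, including the GRPO algorithm itself, are confined to single-agent training. Collaborative optimization across multiple agents, that is, a learning mechanism for "group reinforcement," is still a gap waiting to be filled.
The core mechanism of GRPO is that, for the same input (prompt), ...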
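For readers unfamiliar with GRPO, its group-relative step can be sketched as follows. This follows the commonly published formulation of GRPO (several responses sampled for one prompt, each scored against the group's mean and standard deviation), not the article's own wording. The function name, the example rewards, and the use of the population standard deviation are assumptions, and the sketch is single-agent GRPO, not the multi-agent PettingLLMs method.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled response relative to the others from the same prompt.

    rewards: one scalar reward per response sampled for a single prompt.
    Returns (reward - group mean) / (group std + eps) for each response.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical responses to one prompt, rewarded 1 for success and 0 for failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

The normalization only makes sense when all responses in a group share the same input, which suggests why extending it to several agents that see different prompts at different turns is not automatic.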
A new autonomous driving paradigm emerges at ICCV: unifying world models and VLA, and moving toward L4 with a closed training loop
量子位· 2025-11-08 04:10
Core Viewpoint
- The article covers the autonomous driving industry's shift from a data-driven approach to a training-driven approach, stressing the role of world models and reinforcement learning in reaching Level 4 (L4) autonomy [2][4][6].
Group 1: Transition from Data Loop to Training Loop
- The current data loop is no longer enough to advance autonomous driving, so the industry needs a training loop in which models iterate continuously on feedback from the environment [4][11].
- Li Auto (理想) builds a world-model training environment in the cloud, which feeds prior knowledge and driving capability into the vehicle's VLA model [11][30].
- The world model covers environment construction, agent modeling, feedback mechanisms, and simulation of varied scenarios, all of which the training loop depends on [13][31].
Group 2: Simulation and Evaluation Techniques
- Li Auto combines reconstruction and generation techniques for simulation, which allows both stable and dynamic outputs [14][15][16].
- The Hierarchy UGP model, developed with academic institutions, achieves state-of-the-art results in large-scale dynamic scene reconstruction [21][19].
- Synthetic data generation increases the diversity and complexity of training scenarios, which improves model performance [25][24].
Group 3: Reinforcement Learning and Challenges
- The reinforcement learning world engine lets models explore training environments and receive feedback; five key factors determine how effective it is [25][27].
- Simulating interactions among multiple agents is a major challenge, and Li Auto is exploring self-play and reward function adjustments to increase sample diversity [27][29].
Group 4: Commercialization and Technological Advancements
- Li Auto has built a profitable business that funds its ongoing research and development, with over 10 billion yuan invested in the self-developed Star Ring OS (星环OS) [32][33].
- The Star Ring OS improves vehicle performance by streamlining communication between control systems, significantly shortening braking distances [35][36].
- Open-sourcing the Star Ring OS is expected to benefit the whole industry by lowering development costs for other automakers [39][40].
Group 5: Industry Position and Future Outlook
- Li Auto is positioning itself as a leading player in the AI-driven automotive sector, with the aim of becoming a "space robotics company" [48][50].
- The company has closed the loop between research and production, so research results reach production quickly, as the DriveVLM project shows [52].
- The article concludes that although many companies are investing in AI and robotics, few have the comprehensive capabilities shown by Li Auto and Tesla [53].
AI100 interview: the "Get Notes" (Get笔记) methodology | 量子位智库 (QbitAI think tank)
量子位· 2025-11-08 02:25
Core Insights
- Get Notes has gained over 1.5 million users within a year, showing strong engagement and retention in a competitive AI knowledge management market [4][10][25].
- The product's success is attributed to how well it addresses user pain points, drawing on user feedback and co-creation during development [6][13][14].
Market Landscape
- The AI knowledge management sector is highly competitive, with major players such as Baidu, Alibaba, and Tencent offering similar products [5].
- Despite the crowded market, Get Notes has attracted a large user base, more than half of whom are new users who had never used the parent app, "Get" [22][24].
User Engagement and Product Development
- Get Notes emphasizes co-creation: it collects feedback through user groups and lets users vote on feature requests, which helps prioritize development [50][51][57].
- The product focuses on three core functions (efficient recording, easy retrieval, and user-friendly design) so that it meets users' actual needs [63][66].
Unique Features and Differentiation
- Get Notes offers features such as AI-enhanced transcription and intelligent note organization that set it apart from competitors [11][35][41].
- Its ability to bring audio, text, and images together into one knowledge base makes it more useful to users [80][81].
Future Outlook and Industry Impact
- The company believes the AI knowledge management sector is still in its early stages, with substantial room for growth and innovation as user needs become more specialized [21][95].
- AI is expected to create new demands and job roles within organizations, which increases the need for tools that bring AI into daily workflows [96][97].
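DeepSeek-OCR replicated in two weeks! A two-person team reproduces its low-token, high-compression core, and a swapped-in decoder makes it more practical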
量子位· 2025-11-07 05:32
Core Insights
- The article covers DeepOCR, a reproduction of the widely praised DeepSeek-OCR that a small team built in just two weeks while keeping the original's low token usage and high compression [1][5].
Group 1: Technology and Design
- DeepSeek-OCR is designed around "visual compression": a small number of visual tokens represents content that would otherwise take many text tokens, which lowers the compute cost of running large models [4][6].
- The model reaches a compression ratio of 7x to 20x and keeps accuracy at 97% even at 10x compression [7].
- The DeepSeek-OCR architecture has three stages (local processing, compression, and global understanding), which keeps memory usage under control [10].
Group 2: Training and Performance
- DeepOCR is designed to be light on compute: it can be trained on just two H200 GPUs, which puts it within reach of small teams [21].
- Training has two phases. The first trains a multi-modal projector while the DeepEncoder stays frozen, which significantly reduces memory requirements [20].
- In practical tests DeepOCR uses about 250 visual tokens. That is slightly less efficient than the original DeepSeek-OCR but far better than baseline models that need thousands of tokens for similar performance [22].
Group 3: Results and Future Plans
- DeepOCR performs well on basic tasks such as English text recognition and table parsing, and its table parsing even outperforms the original model because it faithfully restores the original 2D spatial encoding [24].
- The team plans to add more data types, including formulas and multi-language support, and to explore more advanced techniques to improve performance further [28].
- The article also describes the team members' academic backgrounds, their expertise in multi-modal research, and their earlier experience at well-known tech companies [29][31].
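The token arithmetic behind "visual compression" can be made concrete with a short sketch. It assumes the compression ratio is defined as text tokens divided by visual tokens, which the summary implies but does not state. The helper names are hypothetical, and pairing the 250-token figure with the 10x ratio is an illustrative combination, not a result reported in the article.

```python
def compression_ratio(text_tokens, visual_tokens):
    """Assumed definition: how many text tokens each visual token stands in for."""
    return text_tokens / visual_tokens

def text_tokens_covered(visual_tokens, ratio):
    """Text-token equivalent of a page encoded with `visual_tokens` at a given ratio."""
    return visual_tokens * ratio

# Illustrative only: 250 visual tokens at 10x would stand in for 2,500 text tokens.
covered = text_tokens_covered(visual_tokens=250, ratio=10)
ratio = compression_ratio(text_tokens=2500, visual_tokens=250)
```

Read this way, the 7x to 20x range quoted above means the same 250 visual tokens would stand in for roughly 1,750 to 5,000 text tokens, with accuracy expected to fall as the ratio rises.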