机器之心
"How Far Is It from Follower to Leader?" We Talked with Front-Line CANN Developers
机器之心· 2025-09-28 04:50
Core Viewpoint - The article discusses the transformation of the AI industry. It argues that competition has shifted from hardware capabilities to a battle over software, developers, and ecosystem building, and that Huawei's Ascend and its heterogeneous computing architecture CANN are at the forefront of this change [1][4].
Summary by Sections
CANN Open Source Announcement
- Huawei's rotating chairman Xu Zhijun announced that CANN, Ascend's hardware enablement software, will be fully open-sourced by December 30, 2025 [2].
Significance of CANN Open Source
- Open-sourcing CANN represents a profound self-revolution in domestic AI infrastructure. It aims to break the closed model traditionally dominated by hardware vendors and to embrace a more open, community-driven future [4][19].
- The ecosystem's success depends on attracting academic innovation and on giving developers a stable, general-purpose, and efficient foundational toolset [5][18].
Developer Perspectives on CANN
- Developers describe CANN's evolution as a challenging journey. Early versions required low-level programming skills, which hindered productivity [10][11].
- The introduction of the Ascend C programming language was a significant improvement because it brought development much closer to mainstream programming practice [15].
Challenges Faced by Developers
- Early developers faced high technical barriers and an unstable architecture, which made for a difficult development environment [11][13].
- Systemic issues persisted. For example, model accuracy could not be reproduced across frameworks because the underlying systems lacked transparency [17].
The Role of Open Source
- Open-sourcing CANN is seen as a way to lower technical barriers and empower developers by giving them transparency into the platform and control over it [21][23].
- The open-source model aims to foster a vibrant community in which developers contribute and innovate, rather than relying on a small number of official experts [29].
Ecosystem Empowerment
- Open source offers unprecedented opportunities for deep integration between academia and industry. Researchers can tackle real-world problems and turn the solutions into academic contributions [26].
- The shift from users to contributors is expected to cultivate a new generation of developers who are capable of working on high-quality projects [28].
Future Outlook for CANN
- The current focus is on matching CUDA's capabilities while fostering original innovation within the CANN ecosystem [44].
- Huawei has committed significant resources to support the open-source community, including 1,500 petaflops of computing power and 30,000 development boards each year [45].
Having Both RLHF and RLVR: Danqi Chen's Team's Latest Work Extends Reasoning Ability to General Intelligence
机器之心· 2025-09-28 04:50
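Synced report. Editor: 冷猫
Reflecting on the consequences of one's own actions, and correcting course when necessary, is one of the core traits of human intelligence.
A month ago, we reported that Danqi Chen, a Tsinghua Yao Class alumna and Princeton professor, appeared to have joined Thinking Machines Lab. Some reports suggested that after a year of leave she would leave Princeton to join Thinking Machines Lab full-time.
Recently, Chen's team at Princeton released its latest work, which shows that the RLVR paradigm remains effective beyond verifiable domains. The paper proposes reinforcement learning with model-rewarded thinking (RLMT), a method that builds explicit chain-of-thought reasoning into general-purpose chat models.
Paper title: Language Models that Think, Chat Better
Paper link: https://www.arxiv.org/overview/2509.20357v1
Large language models have traditionally followed a multi-stage training paradigm. They are first pretrained on large-scale text corpora, then learn instruction following through supervised fine-tuning, and are finally aligned with human preferences through reinforcement learning.
This approach has produced powerful conversational AI systems, but a key limitation remains: the reasoning ability gained in domains such as math and programming through reinforcement learning with verifiable rewards (RLVR) ...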
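The two reward sources contrasted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RLMT implementation. The `<think>` tag convention, the `Answer:` marker, and the `reward_model` callable are all hypothetical stand-ins. A verifiable reward checks a final answer against a reference. An RLMT-style reward instead lets a learned reward model score the visible reply after the model has produced explicit reasoning, which works even for open-ended chat prompts that have no single correct answer.

```python
import re
from typing import Callable

# Hypothetical output format: reasoning in <think>...</think>, then the visible reply.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)


def split_thinking(output: str) -> tuple[str, str]:
    """Separate the reasoning trace from the user-visible response."""
    match = THINK_RE.match(output.strip())
    if match is None:
        return "", output.strip()  # no thinking block: the whole output is the reply
    return match.group(1).strip(), match.group(2).strip()


def verifiable_reward(output: str, gold_answer: str) -> float:
    """RLVR-style reward: 1.0 only if the final answer matches a checkable reference."""
    _, response = split_thinking(output)
    answer = response.split("Answer:")[-1].strip()
    return 1.0 if answer == gold_answer else 0.0


def model_reward(prompt: str, output: str,
                 reward_model: Callable[[str, str], float]) -> float:
    """RLMT-style reward (sketch): a learned reward model scores the visible reply,
    so open-ended chat prompts with no reference answer can still be rewarded."""
    _, response = split_thinking(output)
    return reward_model(prompt, response)


def length_rm(prompt: str, response: str) -> float:
    """Toy reward model for illustration only: favors longer replies, capped at 1.0."""
    return min(len(response) / 100, 1.0)


print(verifiable_reward("<think>2 + 2 = 4</think> Answer: 4", "4"))  # 1.0
print(model_reward("Write a haiku", "<think>5-7-5</think>Autumn wind...", length_rm))
```

In practice the toy `length_rm` would be replaced by a trained preference model. Stripping the reasoning before scoring is one assumed design choice among several.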
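Understanding 鲸智百应 in One Read: An Enterprise AI Operating System That Drives Organizational Evolution, Taking Enterprises from "Using AI" to "Being AI"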
机器之心· 2025-09-28 04:50
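Synced release. Synced Editorial Department
Six dimensions (unified cognition, intelligent execution, decision hub, memory evolution, agent factory, and AI governance) let enterprises break completely free of the tool mindset of "using AI" and become "AI-native organizations."
"System hopping" has become routine in any large or mid-sized enterprise. Employees switch among more than five business systems every day to get their work done. Some 80% of production data lies dormant in ERP, CRM, and OA silos, where it cannot be used. AI tools remain stuck at "Q&A-style assistance" rather than "end-to-end execution" ... Core assets that should drive business iteration have become "data silos" that are visible but unusable. As a result, enterprise digitalization has long been trapped in the pattern of "piling up tools instead of restructuring value." One enterprise CTO's lament is typical: "Every system is highly specialized, yet when we handle complex business, we can't even pull together a complete analysis report."
At the 2025 Apsara Conference, most players were still focused on "agents." Whale Cloud Technology (浩鲸科技) officially launched 鲸智百应 there and carved out a differentiated position by presenting it as an "enterprise AI operating system."
Yang Ming, a director of Whale Cloud Technology and president of its Cloud Intelligence business, said that 鲸智百应 is not a simple stack of features. Across six dimensions (unified cognition, intelligent execution, decision hub, memory evolution, agent factory, and AI governance), it lets enterprises break completely free of the tool mindset of "using AI" and become capable of perceiving, thinking, and acting ...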
What Does the Next-Generation AI Teacher Look Like? Xueersi (学而思) Moves It from L2 "Assistant" to L3 "Teacher"
机器之心· 2025-09-28 00:32
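Synced report. Editor: +0
Autonomous driving has its L1-L5 grading path, and now education AI has a version of its own.
However, for a long time this kind of high-frequency interaction and personalized guidance has been a "luxury" available to only a few students.
The arrival of AI is changing all of this. AI learning companions respond around the clock. They also create a space free from fear of judgment, where students can boldly make mistakes and ask follow-up questions. More importantly, they can scale heuristic interaction and personalized feedback, which makes "teaching students according to their aptitude" truly possible.
Global tech giants have set their sights on this area. From OpenAI to Google, their AI applications now include learning sections.
(Image: ChatGPT's learning section.)
"The second half of AI" is now a consensus, and real-world application is becoming the decisive factor for the future. Education is a fundamental cornerstone of human development, and it has become a frontier for integrating AI technology and innovating with it.
Many people have probably had this experience. In class, a question hovers on the tip of your tongue, but fear of asking something "too stupid" keeps you silent. Or you have not yet understood the previous material, and the teacher has already moved on to the next point.
This is a long-standing frustration in education. In large classes, each student's own line of thinking is often drowned out by a uniform teaching pace. Teachers want to address every student's confusion, but the will is there while the capacity is not.
The constructivism proposed by Swiss psychologist Jean Piaget pointed out long ago that knowledge ...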
Letting LLMs Synthesize Checkers: UIUC Team Uncovers 90-Plus Long-Latent Vulnerabilities in the Linux Kernel
机器之心· 2025-09-28 00:32
Core Insights
- The paper introduces KNighter, a system that transforms static analysis by using large language models (LLMs) to synthesize checkers. It identified 92 long-standing vulnerabilities in the Linux kernel [3][11][16].
- KNighter uses historical patch data to distill defect patterns and repair intent. This allows the model to generate structured, maintainable, and compilable static analysis checkers [11][21].
Background and Pain Points
- Traditional static analysis tools depend on manually written rules. These rules are time-consuming to create, hard to maintain, and often cover only a limited set of predefined patterns [7].
- Scanning large codebases directly with LLMs is difficult because of context-length limits and high computational cost [7].
Methodology
- KNighter breaks the task of writing a static analysis checker into manageable steps. The model first analyzes the defect pattern and the relevant program states, and then it generates the checker framework [11].
- Synthesized checkers can be integrated into continuous integration (CI) pipelines for long-term use. They can also be upgraded iteratively as new patches arrive [12][20].
Experimental Results
- The research team validated KNighter on the Linux kernel, where the synthesized checkers identified 92 vulnerabilities. Maintainers confirmed 77 of them and 57 were fixed, and 30 received CVE identifiers [16].
- The method is more cost-effective and stable than scanning code directly with an LLM. The generated checkers are reusable and produce precise alerts with clear state transitions [16].
Practical Recommendations
- Synthesized checkers can be kept under version control and integrated into CI processes, where they support code review and code evolution [19].
- Organizations can trigger KNighter's pattern mining and checker generation automatically on every patch merge and gradually build a comprehensive rule library [20].
- Teams can start with high-risk scenarios, such as resource management and error propagation, to generate initial seed checkers before expanding to other subsystems [20].
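The patch-to-checker workflow described above can be outlined as a generate-and-validate loop. This sketch is a simplification under stated assumptions and is not KNighter's code. The `llm` and `build` callables, the prompt wording, and the toy checker representation are all hypothetical. The toy checker is a Python predicate over source text, whereas the real system generates a compiled static analyzer. The validation rule is also assumed: a useful checker should flag the pre-patch code and stay silent on the patched code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy stand-in for a static-analysis checker: returns True when it flags the code.
Checker = Callable[[str], bool]


@dataclass
class Patch:
    buggy_source: str   # code before the fix
    fixed_source: str   # code after the fix
    description: str    # commit message / diff summary


def synthesize_checker(patch: Patch,
                       llm: Callable[[str], str],
                       build: Callable[[str], Checker],
                       max_attempts: int = 3) -> Optional[Checker]:
    """Hypothetical patch -> pattern -> checker -> validate loop."""
    # Step 1: distill the defect pattern and the intent of the fix.
    pattern = llm(f"Describe the bug pattern this patch fixes:\n{patch.description}")
    feedback = ""
    for _ in range(max_attempts):
        # Step 2: turn the pattern into checker source and build it.
        checker = build(llm(f"Write a checker for:\n{pattern}\n{feedback}"))
        # Step 3: keep the checker only if it separates buggy from fixed code.
        hits_bug = checker(patch.buggy_source)
        hits_fix = checker(patch.fixed_source)
        if hits_bug and not hits_fix:
            return checker
        feedback = f"Last attempt: flags buggy={hits_bug}, flags fixed={hits_fix}; refine."
    return None  # no checker survived validation


# Usage with stubs standing in for an LLM and a checker compiler.
def fake_llm(prompt: str) -> str:
    return "kmalloc result dereferenced without a NULL check"


def fake_build(_source: str) -> Checker:
    return lambda code: "kmalloc(" in code and "if (!" not in code


buggy = "p = kmalloc(n, GFP_KERNEL); p->len = n;"
fixed = "p = kmalloc(n, GFP_KERNEL); if (!p) return -ENOMEM; p->len = n;"
checker = synthesize_checker(Patch(buggy, fixed, "add NULL check"), fake_llm, fake_build)
print(checker is not None)  # True
```

Checkers that survive validation could then be stored in the rule library and run in CI.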
The Era of Specification Alignment: GPT-5 Leads by a Wide Margin, Clarifying Safety and Behavioral Boundaries
机器之心· 2025-09-27 06:18
Core Viewpoint - The article discusses Specification Alignment in large models. Under this concept, models must adhere to both safety and behavioral specifications across contexts, so that users stay safe while diverse behavioral requirements are still met [3][9][30].
Group 1: Specification Alignment
- Specification Alignment is introduced as a new concept. It requires large models to comply with both safety specifications (safety-spec) and behavioral specifications (behavioral-spec) in different scenarios [3][9].
- Safety specifications define boundaries that models must not cross, such as avoiding violent content in children's stories or refusing to generate malicious code [9][10].
- Behavioral specifications guide how models should act and reflect user or organizational preferences, such as including an educational moral in a story or offering multiple travel plans [9][10].
Group 2: SpecBench and Evaluation
- The research team built SpecBench, the first benchmark for evaluating specification alignment. It covers five application scenarios, 103 specifications, and 1,500 prompts [6][15].
- A new metric, the Specification Alignment Rate (SAR), measures how well models adhere to specifications under a "safety first, then utility" principle [16][30].
- Testing revealed significant specification-alignment gaps in most models. GPT-5 held a clear lead across all scenarios, a lead attributed to OpenAI's safe-completion training [23][24].
Group 3: Test-time Deliberation
- The article presents Test-time Deliberation (TTD) as a flexible way to achieve specification alignment. In TTD, models reason over the specifications during inference, and model parameters stay unchanged [18][21].
- Align3, a TTD method, integrates safety and behavioral specifications into the reasoning process and makes the model more reliable [21][27].
- Experimental results indicate that TTD methods, including Align3, significantly improve specification alignment at a lower computational cost than other methods [27][28].
Group 4: Future Outlook
- Specification alignment is identified as a critical academic challenge and a key threshold that large models must cross to integrate into society and industry [30].
- Future models must balance safety and practicality while adapting to increasingly diverse and personalized specifications [30].
- SpecBench and methods like Align3 represent initial steps toward more capable and responsible AI systems [30][31].
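The "safety first, then utility" principle behind SAR can be illustrated with a safety-gated score. This is a sketch under stated assumptions and not SpecBench's exact definition. In this version, a response that violates any safety specification scores 0. A safe response earns a base credit `alpha` plus the remaining credit in proportion to the behavioral specifications it satisfies. Both `alpha` and its value are assumptions, and the per-response judgments are assumed to come from an external judge.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    safety_violated: bool      # did the response break any safety-spec?
    behavioral_satisfied: int  # behavioral-specs the response satisfied
    behavioral_total: int      # behavioral-specs that apply to this prompt


def gated_score(j: Judgment, alpha: float = 0.3) -> float:
    """Safety gate first; behavioral utility only counts for safe responses.
    `alpha` (hypothetical) is the credit a safe response earns regardless of utility."""
    if j.safety_violated:
        return 0.0
    if j.behavioral_total == 0:
        return 1.0
    return alpha + (1 - alpha) * j.behavioral_satisfied / j.behavioral_total


def alignment_rate(judgments: list[Judgment], alpha: float = 0.3) -> float:
    """Mean gated score over a prompt set (a SAR-like aggregate, not the official one)."""
    if not judgments:
        return 0.0
    return sum(gated_score(j, alpha) for j in judgments) / len(judgments)


judgments = [
    Judgment(False, 2, 2),  # safe and fully helpful
    Judgment(True, 2, 2),   # helpful but unsafe: gated to 0
    Judgment(False, 0, 2),  # safe but unhelpful: base credit only
]
print(round(alignment_rate(judgments), 3))
```

Because of the gate, no amount of helpfulness can compensate for a safety violation.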
OpenAI Studies LLMs' Contribution to GDP: Models Can Already Replace Humans in Three Major Industries, and OpenAI Admits It Trails Claude
机器之心· 2025-09-27 06:13
Core Viewpoint - The article introduces GDPval, a new OpenAI evaluation method that measures AI model performance on economically valuable real-world tasks. Its results indicate that AI is approaching human-level performance in a range of industries [1][3][22].
Group 1: Evaluation Methodology
- GDPval uses GDP as its key economic indicator and draws tasks from critical occupations in the nine industries that contribute most to GDP [3][16].
- The evaluation comprises 1,320 professional tasks, including an open-sourced "gold" subset of 220 tasks. Experienced professionals designed and reviewed all of them [18][22].
- Tasks are based on real work products, which makes the evaluation more realistic and diverse than other benchmarks [18][19].
Group 2: Model Performance
- Leading models such as Claude Opus 4.1 and GPT-5 are approaching or matching the quality of human experts on many tasks [4][9].
- Claude Opus 4.1 excels at aesthetics-related tasks, while GPT-5 performs better on accuracy-related tasks [9][10].
- Performance gains are significant. The models complete tasks roughly 100 times faster than human experts, at about one-hundredth of the cost [13].
Group 3: Industry Impact
- AI has reached or surpassed human-level capability in sectors such as government, retail, and wholesale [7].
- Early GDPval results suggest that AI can complete some repetitive tasks faster and more cheaply than human experts, which could transform the job market [21].
- OpenAI aims to broaden access to these tools so that workers can adapt to the changes and AI integration can drive economic growth [21].
Group 4: Future Developments
- OpenAI plans to extend GDPval to more occupations, industries, and task types. It also plans to add interactivity and to cover more ambiguous tasks [22].
- These continued improvements reflect a commitment to better measuring progress across diverse knowledge work [22].
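One way to summarize expert-versus-model comparisons like these is a "wins or ties" rate. This is the share of gradings in which the model's deliverable was judged better than the expert's or as good as it. The sketch below assumes blinded pairwise verdicts labeled `"model"`, `"expert"`, or `"tie"`. These labels, the sector names, and the data are hypothetical and are not GDPval's actual schema or results.

```python
from collections import Counter
from typing import Iterable

VERDICTS = {"model", "expert", "tie"}  # hypothetical grader labels


def win_or_tie_rate(verdicts: Iterable[str]) -> float:
    """Share of blinded pairwise gradings where the model's deliverable was
    judged better than, or as good as, the human expert's."""
    counts = Counter(verdicts)
    unknown = set(counts) - VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to aggregate")
    return (counts["model"] + counts["tie"]) / total


def rate_by_sector(records: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Group (sector, verdict) pairs and compute a win-or-tie rate per sector."""
    grouped: dict[str, list[str]] = {}
    for sector, verdict in records:
        grouped.setdefault(sector, []).append(verdict)
    return {sector: win_or_tie_rate(v) for sector, v in grouped.items()}


# Invented example data, for illustration only.
records = [("retail", "model"), ("retail", "expert"),
           ("government", "tie"), ("government", "model")]
print(rate_by_sector(records))  # {'retail': 0.5, 'government': 1.0}
```

Counting ties together with wins treats "as good as the expert" as a pass. This is one assumed convention; a stricter variant would count wins only.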
Can AI "Shoot" a Good Film? Five Short Films Debut at the Busan International Film Festival, and the Answer Is Surprising
机器之心· 2025-09-27 06:13
Core Viewpoint - The article discusses technological advances in AI-generated film. It highlights "Nine Heavens," which a young Hong Kong team created as the first fully AI-generated short film and which was recognized at the Busan International Film Festival [2][5][40].
Group 1: AI in Film Production
- The team at ManyMany Creations Limited set out to make a 15-minute narrative short film generated entirely by AI, and it achieved this with "Nine Heavens" [2][3].
- "Nine Heavens" stands out because it relies on subtle micro-expressions to convey the protagonist's emotional journey, which shows that AI is capable of narrative storytelling [5][6].
- The film is part of a broader initiative, the "Future Image Plan," which explores AI's role in filmmaking [5][18].
Group 2: AI Technology and Tools
- The production used advanced AI models from platforms such as Jiemeng AI and Volcano Engine. These models have significantly improved the quality and realism of AI-generated images and videos [17][18].
- The tools continue to evolve. For example, Seedream 4.0 supports multi-image fusion, which lets creators generate detailed storyboards and videos from simple descriptions [23][25].
- AI has shortened production time and reduced costs. "Nine Heavens" was produced in a fraction of the time that traditional methods would require [25][26].
Group 3: Industry Trends and Future Outlook
- Major film companies such as Bona Film Group are embracing AI and setting up dedicated AI production centers to explore new creative workflows [19][20].
- The shift toward AI is seen as democratizing the industry, because non-professionals can create high-quality content with minimal resources [30][31].
- Despite these advances, quality is still hard to keep consistent across longer scenes, so human intervention remains necessary in production [40][47].
Group 4: Creative Freedom and Expression
- AI tools give filmmakers unprecedented creative freedom. They can experiment with character designs and settings without the constraints of traditional production [32][33].
- The article stresses that AI can generate content, but the essence of storytelling and artistic expression remains rooted in human creativity and perspective [48][49].
Backed by Prior and Posterior Mechanisms, Can Large Models Cope When Reasoning and Prediction "Spill Over" into the Real World?
机器之心· 2025-09-27 01:30
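This article comes from the PRO member newsletter. Follow "机器之心PRO会员" (Synced PRO membership) at the end of the article for more in-depth topic analyses.
Introduction: ByteDance and others recently launched FutureX, a dynamic evaluation benchmark. It puts large models in front of a forecasting "exam" in which the answers are not yet known, the data is updated dynamically, and results are verified in a closed loop. The work separates a model's predictive ability from its memorization. It also examines how models perform on long-horizon reasoning, on execution robustness, and in uncertain environments. Meanwhile, LLM deployments in scenarios such as financial forecasting and disease assessment are still being refined, and researchers are searching for new mechanisms to close the gap between reasoning and execution.
Contents
When reasoning "marshals its troops" against real-world scenarios such as financial forecasting, can the model "command" reliably enough to be deployed? ...
03. Which Model Predicts Best? Do Prior and Posterior Paths Each "Show Their Strengths"?
Which directions have past model-prediction techniques pursued? Can prior-memory and posterior-reflection mechanisms bring new breakthroughs to model prediction? ...
01 FutureX "Arrives": From Long-Horizon Reasoning to Real-World Prediction, Have Large Models Held Up?
1. Most benchmarks currently used to evaluate large language models rely on preexisting, fixed datasets.
2. This approach works well for measuring a model's factual knowledge or its simple reasoning on known datasets. However, it struggles to probe a model's true reasoning ability when the model must make predictions about the dynamic real world.
① Static benchmarks usually deal with cases where a solution already exists ...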
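The core idea of a dynamic, closed-loop forecasting benchmark can be sketched as a ledger. Each prediction is frozen with a timestamp before the event resolves, and it is scored only after the real-world outcome is known, so a model cannot simply recall an answer that already exists. This is a hypothetical illustration and not FutureX's pipeline. The class names, the question format, and the exact-match scoring are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Question:
    qid: str
    text: str
    resolves_at: datetime          # when the true answer becomes known
    answer: Optional[str] = None   # unknown at prediction time


@dataclass
class Ledger:
    questions: dict[str, Question] = field(default_factory=dict)
    predictions: dict[tuple[str, str], tuple[str, datetime]] = field(default_factory=dict)

    def add_question(self, q: Question) -> None:
        self.questions[q.qid] = q

    def predict(self, model: str, qid: str, guess: str, at: datetime) -> None:
        # Only forecasts made before resolution are accepted, so they test
        # prediction rather than memory of an already-published answer.
        if at >= self.questions[qid].resolves_at:
            raise ValueError("prediction submitted after resolution time")
        self.predictions[(model, qid)] = (guess, at)

    def resolve(self, qid: str, answer: str) -> None:
        """Close the loop once the real-world outcome is observed."""
        self.questions[qid].answer = answer

    def accuracy(self, model: str) -> Optional[float]:
        """Exact-match accuracy over this model's resolved questions only."""
        scored = [guess == self.questions[qid].answer
                  for (m, qid), (guess, _) in self.predictions.items()
                  if m == model and self.questions[qid].answer is not None]
        return sum(scored) / len(scored) if scored else None


ledger = Ledger()
ledger.add_question(Question("q1", "Will event X happen by Oct 1?", datetime(2025, 10, 1)))
ledger.predict("model-a", "q1", "yes", at=datetime(2025, 9, 20))
print(ledger.accuracy("model-a"))  # None: nothing resolved yet
ledger.resolve("q1", "yes")
print(ledger.accuracy("model-a"))  # 1.0
```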
Agentic Coding Performance Hits a New High: The New KAT Model Series Makes the SWE-Bench Leaderboard
机器之心· 2025-09-26 10:35
Core Insights - The article covers the Kuaipilot team's launch of two new models in the Code Intelligence field: the open-source 32B-parameter KAT-Dev-32B and the closed-source flagship KAT-Coder. Both perform strongly on coding tasks [2][26].
Model Performance
- KAT-Dev-32B achieved a 62.4% resolve rate on SWE-Bench Verified, ranking 5th among open-source models of all sizes [2].
- KAT-Coder reached a 73.4% resolve rate on the same benchmark, which is comparable to the top closed-source models worldwide [2][11].
Model Accessibility
- KAT-Dev-32B is available on Hugging Face for further research and development [7].
- Users can apply for KAT-Coder API keys on "Kuaishou Wanqing," an enterprise-level model service and development platform, and can then use the model directly in coding tools [7].
Training Innovations
- The KAT models went through several innovative training phases: Mid-Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and large-scale Agentic Reinforcement Learning (RL) [9][12].
- Mid-Training strengthened "LLM-as-Agent" capabilities and improved tool use, multi-turn interaction, and instruction following [10][12].
- SFT used real requirement-to-delivery trajectories annotated by human engineers to strengthen end-to-end delivery [13].
- RFT introduced ground truth to guide trajectory exploration, which made the subsequent reinforcement learning phase more efficient and stable [15].
Advanced Techniques
- The team applied entropy-based tree pruning to learn efficiently from non-linear trajectory histories while maximizing throughput and minimizing cost [19].
- The SeamlessFlow framework manages trajectory trees and sustains high-throughput training by decoupling RL training from the agent's internal logic [21][22].
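A trajectory tree stores the shared history of many rollouts only once. The sketch below illustrates this under stated assumptions and is not the KAT or SeamlessFlow implementation. It merges step sequences into a prefix tree and measures how much storage the sharing saves. It then ranks branching points by the Shannon entropy of their child visit counts. This ranking is one hypothetical way an entropy signal could decide which parts of a tree to keep under a compute budget. The step names and the `budget` rule are invented for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    children: dict[str, "Node"] = field(default_factory=dict)


def build_tree(trajectories: list[list[str]]) -> Node:
    """Merge step sequences into a prefix tree; shared prefixes are stored once."""
    root = Node()
    for traj in trajectories:
        node = root
        node.visits += 1
        for step in traj:
            node = node.children.setdefault(step, Node())
            node.visits += 1
    return root


def count_nodes(node: Node) -> int:
    """Number of stored steps, excluding the root."""
    return sum(1 + count_nodes(c) for c in node.children.values())


def branch_entropy(node: Node) -> float:
    """Shannon entropy (bits) of how visits split among a node's children."""
    total = sum(c.visits for c in node.children.values())
    if total == 0:
        return 0.0
    probs = [c.visits / total for c in node.children.values()]
    return -sum(p * math.log2(p) for p in probs)


def top_branch_points(root: Node, budget: int) -> list[tuple[tuple[str, ...], float]]:
    """Return up to `budget` (path, entropy) pairs for the most uncertain branch points."""
    scored = []
    stack: list[tuple[tuple[str, ...], Node]] = [((), root)]
    while stack:
        path, node = stack.pop()
        if len(node.children) > 1:  # only real branch points carry entropy
            scored.append((path, branch_entropy(node)))
        for step, child in node.children.items():
            stack.append((path + (step,), child))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:budget]


trajs = [["read", "edit", "test"], ["read", "edit", "lint"], ["read", "search"]]
root = build_tree(trajs)
print(count_nodes(root), sum(map(len, trajs)))  # 5 8
print(top_branch_points(root, budget=1))  # [(('read', 'edit'), 1.0)]
```

In this toy example the three rollouts contain 8 steps but need only 5 tree nodes. The split after `edit` is the most uncertain branch point, with an even split and therefore 1 bit of entropy.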
Emergent Capabilities
- Post-training analysis revealed two significant emergent behaviors. The model used 32% fewer dialogue rounds than the SFT model, and it learned to call multiple tools in parallel [33][35].
- This efficiency preference and the parallel calling are attributed to implicit optimization pressure from the trajectory-tree structure [33].
Future Prospects
- The Kuaipilot team aims to keep exploring the frontiers of code intelligence, including better tool integration, broader language support, and collaborative coding systems [35].
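The parallel tool-calling behavior mentioned above means that independent tool calls issued in one turn can run concurrently instead of being spread across several dialogue rounds. The minimal `asyncio` sketch below shows the latency difference. The tool names and sleep durations are invented stand-ins for real agent tools and do not represent KAT's tool interface.

```python
import asyncio
import time


async def run_tool(name: str, seconds: float) -> str:
    """Stand-in for a real tool call (file read, search, test run, ...)."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def sequential(calls: list[tuple[str, float]]) -> list[str]:
    # One tool per round: total latency is roughly the sum of all calls.
    return [await run_tool(name, s) for name, s in calls]


async def parallel(calls: list[tuple[str, float]]) -> list[str]:
    # All independent calls in one round: latency is roughly the slowest call.
    return await asyncio.gather(*(run_tool(name, s) for name, s in calls))


if __name__ == "__main__":
    calls = [("read_file", 0.2), ("grep", 0.2), ("run_tests", 0.2)]
    for runner in (sequential, parallel):
        start = time.perf_counter()
        results = asyncio.run(runner(calls))
        print(runner.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

`asyncio.gather` returns results in the order the calls were passed, so the agent can match each result to its request. Parallelism only applies when the calls do not depend on each other's output.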