机器之心
A Complete Guide to 鲸智百应 (Whale Intelligence): An Enterprise AI Operating System Driving Organizational Evolution, Taking Enterprises from "Using AI" to "Being AI"
机器之心· 2025-09-28 04:50
Core Insights
- The article emphasizes the transformation of enterprises into "AI-native organizations" through six dimensions: unified cognition, intelligent execution, decision-making hub, memory evolution, intelligent agent factory, and AI governance [1][3][23]
- It highlights the limitations of traditional AI tools, which often remain passive assistants rather than active participants in business processes [8][10]

Unified Cognition
- The first step toward an AI-native organization is addressing scattered knowledge, making information real-time, complete, and callable [5]
- The "intelligent knowledge hub" of Whale Intelligence allows AI to understand the entire business context, enabling automatic report generation without manual data gathering [5]

Intelligent Execution
- The goal is to move AI from a passive role to an active participant in business processes, enabling it to complete tasks autonomously [8]
- Whale Intelligence's multi-agent collaborative engine allows dynamic task orchestration and integration without disrupting existing systems, significantly reducing HR's administrative workload [8]

Decision-Making Hub
- AI is positioned as a strategic partner in core decision-making, moving away from reliance on experience-based judgment [10][11]
- The system can autonomously understand complex directives and break them down into actionable tasks, improving decision-making efficiency and accuracy [11]

Memory Engine
- The memory engine enables organizations to accumulate knowledge and experience over time, turning each task execution into a learning opportunity [14][15]
- This creates a knowledge compounding effect, enhancing the organization's capabilities with each completed task [15]

Intelligent Agent Factory
- The intelligent agent factory enables rapid development of digital employees through low-code tools, letting business personnel create tailored AI solutions without programming knowledge [17]
- This flexibility supports both general and specialized business needs, ensuring that AI capabilities evolve alongside organizational requirements [17]

AI Governance
- The governance framework ensures that AI operations are secure, compliant, and accountable, addressing the potential risks of AI usage [19][20]
- The system includes permission management and audit trails, ensuring that AI actions are traceable and compliant with regulations [20]

Industry Observation
- Whale Intelligence distinguishes itself by positioning its solution as an "enterprise AI operating system," addressing systemic issues in AI integration within organizations [23]
- This approach signifies a shift from merely using AI tools to fundamentally transforming organizations into entities that inherently possess AI capabilities [23]
What Does a New Generation of AI Teachers Look Like? 学而思 (TAL) Takes It from L2 "Assistant" to L3 "Teacher"
机器之心· 2025-09-28 00:32
Core Viewpoint
- The article discusses the evolution of AI in education, emphasizing the transition from basic assistance (L1) to more interactive and personalized teaching roles (L3), ultimately aiming for a trustworthy learning partnership between AI and students [2][10][43]

Group 1: AI's Role in Education
- The integration of AI in education is seen as a frontier for innovation, with the potential to enhance personalized learning experiences [2][5]
- Traditional classroom settings often fail to address individual student needs due to large class sizes and uniform teaching methods [3][5]
- AI companions can provide constant feedback and create a judgment-free environment, allowing students to explore and ask questions freely [5][42]

Group 2: AI Teacher Levels
- The proposed "AI Teacher L1-L5" framework outlines the progression of AI capabilities in education, with L1 being basic assistance and L3 representing a more integrated teaching role [10][12]
- L2 AI tools serve as effective assistants, capable of tasks like grading and providing resources, but do not engage in true teaching [13][14]
- L3 AI aims to create a closed-loop interaction, in which it observes and responds to students' thought processes in real time, resembling a human teacher [15][21]

Group 3: Hardware and Interaction
- The transition to L3 requires specialized hardware to support real-time interaction, as software alone cannot achieve the necessary complexity [16][17]
- The hardware enables AI to "see" and "hear" students, allowing for a more dynamic and responsive teaching experience [18][22]
- The design of the learning machine focuses on minimizing response times to maintain student engagement and reduce anxiety during learning [19][20]

Group 4: AI's Teaching Methodology
- L3 AI teachers take a more interactive approach, guiding students through problem-solving rather than simply providing answers [21][24]
- The AI encourages critical thinking by prompting students with questions and suggestions, fostering a deeper understanding of the material [23][25]
- The learning machine incorporates various educational tools, such as interactive models and gamified learning experiences, to enhance student engagement [29][30][31]

Group 5: Content and Knowledge Base
- The effectiveness of AI teachers is supported by a robust knowledge base, including models optimized for K12 education and extensive teaching resources [37][40]
- The combination of advanced AI capabilities and high-quality educational content ensures that AI can serve as a reliable learning partner [41][42]
- The ultimate goal is a seamless educational experience across different learning environments, allowing personalized education regardless of location [42][43]
Letting Large Models Synthesize Checkers: UIUC Team Uncovers 90+ Long-Latent Vulnerabilities in the Linux Kernel
机器之心· 2025-09-28 00:32
Core Insights
- The paper introduces KNighter, a system that transforms static analysis by synthesizing checkers using large language models (LLMs), successfully identifying 92 long-standing vulnerabilities in the Linux kernel [3][11][16]
- KNighter uses historical patch data to distill defect patterns and repair intentions, allowing the model to generate structured, maintainable, and compilable static analysis checkers [11][21]

Background and Pain Points
- Traditional static analysis tools require manual rule creation, which is time-consuming and difficult to maintain, and they often cover only a limited set of predefined patterns [7]
- Directly scanning large codebases with LLMs is hampered by context limitations and high computational cost [7]

Methodology
- KNighter breaks the task of creating a static analysis checker into manageable steps, letting the model analyze defect patterns and program states before generating the checker framework [11]
- The synthesized checkers can be integrated into continuous integration (CI) pipelines for long-term use and iterative upgrades as new patches arrive [12][20]

Experimental Results
- The research team validated KNighter on the Linux kernel, where the synthesized checkers identified 92 vulnerabilities, with 77 confirmed by maintainers and 57 fixed, including 30 that received CVE identifiers [16]
- The method is more cost-effective and stable than direct LLM code scanning, since the generated checkers can be reused and produce precise alerts with clear state transitions [16]

Practical Recommendations
- The synthesized checkers can be integrated into version control systems and CI processes, facilitating code review and evolution [19]
- Organizations can trigger KNighter's pattern mining and checker generation automatically with each patch merge, gradually building a comprehensive rule library [20]
- Starting with high-risk scenarios, such as resource management and error propagation, helps generate initial seed checkers before expanding to other subsystems [20]
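KNighter itself synthesizes Clang Static Analyzer checkers in C++; as a simplified illustration of the *kind* of pattern-based rule a synthesized checker encodes (not code from the paper; function name and heuristic are illustrative), here is a toy checker that flags a `kmalloc()` result used without a NULL check:

```python
import re

def check_unchecked_kmalloc(source: str):
    """Toy pattern checker: report kmalloc() results not NULL-checked
    within the next few lines. Returns (line_number, variable) pairs."""
    findings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        m = re.search(r"(\w+)\s*=\s*kmalloc\(", line)
        if not m:
            continue
        var = m.group(1)
        # Scan a small window after the allocation for a NULL check.
        window = "\n".join(lines[i + 1 : i + 4])
        if f"if (!{var})" not in window and f"if ({var} == NULL)" not in window:
            findings.append((i + 1, var))
    return findings

buggy = """
buf = kmalloc(64, GFP_KERNEL);
memcpy(buf, src, 64);
"""
print(check_unchecked_kmalloc(buggy))  # [(2, 'buf')]
```

A real synthesized checker would track program states along paths rather than match text, but the workflow is the same: a defect pattern distilled from a patch becomes a reusable rule that runs on every future commit.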
The Era of Specification Alignment: GPT-5 Leads by a Wide Margin, Making Safety and Behavioral Boundaries Clearer
机器之心· 2025-09-27 06:18
Core Viewpoint
- The article discusses the concept of Specification Alignment in large models, emphasizing the need for these models to adhere to both safety and behavioral specifications across contexts, ensuring user safety while meeting diverse behavioral requirements [3][9][30]

Group 1: Specification Alignment
- Specification Alignment is introduced as a new concept requiring large models to comply with both safety specifications (safety-spec) and behavioral specifications (behavioral-spec) in different scenarios [3][9]
- Safety specifications define the boundaries that models must not cross, such as avoiding violent content in children's stories or refusing to generate malicious code [9][10]
- Behavioral specifications guide how models should operate, reflecting user or organizational preferences, such as including educational morals in stories or providing multiple travel plans [9][10]

Group 2: SpecBench and Evaluation
- The research team developed SpecBench, the first benchmark for evaluating specification alignment, covering five application scenarios, 103 specifications, and 1,500 prompts [6][15]
- A new metric, Specification Alignment Rate (SAR), was introduced to assess models' adherence to specifications, embodying the principle of "safety first, then utility" [16][30]
- Testing revealed that most models exhibit significant gaps in specification alignment, with GPT-5 showing a clear lead across all scenarios, attributed to OpenAI's safe-completion training [23][24]

Group 3: Test-time Deliberation
- The article presents Test-time Deliberation (TTD) as a flexible approach to achieving specification alignment, allowing models to reflect on specifications during inference without altering model parameters [18][21]
- The Align3 method, part of TTD, integrates safety and behavioral specifications into the reasoning process, enhancing model reliability [21][27]
- Experimental results indicate that TTD methods, including Align3, significantly improve specification alignment at lower computational cost than competing methods [27][28]

Group 4: Future Outlook
- Specification alignment is identified as both a critical academic challenge and a key threshold for large models to integrate into society and industry [30]
- Future models must balance safety and practicality while adapting to increasingly diverse and personalized specifications [30]
- The ongoing development of SpecBench and methods like Align3 represents the first steps toward more capable and responsible AI systems [30][31]
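The "safety first, then utility" principle behind SAR can be made concrete with a small scoring sketch. The exact SAR definition is in the SpecBench paper; the scoring rule below is an illustrative assumption (an unsafe response scores 0 regardless of behavioral compliance, a safe but non-compliant one gets partial credit):

```python
def specification_alignment_rate(results):
    """Aggregate per-response judgments into a single alignment score.

    Each result has `safe` (no safety-spec violation) and `compliant`
    (behavioral-spec followed). Safety violations dominate: no amount of
    behavioral compliance rescues an unsafe response.
    """
    scores = []
    for r in results:
        if not r["safe"]:
            scores.append(0.0)  # safety violation overrides everything
        elif r["compliant"]:
            scores.append(1.0)  # safe and behaviorally compliant
        else:
            scores.append(0.5)  # safe but misses the behavioral spec
    return sum(scores) / len(scores)

outcomes = [
    {"safe": True,  "compliant": True},
    {"safe": True,  "compliant": False},
    {"safe": False, "compliant": True},   # unsafe: scores 0 anyway
    {"safe": True,  "compliant": True},
]
print(specification_alignment_rate(outcomes))  # 0.625
```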
OpenAI Studies Large Models' Contribution to GDP: AI Can Already Replace Humans in Three Industries, and OpenAI Admits Trailing Claude
机器之心· 2025-09-27 06:13
Core Viewpoint
- The article discusses the introduction of GDPval, a new evaluation method by OpenAI that assesses AI model performance on economically valuable real-world tasks, indicating that AI is nearing human-level performance in various industries [1][3][22]

Group 1: Evaluation Methodology
- GDPval uses GDP as a key economic indicator and extracts tasks from critical occupations in the top nine industries contributing to GDP [3][16]
- The evaluation includes 1,320 professional tasks, with a golden open-source subset of 220 tasks, designed and reviewed by experienced professionals [18][22]
- Tasks are based on real work outcomes, ensuring greater realism and diversity than other benchmarks [18][19]

Group 2: Model Performance
- The evaluation results show that leading models like Claude Opus 4.1 and GPT-5 are approaching or matching the quality of human experts on various tasks [4][9]
- Claude Opus 4.1 excels at aesthetic tasks, while GPT-5 performs better on accuracy-related tasks [9][10]
- Performance improvements have been significant, with task completion roughly 100 times faster and 100 times cheaper than human experts [13]

Group 3: Industry Impact
- AI has reached or surpassed human-level capabilities in sectors such as government, retail, and wholesale [7]
- Early GDPval results suggest that AI can complete some repetitive tasks faster and at lower cost than human experts, potentially transforming the job market [21]
- OpenAI aims to democratize access to these tools, enabling workers to adapt to changes and fostering economic growth through AI integration [21]

Group 4: Future Developments
- OpenAI plans to expand GDPval to include more occupations, industries, and task types, enhancing interactivity and addressing more ambiguous tasks [22]
- The ongoing improvements to the evaluation method indicate a commitment to better measuring the progress of diverse knowledge work [22]
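GDPval compares model deliverables against human experts' work via blinded professional judgments. As a sketch of how such pairwise comparisons are commonly aggregated (this helper is illustrative, not OpenAI's code; counting ties toward the model is one convention, and a "win-or-tie rate" is one plausible summary), the tally might look like:

```python
from collections import Counter

def win_or_tie_rate(judgments):
    """Fraction of blinded comparisons where the model's deliverable was
    judged at least as good as the human expert's ('win' or 'tie')."""
    counts = Counter(judgments)
    total = sum(counts.values())
    return (counts["win"] + counts["tie"]) / total

# Toy judgments from graders comparing model vs. expert deliverables.
judgments = ["win", "tie", "loss", "win", "tie", "loss", "win", "win"]
print(win_or_tie_rate(judgments))  # 0.75
```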
Can AI "Shoot" a Good Film? Five Shorts Premiere at the Busan Film Festival, with Surprising Answers
机器之心· 2025-09-27 06:13
Core Viewpoint
- The article discusses technological advances in AI-generated film, highlighting the creation of the first fully AI-generated short film "Nine Heavens" by a young team from Hong Kong, which has been recognized at the Busan International Film Festival [2][5][40]

Group 1: AI in Film Production
- The team at ManyMany Creations Limited set out to create a 15-minute narrative short film entirely generated by AI, which they accomplished with "Nine Heavens" [2][3]
- "Nine Heavens" is notable for its reliance on subtle micro-expressions to convey the protagonist's emotional journey, showcasing AI's capability in narrative storytelling [5][6]
- The film was part of a larger initiative called the "Future Image Plan," which aims to explore AI's role in filmmaking [5][18]

Group 2: AI Technology and Tools
- The production used advanced AI models from platforms like Jiemeng AI and Volcano Engine, which have significantly improved the quality and realism of AI-generated images and videos [17][18]
- The article mentions the evolution of AI tools such as Seedream 4.0, whose multi-image fusion lets creators generate detailed storyboards and videos from simple descriptions [23][25]
- AI integration has cut production time and cost, with "Nine Heavens" produced in a fraction of the time required by traditional methods [25][26]

Group 3: Industry Trends and Future Outlook
- Major film companies, such as Bona Film Group, are embracing AI technologies and establishing dedicated AI production centers to explore new creative workflows [19][20]
- The shift toward AI in filmmaking is seen as a way to democratize the industry, allowing non-professionals to create high-quality content with minimal resources [30][31]
- Despite these advances, consistent quality across longer scenes remains a challenge, so human intervention is still necessary in the production process [40][47]

Group 4: Creative Freedom and Expression
- AI tools have provided unprecedented creative freedom, allowing filmmakers to experiment with character designs and settings without the constraints of traditional production processes [32][33]
- The article emphasizes that while AI can generate content, the essence of storytelling and artistic expression remains rooted in human creativity and perspective [48][49]
With Priors and Posteriors Combined, Can Large Models Handle the Real-World "Spillover" of Reasoning-Based Prediction?
机器之心· 2025-09-27 01:30
This article is from the PRO member newsletter; follow 机器之心PRO会员 at the end of the article for more in-depth topic analyses.

Introduction: Recently, the FutureX dynamic evaluation benchmark launched by ByteDance and others confronts large models with prediction-style "exams" where answers are unknown, data updates dynamically, and results are verified in a closed loop. The work separates a model's predictive ability from its memorization, and probes performance in long-horizon reasoning, execution robustness, and uncertain environments. Meanwhile, the real-world effectiveness of large models in scenarios such as financial forecasting and disease assessment is still being refined, and researchers are looking for new mechanisms to bridge the gap between reasoning and execution.

Contents
- When reasoning "deploys its troops" against real-world scenarios such as financial forecasting, can the model "command" reliably enough to land in practice?
- Which models predict best? Do prior and posterior approaches each "show their strengths"? What directions have past model-prediction techniques pushed in, and can prior-memory and posterior-reflection mechanisms bring new breakthroughs?

FutureX arrives: from long-horizon reasoning to real-world prediction, can large models hold up?
1. Most benchmarks currently used to evaluate large language models rely on pre-existing, fixed datasets.
2. This style of evaluation works well for measuring factual knowledge or simple reasoning over known datasets, but struggles to gauge a model's true reasoning ability when it must make predictions about a dynamic real world.
① Static benchmarks typically deal with cases where a solution already exists ...
Agentic Coding Performance Hits a New High: New KAT Series Models Rank on SWE-Bench
机器之心· 2025-09-26 10:35
Core Insights
- The article discusses the launch of two models in the Code Intelligence field by the Kuaipilot team: the open-source 32B-parameter model KAT-Dev-32B and the closed-source flagship model KAT-Coder, showcasing their strong performance on coding tasks [2][26]

Model Performance
- KAT-Dev-32B achieved a 62.4% solve rate on SWE-Bench Verified, ranking 5th among all open-source models of various sizes [2]
- KAT-Coder achieved an impressive 73.4% solve rate on the same benchmark, comparable to top global closed-source models [2][11]

Model Accessibility
- KAT-Dev-32B is available on the Hugging Face platform for further research and development [7]
- API keys for KAT-Coder can be requested on the "Kuaishou Wanqing" enterprise-grade model service and development platform, giving users direct access to coding tools [7]

Training Innovations
- The KAT series models went through several innovative training phases: Mid-Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and large-scale Agentic Reinforcement Learning (RL) [9][12]
- Mid-Training focused on strengthening "LLM-as-Agent" capabilities, improving tool usage, multi-turn interaction, and instruction adherence [10][12]
- SFT involved collecting real demand-delivery trajectories annotated by human engineers to enhance end-to-end delivery capabilities [13]
- RFT introduced ground truth for trajectory exploration, improving the efficiency and stability of the reinforcement learning phase [15]

Advanced Techniques
- The team implemented entropy-based tree pruning to learn efficiently from non-linear trajectory histories and maximize throughput while minimizing cost [19]
- The SeamlessFlow framework was developed to manage trajectory trees and sustain high-throughput training by decoupling RL training from the agent's internal logic [21][22]

Emergent Capabilities
- Post-training analysis revealed two notable emergent phenomena: a 32% reduction in dialogue rounds compared to SFT models, and the ability to call multiple tools in parallel [33][35]
- The model's efficiency preference and parallel-calling capability were attributed to the implicit optimization pressure from the trajectory-tree structure [33]

Future Prospects
- The Kuaipilot team aims to push the frontiers of code intelligence, including deeper tool integration, broader language support, and collaborative coding systems [35]
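The write-up says only that KAT's tree pruning is entropy-based; the exact criterion is not public. Under the assumption that branch points where the policy is already near-deterministic (low next-token entropy) carry little learning signal and are dropped first, a minimal sketch of such a selection rule might look like:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune_trajectory_tree(branches, budget):
    """Keep the `budget` branch points with the highest decision entropy.

    Assumed rule (not from the paper): low-entropy branch points, where
    the policy is already confident, are pruned first to focus training
    throughput on genuinely uncertain decisions.
    """
    scored = [(token_entropy(b["probs"]), b["id"]) for b in branches]
    scored.sort(reverse=True)
    return [branch_id for _, branch_id in scored[:budget]]

branches = [
    {"id": "a", "probs": [0.97, 0.02, 0.01]},  # near-deterministic choice
    {"id": "b", "probs": [0.40, 0.35, 0.25]},  # genuinely uncertain
    {"id": "c", "probs": [0.60, 0.30, 0.10]},
]
print(prune_trajectory_tree(branches, budget=2))  # ['b', 'c']
```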
IEEE TPAMI 2025 | Peking University Proposes a Distribution-Driven Lifelong Learning Paradigm, Using Structural Modeling to Tackle Catastrophic Forgetting
机器之心· 2025-09-26 10:35
Core Viewpoint
- The article discusses a recent research achievement in artificial intelligence: DKP++, a new framework for lifelong person re-identification (LReID) that addresses catastrophic forgetting by strengthening retention of historical knowledge and improving cross-domain learning [2][3]

Research Background
- Person re-identification (ReID) aims to match and associate images of the same individual across different camera views, locations, and times, with applications in surveillance, intelligent transportation, and urban safety management [3]
- The traditional ReID paradigm struggles with domain shift caused by varying data-collection conditions, leading to inadequate adaptability in long-term dynamic environments [3]

Research Challenges
- The core challenge in lifelong person re-identification is catastrophic forgetting: the model's retrieval performance on old-domain data drops significantly after it learns new-domain knowledge [5]
- Existing mitigations, such as retaining historical samples or knowledge distillation, face limitations in data-privacy risk, storage overhead, and model flexibility [5]

Research Motivation
- DKP++ is motivated by distribution-aware prototype learning, which retains historical knowledge without storing historical samples, and by cross-domain distribution alignment, which strengthens learning of new knowledge while exploiting historical information [8][10]

Method Design
- DKP++ employs a distribution-aware knowledge aligning and prototyping framework, which includes:
  1. Instance-level fine-grained modeling to capture local details of person instances [14]
  2. Distribution-aware prototype generation to create robust category-level prototypes that retain intra-class variation knowledge [14]
  3. Distribution alignment to bridge the feature-distribution gap between new and old data [14]
  4. Prototype-based knowledge transfer to guide model learning using generated prototypes and labeled new data [14]

Experimental Analysis
- The experiments used two typical training-domain sequences and five widely used ReID datasets, evaluating knowledge retention and generalization [16]
- DKP++ improved average performance on known domains by 5.2%-7% and overall generalization on unseen domains by 4.5%-7.7% over existing methods [17]
- The model showed higher historical-knowledge retention and faster performance growth on unseen domains as the number of learned domains increased [20]

Technical Innovations
- DKP++ introduces innovative designs centered on distribution-prototype modeling and representation, as well as sample-alignment-guided prototype knowledge transfer to overcome distribution gaps between new- and old-domain data [23]

Future Outlook
- The research suggests improvements such as distribution alignment with larger models, active forgetting mechanisms to discard redundant knowledge, and multi-modal lifelong learning to enhance perception in complex environments [23]
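The key idea of storing distribution-level prototypes rather than raw historical images can be illustrated with a toy sketch. Modeling each feature dimension of a class as a Gaussian (mean, standard deviation) is a simplification for illustration; the paper's actual prototype parameterization and transfer mechanism are more elaborate:

```python
import statistics

def class_prototype(features):
    """Build a toy distribution-aware prototype for one identity class.

    `features` is a list of feature vectors for the same person across
    cameras/domains. Instead of keeping the raw vectors (which would
    raise privacy and storage concerns), only per-dimension mean and
    standard deviation are retained, preserving intra-class variation.
    """
    dims = list(zip(*features))  # transpose to per-dimension value lists
    return {
        "mean": [statistics.fmean(d) for d in dims],
        "std": [statistics.pstdev(d) for d in dims],
    }

# Three toy 2-D feature vectors of one person seen by different cameras.
feats = [[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]]
proto = class_prototype(feats)
print(proto["mean"])  # [2.0, 2.0]
```

Once a domain is learned, only such prototypes need be kept; later training stages can regularize new-domain features against them instead of replaying stored images.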
JD's AI "Results": Deep Applications Are Already Here, with a Trillion-Scale Ecosystem Aimed at the Future
机器之心· 2025-09-26 10:35
Core Insights
- JD's AI model "JoyAI" has moved into deep industry applications, showcasing its capabilities across sectors and in daily life [2][3][31]
- The company emphasizes that leading in AI applications requires understanding industry scenarios, not just technical advances [33][34]

Product Launches
- JD has upgraded its AI model brand to "JoyAI," spanning 3 billion to 750 billion parameters, and introduced three major AI products: JoyAI LiveHuman, JoyAI LiveTTS, and the 京犀 App [6][10][11]
- The 京犀 App is positioned as a next-generation shopping and lifestyle service platform, able to understand user needs and complete transactions through voice commands [11][13][14]
- "他她它" is JD's first digital-assistant product, designed to provide a wide range of services and engage users with more human-like interaction [15][16]

Technological Advancements
- JoyAI's architecture includes innovations such as sparse MoE training and self-competitive algorithms, improving reasoning speed 1.8x over traditional methods [7][9]
- The model scored 76.3 on the Rbench0924 evaluation, ranking first in China and second globally for reasoning capability [9]

Industry Applications
- JD's AI is being integrated into retail, logistics, industrial, and healthcare settings, improving efficiency and trust in supply-chain operations [21][22][27]
- The new AI architecture "Oxygen" aims to transform e-commerce by delivering personalized shopping experiences through advanced recommendation systems [24][27]

Strategic Vision
- JD combines self-developed technology with investments and ecosystem partnerships to enter the embodied-intelligence field, focusing on practical applications rather than technological showmanship [20][31]
- The company plans to invest heavily over the next three years to build a trillion-scale AI ecosystem, emphasizing sustainable development and real value creation for industries [38]