机器之心
Backed by priors and posteriors, can large models hold up against the real-world "overflow" of predictive reasoning?
机器之心· 2025-09-27 01:30
This article is drawn from the Machine Heart PRO member newsletter; follow 「机器之心PRO会员」 at the end of the article for more in-depth analyses.

Introduction: FutureX, a dynamic evaluation benchmark recently released by ByteDance and collaborators, puts large models in front of prediction-style "exam papers" whose answers are unknown, whose data is continuously updated, and whose results are verified in a closed loop (a toy sketch of such a closed-loop harness appears at the end of this summary). The work distinguishes a model's predictive ability from its memorization, and probes how models perform under long-horizon reasoning, execution robustness, and uncertain environments. Meanwhile, the real-world performance of large models in scenarios such as financial forecasting and disease assessment is still being refined, and researchers in the field are searching for new mechanisms that can bridge the gap between reasoning and execution.

Contents
- When reasoning "commands the troops" in real-world scenarios such as financial forecasting, can models stay in control and actually land?
- Which models predict best? Prior and posterior paths "each show their strengths": where has past prediction work been heading, and can prior-memory and posterior-reflection mechanisms bring new breakthroughs for model prediction?

01 FutureX arrives: from long-horizon reasoning to real-world prediction, do large models hold up?
1. Most benchmarks currently used to evaluate large language models rely on pre-existing, fixed datasets.
2. This style of evaluation works well for measuring factual knowledge or simple reasoning over known datasets, but it struggles to probe a model's real reasoning ability when the task is predicting a dynamic real world.
① Static benchmarks typically deal with problems whose solutions already exist ...
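As a rough illustration of the closed-loop evaluation idea described in the introduction above (not FutureX's actual implementation), the sketch below shows a minimal dynamic-evaluation harness: prediction questions are posed before their outcomes are known, the model's answers are recorded, and scoring only happens once the real-world outcome resolves. The class names, fields, and exact-match scoring rule are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional

@dataclass
class PredictionItem:
    """A single forward-looking question whose ground truth is unknown at ask time."""
    question: str
    resolve_on: date                  # date when the real-world outcome becomes known
    prediction: Optional[str] = None  # model answer, recorded before resolution
    outcome: Optional[str] = None     # filled in later from the real world

@dataclass
class DynamicEvalHarness:
    """Minimal closed-loop evaluator: ask now, score only after outcomes resolve."""
    items: list[PredictionItem] = field(default_factory=list)

    def collect_predictions(self, model: Callable[[str], str]) -> None:
        for item in self.items:
            if item.prediction is None:
                item.prediction = model(item.question)

    def record_outcome(self, question: str, outcome: str) -> None:
        for item in self.items:
            if item.question == question:
                item.outcome = outcome

    def score(self, today: date) -> float:
        """Accuracy over items whose outcomes have resolved by `today`."""
        resolved = [i for i in self.items
                    if i.outcome is not None and i.resolve_on <= today]
        if not resolved:
            return 0.0
        correct = sum(i.prediction.strip().lower() == i.outcome.strip().lower()
                      for i in resolved)
        return correct / len(resolved)

# Usage with a placeholder "model"; a real setup would call an LLM API instead.
harness = DynamicEvalHarness(items=[
    PredictionItem("Will benchmark X release a v2 before 2025-12-31? (yes/no)",
                   resolve_on=date(2026, 1, 1)),
])
harness.collect_predictions(model=lambda q: "yes")
harness.record_outcome(harness.items[0].question, "yes")
print(harness.score(today=date(2026, 1, 2)))  # 1.0 once the outcome has resolved
```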
Agentic coding performance hits a new high as the all-new KAT model series lands on SWE-Bench
机器之心· 2025-09-26 10:35
Core Insights
- The article discusses the launch of two groundbreaking models in the code intelligence field by the Kuaipilot team: the open-source 32B-parameter model KAT-Dev-32B and the closed-source flagship model KAT-Coder, showcasing their strong performance and capabilities in coding tasks [2][26].

Model Performance
- KAT-Dev-32B achieved a 62.4% solution rate on SWE-Bench Verified, ranking 5th among all open-source models of various sizes [2].
- KAT-Coder demonstrated an impressive 73.4% solution rate on the same benchmark, comparable to top global closed-source models [2][11].

Model Accessibility
- KAT-Dev-32B is available on the Hugging Face platform for further research and development [7].
- API keys for KAT-Coder can be requested on the "Kuaishou Wanqing" enterprise-level model service and development platform, allowing users to access coding tools directly [7].

Training Innovations
- The KAT series models underwent several innovative training phases, including Mid-Training, Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT), and large-scale Agentic Reinforcement Learning (RL) [9][12].
- Mid-Training focused on enhancing the model's capabilities related to "LLM-as-Agent," improving tool usage, multi-turn interaction, and instruction adherence [10][12].
- SFT involved collecting real demand-delivery trajectories annotated by human engineers to enhance end-to-end delivery capabilities [13].
- RFT introduced ground truth for trajectory exploration, improving the efficiency and stability of the reinforcement learning phase [15].

Advanced Techniques
- The team implemented entropy-based tree pruning to learn efficiently from non-linear trajectory histories and maximize throughput while minimizing costs [19] (a toy sketch of this idea follows at the end of this summary).
- The SeamlessFlow framework was developed to manage trajectory trees and ensure high-throughput training by decoupling RL training from the agent's internal logic [21][22].

Emergent Capabilities
- Post-training analysis revealed two notable emergent phenomena: a 32% reduction in dialogue rounds compared to SFT models, and the ability to call multiple tools in parallel [33][35].
- The model's efficiency preference and parallel-calling capabilities were attributed to the implicit optimization pressure from the trajectory tree structure [33].

Future Prospects
- The Kuaipilot team aims to explore the frontiers of code intelligence, including enhancing tool integration, expanding language support, and developing collaborative coding systems [35].
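The entropy-based tree pruning mentioned under Advanced Techniques can be illustrated with a toy sketch: on a shared-prefix trajectory tree, branch points where the policy is nearly deterministic carry little learning signal, so their alternative branches can be collapsed. The tree structure, entropy threshold, and "keep the first child" rule below are assumptions for illustration only, not the Kuaipilot team's implementation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class TrajNode:
    """One step in a shared-prefix trajectory tree (e.g., an agent action)."""
    action: str
    # Probabilities the policy assigned to the candidate actions at this step.
    candidate_probs: list[float] = field(default_factory=list)
    children: list["TrajNode"] = field(default_factory=list)

def step_entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of the policy's distribution at a branch point."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def prune_low_entropy(node: TrajNode, threshold: float) -> TrajNode:
    """Keep branches only at decision points whose entropy exceeds `threshold`.

    Low-entropy steps are near-deterministic, so exploring multiple children
    there adds rollout cost without much learning signal; we collapse them to
    a single child (assumed to be the most likely one).
    """
    if node.children and step_entropy(node.candidate_probs) < threshold:
        node.children = node.children[:1]
    node.children = [prune_low_entropy(c, threshold) for c in node.children]
    return node

# Toy example: a near-deterministic step gets collapsed, an uncertain one is kept.
root = TrajNode("read_file", candidate_probs=[0.97, 0.03],
                children=[TrajNode("edit_line_12", [0.5, 0.5],
                                   children=[TrajNode("run_tests", [1.0]),
                                             TrajNode("edit_more", [1.0])]),
                          TrajNode("edit_line_40", [1.0])])
pruned = prune_low_entropy(root, threshold=0.3)
print(len(pruned.children))              # 1 -> low-entropy branch collapsed
print(len(pruned.children[0].children))  # 2 -> uncertain branch kept
```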
IEEE TPAMI 2025 | Peking University proposes a distribution-driven lifelong learning paradigm, using structural modeling to tackle catastrophic forgetting
机器之心· 2025-09-26 10:35
Assistant Professor 周嘉欢 and Professor 彭宇新 of the Wangxuan Institute of Computer Technology, Peking University, recently published a new research result in IEEE TPAMI, a leading international journal in artificial intelligence: DKP++ (Distribution-aware Knowledge Aligning and Prototyping for Non-exemplar Lifelong Person Re-Identification). Targeting catastrophic forgetting in lifelong learning, the work proposes a distribution-modeling-guided framework for knowledge alignment and prototype modeling that not only strengthens the model's retention of historical knowledge but also improves its cross-domain learning ability.

The first author is 周嘉欢, Assistant Professor at the Wangxuan Institute of Computer Technology, Peking University; the corresponding author is 彭宇新, Professor at the same institute. The paper has been accepted by IEEE TPAMI, and the accompanying code has been open-sourced.

Person re-identification (ReID) aims to match and associate images of the same pedestrian across camera views, locations, and times based on visual features. The technology has broad practical value in multi-camera surveillance, intelligent transportation systems, urban safety management, and large-scale image and video retrieval. In real-world environments, however, because collection sites, capture devices, and temporal conditions keep changing, the distribution of pedestrian images ...
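The excerpt does not spell out DKP++'s formulation, but the general idea behind non-exemplar prototyping that it names can be sketched as follows: each identity from an old domain is summarized by a Gaussian prototype (feature mean and variance) instead of stored images, and pseudo-features sampled from those prototypes regularize later training to resist forgetting. The memory class, sampling scheme, and loss below are assumptions, not the paper's method.

```python
import torch

class GaussianPrototypeMemory:
    """Store per-identity feature statistics instead of raw exemplar images."""

    def __init__(self):
        self.means: dict[int, torch.Tensor] = {}
        self.stds: dict[int, torch.Tensor] = {}

    def update(self, identity: int, feats: torch.Tensor) -> None:
        """feats: (N, D) features of one identity from the current domain."""
        self.means[identity] = feats.mean(dim=0).detach()
        self.stds[identity] = feats.std(dim=0).detach().clamp_min(1e-4)

    def sample(self, n_per_id: int = 4) -> tuple[torch.Tensor, torch.Tensor]:
        """Draw pseudo-features from the stored distributions (no old images needed)."""
        feats, labels = [], []
        for identity, mu in self.means.items():
            eps = torch.randn(n_per_id, mu.numel())
            feats.append(mu + eps * self.stds[identity])
            labels.append(torch.full((n_per_id,), identity, dtype=torch.long))
        return torch.cat(feats), torch.cat(labels)

def anti_forgetting_loss(classifier: torch.nn.Module,
                         memory: GaussianPrototypeMemory) -> torch.Tensor:
    """Cross-entropy on sampled pseudo-features keeps old identities separable."""
    pseudo_feats, pseudo_labels = memory.sample()
    logits = classifier(pseudo_feats)
    return torch.nn.functional.cross_entropy(logits, pseudo_labels)

# Toy usage: 128-d features, 10 identities seen in earlier domains.
memory = GaussianPrototypeMemory()
for identity in range(10):
    memory.update(identity, torch.randn(32, 128) + identity)  # fake old-domain features
classifier = torch.nn.Linear(128, 10)
loss = anti_forgetting_loss(classifier, memory)  # would be added to the new-domain loss
print(loss.item())
```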
JD AI "delivers results": deep application is already the present, and a trillion-scale ecosystem targets the future
机器之心· 2025-09-26 10:35
Core Insights
- JD's AI model "JoyAI" has transitioned into deep industry applications, showcasing its capabilities across various sectors and daily life [2][3][31].
- The company emphasizes that understanding industry scenarios is crucial for leading in AI applications, moving beyond mere technical advancements [33][34].

Product Launches
- JD has upgraded its AI model brand to "JoyAI," covering a range from 3 billion to 750 billion parameters, and introduced three major AI products: JoyAI LiveHuman, JoyAI LiveTTS, and the 京犀 App [6][10][11].
- The 京犀 App is positioned as a next-generation shopping and lifestyle service platform, capable of understanding user needs and facilitating transactions through voice commands [11][13][14].
- "他她它" is JD's first digital assistant product, designed to provide a wide range of services and engage users in a more human-like interaction [15][16].

Technological Advancements
- JoyAI's architecture includes innovations such as sparse MoE training and self-competitive algorithms, enhancing reasoning speed by 1.8 times compared to traditional methods [7][9] (a generic sketch of sparse MoE routing follows at the end of this summary).
- The model achieved a score of 76.3 on the Rbench0924 evaluation, ranking first in China and second globally for reasoning capabilities [9].

Industry Applications
- JD's AI is being integrated into various sectors, including retail, logistics, industrial, and healthcare, enhancing efficiency and trust in supply chain operations [21][22][27].
- The new AI architecture "Oxygen" aims to revolutionize e-commerce by providing personalized shopping experiences through advanced recommendation systems [24][27].

Strategic Vision
- JD's approach combines self-developed technology with investments and ecosystem partnerships to penetrate the embodied intelligence field, focusing on practical applications rather than just technological prowess [20][31].
- The company plans to invest significantly over the next three years to build a trillion-scale AI ecosystem, emphasizing sustainable development and real value creation for industries [38].
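The summary above mentions sparse MoE training in JoyAI's architecture without giving details. As a generic reference point only (not JD's design), the sketch below shows top-k sparse mixture-of-experts routing, where each token is processed by just a few expert MLPs so per-token compute stays roughly constant as total parameters grow. Expert count, k, and dimensions are arbitrary.

```python
import torch
import torch.nn.functional as F

class SparseMoELayer(torch.nn.Module):
    """Generic top-k mixture-of-experts layer: each token uses only k experts."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = torch.nn.Linear(d_model, n_experts)
        self.experts = torch.nn.ModuleList([
            torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model),
                                torch.nn.GELU(),
                                torch.nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert for every token.
        gate_logits = self.router(x)                        # (tokens, n_experts)
        weights, picked = gate_logits.topk(self.k, dim=-1)  # keep the top-k experts
        weights = F.softmax(weights, dim=-1)                # renormalize over those k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Usage: 16 tokens of width 64; only 2 of the 8 expert MLPs run per token.
layer = SparseMoELayer(d_model=64)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```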
Three years of animation school beaten by AI in seconds: OpenAI wants to make movies, but Hollywood isn't buying in
机器之心· 2025-09-26 08:26
Core Viewpoint
- OpenAI is positioning itself to disrupt Hollywood by demonstrating that generative AI can produce animated films more quickly and cost-effectively than traditional methods [21][26].

Group 1: OpenAI's Animation Project
- OpenAI is backing an animated film titled "Critterz," which aims to showcase the capabilities of generative AI in film production [21].
- The film's production timeline is targeted to be reduced from the traditional three years to approximately nine months, with a budget of under $30 million, significantly lower than typical animation costs [23].
- The film is set to premiere globally in 2026, with hopes of debuting at the Cannes Film Festival [25].

Group 2: Technology and Collaboration
- The production involves collaboration with human artists for character sketches, which will be integrated with OpenAI's tools, including the latest GPT-5 and image generation models [23][28].
- OpenAI's approach combines human creativity with AI assistance, aiming to mitigate copyright concerns that have arisen in the industry [28].

Group 3: Industry Implications
- If successful, "Critterz" could accelerate the adoption of AI technologies in Hollywood, lowering creative barriers for more creators [26].
- Despite the potential benefits, the entertainment industry remains cautious about fully embracing AI due to fears of job displacement for actors and writers, as well as intellectual property issues [27][28].
创智 and 交大 uncover a new law of AI agency: 78 samples beat GPT-5, automating software development and scientific research
机器之心· 2025-09-26 08:26
Core Insights
- The article emphasizes the emergence of "Agency" as a core competency in AI systems, highlighting the shift from passive tools to proactive collaborators in various industries [3][11][46].
- The research introduces the "Agency Efficiency Principle," suggesting that the development of agency capabilities relies more on strategic data construction than on merely increasing data volume [5][44][52].

Group 1: Definition and Importance of Agency
- Agency is defined as the ability of AI systems to autonomously identify problems, formulate hypotheses, and execute solutions through interaction with their environment [3][11].
- The significance of agency lies in its potential to transform AI from a passive assistant into an active participant capable of handling complex tasks in knowledge work [3][11].

Group 2: Research Findings and Methodology
- The LIMI research demonstrates that a model can achieve superior agency performance using only 78 samples, outperforming models trained on 10,000 samples by 53.7% [4][14][38].
- The study focuses on two core areas: collaborative programming and scientific research workflows, which require comprehensive agency capabilities [16][17].

Group 3: Data Construction and Efficiency
- LIMI's approach to data construction emphasizes the importance of high-quality, strategically curated samples over sheer quantity, challenging traditional beliefs about data scaling [5][44][40] (a hypothetical curation sketch follows at the end of this summary).
- The training data for LIMI exhibited an average length of 42.4k tokens, significantly exceeding typical training sample lengths, which enhances the complexity and richness of learning signals [28][31].

Group 4: Experimental Results and Performance
- In the AgencyBench evaluation, LIMI achieved an average score of 73.5%, significantly surpassing all baseline models, including GLM-4.5, which scored 45.1% [37][41].
- The findings indicate that strategic data construction can lead to more effective capability transfer than simply increasing the size of training datasets [38][40].

Group 5: Implications for the AI Industry
- LIMI's discoveries could revolutionize the AI industry by lowering the barriers to entry for smaller teams and shifting the focus from data collection to high-quality sample design [47][48].
- The approach has broad commercial potential, reducing development costs and time while improving performance in specific applications [50][51].
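The summary above stresses that a small number of long, carefully curated agent trajectories outperformed much larger datasets. The helper below sketches one way such "strategic data construction" could look in practice: filter candidate trajectories by length, tool use, and completion, then keep only a small budget of the richest ones. The selection criteria, thresholds, and field names are hypothetical, not LIMI's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """A logged multi-turn agent session considered for fine-tuning."""
    task: str
    n_tokens: int          # total length of the interaction
    n_tool_calls: int      # how much real environment interaction it contains
    task_completed: bool   # whether the final state satisfied the task

def curate(candidates: list[Trajectory],
           budget: int = 78,
           min_tokens: int = 20_000,
           min_tool_calls: int = 10) -> list[Trajectory]:
    """Keep only long, tool-rich, successful trajectories, up to `budget` samples.

    The (hypothetical) rationale: richer trajectories carry denser learning
    signal, so a small curated set can beat bulk-collected data.
    """
    eligible = [t for t in candidates
                if t.task_completed
                and t.n_tokens >= min_tokens
                and t.n_tool_calls >= min_tool_calls]
    # Prefer the richest sessions first.
    eligible.sort(key=lambda t: (t.n_tool_calls, t.n_tokens), reverse=True)
    return eligible[:budget]

# Toy usage: only the long, completed, tool-heavy session survives curation.
pool = [
    Trajectory("fix failing CI pipeline", n_tokens=45_000, n_tool_calls=37, task_completed=True),
    Trajectory("rename a variable", n_tokens=1_200, n_tool_calls=2, task_completed=True),
    Trajectory("refactor data loader", n_tokens=30_000, n_tool_calls=19, task_completed=False),
]
print([t.task for t in curate(pool)])  # ['fix failing CI pipeline']
```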
「视远·正心明智」: 机器之心's 2025 annual AI rankings officially launch
机器之心· 2025-09-26 03:31
Core Viewpoint
- The article emphasizes the ongoing advancements in artificial intelligence (AI) as of 2025, highlighting the rapid iteration of large models and the emergence of new applications, particularly in China, where domestic models are approaching or surpassing international standards [2][3][4].

AI Development Trends
- In 2025, AI continues to evolve with significant breakthroughs in large models, including GPT-4.5, GPT-5, and Genie 3, enhancing capabilities in understanding, generation, and reasoning [3][4].
- The advancements in model capabilities are leading to new application forms, such as automated code generation and multi-step task completion in intelligent agents [4].

Domestic AI Landscape
- China's AI development in 2025 is marked by domestic large models not only matching but also leading in performance compared to international counterparts, with a strong open-source ecosystem [4].
- Recent rankings show that all top 15 open-source AI models on the Design Arena leaderboard are from China [4].

Recognition of AI Leaders
- The article outlines a curated list of top companies and products in AI for 2025, recognizing those with significant technological strength and innovation [6][7][8][9][10][11][12][13]. Categories include:
  - **Top 10 Companies with Strong Technical Strength**: Companies that have made long-term investments in AI technology and maintain a leading position in the field [7].
  - **Top 20 AI Leading Companies**: Firms that have established comprehensive operational capabilities and competitive advantages in AI technology and applications [8].
  - **Top 20 Best Large Models**: Recognizing representative and powerful foundational models in the domestic market [9].
  - **Top 20 Best Large Model Products**: Highlighting valuable new products and applications based on large models [10].
  - **Top 10 Leading Companies in Embodied Intelligence**: Companies with systematic technology layouts and continuous innovation in the field of embodied intelligence [12].
  - **Top 10 Leading Companies in ScienceAI**: Firms focusing on the intersection of AI and other scientific disciplines, driving industry development through innovative solutions [13].
NeurIPS Spotlight | Unfazed by motion and occlusion: zero priors and a single video yield accurate camera parameter prediction
机器之心· 2025-09-26 00:32
This led the authors to reconsider: is there a method that can predict camera parameters from videos of dynamic scenes accurately, efficiently, and stably, unaffected by moving foreground objects, using only a single RGB video as supervision?

Method overview
To this end, they propose ROS-Cam (RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes), which has been accepted to NeurIPS 2025 as a Spotlight paper. The code will be open-sourced soon.

The first author, 李放 (Fang Li), is a second-year PhD student at the University of Illinois Urbana-Champaign (UIUC), working on 4D visual localization, reconstruction / novel view synthesis, and understanding. The second author, 张昊 (Hao Zhang), is a fourth-year PhD student at UIUC. The corresponding author is Narendra Ahuja, Donald Biggar Willett Professor at UIUC (PhD advisor of Ming-Hsuan Yang and Jia-Bin Huang). This work was completed during the first author's first year of the PhD.

Research background
In tasks such as 3D reconstruction, NeRF training, and video generation, camera parameters are indispensable prior information. Traditional SfM/SLAM methods (such as COLMAP) perform well in static scenes, but when people and vehicles are moving or objects are occluded ...
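The excerpt is cut off before ROS-Cam's actual formulation, so the sketch below only illustrates the general recipe the question above implies: optimize camera motion parameters against a photometric loss computed from RGB frames alone, while masking out pixels flagged as dynamic foreground so moving objects do not corrupt the estimate. The 2-D shift stands in for full camera parameters, and the warp, mask source, and loss form are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def warp(image: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """Warp a (1, C, H, W) image by a 2-D shift (a stand-in for full camera motion)."""
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    grid = torch.stack((xs + shift[0], ys + shift[1]), dim=-1).unsqueeze(0)
    return F.grid_sample(image, grid, align_corners=True)

def masked_photometric_loss(pred, target, static_mask):
    """L1 photometric error counted only on pixels believed to be static background."""
    return ((pred - target).abs() * static_mask).sum() / static_mask.sum().clamp_min(1.0)

# Synthetic smooth frame; frame2 is frame1 seen from a shifted "camera".
ys, xs = torch.meshgrid(torch.linspace(0, 1, 64), torch.linspace(0, 1, 64), indexing="ij")
frame1 = torch.stack((xs, ys, xs * ys)).unsqueeze(0)            # (1, 3, 64, 64)
true_shift = torch.tensor([0.10, 0.05])
frame2 = warp(frame1, true_shift).detach()
static_mask = torch.ones(1, 1, 64, 64)
static_mask[:, :, 20:40, 20:40] = 0.0                           # region flagged as dynamic
frame2[:, :, 20:40, 20:40] = torch.rand(1, 3, 20, 20)           # moving object, ignored by the loss

shift = torch.zeros(2, requires_grad=True)                      # "camera parameter" being optimized
opt = torch.optim.Adam([shift], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = masked_photometric_loss(warp(frame1, shift), frame2, static_mask)
    loss.backward()
    opt.step()
print(shift.detach())  # close to tensor([0.1000, 0.0500]) if the optimization converges
```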
ChatGPT's new Pulse feature: GPT-5 proactively pushes updates to you, and people can't stop playing with it
机器之心· 2025-09-26 00:32
Core Viewpoint
- OpenAI has introduced a new feature called "Pulse" for ChatGPT, which aims to provide personalized updates and proactive assistance to users, marking a significant step towards practical application of AI technology [4][5][20].

Group 1: Feature Overview
- The "Pulse" feature allows ChatGPT to conduct daily research based on user interactions, providing customized content updates each morning [4][7].
- Users can link their Gmail and Google Calendar to enhance the relevance of suggestions, such as drafting meeting agendas or reminding about important dates [8].
- The updates are presented in a visual card format, making it easy for users to browse and access detailed information [4][14].

Group 2: User Interaction and Feedback
- Users can provide feedback on the content through a simple like or dislike mechanism, which will help refine the personalization of the Pulse feature over time [12].
- The feature is designed to be user-friendly, allowing users to manage the types of content they receive and to request specific information [11][15].

Group 3: Future Implications
- OpenAI envisions that the proactive nature of Pulse could change how users consume news and interact with social media, potentially paving the way for future advertising opportunities [17].
- The company aims to expand the functionality of ChatGPT to perform more meaningful tasks, with Pulse being just the beginning of this evolution [20].
AI video enters its "Steam Engine" era
机器之心· 2025-09-25 23:54
Core Viewpoint
- The AI video generation industry has seen a significant advancement with Baidu's Steam Engine 2.0, which introduces the capability to generate long videos without time limitations, enhancing creative flexibility and efficiency [2][3][37].

Group 1: Technological Advancements
- Baidu's Steam Engine 2.0 has upgraded its capabilities to generate long videos, breaking the previous 5-second and 10-second limitations and allowing for the creation of videos of any length [3][4].
- The introduction of interactive demand expression allows creators to update prompts in real time during video generation, enhancing the creative process [3][4].
- Unlike traditional methods that require complex operations and often result in a lack of coherence, Baidu's approach utilizes streaming generation technology, enabling users to generate videos with just one image and a prompt [4][6].

Group 2: Commercial Applications
- The advancements in long-video generation technology provide new tools and commercial value for content creators, allowing for high-quality video production in a shorter time frame and at a lower cost [6][19].
- Steam Engine 2.0 can produce videos that maintain high visual quality and detail, making it suitable for various industries, including advertising and film [6][19][33].

Group 3: Challenges and Solutions
- The AI video generation industry faces challenges such as long-context memory retention and the high computational costs associated with generating longer videos [22][25].
- Baidu's solution introduces long-term consistency modeling and dynamic buffer management to address these challenges, allowing for real-time adjustments during video generation [26][27][32] (a schematic of this streaming pattern follows at the end of this summary).
- The use of historical reference frames and noise management techniques enhances the continuity and quality of generated videos, mitigating issues related to memory and visual consistency [28][30][32].

Group 4: Market Impact
- The release of Baidu's Steam Engine 2.0 is expected to reshape the interaction between humans and media, moving from passive consumption to collaborative creation, potentially leading to new artistic forms and business models [22][37].
- The technology's ability to produce high-quality, coherent long videos positions it as a significant player in the AI video generation market, catering to both professional and amateur creators [33][37].
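The summary above describes streaming generation with a dynamic buffer and historical reference frames but gives no implementation details. The loop below is only a schematic of that pattern, with a placeholder generator and a hypothetical fixed-size buffer policy, not Baidu's Steam Engine 2.0: the video is produced chunk by chunk, a bounded buffer of recent frames carries appearance cues forward, and prompts can change mid-stream.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    content: str  # stands in for pixel data / latents

def generate_chunk(prompt: str, references: list[Frame], start: int, n: int) -> list[Frame]:
    """Placeholder generator: a real system would condition a video model on the
    prompt plus the reference frames to keep long-range appearance consistent."""
    context = ",".join(str(f.index) for f in references) or "none"
    return [Frame(start + i, f"{prompt} | refs: {context}") for i in range(n)]

def stream_video(prompts, chunk_len: int = 8, buffer_size: int = 4):
    """Generate an arbitrarily long video chunk by chunk.

    A bounded deque of historical reference frames (the 'dynamic buffer') keeps
    memory and compute constant regardless of total video length while still
    carrying appearance cues forward; prompts may be updated between chunks.
    """
    buffer: deque[Frame] = deque(maxlen=buffer_size)
    frames: list[Frame] = []
    for prompt in prompts:                       # prompts can change mid-stream
        chunk = generate_chunk(prompt, list(buffer), start=len(frames), n=chunk_len)
        frames.extend(chunk)
        buffer.extend(chunk[-buffer_size:])      # retain only the most recent frames
    return frames

# Usage: three prompt updates produce one continuous 24-frame sequence.
video = stream_video(["a cat walks in", "the cat sits down", "camera pans left"])
print(len(video), video[-1].content)
```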