Agent Systems
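All Top Talent! Traveling the Globe to Gather with Jiqizhixin at Top AI Academic Conferences
Jiqizhixin · 2025-12-23 09:36
In 2025, AI is still sprinting ahead. From multimodal large models to the evolution of agent systems, from breakthroughs in foundational theory to deepening industrial applications, every technological leap is reshaping the outline of the future. With academic output exploding, reading alone can no longer keep pace with the speed of iteration. We firmly believe that even the most powerful algorithm needs connections between people, and even the most cutting-edge breakthrough needs face-to-face conversation.

This year, carrying that belief, we set out. From Beijing's turning seasons to the osmanthus-scented courtyards of Jiangnan, from Singapore's evening conversations to Vienna's gentle summer breeze, from Vancouver's scholarly calm to San Diego's starlit seaside... we built our events around AI academic conferences including ICLR, CVPR, ACL, ICML, IROS, EMNLP, and NeurIPS, spanning 8 cities and hosting 11 events. Across a map of shifting time zones, we found a shared frequency and wrote down these memories and numbers that belong to 2025:

2025: Highlights in Review
From in-depth paper walkthroughs to lively conversation at talent dinners, our two event series, the "Paper Sharing Sessions" and the "Talent Meetups", ran through the whole year at home and abroad, aiming to build an AI exchange community with warmth, depth, and value:

2026: Setting Out Again
The old chapter is written; a new one awaits. The successful close of 2025 is the starting point for an even better journey in 2026. We have drawn up preliminary plans covering ICLR, CVPR, ACL, IC ...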
Xiaomi MiMo-V2-Flash Open-Sourced: Capabilities on Par with the Benchmark Closed-Source Model Claude 4.5 Sonnet
Feng Huang Wang· 2025-12-17 10:26
Group 1
- Xiaomi officially announced the open-source release of Xiaomi MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309 billion total parameters (15 billion activated) that ranks in the top two among open-source models on global benchmarks [1]
- The model introduces a hybrid attention architecture and multi-layer MTP inference acceleration. Its coding capability is comparable to the closed-source Claude 4.5 Sonnet at only 2.5% of the inference cost and with 2x the generation speed [1]
- Xiaomi MiMo-V2-Flash outperformed DeepSeek V3.2 and K2-Thinking on most evaluation benchmarks while using 50% to 67% fewer parameters. It combines low cost with high speed and shows preliminary capabilities for simulating the world [1]

Group 2
- The next generation of agent systems is envisioned not as a mere "language simulator" but as a true "agent" that understands the human world and coexists with it [2]
- Agent execution is shifting from "answering questions" to "completing tasks", which requires memory, reasoning, autonomous planning, decision-making, and execution [2]
- Unified multimodal perception is essential for understanding the physical world and will enable integration with smart devices such as glasses [2]
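The gap between total and activated parameters in an MoE model comes from routing each token to only a few experts. The sketch below is a generic top-k gating illustration, not MiMo-V2-Flash's actual router: the expert count, the `top_k` value, the gate scores, and the function names are all assumptions. Only the 309 billion and 15 billion figures come from the summary above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

# Hypothetical gate scores for one token over 8 experts (illustrative values).
gate_scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
weights = route_top_k(gate_scores, top_k=2)

# Figures from the article: 309B total parameters, 15B activated per token.
TOTAL_PARAMS_B = 309
ACTIVE_PARAMS_B = 15
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(sorted(weights))           # indices of the experts that run for this token
print(f"{active_fraction:.1%}")  # share of parameters used per token
```

Only the selected experts run for a given token, which is why per-token compute tracks the activated count rather than the total.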
Just Over a Month After Joining Xiaomi, "AI Prodigy" Luo Fuli Makes Her Debut
Xinhuanet Finance · 2025-12-17 05:43
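Source: Cailian Press, The Beijing News. Video source: Zhongxin Jingwei.

On December 17, at the 2025 Xiaomi Human x Car x Home Full-Ecosystem Partner Conference, Luo Fuli, head of the Xiaomi MiMo large model, made her first public appearance for Xiaomi since joining, and officially released and open-sourced the latest MoE large model, MiMo-V2-Flash.

Luo Fuli said the model has very strong potential as a base model and ranks in the top 2 among open-source models worldwide on world-class evaluation leaderboards. It is both low-cost and fast: its inference speed is 3 times that of DeepSeek V3.2 at a lower cost.

Luo Fuli said the next generation of agent systems is not a "language simulator" but a true "agent" that genuinely understands our world and coexists with it. She further explained that an "agent" needs two capabilities. The first is agent execution: moving from "answering questions" to "completing tasks", which covers memory, reasoning, autonomous planning, decision-making, and execution. The second is omni perception: unified multimodal perception (laying the groundwork for understanding the physical world), embedded in smart devices such as glasses and integrated into everyday workflows.

Luo Fuli has been called a "post-95 AI prodigy". She previously worked at Alibaba DAMO Academy, then at High-Flyer Quant and DeepSeek, where she became a key developer of DeepSeek-V2. Since November 2025 she has led Xiaomi's MiMo large model team.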
AI Learns by Doing with Online Reinforcement Learning: Stanford Team Sends a 7B Small Model's Performance Soaring, Even Surpassing GPT-4o
36Kr · 2025-10-24 12:45
Core Insights
- AgentFlow introduces a new paradigm for online reinforcement learning that strengthens the reasoning of agent systems through real-time optimization and collaboration among specialized agents [1][11][14].

Performance Metrics
- Built on the Qwen-2.5-7B-Instruct model, AgentFlow shows significant improvements across benchmark tests: 14.9% on search tasks, 14.0% on agentic reasoning tasks, 14.5% on mathematical reasoning, and 4.1% on scientific reasoning [4][19][21].
- AgentFlow outperforms larger models, including GPT-4o and Llama3.1-405B, showing that effective system design can beat sheer model size [21][25].

System Architecture
- AgentFlow consists of four specialized agents: a planner for task analysis and tool selection, an executor for tool invocation, a verifier for evaluating intermediate results, and a generator for synthesizing final outputs [11][13][14].
- A shared memory design supports collaboration and reduces error propagation in multi-step reasoning [7][14].

Learning Mechanism
- On-policy optimization of the planner inside the agent interaction flow is crucial for adapting to environmental changes and feedback, and it yields a robust, self-evolving reasoning process [13][14][22].
- The Flow-GRPO algorithm addresses multi-turn credit assignment in reinforcement learning, improving training efficiency and stability on complex reasoning tasks [15][19].

Research Findings
- Online learning in real interaction environments is essential for efficient reasoning, whereas offline supervised learning can lead to performance declines [22][25].
- Training lets the system autonomously discover new tool combinations and usage patterns, strengthening its problem-solving capabilities [25][29].

Future Implications
- AgentFlow represents a shift from seeking a single comprehensive model to enabling agents to adapt and learn continuously within a system, highlighting the potential of collaborative intelligence for complex tasks [29].
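The four-module loop with shared memory can be sketched as follows. This is a minimal illustration under stated assumptions, not AgentFlow's code: the module names follow the summary, but every function body, the `Memory` class, and the toy `add` tool are hypothetical stand-ins for model calls and real tools.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Shared memory that every module reads from and appends to."""
    query: str
    steps: list = field(default_factory=list)

# Hypothetical tool registry; a real system would wrap search, code execution, etc.
TOOLS = {"add": lambda a, b: a + b}

def planner(memory):
    """Analyze the task and choose the next tool call (stubbed policy)."""
    return {"tool": "add", "args": (2, 3)}

def executor(action):
    """Invoke the tool selected by the planner."""
    return TOOLS[action["tool"]](*action["args"])

def verifier(memory, result):
    """Judge whether the intermediate result is enough to stop (stubbed check)."""
    return result is not None

def generator(memory):
    """Synthesize the final answer from what is stored in memory."""
    return f"Answer to '{memory.query}': {memory.steps[-1]['result']}"

def run(query, max_turns=5):
    """Plan, execute, and verify in a loop, then generate the final answer."""
    memory = Memory(query)
    for _ in range(max_turns):
        action = planner(memory)
        result = executor(action)
        done = verifier(memory, result)
        memory.steps.append({"action": action, "result": result, "verified": done})
        if done:
            break
    return generator(memory), memory

answer, memory = run("What is 2 + 3?")
print(answer)
```

Because every turn is written to the same memory object, a later turn can inspect what earlier turns did, which is one plausible reading of how shared memory limits error propagation.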
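AI Learns by Doing with Online Reinforcement Learning: Stanford Team Sends a 7B Small Model's Performance Soaring, Even Surpassing GPT-4o
QbitAI · 2025-10-24 03:53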
Core Insights
- The article introduces AgentFlow, a new paradigm in online reinforcement learning that strengthens the reasoning capabilities of intelligent systems and outperforms models such as GPT-4o and Llama3.1-405B [1][4][23].

Group 1: AgentFlow Overview
- AgentFlow is a team of specialized agents (a planner, an executor, a verifier, and a generator) that collaborate through shared memory to optimize decision-making in real time [1][14][18].
- The Flow-GRPO method performs on-policy optimization of the planner agent, enabling adaptive decisions based on environmental changes and feedback from the other agents [19][16].

Group 2: Performance Metrics
- Built on the Qwen-2.5-7B-Instruct model, AgentFlow shows significant improvements across benchmark tests: 14.9% on search tasks, 14.0% on agentic reasoning, 14.5% on math reasoning, and 4.1% on scientific reasoning [3][25][27].
- The model outperforms larger models, demonstrating that effective system design and training methods can matter more than simply increasing model size [27].

Group 3: Learning Mechanisms
- The article emphasizes "learning in the flow": online learning in real interactive environments is crucial for efficient reasoning [28][29].
- Real-time training lets AgentFlow correct errors quickly and plan tasks better, improving overall system performance [30][29].

Group 4: Innovations and Findings
- The system autonomously discovers new solution paths, such as combining different search tools to improve information retrieval, showing its ability to adapt and innovate [33].
- AgentFlow keeps its performance gains without significantly increasing the average number of reasoning steps, indicating efficient handling of complex tasks [35].

Group 5: Future Implications
- The article concludes that AgentFlow offers a novel approach to agent training, advocating systems that adapt and learn continuously rather than relying on a single comprehensive model [37][38].
- Although research is still some distance from practical application, the potential of Agentic AI remains significant [39].
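Multi-turn credit assignment is the question of how much each planner turn should be credited when only the final answer is scored. The sketch below shows one simple scheme: GRPO-style normalization of outcome rewards within a group of rollouts, with each rollout's advantage copied to all of its turns. The broadcasting step is an assumption made for illustration; the summary does not describe Flow-GRPO's exact formulation, and the rewards and turn counts are invented.

```python
from statistics import mean, pstdev

def group_advantages(outcome_rewards, eps=1e-8):
    """Normalize final rewards within a group of rollouts for the same query."""
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)
    return [(r - mu) / (sigma + eps) for r in outcome_rewards]

def broadcast_to_turns(advantages, turns_per_rollout):
    """Give every turn in a rollout that rollout's advantage."""
    return [[adv] * n for adv, n in zip(advantages, turns_per_rollout)]

# Hypothetical group of 4 rollouts: 1.0 means the final answer was correct.
rewards = [1.0, 0.0, 0.0, 1.0]
turns = [3, 2, 4, 3]  # number of planner turns in each rollout

adv = group_advantages(rewards)
per_turn = broadcast_to_turns(adv, turns)

for rollout_adv in per_turn:
    print([round(a, 3) for a in rollout_adv])
```

Under this scheme every turn in a successful rollout is reinforced equally, which sidesteps per-turn reward design at the cost of coarser credit.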
Microsoft Research's Yuqing Yang: The Attention System of Agents | Attention
36Kr · 2025-09-05 03:42
Core Insights
- The article discusses TriangleMix, a structural optimization method for attention in large models that addresses the computational bottleneck of the prefill stage while maintaining performance and accuracy [2][5][10]
- TriangleMix enables a hierarchical sparse attention architecture that significantly reduces latency and memory consumption, making it suitable for long-context tasks [8][10][36]

Technical Overview
- TriangleMix uses a layered attention strategy: standard dense attention in the first 16 layers, then a triangle-shaped mask in the subsequent layers, which reduces computational complexity from O(N²) to O(N) [5][6]
- Tested on models such as Llama-3.1-8B-Instruct, the method cut kernel latency from 750ms to 49ms, a 15.3x speedup, and reduced time to first token (TTFT) by 12%-32% [10][9]

Performance Metrics
- Experiments indicate that TriangleMix retains 99.7% of the original performance while applying triangle attention in the majority of the deep layers [8][10]
- The method significantly reduces latency and memory usage with almost no loss in accuracy across benchmark tasks [10][9]

Broader Implications
- The research emphasizes viewing attention mechanisms within the larger context of agent systems, training mechanisms, and task structures rather than as isolated components [12][26]
- Ongoing work at Microsoft Research focuses on optimizing agent-native systems, which aim to make AI applications more efficient and effective, particularly for users with specific needs [15][67]
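The layered strategy can be illustrated with boolean attention masks. In the sketch below, the split at 16 dense layers comes from the summary, but the mask shape is an assumption about what "triangle-shaped" means: a few always-visible leading tokens, a local sliding window, and fully dense final query rows. The parameter names `sink`, `window`, and `last_q` are hypothetical.

```python
def causal_mask(n):
    """Standard dense causal mask: query q may attend to every key k <= q."""
    return [[k <= q for k in range(n)] for q in range(n)]

def triangle_mask(n, sink=2, window=2, last_q=2):
    """Sparse causal mask (assumed shape): sink columns, local window, dense last rows."""
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q
            keep = k < sink or q - k < window or q >= n - last_q
            row.append(causal and keep)
        mask.append(row)
    return mask

def mask_for_layer(layer_idx, n, dense_layers=16):
    """Dense attention in the first `dense_layers` layers, triangle mask afterwards."""
    return causal_mask(n) if layer_idx < dense_layers else triangle_mask(n)

def count(mask):
    """Number of query-key pairs that are actually computed."""
    return sum(sum(row) for row in mask)

n = 12
dense = count(mask_for_layer(0, n))
sparse = count(mask_for_layer(20, n))
print(dense, sparse)

# Check of the reported kernel speedup: 750 ms down to 49 ms.
kernel_speedup = 750 / 49
print(round(kernel_speedup, 1))
```

With fixed `sink`, `window`, and `last_q`, the number of kept pairs grows linearly in the sequence length, while the dense causal mask grows quadratically.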
OpenAI's Female CEO Is Ruthless: With an IQ of 148, GPT-5 Is the Real Money Printer
36Kr · 2025-08-14 03:11
Core Insights
- GPT-5 is positioned as a significant advance in AI technology, achieving an IQ of 148 and surpassing human genius levels, and it excels in mathematics and programming tests in particular [3][5][13]
- OpenAI's focus with GPT-5 is not just intelligence but monetization, particularly converting its vast number of free users into revenue-generating customers [15][16][17]

Group 1: Performance and Recognition
- GPT-5 has performed exceptionally in benchmark tests, setting new records in mathematics and showing notable improvements in programming tests [5][13]
- The model's capabilities have been recognized by Nvidia, indicating its potential in reasoning and programming applications [13]

Group 2: Monetization Strategy
- OpenAI aims to monetize GPT-5 through its "router" technology, which dynamically allocates resources based on user intent and query complexity, optimizing operational costs and improving performance [20][24][26]
- The router has driven a significant increase in engagement: daily active users of the reasoning model surged sevenfold among free users and nearly 3.5 times among paid users [26]

Group 3: User Engagement and Growth
- ChatGPT's user base has expanded rapidly, now surpassing major platforms such as Twitter, Reddit, and WhatsApp and approaching Instagram and Facebook [19]
- The growth in engagement is attributed to the router's ability to provide tailored responses, which improves the user experience and raises the likelihood of monetization through indirect payments [17][19]

Group 4: Future Commercialization Potential
- OpenAI's strategic direction includes integrating advertising and affiliate models into ChatGPT, allowing the platform to generate revenue without compromising user experience [34][36]
- The router's ability to assess the commercial value of queries positions ChatGPT to evolve into a "super app" that facilitates transactions and earns commissions on sales [35][51][58]
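A router that allocates resources by query complexity can be sketched as a scoring function plus a threshold. Everything below is hypothetical: the keyword heuristic, the weights, the threshold, and the tier names `fast` and `reasoning` are illustration only, and the summary does not describe how OpenAI's router is implemented.

```python
from dataclasses import dataclass

@dataclass
class Route:
    tier: str    # which (hypothetical) model tier handles the query
    reason: str  # short explanation of the decision

# Hypothetical phrases that suggest a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "debug", "derive", "compare")

def complexity_score(query: str) -> float:
    """Toy heuristic in [0, 1]: longer queries and reasoning hints score higher."""
    text = query.lower()
    length_part = min(len(text.split()) / 50, 1.0)
    hint_part = 1.0 if any(h in text for h in REASONING_HINTS) else 0.0
    return 0.4 * length_part + 0.6 * hint_part

def route(query: str, threshold: float = 0.5) -> Route:
    """Send complex queries to the costly tier and everything else to the cheap one."""
    score = complexity_score(query)
    if score >= threshold:
        return Route("reasoning", f"score {score:.2f} >= {threshold}")
    return Route("fast", f"score {score:.2f} < {threshold}")

simple = route("What time is it in Tokyo?")
hard = route("Debug this function and explain step by step why it fails")
print(simple.tier, hard.tier)
```

The cost argument follows from the split: if most queries land in the cheap tier, the expensive tier's compute is spent only where it is likely to change the answer.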
Zhou Hongyi: No More Short Dramas for Me, They Really Don't Suit My Temperament
Zheng Quan Shi Bao· 2025-08-06 10:05
Group 1
- Zhou Hongyi, the founder of 360, has decided not to make any more short dramas, saying they do not suit his temperament [2][7]
- His first short drama, "Reigniting the Life of a Hidden Hacker," aired at the end of 2024 and sparked significant discussion for its unusual blend of a love story and an AI entrepreneurship narrative [4]
- In the storyline, a wealthy father's tech company becomes intertwined with his son's romantic interest in a cleaning lady, who ultimately helps develop an AI product [4]

Group 2
- Zhou Hongyi previously clarified that his interest in short dramas was business-related, not personal, after being misinterpreted by the media [5]
- After his short drama was announced, the National Radio and Television Administration required stricter management of "wealthy boss" micro-dramas, which drew public reactions directed at Zhou Hongyi [6]
- At the ISC AI2025 conference, Zhou Hongyi said his focus has shifted to collaborating on animated-style short dramas, highlighting advances in the company's AI tool, Nano AI, which was recently upgraded to a Level 4 intelligent system [7]
360 Announces Nano AI's Upgrade to a "Multi-Agent Swarm" That Can Generate a Blockbuster from a Single Sentence
Xin Lang Ke Ji· 2025-08-02 14:17
Core Insights
- 360 Group has officially announced the upgrade of Nano AI to a "Multi-Agent Swarm," marking its advance to an L4 intelligent system and a shift from "individual operation" to "group collaboration" [1]
- Agents have evolved through three stages: L1 chat assistants, L2 low-code workflow agents, and L3 autonomous planning agents. The new L4 level adds collaborative task execution among multiple agents [1]
- The swarm collaboration framework lets more than 50,000 L3 reasoning agents work together on complex tasks such as producing a 10-minute movie, and the system can execute over 1,000 steps continuously for 2 hours [1]

Application and Efficiency
- Nano AI has launched more than 10 types of multi-agent swarms covering scenarios such as video production, content creation, industry research, e-commerce, and travel planning [2]
- The platform has developed the first "one-sentence blockbuster" multi-agent swarm, which finishes in 20 minutes a task that previously took at least 2 hours, using L1 to L3 agents for scriptwriting, storyboarding, visuals, audio, music, and editing [2]
OpenAI Releases ChatGPT Agent: Some Capabilities Surpass Humans, but It Still Trails Humans at Spreadsheets
Di Yi Cai Jing· 2025-07-18 05:13
Core Insights
- OpenAI has launched ChatGPT Agent, which integrates Operator and Deep Research capabilities so that it can perform complex multi-step tasks and interact with various tools [1][2][9]
- Despite improvements, ChatGPT Agent scored 45.5% on spreadsheet editing tasks, well below the human score of 71.3% [6]

Group 1: ChatGPT Agent Features
- ChatGPT Agent can perform tasks such as checking calendars, analyzing competitors, and converting screenshots to editable formats [1]
- The system combines visual browsing, text processing, code execution, and API access [2]

Group 2: Performance Metrics
- Across benchmark tests, ChatGPT Agent achieved 41.6% accuracy on interdisciplinary expert tests, outperforming other models [3]
- On data science tasks, ChatGPT reached 89.9% accuracy in analysis and 85.5% in modeling [3]

Group 3: Future Developments
- OpenAI plans to keep iterating on the Agent, with a focus on releasing GPT-5, which is expected to strengthen the foundation model's capabilities [9]
- Developers expect the Agent to reach 90% accuracy in complex tool use by the end of the year, a step toward commercial viability [9]