Agent Systems
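Xiaomi open-sources MiMo-V2-Flash: capabilities on par with benchmark closed-source model Claude 4.5 Sonnet
Feng Huang Wang · 2025-12-17 10:26
Feng Huang Wang Tech, December 17: Xiaomi officially announced that Xiaomi MiMo-V2-Flash is now open source. The model is reportedly a MoE model with 309B total parameters (15B active) that Xiaomi developed in-house for extreme inference efficiency. By introducing a Hybrid attention architecture and multi-layer MTP inference acceleration, it ranks in the top 2 among open-source models worldwide on several agent benchmarks. Its coding ability is on par with the benchmark closed-source model Claude 4.5 Sonnet, at only 2.5% of the inference price and with 2x the generation speed. At the 2025 Xiaomi "Human x Car x Home" full-ecosystem partner conference this morning, Luo Fuli, head of the Xiaomi MiMo large model, also described how the model was built. She said Xiaomi MiMo-V2-Flash outperforms DeepSeek V3.2 and K2-Thinking on most benchmarks with one-half to two-thirds fewer parameters. Within the speed-and-cost quadrant of top models at roughly the same capability level worldwide, MiMo-V2-Flash achieves low cost and high speed, and it has begun to show the ability to simulate the world. Luo said that in her view the next generation of agent systems is not a "language simulator" but an "agent" that truly understands the human world and coexists with it. In terms of agent execution, it should move from "answering questions" to "completing tasks", with memory, reasoning, autonomous planning, decision-making, and execution capabilities. From ...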
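As a quick arithmetic check on the figures quoted above, the sketch below computes the share of parameters active per token and what the 2.5% price ratio means for a workload. The helper names and the 100-unit workload are illustrative assumptions, not part of Xiaomi's announcement.

```python
# Figures quoted in the announcement summary above.
TOTAL_PARAMS_B = 309   # total parameters, in billions
ACTIVE_PARAMS_B = 15   # parameters activated per token, in billions
PRICE_RATIO = 0.025    # inference price relative to Claude 4.5 Sonnet


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the MoE model's parameters used for each token."""
    return active_b / total_b


def relative_cost(reference_cost: float, price_ratio: float = PRICE_RATIO) -> float:
    """Cost of the same workload at the quoted price ratio (hypothetical helper)."""
    return reference_cost * price_ratio


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
    # Hypothetical workload that costs 100 units on the reference model.
    print(f"same workload at the quoted ratio: {relative_cost(100.0):.1f} units")
```

Roughly one parameter in twenty is active for any given token, which is how a MoE model of this size can keep per-token compute low.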
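Just over a month after joining Xiaomi, "AI prodigy" Luo Fuli makes her debut
Xin Hua Wang Cai Jing · 2025-12-17 05:43
Sources: Cai Lian She, Xin Jing Bao. Video source: Zhong Xin Jing Wei. On December 17, at the 2025 Xiaomi "Human x Car x Home" full-ecosystem partner conference, Luo Fuli, head of the Xiaomi MiMo large model, made her first public appearance for Xiaomi since joining and officially released and open-sourced the company's latest MoE large model, MiMo-V2-Flash. Luo said the model has very strong potential as a base model, ranks in the top 2 among open-source models worldwide on world-class evaluation leaderboards, and is both low-cost and fast: it costs less than DeepSeek V3.2 while running inference 3 times as fast. Luo said the next generation of agent systems is not a "language simulator" but an "agent" that truly understands our world and coexists with it. She explained that such an "agent" needs two capabilities. The first is agent execution, moving from "answering questions" to "completing tasks", which covers memory, reasoning, autonomous planning, decision-making, and execution. The second is omni perception: unified multimodal perception (laying the groundwork for understanding the physical world) embedded in smart devices such as glasses and integrated into everyday workflows. Luo, known as a "post-95 AI prodigy", previously worked at Alibaba's DAMO Academy and later at High-Flyer Quant and DeepSeek, where she was a key developer of DeepSeek-V2. She has led Xiaomi's MiMo large-model team since November 2025. ...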
AI online reinforcement learning that "learns by doing": Stanford team makes a 7B small model's performance soar, even surpassing GPT-4o
36Kr · 2025-10-24 12:45
Core Insights
- AgentFlow introduces a new paradigm for online reinforcement learning, enhancing the reasoning capabilities of agent systems through real-time optimization and collaboration among specialized agents [1][11][14].
Performance Metrics
- AgentFlow, based on the Qwen-2.5-7B-Instruct model, shows significant improvements across various benchmark tests: 14.9% in search tasks, 14.0% in agentic reasoning tasks, 14.5% in mathematical reasoning, and 4.1% in scientific reasoning [4][19][21].
- The performance of AgentFlow surpasses that of larger models, including GPT-4o and Llama3.1-405B, demonstrating that effective system design can outperform sheer model size [21][25].
System Architecture
- The architecture of AgentFlow consists of four specialized agents: a planner for task analysis and tool selection, an executor for tool invocation, a verifier for evaluating intermediate results, and a generator for synthesizing final outputs [11][13][14].
- The system employs a shared memory design that facilitates collaboration and reduces error propagation in multi-step reasoning processes [7][14].
Learning Mechanism
- The on-policy optimization of the planner within the agent interaction flow is crucial for adapting to environmental changes and feedback, leading to a robust and self-evolving reasoning process [13][14][22].
- The Flow-GRPO algorithm addresses the challenges of multi-turn credit assignment in reinforcement learning, enhancing training efficiency and stability in complex reasoning tasks [15][19].
Research Findings
- The study reveals that online learning in real interaction environments is essential for achieving efficient reasoning, as opposed to offline supervised learning, which can lead to performance declines [22][25].
- AgentFlow's training allows the system to autonomously discover new tool combinations and usage patterns, enhancing its problem-solving capabilities [25][29].
Future Implications
- AgentFlow represents a shift from seeking a single comprehensive model to enabling agents to adapt and learn continuously within a system, highlighting the potential of collaborative intelligence in addressing complex tasks [29].
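The four-module loop described in this summary (planner, executor, verifier, generator around a shared memory) can be sketched roughly as follows. This is a minimal illustration, not AgentFlow's actual code: every function body, the `Memory` class, and the toy `calculator` tool are hypothetical stand-ins, and in the real system the planner is a trained language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Shared record that every module reads from and appends to."""
    query: str
    steps: List[dict] = field(default_factory=list)


def planner(memory: Memory) -> dict:
    # Hypothetical fixed policy; stands in for the trained planner model.
    return {"tool": "calculator", "input": memory.query}


def executor(action: dict, tools: Dict[str, Callable[[str], str]]) -> str:
    # Invoke the tool the planner selected.
    return tools[action["tool"]](action["input"])


def verifier(memory: Memory) -> bool:
    # Toy stopping rule: done once any step produced a non-empty result.
    return any(step["result"] for step in memory.steps)


def generator(memory: Memory) -> str:
    # Synthesize the final answer from the last recorded step.
    return memory.steps[-1]["result"] if memory.steps else ""


def run(query: str, tools: Dict[str, Callable[[str], str]], max_turns: int = 5) -> str:
    memory = Memory(query)
    for _ in range(max_turns):
        action = planner(memory)
        result = executor(action, tools)
        memory.steps.append({"action": action, "result": result})
        if verifier(memory):
            break
    return generator(memory)


if __name__ == "__main__":
    toy_tools = {"calculator": lambda expr: str(sum(int(x) for x in expr.split("+")))}
    print(run("2+3", toy_tools))
```

Because each module only touches `Memory`, an intermediate error is visible to the verifier on the next turn rather than being silently passed along, which is the property the summary attributes to the shared memory design.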
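AI online reinforcement learning that "learns by doing": Stanford team makes a 7B small model's performance soar, even surpassing GPT-4o
Liang Zi Wei · 2025-10-24 03:53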
Core Insights
- The article discusses the introduction of AgentFlow, a new paradigm in online reinforcement learning that enhances the reasoning capabilities of intelligent systems, outperforming models like GPT-4o and Llama3.1-405B [1][4][23].
Group 1: AgentFlow Overview
- AgentFlow consists of a team of specialized agents including a planner, executor, verifier, and generator, which collaborate through shared memory to optimize decision-making in real-time [1][14][18].
- The Flow-GRPO method allows for on-policy optimization of the planner agent, enabling adaptive decision-making based on environmental changes and feedback from other agents [19][16].
Group 2: Performance Metrics
- AgentFlow, based on the Qwen-2.5-7B-Instruct model, shows significant improvements across various benchmark tests: 14.9% in search tasks, 14.0% in agentic reasoning, 14.5% in math reasoning, and 4.1% in scientific reasoning [3][25][27].
- The model's performance surpasses that of larger models, demonstrating that effective system design and training methods can be more impactful than simply increasing model size [27].
Group 3: Learning Mechanisms
- The article emphasizes the importance of "learning in the flow," indicating that online learning in real interactive environments is crucial for achieving efficient reasoning [28][29].
- AgentFlow's architecture allows for rapid error correction and improved task planning through real-time training, enhancing overall system performance [30][29].
Group 4: Innovations and Findings
- The system autonomously discovers new solution paths, such as combining different search tools to enhance information retrieval, showcasing its ability to adapt and innovate [33].
- AgentFlow maintains performance improvements without significantly increasing the average reasoning steps, indicating efficient handling of complex tasks [35].
Group 5: Future Implications
- The article concludes that AgentFlow presents a novel approach to intelligent agent training, advocating for systems that adapt and learn continuously rather than relying on a single comprehensive model [37][38].
- Despite the distance from research to practical application, the potential for Agentic AI remains significant, suggesting a promising future for intelligent systems [39].
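The summary says only that Flow-GRPO optimizes the planner on-policy and handles multi-turn credit assignment. One plausible reading, sketched below under stated assumptions, is the group-relative scheme from GRPO: sample several rollouts for the same query, normalize each rollout's final outcome reward against the group, and give every turn in a rollout that same advantage. The function names and the sample rewards are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_normalized_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's outcome reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantages: List[float], turns_per_rollout: List[int]) -> List[List[float]]:
    """Assumption: every turn in a rollout shares the rollout-level advantage."""
    return [[a] * n for a, n in zip(advantages, turns_per_rollout)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]   # hypothetical success/failure per rollout
    turns = [3, 2, 4, 1]             # hypothetical number of planner turns
    per_turn = broadcast_to_turns(group_normalized_advantages(rewards), turns)
    for reward, turn_advantages in zip(rewards, per_turn):
        print(reward, [round(a, 2) for a in turn_advantages])
```

Sharing one outcome-based signal across all turns sidesteps the need for a per-step reward model, at the cost of treating every turn in a successful rollout as equally responsible.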
Microsoft Research's Yang Yuqing: The attention system of agents | Attention
36Kr · 2025-09-05 03:42
Core Insights
- The article discusses TriangleMix, a structural optimization method for attention mechanisms in large models, which addresses the computational bottleneck during the prefill stage while maintaining performance and accuracy [2][5][10]
- TriangleMix allows for a hierarchical sparse attention architecture that significantly reduces latency and memory consumption, making it suitable for long-context tasks [8][10][36]
Technical Overview
- TriangleMix employs a layered attention strategy, using standard dense attention in the first 16 layers and switching to a triangle-shaped mask in the subsequent layers, which reduces computational complexity from O(N²) to O(N) [5][6]
- The method has been tested on models like Llama-3.1-8B-Instruct, showing a kernel latency reduction from 750ms to 49ms, achieving a speedup of 15.3x and a decrease in time to first token (TTFT) by 12%-32% [10][9]
Performance Metrics
- Experimental results indicate that TriangleMix retains 99.7% of the original performance while applying the triangle attention in the majority of the deep layers [8][10]
- The method demonstrates significant reductions in latency and memory usage with almost no loss in accuracy across various benchmark tasks [10][9]
Broader Implications
- The research emphasizes the importance of viewing attention mechanisms within the larger context of agent systems, training mechanisms, and task structures, rather than as isolated components [12][26]
- The ongoing work at Microsoft Research focuses on optimizing agent-native systems, which aim to enhance the efficiency and effectiveness of AI applications, particularly for users with specific needs [15][67]
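The layered strategy above (dense attention for the first 16 layers, a triangle-shaped mask afterwards) can be illustrated with a small mask builder. The summary does not define the triangle shape, so the one used here is an assumption: a few initial "sink" keys, a local sliding window, and dense rows for the final queries, with the middle block dropped. The `sink`, `window`, and `last_q` parameters are hypothetical.

```python
from typing import List

DENSE_LAYERS = 16  # from the summary: the first 16 layers keep dense attention

Mask = List[List[bool]]


def dense_causal_mask(n: int) -> Mask:
    """Standard causal mask: query i may attend to every key j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def triangle_mask(n: int, sink: int = 2, window: int = 2, last_q: int = 2) -> Mask:
    """Assumed triangle shape: sink keys + local window + dense final query rows."""
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i
            keep = j < sink or i - j < window or i >= n - last_q
            row.append(causal and keep)
        mask.append(row)
    return mask


def mask_for_layer(layer: int, n: int) -> Mask:
    """Dense mask in shallow layers, triangle mask in deeper ones."""
    return dense_causal_mask(n) if layer < DENSE_LAYERS else triangle_mask(n)


def kept(mask: Mask) -> int:
    """Number of query-key pairs that are actually computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n = 64
    print("dense pairs:", kept(mask_for_layer(0, n)))
    print("triangle pairs:", kept(mask_for_layer(20, n)))
```

With `sink`, `window`, and `last_q` held constant, each ordinary row keeps a bounded number of keys and only a fixed number of rows stay dense, so the kept pairs grow linearly in N rather than quadratically. The quoted speedup is also consistent with the latencies: 750 ms / 49 ms ≈ 15.3.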
OpenAI's female CEO is ruthless: with an IQ of 148, GPT-5 is the real money printer
36Kr · 2025-08-14 03:11
Core Insights
- GPT-5 is positioned as a significant advancement in AI technology, achieving an IQ of 148 and surpassing human genius levels, particularly excelling in mathematics and programming tests [3][5][13]
- OpenAI's focus with GPT-5 is not just on intelligence but on monetization strategies, particularly targeting the vast number of free users to convert them into revenue-generating customers [15][16][17]
Group 1: Performance and Recognition
- GPT-5 has demonstrated exceptional performance in various benchmark tests, including setting new records in mathematics and showing notable improvements in programming tests [5][13]
- The model's capabilities have received recognition from Nvidia, indicating its potential in reasoning and programming applications [13]
Group 2: Monetization Strategy
- OpenAI aims to monetize GPT-5 by leveraging its "router" technology, which can dynamically allocate resources based on user intent and query complexity, thus optimizing operational costs and enhancing performance [20][24][26]
- The router system allows for a significant increase in user engagement, with daily active users of the reasoning model surging sevenfold among free users and nearly 3.5 times among paid users [26]
Group 3: User Engagement and Growth
- ChatGPT's user base has rapidly expanded, now surpassing major platforms like Twitter, Reddit, and WhatsApp, and is approaching the likes of Instagram and Facebook [19]
- The growth in user engagement is attributed to the router's ability to provide tailored responses, enhancing the overall user experience and increasing the likelihood of monetization through indirect payments [17][19]
Group 4: Future Commercialization Potential
- OpenAI's strategic direction includes integrating advertising and affiliate models into the ChatGPT experience, allowing the platform to generate revenue without compromising user experience [34][36]
- The router's capability to assess the commercial value of queries positions ChatGPT to evolve into a "super app," facilitating transactions and generating revenue through commissions on sales [35][51][58]
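To make the "router" idea concrete, the sketch below sends a query to a cheaper or a more capable model based on a rough complexity score. It illustrates the general pattern only and does not describe OpenAI's implementation: the keyword heuristic, the threshold, and the model names `fast-model` and `reasoning-model` are all hypothetical, and a production router would more likely use a learned classifier.

```python
from dataclasses import dataclass

# Hypothetical phrases that suggest a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "debug", "derive")


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def complexity_score(query: str) -> float:
    """Toy score in [0, 1] built from keyword hints and query length."""
    text = query.lower()
    has_hint = any(hint in text for hint in REASONING_HINTS)
    length_term = min(len(text.split()) / 50.0, 1.0)
    return min(0.6 * has_hint + 0.4 * length_term, 1.0)


def route(query: str, threshold: float = 0.5) -> Route:
    """Pick the expensive model only when the score clears the threshold."""
    score = complexity_score(query)
    if score >= threshold:
        return Route("reasoning-model", f"score {score:.2f} >= {threshold}")
    return Route("fast-model", f"score {score:.2f} < {threshold}")


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Prove step by step that the sum of two even numbers is even"):
        print(route(q).model, "<-", q)
```

The cost argument in the summary follows from this shape: if most queries fall below the threshold, the expensive model runs only on the minority that need it.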
Zhou Hongyi: No more short dramas for me, they really don't suit my temperament
Zheng Quan Shi Bao · 2025-08-06 10:05
Group 1
- The core viewpoint of the article is that Zhou Hongyi, the founder of 360, has decided not to produce short dramas anymore, stating that they do not align with his temperament [2][7]
- Zhou Hongyi's first short drama, "Reigniting the Life of a Hidden Hacker," aired at the end of 2024 and sparked significant discussion due to its unique blend of a love story and an AI entrepreneurship narrative [4]
- The short drama features a storyline where a wealthy father's tech company is intertwined with his son's romantic interest in a cleaning lady, who ultimately aids in the development of an AI product [4]
Group 2
- Zhou Hongyi previously clarified that his interest in short dramas was business-related, not personal, after being misinterpreted by the media [5]
- Following the announcement of his short drama, the National Radio and Television Administration required stricter management of "wealthy boss" micro-dramas, leading to public reactions directed at Zhou Hongyi [6]
- At the ISC AI2025 conference, Zhou Hongyi expressed a shift in focus towards collaboration on animated-style short dramas, highlighting advancements in their AI tool, Nano AI, which has recently upgraded to a Level 4 intelligent system [7]
360 announces Nano AI upgrade to "Multi-Agent Swarm" that can generate a blockbuster from a single sentence
Xin Lang Ke Ji · 2025-08-02 14:17
Core Insights
- 360 Group has officially announced the rebranding of Nano AI to "Multi-Agent Swarm," marking its advancement to L4 level intelligent systems, which enables a shift from "individual operation" to "group collaboration" [1]
- The evolution of intelligent agents has gone through three stages: L1 chat assistants, L2 low-code workflow agents, and L3 autonomous planning agents, with the new L4 level allowing for collaborative task execution among multiple agents [1]
- The new swarm collaboration framework allows over 50,000 L3 reasoning agents to work together to complete complex tasks, such as producing a 10-minute movie, with the system capable of executing over 1,000 steps continuously for 2 hours [1]
Application and Efficiency
- Nano AI has launched over 10 types of multi-agent swarms, covering various scenarios including video production, content creation, industry research, e-commerce, and travel planning [2]
- The platform has developed the first "one-sentence blockbuster" multi-agent swarm, which can complete tasks that previously took at least 2 hours in just 20 minutes, utilizing L1 to L3 agents for scriptwriting, storyboarding, visuals, audio, music, and editing [2]
OpenAI releases ChatGPT Agent: surpasses humans in some capabilities, but still trails humans at spreadsheets
Di Yi Cai Jing · 2025-07-18 05:13
Core Insights
- OpenAI has launched ChatGPT Agent, which integrates Operator and Deep Research capabilities, allowing it to perform complex multi-step tasks and interact with various tools [1][2][9]
- Despite improvements, ChatGPT Agent scored 45.5% in spreadsheet editing tasks, significantly lower than the human score of 71.3% [6]
Group 1: ChatGPT Agent Features
- ChatGPT Agent can perform tasks such as checking calendars, analyzing competitors, and converting screenshots to editable formats [1]
- The system combines capabilities of visual browsing, text processing, code execution, and API access [2]
Group 2: Performance Metrics
- In various benchmark tests, ChatGPT Agent achieved an accuracy of 41.6% in interdisciplinary expert tests, outperforming other models [3]
- In data science tasks, ChatGPT demonstrated high accuracy with 89.9% in analysis and 85.5% in modeling [3]
Group 3: Future Developments
- OpenAI plans to continue iterating on the Agent, with a focus on releasing GPT-5, which is anticipated to enhance the foundational model's capabilities [9]
- Developers expect the Agent to reach 90% accuracy in complex tool usage by the end of the year, indicating a move towards commercial viability [9]
OpenAI releases ChatGPT Agent
Di Yi Cai Jing · 2025-07-18 00:10
Core Viewpoint
- OpenAI has launched ChatGPT Agent, which integrates multiple capabilities into a unified intelligent system, combining website interaction, information integration, and deep conversational abilities [1]
Group 1
- ChatGPT Agent features a multi-tool integration capability [1]
- The system merges Operator's website interaction ability, Deep Research's information integration, and ChatGPT's deep dialogue capabilities [1]