Multi-Agent Frameworks
A 32B Model Overtakes GPT-5.2: StitchCUDA, the First End-to-End GPU Programming Agent Framework
机器之心· 2026-03-05 03:54
The authors include Shiyang Li (co-first author) and Zijian Zhang (co-first author) of the University of Minnesota, along with Winson Chen, Yuebo Luo, Mingyi Hong, and Caiwen Ding. Existing LLM-based approaches to automated CUDA programming mostly optimize individual kernels and are largely helpless when faced with a complete end-to-end GPU program (such as full Vision Transformer inference). In this work, StitchCUDA poses a fundamental reframing of the problem: from optimizing a single kernel to generating a complete end-to-end GPU program. Through a multi-agent collaboration framework and rubric-reward-based agentic RL, StitchCUDA achieves a 90% success rate and a 1.50× average speedup on KernelBench Level 3 end-to-end tasks, outperforming the multi-agent baseline by 1.72× and the RL-model baseline by 2.73×. Paper title: StitchCUDA: An Automated Multi-Agents End-to-End GPU Programming Framework with Rubric-based Agentic Reinforcement Learning. The performance of CUDA code is critical to today's model training and inference. In recent years, ...
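The rubric-based reward mentioned above can be pictured as a weighted checklist over compilation, correctness, and measured speedup. A minimal sketch, with rubric items and weights that are purely illustrative (the paper's actual rubric is not given here):

```python
# Sketch of a rubric-style reward for a generated GPU program.
# The rubric items and weights below are hypothetical illustrations,
# not StitchCUDA's actual rubric.

def rubric_reward(compiles: bool, correct: bool, speedup: float) -> float:
    """Combine pass/fail rubric items with a bounded performance term."""
    score = 0.0
    if compiles:
        score += 0.2          # the program builds at all
    if compiles and correct:
        score += 0.5          # output matches the reference implementation
        # credit for speedup over the baseline, capped so reward stays bounded
        score += 0.3 * min(speedup, 2.0) / 2.0
    return score

# A correct program with a 1.5x speedup earns partial performance credit.
print(rubric_reward(True, True, 1.5))   # 0.925
```

Graded rubric items like these give the RL policy a denser signal than a single pass/fail bit, which is the usual motivation for rubric-style rewards.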
Soochow Securities: Edge-Cloud Collaboration Reshapes the AI Entry Point; Edge Models Drive Hardware Reconstruction
智通财经网· 2026-02-27 07:07
Core Insights
- The evaluation system for cloud-based large models is shifting from purely capability metrics to the actual completion of tasks, with a focus on code capabilities and multi-agent systems by leading overseas companies since 2026 [1]
- The dual capability stack of "fast interaction + long reasoning" is expected to become a significant evolution direction for general-purpose agents in the near future [2]
- The collaboration between edge models and cloud models is emphasized, with edge models handling high-frequency, lightweight tasks locally, while heavier reasoning tasks are processed in the cloud [3]

Cloud Models
- The expansion of capability boundaries and cost restructuring are occurring simultaneously in cloud models, with a focus on task completion [1]
- Leading companies are intensively laying out code capabilities and multi-agent systems to enhance performance [2]

Code Models
- The reasoning demands in the era of intelligent agents are evolving along two optimization directions: long-chain complex reasoning and real-time interaction [2]
- Low-latency agents like OpenAI's Codex-Spark prioritize interactive AI experiences, while agents like Claude 4.6 focus on improving success rates in complex tasks through increased context length [2]

Edge Models
- The evolution of edge models is characterized by efficiency optimization and capability compression under a collaborative framework with cloud models [3]
- Multi-modal capabilities are becoming a key competitive point for edge models, with a focus on achieving zero-latency interactions [3]

Hardware Reconstruction
- The industry is expected to focus on high-frequency demand scenarios in 2024, with a shift towards multi-modal creative capabilities by 2025 [4]
- Key components for edge models are undergoing upgrades in memory and power consumption to enhance user experience [4]

Future Outlook
- Next-generation flagship SoC platforms like Qualcomm's Snapdragon 8 Elite Gen 6 are anticipated to provide enhanced hardware support for the complexity and multi-modality of edge AI functions [5]
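The edge-cloud division of labor described above (high-frequency lightweight tasks on device, heavier reasoning in the cloud) amounts to a routing policy. A toy sketch, where the task fields and token threshold are illustrative assumptions rather than anything from the report:

```python
# Toy edge-cloud router: latency-sensitive, lightweight tasks stay on
# device; heavy reasoning goes to the cloud. The threshold and task
# fields are illustrative assumptions, not from the report.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    est_tokens: int         # rough size of the reasoning required
    latency_sensitive: bool

def route(task: Task, edge_budget_tokens: int = 1024) -> str:
    """Return 'edge' or 'cloud' for a given task."""
    if task.latency_sensitive and task.est_tokens <= edge_budget_tokens:
        return "edge"       # e.g. notification summary, voice follow-up
    return "cloud"          # e.g. long-chain multi-step reasoning

print(route(Task("summarize notification", 200, True)))    # edge
print(route(Task("plan a multi-city trip", 8000, False)))  # cloud
```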
Electronics Industry In-Depth Report: Edge-Cloud Collaboration Drives AI Entry-Point Reshaping and Hardware Paradigm Reconstruction
Soochow Securities· 2026-02-27 05:50
Securities research report · Industry in-depth report · Electronics. Investment highlights: Edge models drive hardware reconstruction, with compute, memory, and thermal design upgrading in concert. At the whole-device level, in 2024 the industry still entered through high-frequency, must-have scenarios, focusing on low-barrier functions such as image erasure and text summarization; in 2025, vendors visibly accelerated the push toward multimodal creation, covering voice, generative imaging, and other more complex interaction forms, and penetrated further into the operating-system layer. Whole-device AI competition is shifting from a contest of feature counts to a comprehensive contest of multimodal experience and depth of system-level integration. Against this backdrop, core edge components are undergoing a new round of upgrades around memory and power consumption, the key variables constraining the edge experience. On the storage side, Samsung's LPDDR6 supports higher data-transfer rates and memory bandwidth while systematically reworking everything from circuit architecture to power management, delivering roughly a 21% energy-efficiency improvement over the previous generation without sacrificing high-speed performance. On the thermal side, Samsung released the Exynos 2600 chip on December 19, 2025, the first mobile SoC to use High-k EMC material to optimize the heat-transfer path, reducing thermal resistance by about 16% versus the Exynos 2500. In heavy-load scenarios (such as gaming and on-device AI ...
Building Worlds Like Building Software: Agent2World Turns World Models into Runnable Symbolic Environments
机器之心· 2026-02-02 06:14
For a model to truly "act," it usually needs an executable, verifiable Symbolic World Model: not an abstract textual description, but a formal definition that a planner or executor can call directly, such as a PDDL domain/problem pair or runnable environment code / a simulator. Once the world is "written as runnable rules," we can simulate, test, and reproduce under the same set of constraints: the model no longer merely "talks" but can answer "if I do this, what happens," and use execution results to check whether it truly understands the world. The problem is that existing automatic-generation approaches generally fall into a triple bind: scripted workflows, closed knowledge boundaries, and single-representation coverage. Many methods still follow a fixed "generate, then repair" script and rely mainly on static validation (parsing, rule matching, fixed check suites): these may fix syntax and formatting, but often miss behavior-level errors that only surface during interactive execution (for example, inconsistent state updates, unreachable goals, or broken reward mechanisms). Meanwhile, when the task specification is vague or missing key rules or background commonsense, the system has no active retrieval-and-completion mechanism and can only "guess" from model memory. More critically, prior work typically covers only one world-model representation (PDDL only, or executable code only), so the same task cannot share a verification loop and improvement experience across different symbolic representations, limiting the generality ...
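A world model "written as runnable rules", one that can answer "if I do this, what happens", can be illustrated with a tiny executable environment: state plus preconditions plus effects. This is a hypothetical sketch; the class and action names are ours, not Agent2World's API:

```python
# Tiny executable symbolic world model: state + action preconditions +
# effects. A planner can probe it ("what happens if I do X?") and verify
# goals by execution. Names are illustrative, not Agent2World's API.

class GridWorld:
    def __init__(self):
        self.state = {"at": "door", "door_open": False}

    def step(self, action: str) -> dict:
        """Apply an action if its preconditions hold; return the new state."""
        if action == "open_door" and self.state["at"] == "door":
            self.state["door_open"] = True
        elif action == "enter_room" and self.state["door_open"]:
            self.state["at"] = "room"
        # otherwise: precondition failed, state unchanged
        return dict(self.state)

    def goal_reached(self) -> bool:
        return self.state["at"] == "room"

w = GridWorld()
w.step("enter_room")           # precondition fails: the door is closed
assert not w.goal_reached()    # a behavior-level bug would surface here
w.step("open_door")
w.step("enter_room")
print(w.goal_reached())        # True
```

Note how the unreachable-goal case is caught only by actually executing actions, exactly the class of behavior-level error that static parsing or rule matching misses.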
Second Globally, First in China: DingTalk Releases the DeepResearch Multi-Agent Framework, Already Deployed in Real Enterprises
机器之心· 2025-11-12 03:17
Core Insights
- The article emphasizes the increasing demand for efficient and precise information retrieval and decision support in the digital economy, highlighting the necessity of a "Deep Research System" that can extract key knowledge from vast heterogeneous data sources and perform multi-step reasoning [2][3]

Challenges in Existing Research Systems
- Existing research systems face challenges in adapting to real-world enterprise environments, including static architectures, insufficient integration of private datasets, lack of automated evaluation and continuous optimization, and inadequate long-term memory and dynamic evolution mechanisms [5]
- Many systems rely on static prompts or fixed scripts, making them unable to learn and optimize from real-world feedback [5]
- Current research-oriented intelligent agents struggle to securely and efficiently integrate enterprise private data and lack dynamic optimization capabilities [5]
- There is a notable absence of automated evaluation and continuous optimization mechanisms in systems like Anthropic's Claude Research Workbench, hindering sustained improvement in deployment environments [5]

Dingtalk-DeepResearch Framework
- Dingtalk-DeepResearch is introduced as a unified multi-agent intelligent framework designed for complex and evolving enterprise tasks, integrating deep research generation, heterogeneous table reasoning, and multi-modal report synthesis [3][10]
- The framework has achieved high scores in international deep research evaluations, ranking second globally and first domestically on the DeepResearch Bench [7]
- It has been successfully deployed in real enterprise scenarios such as manufacturing and supply chain, demonstrating industry-leading accuracy and robustness [10]

Framework Architecture
- The Dingtalk-DeepResearch framework features a layered design, providing a comprehensive and flexible intelligent hub for enterprises [12]
- The framework includes specialized agents for deep research, table data processing, and data analysis, along with a core that integrates key functions such as context compression, reasoning, long-term memory, and human-machine collaboration [14]
- A unified data layer consolidates knowledge graphs, databases, and multi-modal datasets, facilitating diverse enterprise and industry data retrieval [14]

Adaptive Intelligence Mechanisms
- The framework employs a multi-stage document reinforcement learning approach to enhance document generation capabilities, utilizing a reward model trained on approximately 800,000 labeled samples [17][18]
- An entropy-guided, memory-aware online learning mechanism allows the intelligent agent to adapt continuously to evolving tasks without frequent fine-tuning of the underlying LLM parameters [21]
- The system's table question-answering module effectively handles complex and heterogeneous table data, ensuring precise and interpretable reasoning [22][23]

Continuous Optimization and Evaluation
- DingAutoEvaluator serves as a core driver for continuous evolution, transforming the development paradigm into a fully evaluation-driven approach [25]
- The platform continuously monitors cognitive uncertainty peaks in model outputs, prioritizing uncertain cases for expert annotation [25]
- A unified measurement framework evaluates various aspects of the framework's outputs, providing real-time signals for ongoing optimization [31]

Practical Applications and Case Studies
- The article presents multiple real-world case studies demonstrating Dingtalk-DeepResearch's end-to-end capabilities in complex table data parsing, retrieval, reasoning, and multi-modal document generation [27]
- In one case, the system accurately processed a complex table containing inventory and logistics information, showcasing its robustness and practical utility [28]
- Another case involved the system answering production-related queries by effectively breaking down complex questions into manageable steps [30][32]

Future Outlook
- Dingtalk-DeepResearch is set to be deployed in enterprise workflows and will soon be available as a service through Dingtalk, providing a robust solution for complex task management [44]
- The framework's adaptive capabilities, large-scale document reinforcement learning, and structured table reasoning position it as a significant advancement in enterprise-level adaptive intelligence [45]
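The entropy-guided prioritization described above (monitoring uncertainty peaks in model outputs and routing the most uncertain cases to expert annotation) can be sketched as ranking outputs by predictive entropy. A minimal illustration; the threshold and toy distributions are assumptions, not Dingtalk's actual pipeline:

```python
# Sketch of entropy-guided case selection: rank model outputs by the
# Shannon entropy of their predicted distribution, and queue the most
# uncertain ones for expert annotation. Threshold is an assumption.

import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

cases = {
    "case_a": [0.98, 0.01, 0.01],   # confident prediction -> skip
    "case_b": [0.40, 0.35, 0.25],   # uncertain prediction -> annotate
}

# Keep cases above the uncertainty threshold, most uncertain first.
to_annotate = sorted(
    (name for name, p in cases.items() if entropy(p) > 0.5),
    key=lambda n: -entropy(cases[n]),
)
print(to_annotate)   # ['case_b']
```

Selecting high-entropy cases concentrates scarce expert-labeling effort where the model is least reliable, which is the standard active-learning rationale behind this kind of loop.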
From Paper to Poster in Seconds: Open-Source Framework PosterAgent Generates Top-Conference-Quality Academic Posters with One Click
量子位· 2025-06-03 07:59
Core Viewpoint
- The article introduces PosterAgent, a tool designed to convert academic papers into visually appealing posters, highlighting its efficiency and effectiveness compared to existing methods like GPT-4o [2][18]

Group 1: PosterAgent Overview
- PosterAgent can transform a 22-page paper into an editable ".pptx" poster for only $0.0045, reducing token usage by 87% compared to GPT-4o [2][36]
- The tool is built upon the Paper2Poster framework, which establishes the first academic poster evaluation standard, addressing gaps in long-context and multi-modal compression assessments [4][18]

Group 2: Evaluation Metrics
- Paper2Poster includes 100 pairs of AI-related papers and their corresponding posters, covering subfields such as computer vision (19%), natural language processing (17%), and reinforcement learning (10%) [20]
- The evaluation metrics focus on four dimensions: visual quality, text coherence, overall assessment, and PaperQuiz, which simulates communication between authors and readers [22][23]

Group 3: PosterAgent Components
- The PosterAgent framework consists of three key components: a parser for extracting key content, a planner for organizing text and visuals, and a painter-commenter for generating and refining the poster layout [28][29]
- The system employs a top-down design approach to ensure coherence and alignment of content [25]

Group 4: Performance Comparison
- In comparative tests, PosterAgent achieved the highest graphic relevance and visual similarity to human-designed posters, scoring an average of 3.72 when evaluated by a visual language model (VLM) [31][32]
- While GPT-4o-image had the highest visual similarity, it recorded the lowest coherence, indicating that its outputs may appear attractive but lack textual clarity [30][31]

Group 5: Cost Efficiency
- PosterAgent demonstrated significant cost efficiency, requiring only 101.1K and 47.6K tokens for its two variants, translating to a cost of $0.55 (based on GPT-4o) or $0.0045 (based on Qwen) per poster [36]
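The per-poster cost figures above are just token count times a blended price per million tokens. A quick sanity check; the blended rates below are back-derived from the article's totals, not official pricing, and the real rate depends on the input/output token mix:

```python
# Back-of-the-envelope poster cost: tokens consumed x blended USD price
# per million tokens. The rates below are back-derived from the article's
# totals ($0.55 and $0.0045), not official GPT-4o or Qwen pricing.

def poster_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a run consuming `tokens` at a blended rate."""
    return tokens / 1_000_000 * usd_per_million

# 101.1K tokens at a ~$5.44/M blended rate -> about $0.55 per poster
print(round(poster_cost(101_100, 5.44), 2))    # 0.55
# 47.6K tokens at a ~$0.095/M blended rate -> about $0.0045 per poster
print(round(poster_cost(47_600, 0.095), 4))    # 0.0045
```

The two orders of magnitude between the variants come almost entirely from the per-token rate, not the token count, which is why the Qwen-backed variant is the headline $0.0045 figure.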