Workflow
Agent能力
icon
Search documents
DeepSeek-V3.1 发布,官方划重点:Agent、Agent、Agent!
Founder Park· 2025-08-21 08:16
编者荐语: DeepSeek V3.1 上架 2 天后,官方终于发了详细的介绍文档。这里简单划重点: 混合推理模型 上下文拓展至 128K Agent 能力增强,这次的重点 思考模型和非思考模型价格统一,目前看来还有下降区间 以下文章来源于赛博禅心 ,作者金色传说大聪明 赛博禅心 . 拜AI古佛,修赛博禅心 DEEPSEEK V3.1 正式发布 混合推理架构 思考/非思考模式合一 更高思考效率 更少 Token,更快响应 更强 Agent 能力 工具使用与智能体任务提升 MODEL UPDATES 核心架构与使用 混合推理架构: 单一模型支持思考与非思考双模式。 2025年08月21日 北京 • deepseek-chat 对应非思考模式。 • deepseek-reasoner 对应思考模式。 | | | R1-0528 | | --- | --- | --- | | Browsecomp | 30.0 | 8.9 | | Browsecomp_zh | 49.2 | 35.7 | | HLE | 29.8 | 24.8 | | xbench-DeepSearch | 71.2 | 55.0 | | Fra ...
DeepSeek-V3.1发布:更高效思考、更强Agent能力、更长上下文
生物世界· 2025-08-21 08:00
1. 混合推理架构:思考模式 & 非思考模式自由切换 编辑丨王多鱼 排版丨水成文 刚刚 , DeepSeek 正式发布了 DeepSeek - V3.1 。这一升级版包含以下主要变化: DeepSeek-V3.1 首次引入 混合推理架构 ,用户可以在"深度思考"模式和 "非思考"模式之间自由切换: 思考模式 (DeepSeek-Reasoner) :适用于需要深度推理的任务,如数学计算、复杂逻辑分析等,推理效率更高。 非思考模式 (DeepSeek-Chat) :适用于日常对话、信息查询等轻量级任务,响应更迅速。 在官方 App 和网页端,用户可通过"深度思考"按钮,一键切换模式,体验更智能的交互方式! 2. 更强的 Agent 能力:编程、搜索大幅提升 DeepSeek-V3.1 通过 Post-Training 优化,大幅提升了工具使用和智能体任务的表现: 编程智能体 (SWE & Terminal Bench) :在代码修复 (SWE) 和命令行终端任务 (Terminal Bench) 中,表现优于前代模型,所需轮数更少,效率更高! 混合推理架构 :一个模型同时支持思考模式与非思考模式; 更高的 思考 ...
DeepSeek-V3.1正式发布
第一财经· 2025-08-21 07:53
作者 | 一财资讯 8月21日,据DeepSeek官方公众号消息,DeepSeek-V3.1正式发布,本次升级包含以下主要变 化: 2025.08. 21 在9月6日前,所有API服务仍按原价格政策计费。 微信编辑 | 小羊 第 一 财 经 持 续 追 踪 财 经 热 点 。 若 您 掌 握 公 司 动 态 、 行 业 趋 势 、 金 融 事 件 等 有 价 值 的 线 索 , 欢 迎 提 供 。 专 用 邮 箱 : bianjibu@yicai.com 混合推理架构 :一个模型同时支持思考模式与非思考模式; 更高的 思考效率 :相比DeepSeek-R1-0528,DeepSeek-V3.1-Think能在更短时间 内给出答案; 更强的Agent能力 :通过Post-Training优化,新模型在工具使用与智能体任务中的表 现有较大提升。 官方App与网页端模型已同步升级为DeepSeek-V3.1。用户可以通过"深度思考"按钮,实现思考模 式与非思考模式的自由切换。 DeepSeek API也已同步升级,deepseek-chat对应非思考模式,deepseek-reasoner对应思考模 式,且上下文均 ...
官宣!DeepSeek-V3.1 发布,API调用价格低至0.5元/百万Tokens
Xin Lang Ke Ji· 2025-08-21 07:05
据悉,本次V3.1升级包含以下主要变化: 新浪科技讯 8月21日下午消息,DeepSeek今日发布 DeepSeek-V3.1,宣布将于北京时间 2025 年 9 月 6 日 凌晨起,对 DeepSeek 开放平台 API 接口调用价格进行调整。 其中,输入价格为,0.5元/百万 tokens (缓存命中) ,4元 /百万 tokens (缓存未命中) 。输出价格为 12元 /百万 tokens ,该价格于2025 年 9月6日 00:00 起生效。 3,更强的 Agent 能力:通过 Post-Training 优化,新模型在工具使用与智能体任务中的表现有较大提 升。 目前,官方 App 与网页端模型已同步升级为 DeepSeek-V3.1。用户可以通过"深度思考"按钮,实现思考 模式与非思考模式的自由切换。(文猛) 责任编辑:杨赐 1,混合推理架构:一个模型同时支持思考模式与非思考模式。 2,更高的思考效率:相比DeepSeek-R1-0528,DeepSeek-V3.1-Think 能在更短时间内给出答案。 ...
大模型专题:2025年大模型智能体开发平台技术能力测试研究报告
Sou Hu Cai Jing· 2025-08-14 15:48
Core Insights - The report evaluates the technical capabilities of four major AI model development platforms: Alibaba Cloud's Bailian, Tencent Cloud's Intelligent Agent Development Platform, Kouzi, and Baidu Intelligent Cloud Qianfan, focusing on RAG capabilities, workflow capabilities, and agent capabilities [1][7][8]. RAG Capability Testing - RAG capability testing assesses knowledge enhancement mechanisms, including multi-modal knowledge processing, task complexity adaptation, and interaction mechanism completeness [7][8]. - In text question answering, all platforms demonstrated high accuracy, with over 80% accuracy in multi-document responses, although some platforms showed stability issues during API calls [20][21]. - Baidu Intelligent Cloud Qianfan exhibited stable performance in complex query scenarios for structured data, while Tencent Cloud achieved 100% refusal for out-of-knowledge-base questions [21][23]. - The platforms showed differences in handling refusal and clarification, with Tencent Cloud providing 100% refusals for non-knowledge-base questions [21][22]. Workflow Capability Testing - Workflow capability testing focuses on dynamic parameter extraction, exception rollback, intent recognition, and fault tolerance [35][36]. - The end-to-end accuracy for workflow processes ranged from 61.5% to 93.3%, with Tencent Cloud leading in intent recognition accuracy at 100% [36][37]. - The platforms demonstrated basic usability in workflow systems, but there is room for improvement in complex information processing [38][39]. Agent Capability Testing - Agent capability testing evaluates the ability to call tools, focusing on intent understanding, operational coordination, feedback effectiveness, and mechanism completeness [44][45]. - All platforms achieved high single-tool call completion rates (83%-92%), but multi-tool collaboration and prompt calling showed potential for improvement [48][50]. - Tencent Cloud's Intelligent Agent Development Platform excelled in tool call success rates due to its robust ecosystem and process optimization [49][50].
全球知名Agent应用Perplexity CEO点赞阿里千问
news flash· 2025-07-24 02:56
Core Insights - Perplexity CEO Aravind Srinivas praised Alibaba's open-source Qwen3-Coder, highlighting its impressive achievements and stating that "open source has exploded" [1] - Qwen3-Coder demonstrates top-tier agent capabilities, outperforming the US Claude4 model in multiple agent capability metrics, including SWE-Bench Multilingual, Aider-Polyglot, Spider2, and Mind2Web [1] - The API pricing for Qwen3-Coder is significantly lower than that of Claude, averaging only one-third of Claude's cost [1]
阿里开源最强AI编程模型Qwen3-Coder,性能比肩Claude4 | 钛快讯
Tai Mei Ti A P P· 2025-07-23 00:01
Core Insights - Alibaba has launched the Qwen3-Coder AI programming model, which is now the leading open-source model globally, surpassing proprietary models like GPT-4.1 and competing with Claude4 [1][3]. Model Specifications - Qwen3-Coder utilizes a mixture of experts (MoE) architecture with a total of 480 billion parameters, activating 35 billion parameters, and supports a context length of 256K tokens, expandable to 1 million [2][3]. Performance Metrics - The model was pre-trained on 7.5 trillion data points, with 70% of the data focused on coding tasks. It has shown superior performance in agent capabilities, surpassing GPT-4.1 in benchmarks like WebArena and BFCL [3]. - In the SWE-Bench evaluation, Qwen3-Coder achieved the best results among open-source models, comparable to Claude4 [3]. Practical Applications - Qwen3-Coder significantly enhances programming efficiency, reducing the time for tasks like code writing, completion, and bug fixing from hours to minutes. It also lowers the entry barrier for non-programmers, enabling "Vibe Coding" where complex simulations can be generated with simple commands [4]. - The model is available on platforms like MagicDock and HuggingFace, with over 20 million downloads, making it the most popular open-source programming model globally [4]. Industry Adoption - Major companies such as FAW Group, China Petroleum, China Construction Bank, Ping An Group, China Southern Airlines, and Xiaopeng Motors have integrated the Qwen AI programming model into their operations [4].
AI动态跟踪系列(六):OpenAIo3、豆包新品首发,关注原生Agent与多模态推理
Ping An Securities· 2025-04-17 13:10
Investment Rating - The industry investment rating is "Outperform the Market" [1][38]. Core Insights - OpenAI's latest models, o3 and o4-mini, introduce significant advancements in image reasoning and agent capabilities, enhancing the AI programming ecosystem [3][4]. - The competition in the global large model field remains intense, with a strong emphasis on native agent capabilities and multimodal reasoning [34]. - The domestic AI computing power market is expected to see increased acceptance and market share for Chinese AI computing solutions due to ongoing global trade tensions [34]. Summary by Sections OpenAI's New Models - OpenAI released o3 and o4-mini, which are touted as the most intelligent models to date, featuring breakthroughs in image reasoning and agent capabilities [3][4]. - The o3 model has set new state-of-the-art benchmarks in coding, mathematics, and visual perception tasks, outperforming its predecessor o1 by 20% in error rates on complex tasks [5][7]. - The o4-mini model is optimized for fast and cost-effective reasoning, excelling in non-STEM tasks and data science [5]. Doubao 1.5 Model - Doubao 1.5 has reached or is close to the top tier globally in reasoning tasks across mathematics, coding, and science, with enhanced visual understanding capabilities [17][21]. - The Doubao APP, based on the Doubao 1.5 model, can perform "thinking while searching," providing detailed recommendations based on user needs [24][27]. - Doubao's daily token usage has surged to over 12.7 trillion, indicating significant growth and market penetration [18]. Investment Recommendations - The report suggests focusing on AI applications in enterprise services, programming, and office automation, as well as on domestic AI computing power companies [34]. - Recommended stocks in AI applications include companies like Fanwei Network and Kingdee International, while AI computing power recommendations include companies like Haiguang Information and Inspur Information [34].