推理 - filings, earnings calls, financial reports, news

DeepSeek-Prover-V2-671B

AI数学推理

不到15万元！清华90后团队发布“褐蚁”一体机，已支持阿里最新Qwen3模型｜钛媒体AGI

DeepSeek-Prover-V2-671B

Tai Mei Ti A P P· 2025-04-30 15:09

行云集成电路创始人、CEO季宇 4月30日消息，钛媒体AGI获悉，清华90后创立的北京行云集成电路有限公司（简称"行云集成电路"）宣布，推出全新的一体机产品"褐蚁"，仅需最高15万元就可以跑满血版DeepSeek R1/V3大模型，并且对话速度达到了20token/s。今天下午，行云集成电路创始人、CEO季宇对钛媒体AGI表示，目前"褐蚁"一体机已经支持阿里最新发布的Qwen3系列开源大模型，包括顶配版Qwen3-235B-A22B。具体来说，"褐蚁"一体机有三款不同的配置：最高性价比的"超大杯"褐蚁HY90，搭载双路AMD EPYC 9355服务器、24条 48G 6400M频率内存和NV 5090D计算卡，支持FP8、INT4两种数据精度，在FP8精度下跑满血版DS能达到21token/s的对话速度，在INT4精度下则能达到28token/s，最高支持128K的上下文，售价14.9万元；此外，行云集成电路还将推出"大杯"褐蚁HY70、"中杯"褐蚁HY50两个配置版本。 | 型号 | 福盛 HY90 | 褐蚁 HY70 | 褐蚁 HY50 | | --- | --- | --- | --- | ...

从论文中积累复现 R1 的 insight

理想TOP2· 2025-04-30 13:04

以下文章来源于刘聪NLP ，作者周星星，恢复了 PPO 的原始目标，采用蒙特卡罗回报估计优势，并设置无偏基线，从而有效避免了优化偏差，在提升令牌效率的同时，还能维持模型的推理性能。 4. 推理能力的提升是渐进的，没有明显的"顿悟时刻" 6. 避免"长度作弊"需自然扩展响应。刘聪NLP . NLP刘聪，如货币般流通！这里的刘聪，不会rapper，只发paper！长期关注AIGC前沿内容！还写过两本书：ChatGPT原理与实战、大型语言模型实战指南！欢迎来讨论AI！上篇 R1复现小记：在业务场景的两类NLP任务上有显著效果提到在业务场景中复现 DeepSeek-R1，也简单记录下最近阅读一些论文过程中积累的 insight。 [1]Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning [2]An Empirical Study on Eliciting and Improving R1-like Reasoning Models [3]Understanding R1-Zero-Like Training: ...

全球最强开源AI大模型诞生：中国研发，成本只有Deepseek的30%

Xin Lang Cai Jing· 2025-04-30 11:28

众所周知，自从OpenAI的ChatGPT发布之后，全球就进入了千模大战。而自从Deeseek推出之后，这些大模型们，又掀起了开源高潮，因为大家发现，开源的大模型，更能够得到大家的使用。但与此同时，在AI大模型方面，也有两个方向，一个就是OpenAI们，那就是大力出奇迹，狂堆GPU 卡，用算力来堆出高性能AI。毕竟像OpenAI、马斯克的AI们，它们又有钱，又能买到最强的GPU卡，没必要没苦硬吃，堆显卡就是了。而另外一个方向，则是像Deepseek一样，钱不多，且显卡也受限，只有"四两拨千斤"，用最少的显卡，办最大的事，做出最强的性能。所以Deepseek打的华尔街是溃不成军，因为用的显卡少，性能却最强。自从Deepseek推出，国内就进行了一大波的国产GPU替代，因为大家发现不需要英伟达最强大的显卡，也可以部署强大的模型，一度打破了OpenAI的神话，也打破了英伟达的算力泡沫。但近日，又产一国产大模型，甩出了王炸，因为它的成本更低，但性能却超过了OpenAI-o1模型，也超过了Deepseek-R1等，登顶全球第一。这个模型，就是阿里通义千问大模型 Qwen3（简称千问 3），并 ...

AI大模型

创业邦· 2025-04-30 10:09

以下文章来源于光子星球，作者郝鑫来源丨光子星球（ID：TMTweb）作者丨郝鑫光子星球 . 细微之处，看见未来编辑丨王潘图源丨Midjourney "DeepSeek-R1如同当年苏联抢发的第一颗卫星，成为AI开启新时代的斯普特尼克时刻。" 2025年春节前，DeepSeek比除夕那天的烟花先一步在世界上空绽放。离年夜饭仅剩几个小时，国内某家云服务器的工程师突然被拉入工作群，接到紧急任务，要求其快速调优芯片，以适配最新的DeepSeek-R1模型。该工程师告诉我们，"从接入到完成，整个过程不到一周"。大年初二，一家从事Agent To B业务的厂商负责人电话被打爆，客户的要求简单粗暴：第一时间验证模型真实性能，尽快把部署提上日程。节前大模型，节后只有DeepSeek。DeepSeek-R1就像一道分水岭，重新书写了中国大模型的叙事逻辑。以2022年11月，OpenAI发布基于GPT-3.5的ChatGPT应用为起点，国内自此走上了追赶OpenAI的道路。 2023年，大模型如雨后春笋般冒出头，无大模型不AI，各厂商你追我赶，百模大战初见端倪。你方唱罢我登场，2024年的主人公变成了 ...

推理模型

全栈国产化大模型

数字中国峰会 |度小满CTO张文斌：Agent正在重塑客户体验与金融风险决策模式

ChatGPT

GPT系列

Zhong Guo Jing Ji Wang· 2025-04-29 12:04

Core Insights - The 8th Digital China Construction Summit was held in Fuzhou, Fujian Province, focusing on how digital technology is reshaping the financial ecosystem and the training of digital finance talents [1] - The forum featured prominent figures from academia and industry, discussing the transformative impact of AI and large models in finance [1] Group 1: AI and Financial Innovation - Zhang Wenbin, CTO of Du Xiaoman, highlighted the shift from generative models to reasoning models, emphasizing their enhanced capabilities in complex logical reasoning [3] - The application of reasoning models in finance has evolved from peripheral areas like customer service to core scenarios such as user experience and risk decision-making [3][4] - AI Agents are revolutionizing customer interaction by providing seamless online guidance and real-time responses, thus improving user experience and reducing reliance on manual processes [4] Group 2: Risk Management Enhancements - Traditional risk management processes often lead to information loss due to the transformation of raw data into structured variables; reasoning models can utilize full-dimensional raw data to enhance data efficiency [4] - Reasoning models can identify high-risk behaviors, such as suspicious transfers to high-risk accounts, by analyzing user transaction data [4] Group 3: Implementation Strategies for AI - Zhang Wenbin proposed starting with "small cuts" to build Agents, focusing on specific scenarios and customer segments to develop differentiated models [4] - The recommendation includes applying AI in real-world scenarios to generate data that can optimize models, creating a feedback loop of application, data accumulation, model iteration, and effect optimization [4] - Companies should concentrate computational power and talent to establish teams that accelerate AI application, prioritizing the cultivation of "AI-aware talents" to drive organizational transformation [4]

news flash· 2025-04-29 10:31

Core Insights - The article highlights the launch of Alibaba's Qwen3 model, which is the first "hybrid reasoning model" in China, integrating "fast thinking" and "slow thinking" into a single framework [1] - Huawei's Ascend supports the deployment of the Qwen3 model across its entire series, allowing developers to utilize it seamlessly in MindSpeed and MindIE [1] - The Qwen3 model is designed to provide quick responses for simple queries with low computing power while enabling multi-step deep reasoning for complex questions, significantly reducing computational resource consumption [1]

Qwen3深夜炸场，阿里一口气放出8款大模型，性能超越DeepSeek R1，登顶开源王座

3 6 Ke· 2025-04-29 09:53

Core Insights - The release of Qwen3 marks a significant advancement in open-source AI models, featuring eight hybrid reasoning models that rival proprietary models from OpenAI and Google, and surpass the open-source DeepSeek R1 model [4][24]. - Qwen3-235B-A22B is the flagship model with 235 billion parameters, demonstrating superior performance in various benchmarks, particularly in software engineering and mathematics [2][4]. - The Qwen3 series introduces a unique dual reasoning mode, allowing the model to switch between deep reasoning for complex problems and quick responses for simpler queries [8][21]. Model Performance - Qwen3-235B-A22B achieved a score of 95.6 in the ArenaHard test, outperforming OpenAI's o1 (92.1) and DeepSeek's R1 (93.2) [3]. - Qwen3-30B-A3B, with 30 billion parameters, also shows strong performance, scoring 91.0 in ArenaHard, indicating that smaller models can still achieve competitive results [6][20]. - The models have been trained on approximately 36 trillion tokens, nearly double the data used for the previous Qwen2.5 model, enhancing their capabilities across various domains [17][18]. Model Architecture and Features - Qwen3 employs a mixture of experts (MoE) architecture, activating only about 10% of its parameters during inference, which significantly reduces computational costs while maintaining high performance [20][24]. - The series includes six dense models ranging from 0.6 billion to 32 billion parameters, catering to different user needs and computational resources [5][6]. - The models support 119 languages and dialects, broadening their applicability in global contexts [12][25]. User Experience and Accessibility - Qwen3 is open-sourced under the Apache 2.0 license, making it accessible for developers and researchers [7][24]. - Users can easily switch between reasoning modes via a dedicated button on the Qwen Chat website or through commands in local deployments [10][14]. - The model has received positive feedback from users for its quick response times and deep reasoning capabilities, with notable comparisons to other models like Llama [25][28]. Future Developments - The Qwen team plans to focus on training models capable of long-term reasoning and executing real-world tasks, indicating a commitment to advancing AI capabilities [32].

开源模型

终端云端三连发！无问芯穹开源大模型推理加速神器，加码构建新一代端、云推理系统

Qwen3系列大模型

机器之心· 2025-04-29 09:14

机器之心发布机器之心编辑部当前 AI 领域呈现「端云并发」的发展态势，端侧与云侧大模型各展所长，共同推动着智能发展与应用落地的边界。端侧模型实现本地毫秒级实时响应，云侧模型依托强大算力支持复杂大规模推理，而两者都离不开高效的推理系统支撑。在 GTC 2025 上，NVIDIA CEO 黄仁勋强调，大模型计算正从预训练转向推理优化阶段。随着产业落地加速，推理计算需求正呈现爆发式增长，如何在性能、成本和响应速度间取得平衡成为关键工程挑战，推理系统正是解决这一问题的核心。近日，无问芯穹发起了一次推理系统开源节，连续开源了三个推理工作，包括加速端侧推理速度的 SpecEE、计算分离存储融合的 PD 半分离调度新机制 Semi-PD、低计算侵入同时通信正交的计算通信重叠新方法 FlashOverlap，为高效的推理系统设计提供多层次助力。下面让我们一起来对这三个工作展开一一解读： Day 1｜SpecEE：基于推测的 Early Exiting 机制，让 AI PC 推理速度起飞随着 DeepSeek 等开源模型表现出越来越强悍的性能，在 PC 端本地部署大模型的需求持续增长。尽管许多情况下使用云端 ...

大模型推理系统

端云并发

性能超越DeepSeek R1，Qwen3正式登场！阿里一口气放出8款大模型，登顶开源王座！

SpecEE

Semi - PD

FlashOverlap

AI科技大本营· 2025-04-29 09:05

整理 | 屠敏出品 | CSDN（ID：CSDNnews）今天凌晨，大模型领域最受关注的重磅消息来自阿里 Qwen 团队——他们正式发布了备受期待的全新 Qwen3 系列大模型。 8 大模型齐发！这 8 款混合推理模型中，包括了 2 个 MOE 模型： Qwen3-235B-A22B 和 Qwen3-30B-A3B 。其中，Qwen3-235B-A22B 是本次发布中规模最大的旗舰模型，拥有 2350 亿个参数，激活参数超过 220 亿。在代码、数学和通用能力等多个基准测试中，它的表现不仅超过了 DeepSeek 的 R1 开源模型，还优于 OpenAI 的闭源模型 o1。尤其在软件工程和数学领域的 ArenaHard 测试（共 500 道题）中，成绩甚至接近了 Google 最新发布的 Gemini 2.5-Pro，可见其实力不容小觑。不同于以往，这次其一次性开源了多达 8 款混合推理模型，在性能上全面逼近 OpenAI、Google 等闭源大模型，以及超越了开源大模型 DeepSeek R1，堪称当前最强的开源模型之一，也难怪昨晚 Qwen 团队一直在加班。 | | Qwen3- ...

BABA(US:BABA)

Qwen3系列大模型