Google DeepMind: LLMs Can Be Willful, Knowing the Optimal Path Yet Running Headlong into the Wall
机器之心· 2025-05-05 03:40
Core Insights
- The article investigates the common failure modes of Large Language Models (LLMs) in decision-making scenarios, specifically greediness, frequency bias, and the knowing-doing gap [2][15].
- It proposes a reinforcement learning fine-tuning (RLFT) method to enhance the decision-making capabilities of LLMs by addressing these shortcomings [2][8].

Group 1: Failure Modes
- LLMs exhibit suboptimal exploration and a knowing-doing gap, which prevent effective translation of knowledge into action [2][15].
- The three identified failure modes are:
  1. Greediness, where LLMs overly favor actions that have previously shown the best performance [15].
  2. Frequency bias, where LLMs tend to repeat high-frequency actions regardless of their reward differences [5][18].
  3. Knowing-doing gap, where LLMs understand task requirements but fail to execute optimal actions due to a preference for greedy choices [7][20].

Group 2: Model Performance
- Small-scale LLMs (2B) are significantly affected by frequency bias, leading to a lack of exploration, with up to 55% of actions remaining unexplored [4][18].
- Large-scale LLMs (27B) show reduced frequency bias but still exhibit greedy behavior, limiting their overall performance [6][18].
- The average action coverage for the largest models was only 45%, a substantial gap compared to optimal strategies [17].

Group 3: Reinforcement Learning Fine-Tuning
- The RLFT method adjusts the reasoning process of LLMs based on rewards obtained from environmental interactions, promoting the selection of actions that yield higher rewards [8][22].
- Results indicate that RLFT significantly reduces regret across various environments, improving LLM performance relative to random baselines [22].
- RLFT mitigates greediness by encouraging exploration, thus enhancing decision-making capabilities [22].
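The greediness and exploration failures above are usually measured in a multi-armed bandit setting. The toy sketch below (an illustration of the general idea, not the paper's code; the arm means, step counts, and thresholds are made-up assumptions) compares a purely greedy policy with an ε-greedy one and reports how much of the action space each policy ever explores:

```python
import random

def run_bandit(arm_means, steps, epsilon):
    """Simulate a simple bandit agent; epsilon=0.0 is purely greedy."""
    n_arms = len(arm_means)
    counts = [0] * n_arms       # how often each arm was pulled
    values = [0.0] * n_arms     # running mean reward estimate per arm
    for t in range(steps):
        if t == 0 or random.random() < epsilon:
            arm = random.randrange(n_arms)   # explore: random arm
        else:
            arm = values.index(max(values))  # exploit: best estimate so far
        reward = random.gauss(arm_means[arm], 0.1)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    explored = sum(1 for c in counts if c > 0)
    return explored / n_arms  # fraction of actions ever tried

random.seed(0)
means = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical arm rewards
greedy_cov = run_bandit(means, 500, epsilon=0.0)
eps_cov = run_bandit(means, 500, epsilon=0.1)
print(f"greedy coverage: {greedy_cov:.0%}, eps-greedy coverage: {eps_cov:.0%}")
```

A purely greedy agent typically locks onto the first arm that returns a positive reward and never tries the rest, mirroring the low action coverage the article reports for LLMs; the ε-greedy agent covers far more of the action space, which is the behavior RLFT is meant to encourage.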
Another Breakthrough in the LLM Reasoning Ceiling: "Adaptive-Difficulty Distillation" Surpasses R1 Distillation, with a Leap in Long-CoT Corpus Quality
机器之心· 2025-05-04 04:57
Core Viewpoint
- The article discusses a novel method for generating high-quality Chain of Thought (CoT) data, centered on adaptive difficulty grading of questions for large language models (LLMs) to enhance the reasoning capabilities of smaller models [2][6][41].

Group 1: Research Motivation and Challenges
- The emergence of large models like DeepSeek-R1 (671 billion parameters) has highlighted the challenges of deploying such models in real-time systems and on edge devices [6].
- There is a pressing need for research on smaller models with fewer than 7 billion parameters, particularly for complex reasoning tasks such as mathematical problem-solving and code generation [7].
- Current CoT data generation methods face challenges, including the high computational and annotation costs of large-scale data-driven approaches and the limited performance gains of high-quality sample-driven methods [8][9].

Group 2: Proposed Methodology
- The article introduces a method called "LLM Adaptive Question Difficulty Grading," which improves the quality of CoT data by dynamically matching data difficulty to model capability [12][13].
- The method includes four key innovations: a question difficulty grading system based on the model's inherent reasoning capability, an adaptive question bank, a difficulty-distribution sampling strategy, and high-quality CoT data generation using DeepSeek-R1 [15][18].

Group 3: Experimental Results
- The proposed method yields significant improvements in reasoning performance across model sizes, with accuracy gains ranging from 6.66% to 26.7% on the AIME24 mathematics competition dataset compared to traditional non-adaptive strategies [18][20].
- Models trained with the adaptive CoT data outperform baseline models on multiple mathematical reasoning benchmarks, achieving up to 94.6% accuracy on MATH500 [37].
- The ZCode-32B model demonstrated superior performance across difficulty levels, indicating that smaller models can achieve competitive results through adaptive data training [38].

Group 4: Conclusion and Future Work
- The proposed framework for generating high-quality CoT data is efficient and effective, requiring only about 2,000 high-quality samples to significantly enhance model performance while reducing data and computational costs [41].
- Future work will focus on integrating reinforcement learning to explore deeper reasoning capabilities and on extending the approach to more complex cross-domain tasks such as communication fault diagnosis [42].
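The grading-and-sampling pipeline described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the `solver` stand-in, the pass-rate thresholds, the bucket names, and the target distribution are all hypothetical choices for demonstration.

```python
import random
from collections import defaultdict

def grade_questions(questions, solver, k=8):
    """Bucket questions by the model's empirical pass rate over k attempts.
    `solver(q) -> bool` stands in for sampling the target LLM on question q."""
    banks = defaultdict(list)
    for q in questions:
        rate = sum(solver(q) for _ in range(k)) / k
        if rate >= 0.75:
            banks["easy"].append(q)
        elif rate >= 0.25:
            banks["medium"].append(q)
        else:
            banks["hard"].append(q)
    return banks

def sample_curriculum(banks, dist, n):
    """Draw up to n questions following a target difficulty distribution."""
    out = []
    for level, frac in dist.items():
        pool = banks.get(level, [])
        if pool:
            out += random.sample(pool, min(int(n * frac), len(pool)))
    return out

random.seed(1)
# Toy questions with a hidden "true difficulty" in [0, 1].
qs = [{"id": i, "d": random.random()} for i in range(200)]
solver = lambda q: random.random() > q["d"]  # easier questions pass more often
banks = grade_questions(qs, solver)
batch = sample_curriculum(banks, {"easy": 0.2, "medium": 0.5, "hard": 0.3}, 50)
print({lvl: len(v) for lvl, v in banks.items()}, len(batch))
```

In the described method, the sampled batch would then be sent to a strong teacher model (DeepSeek-R1 in the article) to generate long-CoT traces, so the distilled corpus matches what the student model actually finds hard.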
Decoding o3: OpenAI Doubles Down on Tool Use; Will the Manus-Style Agents Be Replaced by the Model?
Founder Park· 2025-04-30 12:31
The following article is from 海外独角兽, by 拾象. 海外独角兽: researching the great companies of the tech age of exploration. Recently, OpenAI released the two latest models in its o-series, o3 and o4-mini. With tool-use capability integrated, o3's performance already covers the use cases common to Agent products. Agent products are splitting into two routes: one, like o3, internalizes tool use into the model via CoT, so the model can execute tasks by writing code to call tools; the other, like Manus, externalizes the workflow as computer use within a human OS. Meanwhile, OpenAI has positioned Agent products as the main share of its future commercialization revenue. Does the improved internalized tool use of foundation models like o3 mean the technical moat of specialized Agent products is disappearing? This article analyzes OpenAI's o3 and o4-mini models, the open-source Codex CLI, and GPT-4.1 as used in the API, focusing especially on o3's new agentic and multimodal CoT capabilities. Founder Park is building an "AI Product Marketplace" community, inviting practitioners, developers, and entrepreneurs to scan ...
cBrain aims to create and lead two new global solution niches
Globenewswire· 2025-04-29 13:19
Company Announcement no. 05/2025 cBrain aims to create and lead two new global solution niches Copenhagen, April 29, 2025 The faster-than-anticipated shift in the government IT market toward COTS government software presents new strategic opportunities for cBrain. As a result, cBrain (NASDAQ: CBRAIN) has announced that it will adjust its growth strategy during the first half of 2025 to capitalize on these market changes. Consequently, the growth strategy is extended by adding a focus on two market niches with glob ...
22nd Century Revenue Growth from Continued Expansion of CMO Volume with New Filtered Cigar Agreements
Globenewswire· 2025-04-09 12:25
Core Insights
- 22nd Century Group, Inc. has announced the execution of two new agreements to supply filtered cigar products, expanding its customer partnerships and production capabilities [1][3][4].

Group 1: Business Expansion
- The company is increasing production of filtered cigars, with initial shipments expected in Q2 2025, targeting an annual volume of 500,000 cartons or more [2].
- The new agreements are designed to provide gross margin and consistent volume, reinforcing the company's core CMO business [3].
- These agreements build on momentum from Q3 2024, indicating a strategic focus on expanding the footprint of its VLN brand through established retail channels [4].

Group 2: Product Offering
- The flagship product, VLN cigarettes, contains 95% less nicotine than traditional cigarettes, aiming to help smokers control their nicotine consumption [5][7].
- The proprietary reduced-nicotine tobacco blends are developed using patented technologies, giving the company the only low-nicotine combustible cigarette in the U.S. and critical international markets [7].

Group 3: Manufacturing Capabilities
- The company operates a 60,000 square foot facility in Mocksville, North Carolina, capable of producing over 45 million cartons of combustible tobacco products annually, with room for expansion [6].
22nd Century CEO & Chairman Larry Firestone Provides Corporate Update Letter to Stockholders
Newsfilter· 2025-04-08 12:00
Corporate Update Letter Highlights Plans to Begin Profitable Growth Phase in 2025 on Expansion of Rebranded VLN® Cigarette Products MOCKSVILLE, N.C., April 08, 2025 (GLOBE NEWSWIRE) -- 22nd Century Group, Inc. (NASDAQ:XXII), a tobacco products company that is leading the fight against nicotine by offering smokers a choice about their nicotine consumption, today issued the following letter to stockholders from Larry Firestone, the Chief Executive Officer of 22nd Century Group, Inc.: A Letter to Our Sharehold ...
昆仑万维 Releases Mureka O1, the World's First Music Reasoning Model; Chairman and CEO 方汉 Details the Path to AI Music Commercialization
21世纪经济报道· 2025-03-27 01:04
Following the release of its first-generation music generation model Mureka V1 last April, 昆仑万维 on March 26 released Mureka O1, the world's first music reasoning model, along with the new foundation model Mureka V6. The "Mureka" AI musician MV premiered across the web; the work was AI-generated, with music generated by Mureka and video produced with SkyReels technology support. Mureka V6 is Mureka's current foundation model: it supports pure instrumental generation as well as AI music creation in 10 languages. In Mureka V6, the 昆仑万维 team introduced its in-house ICL (in-context learning) technology, making the soundstage more open and further strengthening vocal texture and mixing design. Mureka V6 entry interface (source: Mureka official website). 方汉: The user base includes ordinary music lovers on the consumer side, for whom the creation threshold is lowered so they can freely compose music and write lyrics; the business side mainly serves practitioners in film, games, and audio, helping them cut costs and improve efficiency. As for the business model, free consumer users get a certain usage allowance, while paying users get higher speed and priority access to AI generation; for businesses, professional features are offered via SaaS or PaaS services. The Mureka O1 model is built on M ...
Surpassing Suno: Mureka O1, the World's First CoT Music Model, Is Here!
AI科技大本营· 2025-03-26 10:20
The era in which everyone is a music creator has arrived! Produced by AI 科技大本营 (ID: rgznai100). AI is permeating every industry. Not long ago, a song created by AI went viral, climbing the hit charts within days; AI is opening the door to music creation for music lovers. According to Fortune Business Insights, the global digital audio workstation (DAW) market reached roughly $3 billion in 2023, and by 2026 about 70% of DAW companies are expected to use AI to assist music creation. The "Mureka" AI musician MV premiered across the web; singer: Mureka. The work was AI-generated, with music generated by Mureka and video produced with SkyReels technology support. Playing 《童年的夜晚》 ("Childhood Nights"), the melody is soft and pleasant, the vocals gentle and sincere with clear articulation, and the lyrics closely match the style of the prompt, with no trace of AI at all. After downloading the generated song, we found it supports stem-separated downloads: an ordinary download gives a single track, whereas Mureka provides independent stems for vocals, accompaniment, and more, such as drums and bass. For arrangers this is a powerful tool for secondary creation, making later remixing easy. What, you say prompt-based generation is too easy? Let's raise the difficulty: click Advanced Mode, and Mu ...
ZPedia | Another Blockbuster from Chinese AI! Mureka O1, the World's First Music Reasoning Model, Goes Live; Is Silicon Valley Shattered?
Z Finance· 2025-03-26 09:14
Driven by both continuous breakthroughs in AI technology and market demand, the AI music generation industry is seeing explosive growth, with a compound annual growth rate above 16.3%. Mureka's arrival strikes at the heart of the $54 billion global music industry. Its disruptive edge: while Suno is still optimizing single-track generation, Mureka has built a complete ecosystem chain spanning creation, production, and commercialization. The crisis Silicon Valley has yet to recognize: when the core toolchain of music production is defined by a Chinese company, the power structure of the global culture industry is quietly shifting. Just as 20th-century Hollywood dominated global entertainment through film industry standards, today Mureka's API interfaces, model protocols, and timbre libraries are becoming the new infrastructure of the AI-era music industry. II. Core capabilities: why is Mureka called "the atomic bomb of the music world"? 昆仑万维 today released two revolutionary music foundation models. I. Another Chinese AI breakthrough: the music industrial revolution gets an Eastern engine. In Q1 2025, China's AI industry expanded its technical map with a "triple jump": in February, DeepSeek reshaped the large-model competitive landscape with its open-source strategy; in March, Manus redefined the agent-collaboration paradigm; and at the end of March, while Silicon Valley was still debating whether AI possesses true artistic creativity, a batch of code from China had quietly rewritten ...