文心's newly released reasoning model gives us confidence
机器之心· 2025-09-09 11:46
Report by the 机器之心 editorial team. With today's large language models, the worry is not that they can't solve a problem, but that they make things up: because of "hallucinations," we often instinctively distrust AI output. Just last week, OpenAI's paper "Why Language Models Hallucinate" circulated widely. Its researchers argued that eliminating hallucinations requires fixing the scoring mechanisms used during model training and developing entirely new techniques. In AI, however, technology has always moved faster than expected. As if in answer to OpenAI's research, at this morning's WAVE SUMMIT Deep Learning Developer Conference 2025, Baidu released a new model that raises "trustworthiness" by a large margin, with not only better factual accuracy but also marked gains in instruction following and agent capabilities. Today's release is 文心大模型 X1.1 (ERNIE X1.1), a deep-thinking model and an upgrade of X1, the flagship model Baidu released in April. It went live at launch, and anyone can try it for free. The model is also open to enterprise customers and developers through Baidu AI Cloud's Qianfan platform. The upgrade targets factuality, instruction following, and agent and tool-calling capabilities, delivering a significant overall improvement. In numbers: compared with X1, X1.1 improves factuality by 34.8%, instruction following by 12.5%, and agent capabilities by 9.6%. This means it is more reliable when providing information and executing tasks ...
Is SFT far inferior to RL? A timeless razor principle opens the door to "lifelong learning" for large-model training
机器之心· 2025-09-09 11:46
Core Viewpoint
- The article discusses the challenges and advancements in large models, focusing on the phenomenon of catastrophic forgetting and the advantages of reinforcement learning (RL) over supervised fine-tuning (SFT) in mitigating it [1][3][29].

Group 1: Large Models and Their Challenges
- The era of large models has arrived; they have become a core component of intelligent infrastructure supporting applications such as language processing, visual analysis, and robotics [1].
- Most deployed large models are "static" and lack the ability to learn dynamically and improve themselves, which is essential for more general artificial intelligence (AGI) [2][3].
- Catastrophic forgetting occurs when models lose previously learned skills while learning new tasks, posing a significant challenge for long-term learning agents [3].

Group 2: Research Insights on Catastrophic Forgetting
- Researchers have proposed various methods to address catastrophic forgetting, including regularization, experience replay, and parameter tuning [5].
- A recent study from MIT's Improbable AI Lab revealed fundamental patterns and training strategies related to forgetting in large models, attracting significant attention [6][7].

Group 3: Findings from the Study
- The study compared two common post-training methods, supervised fine-tuning (SFT) and reinforcement learning (RL), and found that RL is less prone to forgetting [8][29].
- A new principle called the "forgetting law" was introduced: the KL divergence between the fine-tuned policy and the base policy is a key predictor of forgetting [10][30].
- RL retains prior knowledge better while learning new tasks, whereas SFT often sacrifices old knowledge for new performance [15][29].
Group 4: Mechanisms and Theoretical Contributions
- The study identified that the online nature of RL contributes to its KL-divergence minimization, which helps retain prior knowledge [21][30].
- The authors provided a theoretical basis for RL's KL-minimizing behavior, explaining that RL naturally prefers solutions closer to the original model [24][30].
- The findings suggest that future training methods should aim to minimize KL divergence to achieve continual learning without forgetting [31][32].
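The "forgetting law" described above says that forgetting tracks the KL divergence between the fine-tuned policy and the base policy. A minimal numeric sketch of that quantity (all distributions below are invented for illustration, not taken from the paper):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 3-token vocabulary.
base = softmax([2.0, 1.0, 0.5])   # base policy before post-training
sft  = softmax([0.0, 3.0, 0.5])   # large drift away from the base
rl   = softmax([1.6, 1.4, 0.5])   # small drift, stays near the base

# The policy with the larger KL from the base is predicted to forget more.
print(kl_divergence(sft, base) > kl_divergence(rl, base))  # True
```

Under this lens, RL's advantage is not a separate mechanism: its updates simply keep KL(policy || base) small while still improving the new task, whereas SFT can drift arbitrarily far.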
DPad: a middle way for diffusion LLMs, as 陈怡然's team at Duke University achieves a 61x training-free inference speedup
机器之心· 2025-09-09 08:56
The paper's team is from the CEI Center at Duke University; the work was completed by interns 陈欣骅 and 黄思韬 together with Dr. 郭聪, advised by Professors 李海 and 陈怡然. Diffusion large language models (dLLMs), with their parallel decoding and distinctive capacity for global planning, promise to overcome the efficiency bottleneck and planning deficits of autoregressive (AR) models. Their "global planning" ability, however, depends on bidirectional attention over all subsequent tokens, which introduces severe computational redundancy and leaves the potential of existing open-source models far from realized. Current dLLMs face a "battle of routes": one route keeps global planning but is extremely slow at inference ("global bidirectional attention," as in LLaDA); the other pursues speed at the cost of planning ("intra-block bidirectional attention," as in Block Diffusion). How to reconcile these two routes, letting a model both "see the whole picture" and decode fast, has become a question of growing interest. Tackling this, 陈怡然's team at Duke took a different path: they uncovered the "scratchpad mechanism" through which dLLMs perform global planning and found it highly redundant. Building on this, they propose DPad (Diffusion Scratchpad), a training-free method that drops large numbers of useless suffix tokens a priori, sharply reducing computation while preserving the core planning ability and charting a "middle route" between the two camps. Combined with existing optimization techniques, the method achieves almost no loss ...
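The idea of dropping most distant suffix "scratchpad" tokens a priori can be sketched as follows. This is a hypothetical toy, not the paper's implementation: the function name, window size, and distance-decay rule are all assumptions.

```python
import random

def suffix_keep_positions(cur, seq_len, window=16, decay=0.9, seed=0):
    """Toy DPad-style suffix pruning (hypothetical parameters).

    Suffix tokens within `window` of the current decoding position are
    always attended; farther tokens are kept only with a probability
    that decays with distance, so most of the distant scratchpad is
    discarded before attention is ever computed.
    """
    rng = random.Random(seed)
    keep = []
    for pos in range(cur, seq_len):
        dist = pos - cur
        if dist < window:
            keep.append(pos)                  # near suffix: always kept
        elif rng.random() < decay ** (dist - window + 1):
            keep.append(pos)                  # far suffix: sparsely sampled
    return keep

kept = suffix_keep_positions(cur=0, seq_len=256)
print(len(kept))  # far fewer than 256: most distant suffix tokens are gone
```

Because pruning happens before attention, the cost of each denoising step shrinks roughly with the number of retained suffix positions, which is how a training-free method can yield large speedups.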
Is 996 confirmed in Silicon Valley too? The fire of AI is burning away Silicon Valley's weekends
机器之心· 2025-09-09 08:56
Core Viewpoint
- The "996" work culture, initially seen as a phenomenon unique to Chinese tech companies, is increasingly becoming a reality in Silicon Valley, with evidence of longer working hours and shifts in employee spending patterns [2][3][9].

Group 1: Evidence of 996 in Silicon Valley
- A blog post by Ara Kharazian, an economist at fintech company Ramp, highlights the increase in Saturday work hours among San Francisco employees, reflected in their spending trends [3][7].
- Ramp's data show a significant increase in Saturday dining and takeout spending in 2025 compared with 2024, indicating that employees are working longer weekend hours [7][8].
- The trend is unique to San Francisco; other major tech hubs show no similar rise in Saturday spending, and New York's increase is only a quarter of San Francisco's [8][9].

Group 2: Broader Implications and Reactions
- The rise in Saturday spending is not limited to tech companies but spans industries across San Francisco, suggesting longer working hours are being widely adopted [9].
- Some industry leaders warn that forcing employees to work long hours drives away talent and ultimately harms a company's progress [18][20].
- The phenomenon contrasts with Europe's more relaxed work culture, where "996" humorously refers to taking long stretches of time off rather than working long hours [25][26].
Altman personally writes a blog post of praise: who are these two outstanding talents?
机器之心· 2025-09-09 06:45
Core Viewpoint
- OpenAI's recent advances in AI, particularly with ChatGPT, are attributed to two key researchers, Jakub Pachocki and Szymon Sidor, who have effectively combined cutting-edge research with engineering practice to solve numerous challenges [1][3][4].

Group 1: Contributions of Jakub Pachocki
- Jakub Pachocki is recognized as a pivotal figure at OpenAI, serving as Chief Scientist and leading major projects such as the development and pre-training of GPT-4 [4][8].
- He played a crucial role in the OpenAI Five project, in which AI defeated human champions at Dota 2, bolstering confidence in the potential of large-scale reinforcement learning (RL) [4][8].
- His academic background includes a focus on high-dimensional convex optimization, which is closely related to training modern neural networks [6][8].

Group 2: Contributions of Szymon Sidor
- Szymon Sidor, an MIT graduate, has contributed significantly to many core OpenAI projects, including large-scale RL systems and advances in robotics [12][13].
- His early research explored the intersection of reinforcement learning and natural language processing (NLP), laying groundwork for the techniques used to align ChatGPT and train reasoning models [12][14].
- His involvement in OpenAI Five and his contributions to the GPT-4 technical report underscore his integral role in the company's advances [13][14].

Group 3: Internal Dynamics and Leadership Changes
- Following the unexpected dismissal of CEO Sam Altman, both Pachocki and Sidor resigned in protest along with other key personnel, triggering a significant employee backlash [16][17].
- The internal crisis led to a restructuring of OpenAI's leadership, with Pachocki appointed the new Chief Scientist after Altman's return [17].
From "able to speak" to "able to do", the second half for LLMs: a survey of the Agentic reinforcement learning paradigm
机器之心· 2025-09-08 10:30
Core Insights
- The article discusses the evolution of training paradigms for large language models (LLMs) from preference-based reinforcement fine-tuning (PBRFT) to Agentic Reinforcement Learning (Agentic RL), highlighting the limitations of PBRFT and the advantages of Agentic RL in enabling LLMs to engage in proactive decision-making and long-term planning [2][4][37].

Paradigm Shift
- The transition from PBRFT to Agentic RL is defined formally: PBRFT is a degenerate single-step Markov decision process (MDP), whereas Agentic RL operates under a partially observable Markov decision process (POMDP) framework, allowing multi-step interaction [6][8].
- Key changes include the expansion of the action space from pure text sequences to text plus actions, and a reward structure that evolves from single-step scoring to temporal feedback, optimizing the entire decision trajectory [8][10].

Core Capabilities of Agentic RL
- Six core capabilities are identified as essential for LLMs to function as agents:
  1. **Planning**: setting sub-goals and multi-step action sequences for complex tasks [14].
  2. **Tool Use**: learning to autonomously select and combine external tools [15].
  3. **Memory**: maintaining context and accumulating knowledge through various memory-management techniques [17].
  4. **Self-Improvement**: enhancing capabilities through self-correction and iterative self-training [18].
  5. **Reasoning**: developing both intuitive and systematic reasoning abilities [19].
  6. **Perception**: actively understanding and processing multi-modal inputs [19].

Applications and Evolution
- Agentic RL is expanding into application domains including search and research optimization, code generation, mathematical reasoning, graphical user interface (GUI) interaction, and multi-agent systems [25][26][27][28].
- The framework is supported by a variety of experimental environments and tools that facilitate research and development [32][33].
Challenges and Future Directions
- Despite its potential, Agentic RL faces challenges such as ensuring reliability and safety, scaling up training, and creating environments that accurately reflect real-world complexity [35][39].
- The article emphasizes that overcoming these challenges is needed for LLMs to transition from merely "speaking" to "doing," evolving into more autonomous and versatile agents [38][39].
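The contrast drawn above, a single scored response versus a multi-step POMDP rollout with trajectory-level reward, can be sketched in a few lines. The environment, policy, and discount factor below are toy assumptions for illustration, not constructs from the survey:

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)   # (observation, action, reward)

def pbrft_objective(score, response):
    # PBRFT: one action (a full text response), one scalar preference score
    return score(response)

def agentic_rollout(env, policy, max_steps=5, gamma=0.99):
    # Agentic RL: partial observations, text *and* tool actions, and a
    # discounted return accumulated over the whole decision trajectory.
    traj, obs = Trajectory(), env.reset()
    for _ in range(max_steps):
        action = policy(obs)                     # may be text or a tool call
        obs, reward, done = env.step(action)
        traj.steps.append((obs, action, reward))
        if done:
            break
    ret = sum(gamma ** t * r for t, (_, _, r) in enumerate(traj.steps))
    return traj, ret

class ToyEnv:
    # Trivial environment: ends after three steps, reward 1.0 per step.
    def reset(self):
        self.t = 0
        return "obs0"
    def step(self, action):
        self.t += 1
        return f"obs{self.t}", 1.0, self.t >= 3

traj, ret = agentic_rollout(ToyEnv(), policy=lambda obs: "search(query)")
print(len(traj.steps), round(ret, 4))
```

The key structural difference is that `agentic_rollout` optimizes over a sequence of interdependent decisions, which is what makes planning, tool use, and memory trainable objectives rather than emergent side effects.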
Hinton reveals: his ex-girlfriend broke up with him using ChatGPT to list his "seven deadly sins"
机器之心· 2025-09-08 10:30
Core Viewpoint
- The article discusses the increasing integration of AI, particularly ChatGPT, into personal and professional life, highlighting both its utility and the potential emotional implications of relying on AI for interpersonal communication [4][6].

Group 1: AI in Personal Life
- Geoffrey Hinton, known as the "Godfather of AI," shared a personal anecdote about ChatGPT being used in his breakup, illustrating how AI is becoming a participant in human relationships [1][4].
- Using AI for personal matters such as breakups signals a shift in communication, with individuals increasingly depending on AI for emotional support and decision-making [4][6].

Group 2: AI's Impact on Employment
- Hinton warns that AI's advance could lead to mass unemployment and wealth concentration, with the rich benefiting disproportionately from AI technologies [8][9].
- While current layoffs are not yet significant, evidence suggests AI is reducing entry-level job opportunities, with possible long-term economic consequences [8][9].

Group 3: AI Risks and Future Concerns
- Hinton expresses concern about AI being used for harmful purposes, such as creating autonomous weapons [9][11].
- He estimates a 10% to 20% chance that unchecked superintelligent AI could pose an existential threat to humanity [9][10].
- He reflects on his delayed recognition of AI's risks, noting that earlier generations of neural networks seemed far from surpassing human intelligence [11][12].

Group 4: Recommendations for AI Management
- Hinton suggests humanity's best hope for managing AI's risks is to design AI systems that prioritize human welfare, akin to the nurturing relationship between a mother and child [13][14].
ByteDance's Seedream 4.0 to open to all users! Our hands-on preview uncovered 20 "unorthodox" tricks for AI image generation
机器之心· 2025-09-08 09:17
Report by 机器之心, editor: 杨文. Opening the door to free-form multimodal creation. While the global creative frenzy sparked by Google's Nano Banana has yet to subside, ByteDance has made a big move of its own. Recently, ByteDance began internal testing of the latest Doubao image-creation model, Seedream 4.0. Compared with previous versions, Seedream 4.0 supports multimodal image generation for the first time: a single model handles text-to-image, image editing, and image-set generation, with significant gains in core capabilities. Stronger subject consistency: whether driven by text or by images, the model keeps the subject's features stable, avoiding "distortion" and "misalignment." Prompt: change the eye-level view to a bird's-eye view, change the close-up to a medium shot, and change the aspect ratio to 16:9. Flexible multi-image creation: supports multi-dimensional combinations of text and image inputs, making reference-based generation, fusion generation, and editing easy. Prompt: based on the two young men in the reference images, generate a set of action-film storyboards at the original aspect ratio. From a longer-term view of the technology, free-form multimodal creation is becoming the trend; whether driven by text, by images, or by multi-image fusion, users expect to collaborate with AI in more natural and spontaneous ways. As soon as Seedream 4.0 entered internal testing, netizens found creative uses for it. For example, using multi-image fusion, uploading two character photos plus a stick-figure selfie pose produces a group photo of the two in one frame. Prompt: put the man from image 1 and the woman from image 2 into one frame, ...
Is Zuckerberg's big bet starting to pay off? Meta's new method speeds up LLM long-context processing by 30x
机器之心· 2025-09-08 06:22
Jackson Atkins (@JacksonAtkinsX), September 7:

> Meta Superintelligence Labs just made LLMs handle 16x more context and unlocked up to a 31x speedup.
> Their new REFRAG framework rethinks RAG from the ground up to achieve this, all with zero drop in accuracy.
> Here's how it works: the core problem with long context is ...

Recently, Meta Superintelligence Lab ...
Topping the global image-to-video leaderboard: how 爱诗科技's PixVerse V5 is changing video creation for 100 million users
机器之心· 2025-09-08 06:22
Original report by 机器之心, author: 冷猫. A fun and useful star video-generation product gets another update: the user experience stays simple, but the model technology behind it is anything but basic. Readers familiar with generative AI have lately been flooded with Google's nano-banana. In image generation, nano-banana gained enormous influence in a short time, sweeping social media with the ultra-realistic "photo to figurine" creative trick, one that especially touched the hearts of pet owners. Beyond a strong base of model capability, the real key to breaking into the mainstream is "creativity." The viral spread of turning one's own pet into a cute figurine made ordinary users realize that AI generation can bring imagination to life; the feeling of "this is cool, I want it too" triggered a wave of AI creation by everyone. Speaking of creativity in AI video, veteran player PixVerse (拍我 AI) opened a free-access week in China last Friday; within two days, creators on Xiaohongshu and short-video platforms were playing with Nano banana 3D figurines, while others combined Nano banana image generation with 拍我 AI templates for wardrobe-changing videos, earning over 5,000 likes on WeChat Channels. Two years ago, before Sora had even been announced as a concept, PixVerse had already launched its web product, reaching a million visits within 30 days. Such a veteran of video generation is serious about "creativity." In the past ...