Chain-of-Thought (CoT)
Training cost of just $294,000: DeepSeek-R1 makes the cover of Nature, praised as the first mainstream large model to pass peer review at an authoritative journal
36Kr · 2025-09-18 07:55
Core Insights
- DeepSeek-R1's research results have been published in Nature, making it the first mainstream large model to undergo peer review at a reputable journal and sparking significant discussion in the academic community [1][14][17]
- The training cost of DeepSeek-R1 is reported to be only $294,000, far below the tens of millions typical for leading models, though this figure excludes the roughly $6 million invested in the foundational LLM [1][2][17]

Training Costs
- The training costs for DeepSeek-R1 break down as follows:
  - DeepSeek-R1-Zero: $202,000
  - SFT data creation: $10,000
  - DeepSeek-R1: $82,000
  - Total: $294,000
- Training used 648 H800 GPUs for approximately 198 hours for DeepSeek-R1-Zero and around 80 hours for DeepSeek-R1 [2]

Reinforcement Learning and Reasoning Capabilities
- The model employs Group Relative Policy Optimization (GRPO) to enhance reasoning capabilities without traditional supervised fine-tuning, allowing for more exploratory learning (a minimal sketch of the group-relative idea follows this summary) [3][4]
- DeepSeek-R1-Zero demonstrates complex reasoning behaviors, generating longer responses that incorporate verification and the exploration of alternative solutions [4][6]

Performance Metrics
- DeepSeek-R1-Zero achieved a pass@1 score of 77.9% on the AIME 2024 math competition, rising to 86.7% with self-consistent decoding strategies and surpassing average human performance (see the majority-vote sketch below) [6][8]
- The model also excelled in programming competitions and on graduate-level questions in biology, physics, and chemistry, validating the effectiveness of reinforcement learning for enhancing reasoning capabilities [6]

Development Pipeline
- Development proceeded through multiple stages, starting from data collection based on human-like dialogue, then reinforcement learning and sampling, ultimately improving the model's utility and safety [9][11]
- Experimental results indicate significant improvements in instruction execution across the development stages, with DeepSeek-R1 outperforming its predecessors on benchmark tests [11][13]

Industry Impact
- The peer review of DeepSeek-R1 is seen as a positive trend for AI research, promoting the transparency and standardization that many mainstream AI models have lacked [14][16][17]
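Where the summary mentions GRPO, the core idea is that each prompt's sampled answers are scored against one another, replacing the learned value network of standard PPO. Below is a minimal, illustrative sketch of the group-relative advantage plus a PPO-style clipped loss; the function names and toy numbers are assumptions of this digest, not DeepSeek's code, and real training adds a KL penalty and token-level bookkeeping.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: normalize each response's reward against
    the mean/std of its own sampled group (no value network needed)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new, logp_old, advantages, clip_eps: float = 0.2):
    """PPO-style clipped surrogate on per-response log-probs,
    using group-relative advantages instead of critic estimates."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage: one prompt, a group of 4 sampled answers with 0/1 correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
adv = grpo_advantages(rewards)                       # shape (1, 4)
logp_old = torch.tensor([[-4.2, -5.1, -4.8, -3.9]])
logp_new = logp_old + 0.05                           # pretend the policy moved slightly
print(adv, grpo_loss(logp_new, logp_old, adv))
```

Because the baseline is the group's own mean reward, correct answers in a mostly-wrong group get a large positive advantage, which is what drives the exploratory behavior the summary describes.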
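The jump from 77.9% to 86.7% via "self-consistent decoding" is, in essence, majority voting over several sampled final answers. A hedged sketch, with a stub sampler standing in for the real model:

```python
import random
from collections import Counter
from typing import Callable

def self_consistent_answer(prompt: str,
                           sample: Callable[[str], str],
                           k: int = 16) -> str:
    """Sample k answers and return the most frequent one (majority vote)."""
    votes = Counter(sample(prompt) for _ in range(k))
    return votes.most_common(1)[0][0]

# Toy usage: a stub sampler that answers correctly 3 times out of 4 on average.
random.seed(0)
answer = self_consistent_answer(
    "What is 17 * 3?",
    sample=lambda p: random.choice(["51", "51", "51", "54"]),
)
print(answer)  # -> "51"
```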
DeepSeek and its peers are getting smarter, but also increasingly disobedient.
数字生命卡兹克· 2025-05-19 20:14
Core Viewpoint
- The article discusses a paradox of advanced AI models: increased reasoning capability leads to a decline in their ability to follow instructions accurately, as evidenced by recent research findings [1][3][10].

Group 1: Research Findings
- A study titled "When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs" reveals that when models engage in reasoning, they often fail to adhere to the given instructions [2][3].
- The research team from Harvard, Amazon, and NYU conducted tests on 15 models, finding that 13 out of 14 models showed decreased accuracy when using Chain-of-Thought (CoT) reasoning on simple tasks [4][6].
- On complex tasks, every model tested exhibited a performance decline when employing CoT reasoning [4][6].

Group 2: Performance Metrics
- In the IFEval test, models such as GPT-4o-mini and Claude-3.5 experienced significant drops in accuracy when using CoT, with GPT-4o-mini falling from 82.6% to 76.9% [5].
- Results on ComplexBench likewise showed a consistent decline across all models when CoT was applied, highlighting the detrimental impact of reasoning on task execution [4][6].

Group 3: Observed Behavior Changes
- While appearing smarter, the models became more prone to disregarding explicit instructions, often modifying or adding information that was never requested [9][10].
- This behavior is attributed to a decrease in "Constraint Attention": when reasoning is involved, models fail to focus on the critical constraints of the task [10].

Group 4: Proposed Solutions
- The article outlines four potential methods to mitigate the decline in instruction-following accuracy (a sketch of the fourth follows this summary):
  1. **Few-Shot Learning**: providing examples to the model, though this has limited effectiveness due to input length and bias [11][12].
  2. **Self-Reflection**: allowing models to review their own outputs, which works well for larger models but poorly for smaller ones [13].
  3. **Self-Selective Reasoning**: enabling models to decide when reasoning is necessary, which yields high recall but low precision [14].
  4. **Classifier-Selective Reasoning**: training a smaller model to decide when to use CoT, which has shown significant improvements in accuracy [15][17].

Group 5: Insights on Intelligence
- The article emphasizes that true intelligence lies in the ability to focus attention on the critical aspects of a task rather than processing every detail [20][22].
- It suggests that AI should be designed to prioritize the key elements of a task, much as humans manage their focus during critical moments [26][27].
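Classifier-selective reasoning, the most effective of the four, amounts to a routing decision made before decoding. Below is a minimal sketch of that routing under stated assumptions: `classify`, `answer_direct`, and `answer_with_cot` are hypothetical stand-ins for a trained classifier and two decoding modes, not any library's actual API.

```python
from typing import Callable

def classifier_selective_reasoning(
    prompt: str,
    classify: Callable[[str], float],     # hypothetical: returns P(CoT will help)
    answer_direct: Callable[[str], str],  # hypothetical: decode without reasoning
    answer_with_cot: Callable[[str], str],# hypothetical: decode with CoT
    threshold: float = 0.5,
) -> str:
    """Route a prompt to CoT decoding only when a small trained classifier
    predicts that reasoning will improve instruction-following."""
    if classify(prompt) >= threshold:
        return answer_with_cot(prompt)
    return answer_direct(prompt)

# Toy usage with stub callables standing in for real model calls.
out = classifier_selective_reasoning(
    "Reply with exactly the word 'yes'.",
    classify=lambda p: 0.1,                        # constraint-heavy prompt: skip CoT
    answer_direct=lambda p: "yes",
    answer_with_cot=lambda p: "Let me think... yes!",
)
print(out)  # -> "yes"
```

The design choice is to spend a cheap classifier call per prompt in exchange for skipping CoT exactly where it hurts, which is why this beats always-on or model-self-selected reasoning in the reported results.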
AI-generated videos keep violating the laws of physics? PhyT2V, new work from a University of Pittsburgh team, boosts physical realism by up to 2.3x without retraining the model!
机器之心· 2025-05-19 04:03
Core Viewpoint
- The article discusses the advancement of Text-to-Video (T2V) generation technology, emphasizing the transition from focusing on visual quality alone to ensuring physical consistency and realism, through the PhyT2V framework, which enhances existing T2V models without requiring retraining or extensive external data [2][3][26].

Summary by Sections

Introduction to PhyT2V
- PhyT2V is a framework developed by a research team at the University of Pittsburgh, aimed at improving the physical consistency of T2V generation by integrating large language models (LLMs) for iterative self-refinement [2][3][8].

Current State of T2V Technology
- Recent T2V models, such as Sora, Pika, and CogVideoX, have shown significant progress in generating complex and realistic scenes, but they struggle to adhere to real-world physical rules and common sense [5][7].

Limitations of Existing Methods
- Current methods for enhancing T2V models often rely on data-driven approaches or fixed physical categories, which limits their generalizability, especially in out-of-distribution scenarios [10][12][18].

PhyT2V Methodology
- PhyT2V employs a three-step iterative process (sketched after this summary):
  1. Identifying physical rules and main objects from the user prompt [12].
  2. Detecting semantic mismatches between the generated video and the prompt using video-captioning models [13].
  3. Generating a corrected prompt based on the identified physical rules and mismatches [14][18].

Advantages of PhyT2V
- It requires no model-structure modifications or additional training data, making it easy to implement [18].
- It provides a feedback loop for prompt correction based on actually generated results, strengthening the optimization process [18].
- It demonstrates strong cross-domain applicability across varied physical scenarios [18].

Experimental Results
- The framework has been tested on multiple T2V models, showing significant improvements in physical consistency (PC) and semantic adherence (SA) scores, with the CogVideoX-5B model achieving up to 2.2x improvement in PC and 2.3x in SA [23][26].

Conclusion
- PhyT2V represents a data-independent approach to T2V generation that keeps generated videos consistent with real-world physical principles without additional model retraining, a significant step toward more realistic T2V models [26].
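Since PhyT2V operates purely at the prompt level, its three-step loop can be summarized as a plain refinement function. The sketch below is schematic: `llm`, `generate_video`, and `caption_video` are placeholder callables, not the authors' actual API, and the prompt templates are paraphrases of the steps the summary describes.

```python
from typing import Callable

def phyt2v_refine(
    user_prompt: str,
    llm: Callable[[str], str],            # hypothetical LLM call
    generate_video: Callable[[str], str], # hypothetical T2V call, returns a video path
    caption_video: Callable[[str], str],  # hypothetical video-captioning call
    max_rounds: int = 3,
) -> str:
    """Iteratively refine a T2V prompt so the generated video better obeys
    physical rules -- no model retraining, prompt-level feedback loop only."""
    prompt = user_prompt
    for _ in range(max_rounds):
        # Step 1: have the LLM list the main objects and the physical rules implied.
        rules = llm(f"List the main objects and the real-world physical rules "
                    f"relevant to this scene: {prompt}")
        # Step 2: generate a video, then describe what actually appears in it.
        caption = caption_video(generate_video(prompt))
        # Step 3: rewrite the prompt to close the gap between rules, prompt, and caption.
        prompt = llm(
            f"Prompt: {prompt}\nPhysical rules: {rules}\nVideo caption: {caption}\n"
            f"Rewrite the prompt so the generated video satisfies the rules and "
            f"fixes any mismatch between the prompt and the caption."
        )
    return prompt

# Toy usage with stubs standing in for real models.
refined = phyt2v_refine(
    "a ball dropped onto sand",
    llm=lambda q: q[-80:],                  # echo stub
    generate_video=lambda p: "video.mp4",   # pretend output path
    caption_video=lambda v: "a ball floats above the sand",
    max_rounds=1,
)
print(refined)
```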
Surpassing Suno: Mureka O1, the world's first CoT music model, is here!
AI科技大本营· 2025-03-26 10:20
The era in which everyone is a music creator has arrived!

Produced by AI科技大本营 (ID: rgznai100)

AI is permeating every industry. Not long ago, an AI-composed song went viral, climbing the hit-song charts within just a few days. AI is opening the door to music creation for music lovers. According to Fortune Business Insights, the global digital audio workstation (DAW) market reached roughly $3 billion in 2023, and by 2026 about 70% of DAW companies are expected to use AI to assist music creation.

The AI-musician MV 《Mureka》 premieres across platforms; singer: Mureka. The work is AI-generated: the music was produced by Mureka, and the video was generated with support from SkyReels.

Playing 《童年的夜晚》 (roughly, "Childhood Nights"): the melody is soft and pleasant, the vocals gentle and sincere with clear articulation, and the lyrics closely match the style of the prompt, with no trace of "AI feel" at all. Quite good.

After downloading the generated song, this editor found that it supports stem-separated downloads. An ordinary song download yields a single track, whereas Mureka outputs independent stems for the generated music, such as vocals, accompaniment, drums, and bass. For arrangers this is a boon for derivative work, making later remixing easy (a toy stem-mixing sketch follows this article).

What, you say prompt-based generation is too easy? Fine, let's raise the difficulty: click Advanced mode, and Mu ...
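To make the multi-track point concrete, here is a minimal sketch of recombining separated stems like those Mureka exports. It assumes the pydub library (with ffmpeg installed) and hypothetical file names; it is an illustration of working with stems, not part of Mureka's product.

```python
# Rebalance and recombine separately downloaded stems into one mix.
# Assumes: pip install pydub, ffmpeg available; file names are hypothetical.
from pydub import AudioSegment

vocals = AudioSegment.from_file("vocals.wav")
drums = AudioSegment.from_file("drums.wav")
bass = AudioSegment.from_file("bass.wav")

# Pull the vocal forward by 3 dB, soften the drums by 2 dB, then overlay the stems.
mix = (vocals + 3).overlay(drums - 2).overlay(bass)
mix.export("remix.wav", format="wav")
```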