Just In! Peking University Alumna Lilian Weng's Latest Blog Post: Why We Think
机器之心· 2025-05-18 04:25
Core Insights
- The article discusses advancements in utilizing "thinking time" during model inference, aiming to enhance the reasoning capabilities of AI models like GPT, Claude, and Gemini [2][3][16].

Group 1: Thinking Mechanisms
- The concept of "thinking time" is analogous to human cognitive processes, where complex problems require reflection and analysis before arriving at a solution [6].
- Daniel Kahneman's dual-process theory categorizes human thinking into fast (System 1) and slow (System 2) modes, emphasizing the importance of slower, more deliberate thought for accurate decision-making [12].

Group 2: Computational Resources
- In deep learning, neural networks can be characterized by the computational and storage resources they utilize during each forward pass, impacting their performance [8].
- The efficiency of models can be improved by allowing them to perform more computation during inference, particularly through strategies like Chain-of-Thought (CoT) prompting [8][18].

Group 3: Chain of Thought (CoT) and Learning Strategies
- CoT prompting significantly enhances the success rate of solving mathematical problems, with larger models benefiting more from extended "thinking time" [16].
- Early research focused on supervised learning from human-written reasoning paths, evolving into reinforcement learning strategies that improve CoT reasoning capabilities [14][41].

Group 4: Test-Time Computation Strategies
- Two main strategies for improving generation quality are parallel sampling and sequential revision, each with distinct advantages and challenges [19][20].
- Parallel sampling is straightforward but relies on the model's ability to generate correct answers in one go, while sequential revision allows for targeted corrections but is slower [20][21] (a minimal sketch of the parallel route follows after this list).

Group 5: Reinforcement Learning Applications
- Recent studies have successfully employed reinforcement learning to enhance reasoning capabilities in language models, particularly on STEM-related tasks [41][46].
- The training process often involves a cold-start phase followed by reasoning-oriented reinforcement learning, optimizing performance through structured feedback [42][43].

Group 6: External Tools and Integration
- Utilizing external tools, such as code interpreters or APIs, can enhance the reasoning process by offloading certain computational tasks [52][56].
- The ReAct method combines external operations with reasoning trajectories, allowing models to incorporate external knowledge into their inference paths [56][57].

Group 7: Model Interpretability and Trustworthiness
- The article highlights the importance of model interpretability, particularly through CoT, which allows for monitoring and understanding model behavior [59].
- However, there are concerns regarding the faithfulness of CoT outputs, as biases and errors can affect the reliability of the stated reasoning [62][64].

Group 8: Adaptive Computation and Token Utilization
- Adaptive computation time allows models to dynamically adjust the number of computation steps during inference, enhancing their reasoning capabilities [81].
- Introducing special tokens, such as thinking tokens, can provide additional processing time and improve model performance on complex tasks [85][89].
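The parallel-sampling route in Group 4 is easy to make concrete. Below is a minimal, hedged sketch of self-consistency: draw several chains of thought independently, extract each final answer, and majority-vote. The `sample_fn` callable, the `Answer:` output format, and the default of 8 samples are assumptions made for illustration, not anything specified in the blog post.

```python
import re
from collections import Counter
from typing import Callable

def self_consistency(prompt: str, sample_fn: Callable[[str], str], n: int = 8) -> str:
    """Draw n independent chain-of-thought samples and return the most common final answer."""
    answers = []
    for _ in range(n):
        completion = sample_fn(prompt)                    # one full CoT sample (temperature > 0)
        match = re.search(r"Answer:\s*(.+)", completion)  # assumed "Answer: ..." output format
        if match:
            answers.append(match.group(1).strip())
    if not answers:
        raise ValueError("no parsable answers among the samples")
    best_answer, _ = Counter(answers).most_common(1)[0]
    return best_answer
```

Sequential revision would instead loop a single draft back through the model with a critique prompt, trading extra latency for targeted corrections.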
Financial Times: How China Caught Up With Silicon Valley
Huan Qiu Wang Zi Xun· 2025-05-16 22:58
Group 1
- The article discusses how China is catching up to Silicon Valley, with predictions that by 2030 Chinese AI applications and electric vehicles will be in widespread use globally [1]
- American tech giants have acknowledged that China has taken the lead in various technology sectors, with notable advancements in AI and electric vehicle charging technology [1]
- Prominent figures in the tech industry, including former Google CEO Eric Schmidt and Nvidia CEO Jensen Huang, have stated that China is on par with or even ahead of the U.S. in technology [1]

Group 2
- A report by former Italian Prime Minister Mario Draghi concludes that U.S. production efficiency is primarily due to its technology sector, which was established over 20 years ago [2]
- The perception of China has shifted from being merely a production hub to a significant player in the technological future, with some investors now buying into Chinese tech [2]
Ambushing Cursor, Windsurf Rushes Out Its Own In-House Model! Performance on Par with Claude 3.5 at Lower Cost; Users Approve: Fast Responses, No Fluff
AI前线· 2025-05-16 15:39
Core Viewpoint
- Windsurf has launched its first AI software engineering model family, SWE-1, aimed at optimizing the entire software engineering process beyond just coding tasks [1][2][9].

Group 1: Model Details
- The SWE-1 series includes three models: SWE-1, SWE-1-lite, and SWE-1-mini, each designed for different functionalities and user needs [2][6][27].
- SWE-1 is comparable to Claude 3.5 Sonnet in reasoning ability but at a lower service cost, while SWE-1-lite replaces the previous Cascade Base model with improved quality [6][27].
- SWE-1-mini focuses on speed and is designed for passive prediction tasks, operating within tight latency constraints [6][27].

Group 2: Performance and Evaluation
- Windsurf claims that SWE-1's performance is close to leading models and superior to non-frontier and open-weight models, based on offline evaluations and production experiments [14][20][21].
- The offline evaluation involved benchmark tests comparing SWE-1 with models like Cascade and DeepSeek, focusing on usability, efficiency, and accuracy [15][18][20].
- Production experiments measured user engagement and model utility, with Claude as a benchmark for comparison [21][22][24].

Group 3: Development Philosophy
- Windsurf aims to speed up software development by 99%, recognizing that coding is only a small part of the software engineering process [9][10][12].
- The company emphasizes the need for models to handle tasks beyond coding, including accessing knowledge, testing software, and understanding user feedback [9][10].
- The development of SWE-1 is part of Windsurf's broader strategy to create a "software engineering" model that can automate more workflows and improve overall efficiency [12][30][33].

Group 4: Future Directions
- Windsurf is committed to continuous improvement and investment in the SWE model family, aiming to surpass the performance of leading research-lab models [27][33].
- The concept of "flow awareness" is central to the development of SWE-1, allowing seamless interaction between users and AI [29][30].
- The company believes that leveraging insights from user interactions will guide future enhancements and ensure the model meets user expectations [30][33].
Zhou Kaibing of the Hangzhou Venture Capital Association: Hangzhou's Sci-Tech Rise Is Inseparable from Two "Small but Important" Variables
As a key participant in and witness to the building of Hangzhou's science and technology innovation system, Zhou Kaibing long oversaw the management of Hangzhou's government guidance fund for venture capital. Since the 1990s he has repeatedly called on local governments, enterprises, and public institutions to increase investment in science and technology; in 2011 he argued for attention to exit-management mechanisms for venture investments; in 2015 he wrote an article proposing that Hangzhou build a "Silicon Valley-style" startup ecosystem. In April 2025, 21st Century Business Herald held an exclusive conversation with Zhou Kaibing in Hangzhou to hear his reflections on the evolution of the city's venture capital system.

Narrated by Zhou Kaibing, Vice President of the China Investment Development Promotion Association and rotating president of the Hangzhou Venture Capital Association
Interviewed and edited by Zhao Na, reporter, 21st Century Business Herald

For decades, whenever Silicon Valley comes up, people cite its startup culture: encouraging risk-taking, tolerating failure, putting people first. But is that alone enough?

The fact is, the world has yet to replicate a second Silicon Valley. Perhaps our understanding is still off, or we have overlooked some small but important factors.

In 2020 I proposed an innovation formula: Innovations = F(Culture, System, VC, ...)

Innovation is a function of multiple overlapping variables. The first is a culture that dares to take risks and tolerates failure; the second is the institutions and mechanisms of a market economy; the third is active capital that drives innovation and entrepreneurship. And of course there are other conditions as well: the startup ecosystem, the business environment, education, healthcare, and so on.

Once Hangzhou settled on this formula, what followed became a matter of "time's ...
Allianz Global Investors: Now May Be a Good Time to Capture the Steady-Return Potential of Income Funds
Zhi Tong Cai Jing· 2025-05-16 08:17
Core Insights
- The current market environment, characterized by significant volatility in the U.S. stock market and an uncertain interest rate outlook, presents a favorable opportunity for income funds to provide stable returns [1][2][4]

Group 1: Benefits of Income Funds
- Income funds focus on generating regular returns through investments in dividend-paying stocks, specific types of bonds, and alternative assets, which can help investors manage their daily financial needs amidst market fluctuations [2][3]
- Rising bond yields, particularly in low-interest-rate-risk instruments such as short-duration bonds and floating-rate notes, enhance the potential returns for income funds [3][4]
- Income funds typically invest in large, stable companies with consistent performance, in contrast to growth stocks, which exhibit higher volatility and lower dividend payouts [3][4]

Group 2: Current Market Conditions
- The U.S. stock market has experienced significant fluctuations, with technology stocks particularly affected, raising concerns about high valuations and potential inflation due to government policies [2][4]
- The anticipated prolonged high-interest-rate environment poses challenges for core bond holders, but floating-rate notes and other fixed-income instruments may be less affected [4][6]
- Diversification is crucial, as the balance between stocks and bonds will be essential for wealth protection and accumulation in the coming years [5][6]

Group 3: Suitability of Income Funds
- Income funds may not be suitable for all investors; those seeking aggressive returns or with longer investment horizons might prefer growth-oriented assets [6]
- For investors prioritizing stable returns and less exposure to price volatility, income funds are increasingly attractive in the current unpredictable market landscape [6]
Hu Zhongjiang, President of 疆亘资本: GPs Are Upgrading from "Financial Investors" to "Ecosystem Architects"
Sou Hu Cai Jing· 2025-05-16 06:41
Group 1
- The emergence of DeepSeek signifies a shift in local governments' understanding of "core competitiveness," moving from tax incentives to a new battleground focused on "data sovereignty" [3][6]
- The role of General Partners (GPs) is evolving from "financial investors" to "ecosystem architects," requiring enhanced data-analysis capabilities to help governments quantify data value and design compliant data-usage frameworks [3][6]
- The rise of DeepSeek is prompting deeper exploration of cooperation models among governments, enterprises, and investment institutions, moving away from traditional subsidy models toward new mechanisms based on value co-creation and risk-sharing [7]

Group 2
- DeepSeek's success represents a restructuring of productivity tools, using a 7-billion-parameter model to approach the effectiveness of 100-billion-parameter models and reducing deployment costs by 90% [4]
- The transformation in AI applications reveals that while less data can yield practical results, core technology still relies on foreign infrastructure, pushing investors to seek opportunities that allow AI to take root in industries [5]
- Investment focus is shifting toward AI platforms that enable enterprises to build applications independently and ensure sustainable revenue from data resources [5]

Group 3
- The return of cultural confidence in China is reshaping the economic value system, with traditional cultural symbols entering mainstream life through various mediums, marking a response to Western consumerism [8]
- Three evolving investment logics are emerging: a reconstruction of cultural valuation systems, a shift in the paradigm of technological empowerment, and an elevation of cultural consumption scenarios [8][9]
- The challenge lies in balancing cultural dignity with commercial efficiency, with sustainable cultural assets emerging from projects that maintain cultural purity while establishing modern value-exchange systems [9]

Group 4
- The Chinese primary market in 2025 is expected to present a complex landscape of "ice and fire," with both new opportunities and transitional challenges [10]
- Investment direction is shifting from broad trends to industry details, with specialized funds gaining an advantage over trend-followers [10]
- Exit strategies are being reshaped, with a move toward industrial mergers and acquisitions as traditional public listings become less reliable [10]

Group 5
- The international environment, particularly Sino-U.S. technology competition, is becoming a dominant variable, clearly dividing investment tracks into "safe zones" and "risk zones" [10]
- The biggest opportunities may lie in "curve innovation" areas, such as establishing Chinese-led IoT standards in smart home appliances, which could receive policy and funding support [10][11]
- The winners in 2025 are likely to be investors who understand technical details, are familiar with industry ecosystems, and can capture policy trends [11]
Before R2 Arrives, DeepSeek Drops Another Smokescreen
虎嗅APP· 2025-05-15 13:03
Core Viewpoint
- The article discusses DeepSeek's advancements in AI technology, particularly focusing on their V3 model and its cost-effective strategies for optimizing performance in the competitive AI landscape [2][4][6].

Group 1: DeepSeek V3 Model Innovations
- DeepSeek V3 utilizes Multi-head Latent Attention (MLA) to enhance memory efficiency, significantly reducing memory consumption when processing long texts and multi-turn dialogues [2][3].
- The model adopts a Mixture-of-Experts (MoE) architecture, allowing for efficient collaboration among specialized components, which improves computational efficiency and reduces wasted resources [3][4] (a toy routing sketch follows after this list).
- DeepSeek V3 incorporates FP8 mixed-precision training, which allows lower-precision calculations in less sensitive areas, resulting in faster training and reduced memory usage without sacrificing final model performance [3][4].

Group 2: Technical Optimizations
- The model's training setup features a multi-plane network topology that optimizes data-transfer paths within GPU clusters, enhancing overall training speed by minimizing congestion and bottlenecks [4].
- DeepSeek's approach emphasizes cost-effectiveness and hardware-software synergy, suggesting that even without top-tier hardware, significant advancements can be achieved through engineering optimization and algorithmic innovation [4][6].

Group 3: Market Context and Implications
- The article highlights the competitive landscape of AI, where leading firms are engaged in intense competition over model parameters and application ecosystems, while also facing rising computational costs and unclear commercialization paths [6][7].
- DeepSeek's recent developments signal a shift toward efficiency and targeted value creation, indicating that the ability to leverage existing resources and address real-world needs will be crucial for success in the evolving AI market [6][7].
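To make the MoE idea in Group 1 concrete, here is a toy sketch of top-k expert routing: a small gate scores all experts, only the best k are actually run, and their outputs are mixed with the gate's softmax weights. The dimensions, the expert count, and the plain linear "experts" are placeholders for illustration and do not reflect DeepSeek V3's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

gate_w = rng.normal(size=(d_model, n_experts))                              # router ("gate") weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]   # toy linear experts

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]                          # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
out = moe_forward(token)  # only 2 of the 8 experts do any work for this token
```

The saving comes from the fact that, per token, only a fraction of the parameters participate in the forward pass, which is the property the article credits for V3's efficiency.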
Liang Wenfeng Co-Authors Retrospective Paper: DeepSeek Reveals for the First Time the Scaling Approach Behind V3
news flash· 2025-05-15 10:57
DeepSeek has just published a retrospective paper titled "Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures," with Liang Wenfeng among the authors. The paper offers an in-depth analysis of the latest large model, DeepSeek-V3, and the scaling of its AI infrastructure; the DeepSeek-V3 practice demonstrates the great potential of hardware-software co-design for improving the scalability, efficiency, and robustness of AI systems. (AI寒武纪) ...
Before R2 Arrives, DeepSeek Drops Another Smokescreen
Hu Xiu· 2025-05-15 10:52
Core Insights
- DeepSeek has been actively preparing for the release of its anticipated R2 model, with recent developments serving as a precursor to its launch [1][7]
- The company's recent V3 paper highlights its cost-reduction strategies, showcasing its technical capabilities and addressing industry pain points related to high computational costs [2][6]

Cost-Reduction Strategies
- DeepSeek V3 optimizes its "memory system" through Multi-head Latent Attention (MLA), significantly reducing memory consumption when processing long texts and dialogues [2][3]
- The company utilizes a Mixture-of-Experts (MoE) architecture, allowing efficient task delegation among specialized sub-models, enhancing computational efficiency and resource management [3][4]
- By adopting FP8 mixed precision, DeepSeek reduces computational load and memory usage without compromising model performance, demonstrating that lower precision can be sufficient in many training scenarios [3][4] (a simplified quantization sketch follows after this list)

Technical Innovations
- The implementation of a multi-plane network topology enhances data-exchange efficiency among GPU clusters, improving overall training speed [4]
- DeepSeek's recent advancements signal a shift toward maximizing existing hardware capabilities through engineering optimizations and algorithmic innovations, making high-performance models accessible without top-tier hardware [4][6]

Market Context
- The backdrop of rising computational costs and unclear commercialization paths in the AI industry emphasizes the importance of efficiency and targeted value creation, as highlighted by DeepSeek's recent initiatives [6][7]
- The competitive landscape is characterized by rapid technological iteration among leading firms, with DeepSeek positioning itself as a player focused on practical applications and resource optimization [6][7]

Anticipation for Future Developments
- The market is eagerly awaiting not just the performance of the upcoming R2 model, but also the innovative approaches and insights that DeepSeek may bring to the industry [7]
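As a rough intuition for the mixed-precision point above, the sketch below stores a tensor in an 8-bit representation with one per-tensor scale and dequantizes it back where higher precision matters. NumPy has no FP8 dtype, so a signed 8-bit integer grid stands in for it here; DeepSeek's actual FP8 recipe (dedicated FP8 formats, fine-grained scaling) is considerably more involved, so treat this as an analogy only.

```python
import numpy as np

def quantize_8bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float32 tensor onto 255 signed levels, returning the int8 codes and the scale."""
    scale = float(max(np.abs(x).max(), 1e-8)) / 127.0
    codes = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_8bit(codes: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the 8-bit codes."""
    return codes.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
codes, scale = quantize_8bit(w)          # 1 byte per weight instead of 4
w_hat = dequantize_8bit(codes, scale)    # used where full precision is not critical
print(np.abs(w - w_hat).max())           # reconstruction error stays small
```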
ICML 2025 | A New Paradigm for Deep Thinking in Large Models: Alternating "Reasoning-Erasing" Solves All Computable Problems
机器之心· 2025-05-15 06:04
About the authors: The first author is Chenxiao Yang, a PhD student at the Toyota Technological Institute at Chicago, whose research interests are machine learning theory and large-model reasoning; he has published at top conferences including ICML, NeurIPS, and ICLR.

This paper proposes PENCIL, a new deep-thinking paradigm that alternates between "reasoning" and "erasing," solving more complex reasoning tasks more efficiently than traditional CoT. Theoretically, we prove that PENCIL can solve all computable problems in optimal space and optimal time, which is impossible for traditional CoT. The work has been accepted to ICML 2025, a top machine learning conference.

Recent large models (such as OpenAI's o1/o3 and DeepSeek's R1) have found that deep thinking at test time (test-time scaling) can greatly improve reasoning ability. The key to deep thinking today is the long chain-of-thought (long CoT): letting the model generate longer intermediate results before reaching the final answer. However, the traditional "write-only, never-erase" approach runs into the following bottlenecks on hard, large-scale tasks:

In practice, though, not every intermediate thought is useful for later reasoning. In theorem proving, once a lemma has been verified, its detailed derivation can be discarded; when solving a math problem, once a line of attack is known to fail, there is no need to keep the details of that "attempt." Looking across computer ...
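To illustrate only the erasing half of the "reasoning-erasing" idea, here is a toy sketch: once a sub-derivation finishes, everything between its opening marker and its returned conclusion is dropped from the live trace, so the context the model must keep re-reading stays short. The `[CALL]`/`[RETURN]` markers and the reduction function are simplified stand-ins invented for this sketch; the paper defines its own reduction rule and trains the model to emit the erasure signals itself.

```python
# Toy illustration of "reason, then erase": collapse a finished sub-derivation
# down to its conclusion so the live context stays short.
CALL, RETURN = "[CALL]", "[RETURN]"

def reduce_trace(trace: list[str]) -> list[str]:
    """Collapse the most recent CALL ... RETURN <result> span down to its result."""
    if RETURN not in trace:
        return trace
    r = len(trace) - 1 - trace[::-1].index(RETURN)          # position of the last RETURN marker
    c = len(trace[:r]) - 1 - trace[:r][::-1].index(CALL)    # matching CALL before it
    result = trace[r + 1]                                   # the single entry after RETURN
    return trace[:c] + [result] + trace[r + 2:]

# One "thinking" episode: derive a lemma, then keep only its statement.
trace = ["goal: prove P", CALL, "try lemma L", "step 1", "step 2", RETURN, "lemma L holds"]
trace = reduce_trace(trace)
print(trace)  # ['goal: prove P', 'lemma L holds'] -- the derivation has been erased
```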