2025 Sohu Tech Annual Forum Held in Beijing
Zhong Zheng Wang· 2025-05-18 09:25
Zheng Weimin, professor of computer science at Tsinghua University and academician of the Chinese Academy of Engineering, said in a speech on AI large-model infrastructure and application exploration that AI development in 2025 shows two characteristics: first, multimodality; second, application to industries closely tied to GDP, where China holds a notable advantage in driving AI adoption. He further explained that the life cycle of a large AI model comprises five stages: data acquisition, preprocessing, model training, fine-tuning, and inference. The first three stages require massive compute and storage resources and are typically handled by large technology companies such as Alibaba, Huawei, and DeepSeek; most organizations only need to fine-tune an existing base model for their domain and then run inference on it.

At the "Asking the Way of Intelligence" roundtable, experts held an in-depth discussion of machine cognition and the future of humanoid robots. The panelists agreed that AI is not a replacement for humans but an extension of human cognition and capability. Zhang Yaqin, dean of the Institute for AI Industry Research at Tsinghua University and foreign academician of the Chinese Academy of Engineering, said that in cognition humans remain dominant, with machines and robots still in a subordinate role.

China Securities Journal, cs.com.cn (reporter Wang Jinghan): On May 17, the 2025 Sohu Tech Annual Forum was held in Beijing, where experts and industry figures discussed breakthroughs in basic science, the industrialization of technological revolutions, and the co-evolution of AI and human civilization. Sohu founder Zhang Chaoyang said that since 2024 AI development has entered the fast lane and embodied intelligence is flourishing. Alongside the excitement of technological progress ...
New Thinking on Models from Lilian Weng, Peking University Alumna and Former OpenAI VP of Safety: Why We Think
Founder Park· 2025-05-18 07:06
Core Insights
- The article discusses recent advancements in utilizing "thinking time" during testing and its mechanisms, aiming to enhance model performance in complex cognitive tasks such as logical reasoning, long text comprehension, mathematical problem-solving, and code generation and debugging [4][5].

Group 1: Motivating Models to Think
- The core idea is closely related to human thinking processes, where complex problems require time for reflection and analysis [9].
- Daniel Kahneman's dual process theory categorizes human thinking into two systems: fast thinking, which is quick and intuitive, and slow thinking, which is deliberate and logical [9][13].
- In deep learning, neural networks can be characterized by the computational and storage resources they utilize during each forward pass, suggesting that optimizing these resources can improve model performance [10].

Group 2: Thinking in Tokens
- The strategy of generating intermediate reasoning steps before producing final answers has evolved into a standard method, particularly in mathematical problem-solving [12].
- The introduction of the "scratchpad" concept allows models to treat generated intermediate tokens as temporary content for reasoning processes, leading to the term "chain of thought" (CoT) [12].

Group 3: Enhancing Reasoning Capabilities
- CoT prompting significantly improves success rates in solving mathematical problems, with larger models benefiting more from increased "thinking time" [16].
- Two main strategies to enhance generation quality are parallel sampling and sequential revision, each with its own advantages and challenges [18][19].

Group 4: Self-Correction and Reinforcement Learning
- Recent research has successfully utilized reinforcement learning (RL) to enhance language models' reasoning capabilities, particularly in STEM-related tasks [31].
- The DeepSeek-R1 model, designed for high-complexity tasks, employs a two-stage training process combining supervised fine-tuning and reinforcement learning [32].

Group 5: External Tools and Enhanced Reasoning
- The use of external tools, such as code interpreters, can efficiently solve intermediate steps in reasoning processes, expanding the capabilities of language models [45].
- The ReAct method integrates external operations with reasoning trajectories, allowing models to incorporate external knowledge into their reasoning paths [48][50].

Group 6: Monitoring and Trustworthiness of Reasoning
- Monitoring CoT can effectively detect inappropriate behaviors in reasoning models, such as reward hacking, and enhance robustness against adversarial inputs [51][53].
- The article highlights the importance of ensuring that models faithfully express their reasoning processes, as biases can arise from training data or human-written examples [55][64].
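The two test-time strategies the summary contrasts, parallel sampling versus sequential revision, can be sketched as follows. This is a minimal toy illustration, not any model's actual implementation: `sample_answer` and `revise` are hypothetical stand-ins for calls to a real language model, with deterministic stub behavior so the contrast is visible.

```python
# Toy contrast of two test-time computation strategies:
# - parallel sampling: draw many independent answers, take the majority vote
#   (self-consistency style);
# - sequential revision: produce one draft, then critique and rewrite it.
# The "model" here is a deterministic stub, not a real LLM.
from collections import Counter

def sample_answer(question: str, seed: int) -> str:
    # Stand-in for one stochastic chain-of-thought rollout: most seeds
    # yield the correct "4", a few yield "5", simulating sampling noise.
    return "5" if seed % 5 == 0 else "4"

def parallel_sampling(question: str, n: int = 16) -> str:
    """Draw n independent samples and return the majority answer."""
    votes = Counter(sample_answer(question, seed=i) for i in range(n))
    return votes.most_common(1)[0][0]

def revise(question: str, draft: str) -> str:
    # Stand-in for a critique-and-rewrite step; a real system would ask
    # the model to inspect the draft and emit a corrected one.
    return "4"

def sequential_revision(question: str, steps: int = 3) -> str:
    """Start from a single (possibly wrong) draft and refine it repeatedly."""
    answer = sample_answer(question, seed=0)  # seed 0 gives the noisy "5"
    for _ in range(steps):
        answer = revise(question, answer)
    return answer

print(parallel_sampling("What is 2 + 2?"))    # majority vote over 16 rollouts
print(sequential_revision("What is 2 + 2?"))  # one draft, iteratively corrected
```

Parallel sampling is embarrassingly parallel but wastes compute on redundant rollouts; sequential revision can target specific mistakes but adds latency, matching the trade-off described in the summary.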
AI Weekly | Agent platform Manus opens registration; new DeepSeek paper lists Liang Wenfeng as an author
Di Yi Cai Jing· 2025-05-18 06:47
Group 1
- DeepSeek-V3 addresses "hardware bottlenecks" through four innovative technologies: memory optimization, computation optimization, communication optimization, and inference acceleration [1]
- Manus AI platform has opened registration, offering users free points and various subscription plans, indicating growing interest and potential for investment [1]
- Nvidia has secured a significant chip supply agreement with Saudi Arabia's AI company Humain, providing 18,000 GB300 chips for a data center with a capacity of up to 500 megawatts [2]

Group 2
- DeepSeek released a new paper detailing cost-reduction methods for the V3 model, emphasizing its ability to achieve large-scale training effects with only 2048 H800 chips [3]
- Zhang Yaqin predicts that general artificial intelligence will take 15 to 20 years to achieve, highlighting the challenges in information, physical, and biological intelligence [4]
- OpenAI is considering building a new data center in the UAE, which could significantly expand its operations in the Middle East [5][6]

Group 3
- The US and UAE are collaborating to build the largest AI park in the Middle East, featuring a 5-gigawatt data center, showcasing the region's commitment to becoming an AI hub [7]
- OpenAI launched a new AI programming assistant called Codex, aimed at simplifying software development processes, indicating a growing interest in generative AI tools [8]
- Baidu has launched DeepSearch, a deep search engine based on a vast content library, marking a significant advancement in search technology [9]

Group 4
- Google announced the establishment of the "AI Future Fund" to support AI startups, aiming to discover the next OpenAI and accelerate innovation in the field [10]
- INAIR unveiled an AI spatial computer, set to launch in June, which combines AR glasses, a computing center, and a 3D keyboard, indicating advancements in AR technology [12]
- Perplexity AI is in late-stage negotiations for a $500 million funding round at a $14 billion valuation, reflecting the company's growth amid the AI boom [13]

Group 5
- Tencent reported a 91% year-on-year increase in capital expenditure in Q1 2025, primarily to support AI-related business development [14]
- Tencent's president stated that the company has sufficient high-end chips to train future models, addressing the high demand for GPU resources in AI applications [15]
Just In! The Latest Blog from Peking University Alumna Lilian Weng: Why We Think
Ji Qi Zhi Xin (Machine Heart)· 2025-05-18 04:25
Core Insights
- The article discusses advancements in utilizing "thinking time" during model inference, aiming to enhance the reasoning capabilities of AI models like GPT, Claude, and Gemini [2][3][16].

Group 1: Thinking Mechanisms
- The concept of "thinking time" is analogous to human cognitive processes, where complex problems require reflection and analysis before arriving at a solution [6].
- Daniel Kahneman's dual process theory categorizes human thinking into fast (System 1) and slow (System 2) modes, emphasizing the importance of slower, more deliberate thought for accurate decision-making [12].

Group 2: Computational Resources
- In deep learning, neural networks can be characterized by the computational and storage resources they utilize during each forward pass, impacting their performance [8].
- The efficiency of models can be improved by allowing them to perform more computations during inference, particularly through strategies like Chain of Thought (CoT) prompting [8][18].

Group 3: Chain of Thought (CoT) and Learning Strategies
- CoT prompting significantly enhances the success rate of solving mathematical problems, with larger models benefiting more from extended "thinking time" [16].
- Early research focused on supervised learning from human-written reasoning paths, evolving into reinforcement learning strategies that improve CoT reasoning capabilities [14][41].

Group 4: Test-Time Computation Strategies
- Two main strategies for improving generation quality are parallel sampling and sequential revision, each with distinct advantages and challenges [19][20].
- Parallel sampling is straightforward but relies on the model's ability to generate correct answers in one go, while sequential revision allows for targeted corrections but is slower [20][21].

Group 5: Reinforcement Learning Applications
- Recent studies have successfully employed reinforcement learning to enhance reasoning capabilities in language models, particularly in STEM-related tasks [41][46].
- The training process often involves a cold-start phase followed by reasoning-oriented reinforcement learning, optimizing performance through structured feedback [42][43].

Group 6: External Tools and Integration
- Utilizing external tools, such as code interpreters or APIs, can enhance the reasoning process by offloading certain computational tasks [52][56].
- The ReAct method combines external operations with reasoning trajectories, allowing models to incorporate external knowledge into their inference paths [56][57].

Group 7: Model Interpretability and Trustworthiness
- The article highlights the importance of model interpretability, particularly through CoT, which allows for monitoring and understanding model behavior [59].
- However, there are concerns regarding the fidelity of CoT outputs, as biases and errors can affect the reliability of the reasoning process [62][64].

Group 8: Adaptive Computation and Token Utilization
- Adaptive computation time allows models to dynamically adjust the number of computation steps during inference, enhancing their reasoning capabilities [81].
- Introducing special tokens, such as thinking tokens, can provide additional processing time and improve model performance on complex tasks [85][89].
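The ReAct pattern mentioned above can be sketched as a loop that alternates free-form "Thought" steps with "Action" steps calling external tools, feeding each tool's "Observation" back into the context. Everything below is an illustrative stub: the scripted model turns and the `calculator` tool are hypothetical stand-ins, not the method's actual implementation.

```python
# Minimal ReAct-style loop: Thought -> Action -> Observation -> ... -> Answer.
# A real agent would generate each turn with a language model; here the
# turns are scripted so the control flow is easy to follow.

def calculator(expression: str) -> str:
    """External tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

# Scripted stand-in for the model's output at each turn.
SCRIPT = [
    ("thought", "I need to compute 17 * 24 before answering."),
    ("action", ("calculator", "17 * 24")),
    ("thought", "The product is in the observation; I can answer now."),
    ("answer", "17 * 24 = 408"),
]

def react_loop():
    """Run the scripted turns, returning the final answer and the full trace."""
    context = []
    for kind, payload in SCRIPT:
        if kind == "thought":
            context.append(f"Thought: {payload}")
        elif kind == "action":
            tool, arg = payload
            observation = TOOLS[tool](arg)       # external knowledge enters here
            context.append(f"Action: {tool}({arg!r})")
            context.append(f"Observation: {observation}")
        elif kind == "answer":
            context.append(f"Answer: {payload}")
            return payload, context

final, trace = react_loop()
print(final)
```

The key design point is that observations are appended to the same context the model reasons over, so external results become part of the inference path rather than a post-hoc lookup.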
UK's Financial Times: How China Caught Up with Silicon Valley
Huan Qiu Wang Zi Xun· 2025-05-16 22:58
Last year, Silicon Valley's futurists formed an unstable alliance with a nostalgic politician who champions fossil fuels, as the US president-elect promised deregulation to America's tech giants. But on the very day they attended his inauguration, Chinese startup DeepSeek released an AI model whose performance appeared to match its American rivals, only cheaper and more energy-efficient. Soon after, a Chinese company unveiled the world's fastest electric-vehicle charging technology, and Huawei began selling foreigners a phone that rivals Apple's latest model ...

Source: Global Times

Former Google CEO Eric Schmidt wrote: "(Today) China matches or even leads the US across a range of technologies." Nvidia CEO Jensen Huang likewise believes China is "not behind" in artificial intelligence. Palmer Luckey, founder of defense technology company Anduril, said China's shipbuilding capacity is 350 times that of the US. Uber co-founder Travis Kalanick said that to see the future of online food delivery, "you shouldn't go to New York, you should go to Shanghai." An American marveled at a robot delivering dumplings to a hotel room in China ... Nick Denton, a tech entrepreneur turned investor, said: "Like it or not, they are the strongest advocates of the 'China wins' thesis." Rana Mitter, a professor at Harvard's Kennedy School of Government who studies US-Asia relations, said that while China's cities are not the only possible future, and China's countryside is certainly not the only possible future, "in some way the impression now ...
Blindsiding Cursor, Windsurf Rushes Out Its Own In-House Model! Performance on Par with Claude 3.5 at Lower Cost; Users Approve: Fast Responses, No Fluff
AI Qian Xian (AI Frontline)· 2025-05-16 15:39
Core Viewpoint
- Windsurf has launched its first AI software engineering model family, SWE-1, aimed at optimizing the entire software engineering process beyond just coding tasks [1][2][9].

Group 1: Model Details
- The SWE-1 series includes three specific models: SWE-1, SWE-1-lite, and SWE-1-mini, each designed for different functionalities and user needs [2][6][27].
- SWE-1 is comparable to Claude 3.5 Sonnet in reasoning ability but at a lower service cost, while SWE-1-lite replaces the previous Cascade Base model with improved quality [6][27].
- SWE-1-mini focuses on speed and is designed for passive prediction tasks, operating within latency constraints [6][27].

Group 2: Performance and Evaluation
- Windsurf claims that SWE-1's performance is close to leading models and superior to non-leading and open-weight models, based on offline evaluations and production experiments [14][20][21].
- The offline evaluation involved benchmark tests comparing SWE-1 with models like Cascade and DeepSeek, focusing on usability, efficiency, and accuracy [15][18][20].
- Production experiments measured user engagement and model utility, with Claude as a benchmark for comparison [21][22][24].

Group 3: Development Philosophy
- Windsurf aims to enhance software development speed by 99%, recognizing that coding is only a small part of the software engineering process [9][10][12].
- The company emphasizes the need for models to handle various tasks beyond coding, including accessing knowledge, testing software, and understanding user feedback [9][10].
- The development of SWE-1 is part of Windsurf's broader strategy to create a "software engineering" model that can automate more workflows and improve overall efficiency [12][30][33].

Group 4: Future Directions
- Windsurf is committed to continuous improvement and investment in the SWE model family, aiming to surpass the performance of leading research lab models [27][33].
- The concept of "flow awareness" is central to the development of SWE-1, allowing seamless interaction between users and AI [29][30].
- The company believes that leveraging insights from user interactions will guide future enhancements and ensure the model meets user expectations [30][33].
Zhou Kaibing of the Hangzhou Venture Capital Association: The Rise of Hangzhou's Tech Innovation Owes Much to Two "Small but Important" Variables
21 Shi Ji Jing Ji Bao Dao· 2025-05-16 13:02
As a key participant in and witness to the construction of Hangzhou's technology innovation system, Zhou Kaibing long managed Hangzhou's government guidance fund for venture capital. Since the 1990s he has continually called on local finance departments, enterprises, and public institutions to increase investment in science and technology; in 2011 he proposed paying attention to exit-management mechanisms for venture capital projects; in 2015 he wrote advocating that Hangzhou build a "Silicon Valley-style" entrepreneurial ecosystem. In April 2025, 21st Century Business Herald spoke exclusively with Zhou Kaibing in Hangzhou, hearing his experience and reflections on the evolution of Hangzhou's venture capital system.

Narrated by: Zhou Kaibing, vice chairman of the China Investment Development Promotion Association and rotating chairman of the Hangzhou Venture Capital Association
Interviewed and edited by: Zhao Na, reporter, 21st Century Business Herald

Over the past few decades, whenever Silicon Valley comes up, people cite its culture of encouraging risk-taking, tolerating failure, and putting people first. But is that alone enough? The fact is, the world has yet to replicate a second Silicon Valley. Perhaps our understanding is still off, or we have overlooked some small but important factors.

In 2020 I proposed an innovation formula: Innovations = F(Culture, System, VC, ...)

Innovation is a function formed by the superposition of multiple variables. The first is a culture that dares to take risks and tolerates failure; the second is the institutions and mechanisms of a market economy; the third is active capital driving innovation and entrepreneurship. There are also, of course, other conditions: the entrepreneurial ecosystem, the business environment, education, healthcare, and so on. Once Hangzhou settled on this formula, what followed became a matter of "time's ...
Allianz Global Investors: Now May Be a Good Time to Capture the Steady Potential of Income Funds
Zhi Tong Cai Jing· 2025-05-16 08:17
Core Insights
- The current market environment, characterized by significant volatility in the U.S. stock market and uncertain interest rate outlook, presents a favorable opportunity for income funds to provide stable returns [1][2][4]

Group 1: Benefits of Income Funds
- Income funds focus on generating regular returns through investments in dividend-paying stocks, specific types of bonds, and alternative assets, which can help investors manage their daily financial needs amidst market fluctuations [2][3]
- The rising bond yields, particularly in low-interest-rate risk bonds like short-duration bonds and floating-rate notes, enhance the potential returns for income funds [3][4]
- Income funds typically invest in large, stable companies with consistent performance, contrasting with growth stocks that exhibit higher volatility and lower dividend payouts [3][4]

Group 2: Current Market Conditions
- The U.S. stock market has experienced significant fluctuations, with technology stocks particularly affected, raising concerns about high valuations and potential inflation due to government policies [2][4]
- The anticipated long-term high-interest rate environment poses challenges for core bond holders, but floating-rate notes and other fixed-income instruments may be less impacted [4][6]
- Diversification is crucial, as the balance between stocks and bonds will be essential for wealth protection and accumulation in the coming years [5][6]

Group 3: Suitability of Income Funds
- Income funds may not be suitable for all investors; those seeking aggressive returns or longer investment horizons might prefer growth-oriented assets [6]
- For investors prioritizing stable returns and less exposure to price volatility, income funds are increasingly attractive in the current unpredictable market landscape [6]
Hu Zhongjiang, President of Jianggen Capital: GPs Are Upgrading from "Financial Backers" to "Ecosystem Architects"
Sou Hu Cai Jing· 2025-05-16 06:41
Group 1
- The emergence of DeepSeek signifies a shift in local governments' understanding of "core competitiveness," moving from tax incentives to a new battleground focused on "data sovereignty" [3][6]
- The role of General Partners (GPs) is evolving from "financial investors" to "ecosystem architects," requiring enhanced data analysis capabilities to help governments quantify data value and design compliant data usage frameworks [3][6]
- The rise of DeepSeek is prompting deeper exploration of cooperation models among governments, enterprises, and investment institutions, moving away from traditional subsidy models to new mechanisms based on value co-creation and risk-sharing [7]

Group 2
- DeepSeek's success represents a restructuring of productivity tools, utilizing a model with 7 billion parameters to achieve the effectiveness of 100 billion parameter models, reducing deployment costs by 90% [4]
- The transformation in AI applications reveals that while less data can yield practical results, core technology still relies on foreign infrastructure, pushing investors to seek opportunities that allow AI to take root in industries [5]
- The investment focus is shifting towards AI platforms that enable enterprises to build applications independently and ensure sustainable data resource revenue [5]

Group 3
- The return of cultural confidence in China is reshaping the economic value system, with traditional cultural symbols entering mainstream life through various mediums, marking a response to Western consumerism [8]
- Three evolving investment logics are emerging: a reconstruction of cultural valuation systems, a shift in the paradigm of technological empowerment, and an elevation of cultural consumption scenarios [8][9]
- The challenge lies in balancing cultural dignity with commercial efficiency, with sustainable cultural assets emerging from projects that maintain cultural purity while establishing modern value exchange systems [9]

Group 4
- The Chinese primary market in 2025 is expected to present a complex landscape of "ice and fire," with both new opportunities and transitional challenges [10]
- Investment direction is shifting from broad trends to a focus on industry details, with specialized funds gaining an advantage over those following trends [10]
- The exit strategies for investments are being reshaped, with a move towards industrial mergers and acquisitions as traditional public listings become less reliable [10]

Group 5
- The international environment, particularly the Sino-U.S. technology competition, is becoming a dominant variable, clearly dividing investment tracks into "safe zones" and "risk zones" [10]
- The biggest opportunities may lie in "curve innovation" areas, such as establishing Chinese-led IoT standards in smart home appliances, which could receive policy and funding support [10][11]
- The winners in 2025 are likely to be investors who understand technical details, are familiar with industry ecosystems, and can capture policy trends [11]
Before R2 Arrives, DeepSeek Drops Another Smokescreen
Hu Xiu APP· 2025-05-15 13:03
Core Viewpoint
- The article discusses DeepSeek's advancements in AI technology, particularly focusing on their V3 model and its cost-effective strategies for optimizing performance in the competitive AI landscape [2][4][6].

Group 1: DeepSeek V3 Model Innovations
- DeepSeek V3 utilizes Multi-head Latent Attention (MLA) to enhance memory efficiency, significantly reducing memory consumption while processing long texts and multi-turn dialogues [2][3].
- The model adopts a "Mixture of Experts" (MoE) architecture, allowing for efficient collaboration among specialized components, which improves computational efficiency and reduces resource wastage [3][4].
- DeepSeek V3 incorporates FP8 mixed precision training, which allows for lower precision calculations in less sensitive areas, resulting in faster training speeds and reduced memory usage without sacrificing final model performance [3][4].

Group 2: Technical Optimizations
- The model features a "multi-plane network topology" that optimizes data transfer paths within GPU clusters, enhancing overall training speed by minimizing congestion and bottlenecks [4].
- DeepSeek's approach emphasizes the importance of cost-effectiveness and hardware-software synergy, suggesting that even without top-tier hardware, significant advancements can be achieved through engineering optimization and algorithm innovation [4][6].

Group 3: Market Context and Implications
- The article highlights the competitive landscape of AI, where leading firms are engaged in intense competition over model parameters and application ecosystems, while also facing rising computational costs and unclear commercialization paths [6][7].
- DeepSeek's recent developments signal a shift towards efficiency and targeted value creation, indicating that the ability to leverage existing resources and address real-world needs will be crucial for success in the evolving AI market [6][7].
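As a rough illustration of the Mixture-of-Experts routing idea described above — a toy sketch only, with arbitrary dimensions, and not DeepSeek V3's actual architecture — a gating network scores every expert per token, and only the top-k experts run, so most of the network's parameters stay idle for any given token:

```python
# Toy top-k Mixture-of-Experts routing in NumPy: a gate scores each expert
# for a token, the k highest-scoring experts are evaluated, and their
# outputs are combined weighted by softmax-renormalized gate scores.
# All sizes (d_model=8, n_experts=4, k=2) are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 8, 4, 2

# Each "expert" is just a linear layer here.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts."""
    scores = x @ gate_w                    # (n_experts,) gating logits
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()               # softmax over the selected k only
    # Only the selected experts perform any computation; the rest are skipped,
    # which is where MoE's compute savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)
```

The design point is that capacity (total parameters) scales with the number of experts while per-token compute scales only with k, which is how MoE models keep inference cost low relative to their size.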