Large Language Model
X @The Economist
The Economist· 2025-09-18 15:30
The Emiratis’ carefully calibrated large language model https://t.co/KYKJ4kgPct ...
DeepSeek-R1 lands on the cover of Nature: a welcome step toward AI transparency
36Kr· 2025-09-18 02:02
Core Insights
- The value of open-source artificial intelligence (AI) is gaining broader recognition, highlighted by the publication of the DeepSeek-R1 paper in the prestigious journal Nature, with founder Liang Wenfeng as the corresponding author [1][5]
Research Findings
- The research team hypothesized that human-defined reasoning patterns might limit model exploration, and that unrestricted reinforcement learning (RL) training could better stimulate the emergence of new reasoning capabilities in large language models (LLMs) [3][8]
- Experiments demonstrated that the reasoning ability of LLMs can be enhanced through pure RL, reducing the need for human input; the resulting models outperformed traditionally trained LLMs on tasks such as mathematics, programming competitions, and graduate-level STEM problems [3][9]
Model Evaluation
- Following its launch, DeepSeek-R1 received widespread acclaim from global developers, reaching 91.1k stars on GitHub [4]
- Nature's editorial recognized DeepSeek-R1 as the first mainstream LLM published after peer review, marking a significant step towards transparency in AI [5][17]
- The editorial emphasized the importance of peer-reviewed publications in clarifying how LLMs operate and in assessing their authenticity [6][17]
Methodology
- The research introduced a new paradigm within the RL framework, minimizing reliance on human-annotated reasoning processes and exploring the potential for LLMs to develop reasoning capabilities through self-evolution [9][10]
- The team proposed an RL algorithm called Group Relative Policy Optimization (GRPO) and trained a series of models, including DeepSeek-R1-Zero and DeepSeek-R1, on the foundational model DeepSeek-V3 Base [10][12]
Training Phases
- The training process involved multiple stages, with each subsequent model improving upon the previous one in reasoning and instruction-following capabilities [14]
- DeepSeek-R1 demonstrated strong reasoning abilities aligned with human preferences, achieving superior performance across 21 mainstream benchmarks and validating the effectiveness of the RL framework [15][16]
Industry Implications
- The editorial raised concerns about the lack of independent peer review for many widely used LLMs, highlighting the need for transparency and accountability in the AI industry [17][18]
- Nature called for more AI companies to submit their models for publication review, emphasizing that peer review can enhance trust and credibility in AI research [18][19]
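The core idea behind GRPO is that it dispenses with a separate learned value (critic) model: for each prompt, a group of responses is sampled, and each response's advantage is its reward normalized against the group's own mean and standard deviation. A minimal sketch of that advantage computation, with hypothetical values and a rule-based 0/1 reward, not DeepSeek's actual implementation:

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative values).
# For one prompt, G responses are sampled; each response's advantage is its
# reward normalized against the group's own statistics, so no critic model
# is needed.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean/std of its own group."""
    g = len(rewards)
    mean = sum(rewards) / g
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled answers to one math prompt, scored by a rule-based
# verifier (1.0 = correct final answer, 0.0 = incorrect).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with positive advantage and incorrect ones negative, which is what the policy-gradient update then pushes toward.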
DeepSeek-R1 makes history: Liang Wenfeng's paper lands on the cover of Nature
Yicai· 2025-09-17 23:09
The DeepSeek-R1 reasoning-model research paper, completed jointly by the DeepSeek team with Liang Wenfeng as corresponding author, has appeared on the cover of the authoritative international journal Nature. Compared with the initial DeepSeek-R1 paper released in January this year, this version discloses more details of the model's training and directly responds to the distillation allegations raised when the model first launched. DeepSeek-R1 is also the world's first mainstream large language model to undergo peer review. As Nature put it, almost none of today's mainstream large models have been independently peer reviewed, a gap that has "finally been broken by DeepSeek." ...
X @The Economist
The Economist· 2025-09-17 18:01
We analysed each speech using OpenAI’s large language model, requesting that it assess how controversial King Charles’s remarks had been in the past three decades. This is what the results showed https://t.co/vCu2vKkDdu ...
100 rounds of tool calls: even an 8B model can handle complex long-horizon search! The latest open-source release from MiniMax & HKUST
QbitAI· 2025-09-12 08:46
Bu Yuan, from Aofei Temple. QbitAI | WeChat official account QbitAI. Your web-search agent underperforms, you pour in piles of data, and it performs the same as before — what is going on? The HKUST & MiniMax team pinpoints the core problem: it is not that the model has too few parameters, but that sufficiently challenging training data is lacking. In other words, stop rote memorization and practice on some "real exam questions." They propose WebExplorer, a method for constructing high-quality QA pairs. Training on datasets built with this method lets even smaller models surpass larger ones on complex, long-horizon search tasks. The trained 8B model supports up to a 128K context length and 100 rounds of tool calls for long-horizon reasoning, achieving top results among models under 10B parameters. One commenter observed that model-driven exploration does make an agent's browsing behavior more flexible than traditional graph-based methods. Both the model and the dataset are open-sourced; links are at the end of the original article. High-quality training data is scarce. With the rapid development of large language models (LLMs), the capability frontier of agents keeps expanding. Web-search agents, an important part of this development, can autonomously retrieve information from a wide range of online sources; long-horizon web agents must additionally perform complex reasoning and search across multiple websites. Yet existing open-source web agents often show limited performance on complex search tasks, while stronger commercial models lack transparent training details ...
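The gist of WebExplorer-style QA construction is to start from an easy question grounded in browsed pages and iteratively rewrite it to strip direct hints, so that answering it forces multi-hop search. A toy sketch of that evolution loop, where the rewrite functions are stand-ins for LLM calls and are purely illustrative, not the paper's actual pipeline:

```python
# Illustrative sketch: evolve an easy QA question into a harder one by
# successively removing direct hints (entity names, obvious sources).
# Each rewrite here is a hand-written stand-in for an LLM rewriting step.

def evolve_question(question, rewrites):
    """Apply successive hint-removing rewrites to raise difficulty."""
    history = [question]
    for rewrite in rewrites:
        question = rewrite(question)
        history.append(question)
    return question, history

q0 = "When was DeepSeek-R1 published in Nature?"
hard_q, steps = evolve_question(
    q0,
    rewrites=[
        # Replace the named model with an indirect description.
        lambda q: q.replace("DeepSeek-R1", "the first peer-reviewed mainstream LLM"),
        # Obscure the venue as well.
        lambda q: q.replace("Nature", "a leading science journal"),
    ],
)
```

After two rewrites the question no longer names the model or the journal, so a solver must first identify both via search before it can answer.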
Alibaba's Tongyi Qianwen releases its largest model to date: Qwen3-Max-Preview
Sina Finance· 2025-09-05 16:40
Core Insights
- Alibaba's Tongyi Qianwen has launched its largest model to date, Qwen3-Max-Preview, with a parameter count of 1 trillion [1]
- The new model shows significant enhancements in understanding both Chinese and English, following complex instructions, and invoking tools [1]
- Qwen3-Max-Preview also significantly reduces instances of knowledge hallucination [1]
Shenzhou Taiyue (300002.SZ) has not yet privately deployed Grok 2.5
Gelonghui· 2025-09-03 09:00
Core Insights
- The company has connected multiple product lines, via online API interfaces and private deployment of open-source models, to general-purpose large models such as DeepSeek, serving a variety of customer application scenarios [1]
Group 1
- The company has multiple business lines and products that have successfully connected to DeepSeek [1]
- The company has not yet privately deployed Grok 2.5 [1]
X @Avi Chawla
Avi Chawla· 2025-09-03 06:31
Core Technologies
- Tool Calling enables Large Language Models (LLMs) to determine appropriate actions [1]
- MCP (Model Context Protocol) infrastructure ensures that tools are reliable, discoverable, and executable [1]
- Tool Calling requests can be routed through the MCP [1]
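The routing idea above can be reduced to a small dispatcher: the model emits a structured tool-call request, and a registry checks that the tool exists before executing it. A toy sketch in the spirit of MCP's tool discovery and execution — the tool name, schema, and registry here are illustrative, not a real MCP server or client:

```python
# Toy sketch of routing an LLM's tool-call request through a tool registry.
# The request format and the "get_weather" tool are hypothetical examples,
# not part of any real MCP implementation.

import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def handle_tool_call(request_json):
    """Dispatch a request of the form {"tool": ..., "arguments": {...}}."""
    req = json.loads(request_json)
    name = req["tool"]
    if name not in TOOLS:  # discoverability: reject unknown tools
        return {"error": f"unknown tool: {name}"}
    result = TOOLS[name](**req["arguments"])  # executability: run it
    return {"tool": name, "result": result}

resp = handle_tool_call('{"tool": "get_weather", "arguments": {"city": "Paris"}}')
```

Keeping the registry between the model and the tools is what makes calls reliable: the model can only name a tool, never execute arbitrary code directly.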
Claude Code's design philosophy: Keep Things Simple
Founder Park· 2025-08-31 02:06
Core Insights
- The article attributes Claude Code's effectiveness to its simplicity in design and functionality, contrasting it with other AI assistants that focus on adding features [2][6][33]
Group 1: Design Philosophy
- Claude Code adopts an extremely minimalist approach, using a single main loop and a clear set of tools, which lets it perform 80% of tasks with a low-cost small model [2][4][14]
- The system manages its own task list, marking progress autonomously, which improves the user experience by reducing the need for manual input [2][11][27]
- A context file (CLAUDE.md) is crucial for remembering user preferences and coding habits, significantly improving interaction quality [19][20]
Group 2: Model Utilization
- Over 50% of the important LLM calls in Claude Code use the smaller Haiku model, which is cost-effective and sufficient for most tasks, cutting operational costs by 70-80% [17][18]
- The article suggests that using smaller models for the majority of tasks can simplify the system and improve performance [17][18]
Group 3: Prompt Engineering
- Claude Code's prompts are highly detailed, with around 2,800 tokens of system prompts and 9,400 tokens of tool descriptions serving as comprehensive guidelines for the model [18][22]
- The article highlights the use of XML tags and Markdown to organize prompts effectively, which enhances clarity and usability [21][22]
Group 4: Task Management
- Maintaining a to-do list autonomously helps prevent context decay over time, keeping the model focused on its tasks [27]
- The article critiques the multi-agent approach, advocating a single-agent system that manages tasks efficiently without the added complexity [15][27]
Group 5: Tool Design
- Claude Code employs a mix of low-level and high-level tools, allowing flexibility in task execution while maintaining clarity in tool usage [24][25]
- The article stresses the importance of providing detailed tool descriptions and examples to guide the model in its operations [25][26]
Group 6: Overall Takeaway
- The primary lesson from Claude Code's design is to keep things simple, as complexity can hinder performance and make debugging more challenging [33]
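The "single main loop plus self-managed to-do list" pattern described above can be sketched in a few lines. This is a hypothetical illustration of the control flow only: the `execute_step` callback stands in for what would really be an LLM turn choosing a tool or updating the task list, and none of the names come from Claude Code itself:

```python
# Illustrative sketch of a single-agent main loop with a self-managed
# task list. execute_step is a stand-in for one LLM/tool turn; in the
# real system the model itself would add, reorder, and complete tasks.

def run_agent(tasks, execute_step):
    """One flat loop: pop the next pending task, run it, record progress."""
    done = []
    while tasks:
        task = tasks.pop(0)          # the agent owns its own task queue
        result = execute_step(task)  # one turn of model/tool work
        done.append((task, result))  # progress is marked autonomously
    return done

log = run_agent(
    ["read CLAUDE.md", "edit file", "run tests"],
    execute_step=lambda t: f"completed: {t}",
)
```

The appeal of this shape, per the article, is debuggability: one loop and one explicit task list leave far fewer places for state to hide than a tree of coordinating sub-agents.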
Weekly Observation | NVIDIA's new robot "brain" lifts the chip market, expected to exceed US$4.8 billion; 2Q25 NAND Flash revenue up over 20% quarter-over-quarter
TrendForce· 2025-08-29 03:44
Group 1
- NVIDIA's newly launched Jetson Thor is positioned as the physical-AI core for robots, featuring a Blackwell GPU and 128 GB of memory and delivering 2,070 FP4 TFLOPS of AI compute, 7.5 times that of the previous Jetson Orin [2]
- Jetson Thor enables advanced humanoid robots to process large volumes of sensory data and large language models (LLMs) in real time, enhancing their ability to see, think, and act [2]
- The humanoid robot chip market is expected to exceed $4.8 billion by 2028, driven by adoption from companies such as Agility Robotics, Boston Dynamics, and Amazon [2]
Group 2
- In Q2 2025, the NAND Flash industry posted a quarter-over-quarter revenue increase of over 20%, despite a slight decline in average selling prices (ASP) [4]
- Major manufacturers implemented production-cut strategies to ease the supply-demand imbalance, while overall output still grew significantly [4]
- The combined revenue of the top five NAND Flash manufacturers reached $14.67 billion in Q2 2025, a 22% quarter-over-quarter increase [5]