Large Language Model

DeepSeek-R1 Makes History: Liang Wenfeng's Paper Lands on the Cover of Nature
Di Yi Cai Jing· 2025-09-17 23:09
Compared with the initial DeepSeek-R1 paper released in January this year, this version discloses more details of the model's training and directly addresses the distillation accusations raised when the model first launched. DeepSeek-R1 is also the first mainstream large language model in the world to undergo peer review. Nature commented that almost none of today's mainstream large models have been independently peer-reviewed, a gap "finally broken by DeepSeek". The DeepSeek-R1 reasoning-model research paper, completed jointly by the DeepSeek team with Liang Wenfeng as corresponding author, has appeared on the cover of the authoritative international journal Nature. ...
X @The Economist
The Economist· 2025-09-17 18:01
We analysed each speech using OpenAI’s large language model, requesting that it assess how controversial King Charles’s remarks had been in the past three decades. This is what the results showed https://t.co/vCu2vKkDdu ...
100 Rounds of Tool Calls: Even an 8B Model Can Handle Complex Long-Horizon Search! Latest Open-Source Release from MiniMax & HKUST
量子位· 2025-09-12 08:46
Buyuan | QbitAI

A web-search agent performs poorly, you feed it a flood of extra data, and its performance stays the same. What is going on? The HKUST & MiniMax team pinpoints the root cause: it is not that the model has too few parameters, but that sufficiently challenging training data is lacking. In other words, stop rote memorization and practice on real exam questions.

They propose WebExplorer, a method for constructing high-quality QA pairs. Trained on a dataset built this way, even a smaller model can outperform much larger ones on complex, long-horizon search tasks. The trained 8B model supports up to 128K context length and 100 rounds of tool calls for long-horizon reasoning, achieving top results among models under 10B parameters.

One commenter noted that model-driven exploration makes an agent's browsing behavior more flexible than traditional graph-based approaches. Both the model and the dataset are open-sourced; links are at the end of the original article.

High-quality training data is scarce. With the rapid progress of large language models (LLMs), the capability frontier of agents keeps expanding. Web-search agents, a key part of this progress, can autonomously retrieve information from a wide range of online sources; long-horizon web agents must go further, reasoning and searching across multiple websites. Yet existing open-source web agents often perform poorly on complex search tasks, while stronger commercial models lack transparent training details ...
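The "100 rounds of tool calls" figure above can be pictured as a capped search loop: the agent may issue up to 100 search/browse tool calls before it must commit to an answer. The sketch below is illustrative only; `fake_search`, the stop rule, and the budget handling are stand-ins, not WebExplorer's actual implementation.

```python
# Illustrative long-horizon search loop with a 100-call tool budget.
# fake_search is a stub: it pretends the answer only surfaces after
# a few rounds of exploration.

MAX_TOOL_CALLS = 100

def fake_search(query: str, step: int) -> str:
    return "ANSWER: 42" if step >= 3 else f"no direct hit for {query!r}"

def long_horizon_search(question: str) -> str:
    evidence = []
    for step in range(MAX_TOOL_CALLS):        # hard cap on tool calls
        result = fake_search(question, step)  # one tool-call round
        evidence.append(result)
        if result.startswith("ANSWER:"):      # stop once evidence suffices
            return result.removeprefix("ANSWER: ")
    return "unresolved within budget"

print(long_horizon_search("obscure multi-hop question"))  # → 42
```

The point of the cap is that a long-horizon agent needs both a large context (here, the growing `evidence` list; 128K tokens in the real model) and many tool rounds, while still terminating when the budget runs out.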
Alibaba's Tongyi Qianwen Releases Its Largest Model to Date: Qwen3-Max-Preview
Xin Lang Cai Jing· 2025-09-05 16:40
Core Insights - Alibaba's Tongyi Qianwen has launched its largest model to date, Qwen3-Max-Preview, with a parameter count of 1 trillion [1] - The new model shows significant enhancements in understanding both Chinese and English, following complex instructions, and tool invocation [1] - Qwen3-Max-Preview also significantly reduces instances of knowledge hallucination [1]
Shenzhou Taiyue (300002.SZ) Has Not Yet Deployed Grok 2.5 On-Premises
Ge Long Hui· 2025-09-03 09:00
Gelonghui, September 3: Shenzhou Taiyue (300002.SZ) stated on an investor-interaction platform that multiple products across its business lines already access DeepSeek and other general-purpose large models, via online API interfaces and private deployments of open-source models, to serve a range of customer scenarios. The company has not yet deployed Grok 2.5 on-premises. ...
X @Avi Chawla
Avi Chawla· 2025-09-03 06:31
To sum up:
- Tool Calling helps an LLM decide what to do.
- MCP is an infrastructure that ensures tools are reliably available, discoverable, and executable.
So, a Tool Calling request can be routed through MCP.
Here's the visual again for your reference 👇 https://t.co/geB5I6KbqL ...
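The division of labor in the tweet can be sketched in a few lines: the LLM's job ends at deciding *which* tool to call (the tool-call request), while an MCP-style layer makes tools discoverable and executable. The `ToolRegistry` below is a toy stand-in for an MCP server, not the real MCP SDK; all names are illustrative.

```python
# Toy sketch: a tool-calling request (name + arguments, as an LLM would emit)
# is routed through an MCP-style registry instead of being executed directly.
from typing import Any, Callable

class ToolRegistry:
    """Stand-in for an MCP server: tools are registered, listed, invoked."""
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def list_tools(self) -> list[str]:           # discoverability
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:  # reliable execution
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)

# The LLM's side of the contract: a request naming a tool and its arguments.
request = {"name": "add", "arguments": {"a": 2, "b": 3}}
result = registry.call(request["name"], **request["arguments"])
print(result)  # 5
```

The separation means the model never needs to know how a tool is hosted or transported; it only produces the request, and the registry guarantees the rest.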
Claude Code 的设计哲学:Keep Things Simple
Founder Park· 2025-08-31 02:06
Core Insights - The article emphasizes the effectiveness of Claude Code due to its simplicity in design and functionality, contrasting it with other AI assistants that focus on adding features [2][6][33].

Group 1: Design Philosophy - Claude Code adopts an extremely minimalist approach, utilizing a single main loop and a clear set of tools, which allows it to perform 80% of tasks with a low-cost small model [2][4][14]. - The system is designed to manage its own task list, marking progress autonomously, which enhances user experience by reducing the need for manual input [2][11][27]. - The use of a context file (claude.md) is crucial for remembering user preferences and coding habits, significantly improving the interaction quality [19][20].

Group 2: Model Utilization - Over 50% of the important LLM calls in Claude Code utilize the smaller Haiku model, which is cost-effective and sufficient for most tasks, leading to a reduction in operational costs by 70-80% [17][18]. - The article suggests that using smaller models for the majority of tasks can simplify the system and improve performance [17][18].

Group 3: Prompt Engineering - Claude Code's prompts are highly detailed, containing around 2800 tokens for system prompts and 9400 tokens for tool descriptions, which serve as comprehensive guidelines for the model [18][22]. - The article highlights the importance of using XML tags and Markdown to organize prompts effectively, which enhances clarity and usability [21][22].

Group 4: Task Management - The system's ability to maintain a to-do list autonomously helps prevent context decay over time, allowing the model to stay focused on tasks [27]. - The article critiques the multi-agent approach, advocating for a single-agent system that can manage tasks efficiently without the added complexity [15][27].

Group 5: Tool Design - Claude Code employs a mix of low-level and high-level tools, allowing for flexibility in task execution while maintaining clarity in tool usage [24][25]. - The article stresses the importance of providing detailed tool descriptions and examples to guide the model in its operations [25][26].

Group 6: Overall Takeaway - The primary lesson from Claude Code's design is to keep things simple, as complexity can hinder performance and make debugging more challenging [33].
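The "single main loop + clear set of tools" pattern the article attributes to Claude Code can be sketched in miniature. Everything below is a hypothetical illustration: the model is a stub, and the tool names, message format, and stop condition are assumptions, not Claude Code's actual internals.

```python
# Minimal single-agent loop: one loop, one toolset, the model decides
# when to call a tool and when it is finished. The model is stubbed out.

def fake_model(history: list[dict]) -> dict:
    """Stub LLM: requests one tool call, then declares the task done."""
    if not any(turn["role"] == "tool" for turn in history):
        return {"type": "tool_call", "tool": "read_file",
                "args": {"path": "notes.txt"}}
    return {"type": "final", "text": "done"}

TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",  # toy tool
}

def agent_loop(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):          # the single main loop
        action = fake_model(history)
        if action["type"] == "final":   # model signals completion
            return action["text"]
        output = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": output})
    return "step budget exhausted"

print(agent_loop("summarize notes.txt"))  # → done
```

The design choice the article praises is visible even at this scale: with a single loop and a flat toolset, every step of the agent's behavior is traceable in `history`, which is what makes debugging simple.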
Weekly Watch | NVIDIA's New Robot "Brain" Lifts the Chip Market Toward More Than US$4.8 Billion; 2Q25 NAND Flash Revenue Up Over 20% QoQ
TrendForce集邦· 2025-08-29 03:44
Group 1 - NVIDIA's newly launched Jetson Thor is considered the physical intelligence core for robots, featuring Blackwell GPU and 128 GB memory, achieving 2070 FP4 TFLOPS AI computing power, which is 7.5 times that of the previous Jetson Orin [2] - The introduction of Jetson Thor enables advanced humanoid robots to process large sensory data and large language models (LLM) in real-time, enhancing their ability to see, think, and act [2] - The humanoid robot chip market is expected to exceed $4.8 billion by 2028, driven by the adoption of this technology by companies like Agility Robotics, Boston Dynamics, and Amazon [2]

Group 2 - In Q2 2025, the NAND Flash industry is projected to see a quarter-over-quarter revenue increase of over 20%, despite a slight decline in average selling prices (ASP) [4] - Major manufacturers have implemented production reduction strategies to alleviate supply-demand imbalances, resulting in significant growth in overall output [4] - The combined revenue of the top five NAND Flash manufacturers reached $14.67 billion in Q2 2025, reflecting a 22% quarter-over-quarter increase [5]
Quick Tour of NVIDIA DGX H100
NVIDIA· 2025-08-27 17:44
NVIDIA accelerated computing starts with DGX, the world's AI supercomputer, the engine behind the large language model breakthrough. I hand-delivered the world's first DGX to OpenAI. Since then, half of the Fortune 100 companies have installed DGX AI supercomputers. DGX has become the essential instrument of AI. The GPU of DGX is eight H100 modules. H100 has a Transformer Engine designed to process models like the amazing ChatGPT, which stands for generative pre-trained transformer. The eight H100 modules a ...
SiliconFlow Launches DeepSeek-V3.1, Context Extended to 160K
Xin Lang Cai Jing· 2025-08-25 12:32
According to SiliconFlow, on August 25 its large-model service platform launched DeepSeek-V3.1, the latest open-source release from the DeepSeek team. DeepSeek-V3.1 has 671B total parameters with 37B activated, and adopts a hybrid reasoning architecture that supports both thinking and non-thinking modes. In addition, DeepSeek-V3.1 is among the first to support a 160K ultra-long context, letting developers efficiently handle complex scenarios such as long documents, multi-turn dialogue, coding, and agents. ...