Large Language Models
Priced from 119,800 Yuan: Xpeng MONA M03 Adds Four New Variants in an Upgraded Launch
Bei Jing Shang Bao· 2025-06-04 04:11
Beijing Business Today (reporter Liu Xiaomeng): On May 28, the Xpeng MONA M03 was launched in an upgraded version with four all-new variants: the MONA M03 502 Long Range Max, MONA M03 600 Ultra-Long Range Max, MONA M03 515 Long Range Plus, and MONA M03 620 Ultra-Long Range Plus, at official guide prices of 119,800 to 139,800 yuan. As the step-up product of the MONA series, this update concentrates upgrades on intelligent driving, the cockpit system, and exterior configuration, strengthening Xpeng's smart-technology competitiveness in the battery-electric market.

At the same time, the new car is the first to carry a globally debuting human-machine co-driving feature, which does not force the driver to hand over control during intelligent assisted driving, improving the smoothness of human-machine collaboration and the driver's sense of control. Parking capability has also been enhanced, with support for extremely narrow spaces, autonomous exit from a parking spot, and all-scenario space-to-space route planning, emphasizing the everyday usability of the smart experience.

On the smart cockpit side, the MONA M03 Max debuts version 5.7.0 of the Tianji system, adding more than 300 functions, with voice control covering over 90% of features. Powered by XGPT, Xpeng's self-developed large language model, the cockpit handles complex interactions such as reasoning, encyclopedia queries, and continuous dialogue, with voice response times kept within 0.9 seconds. The system is compatible with multiple mainstream smartphone brands, further extending the reach of the in-car ecosystem.

In terms of styling and comfort, the new car offers three new factory colors (星暮紫, 微月灰, and 星雨青) and adds two new wheel designs, ...
Supporting Rice Research and Intelligent Breeding: Seed-Industry Large Language Model "Fengdeng Rice" (丰登·水稻) Opens Its Website to the World
Hai Nan Ri Bao· 2025-06-04 01:19
Core Insights
- The "Fengdeng Rice" model, the world's first large language model designed specifically for rice breeding, has been officially launched, integrating a comprehensive rice biological knowledge graph and drawing on the largest rice research corpus assembled to date [1][2]
- The model aims to enhance the efficiency and quality of agricultural breeding research by providing a deep understanding of crop biology and specialized reasoning capabilities [1][2]

Group 1: Model Development
- The research team constructed the largest rice research corpus, integrating over 1.4 million Chinese and English publications and covering more than 98% of published results in the field [2]
- The "Fengdeng" model was developed on top of Alibaba's Tongyi Qianwen model through continued training and fine-tuning [2]
- An automated evaluation dataset, SeedBench, was created with 1,975 question-answer pairs across 10 task categories, on which the model shows higher accuracy than mainstream models [2]

Group 2: Evaluation and Performance
- A high-quality human evaluation dataset, HumanDesignRiceQA, was designed with 253 specialized questions on key topics such as gene function and molecular design breeding, evaluated by 326 reviewers, including 83 senior experts in rice research [2]
- The results indicate that the "Fengdeng" model outperforms OpenAI's GPT-4 and the average answer quality of undergraduate students [2]

Group 3: Knowledge Graph and Practical Applications
- The research team also built the world's first rice multi-omics knowledge graph, integrating data from 1,879 publications on rice transcriptomics and proteomics into more than 400,000 nodes and 1.57 million edges [3]
- The model's language understanding and knowledge reasoning capabilities position it as a critical tool for supporting rice research and intelligent breeding [3]
- The "Fengdeng" service enables collaborative reasoning over the structured graph, allowing precise queries and the integration of multidimensional evidence (a sketch of this graph-plus-LLM pattern follows below) [3]
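The graph-backed querying described in Group 3 follows a common pattern: store typed biological facts as a graph, retrieve the neighbourhood of the queried entity, and pass those facts to the language model as evidence. Below is a minimal, hypothetical sketch of that pattern; the gene, trait, and study identifiers, the relation labels, and the networkx-based storage are illustrative assumptions, not the Fengdeng system's actual implementation.

```python
# Hypothetical graph-plus-LLM retrieval sketch; entity and relation names are invented.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("OsGene:GS3", "Trait:grain_length", relation="regulates")
G.add_edge("OsGene:GS3", "Pathway:G_protein_signaling", relation="participates_in")
G.add_edge("Study:PLACEHOLDER_ID", "OsGene:GS3", relation="reports_on")  # hypothetical study node

def evidence_for(entity: str) -> list[str]:
    """Collect one-hop facts around an entity as plain-text evidence lines."""
    facts = []
    for u, v, data in G.out_edges(entity, data=True):
        facts.append(f"{u} --{data['relation']}--> {v}")
    for u, v, data in G.in_edges(entity, data=True):
        facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

# The collected lines would be prepended to a user's question before it is sent
# to the language model, so the answer can cite graph-grounded evidence.
print("\n".join(evidence_for("OsGene:GS3")))
```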
Surpassing GPT-4o! A New Framework from a Chinese Team Lifts Qwen's Cross-Domain Reasoning by 10%, Setting New Results on 12 Benchmarks
量子位· 2025-06-04 00:17
Core Insights
- A new reinforcement learning method called General-Reasoner has significantly improved the performance of the Qwen series models, surpassing GPT-4o in various benchmarks [1][2]

Group 1: Methodology and Innovations
- The General-Reasoner framework enhances cross-domain reasoning accuracy by nearly 10%, addressing limitations of existing Zero-RL methods that focus on single-domain data and rigid validation methods [2][4]
- The research team created a comprehensive reasoning dataset, WebInstruct-verified, consisting of approximately 230,000 high-quality, verifiable reasoning questions across multiple fields such as physics, chemistry, and finance [5][9]
- The dataset was derived from WebInstruct, which initially included around 5 million natural instructions, with a rigorous filtering process to ensure quality and relevance [6][7]

Group 2: Validation Mechanism
- A new generative answer verifier, General-Verifier, was developed to replace traditional rule-based validation, significantly improving the accuracy of answer verification across diverse domains (a minimal usage sketch follows this summary) [13]
- General-Verifier, with only 1.5 billion parameters, generates reasoning processes and outputs binary correctness judgments, providing accurate and interpretable feedback for reinforcement learning [13]

Group 3: Performance Metrics
- The General-Reasoner framework was tested on 12 benchmarks, showing a 10% improvement on cross-domain tasks compared to the base models, with specific accuracy rates such as 58.9% for Qwen2.5-7B-Base on MMLU-Pro [15]
- The strongest model, General-Reasoner-Qwen3-14B, achieved results competitive with GPT-4o, with accuracy rates of 56.1% on GPQA and 54.4% on TheoremQA [15][16]

Group 4: Future Directions
- The research team aims to further optimize model performance, expand high-quality reasoning data across more domains, and enhance the robustness of the verifier to enable broader application of large language models to complex real-world tasks [17]
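As a rough illustration of how a small generative verifier can produce the binary reward described in Group 2, here is a hedged sketch built on the Hugging Face transformers API. The checkpoint path, prompt template, and the "Judgment: correct/incorrect" parsing rule are assumptions made for this example, not the released General-Verifier interface.

```python
# Sketch of a generative answer verifier used as a reward signal for RL training.
# The model path and prompt/parsing conventions below are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

VERIFIER_PATH = "path/to/general-verifier-1.5b"  # placeholder checkpoint identifier

tokenizer = AutoTokenizer.from_pretrained(VERIFIER_PATH)
verifier = AutoModelForCausalLM.from_pretrained(VERIFIER_PATH, torch_dtype=torch.bfloat16)

def verify(question: str, reference: str, candidate: str) -> float:
    """Ask the verifier to reason about equivalence, then return a 1.0/0.0 reward."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {candidate}\n"
        "Reason step by step about whether the model answer is equivalent to the "
        "reference, then end with 'Judgment: correct' or 'Judgment: incorrect'.\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = verifier.generate(**inputs, max_new_tokens=256, do_sample=False)
    generated = output[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(generated, skip_special_tokens=True)
    return 1.0 if "judgment: correct" in text.lower() else 0.0
```

In a Zero-RL style loop, this scalar reward would stand in for an exact-match or rule-based check when grading sampled rollouts from the policy model.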
工银瑞信's Marina: Positioning Along the Core AI Theme in Two Major Directions
券商中国· 2025-06-03 23:15
Core Viewpoint
- The article emphasizes the ongoing investment trend in artificial intelligence (AI) led by DeepSeek since 2025, with a focus on public funds, particularly the upcoming launch of the 工银科技先锋混合发起式基金 managed by Marina, which targets high-quality companies in the AI industry chain [1][2]

Investment Focus
- The new fund will concentrate on two main areas: AI infrastructure and AI semiconductors on one hand, and AI applications on the other, reflecting the current technology trends driven by large language models [2][8]
- Marina's investment strategy is to identify companies that benefit from industry trends, focusing on those with high earnings growth, valuation upside, and competitive barriers [5][6]

Fund Management Background
- Marina has a strong academic background in microelectronics and computer science from Peking University and has been with 工银瑞信基金 for 10 years, specializing in technology-sector research and investment [3]
- The 工银科技先锋 fund is her latest move along the AI industry chain; unlike her previous fund, 工银新兴制造, it has a broader investment scope that includes more AI applications [3][4]

Market Trends and Predictions
- The current AI investment wave is characterized by the development of large language models, with significant advances in AI applications expected over the next 3-5 years as model capabilities improve and costs decrease [4][8]
- Hardware infrastructure in China is catching up and the gap in model development between China and the US is narrowing, suggesting a potential advantage for domestic applications thanks to a large internet market and a well-established robotics industry [8][9]
The "Queen of the Internet" AI Report in Charts: AI Adoption at an Unprecedented Pace, Inference Costs Plunge 99.7%
3 6 Ke· 2025-06-03 12:14
After five years out of the spotlight, the legendary venture capitalist Mary Meeker, known as the "Queen of the Internet," has released a 340-page AI Trends Report. The document, which the industry has dubbed an "AI bible," uses the word "unprecedented" 51 times to declare that the artificial intelligence revolution has entered an irreversible phase of explosive growth and that humanity is standing at the threshold of a technological singularity.

In the report, Meeker uses a wealth of charts to lay out the explosive growth of AI in development speed, breadth of application, capital investment, and scale of use, while questioning whether the "cash-burning model" of AI giants such as OpenAI can be sustained.

Below is a chart-based reading of the report's core findings:

Users are adopting AI at an unprecedented pace

The report notes that the hallmark of the AI era's arrival is the surge in the AI user base.

Unlike the Internet 1.0 revolution, whose technology started in the United States and then spread steadily around the world, ChatGPT stepped onto the world stage all at once and grew simultaneously across most of the globe.

Floating-point operations, the basic unit for measuring compute, began growing markedly faster after 2010, at an annual rate of 360%.

Taking US computing-related patent grants as an example, the first acceleration came in 1995, marking the start of the internet era. From 2004 its growth slowed, signalling that the internet era's development was also slowing. After ChatGPT's release in 2022, patent counts began exploding again, even more ...
"Are developers who still hand-write code without Cursor or ChatGPT out of their minds?"
3 6 Ke· 2025-06-03 08:53
Core Viewpoint
- The article discusses the contrasting perspectives on AI, particularly large language models (LLMs), in software development, highlighting the divide between supporters and skeptics [3][10][26]

Group 1: Supporters' Perspective
- Supporters argue that AI tools have significantly improved efficiency in software development; for example, Kenton Varda of Cloudflare completed a project in days that would have taken weeks or months without AI assistance [7]
- The use of AI in programming is seen as a major technological breakthrough, with the potential to transform the development process and lower the barriers to entry for new developers [2][12]
- AI tools can handle repetitive coding tasks, allowing developers to focus on more complex problems and enhancing overall productivity [13][15]

Group 2: Skeptics' Perspective
- Skeptics believe that AI is overhyped; many developers still prefer traditional coding methods and some view reliance on AI as a sign of incompetence [4][8]
- Concerns are raised about the quality of AI-generated code, with some experienced developers dismissing it as "garbage" and expressing reluctance to use AI tools [8][21]
- The debate on AI's role in programming has sparked extensive discussion online, indicating a significant divide in the developer community [6][10]

Group 3: The Role of AI in Programming
- While AI can assist in coding, it remains crucial for developers to understand the generated code to ensure quality and reliability [16][17]
- AI's ability to automate mundane tasks frees developers from repetitive work, allowing them to engage in more meaningful and creative aspects of software development [23][25]
- The emergence of asynchronous AI agents represents a new frontier in programming, enabling developers to explore multiple solutions simultaneously and improve workflow efficiency [31][32]
Major Report Download | Generative AI in 2025: As DeepSeek Upends the Industry, What Opportunities Lie in a Nearly $2 Trillion Market?
彭博Bloomberg· 2025-06-03 06:30
This article is excerpted from Bloomberg Intelligence's "2025 Generative AI Outlook" on the Bloomberg Terminal; terminal users can run {NSN SWJ7Y1DWX2PS0 } to read it. If you are not yet a terminal user, you can contact us via the "Read the original" link at the end of this article to arrange a product demonstration.

Bloomberg Intelligence: 2025 Generative AI Outlook

Applications of generative artificial intelligence (AI) and large language models (LLMs) have permeated every corner of the technology sector and are developing rapidly. By 2032, this market is expected to generate roughly $1.8 trillion in revenue.

Bloomberg Intelligence believes that as reasoning models backed by chain-of-thought and reinforcement learning gain favor, LLM applications will expand from text-based search to the analysis of images, audio, and video. Beyond existing use cases such as LLM-powered contract review and customer-service chatbots, integrated writing and coding assistants, together with tools that generate images and video from text and voice prompts, will also drive the deployment of generative AI agents on both the consumer and enterprise side. Since DeepSeek's debut, most LLM companies have focused on improving model efficiency to enable inference at scale.

Core topics:
- Inference may overtake training sooner than expected: inference spending may exceed training spending at least three years earlier than our previous forecast.
- The gap between large language models is narrowing: OpenAI's GPT, Google's Gemini, Meta's Llama, Anthro ...
Do Chains of Thought "Drop Frames" Too? Zhejiang University Team Proposes CoT-Bridge, Significantly Boosting Math Reasoning Performance
机器之心· 2025-06-03 06:26
With large language models (LLMs) advancing at full speed, Chain-of-Thought (CoT) prompting has become a key paradigm for improving complex reasoning, performing especially well on structured tasks such as mathematics and logic.

The paper's co-first authors are Haolei Xu and Yuchen Yan. Haolei Xu is a first-year master's student at Zhejiang University whose research focuses on large-model reasoning and interpretability; Yuchen Yan is a third-year PhD student at Zhejiang University focusing on large-model reasoning and agents. The corresponding authors are Professor Weiming Lu and researcher Yongliang Shen of Zhejiang University.

But have you noticed that even carefully constructed CoT data can contain "leaping" reasoning that skips key intermediate steps? To a human expert such steps may seem "obvious," but for a model they can be an impassable gulf.

To address this, Zhejiang University, together with Microsoft Research Asia and The Chinese University of Hong Kong, proposed the Thought Leap Bridge task and developed a chain-of-thought repair method, CoT-Bridge. Experiments show that it significantly improves reasoning accuracy on a range of math and logic tasks and can be embedded as a plug-and-play module into pipelines such as knowledge distillation and reinforcement learning.

CoT is not Coherent-of-Thought: how do thought leaps break the reasoning chain? CoT was designed to make large models "think step by step" the way humans do, yet the research team fou ...
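To make the "bridge the thought leap" idea concrete, here is a minimal sketch of detecting overly large jumps between adjacent chain-of-thought steps and generating the missing intermediate steps. The `ask` callable stands in for any text-in, text-out LLM interface, and the prompt wording is invented for illustration; this is not the paper's CoT-Bridge implementation.

```python
# Illustrative thought-leap bridging: probe each adjacent pair of CoT steps and
# insert generated intermediate steps where the transition is judged too large.
from typing import Callable, List

def bridge_cot(steps: List[str], ask: Callable[[str], str]) -> List[str]:
    if not steps:
        return []
    bridged: List[str] = [steps[0]]
    for prev, nxt in zip(steps, steps[1:]):
        probe = (
            f"Previous step: {prev}\nNext step: {nxt}\n"
            "Does the next step follow directly, with no missing intermediate "
            "reasoning? Answer YES or NO."
        )
        if ask(probe).strip().upper().startswith("NO"):
            fill = ask(
                f"Write the missing intermediate reasoning step(s) between:\n"
                f"{prev}\nand\n{nxt}\nOne step per line."
            )
            bridged.extend(line.strip() for line in fill.splitlines() if line.strip())
        bridged.append(nxt)
    return bridged
```

The repaired chains can then serve as higher-quality training data, which is how a module like this could be dropped into distillation or reinforcement learning pipelines.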
April Gaming Revenue Up More Than 20% Year-on-Year; Gaming ETF (516010) Rises Over 3%
Mei Ri Jing Ji Xin Wen· 2025-06-03 03:01
Group 1
- The Chinese gaming market generated 27.351 billion yuan in revenue in April 2025, a year-on-year increase of 21.93%, with mobile gaming growing 28.41% and overseas revenue rising 9.62% [1]
- DeepSeek R1 has demonstrated globally leading deep-thinking capabilities, surpassing o3 and Gemini 2.5 Pro on the math benchmark AIME 2024 and the code benchmark LiveCodeBench, a 15% improvement over the previous version [1]
- Advances in artificial intelligence are expected to boost the gaming sector, as the industry is a mature application area for AI, with potential new gameplay emerging from the integration of large language models [1]

Group 2
- R1's text understanding and creative writing have improved, with hallucination rates for rewriting, summarizing, and reading comprehension reduced by 45%-50%, and significant gains in long-form writing and role-playing capabilities [1]
- Future developments may allow large language models to endow game characters with independent personalities, enabling them to act and behave within the virtual world, potentially creating new gameplay experiences [1]
Unmasking "Pseudo-Forgetting" in Large Models; PolyU and Collaborators: If the Structure Is Unchanged, Nothing Was Forgotten
量子位· 2025-06-01 03:40
Contributed by the Machine Unlearning team to 量子位 | QbitAI

Sensitive information exposed during training is often "memorized" by the model, a problem that has drawn wide attention.

In recent years, the capabilities of large language models (LLMs) have advanced by leaps and bounds, but the privacy risks that come with them have gradually surfaced. Against this backdrop, machine unlearning has emerged, aiming to selectively erase specific knowledge without harming overall capability.

A research team from The Hong Kong Polytechnic University, Carnegie Mellon University, and UC Santa Cruz built a suite of representation-space diagnostic tools to systematically distinguish "reversible forgetting" from "catastrophic, irreversible forgetting," and for the first time revealed the patterns of representational structure change behind forgetting: true forgetting appears only when multiple network layers undergo coordinated, large-scale perturbation; by contrast, small updates in highly sensitive regions (such as the output logits) can sharply lower accuracy or raise perplexity while leaving the model's internal representation structure intact.

The researchers have packaged this into a unified representation-level analysis toolbox for diagnosing an LLM's internal changes during unlearning, relearning, fine-tuning, and similar procedures.

True forgetting is structural erasure, not behavioral suppression

The researchers argue: "If a model 'forgets' only at the level of token outputs while its internal structure is almost unchanged, ...
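A simple way to probe whether "forgetting" actually changed the representation structure, in the spirit of the diagnostics described above, is to compare layer-wise activations of the original and unlearned models on the same probe inputs. The sketch below uses linear CKA as the similarity measure; the metric choice and the layer-wise framing are illustrative assumptions, not the team's exact toolbox.

```python
# Layer-wise representation drift between an original and an "unlearned" model.
# High CKA despite degraded outputs would suggest pseudo-forgetting
# (behavioral suppression with the internal structure left intact).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X, Y: (n_samples, hidden_dim) activations from the same probe inputs."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

def layerwise_drift(acts_before: list, acts_after: list) -> list:
    """Per-layer CKA between matched activation matrices; values near 1.0 mean
    the representation structure barely moved even if the outputs changed."""
    return [linear_cka(a, b) for a, b in zip(acts_before, acts_after)]
```

Near-1.0 similarity at most layers, paired with a sharp drop in accuracy on the forgotten data, would be the signature of behavioral suppression rather than structural erasure.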