Which reasoning steps in a long chain of thought matter most? Three methods to pinpoint an LLM's "critical sentences"
机器之心· 2025-07-09 00:50
Report by 机器之心. Editor: Zhang Qian.

The steps in a chain of thought matter, but some matter more than others, especially in longer chains. Identifying those steps lets us understand an LLM's internal reasoning mechanism more deeply, which in turn improves the model's interpretability, debuggability, and safety. These steps are not easy to find, however, because every generated token depends on all preceding tokens, making the computation hard to decompose.

In a recent study, researchers from Duke University and Aiphabet propose that analyzing reasoning traces at the sentence level may be a promising approach. The authors note that, compared with tokens, sentences are more coherent and tend to align with the reasoning steps an LLM extracts; compared with paragraphs, sentences are less likely to conflate reasoning steps and serve as an effective unit for linking different steps. The authors propose three complementary methods for analyzing an LLM's reasoning process, all aimed at identifying the key steps, the so-called "thought anchors," that exert a major influence on the subsequent reasoning.

Paper title: Thought Anchors: Which LLM Reasoning Steps Matter?
Paper link: https://arxiv.org/pdf/2506.19143

The first is a black-box method. It uses counterfactual analysis to measure how a sentence ...
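The excerpt cuts off before detailing the black-box method, but the idea it names, counterfactual analysis at the sentence level, can be illustrated with a small sketch. The code below is an illustration under stated assumptions, not the paper's exact procedure: `sample_answers` is a hypothetical stand-in for an LLM sampler, and a sentence's importance is scored as the shift in the final-answer distribution when the trace is continued with versus without that sentence.

```python
# Illustrative sketch (assumptions, not the paper's procedure): estimate
# sentence-level importance in a reasoning trace via counterfactual resampling.
import random
from collections import Counter
from typing import List

def sample_answers(prompt: str, n: int) -> List[str]:
    """Hypothetical LLM sampler: returns n final answers continued from `prompt`.
    Replace with a real model/API call; this stub returns random placeholders."""
    return [random.choice(["A", "B"]) for _ in range(n)]

def answer_distribution(prompt: str, n: int) -> Counter:
    return Counter(sample_answers(prompt, n))

def sentence_importance(question: str, sentences: List[str], n: int = 32) -> List[float]:
    """For each sentence i, compare the answer distribution when the trace is
    continued from sentences[:i+1] vs. sentences[:i]. A large shift marks the
    sentence as a candidate 'thought anchor'."""
    scores = []
    for i in range(len(sentences)):
        with_s = answer_distribution(question + " " + " ".join(sentences[: i + 1]), n)
        without_s = answer_distribution(question + " " + " ".join(sentences[:i]), n)
        keys = set(with_s) | set(without_s)
        # Total-variation distance between the two empirical answer distributions.
        tv = 0.5 * sum(abs(with_s[k] / n - without_s[k] / n) for k in keys)
        scores.append(tv)
    return scores

if __name__ == "__main__":
    trace = ["The triangle is right-angled with legs 3 and 4.",
             "So its area is 0.5 * 3 * 4 = 6.",
             "Therefore the answer is 6."]
    print(sentence_importance("Q: what is the area?", trace, n=16))
```

Sentences whose inclusion shifts the answer distribution the most are natural candidates for the thought anchors the paper looks for.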
Tencent Research Institute AI Digest 20250709
腾讯研究院· 2025-07-08 15:50
Group 1
- Ruoming Pang, head of Apple's foundational model team, is reported to join Meta's new AI team with an annual compensation in the tens of millions [1]
- Pang's departure may be influenced by internal discussions at Apple regarding the introduction of third-party models like OpenAI, leading to team morale issues [1]
- Apple's AI team structure will be reorganized under Zhifeng Chen, transitioning to a multi-layer management structure [1]

Group 2
- Microsoft has launched Deep Research, a public preview version that utilizes the o3 model and Bing search to create an advanced AI research tool [2]
- This AI can automatically deconstruct complex problems, gather the latest authoritative information from the web, and generate auditable research reports [2]
- An API interface has been opened for integration into applications, supporting enterprise-level AI platforms across various fields such as research, finance, and healthcare [2]

Group 3
- Alibaba has open-sourced the multi-modal reasoning model HumanOmniV2, capable of accurately capturing hidden information in videos and understanding "subtext" [3]
- The model incorporates a forced context summarization mechanism, a multi-dimensional reward system driven by large models, and optimization training methods based on GRPO [3]
- Alibaba has introduced the IntentBench evaluation benchmark, with HumanOmniV2 achieving an accuracy rate of 69.33%, excelling in understanding complex human intentions [3]

Group 4
- PaddleOCR 3.1 has been released, with Wenxin 4.5 enhancing the accuracy of text recognition in 37 languages by over 30%, supporting high-quality automatic data labeling [4]
- A new production line, PP-DocTranslation, has been added, combining PP-StructureV3 and Wenxin 4.5 to support translation of Markdown, PDF, and image documents, along with customization of professional terminology [4]

Group 5
- A controversy has emerged involving hidden instructions in academic papers aimed at inducing AI to give high scores, with several top universities implicated [6]
- Xie Saining, a co-author of one such paper, acknowledged responsibility and apologized, clarifying that he does not endorse such practices [6]
- This incident has sparked discussions on academic ethics in the AI era, highlighting the lack of unified standards in AI review processes and the need for reform [6]

Group 6
- The Visual Language Action model (VLA) is becoming a core technology for embodied intelligence by 2025, with rapid iterations since Google's RT-2 breakthrough [7]
- China's Zhihui Square has partnered with top universities to launch FiS-VLA, innovatively embedding "fast systems" into "slow systems" to address the trade-off between robotic control efficiency and reasoning capability [7]
- FiS-VLA has achieved an 8% success rate improvement in simulation tasks and an 11% improvement in real environments, with a control frequency of 21.9Hz, 1.6 times that of the open-source model π0 [7]

Group 7
- YouTube co-founder Chen Shijun discussed AI entrepreneurship and long-termism with the Manus team, emphasizing the value of rapid experimentation and risk-taking [8]
- Recommendations for AI startups include leveraging first-mover advantages to retain users, creating compound network effects, and exploring areas that larger companies avoid, all within legal boundaries [8]
- Key decisions at YouTube included prioritizing user growth over immediate monetization, establishing transparent core metrics, and developing a creator-friendly advertising model while focusing on the "passive experience" of recommendation systems [8]

Group 8
- The key shift in acquiring users for AI products is that if a product does not generate social engagement within the first 48 hours, it may fail, making virality a survival threshold rather than a bonus [9]
- The success story of selling Base44 for $80 million involved user participation in the development process, encouraging sharing of creations, and strategically choosing LinkedIn as a platform for dissemination, creating a closed loop of development, showcasing, and sharing [9]
- The distribution paradigm for AI startups is evolving, with product development becoming a public showcase, niche native creators proving more effective than influencers, and growth metrics becoming assets for dissemination, shifting from "closed-door development" to "public collaboration" [9]

Group 9
- U.S. universities are reshaping computer science education, with the CS major potentially becoming more humanities-oriented, emphasizing computational thinking and AI literacy over traditional programming skills [10]
- The "Level Up AI" initiative has launched an 18-month curriculum overhaul, where future programming languages may involve "Human," allowing students to complete programming tasks through interaction with AI [10]
- Traditional humanities classrooms are facing assessment crises, with educators struggling to identify AI-generated content, leading to a return to handwritten assignments and the development of anti-cheating systems, raising concerns about students' over-reliance on AI affecting their cognitive abilities [10]
An Innovative Tool for Enterprise Digital Transformation: The DigitLangPro Language Processing Platform
Jiang Nan Shi Bao· 2025-07-08 14:12
Core Insights
- The article discusses the transformative pressures and opportunities faced by companies in the digital age, emphasizing the importance of understanding employee needs and optimizing management strategies for successful digital transformation [1]

Company Overview
- DigitLangPro is an innovative language processing platform developed by Yang Xiaoying, focusing on aiding companies in their digital transformation efforts through advanced natural language processing technology [2]
- The platform collects and analyzes internal data and employee feedback to assess engagement levels across different generational employees during crisis responses, identifying needs related to transformation and generating a comprehensive digital transformation index [2]

Practical Case Study
- Huaji Manufacturing Co., Ltd. faced challenges in employee acceptance of new technologies and intergenerational communication during its digital transformation journey [3]
- By implementing DigitLangPro, the company was able to accurately identify participation levels among different age groups, quantify employee needs and expectations, and analyze sentiments regarding the transformation [3]
- Post-implementation, employee participation increased by 35%, the accuracy of identifying transformation-related needs reached 88%, and overall transformation efficiency improved by 23% [3]

Economic Efficiency Innovation
- The application of DigitLangPro at Huaji Manufacturing demonstrated significant economic value, converting employee feedback into quantifiable data for informed decision-making [4]
- The platform reduced project implementation cycles by 23% and directly saved 14% in operational costs [4]
- The comprehensive transformation index generated allows management to monitor progress in real-time and adjust strategies accordingly, with the index rising from 64 to 81 post-implementation, indicating substantial success in digital transformation [4]

Industry Impact
- The introduction of DigitLangPro has had a profound impact on the industry, enhancing transformation efficiency and employee satisfaction, allowing companies to stand out in competitive markets [5]
- Many companies recognize that digital transformation is not just a technological upgrade but a comprehensive change in management philosophy and employee engagement [5]
- The successful application of DigitLangPro serves as a valuable reference for other companies, promoting increased attention and investment in digital transformation across the industry [5]

Future Outlook
- With the ongoing development of artificial intelligence and big data technologies, DigitLangPro is expected to play a significant role in various sectors [6]
- In the financial industry, the platform can assist banks in evaluating customer acceptance of digital services and optimizing product design [6]
- In healthcare, it can help hospitals enhance patient satisfaction and streamline service processes, driving deeper digital transformation and contributing to the overall societal digitalization process [6]
Embodied Intelligence Paper Digest | Reinforcement Learning, VLA, VLN, World Models, and More
具身智能之心· 2025-07-08 12:54
How reinforcement learning improves VLA generalization

Tsinghua University, the Shanghai Qi Zhi Institute, and the Beijing Zhongguancun Academy use reinforcement-learning fine-tuning (the PPO algorithm) to significantly improve the generalization of vision-language-action (VLA) models:
1) Task-execution success rate improved by 42.6% (OOD scenarios)
2) Success rate on semantic-understanding tasks improved from 61.5% to 75.0% (unseen objects)
3) Success rate under dynamic disturbances jumped from 28.6% to 74.5% (Tab 3)

Paper title: What Can RL Bring to VLA Generalization? An Empirical Study
Paper link: https://arxiv.org/pdf/2505.19789

Main contributions:
1. Built a rigorous and challenging benchmark for evaluating how VLA fine-tuning methods affect generalization along the visual, semantic, and execution dimensions.
2. Identified PPO as a better RL algorithm for VLA fine-tuning than GRPO and DPO, and discussed the key challenges in adapting these RL algorithms from the LLM/VLM paradigm to the unique requirements of VLA.
3. Developed an efficient PPO-based VLA fine-tuning scheme that uses a shared actor-critic backbone, VL ...
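As background on the pattern the third contribution refers to, here is a minimal sketch of clipped PPO over a shared actor-critic backbone. It is not the paper's implementation: the backbone is a placeholder MLP over pre-fused vision-language features, actions are discretized, and every size and hyperparameter is an assumption.

```python
# Minimal sketch (assumptions, not the paper's code): clipped PPO with a shared
# actor-critic backbone, the general structure referenced for VLA fine-tuning.
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    def __init__(self, feat_dim: int = 512, num_actions: int = 7):
        super().__init__()
        # Shared torso; in a real VLA this role is played by the pretrained
        # vision-language backbone, here it is a stand-in MLP.
        self.torso = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.actor = nn.Linear(256, num_actions)  # logits over discretized actions
        self.critic = nn.Linear(256, 1)           # state-value estimate

    def forward(self, feats):
        h = self.torso(feats)
        return self.actor(h), self.critic(h).squeeze(-1)

def ppo_loss(model, feats, actions, old_logp, advantages, returns,
             clip: float = 0.2, vf_coef: float = 0.5, ent_coef: float = 0.01):
    """Clipped PPO objective over one batch of rollout data."""
    logits, values = model(feats)
    dist = torch.distributions.Categorical(logits=logits)
    logp = dist.log_prob(actions)
    ratio = torch.exp(logp - old_logp)
    policy_loss = -torch.min(ratio * advantages,
                             torch.clamp(ratio, 1 - clip, 1 + clip) * advantages).mean()
    value_loss = (values - returns).pow(2).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * dist.entropy().mean()

# Toy update step with random tensors standing in for collected rollouts.
model = SharedActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
feats, actions = torch.randn(64, 512), torch.randint(0, 7, (64,))
with torch.no_grad():
    old_logits, _ = model(feats)
    old_logp = torch.distributions.Categorical(logits=old_logits).log_prob(actions)
advantages, returns = torch.randn(64), torch.randn(64)
loss = ppo_loss(model, feats, actions, old_logp, advantages, returns)
opt.zero_grad(); loss.backward(); opt.step()
```

Sharing the torso lets the policy and value heads reuse one set of backbone features, which is the usual motivation for a shared actor-critic when the backbone is large.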
Only recently did I realize that the core of mass-producing intelligent driving is not just model algorithms...
自动驾驶之心· 2025-07-08 12:45
How should a ten-million-scale 4D annotation solution be built?

From recent exchanges with many practitioners in the industry, a common consensus has emerged: model algorithms are the key to taking intelligent-driving capability from 0 to 10, but they are not the core of going from 10 to 100. The future belongs to massive automatically annotated data.

Mass-production development of intelligent driving has entered deep water, and every company is investing heavily in deployment. The key to generalization is obtaining 4D automatically annotated data efficiently and at high quality. Manual fine annotation has long turnaround times and high costs, a major obstacle during the critical period of scaling for mass production, so high-quality 4D auto-annotation is an essential link for the industry, whether for 3D dynamic objects, OCC (occupancy), static annotation, or end-to-end annotation.

Compared with on-vehicle perception algorithms, an auto-annotation system is more like a system assembled from different modules; better perception results come only from fully exploiting offline compute and temporal information. In practice, this raises the bar on engineering skill, and making these large models and large systems run well and efficiently is far from easy.

Since end-to-end approaches and large language models (LLMs) emerged, large-scale unsupervised pre-training plus fine-tuning on high-quality task-specific datasets may become the next direction for mass-production perception algorithms. Joint annotation of data is also a practical necessity for every company training models today, ...
Still struggling with AI data? Zhang Wentao and academician Weinan E's team releases a data-centric AI system
机器之心· 2025-07-08 09:41
In recent years, the development of large models has been led mainly by large technology companies, whose edge rests on massive, high-quality data resources. However, these companies usually do not release their raw data or data-processing tools, leaving academia far behind and heavily constrained in constructing and optimizing training data for large models. Although many datasets have been open-sourced in recent years, academia still faces numerous challenges in data preparation for large models. At present, cleaning and constructing training data still relies largely on each research team working behind closed doors, without systematic, efficient tool support. Existing data-processing tools such as Hadoop and Spark mostly provide operators oriented toward traditional methods and have not yet effectively integrated intelligent operators based on the latest large language models (LLMs), offering limited support for building training data for advanced large models.

To address this, the team of Zhang Wentao and academician Weinan E proposed DataFlow, a data-centric AI system. It implements more than 100 data-governance operators based on rules, local large models, or large-model APIs, and on top of these builds 8 preset data-processing pipelines, covering mainstream data-governance needs such as cleaning, augmenting, and evaluating large-scale noisy data (e.g., PDF documents, plain text, low-quality QA data, crawled data); synthesizing strong-reasoning data with chains of thought; and RAG data extraction and synthesis. The system lets users flexibly organize existing operators and develop new operators ...
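As an illustration of the operator-and-pipeline pattern described above, the sketch below composes two rule-based operators and a placeholder LLM-based operator into a small cleaning pipeline. All names are invented for this example and are not DataFlow's actual API.

```python
# Hypothetical illustration of the operator/pipeline pattern (invented names,
# not DataFlow's real interface).
from typing import Callable, List

Record = dict
Operator = Callable[[List[Record]], List[Record]]

def dedup_op(records: List[Record]) -> List[Record]:
    """Rule-based operator: drop exact-duplicate texts."""
    seen, out = set(), []
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            out.append(r)
    return out

def length_filter_op(min_chars: int = 20) -> Operator:
    """Rule-based operator factory: keep records above a minimum length."""
    return lambda records: [r for r in records if len(r["text"]) >= min_chars]

def llm_quality_filter_op(records: List[Record]) -> List[Record]:
    """LLM-based operator placeholder: a real system would score each record
    with a local model or a model API; here a precomputed score is used."""
    return [r for r in records if r.get("quality", 1.0) >= 0.5]

def run_pipeline(records: List[Record], ops: List[Operator]) -> List[Record]:
    """A pipeline is an ordered list of operators applied in sequence."""
    for op in ops:
        records = op(records)
    return records

cleaning_pipeline = [dedup_op, length_filter_op(20), llm_quality_filter_op]
data = [{"text": "short"},
        {"text": "a sufficiently long example sentence", "quality": 0.9},
        {"text": "a sufficiently long example sentence", "quality": 0.9}]
print(run_pipeline(data, cleaning_pipeline))
```

The point of the pattern is that rule-based and model-based operators share one interface, so users can reorder them or slot in new ones without changing the pipeline runner.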
Nature sub-journal: Tan Pan and Hong Liang's teams develop the protein language model VenusMine, successfully mining highly efficient PET hydrolases
生物世界· 2025-07-08 08:18
Recently, Tan Pan, a young researcher at the Shanghai AI Laboratory, together with the team of Professor Hong Liang at Shanghai Jiao Tong University (Institute of Natural Sciences / School of Physics and Astronomy / Zhangjiang Institute for Advanced Study / School of Pharmacy), published a research paper in Nature Communications titled: Harnessing Protein Language Model for Structure-Based Discovery of Highly Efficient and Robust PET Hydrolases.

The study presents VenusMine, a large protein model for enzyme mining. The model combines a protein language model with 3D structural analysis and, by exploiting the implicit mapping among protein sequence, structure, and function, can efficiently mine enzymes with low sequence homology but excellent function from massive protein databases. Applying the model, the team discovered a series of PET hydrolases, among which KbPETase from Kibdelosporangium banguiense shows extremely high catalytic efficiency and thermal stability, with optimal enzymatic activity 97 times that of the template IsPETase.

Editor: Wang Duoyu. Layout: Shui Chengwen.

Plastic waste poses a major environmental challenge, especially polyethylene terephthalate (PET), today's most widely used beverage packaging, used for carbonated drinks ...
Pushing the boundary of omni-modal AI understanding: introducing context-based reinforcement learning to take omni-modal models' "intent" reasoning to new heights
量子位· 2025-07-08 07:30
Contributed by the HumanOmniV2 team. QbitAI | WeChat official account QbitAI

As applications of multimodal large language models (MLLMs) become increasingly diverse, the need for models to deeply understand and analyze human intent grows ever more pressing. Although reinforcement learning (RL) has shown great potential for strengthening the reasoning ability of large language models (LLMs), applying it effectively to complex multimodal data and formats still faces many challenges.

After studying existing techniques in depth, the team found two core problems in the reasoning paths of current multimodal reasoning models: insufficient global context understanding and the shortcut problem.

Insufficient global context understanding: this arises when the model fails to correctly identify, or misinterprets, multimodal evidence and contextual information, leading to incorrect answers.

Shortcut problem: when processing multimodal inputs, the model ignores key cues and gives an answer without fully considering the multimodal information, producing suboptimal or one-sided results.

To thoroughly address these pain points, Alibaba's Tongyi Lab team introduced HumanOmniV2, which requires the model to reason only on the basis of a clear understanding of the global context of the multimodal input. This global understanding keeps the model from missing key multimodal cues and ensures the reasoning process is comprehensive and thorough. The code, models, and data are all open-sourced; links are available at the end of the original article.

Demonstration. Question: what is the relationship between these two people? A. They want to draw attention to the product. B. The two people are business ...
US tech giants compete for big Pentagon contracts, looking to AI for revenue | Enterprise Services International Watch
Tai Mei Ti APP· 2025-07-08 03:43
Core Insights
- OpenAI signed a $200 million contract with the U.S. Department of Defense to provide AI tools for addressing critical national security challenges [2]
- The competition for government contracts in the AI and cloud computing sectors has intensified, with major tech companies vying for lucrative deals [2][3]
- The U.S. government is increasingly integrating AI into military operations, with significant investments planned for the coming years [10][12]

Government Contracts and Collaborations
- OpenAI's contract with the Department of Defense is part of a broader trend where tech companies like Palantir and Snowflake are securing government contracts to enhance their AI capabilities [2][3]
- Palantir has seen substantial revenue growth, with 60% of its income derived from government contracts, including a significant contract for Project Maven [2]
- Snowflake obtained a $1 billion temporary authorization from the Department of Defense, allowing all military branches to utilize its enhanced data capabilities [3]

Major Cloud Providers and AI Integration
- The Department of Defense awarded a $9 billion Joint Warfighting Cloud Capability (JWCC) contract to major cloud providers including Amazon, Google, Microsoft, and Oracle [4]
- Microsoft has been a key partner for the government, integrating OpenAI's GPT-4 model into various government agencies [4]
- Oracle is also involved in providing cloud services to the military, aiming to simplify cloud management and reduce costs [10]

Economic Implications of AI
- The economic benefits of AI are under scrutiny, with predictions suggesting that generative AI could contribute $7 trillion to global GDP over the next decade [7]
- However, some experts argue that the immediate economic impact of AI may be overstated, with many tasks requiring human intervention and expertise [8][9]

Shifts in Corporate Policies
- Major tech companies are shifting their policies regarding military applications of AI, with OpenAI and Google removing restrictions on the use of their technologies for military purposes [11][12]
- This shift indicates a deeper involvement of tech companies in military operations, reflecting the growing importance of AI in national security [12]
Federal Reserve: Total Recall? Evaluating Large Language Models' Macroeconomic Knowledge (English version)
Sou Hu Cai Jing· 2025-07-08 02:02
Core Insights
- The report evaluates the performance of large language models (LLMs) in recalling macroeconomic knowledge, particularly focusing on the Claude Sonnet 3.5 model's ability to estimate historical macroeconomic variables and data release dates [1][8][10]
- Findings indicate that while LLMs demonstrate impressive recall for certain economic indicators, they also exhibit significant shortcomings, particularly in handling volatile data series and in avoiding look-ahead bias [2][11][18]

Group 1: Performance Evaluation
- LLMs show strong recall for historical unemployment rates and Consumer Price Index (CPI) values, accurately recalling quarterly values back to World War II [11][44]
- However, the model struggles with more volatile data series such as real GDP growth and industrial production growth, often missing high-frequency fluctuations while capturing broader business cycle trends [11][45]
- The model's estimates for GDP are found to mix first print values with subsequent revisions, leading to inaccuracies in historical understanding and real-time forecasting simulations [12][14]

Group 2: Data Release Dates
- LLMs can recall historical data release dates with reasonable accuracy, but they occasionally misestimate these dates by a few days [16]
- The accuracy of recalling release dates is sensitive to prompt details, with adjustments to prompts reducing one type of error while increasing another [16]
- On average, about 20.2% of days show at least one series with recall issues, indicating limitations in the reliability of LLMs for historical analysis and real-time forecasting [2][16]

Group 3: Look-Ahead Bias
- Evidence suggests that LLMs may inadvertently incorporate future data values when estimating historical data, even when instructed to ignore future information [15][18]
- This look-ahead bias presents challenges for using LLMs in historical analysis and as real-time forecasters, as it reflects a tendency to blend past and future information [18][22]
- The report highlights that these errors are reminiscent of human forecasting mistakes, indicating a fundamental challenge in the LLMs' recall capabilities [18][22]
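One reported finding, that recalled GDP values blend first-print releases with later revisions, can be probed with a simple vintage comparison. The sketch below uses made-up numbers purely for illustration; it is not the report's code or data.

```python
# Illustrative vintage check (hypothetical numbers, not the report's data):
# does an LLM-recalled GDP-growth series track first prints or revised values?
import numpy as np

first_print  = np.array([3.2, 2.1, 1.9, 2.1])   # assumed initial releases (%)
revised      = np.array([2.9, 2.0, 2.6, 2.4])   # assumed later revisions (%)
llm_recalled = np.array([3.0, 2.0, 2.5, 2.3])   # assumed model answers (%)

mae_first = np.abs(llm_recalled - first_print).mean()
mae_rev   = np.abs(llm_recalled - revised).mean()
print(f"MAE vs first print: {mae_first:.2f} pp | MAE vs revised: {mae_rev:.2f} pp")
# If the recalled series sits between the two vintages, the model is likely
# blending initial releases with subsequent revisions rather than reproducing
# either vintage cleanly.
```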