Language Models

ICML 2025 | 1000x Length Generalization! Ant's New Attention Mechanism GCA Achieves Accurate Understanding of 16M Long Contexts
机器之心· 2025-06-13 15:45
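The first author of this work is Hu Xiang (胡翔), an associate researcher at the Ant Technology Research Institute; Wu Wei (武威), a senior researcher at the institute, is the corresponding author. With large language models in full swing, long-text modeling remains a highly challenging problem. The root causes are twofold. First, the Transformer architecture behind mainstream LLMs has quadratic complexity, and its inference-time GPU memory cost grows linearly with sequence length. Second, full attention has limited extrapolation ability and struggles to generalize to inputs far longer than those seen during pre-training. Beyond the industry's plain need to cut costs and raise efficiency, the ability to process long contexts efficiently also touches a core question of artificial general intelligence (AGI): agents with permanent memory. If the information a person has received since birth is treated as one long context, then human memory amounts to accessing that context. Memory can therefore be viewed as the ability to access an ultra-long context, and an agent that remembers every conversation with its user could well help large language model companies build a data moat (in fact, OpenAI has already opened up a similar capability). Recently, a research team at Ant offered a new approach to this problem. Just as a person taking an open-book exam picks out only the key pages relevant to the current question, a language model can attend only to the past segments relevant to the current context. Starting from this idea, they propose GCA (Grouped Cross Attention), an attention mechanism based on causal retrieval, which learns fully end-to-end how to ...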
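The open-book analogy above, attending only to the past segments relevant to the current context, can be illustrated with a small standard-library Python sketch. This is not the authors' GCA implementation, which the summary does not describe: the fixed chunk size, mean-pooled chunk summaries, dot-product scoring, and the names `select_past_chunks` and `retrieval_attention` are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    # Average a list of equal-length vectors component-wise.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def select_past_chunks(summaries, query, current_idx, top_k):
    # Causal retrieval (assumed scheme): only chunks strictly before
    # current_idx are candidates, ranked by dot-product score.
    ranked = sorted(range(current_idx),
                    key=lambda i: dot(summaries[i], query),
                    reverse=True)
    return ranked[:top_k]

def retrieval_attention(tokens, chunk_size, current_idx, top_k):
    # Split the token vectors into fixed-size chunks (a trailing partial chunk is dropped).
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    summaries = [mean_vector(c) for c in chunks]
    query = summaries[current_idx]  # hypothetical query: the current chunk's mean
    picked = select_past_chunks(summaries, query, current_idx, top_k)
    context = [tok for i in picked for tok in chunks[i]]
    if not context:
        return picked, [0.0] * len(query)
    # Scaled dot-product attention over the retrieved tokens only.
    scale = math.sqrt(len(query))
    logits = [dot(tok, query) / scale for tok in context]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    output = [sum(w * tok[d] for w, tok in zip(weights, context))
              for d in range(len(query))]
    return picked, output

# Four chunks of two 2-d tokens; the last chunk resembles the first.
tokens = [[1.0, 0.0], [1.0, 0.0],
          [0.0, 1.0], [0.0, 1.0],
          [-1.0, 0.0], [-1.0, 0.0],
          [0.9, 0.1], [1.0, 0.0]]
picked, output = retrieval_attention(tokens, chunk_size=2, current_idx=3, top_k=1)
print(picked, output)
```

In this toy input the last chunk retrieves chunk 0, the past chunk most similar to it, and attends only to that chunk's tokens, so cost depends on `top_k * chunk_size` rather than on the full history length.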
After a Year of Burning Cash, Has Fei-Fei Li's "Spatial Intelligence" Vision Changed?
机器之心· 2025-06-13 12:02
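01. One year after founding the company, how does Fei-Fei Li describe World Labs' vision? What progress has World Labs released in its first year? Has World Labs' vision changed? Is spatial intelligence finally about to be unlocked? ... 02. Why is AI incomplete without spatial intelligence? This article is drawn from the PRO member newsletter; follow 「机器之心PRO会员」 at the end of the article for more in-depth topic analyses. In a recent interview hosted by a16z general partner Erik Torenberg, Fei-Fei Li and early World Labs investor Martin Casado discussed "world models" and "spatial intelligence." Li shared her understanding of AI technology and, a year after launching the startup, reintroduced World Labs' mission and vision. Contents 2. Fei-Fei Li points out that current language models have clear limitations in describing and understanding the three-dimensional physical world, whereas spatial intelligence goes beyond language models as a key component of intelligence and is the core capability by which world models understand, reconstruct, and generate the physical world. ① Although language is a powerful encoding of thought and information, it is a "lossy encoding" of the 3D physical world and cannot effectively describe or manipulate three-dimensional space. Spatial intelligence represents an older and more fundamental form of intelligence and is a key component of AI. 3. Within this framework, World Labs is trying to build something that can understand ...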
Daily Institutional Analysis: June 13
Xin Hua Cai Jing· 2025-06-13 08:29
Group 1 - HSBC's foreign exchange strategy head indicates that geopolitical risks are putting pressure on the British pound, which is seen as a risk-sensitive currency, dropping to around 1.3530 against the US dollar [1] - Danske Bank analysts report that the recent 30-year US Treasury auction showed strong demand, alleviating concerns about long-term US Treasury demand and pushing yields below the critical 5% level [1] - The Swedish Nordea Bank anticipates that the Swedish central bank will lower interest rates in June, reflecting expectations among fixed-income investors [2] Group 2 - Analysts from Mizuho Securities highlight that the current geopolitical tensions have not been fully reflected in market volatility, with risks of full-scale conflict increasing [2] - HSBC Global Research predicts that the Philippine central bank will lower its policy rate to 5.25%, differing from previous expectations of maintaining rates, due to low inflation and slow economic growth [2] - Economists from Wilmington Trust suggest that long-term impacts of US tariffs are more likely to lead to economic weakness rather than inflation, with consumers beginning to cut back on non-essential spending [2] Group 3 - RSM's chief economist notes that rising prices in the US appliance market reflect cost increases from previous import tariffs, emphasizing the importance of consumer behavior in determining inflation persistence [3] - Goldman Sachs analysts report that the US data center securitization market has surged from $5 billion to $30 billion, driven by increased capital expenditure in cloud computing and policy support [3] - The data center market is expected to peak in occupancy rates by mid-2026, with growth primarily fueled by large investments in facilities equipped with thousands of GPUs for large language models [3]
The World's Largest Listed Hedge Fund Group Makes Its Move!
Zhong Guo Ji Jin Bao· 2025-06-13 07:00
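Man Group (英仕曼集团), the world's largest listed hedge fund group, recently announced that its wholly owned subsidiary 英仕曼(上海)投资管理有限公司 has launched its first self-managed equity index-enhanced product in the Chinese market, the Man Numeric CSI 500 Index Enhanced Strategy (英仕曼美量中证500指数增强策略). The product has been filed with the Asset Management Association of China (the "Association") and is offered to qualified investors. Since Man Group registered in mainland China as a private securities fund manager in 2017, its pace of development there has fluctuated. In a press release issued on June 12, Man Group said the launch marks a new stage in the group's strategic positioning in China's investment market. Launching a self-managed index-enhanced product in the Chinese market: Man Group further said the product applies to China's A-share market the systematic quantitative investment approach that the group's Numeric team has built through long-term live trading worldwide. The Numeric team reportedly has more than 30 years of quantitative investing experience. As of March 31, 2025, it managed more than US$40 billion in global equity strategies. Fang Zi'ang (方子昂), a senior portfolio manager at Man Numeric, said that with China's economy growing steadily, the A-share market, as the world's second-largest stock market, not only has significant allocation potential but also offers quantitative strategies rich sources of alpha. Yang Haixiang (杨海翔), a portfolio manager at Man Numeric and lead fund manager of the Man Numeric CSI 500 Index Enhanced Strategy, said that, building on quantitative models, the investment strategy integrates company fundamentals, industry alternative data ...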
OpenAI Flips the Table: New Model Outperforms Google, o3 Drops to a Rock-Bottom Price
3 6 Ke· 2025-06-13 06:07
Core Insights - OpenAI has launched o3-pro, an enhanced version of its reasoning model, following a 9-hour outage of ChatGPT, aiming to provide more reliable responses and extended thinking time [1][2][4]. Model Performance - o3-pro has been made available to all ChatGPT and API Pro users, with usage limits for Plus users increased from 100 to 200 times per week [2]. - In expert evaluations, o3-pro outperformed its predecessor o3 in all tested categories, particularly in science, education, programming, business, and writing assistance [2][6]. - The model supports both text and image inputs, with a context window size of 200k and a maximum output token count of 100k [11]. Competitive Landscape - OpenAI's performance is under scrutiny, especially with Google’s Gemini 2.5 Pro entering the market, which has been noted for its competitive pricing and capabilities [4][24]. - In internal tests, o3-pro surpassed Gemini 2.5 Pro in mathematical benchmarks and outperformed Anthropic's Claude 4 Opus in doctoral-level science tests [27]. Pricing Strategy - o3-pro is priced at $20 per million tokens for input and $80 for output, significantly lower than its predecessor o1-pro, which is expected to be phased out [24][27]. - Following the launch of o3-pro, OpenAI announced an 80% price reduction for o3, making it more competitive against Gemini 2.5 Pro [27]. User Experience - Users have reported that o3-pro is slower in response times compared to other models, taking several minutes for simple queries, which has raised concerns about its efficiency [15][17]. - Despite the slower response, o3-pro has demonstrated strong analytical capabilities and proficiency in using tools for complex problem-solving [19][22].
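To make the quoted o3-pro rates concrete, here is a minimal Python sketch of the per-request arithmetic, assuming the $20 and $80 figures above are per million input and output tokens respectively. The function name `estimate_cost_usd` and the example token counts are hypothetical and are not part of any OpenAI API.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million=20.0,
                      output_price_per_million=80.0):
    # Prices default to the o3-pro figures quoted in the summary above.
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Illustrative request: 200k input tokens and 100k output tokens.
large_request = estimate_cost_usd(200_000, 100_000)
print(f"${large_request:.2f}")
```

Under these assumptions the request costs $4 for input plus $8 for output, so output tokens dominate the bill even at half the volume.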
Toward an Epistemology of Artificial Intelligence: Does No One Really Understand How the Black Box of Large Language Models (LLMs) Works?
3 6 Ke· 2025-06-13 06:01
Group 1 - The core issue revolves around the opacity of large language models (LLMs) like GPT-4, which function as "black boxes," making their internal decision-making processes largely inaccessible even to their creators [1][4][7] - Recent research highlights the disconnect between the reasoning processes of LLMs and the explanations they provide, raising concerns about the reliability of their outputs [2][3][4] - The discussion includes the emergence of human-like reasoning strategies within LLMs, despite the lack of transparency in their operations [1][3][12] Group 2 - The article explores the debate on whether LLMs exhibit genuine emergent capabilities or if these are merely artifacts of measurement [2][4] - It emphasizes the importance of understanding the fidelity of chain-of-thought (CoT) reasoning, noting that the explanations provided by models may not accurately reflect their actual reasoning paths [2][5][12] - The role of the Transformer architecture in supporting reasoning and the unintended consequences of alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF), are discussed [2][5][12] Group 3 - Methodological innovations are being proposed to bridge the gap between how models arrive at answers and how they explain themselves, including circuit-level attribution and quantitative fidelity metrics [5][6][12] - The implications for safety and deployment in high-risk areas, such as healthcare and law, are examined, stressing the need for transparency in AI systems before their implementation [6][12][13] - The article concludes with a call for robust verification and monitoring standards to ensure the safe deployment of AI technologies [2][6][12]
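This Year's "First AGI Stock in Hong Kong" Is Confirmed! After Five Years of Pursuing an IPO, Yunzhisheng (云知声) Finally Passes Its Hong Kong Stock Exchange Hearing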
Sou Hu Cai Jing· 2025-06-13 00:36
Core Viewpoint - Yunzhisheng Intelligent Technology Co., Ltd. is set to become the first "AGI stock" in Hong Kong this year after passing the Hong Kong Stock Exchange hearing and disclosing relevant information [2][3] Company Overview - Founded in 2012, Yunzhisheng specializes in providing intelligent voice technology and comprehensive AI solutions, focusing on the smart voice sector [6] - The company has developed several key products, including the UniCore language model and the UniOne AI chip series, and recently launched a self-developed 600 billion parameter model [6] - Yunzhisheng's AI computing cluster has over 184 PFLOPS and more than 10 PB of storage capacity, supporting its technology development [6] Business Model and Market Position - The company primarily serves the life and medical sectors, with clients including China's top three insurance groups [7] - Yunzhisheng offers AI capabilities through a MaaS model, providing API services and customized AI technology platforms [7] - According to Frost & Sullivan, Yunzhisheng is the fourth largest AI solution provider in China by revenue in 2024, with a market share of 0.6% [9] Financial Performance - The company has completed 11 rounds of financing totaling over $340 million, with a valuation around 10 billion [9] - Revenue for 2022, 2023, and 2024 is projected at 601 million, 727 million, and 939 million respectively, with a compound annual growth rate of 25% [9] - Despite revenue growth, the company reported net losses of 375 million, 376 million, and 454 million for the same years [9] Funding and Future Outlook - The IPO proceeds will be used to enhance R&D capabilities, invest in emerging business opportunities, and support international expansion [13] - The company has raised over 700 million RMB in its D3 financing round in 2023, ensuring sufficient operational funds for at least the next 12 months [10] - Yunzhisheng anticipates continued net losses due to ongoing R&D investments and financing costs related to redeemable securities [10]
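The 25% compound annual growth rate quoted above can be checked against the stated revenue figures. The sketch below assumes the standard CAGR formula, (end / start)^(1 / years) - 1, with two compounding periods between 2022 and 2024; the function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    # Compound annual growth rate over the given number of yearly periods.
    return (end_value / start_value) ** (1 / years) - 1

# Revenue of 601 million in 2022 and 939 million in 2024, as quoted above.
revenue_cagr = cagr(601, 939, years=2)
print(f"{revenue_cagr:.1%}")
```

The result is consistent with the quoted figure to within rounding, which suggests the summary's growth rate was computed over the 2022 to 2024 span.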
Wanma Technology (万马科技) 20250612
2025-06-12 15:07
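Summary: Wanma Technology entered the connected-vehicle field by acquiring 有方科技. Its connected-vehicle revenue grew from RMB 50 million in 2021 to RMB 260 million in 2024, profit also rose markedly, and it has built a complete data closed-loop toolchain and an intelligent-driving computing center. Connected-vehicle penetration is about 80% in China and below 30% overseas. As intelligent driving demands more data, both domestic and overseas markets have considerable room to grow; Robotaxi in particular places higher demands on real-time data monitoring and technology, which significantly raises per-vehicle value. 优卡科技 offers two major solutions, 蓝海 global vehicle connectivity and a cloud-based autonomous-driving data closed loop, supporting 14 million vehicles. Its customers include Geely, SAIC, Dongfeng, and Li Auto, and it supports Robotaxi companies' business deployment worldwide. Robotaxi is seen as the "crown jewel" of the connected-vehicle industry, and Goldman Sachs forecasts annualized growth of 96% for China's Robotaxi market. Regular operations are already under way in Beijing, Wuhan, Guangzhou, Hong Kong, Dubai, and elsewhere, and Tesla is about to launch a related service. Robotaxi operations place extremely high demands on network quality, covering operational safety, user interaction, compliance, autonomous-driving data collection, and operations and maintenance, and they require high-definition maps, vehicle-road coordination, remote rescue, and massive data support. ... data-monitoring demand is high, with higher requirements for technology and data volume as well; in terms of per-vehicle value ...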
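Yunzhisheng Passes Hong Kong Stock Exchange Hearing: Set to Become the "First AGI Stock in Hong Kong," With Q1 Revenue Up 25% Year on Year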
IPO早知道· 2025-06-12 15:07
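Yunzhisheng is one of the first companies in Asia to commercialize AI large language models. This is an original article by IPO早知道. Author | Stone Jin. WeChat official account | ipozaozhidao. According to IPO早知道, Yunzhisheng Intelligent Technology Co., Ltd. (云知声智能科技股份有限公司, hereinafter "Yunzhisheng") recently passed its Hong Kong Stock Exchange hearing and on June 12 disclosed its post-hearing information pack, with CICC and Haitong International acting as joint sponsors. This means Yunzhisheng may soon become the "first AGI stock in Hong Kong." According to Frost & Sullivan, China's AI solutions market is expected to grow from RMB 180.4 billion in 2024 to RMB 1,174.9 billion in 2030, a compound annual growth rate of 36.7%; the emergence of AGI solutions has changed both the supply and demand sides of vertical industries, and frontier technology is driving market growth. As a pioneer of AGI technology in China, Yunzhisheng, founded in 2012, drew on its R&D expertise in interactive AI and the market insight it had gained since its founding to launch UniCore, its first BERT-based large language model, shortly after the breakthrough in AI natural language processing marked by the release of deep learning models. UniCore served as the initial core algorithm model of its central technology platform, 云知大脑, and powered a range of AI solutions for customers across a wide range of vertical industries. In 2016, Yunzhisheng strategically began building Atlas A ...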
Joe Tsai: After DeepSeek's Breakthrough, Alibaba Engineers Worked Through the Spring Festival to Catch Up With the AI Wave
华尔街见闻· 2025-06-12 10:42
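According to media reports, after the low-cost, powerful AI model that DeepSeek released in January this year stunned the global technology industry, Alibaba Group's engineers canceled their holidays and kept working through the Spring Festival, putting in overtime day and night to catch up. Alibaba chairman Joe Tsai (蔡崇信) said on Wednesday at the VivaTech technology conference in Paris that China's vibrant consumer internet environment, combined with a fiercely competitive culture among local engineers, keeps driving China's innovation in AI. He revealed that Alibaba realized it had fallen behind in AI only after DeepSeek released its R1 model, and he gave an example of how intense competition is in China's tech industry: "We looked at that paper and thought, 'Oh my, how did we fall behind? We had been working on these things too.' As a result, our head of engineering decided: 'Cancel the Spring Festival holiday. Everyone stays at the company, works overtime, and sleeps in the office. We need to speed up development.' Within a few weeks we launched our own version, the Qwen series of models. It is very competitive." As one of the founding members who co-founded Alibaba with Jack Ma, Tsai has recently begun speaking more candidly about the difficulties the company has gone through, while also expressing optimism about its future. At a technology conference in Macau last month, he mentioned that Alibaba had experienced a series of setbacks but stressed that the company is "on a very good path."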