Artificial Intelligence

ICML 2025 outstanding papers announced: 8 papers honored, Nanjing University researchers among the winners
Ji Qi Zhi Xin· 2025-07-15 05:37
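Ji Qi Zhi Xin editorial team. This Monday, ICML 2025 announced its best paper awards. A total of 8 papers were honored this year: 6 Outstanding Paper Awards and 2 Outstanding Position Paper Awards. Notably, researchers from Nanjing University are among the winners. The International Conference on Machine Learning (ICML), organized by the International Machine Learning Society (IMLS), is one of the world's top academic conferences in artificial intelligence and, alongside NeurIPS and ICLR, is regarded as one of the three premier AI conferences. This year's ICML, the 42nd, is being held July 13-19 in Vancouver, Canada. ICML 2025 received 12,107 valid submissions and accepted 3,260 of them, an acceptance rate of 26.9%. That continues the sharp rise from 9,653 submissions in 2024 and reflects how hot the AI field is. The award-winning papers and brief introductions follow. Outstanding Paper Awards. Paper 1: Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions ...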
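The 26.9% acceptance rate, and the growth in submissions over 2024, follow directly from the reported counts. A minimal Python sketch of the arithmetic (the variable names are illustrative; the growth percentage is derived here, not quoted by the source):

```python
# Acceptance rate and year-over-year submission growth, from the reported counts.
submissions_2025 = 12107  # valid submissions to ICML 2025
accepted_2025 = 3260      # papers accepted at ICML 2025
submissions_2024 = 9653   # submissions to ICML 2024

acceptance_rate = accepted_2025 / submissions_2025
growth = (submissions_2025 - submissions_2024) / submissions_2024

print(f"Acceptance rate: {acceptance_rate:.1%}")     # -> Acceptance rate: 26.9%
print(f"Submission growth vs. 2024: {growth:.1%}")  # -> Submission growth vs. 2024: 25.4%
```

In other words, submissions grew by roughly a quarter year over year, while about one in four submissions was accepted.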
Scoring points by doing nothing? AI agent benchmarks have a serious problem
Ji Qi Zhi Xin· 2025-07-15 05:37
Core Viewpoint - Existing benchmarks for evaluating AI agents are fundamentally flawed, leading to significant misjudgments of agent capabilities and calling for more rigorous testing standards [5][7][23]. Group 1: Importance of Benchmark Testing - Benchmark testing plays a foundational role in assessing the strengths and limitations of AI systems, guiding both research and industry development [2]. - As AI agents move from research prototypes to real-world applications, effective evaluation benchmarks become critical [3]. Group 2: Current Issues with AI Benchmarks - Current AI agent benchmarks are not yet reliable: many allow misleadingly high scores without the corresponding capability [5][6]. - A study by researchers from several leading universities identified common failure modes in existing benchmarks and highlighted the need for a checklist to minimize the potential for "gaming" the tests [7][23]. Group 3: Challenges in Benchmark Design - AI agent tasks often involve real-world scenarios without standard answers, which makes benchmark design and evaluation more complex than for traditional AI tests [4][11]. - Two key validity criteria are proposed: task validity (whether a task can be solved only with the target capability) and outcome validity (whether the evaluation accurately reflects whether the task was completed) [12][15]. Group 4: Findings from the ABC Checklist - The Agentic Benchmark Checklist (ABC), derived from 17 widely used AI benchmarks, contains 43 items covering outcome validity and task validity [17][18]. - Applying the ABC showed that 7 of 10 benchmarks contained tasks that AI agents could exploit, and 7 of 10 did not meet outcome-validity standards [23]. Group 5: Specific Benchmark Failures - SWE-bench failed to detect errors in AI-generated code because of insufficient unit test coverage [24][27]. - KernelBench's reliance on random tensor values may miss critical errors in generated code, while τ-bench allowed a "no-operation" agent to achieve a 38% success rate [28][31]. - OSWorld's outdated evaluation relied on obsolete website elements, leading to a 28% underestimation of agent performance [32][33]. Group 6: Future Directions - The ABC aims to provide a practical evaluation framework that helps benchmark developers identify potential issues and make their assessments more rigorous [36].
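One sanity check in the spirit of the ABC is to run an agent that does nothing and count how many tasks it "passes", the failure mode reported for τ-bench. The Python sketch below is a toy under stated assumptions: the `Task` type, the state-comparison grader, and the four sample tasks are all hypothetical and do not reproduce τ-bench's actual harness or its 38% figure. Any task whose expected end state equals its starting state (here, tasks where the right behavior is to decline a request) is passed by a do-nothing agent.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy benchmark: each task starts from an initial state and is
# graded by comparing the agent's final state with an expected state.
@dataclass
class Task:
    name: str
    initial_state: dict
    expected_state: dict

Agent = Callable[[Task], dict]

def noop_agent(task: Task) -> dict:
    """An agent that takes no action: it returns the initial state unchanged."""
    return dict(task.initial_state)

def evaluate(agent: Agent, tasks: list[Task]) -> float:
    """Fraction of tasks whose final state matches the expected state."""
    passed = sum(agent(t) == t.expected_state for t in tasks)
    return passed / len(tasks)

def trivially_passable(tasks: list[Task]) -> list[str]:
    """Names of tasks a do-nothing agent passes (expected state == start state)."""
    return [t.name for t in tasks if noop_agent(t) == t.expected_state]

tasks = [
    Task("cancel_order", {"order_42": "active"}, {"order_42": "cancelled"}),
    Task("refuse_refund", {"order_7": "delivered"}, {"order_7": "delivered"}),
    Task("change_address", {"addr": "old"}, {"addr": "new"}),
    Task("deny_upgrade", {"seat": "economy"}, {"seat": "economy"}),
]

print(f"No-op agent success rate: {evaluate(noop_agent, tasks):.0%}")
# -> No-op agent success rate: 50%
print("Trivially passable tasks:", trivially_passable(tasks))
# -> Trivially passable tasks: ['refuse_refund', 'deny_upgrade']
```

A benchmark author could run such a baseline before release: any nonzero score flags tasks whose grader cannot distinguish "did the right thing" from "did nothing", and those tasks need an additional check (for example, on the agent's response) before the score is meaningful.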
“Xinzhi AI Accelerator Camp” launches, connecting 20 tech startups with central state-owned enterprises
Xin Hua Cai Jing· 2025-07-15 04:48
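The “Xinzhi AI Accelerator Camp” (芯智AI加速营), jointly created by Pudong Software Park Venture Capital (浦软创投) and CEC Zhifangzhou (中电智方舟), recently opened in Shanghai. As a key part of the “Pusoft Accelerator Camp,” the industry-linking flagship event of the Shanghai Pudong Software Park incubator, this camp brought together 20 tech startups from integrated circuits, artificial intelligence, data applications, and other fields for targeted matchmaking talks with subsidiaries of China Electronics (CEC), pairing large enterprises with small ones and fostering collaboration along the upstream and downstream industry chain. The camp runs jointly in Shanghai and Shenzhen, with the 20 startups making in-depth visits to CEC member companies in both cities. During the Shanghai session, CEC industry-chain companies including Huada Semiconductor, Empyrean Technology (华大九天), Dameng Data (达梦数据), Chengdu Sino Microelectronics (成都华微), China Electronics Cloud, and Shanghai Pusoft Huizhi Software discussed cooperation with the participating startups on technology R&D, market expansion, project partnerships, and investment opportunities. To help startups accelerate industrialization, the camp provides a platform for in-depth matchmaking between innovative companies, industry leaders, and capital institutions, helping the companies break through and grow faster in technology, markets, and funding. It also promotes exchange and cooperation among upstream and downstream companies, supporting the coordinated development of the industrial ecosystem. Guo Bin, general manager of Shanghai Pudong Software Park Venture Capital Management Co., Ltd., told reporters that the camp's core goal is to promote business synergy and technological innovation between participating companies and CEC member companies. He hopes the platform will precisely identify and empower a group of companies with core technological advantages ...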
20cm Express | ChiNext AI ETF (159388) rises more than 4.8%; the sector holds structural opportunities
Mei Ri Jing Ji Xin Wen· 2025-07-15 04:32
Group 1 - The core view is that demand for AI computing power will stay strong through 2025, with AI applications on an accelerating upward trend. Quantum computing, data elements, and EDA are growing steadily, and the integration of quantum computing with AI could bring revolutionary advances [1] - The LiDAR industry, dubbed the "eye of intelligent driving" and the "eye of robots," is booming thanks to rising penetration in L3 vehicles and embodied robots entering routine operation, reflecting thematic investment driven by risk appetite [1] - The Guotai AI ETF tracks the ChiNext AI Index, which can move up to 20% in a single day. The index is compiled by Shenzhen Securities Information Co. and selects ChiNext-listed companies engaged in developing and applying AI technology, covering AI software, hardware, and related services [1]
“The US has essentially bowed out; it's all Chinese now”
Guan Cha Zhe Wang· 2025-07-15 04:08
Core Viewpoint - Meta is considering a significant shift in its AI strategy, potentially moving from open-source to closed-source AI models, which would mark a departure from its long-standing commitment to open-source development [1][5][6] Group 1: Strategic Shift - Meta's newly established Superintelligence Lab (MSL) is considering abandoning its powerful open-source AI model, Behemoth, in favor of developing a closed-source model [1][5] - This would be a major strategic change for Meta, which has long held that open-source technology accelerates AI development and broadens access for developers [5][6] - The deliberations were reportedly prompted by Behemoth's underperformance in internal testing, which delayed its release [5][6] Group 2: Leadership and Talent Acquisition - Meta has appointed Alexandr Wang, formerly head of Scale AI, as its new AI chief to oversee the Superintelligence Lab, a specialized team of about 12 members [6][7] - The company has pursued talent with lavish pay, offering compensation exceeding $100 million to attract top researchers from competitors such as OpenAI, Google, and Apple [5][6] Group 3: Market Implications - A shift toward closed-source models could signal a retreat from the competitive open-source large language model (LLM) landscape, raising concerns that the U.S. is losing its edge in this area [1][3] - Developments in Meta's AI strategy are being closely watched as the company faces challenges in the AI sector [5][6]
We spoke with three university professors about the worsening problem of AI hallucination
3 6 Ke· 2025-07-15 03:23
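Recently, a farce caused by AI hallucination played out online. On July 2, a flood of posts appeared claiming that "DeepSeek apologizes to Wang Yibo over improper associations made by its AI model." It eventually emerged that DeepSeek had invented the event in a conversation, even citing a court judgment that cannot be found anywhere on China Judgments Online. The farce stemmed from a hallucination DeepSeek produced while chatting with a user. In light of this, the Zhiwei (知危) editorial team felt it necessary to examine the rising hallucination rates of large AI models. Not long before, OpenAI's o3 model had also drawn wide attention shortly after its release because its hallucination rate rose rather than fell. o3 makes many baffling mistakes: it fabricates code it never ran, uses invalid non-ASCII dashes in coding settings, and even pretends to call tools. On the PersonQA benchmark, o3 hallucinated in 33% of answers, roughly twice the rate of o1 (16%), while o4-mini's hallucination rate reached 48%, far higher than earlier reasoning models. Other recently released deep-thinking models show a similar pattern: as reasoning ability improves, hallucination rates rise rather than fall. Nathan Lambert, a scientist at the Allen Institute for AI, wrote about o3's reasoning hallucinations, saying the problem arises ...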
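One of the o3 failure modes mentioned, invalid non-ASCII dashes in code, is easy to catch mechanically. In Python, for instance, an en dash (U+2013) used in place of a minus sign outside a string literal is a SyntaxError. The sketch below is a hypothetical helper, not part of any tool cited here, that flags dash-like Unicode characters in generated code; the character set it checks is an assumption and is not exhaustive.

```python
import unicodedata

# Dash-like code points that resemble ASCII "-" but are not valid operators
# in most programming languages (an assumed, non-exhaustive set).
DASH_LIKE = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015", "\u2212"}

def find_non_ascii_dashes(code: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each dash-like character found."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in DASH_LIKE:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

# Generated code where the minus sign on line 2 is actually an en dash.
snippet = "x = 10\ny = x \u2013 3\n"
for lineno, col, name in find_non_ascii_dashes(snippet):
    print(f"line {lineno}, col {col}: {name}")
# -> line 2, col 7: EN DASH
```

A check like this catches only one narrow symptom; it says nothing about fabricated tool calls or code the model claims to have run.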
Insider report: Alexandr Wang's first move in office, taking Meta's large models closed-source
Ji Qi Zhi Xin· 2025-07-15 03:20
Core Viewpoint - Meta is considering a significant shift in its AI development strategy, potentially moving from an open-source model to a closed-source approach, which would represent a major philosophical and technical change for the company [1][7]. Group 1: AI Development Strategy - Meta's newly established Superintelligence Lab is discussing a major decision that could alter the direction of its AI development [2]. - Opinions within Meta differ on the future of its AI models: some executives advocate closed-source models, while others believe an open-source strategy remains advantageous in the competitive landscape [3]. - The discussion centers on Meta's most powerful open-source AI model, Behemoth, whose release has been delayed by performance issues [4][5]. Group 2: Organizational Changes - Meta has made significant organizational changes, including a $14.3 billion investment in Scale AI for a 49% stake and the appointment of Scale AI's CEO, Alexandr Wang, as Meta's Chief AI Officer [8]. - The entire AI department has been rebranded as the Meta Superintelligence Lab, led by Alexandr Wang and a core team of newly hired researchers [9]. Group 3: Future Directions and Concerns - A Meta spokesperson stated that the company's stance on open-source AI remains unchanged: it plans to keep releasing leading open-source models while training a mix of open-source and closed-source models [13]. - The discussions within the Superintelligence Lab are still preliminary, and any major change will require CEO Mark Zuckerberg's approval [13]. - Uncertainty about a possible shift to closed-source models worries startups that rely on open-source models and the academic community, which depends heavily on open-source resources [16][20].
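Express | Meta's open-source faith wavers: executives reportedly discuss closing the Behemoth model in private, with Alexandr Wang pushing for a closed approach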
Z Potentials· 2025-07-15 03:14
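A Meta spokesperson said in a statement: "Our position on open-source AI has not changed, and we plan to continue releasing leading open-source models. Historically we have not open-sourced everything we develop, and we expect to train both open and closed models going forward." According to people familiar with the matter, Meta also paused development of the largest version of its flagship large language model, Llama 4, in recent weeks, the latest setback for the model. Sources say Meta has been discussing developing closed AI models in recent weeks, which would mark a shift from its current strategy of focusing on open-source, or free, models. These people said that some executives, including Meta Chief AI Officer Alexandr Wang, have suggested the company should not open-source its most advanced models. Another person said other executives believe open source still offers advantages because Meta is trying to catch up with competitors. The discussions come as Meta overhauls its AI business following setbacks earlier this year. Last month, the social media giant finalized a $14.3 billion investment in Scale AI and hired Wang, the data-labeling company's CEO. Meta also hired former GitHub CEO Nat Friedman and former Safe Superintelligence CEO Danie ...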
Express | After Google spent $2.4 billion to take Windsurf's core technology, AI upstart Cognition takes over the remaining assets
Z Potentials· 2025-07-15 03:14
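Cognition markets its generative AI coding tool Devin as "the world's first AI software engineer." Cognition's acquisition covers Windsurf's intellectual property, remaining employees, cash, assets, and brand. Cognition co-founder and CEO Scott Wu wrote in a memo to staff that Windsurf has $82 million in annual recurring revenue and that its enterprise business has been growing rapidly. All Windsurf employees will share in the proceeds of the deal, and their equity vesting schedules will be accelerated, Wu wrote. "Jeff and I worked together to make sure every employee is respected and well taken care of in this deal," Wu said in the memo. Cognition last raised money in the spring, bringing in several hundred million dollars at a $4 billion valuation in a round led by 8VC, the venture firm backed by Joe Lonsdale. According to PitchBook data, the company has raised more than $300 million to date from investors including Founders Fund, Khosla Ventures, and Conviction Partners. AI coding startup Cognition AI has agreed to acquire Windsurf ...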
AI's “War of the Gods”: to counter “Stargate,” Zuckerberg plans to build “Prometheus”
Hua Er Jie Jian Wen· 2025-07-15 02:53
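Meta is launching an unprecedented strategic transformation to reverse its lag in the foundation-model race. On July 15, Wallstreetcn reported that Meta CEO Mark Zuckerberg said on Monday the company will invest hundreds of billions of dollars to build several large data centers, the first of which, Prometheus, is expected to come online next year. According to reports, Meta is following xAI in adopting a more flexible, faster-to-build "tent-style" data center design, and is quietly building two gigawatt-scale (GW) supercomputing clusters in Ohio and Louisiana, code-named Prometheus and Hyperion respectively. Driven personally by founder Zuckerberg, the advertising giant, whose annual cash flow reaches as much as $100 billion, is pouring money into computing infrastructure and top talent regardless of cost, aiming to catch up with and surpass competitors such as OpenAI, with "superintelligence" as its core goal. Compute is king: from "tents" to gigawatt-scale clusters. To obtain massive computing power quickly, Meta has shelved the data center blueprint it followed for the past decade. According to reports, Zuckerberg decided to overhaul the strategy once again and embrace a new design that puts construction speed first. This xAI-inspired "tent-style" structure uses prefabricated power and cooling modules and ultra-light construction, sacrificing some redundancy (such as backup diesel generators) to bring GPU clusters online as quickly as possible. To achieve this goal, Me ...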