机器之心
ICLR's Major Response: Review Rollback, AC Reassignment, Leaker Banned, Strict Investigation of Bribery and Collusion
机器之心· 2025-12-04 03:18
机器之心 report. Editor: Panda.

ICLR's official response has finally arrived. For the global AI research community, the past week has been nothing short of turbulent and dark. Since a major API vulnerability on the OpenReview platform came to light on November 27, the resulting data leak, which affected more than 10,000 ICLR 2026 submissions (45% of the total), has rapidly escalated into a grave crisis of academic integrity. See "The academic community erupts! ICLR reviews blown wide open, and it turns out the low score came from a friend."

From malicious exploitation of the vulnerability that exposed author and reviewer identities to each other, to the large-scale collusion, targeted harassment of reviewers, and even attempted bribery that followed, the entire review process had to be abruptly halted. Beyond its shock, the community has been anxiously awaiting the official verdict: how would this briefly out-of-control round of peer review be brought to a close?

Just a few hours ago, ICLR published a detailed investigation timeline and its final remediation plan. To sever the chain of malicious interference completely, the organizers made two major decisions: "roll back the review data" and "reassign Area Chairs (ACs) across the board," forcibly restoring the review state to a "clean" snapshot from before the discussion period began, so that subsequent decisions are no longer contaminated by the leaked information. Beyond this procedural "restart," the reckoning with the perpetrators has also begun: ICLR stated explicitly that the original leaker of the data has been banned from the platform, and any papers found to have used the leaked information for collusion ...
A Startup Valued at $750M Aims to "Pry Open" the $800B Semiconductor Market? Former Google AlphaChip Leads Found a Company to Develop "AI Chip Design Automation"
机器之心· 2025-12-04 03:18
Core Viewpoint
- Ricursive Intelligence aims to revolutionize chip design by using AI to autonomously create advanced chips, which could lead to a self-reinforcing cycle of AI and chip development, significantly impacting the AI and semiconductor industries [1][3].

Company Overview
- Ricursive Intelligence was founded by former Google researchers Anna Goldie and Azalia Mirhoseini, both of whom have extensive backgrounds in AI and chip design [5][6].
- The founders previously led the AlphaChip project at Google, which introduced a novel reinforcement learning method for chip layout design, enabling faster and more efficient chip creation [8][10].

Technological Innovation
- The core innovation of Ricursive Intelligence lies in applying recursive intelligence principles to complex chip design, aiming to automate the entire design process, which traditionally takes 2-3 years and costs hundreds of millions of dollars [11].
- The company plans to streamline chip design into three phases, allowing any tech company to design custom chips from scratch in a matter of weeks or even days [12].

Market Potential and Investment
- Ricursive Intelligence has attracted attention from over 50 venture capital firms and secured $35 million in funding from Sequoia Capital and Striker Venture Partners, achieving a valuation of $750 million before launching any products [12].
- The startup is positioned to disrupt the $800 billion chip industry by optimizing the most time-consuming aspects of chip design and enabling companies without dedicated design teams to create custom chips for various applications [13].
Pushing the Boundaries of Embodied Task Planning and Setting New SOTA on Multiple Embodied-Brain Benchmarks: ZTE's EmbodiedBrain Model Teaches the Embodied Brain "Complex Planning"
机器之心· 2025-12-03 08:30
Core Insights
- The article discusses the development of the EmbodiedBrain model by the ZTE NebulaBrain Team, which aims to address the limitations of current large language models (LLMs) in embodied tasks, focusing on robust spatial perception, efficient task planning, and adaptive execution in real-world environments [2][4].

Group 1: Model Architecture
- EmbodiedBrain utilizes a modular encoder-decoder architecture based on Qwen2.5-VL, achieving an integrated loop of perception, reasoning, and action [5].
- The model processes various multimodal inputs, including images, video sequences, and complex language instructions, generating structured outputs for direct control and interaction with embodied environments [8][10].
- Key components include a visual transformer for image processing, a lightweight MLP for visual-language integration, and a decoder that enhances temporal understanding of dynamic scenes [9][10].

Group 2: Data and Training
- The model features a structured data architecture designed for embodied intelligence, ensuring alignment between high-level task goals and low-level execution steps [12].
- Training data encompasses four core categories: general multimodal instruction data, spatial reasoning data, task planning data, and video understanding data, with a focus on quality through multi-stage filtering [14][15].
- The training process includes a two-stage rejection sampling method to enhance model perception and reasoning capabilities, followed by a multi-task reinforcement learning approach called Step-GRPO to improve long-sequence task handling [20][21].

Group 3: Evaluation System
- EmbodiedBrain establishes a comprehensive evaluation system covering general multimodal capabilities, spatial perception, and end-to-end simulation planning, addressing the limitations of traditional offline assessments [26][27].
- The model demonstrates superior performance in various benchmarks, including MM-IFEval and MMStar, indicating its enhanced multimodal capabilities compared to competitors [28][29].
- In spatial reasoning and task planning evaluations, EmbodiedBrain achieves significant improvements, showcasing its ability to perform complex tasks effectively [30][31].

Group 4: Case Studies and Future Outlook
- The model successfully executes tasks involving spatial reasoning and end-to-end execution, demonstrating its capability to generate coherent action sequences based on complex instructions [37][41].
- ZTE plans to open-source the EmbodiedBrain model and its training data, aiming to foster collaboration in the field of embodied intelligence and address existing challenges in data accessibility and evaluation standards [42][43].
- Future developments will focus on multi-agent collaboration and enhancing adaptability across various real-world robotic platforms, pushing the boundaries of embodied intelligence applications [43].
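The perception-reasoning-action loop described above can be sketched schematically. Everything in this sketch is hypothetical scaffolding: none of the class or function names come from ZTE's model, and the stand-in bodies only mark where the visual transformer, the projection MLP, and the decoder would sit.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    frames: List[bytes]   # image / video inputs
    instruction: str      # complex language instruction

def encode_vision(frames: List[bytes]) -> List[float]:
    """Stand-in for the visual transformer over image/video frames."""
    return [float(len(f)) for f in frames]

def project(vision_feats: List[float]) -> List[float]:
    """Stand-in for the lightweight MLP aligning vision with language."""
    return [x / 255.0 for x in vision_feats]

def plan(feats: List[float], instruction: str) -> List[str]:
    """Stand-in for the decoder emitting structured action steps."""
    return [f"step {i + 1}: act toward '{instruction}'" for i, _ in enumerate(feats)]

obs = Observation(frames=[b"frame0", b"frame1"], instruction="fetch the cup")
actions = plan(project(encode_vision(obs.frames)), obs.instruction)
print(actions)
```

The point of the layout, as the summary describes it, is that multimodal inputs flow through one integrated pipeline and come out as structured, directly executable steps rather than free-form text.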
Foreigners Stunned: Asked Plainly in English, DeepSeek Still Insists on Thinking in Chinese
机器之心· 2025-12-03 08:30
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which show significant improvements in reasoning capabilities, with the former being comparable to GPT-5 and the latter performing similarly to Gemini-3.0-Pro [1][4].
- There is a notable phenomenon where DeepSeek switches to Chinese during reasoning, even when queries are made in English, leading to discussions about the efficiency of Chinese in processing information [4][6].

Group 1: Model Performance
- The new models exhibit enhanced reasoning speed, attracting interest from overseas researchers [1].
- The comment section reflects a consensus that Chinese characters have a higher information density, requiring fewer characters to express the same meaning compared to English [4][6].

Group 2: Cross-Lingual Reasoning
- Research indicates that using non-English languages for reasoning can lead to better performance and reduced token consumption, as shown in the paper "EfficientXLang" [7][8].
- The study found that reasoning in non-English languages can achieve a token reduction of 20-40% without sacrificing accuracy, with DeepSeek R1 showing reductions from 14.1% (Russian) to 29.9% (Spanish) [11].

Group 3: Language Efficiency
- Although Chinese can save reasoning token costs compared to English, it is not the most efficient language; Polish ranks highest in long-context tasks [12][14].
- The performance of models varies significantly based on the language used for instructions, with English not being the top performer in long-context tasks [14][18].

Group 4: Training Data Influence
- The prevalence of Chinese training data in domestic models explains the tendency for these models to think in Chinese [20][21].
- The phenomenon of models like OpenAI's o1-pro occasionally using Chinese during reasoning raises questions about the influence of training data composition [24][25].
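As a quick illustration of how such token-reduction percentages are computed: the token counts below are made up for the example, and only the resulting percentages match the figures quoted above.

```python
def token_reduction(en_tokens: int, other_tokens: int) -> float:
    """Relative reduction in reasoning tokens versus an English baseline."""
    return (en_tokens - other_tokens) / en_tokens

# Hypothetical per-problem token counts, chosen only to reproduce
# the percentages reported for DeepSeek R1:
en, ru, es = 1000, 859, 701
print(f"Russian: {token_reduction(en, ru):.1%}")  # 14.1%
print(f"Spanish: {token_reduction(en, es):.1%}")  # 29.9%
```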
It Turns Out This Generation of Young Chinese AI Talent Is Competitive Enough to Astonish the Industry
机器之心· 2025-12-03 04:01
Core Viewpoint
- The article discusses a recent advertising algorithm competition organized by Tencent, highlighting the innovative approaches taken by participants to tackle the challenges of recommendation systems, particularly in addressing the "cold start" problem and utilizing generative methods for better user engagement [10][11][15].

Group 1: Competition Overview
- The competition lasted over five months, attracting more than 8,000 participants and 2,800 teams, making it a highly competitive technical marathon [22].
- The prize pool for the competition was set at 3.6 million yuan, with the champion team eligible for a 2 million yuan reward [11].
- Participants were provided with desensitized multimodal historical behavior data, which included text, visual, and collaborative behaviors, to make predictions [17][21].

Group 2: Technical Challenges and Innovations
- The competition focused on generative advertising recommendation, a new direction in the last few years, which requires participants to explore and innovate due to the lack of existing reference materials [21].
- Many teams attempted to integrate various modalities and address issues such as data noise and missing values, reflecting real-world complexities [21][28].
- Participants showcased innovative solutions, including different generative frameworks and methods for aligning multimodal embeddings, demonstrating a strong understanding of both academic and practical applications [31].

Group 3: Talent Development and Future Prospects
- Tencent's Vice President, Jiang Jie, noted a significant improvement in students' understanding of large models and their ability to produce solutions closely aligned with industry needs [29].
- Outstanding participants will be included in Tencent's "Qingyun Plan," which aims to nurture top talent by providing access to resources and mentorship [35].
- The competition highlighted the importance of collaborative learning and the potential for young talents to contribute significantly to the AI field, indicating a promising future for China's AI development [35].
Why Does Fitting a Robot with Expensive Tactile Sensors Actually Make It Dumber?
机器之心· 2025-12-03 04:01
This work is a collaboration between the University of Illinois Urbana-Champaign (UIUC), Harvard University, Columbia University, and the Massachusetts Institute of Technology (MIT).

Our solution: Compositional Policies

Why does feature concatenation fail in robot perception and decision-making? Imagine fishing for your keys in a pitch-black backpack. Your eyes are useless; you rely entirely on your fingertips. This is effortless for you, yet in robotics it is a remarkably hard problem.

The harsh truth: the multi-sensor fusion algorithm that dominates robot learning today, feature concatenation, fails completely at this kind of task. Our experimental data show that when you add tactile data to a robot in an attempt to make it smarter, its grasp success rate plummets from 35% to 5%! Why? Because traditional methods filter out the rare but critical tactile signals as "noise."

Limitations of current approaches: today's multimodal robot learning methods typically use feature concatenation: extract embeddings from every sensor, concatenate them into one large vector, and feed it into a single neural network policy.

Paper title: Multi-Modal Manipulatio ...
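The contrast described above can be sketched in a few lines. This is a toy illustration, not the paper's method or code: the embedding sizes, the contact-gating rule in `touch_policy`, and the equal-weight combination are all assumptions invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-sensor embeddings: vision is always informative,
# touch is sparse -- a strong signal only at the moment of contact.
vision_emb = rng.normal(size=32)   # dense visual features
touch_emb = np.zeros(16)           # mostly silent...
touch_emb[3] = 5.0                 # ...except one contact spike

# Feature concatenation (the failing baseline described above):
# everything is fused into one vector for a single policy network,
# where a mostly-zero touch channel is easily learned away as noise.
fused = np.concatenate([vision_emb, touch_emb])  # shape (48,)

# Compositional policies (the paper's proposed direction, sketched):
# each modality gets its own policy head, and their action proposals
# are combined afterwards, so a sparse modality cannot be drowned out.
def vision_policy(v):
    return np.tanh(v[:3])  # toy 3-DoF action from vision

def touch_policy(t):
    # toy gate: contribute only when a real contact signal is present
    return np.tanh(t[:3]) * (np.abs(t).max() > 1.0)

action = 0.5 * vision_policy(vision_emb) + 0.5 * touch_policy(touch_emb)
print(fused.shape, action.shape)
```

The structural difference is the whole point: in the concatenation baseline there is one network that must learn when touch matters, while the compositional version keeps the tactile pathway separate so its rare spikes reach the action directly.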
Borrowing the Human Brain's "Hippocampus-Cortex" Mechanism, Red Bear AI Rebuilds the "Memory System"
机器之心· 2025-12-03 04:01
Core Insights
- The article emphasizes that memory is becoming a critical breakthrough in the evolution of AI, transitioning from "instant answer tools" to "personalized super assistants" [1][4].
- A new machine learning paradigm called "Nested Learning" has been proposed, allowing large language models to learn new skills without forgetting old ones, marking significant progress towards AI that mimics human memory [3][4].

Group 1: Shifts in AI Landscape
- The focus of large models is shifting from size and speed to memory capabilities and understanding user needs, indicating a new competitive landscape in AI [4][5].
- Current large models struggle with long-term memory due to inherent limitations in their architecture, leading to issues like forgetting critical user information during interactions [6][7].

Group 2: Memory Mechanisms
- Existing models typically have context windows of 8k-32k tokens, which can lead to early information being "pushed out" during long conversations, causing loss of context [6].
- The lack of a shared memory mechanism among multiple agents results in "memory islands," where users must repeatedly provide information, diminishing the user experience [7].

Group 3: Innovations in Memory
- Companies like Google, OpenAI, and Anthropic are focusing on enhancing memory capabilities in AI models, responding to industry demands for long-term, stable, and evolving memory systems [7][10].
- Red Bear AI has developed "Memory Bear," a product that addresses the memory limitations of traditional models by implementing a human-like memory architecture [10][11].

Group 4: Memory Bear's Architecture
- "Memory Bear" utilizes a hierarchical, dynamic memory structure inspired by the human brain's hippocampus and cortex, allowing for efficient memory management [11][13].
- The system distinguishes between explicit memory (easily codified information) and implicit memory (subjective understanding), enhancing its ability to recall and utilize user-specific data [15][16].

Group 5: Practical Applications and Impact
- "Memory Bear" has shown significant improvements in various applications, such as AI customer service, where it creates dynamic memory maps for users, enhancing interaction quality and reducing the need for repetitive information sharing [20][21].
- In marketing, "Memory Bear" tracks user behavior to create personalized marketing strategies, moving beyond traditional recommendation systems [22].
- The technology has also improved knowledge acquisition efficiency in organizations and personalized education experiences, demonstrating its versatility across sectors [23][24].

Group 6: Industry Consensus and Future Directions
- The consensus in the industry is that memory capabilities are essential for advancing AI technology and applications, with increasing investments and explorations into human-like memory systems [24].
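The hippocampus-cortex analogy above can be illustrated with a toy two-tier store. This is a minimal sketch under assumed semantics, not Red Bear AI's "Memory Bear" implementation: the consolidation rule, the fixed short-term capacity, and the recall order are all invented for the example.

```python
from collections import deque

class TwoTierMemory:
    """Toy hippocampus/cortex split: a small volatile buffer plus a
    consolidated long-term store that survives buffer eviction."""

    def __init__(self, short_term_size: int = 3):
        self.short_term = deque(maxlen=short_term_size)  # "hippocampus"
        self.long_term = {}                              # "cortex"

    def observe(self, key: str, value: str, important: bool = False):
        self.short_term.append((key, value))
        if important:  # consolidation: promote to long-term store
            self.long_term[key] = value

    def recall(self, key: str):
        for k, v in reversed(self.short_term):  # prefer recent context
            if k == key:
                return v
        return self.long_term.get(key)  # fall back to consolidated memory

mem = TwoTierMemory()
mem.observe("allergy", "peanuts", important=True)
mem.observe("topic", "flights")
mem.observe("topic", "hotels")
mem.observe("topic", "rental cars")  # "allergy" has now left the buffer
print(mem.recall("allergy"))         # still recalled from long-term
```

The behavior this models is the one the summary attributes to the product: a critical user fact keeps being retrievable even after the conversation has scrolled it out of the short-term window.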
Just Now: "Europe's DeepSeek" Releases the Mistral 3 Model Series, Returning Fully to Apache 2.0
机器之心· 2025-12-03 00:06
Core Viewpoint
- Mistral AI has launched the Mistral 3 series of open models, which are positioned as high-performance, cost-effective alternatives in the AI model landscape, particularly in response to competition from DeepSeek [2][4][28].

Model Details
- The Mistral 3 series includes multiple models: Mistral 3 (14B, 8B, 3B) with base, instruction-tuned, and reasoning versions [5][19].
- Mistral Large 3, a state-of-the-art open model, features a total parameter count of 675 billion and 41 billion active parameters, trained on 3000 NVIDIA H200 GPUs [7][5].

Performance and Benchmarking
- Mistral Large 3 ranks second in the OSS non-inference model category on the LMArena leaderboard, indicating it is one of the best-performing open models available [14].
- The model demonstrates strong performance in general prompt tasks and excels in image understanding and multilingual dialogue [7][14].

Collaboration and Optimization
- Mistral has partnered with vLLM and Red Hat to enhance accessibility and efficiency for developers using Mistral Large 3, utilizing optimized checkpoints for better performance [17][18].
- The collaboration with NVIDIA focuses on advanced optimization techniques, ensuring that Mistral models leverage high-bandwidth memory for demanding workloads [17][18].

Cost-Effectiveness
- Mistral claims that its models offer the best cost-performance ratio among open-source models, with instruction models performing comparably or better than competitors while generating tokens at a significantly lower rate [22][28].

Availability and Customization
- Mistral 3 models are available on various platforms including Mistral AI Studio, Amazon Bedrock, and Azure Foundry, among others [25].
- The company also offers custom model training services to organizations seeking tailored AI solutions for specific tasks or environments [27].
Sentence-Level Provenance + Generative Attribution: C²-Cite Reshapes Large-Model Trustworthiness
机器之心· 2025-12-03 00:06
In today's era of rapid AI development, large language models have worked their way into every corner of our work and lives. Yet how to make AI-generated content more trustworthy and traceable has long been a central concern for both academia and industry. Imagine asking ChatGPT a question and getting not just an answer but, as in an academic paper, a source annotation for every sentence: this is the core problem that "attributed large language models" set out to solve.

C²-Cite, an attributed large model proposed by the BUPT Baijia AI team together with Xiaomi's large-model team, pioneers context-aware attribution generation. It lets the model automatically annotate precise information sources as it generates content, while ensuring the generated text remains closely semantically aligned with the cited external knowledge, so that every statement has an attribution basis and works in concert with its reference sources, fundamentally addressing the trustworthiness of model-generated content. The work has been accepted at WSDM 2026, a top international conference.

Targeting key flaws in existing attribution models, C²-Cite introduces a "context-aware" mechanism that turns citation markers from passive placeholders into special tokens carrying contextual semantics, significantly improving citation quality and answer accuracy.

Paper title: C²-Cite: Contextual-Aware Citation Generation for Attributed Large Languag ...
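The idea of a citation attached to every statement can be illustrated minimally. The sentences, source ids, and `render_with_citations` helper below are hypothetical and do not reflect C²-Cite's actual token format or training scheme; they only show the target output shape of sentence-level attribution.

```python
# Hypothetical sources and generated sentences, each tagged with the
# id of the source it is attributed to.
sources = {
    1: "retrieved document A",
    2: "retrieved document B",
}

generated = [
    ("The model aligns each claim with external evidence.", 1),
    ("Citation markers are emitted during decoding, not added afterwards.", 1),
    ("Every statement therefore carries its own provenance.", 2),
]

def render_with_citations(sentences):
    """Append a bracketed source id to each sentence, academic-paper style."""
    return " ".join(f"{text} [{src}]" for text, src in sentences)

answer = render_with_citations(generated)
print(answer)
```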
Now Altman Is Anxious: OpenAI Urgently Declares "Code Red"
机器之心· 2025-12-02 09:18
Core Viewpoint
- OpenAI has declared a "Code Red" status to address competitive pressures, particularly from Google, as it seeks to enhance ChatGPT and maintain its market position [1][6][9].

Group 1: Competitive Landscape
- Google has rapidly regained its footing with the Gemini chatbot, increasing its monthly active users from 450 million in July to 650 million in October, posing a significant threat to OpenAI [9].
- Other competitors like Anthropic and xAI are also advancing in various technological directions, intensifying the competitive environment [4][10].

Group 2: OpenAI's Current Challenges
- OpenAI's growth rate for ChatGPT has shown signs of slowing, as indicated by CFO Sarah Friar, which raises concerns about sustaining high valuations amid significant cash burn [8].
- The company is seeking approximately $100 billion in new financing to support its extensive cash consumption and ongoing technological development [8].

Group 3: Strategic Priorities
- OpenAI is shifting its resource allocation towards core projects, delaying the development of non-essential products, including advertising initiatives [5][6].
- A new reasoning model is set to be released, which is claimed to outperform Google's Gemini 3, aimed at enhancing ChatGPT's capabilities [12].

Group 4: User Engagement and Product Development
- OpenAI plans to allow highly customizable interactions for its 800 million weekly active users, striving to position ChatGPT as a true "personal assistant" [13].
- The company aims to optimize model behavior to reduce instances of the AI refusing to answer benign questions and improve its performance in public rankings [13].