机器之心
Absurd: 256GB of Memory Now Costs More Than an RTX 5090. Will You Pay the Price for AI?
机器之心· 2025-12-26 03:06
Core Viewpoint - The article highlights the significant price increases in computer components, particularly memory, driven by demand from AI applications, which has created a structural shortage in the market [5][6].

Group 1: Memory Price Surge
- The high-end RTX 5090 GPU carries an official starting price of $1,999 and can exceed $3,000 on the open market, while a single 256GB DDR5 memory module is now priced between $3,500 and $5,000 [3].
- The current surge in memory prices is attributed to AI's demand for computing power, which has produced a structural shortage in the memory market [5].
- OpenAI has secured a deal with Samsung and SK Hynix for up to 900,000 DRAM wafers per month, about 40% of global monthly DRAM production, sharply reducing the capacity available to consumer markets [5].

Group 2: Impact on Technology Companies
- Major tech companies such as Microsoft and Google are struggling to secure memory supplies, with reports of procurement executives being dismissed for failing to lock in long-term supply agreements [8].
- Microsoft executives faced difficult negotiations with SK Hynix over supply terms, with tensions running high during discussions [8].
- Google has been unable to secure additional capacity for its TPU needs, creating significant supply chain risks and triggering personnel changes on its procurement team [8].

Group 3: Broader Market Implications
- Demand for larger memory capacities is rising as the "AI PC" concept takes hold, with 32GB or 64GB becoming the new baseline for running large models [6].
- The price increases are not limited to memory: hard drive prices have also surged, and the GPU market is seeing extreme inflation, with second-hand RTX 4090 cards selling for around 20,000 yuan [6].
- The memory price hikes are hitting not only consumers but also tech companies, with reports of layoffs tied to supply chain failures [6][9].

Group 4: Innovations in Memory Technology
- Groq, an AI chip startup, has developed a chip design that integrates SRAM directly, achieving a memory bandwidth of 80TB/s, more than 20 times that of traditional HBM solutions [11].
- NVIDIA's deal to license Groq's technology and hire its core team may be a strategic move to mitigate rising DRAM prices and explore alternative memory technology paths [12].
- Opinions differ on the feasibility of using SRAM as main memory, given its high cost and the challenges of integrating it with existing chip designs [14].
Fully Heterogeneous, Fully Asynchronous RLinf v0.2 Preview Released: Real-Robot Reinforcement Learning Support, Use Your Robot Like a GPU!
机器之心· 2025-12-26 03:06
Core Insights - The article discusses the ongoing data debate in embodied intelligence, particularly between simulation data and real-robot data, and emphasizes the need for infrastructure that supports multiple technological routes [2]

Summary by Sections

RLinf v0.2 Features
- RLinf v0.2 is designed for users pursuing the real-robot route and supports reinforcement learning directly on physical robots [2][4]
- Users can treat robots as flexible resources, much like GPUs, integrating and configuring them through a single YAML file, which significantly lowers the cost of adoption [5][6]
- The system aims to enable large-scale distributed real-robot reinforcement learning, addressing challenges in stability, usability, and flexibility [9]

Heterogeneous Hardware Support
- RLinf supports flexible configurations of heterogeneous software and hardware clusters, improving system throughput and training efficiency [11][12]
- It allows mixed hardware setups, such as running high-fidelity simulators on RTX 4090 GPUs, training on large-memory GPUs like the A800, and running robot controllers on CPU machines [13][14]

Asynchronous Off-Policy Algorithms
- RLinf v0.2 introduces a fully asynchronous design that decouples inference nodes from training nodes, significantly improving training efficiency (a minimal sketch of this pattern follows this summary) [16]
- It incorporates standard off-policy reinforcement learning algorithms, improving data utilization by exploiting both online and offline data [16]

Experimental Results
- The initial version of RLinf focuses on real-robot reinforcement learning with small models, using a Franka robotic arm for two quick validation tasks: Charger and Peg Insertion [19][21]
- The training process includes human-in-the-loop interventions to improve efficiency, with successful results documented in training videos [21][22]

Community Engagement
- The RLinf team thanks its community of 2,000 users, whose feedback has driven continuous improvements and feature updates since release [22]
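The article does not include RLinf code, so below is a minimal Python sketch of the decoupled pattern the summary describes: rollout (inference) workers stream transitions into a shared replay buffer while a separate training loop samples from it, so neither side blocks the other. All names (ReplayBuffer, rollout_worker, and so on) are illustrative assumptions, not RLinf's actual API.

```python
# Minimal sketch of an asynchronous actor/learner split: rollout threads
# fill a replay buffer while a separate training loop samples from it.
# Names are illustrative assumptions, not RLinf's real interface.
import random
import threading
import time
from collections import deque

class ReplayBuffer:
    """Thread-safe FIFO buffer that can mix online and offline transitions."""
    def __init__(self, capacity: int = 10_000):
        self._buf = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition) -> None:
        with self._lock:
            self._buf.append(transition)

    def sample(self, batch_size: int):
        with self._lock:
            if len(self._buf) < batch_size:
                return None
            return random.sample(list(self._buf), batch_size)

def rollout_worker(buffer: ReplayBuffer, stop: threading.Event) -> None:
    """Stands in for an inference node driving a robot or simulator."""
    step = 0
    while not stop.is_set():
        # In a real system this would query the policy and the robot.
        buffer.add({"obs": step, "action": step % 4, "reward": 1.0})
        step += 1
        time.sleep(0.01)  # control-loop cadence

def train_loop(buffer: ReplayBuffer, stop: threading.Event, steps: int) -> None:
    """Stands in for a training node; it never blocks data collection."""
    done = 0
    while done < steps:
        batch = buffer.sample(batch_size=32)
        if batch is None:
            time.sleep(0.05)  # wait for data instead of stalling rollouts
            continue
        # gradient_update(batch) would go here
        done += 1
    stop.set()

stop = threading.Event()
buffer = ReplayBuffer()
workers = [threading.Thread(target=rollout_worker, args=(buffer, stop)) for _ in range(2)]
for w in workers:
    w.start()
train_loop(buffer, stop, steps=100)
for w in workers:
    w.join()
```

In RLinf itself the rollout side would be a robot controller or simulator node and the trainer a GPU worker; the point of the pattern is that data collection at control-loop cadence never waits on gradient steps.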
No Retraining or Fine-Tuning Needed: An Auxiliary System Pushes GPT-5.2 Accuracy to a Record 75%
机器之心· 2025-12-25 05:26
Core Insights - The article argues that AI performance is now determined more by the orchestration of inference than by the foundation models themselves, suggesting that a well-designed agentic system can significantly enhance AI capability without altering the underlying models [1]

Group 1: Poetiq's Testing Results
- Poetiq reported that its meta-system scored 75% on the PUBLIC-EVAL dataset using the GPT-5.2 X-High model, approximately 15% higher than the previous state-of-the-art (SOTA) models, at a cost of under $8 per question [3][7]
- The PUBLIC-EVAL dataset includes basic reasoning tasks alongside standard NLP and mathematical reasoning tests, making it suitable for broad model evaluation [3]
- Poetiq neither retrained nor specifically optimized GPT-5.2, yet it achieved significant gains in both accuracy and cost over previous models tested on the same dataset [7]

Group 2: Future Implications and Model Exchange
- If the performance trends observed on PUBLIC-EVAL carry over to the ARC Prize's SEMI-PRIVATE tests, the combination of "GPT-5.2 X-High + Poetiq" could outperform any previously tested system configuration [7]
- Greg Kamradt, president of ARC Prize, expressed optimism about Poetiq's results, noting that the system appears able to handle model swaps effectively, although full validation awaits the resolution of infrastructure issues with the OpenAI API [7]

Group 3: System Efficiency and Mechanisms
- Poetiq's meta-system is designed to work with any leading model without extensive retraining, allowing rapid adaptation and performance gains as new models are released [15]
- The meta-system relies on iterative reasoning rather than traditional single-shot answer generation, built on two main mechanisms: an iterative problem-solving cycle and self-auditing (a minimal sketch of this loop follows this summary) [16]
- The iterative problem-solving cycle lets the system generate candidate solutions, receive feedback, and refine them, while self-auditing lets it monitor its own progress and decide when to terminate, cutting unnecessary computational cost [16]
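Poetiq's implementation is not reproduced in the article; the following is a minimal Python sketch of the two mechanisms named above, an iterative propose-critique-refine cycle plus a self-audit stopping rule. llm_call, the prompts, and the patience-based stop are all illustrative assumptions, not Poetiq's actual design.

```python
# Minimal sketch of an iterate-and-self-audit loop of the kind the summary
# describes. llm_call, the prompts, and the stopping rule are illustrative
# assumptions, not Poetiq's implementation.
from dataclasses import dataclass

@dataclass
class Attempt:
    solution: str
    score: float  # self-assessed quality in [0, 1]

def llm_call(prompt: str) -> str:
    """Placeholder for a call to any frontier model (e.g. via an API)."""
    raise NotImplementedError

def propose(problem: str, history: list[Attempt]) -> str:
    feedback = "\n".join(f"- scored {a.score:.2f}: {a.solution}" for a in history)
    return llm_call(f"Problem:\n{problem}\nPrior attempts:\n{feedback}\nImprove:")

def critique(problem: str, solution: str) -> float:
    verdict = llm_call(f"Rate 0-1 how well this solves the problem.\n"
                       f"Problem:\n{problem}\nSolution:\n{solution}\nScore:")
    return float(verdict.strip())

def solve(problem: str, max_rounds: int = 8, good_enough: float = 0.95,
          patience: int = 3) -> str:
    """Iterate propose -> critique -> refine; the self-audit decides when to stop."""
    history: list[Attempt] = []
    stalled = 0
    for _ in range(max_rounds):
        sol = propose(problem, history)
        score = critique(problem, sol)
        if history and score <= history[-1].score:
            stalled += 1          # self-audit: no progress this round
        else:
            stalled = 0
        history.append(Attempt(sol, score))
        if score >= good_enough or stalled >= patience:
            break                 # terminate early to avoid wasted compute
    return max(history, key=lambda a: a.score).solution
```

The early-exit conditions are what keep per-question cost bounded: the loop spends extra model calls only while the self-assessed score is still improving.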
Jailbreak Success Rate Soars to 87.6%: Nanjing University, Meituan, and SJTU Expose Security Flaws in Mainstream Video Generation Models
机器之心· 2025-12-25 05:26
RunawayEvil innovatively adopts a "strategy - tactics - action" core paradigm, directly addressing the industry pain point that traditional single-modality, static attacks are of limited effectiveness in the image-to-video (I2V) setting. It provides an efficient, reliable tool for analyzing security vulnerabilities in I2V models and supports the construction of more robust, safer video generation systems.

Wang Songping and Qian Rufan of Nanjing University's PRLab, under the joint supervision of Professor Shan Caifeng and Assistant Professor Lyu Yueming, propose RunawayEvil, the first multimodal self-evolving jailbreak attack framework targeting I2V models. The work was carried out jointly with Meituan, Shanghai Jiao Tong University, and other leading institutions, completing the first I2V jailbreak attack framework to support multimodal coordination and autonomous evolution.

Industry Pain Points: Three Core Gaps in I2V Model Security Research

Image-to-video (I2V) generation is a core multimodal technology that fuses visual constraints from images with semantic guidance from text to produce temporally coherent, high-fidelity dynamic content, offering efficient creative support for content creation, commercial advertising, and other fields. However, its safety defenses remain fragile and have not kept pace with deployment, becoming a key bottleneck constraining the industry's healthy development.

Paper title: RunawayEvil: Jailbreaking the Image-to-Video Generative Models
Project page: https://xzxg001.github.io/RunawayEvi ...
Tencent Hits the AI Accelerator: A Flurry of Moves in Talent, Organization, and Open Source
机器之心· 2025-12-25 05:26
Editor | 冷猫

To outside observers, Tencent's moves in AI have mostly been labeled steady, even restrained.

But as 2025 draws to a close, a dense series of signals, from talent recruitment to product iteration to organizational restructuring, suggests the giant is pressing the accelerator.

On December 17, a 机器之心 report confirmed that former OpenAI researcher and Tsinghua alumnus Yao Shunyu (Vinces Yao) has formally joined Tencent as Chief AI Scientist in the "CEO / President's Office", reporting directly to Tencent President Martin Lau (刘炽平).

Yao has written on his blog about the "second half" of AI development, stressing the importance of agents and cognitive architectures. With this young scientist, known for breakthrough work on ToT (Tree of Thoughts) and ReAct, now on board and overseeing the two core departments of AI Infra and large language models, Tencent's AI strategy has come into focus: tightly mesh top-tier research with solid engineering to build AI that is genuinely useful.

Continuously strengthening model capabilities

A thriving application layer is hard to sustain without strong foundation models underneath.

In this reorganization, the newly established AI Infra department occupies a key position, responsible for building core capabilities such as distributed training of large models and high-performance inference serving.

At the model layer, Tencent's Hunyuan has recently shown strong iterative capability. On complex instruction following and text creation, the newly released Hunyuan 2.0 ranks among domestic ...
TPU Rattles Jensen Huang: $20 Billion Secures the "Father of the TPU" and His Core Team, Plus a Technology License
机器之心· 2025-12-25 03:12
Editors | 张倩, +0

With Google's TPU challenging its dominance, has NVIDIA finally gotten anxious?

Today, AI chip startup Groq announced major news: it has reached a non-exclusive licensing agreement with NVIDIA covering Groq's inference technology.

The agreement does not mean NVIDIA is buying Groq outright; rather, it is poaching several of Groq's key figures: founder and CEO Jonathan Ross, President Sunny Madra, and a number of core engineers. Notably, Jonathan Ross once led TPU development at Google. At the end of 2016 he left Google, taking 7 of the 10 members of the core TPU team with him. That group carried away the TPU's most essential technical ideas and design experience and co-founded the AI chip company Groq in Mountain View, California.

How much are these people and their intellectual property worth? The deal size says it all. According to Groq investor Alex Davis, CEO of Disruptive Technology Advisers, the transaction is valued at roughly $20 billion, which is $13.1 billion above the startup's September valuation.

After the deal, Groq will continue to operate as an independent company, with CFO Simon Edwards taking over as CEO, and its GroqCloud cloud ...
New NAVSIM SOTA: Fudan and Yinwang Propose a Masked Diffusion Framework for End-to-End Autonomous Driving
机器之心· 2025-12-25 03:12
Core Insights - The article traces the shift in end-to-end autonomous driving from a "modular" approach to a "unified" paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive generation paradigms [2]. It introduces the WAM-Diff framework, which innovatively brings discrete masked diffusion models into VLA autonomous driving planning, addressing the constraints of unidirectional temporal generation [2][6]

Group 1: WAM-Diff Framework
- WAM-Diff uses Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, keeping error within 0.005 (a minimal sketch of this step follows this summary) [6]
- The framework adopts masked diffusion as its backbone, predicting all token positions in parallel, which significantly improves inference efficiency and enables global optimization over the trajectory [6]
- WAM-Diff studies decoding strategies and finds that the reverse-causal strategy outperforms the alternatives on closed-loop metrics, validating an "end-to-beginning" planning logic [9][20]

Group 2: Performance Metrics
- On the authoritative NAVSIM benchmark, WAM-Diff achieved state-of-the-art (SOTA) scores of 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2, demonstrating its potential in complex autonomous driving scenarios [3][18]
- The model surpassed competitors such as DiffusionDrive and ReCogDrive, indicating robustness in balancing safety and compliance under real-world driving conditions [18]

Group 3: Technical Innovations
- WAM-Diff integrates a Low-Rank Adaptation Mixture-of-Experts (LoRA-MoE) architecture with 64 lightweight experts under dynamic routing and sparse activation, increasing model capacity and adaptability [11]
- The Group Sequence Policy Optimization (GSPO) algorithm is introduced to bridge the gap between open-loop training and closed-loop execution, optimizing trajectory sequences against safety, compliance, and comfort metrics [14]

Group 4: Conclusion
- The emergence of WAM-Diff marks a significant step toward discrete, structured, closed-loop autonomous driving planning, underscoring that in the VLA era "how to generate" matters as much as "what to generate" [25]
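The paper's tokenizer details are not given here, so the following is a minimal Python sketch of the general idea behind discretizing trajectories: uniform quantization of continuous coordinates into a fixed token vocabulary with a provable round-trip error bound. The coordinate range and bin count are assumptions chosen to land exactly on the 0.005 error figure quoted above, not WAM-Diff's actual settings.

```python
# Minimal sketch of discrete action tokenization for 2D trajectories:
# uniform quantization into a fixed vocabulary so the round-trip
# (encode -> decode) error is bounded. Range and bin count are assumed.
import numpy as np

COORD_MIN, COORD_MAX = -32.0, 32.0   # assumed planning range
NUM_BINS = 6400                      # step = 64 / 6400 = 0.01
# Max round-trip error = step / 2 = 0.005.

def encode(traj: np.ndarray) -> np.ndarray:
    """Map continuous (T, 2) waypoints to integer tokens in [0, NUM_BINS)."""
    step = (COORD_MAX - COORD_MIN) / NUM_BINS
    idx = np.floor((traj - COORD_MIN) / step).astype(np.int64)
    return np.clip(idx, 0, NUM_BINS - 1)

def decode(tokens: np.ndarray) -> np.ndarray:
    """Map tokens back to continuous coordinates at bin centers."""
    step = (COORD_MAX - COORD_MIN) / NUM_BINS
    return COORD_MIN + (tokens.astype(np.float64) + 0.5) * step

traj = np.array([[1.2345, -0.5678], [2.4680, 0.1357]])
recovered = decode(encode(traj))
assert np.max(np.abs(recovered - traj)) <= 0.005  # bounded quantization error
```

Once waypoints are tokens, a masked diffusion model can predict every position of the token sequence in parallel rather than left to right, which is what enables the global, non-autoregressive trajectory optimization described above.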
V-Thinker: Letting Models "Think While Drawing" Like Humans
机器之心· 2025-12-25 01:20
Core Insights - The article introduces V-Thinker, a multimodal reasoning framework designed to enhance visual interactive reasoning by letting models generate code and interact with images during the reasoning process (a minimal sketch of this loop follows this summary) [3][19][40]

Group 1: Framework and Methodology
- V-Thinker combines cold-start supervised fine-tuning with reinforcement learning so models can autonomously generate code and interact with images, realizing a "think while drawing" visual reasoning paradigm [3][21]
- The framework includes a data evolution mechanism, the Data Evolution Flywheel, which synthesizes and validates visual interactive reasoning data along the dimensions of diversity, quality, and difficulty [3][12]
- A progressive training paradigm first strengthens visual perception using the V-Perception-40K dataset, then applies a two-stage regimen that integrates supervised fine-tuning and reinforcement learning [15][18]

Group 2: Data and Evaluation
- The V-Interaction-400K dataset supports visual interactive reasoning and image-to-code conversion tasks, providing a foundational resource for the framework [3][13]
- VTBench is an evaluation benchmark built specifically for visual interactive reasoning, focusing on tasks that require acting on images, such as adding auxiliary lines or marking key regions [19][20]
- The evaluation covers three task types spanning the complete pipeline from basic perception to interactive reasoning, ensuring the assessment reflects the model's genuine visual interactive reasoning capability [23]

Group 3: Experimental Results
- V-Thinker shows significant gains on interactive reasoning tasks, outperforming baseline models by over 12% in average accuracy and excelling in instruction-guided interaction scenarios with gains above 22% [24]
- The model also generalizes to common reasoning scenarios, achieving a 6% improvement on complex multi-step reasoning tasks [25][26]
- During the reinforcement learning phase the model generates diverse interactive paths, indicating stronger strategy diversity and better interpretability of the interactive reasoning process [29][31]

Group 4: Future Directions
- The article positions V-Thinker as a step forward for the "Thinking with Images" direction, showcasing a model that autonomously writes and executes code while interacting with images [40]
- It suggests that as model capabilities improve, new reasoning paradigms and application scenarios may emerge, including the possibility of models creating knowledge [40]
- The authors acknowledge remaining headroom in perception and interaction capabilities, indicating future work may incorporate perturbations at different resolutions [40]
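V-Thinker's code is not included in the article; the sketch below illustrates the general "think while drawing" loop described above: the model emits a short drawing program, the host executes it on the image, and the annotated image feeds the next reasoning turn. model_step, the dict protocol, and the sandboxing shown here are illustrative assumptions, not V-Thinker's actual interface.

```python
# Minimal sketch of a "think while drawing" loop: the model alternates
# between emitting drawing code (executed on the image) and reasoning over
# the annotated result. Names and protocol are illustrative assumptions.
from PIL import Image, ImageDraw

def model_step(image: Image.Image, question: str, scratchpad: str) -> dict:
    """Placeholder for the VLM: returns either drawing code or a final answer."""
    raise NotImplementedError

def run_drawing_code(image: Image.Image, code: str) -> Image.Image:
    """Execute model-emitted drawing code against a copy of the image."""
    canvas = image.copy()
    draw = ImageDraw.Draw(canvas)
    # Expose only a drawing handle; a real system would sandbox this call.
    exec(code, {"__builtins__": {}}, {"draw": draw})
    return canvas

def reason_with_images(image: Image.Image, question: str, max_turns: int = 6) -> str:
    scratchpad = ""
    for _ in range(max_turns):
        step = model_step(image, question, scratchpad)
        if "answer" in step:
            return step["answer"]
        # e.g. code = 'draw.line((0, 120, 400, 120), fill="red", width=3)'
        image = run_drawing_code(image, step["code"])
        scratchpad += step.get("thought", "") + "\n"
    return "no answer within budget"
```

The key design point is the feedback edge: each round of drawing (auxiliary lines, markers) changes the image the model perceives next, so visual actions become part of the reasoning chain rather than a one-shot preprocessing step.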
Microsoft Sets a Goal: Completely Remove C and C++ Code by 2030, Replacing It with Rust
机器之心· 2025-12-25 01:20
Core Viewpoint - Microsoft aims to eliminate C and C++ code from its systems by 2030, rewriting its entire codebase with the help of AI and algorithms, as stated by Galen Hunt, a senior engineer at Microsoft [2][4]

Group 1: Microsoft's Strategy
- Galen Hunt's team has set the ambitious goal of having each engineer write 1 million lines of code per month, backed by infrastructure for AI-driven code processing [4][5]
- Rust is the language intended to replace C; it is seen as a more modern language that addresses memory and concurrency safety while maintaining performance [10][11]

Group 2: Challenges and Concerns
- There is skepticism about the feasibility of quickly rewriting a vast amount of well-tested code, and about the assumption that the Rust version will be superior in every respect and free of bugs [10]
- Microsoft's previous attempts to replace C++ and other native languages have not won acceptance, and applications like Discord and Teams have drawn complaints about memory usage [11]

Group 3: AI's Role in Code Generation
- Microsoft CEO Satya Nadella has said that 20-30% of the company's code is already generated by AI, with success varying across programming languages [13]
- Microsoft CTO Kevin Scott predicts that by 2030, 95% of code will be AI-generated, although the reliability of this approach remains unproven [15]
Beihang Proposes Scaling Laws for Code LLMs: Programming Language Differences and Optimal Multilingual Data Mixtures
机器之心· 2025-12-24 09:30
This article is compiled from the paper "Scaling Laws for Code: Every Programming Language Matters", co-authored by Beihang University, Renmin University of China, and Ubiquant (九坤投资).

In the pretraining of code LLMs, the industry has long operated on the habitual assumption that code in all programming languages is homogeneous text data, focusing mainly on piling up total data volume. Yet modern software development is inherently multilingual, and languages differ enormously in syntax, corpus size, and application scenarios. Ignoring these differences and applying generic Scaling Laws wholesale often leads to biased performance predictions and wasted compute.

To open this black box, the research team spent the equivalent of 336,000 H800 GPU hours running more than 1,000 experiments. The study covers model sizes from 0.2B to 14B parameters and training data volumes up to 1T tokens, systematically dissecting seven mainstream languages: Python, Java, JavaScript, TypeScript, C#, Go, and Rust. The core contribution is a set of language-aware Scaling Laws and, built on them, a mathematically solvable scheme for optimal data mixing (a hedged sketch of the typical functional form follows below).

Figure 1: Loss ... of the paper's proposed multilingual Scaling Law versus a traditional uniform-distribution baseline
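The article does not reproduce the paper's equations. As a point of reference, per-language scaling laws are typically built on the standard Chinchilla-style parameterization below, with coefficients allowed to vary by language; this is an assumed form for orientation, not the paper's actual formula.

```latex
% Hedged sketch: a standard Chinchilla-style loss form, written here with
% language-specific coefficients (E_l, A_l, B_l, alpha_l, beta_l) as one
% plausible shape for a per-language scaling law; the paper's actual
% parameterization may differ.
\[
  L_l(N, D_l) = E_l + \frac{A_l}{N^{\alpha_l}} + \frac{B_l}{D_l^{\beta_l}}
\]
% where N is model size and D_l the token count of language l. An optimal
% mixture then minimizes a weighted sum under a total-data budget:
\[
  \min_{\{D_l\}} \; \sum_{l} w_l \, L_l(N, D_l)
  \quad \text{s.t.} \quad \sum_{l} D_l = D_{\text{total}}
\]
```

Under a form like this, the budget-constrained minimization has a closed-form character (each language's allocation is driven by its own B_l and beta_l), which is consistent with the article's claim that the optimal data ratio is "mathematically solvable".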