机器之心
Just In: NVIDIA CUDA Gets Its Biggest Update Ever!
机器之心· 2025-12-06 04:08
Core Insights
- NVIDIA has officially released CUDA Toolkit 13.1, the largest update to the platform since its inception in 2006 [2]
- The update introduces CUDA Tile, a new programming model that lets developers write algorithms at a higher level of abstraction, simplifying the use of specialized hardware such as Tensor Cores [4][5]

Summary by Sections

CUDA Tile
- CUDA Tile is the centerpiece of CUDA Toolkit 13.1, enabling developers to abstract away specialized hardware details and write GPU kernels at a higher level than the traditional SIMT (Single Instruction, Multiple Threads) model [4][6]
- In the Tile model, developers specify data blocks called "Tiles" and the mathematical operations to perform on them, while the compiler automatically manages workload distribution across threads (an illustrative sketch follows this summary) [7][8]
- CUDA 13.1 ships two components for Tile programming: CUDA Tile IR, a new virtual instruction set architecture, and cuTile Python, a domain-specific language for writing array- and Tile-based kernels in Python [9]

Software Updates
- The update adds support for Green Contexts, lightweight contexts that allow finer-grained allocation and management of GPU resources [19][20]
- CUDA 13.1 also provides a customizable split() API for building SM partitions and reducing false dependencies between different Green Contexts [21]
- The Multi-Process Service (MPS) gains memory locality optimization partitions (MLOPart) and static SM partitioning for improved resource allocation and isolation [23][28]

Developer Tools
- New developer tools include performance analysis for CUDA Tile kernels and Nsight Compute enhancements for analyzing Tile statistics [32]
- The NVIDIA Compute Sanitizer now supports compile-time patching for improved memory-error detection [33]

Mathematical Libraries
- The core CUDA math libraries have received performance updates for the new Blackwell architecture, including cuBLAS and cuSOLVER enhancements for matrix operations [37][41]
- New APIs in cuBLAS and cuSPARSE deliver improved performance for specific operations [40][46]
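To make the Tile idea above concrete, here is a minimal Python/NumPy sketch of tile-level thinking: the developer expresses the computation over whole tiles, and in a real Tile toolchain (unlike this CPU loop) the mapping of each tile onto threads and Tensor Cores is left to the compiler. This is an illustration only; it does not use the actual cuTile Python API, whose interface is not described in the summary, and the TILE size is an assumed parameter.

```python
# Illustrative sketch only: it mimics the *shape* of tile-level programming
# with NumPy on the CPU and is NOT the real cuTile Python API.
import numpy as np

TILE = 32  # hypothetical tile edge length

def matmul_tiled(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Compute C = A @ B by iterating over (TILE x TILE) tiles.

    In the Tile model described above, the developer reasons about whole
    tiles and the operation applied to them; the compiler (not this code)
    would map each tile onto threads and Tensor Cores.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % TILE == 0 and N % TILE == 0 and K % TILE == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = np.zeros((TILE, TILE), dtype=A.dtype)
            for k in range(0, K, TILE):
                # One tile-level operation: a small dense matmul on two tiles.
                acc += A[i:i + TILE, k:k + TILE] @ B[k:k + TILE, j:j + TILE]
            C[i:i + TILE, j:j + TILE] = acc
    return C

if __name__ == "__main__":
    A = np.random.rand(64, 64).astype(np.float32)
    B = np.random.rand(64, 64).astype(np.float32)
    assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-4)
```

The point of the contrast is that only the inner tile operation is authored by the developer; in the Tile model the loops over threads, warps, and hardware units are the compiler's responsibility rather than the programmer's.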
Yann LeCun's First Paper After Leaving Meta? The Research Uses a Unitree Robot
机器之心· 2025-12-06 04:08
Core Insights
- The article discusses a research paper introducing GenMimic, a method that enables humanoid robots to perform actions generated by AI video models without prior examples [1][3][4].

Research Contributions
- The research presents a universal framework for humanoid robots to execute actions generated by video models [4].
- GenMimic employs a new reinforcement learning strategy that uses symmetric regularization and selectively weighted 3D keypoint rewards for training, allowing generalization to noisy synthetic videos [4].
- The team created a synthetic human action dataset named GenMimicBench, which serves as a scalable benchmark for evaluating zero-shot generalization and policy robustness [4][8].

GenMimicBench Dataset
- GenMimicBench consists of 428 generated videos created with the advanced video generation models Wan2.1 and Cosmos-Predict2 [9][11].
- The dataset covers a wide range of subjects, environments, and action types, from simple gestures to complex interactions with objects [11][13].
- It is designed to stress-test the robustness of humanoid robot control strategies under varying visual and action distributions [13].

Methodology Overview
- The proposed method uses a two-stage process to execute humanoid robot actions from generated videos [15][17].
- The first stage reconstructs a 4D humanoid model from the input RGB video, while the second stage translates this model into executable actions [17][18].
- The strategy emphasizes robustness to variations and noise in the input data by tracking 3D keypoints instead of joint angles (a hedged reward sketch follows this summary) [19][20].

Experimental Results
- The team conducted extensive experiments on both the GenMimicBench dataset and a real-world 23-DoF humanoid robot, demonstrating significant improvements over strong baseline models [29][30].
- In simulation, GenMimic achieved a success rate (SR) of 29.78% and outperformed existing models across multiple metrics [31].
- Real-world experiments showed that the strategy successfully replicated a wide range of upper-body actions, although challenges remained with lower-body movements [34][35].
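As a rough illustration of the reward design mentioned above, the sketch below combines a selectively weighted 3D-keypoint tracking term with one plausible form of symmetry penalty. The paper's exact reward, keypoint set, weights, and shaping are not given in the summary, so everything here is an assumption.

```python
# Minimal sketch of a weighted 3D-keypoint tracking reward with a
# left/right symmetry penalty. The actual GenMimic reward is not specified
# in the summary; keypoint indices, weights, and the exponential shaping
# below are illustrative assumptions.
import numpy as np

def keypoint_reward(robot_kp: np.ndarray,
                    ref_kp: np.ndarray,
                    weights: np.ndarray,
                    left_idx: list[int],
                    right_idx: list[int],
                    sym_coef: float = 0.1) -> float:
    """robot_kp, ref_kp: (K, 3) keypoint positions; weights: (K,)."""
    # Selectively weighted tracking term: keypoints that are noisy in the
    # synthetic reference video can be down-weighted.
    err = np.linalg.norm(robot_kp - ref_kp, axis=-1)          # (K,)
    track = np.exp(-np.sum(weights * err ** 2))

    # One possible symmetry term: mirror left-side keypoints across the
    # sagittal (x) plane and penalize deviation from the right side.
    mirrored_left = robot_kp[left_idx].copy()
    mirrored_left[:, 0] *= -1.0
    sym_err = np.linalg.norm(mirrored_left - robot_kp[right_idx], axis=-1)
    return float(track - sym_coef * sym_err.mean())

if __name__ == "__main__":
    K = 6
    ref = np.random.randn(K, 3)
    rob = ref + 0.01 * np.random.randn(K, 3)
    w = np.ones(K) / K
    print(keypoint_reward(rob, ref, w, left_idx=[0, 1], right_idx=[2, 3]))
```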
AAAI 2026 | New Breakthrough: Peking University's Peng Yuxin Team Proposes CKDA, a Visible-Infrared Lifelong Person Re-Identification Method
机器之心· 2025-12-06 04:08
Core Insights
- The article presents CKDA, a novel method for lifelong person re-identification that continuously learns new discriminative information from incoming data while retaining the ability to recognize known identities across modalities, specifically visible-light and infrared images [2][6].

Group 1: Background and Motivation
- Lifelong person re-identification aims to recognize the same individual across different scenarios by continuously learning from pedestrian data collected in varied environments [6].
- Existing methods struggle to balance acquiring modality-specific knowledge with retaining cross-modal common knowledge, and this conflict hinders effective learning [9][11].

Group 2: Technical Solution
- The CKDA framework consists of three main modules (a hedged prompt-design sketch follows this summary):
  1. Cross-modal common prompts, which extract shared discriminative knowledge by removing style information unique to each modality [12].
  2. Modality-specific prompts, which strengthen the retention of knowledge specific to each modality while avoiding interference between modalities [20].
  3. Cross-modal knowledge alignment, which aligns new and old knowledge in independent feature spaces to improve the model's ability to balance discriminative knowledge across modalities [24][25].

Group 3: Experimental Results
- CKDA achieved the best performance on four commonly used visible-infrared person re-identification datasets, with an average mAP of 36.3% and Rank-1 accuracy of 39.4% [28].
- The results indicate that the complementary prompts effectively enhance the model's ability to perceive and retain information from both the visible and infrared modalities [30][29].
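The sketch below illustrates, in PyTorch, one way to combine a shared cross-modal prompt with modality-specific prompts in front of a transformer encoder, roughly in the spirit of the module list above. The dimensions, prompt lengths, pooling, and module names are assumptions, not the actual CKDA implementation.

```python
# Minimal PyTorch sketch: a shared (cross-modal) prompt plus a
# modality-specific prompt are prepended to patch features before a
# transformer encoder. Shapes and module names are illustrative
# assumptions, not the CKDA code.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, dim: int = 256, prompt_len: int = 4, num_layers: int = 2):
        super().__init__()
        # Shared across visible and infrared inputs: carries modality-common knowledge.
        self.common_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # One learnable prompt per modality: carries modality-specific knowledge.
        self.specific_prompt = nn.ParameterDict({
            "visible": nn.Parameter(torch.randn(prompt_len, dim) * 0.02),
            "infrared": nn.Parameter(torch.randn(prompt_len, dim) * 0.02),
        })
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        """tokens: (B, N, dim) patch features; modality: 'visible' | 'infrared'."""
        b = tokens.size(0)
        prompts = torch.cat([self.common_prompt, self.specific_prompt[modality]], dim=0)
        prompts = prompts.unsqueeze(0).expand(b, -1, -1)
        out = self.encoder(torch.cat([prompts, tokens], dim=1))
        # Pool only over the prompt positions as the person representation.
        return out[:, : prompts.size(1)].mean(dim=1)

if __name__ == "__main__":
    model = PromptedEncoder()
    feats = torch.randn(2, 16, 256)
    print(model(feats, "visible").shape)  # torch.Size([2, 256])
```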
Skills vs MCP: Which Is the "HTTP Moment" for Large Models?
机器之心· 2025-12-06 02:30
This article comes from the PRO member newsletter; follow "机器之心PRO会员" (Jiqizhixin PRO Membership) at the end of the article for more in-depth topic analyses.

Table of Contents
01. More builders than users: is MCP just "old wine in a new bottle"? A year on, does the community still disagree about how to position MCP? With an average of 25 users per developer, is MCP mostly something developers build for themselves?...
02. Not Skills vs MCP, but Skills with MCP? True to its name, is Skills really here to "kill" MCP? Are the things MCP can do but Skills cannot of little practical use today?...
03. Over the past year, has the MCP infrastructure landscape gradually taken shape? Does large-scale adoption of MCP still depend on the emergence of the next "WeChat Mini Program"-style entry point?...

More builders than users: is MCP just "old wine in a new bottle"?
Introduction: Anthropic's recently launched Claude Skills has received fairly consistent praise in the community, with many developers seeing it as a capability that can "finally be used out of the box"; at almost the same time, the first anniversary of the MCP protocol passed in near silence. In fact, ever since its release, MCP has faced persistent criticism that it has "more builders than users" and is merely "old wine in a new bottle", and with Sk ...
AAAI 2026 | Beihang University and the University of Tokyo Bridge AI's "Semantic Gap": How Does Process-Aware Video Understanding Find "State" Anchors?
机器之心· 2025-12-06 01:15
Abstract / Introduction: With embodied intelligence and video understanding advancing rapidly, how can AI truly "understand" complex manipulation steps? Professor Lu Feng's team at Beihang University, together with the University of Tokyo, has proposed a new framework for video understanding. The work introduces "State" as a visual anchor, addressing the difficulty of aligning abstract text instructions with concrete video, and has been accepted by AAAI 2026, a top AI conference.

In today's video understanding and embodied intelligence research, teaching AI to understand procedural activities such as cooking or repairing appliances is of great importance. However, when this need meets the existing image-text alignment paradigm, a hard-to-ignore "Semantic Gap" stands in researchers' way.

Existing procedural video learning methods face a data dilemma: they either rely on extremely expensive, time-intensive annotation that is hard to scale, or use external knowledge bases such as WikiHow for weakly supervised learning, forcibly aligning video frames with text descriptions of "Tasks" or "Steps".

But the weakly supervised approach still leaves room for improvement: there is a disconnect between abstract language descriptions and concrete visual pixels. When the text instruction is "Cut oranges", the video shows a continuous change in the orange's visual form, from whole to flesh exposed, rather than an explicit action process. This mismatch makes it hard for models to accurately identify and understand the actual process the video depicts. ...
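As a loose, hypothetical illustration of using a "state" as a visual anchor, the sketch below scores frame embeddings against embeddings of pre- and post-action state descriptions (for example, "a whole orange" versus "an orange with exposed flesh") and locates the transition frame. The embeddings here are random placeholders rather than outputs of a vision-language model, and this is not the paper's actual algorithm.

```python
# Loose illustration of "state as a visual anchor": score frame embeddings
# against state-description embeddings and locate the transition frame.
# Embeddings are random placeholders; a real system would use an
# image-text model. Not the paper's algorithm.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def find_state_transition(frame_emb: np.ndarray,
                          pre_state_emb: np.ndarray,
                          post_state_emb: np.ndarray) -> int:
    """Return the first frame index where similarity flips from the
    pre-action state to the post-action state."""
    sims = cosine(frame_emb, np.stack([pre_state_emb, post_state_emb]))  # (T, 2)
    post_wins = sims[:, 1] > sims[:, 0]
    return int(np.argmax(post_wins)) if post_wins.any() else len(frame_emb) - 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre, post = rng.normal(size=(2, 64))
    frames = np.vstack([pre + 0.1 * rng.normal(size=64) for _ in range(5)] +
                       [post + 0.1 * rng.normal(size=64) for _ in range(5)])
    print(find_state_transition(frames, pre, post))  # expected: 5
```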
IJCAI 2026 Charges $100 per Submission, Yet the Academic Community Responds Favorably
机器之心· 2025-12-06 01:15
Core Viewpoint
- The article discusses the impact of AI on academic conferences, focusing on the surge in paper submissions and the challenges AI-generated content creates for peer review [3][5][6].

Group 1: Impact of AI on Academic Conferences
- The number of submissions to top conferences has increased sharply, overwhelming the review system and degrading review quality [3][4].
- At ICLR 2026, a study found that 21% of review comments were entirely generated by AI, with AI involved to varying degrees in editing a total of 56% of reviews [3].
- The credibility of academic conferences is at risk from the influx of low-quality submissions, prompting conferences to implement stricter policies [5][6].

Group 2: New Submission Policies
- IJCAI 2026 has introduced a submission model that charges a $100 fee per paper to curb the overwhelming number of submissions [7][8].
- The fee is waived for a main paper if the author has not submitted any other papers to IJCAI-ECAI 2026, encouraging high-quality submissions [10][11].
- Revenue from the submission fees will be used to support the reviewer community, with the aim of raising the quality of peer review [12][13].

Group 3: Statistics on Submissions
- ICLR 2025 received 11,565 submissions, a 60% increase over the previous year, with an acceptance rate of 32.08% [14].
- AAAI 2026 received nearly 29,000 submissions, roughly 20,000 of which came from China, about two-thirds of the total [14].
- NeurIPS 2025 had 21,575 valid submissions, of which 5,290 were accepted, for an overall acceptance rate of 24.52% [14].

Group 4: Community Response
- The academic community has generally responded positively to the introduction of submission fees, viewing them as a necessary step toward improving the quality of submissions and reviews [17].
Global Talent Recruitment: Ren Shaoqing of USTC, Author of Faster R-CNN and ResNet, Is Recruiting Professors, Scholars, and Students
机器之心· 2025-12-05 10:17
Core Viewpoint
- The article highlights Professor Ren Shaoqing's achievements and contributions in artificial intelligence, particularly deep learning and computer vision, and his role in advancing key technologies that impact sectors such as autonomous driving and medical imaging [4][5][6].

Group 1: Academic Achievements
- Professor Ren has made foundational and pioneering contributions in deep learning, computer vision, and intelligent driving, with his research serving as a core engine for areas critical to the national economy and people's livelihoods [5].
- His academic papers have been cited more than 460,000 times, ranking him first among domestic scholars across all disciplines [5].
- He has received multiple prestigious awards, including the 2023 Future Science Prize in Mathematics and Computer Science and the NeurIPS 2025 Test of Time Award [5].

Group 2: Key Research Contributions
- The paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," recipient of the NeurIPS 2025 Test of Time Award, is considered a milestone in computer vision and has been cited more than 98,000 times since its publication in 2015 [6].
- Faster R-CNN introduced a fully learnable two-stage pipeline that replaced traditional region-proposal methods, achieving high precision and near real-time detection and significantly influencing the development of vision models over the past decade (a brief usage sketch follows this summary) [6].

Group 3: Research Institute and Talent Recruitment
- The General Artificial Intelligence Research Institute at the University of Science and Technology of China focuses on frontier areas such as AI, world models, embodied intelligence, and autonomous driving, aiming for integrated innovation across research, talent cultivation, and industrial application [7].
- The institute is actively recruiting for various positions, including professors, researchers, postdoctoral fellows, engineers, and students at different academic levels, with support for high-level talent programs [9][10].
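For readers who want to see the two-stage detector in code, the short sketch below loads torchvision's reference Faster R-CNN implementation and runs it on a dummy image. Argument names may differ slightly across torchvision versions, and pretrained weights are optional; this is just a generic usage example, not anything specific to the article.

```python
# Usage sketch of the two-stage detector via torchvision's reference
# implementation (region proposal network + RoI head). Pass
# weights="DEFAULT" in recent torchvision versions to load COCO weights;
# here we use random weights to avoid a download.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn()  # stage 1: RPN, stage 2: RoI classification/regression
model.eval()

with torch.no_grad():
    image = torch.rand(3, 480, 640)   # dummy RGB image with values in [0, 1]
    preds = model([image])[0]         # dict with "boxes", "labels", "scores"
    print(preds["boxes"].shape, preds["scores"].shape)
```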
openPangu-R-72B Tops SuperCLUE DeepSearch with a Leap in Deep Search Capability
机器之心· 2025-12-05 10:17
Core Insights
- The article highlights the rapid progress of large-model inference and agent tool capabilities, focusing on the recent SuperCLUE DeepSearch evaluation report, in which the domestic model openPangu-R-72B ranked first in complex information retrieval tasks, showcasing the strength of domestic Ascend computing power for large-model development [1][15].

Model Performance
- In the SuperCLUE DeepSearch evaluation, openPangu-R-72B achieved a score of 73.33, outperforming models such as Gemini-3-Pro-Preview and GPT-5.1(high), which scored 70.48 [2].
- The model excelled across task categories, particularly humanities and social sciences (75.47) and natural sciences (83.33) [2].

Technical Architecture
- openPangu-R-72B is based on a redesigned architecture that balances efficiency and performance, using a mixture-of-experts (MoE) design that selects 8 of 80 experts per token, keeping about 15 billion active parameters out of 74 billion in total (a hedged top-k routing sketch follows this summary) [4].
- The model was trained on 24 trillion tokens and can handle long sequences of up to 128k tokens, which is crucial for deep search tasks [4].

Optimization Techniques
- The model incorporates several optimizations, including parameterized Sink Token technology to stabilize training and improve quantization compatibility [7].
- It combines K-Norm and Depth-Scaled Sandwich-Norm architectures to reduce computational overhead while maintaining stability and expressive flexibility [7].
- The attention architecture has been optimized for precision and efficiency, achieving a 37.5% reduction in KV cache while improving the model's ability to capture fine-grained semantic relationships [7][8].

DeepSearch Capabilities
- The model's success in deep search tasks is attributed to three key strategies: long-chain question-answering synthesis, non-indexed information processing, and an integrated fast-slow thinking approach [10].
- Long-chain QA synthesis raised the average difficulty of questions by 10% and introduced a verification agent to improve training accuracy [12].
- The model's workflow cycles through focusing on key URLs, crawling, and document QA to gather deep information beyond what traditional search engines surface [12].

Domestic Computing Power
- openPangu-R-72B's result in the SuperCLUE DeepSearch evaluation underscores the effective integration of domestic computing power with large-model research and development [15].
- Its sibling model, openPangu-718B, also performed well, taking second place in the general ranking, indicating the comprehensive capabilities of the openPangu series across different task scenarios [15].
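The sketch below illustrates generic top-k expert routing (here, picking 8 of 80 experts per token) in PyTorch. Hidden sizes, the gating scheme, and the expert MLPs are illustrative assumptions and not openPangu's actual implementation.

```python
# Minimal sketch of top-k expert routing in a mixture-of-experts layer
# (e.g. 8 of 80 experts per token). Shapes and the softmax-over-selected
# gating are illustrative, not the openPangu code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int = 64, num_experts: int = 80, k: int = 8):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (tokens, dim). Only the k selected experts run per token."""
        logits = self.router(x)                             # (T, E)
        weights, idx = torch.topk(logits, self.k, dim=-1)   # both (T, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e                    # tokens routed to expert e
                gate = weights[mask, slot].unsqueeze(-1)    # (n_selected, 1)
                out[mask] += gate * self.experts[e](x[mask])
        return out

if __name__ == "__main__":
    moe = TopKMoE()
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```

Because only the selected experts run for each token, the compute cost scales with the active parameters rather than the full parameter count, which is the property the "15 billion active out of 74 billion total" figure above describes.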
The End of Text-Based AI? Agents Can Collaborate by Directly "Copying Thoughts", with Soaring Token Efficiency
机器之心· 2025-12-05 04:08
Core Insights
- The article discusses the emergence of multi-agent systems (MAS) in the agentic AI era, emphasizing the shift from individual models to collaborative problem solving among AI agents [2][5]
- A new framework called LatentMAS is introduced, which lets agents collaborate in latent space rather than through traditional text communication, improving efficiency and performance [5][14]

Group 1: LatentMAS Framework
- LatentMAS enables agents to exchange internal hidden-layer representations and KV-cache working memory, yielding higher performance and lower token usage (a hedged KV-cache handoff sketch follows this summary) [5][10]
- The framework is designed to support richer latent reasoning and lossless communication between agents, significantly lowering computational complexity compared with text-based MAS [15][16]

Group 2: Experimental Results
- Comprehensive experiments on nine benchmark tasks show that LatentMAS outperforms both single models and text-based MAS, with accuracy improvements of up to 14.6% and token-usage reductions of 70.8% to 83.7% [6][20][22]
- LatentMAS achieves end-to-end reasoning speedups of 4× to 4.3× over traditional methods, demonstrating its efficiency [21][25]

Group 3: Efficiency and Performance
- The framework supports complex reasoning processes while significantly reducing the number of tokens used, achieving higher accuracy with fewer output tokens [28][29]
- LatentMAS provides additional speedups of 2.6× to 7× over text-based MAS, even when the latter is served with vLLM optimizations [25][28]

Group 4: Semantic Richness
- The latent representations produced by LatentMAS are shown to be semantically rich and diverse, surpassing the expressiveness of the discrete tokens used in text-based systems [30][31]
- The study indicates that the latent reasoning captured by LatentMAS is not only effective but also contains more nuanced internal representations than traditional methods [31][32]
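As a minimal illustration of handing over KV-cache "working memory" instead of re-sending text, the sketch below runs one forward pass to build a cache and lets a second call continue from it. It assumes the Hugging Face transformers library and the small gpt2 checkpoint are available; LatentMAS's actual hidden-state exchange and multi-agent setup are not reproduced here.

```python
# Minimal sketch of passing one "agent's" KV-cache to a second call so the
# follow-up only encodes its new tokens. Both calls use the same small
# model (gpt2) for simplicity; this is an illustration of the idea, not
# the LatentMAS framework.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

with torch.no_grad():
    # "Agent A" reads the task and produces a KV cache instead of a text summary.
    a_inputs = tok("Plan: sort the list, then remove duplicates.", return_tensors="pt")
    a_out = model(**a_inputs, use_cache=True)
    shared_cache = a_out.past_key_values  # latent working memory to hand over

    # "Agent B" continues directly from A's cache; only the new tokens are encoded.
    b_inputs = tok(" Next step:", return_tensors="pt")
    b_out = model(input_ids=b_inputs.input_ids,
                  past_key_values=shared_cache,
                  use_cache=True)
    next_id = b_out.logits[:, -1].argmax(-1)
    print(tok.decode(next_id))
```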
Former ByteDance Tech Lead Founds a Startup with Tsinghua Yao Class Alumni; Their Coding Agent Tops Global Benchmarks
机器之心· 2025-12-05 04:08
Core Insights
- InfCode is defining the "Engineering Era" of AI programming, moving beyond the "Vibe Coding" concept introduced by Andrej Karpathy, which focuses on generating code from simple prompts [3][7].

Group 1: InfCode's Performance
- InfCode achieved a Pass@1 score of 79.4% on the SWE-Bench Verified benchmark, surpassing leading models such as GPT-5 and Claude, which scored around 70% [6][13].
- On the Multi-SWE-bench C++ subset, InfCode reached a 25.58% resolution rate, well ahead of competitors such as Claude 3.7 Sonnet (8.59%) and DeepSeek V3 (7.75%) [6][13].

Group 2: Technical Innovations
- InfCode employs a multi-agent system designed for enterprise scenarios, marking a shift from individual efficiency to organizational evolution in AI coding [6][9].
- The system integrates "Code Intent Analysis," allowing it to understand the functional intent behind natural-language descriptions and better locate issues in large codebases [18][19].
- InfCode features a structured search engine based on Abstract Syntax Trees (AST), improving code-retrieval accuracy over traditional text-search tools (a hedged AST-indexing sketch follows this summary) [21][23].

Group 3: Repair Process and Methodology
- The repair process consists of two phases, generation and selection, with multiple iterations producing diverse patch candidates [30][33].
- InfCode uses a dual-agent architecture for code-patch generation and testing, enabling continuous improvement and more robust patches [25][29].

Group 4: Team and Vision
- InfCode's core team, described as a "startup dream team," combines technical expertise with commercialization capability, positioning it distinctively in the competitive AI coding-agent landscape [35][38].
- The team aims to move the AI coding landscape beyond tool-level efficiency toward a comprehensive reconstruction of the software engineering lifecycle, focusing on end-to-end value delivery [38].
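To illustrate what an AST-based structured code index can look like, the sketch below uses Python's standard ast module to record function and class definitions with their locations and search them by name or docstring. InfCode's actual engine is not public, so this is only a generic illustration of the idea.

```python
# Minimal sketch of an AST-based structured code index: it records function
# and class definitions with their locations so queries can match program
# structure rather than raw text. Not InfCode's implementation.
import ast
from dataclasses import dataclass

@dataclass
class Symbol:
    kind: str        # "function" or "class"
    name: str
    lineno: int
    docstring: str

def index_source(source: str) -> list[Symbol]:
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            symbols.append(Symbol(kind, node.name, node.lineno,
                                  ast.get_docstring(node) or ""))
    return symbols

def search(symbols: list[Symbol], query: str) -> list[Symbol]:
    q = query.lower()
    return [s for s in symbols if q in s.name.lower() or q in s.docstring.lower()]

if __name__ == "__main__":
    code = '''
def parse_config(path):
    "Load the YAML configuration file."
    ...

class PatchSelector:
    "Rank candidate patches by test results."
    ...
'''
    idx = index_source(code)
    for hit in search(idx, "patch"):
        print(hit.kind, hit.name, "line", hit.lineno)
```

A structural index like this lets a query such as "patch" land on the PatchSelector definition directly, which is the kind of precision a plain grep over text cannot guarantee.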