ByteDance-Tsinghua agent writes CUDA kernels automatically, 2.11x faster than torch.compile
量子位· 2026-03-03 07:02
Core Insights
- ByteDance Seed and the Tsinghua AIR team have jointly built an AI system capable of generating high-performance GPU code [1]
- The newly open-sourced CUDA Agent achieved the best results to date on the GPU kernel optimization benchmark KernelBench, with a 98.8% pass rate and a 2.11x geometric-mean speed-up over torch.compile [2][28]

Group 1: GPU Kernel Optimization
- GPU kernel optimization has traditionally been difficult, requiring deep understanding of GPU architecture, the memory hierarchy, and thread scheduling [6]
- The performance of model training and inference depends heavily on the quality of the underlying CUDA kernels [7]
- Existing AI-assisted approaches have not fundamentally improved kernel optimization capability, being either training-free iterative optimizers or fixed execution-feedback loops [8]

Group 2: CUDA Agent Development
- CUDA Agent is an end-to-end large-scale reinforcement learning system designed to learn how to generate and optimize high-performance CUDA kernels [9]
- Its training data was constructed through a three-phase process, yielding 6,000 synthetic training tasks [10][14]
- The training pipeline includes a robust anti-cheating mechanism to guarantee the integrity of the generated tasks [12]

Group 3: Training Methodology
- The training environment uses a ReAct-style interaction loop, with performance profiling and validation ensuring that generated kernels beat torch.compile by at least 5% [17]
- A milestone-based discrete reward mechanism is used so that rewards reflect the true quality of the generated kernels [22]
- Training is split into multiple phases to keep long-context reinforcement learning stable, reaching a context window of 128K tokens [23][27]

Group 4: Performance Evaluation
- CUDA Agent significantly outperformed commercial models, with a 96.8% "faster" rate against torch.compile and a 2.11x geometric-mean speed-up [28][30]
- On Level-1 and Level-2 tasks CUDA Agent achieved a 100% pass rate; on Level-3 tasks it reached a 94% pass rate and beat torch.compile on 90% of tasks [29][30]
- The gap between CUDA Agent and leading commercial models such as Claude Opus 4.5 and Gemini 3 Pro is substantial, especially on the hardest tasks [30]

Group 5: Open Source Contribution
- The team has simultaneously open-sourced the training dataset CUDA-Agent-Ops-6K, including the complete filtering process and contamination-control scheme, to support future research on reinforcement-learning-based CUDA kernel optimization [32]
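The milestone-based discrete reward in Group 3 can be sketched as follows. The exact milestones and reward values below are illustrative assumptions; only the "at least 5% faster than torch.compile" bar comes from the article, and the team's actual reward function is not public.

```python
# A minimal sketch of a milestone-style discrete reward for generated kernels.
# Thresholds and reward values are illustrative assumptions, not the team's.

def kernel_reward(compiled_ok: bool, correct: bool, speedup: float) -> float:
    """Map a candidate kernel's evaluation results to a discrete reward.

    `speedup` is runtime(torch.compile) / runtime(candidate); the article
    says a kernel only counts as "faster" when it beats torch.compile by
    at least 5%, i.e. speedup >= 1.05.
    """
    if not compiled_ok:   # milestone 0: the kernel must build
        return 0.0
    if not correct:       # milestone 1: outputs must match the reference
        return 0.1
    if speedup < 1.05:    # milestone 2: correct but not (sufficiently) faster
        return 0.5
    return 1.0            # milestone 3: correct and >=5% faster

print(kernel_reward(True, True, 2.11))  # a 2.11x kernel earns the full reward
```

Discrete milestones like these avoid rewarding a kernel for raw speed before it is even correct, which is one way to discourage reward hacking in this setting.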
@Everyone: in 2026 you really need to get hands-on with AI | Annual AI summit
量子位· 2026-03-03 07:02
Core Viewpoint
- The article emphasizes AI's transition from a niche technology to a mainstream tool now widely adopted in everyday life, marking a significant shift in accessibility and application [2][5][18]

Group 1: AI's Mainstream Adoption
- AI has evolved from a topic of interest in the tech community to a household name, especially after the Spring Festival, indicating widespread acceptance [2]
- AI's presence in daily tasks such as cooking, cleaning, and healthcare shows its integration into everyday life [3][5]
- The upcoming 2026 China AIGC Industry Summit aims to accelerate this transition by bringing together AI entrepreneurs, developers, and users to explore practical applications [5][12]

Group 2: Summit Details
- The 2026 China AIGC Industry Summit will cover the entire generative AI industry chain, featuring both technology pioneers and application explorers [9]
- More than 60 industry leaders will share insights, with significant online engagement expected, including over 350,000 live viewers and millions of total impressions [12]
- The agenda includes discussions on why adopting AI is now necessary and on practical integration across sectors such as healthcare and gaming, highlighting real-world experience [13][14]

Group 3: Recognition of AIGC Enterprises
- Noteworthy AIGC enterprises and products will be evaluated on their performance and feedback over the past year, with results announced at the summit [19]
- The evaluation will be grounded in objective data and expert opinion to ensure credibility and professionalism [19]
- Millions of industry professionals will be invited to witness the recognition of outstanding companies in the AIGC space [20]
This MWC became China's AI home turf: Xiaomi pulled AI out of the chat box to take over the physical world
量子位· 2026-03-03 04:25
Core Viewpoint
- The global AI competition is shifting from technical exploration to practical application, with real-world deployment mattering more than model size and parameter counts [1][2]

Group 1: AI Landscape and Competition
- The competitive landscape is evolving toward who can effectively deploy AI in real-world scenarios [2][11]
- China is emerging as a leader in this phase thanks to its vast application scenarios, rich data density, and comprehensive hardware ecosystem [3][11]
- The recent MWC event showcased China's AI capabilities and its advances in practical applications [4][25]

Group 2: Xiaomi's AI Innovations
- At MWC, Xiaomi demonstrated deep AI integration across its "people, vehicles, and home" ecosystem, showcasing AI's role in everyday life [6][15]
- The company introduced Miloco, a system that uses AI as a unified decision-making hub for smart homes, improving the user experience through automation [17][23]
- Miloco's capabilities include automatic recognition of user behavior and seamless coordination among devices, a significant step toward practical AI applications [18][24]

Group 3: Technological Foundations
- Xiaomi's MiMo model sits in the top tier of global open-source models, providing a robust foundation for AI applications in real-world environments [8][29]
- The company plans to invest roughly 75 billion yuan in AI development in 2025, focusing on integrating model capabilities with its hardware ecosystem [43][44]
- The synergy between Xiaomi's AI capabilities and its extensive hardware ecosystem is crucial for making AI effective in physical spaces [46][49]

Group 4: Market Trends and Future Outlook
- The AI industry is transitioning from benchmark-driven competition to a focus on systemic capability and real-world application [54][56]
- China's AI advances are increasingly visible across sectors including transportation and energy, with practical deployments becoming commonplace [58][60]
- Xiaomi's comprehensive ecosystem positions it favorably in the global market, enabling seamless integration of AI into daily life and scalable real-world applications [66][68]
The unorthodox data trick works: pre-training a multimodal large model with text data alone
量子位· 2026-03-03 04:25
Core Viewpoint
- The article discusses a groundbreaking approach to building multimodal large models (MLLMs), arguing that high-quality image-text pairs are not necessary for pre-training and challenging a long-held industry belief [1][3]

Group 1: Theoretical Foundation
- The ReVision method rests on "representation alignment" rather than paired data, relying on the shared representation space established by multimodal contrastive learning [4]
- Pre-training creates "semantic topology consistency": images and texts are mapped into a high-dimensional embedding space where the relative distances among semantic concepts are preserved despite differences in absolute position [8]
- The systematic geometric offset between the image and text distributions can be corrected using statistics computed from unpaired data, enabling cross-modal interchangeability without expensive paired data [8]

Group 2: Understanding the Modality Gap
- Previous research misunderstood the modality gap, treating it as isotropic; it is actually anisotropic with specific geometric characteristics [11][14]
- The ReVision team decomposes the gap into two components, a stable bias and anisotropic residuals, showing the gap is not random but has a defined shape that can be replicated mathematically [13][15]

Group 3: Breakthrough in Data Utilization
- The method bypasses expensive paired data by geometrically aligning representations, letting the model learn from the distribution shape of image data [16]
- Only two low-cost inputs are needed: a large volume of unpaired text and the statistical distribution of unpaired images, allowing any text data to be mathematically transformed into visual signals [17]

Group 4: Implementation Strategy
- The ReAlign strategy has three steps: anchor alignment to fix basic positional offsets, trace alignment to handle the anisotropic component, and centroid alignment for final adjustment, so that text features closely resemble visual features without using any real images [19][20][22]

Group 5: Advantages of Unpaired Data
- Unpaired text supplies a wealth of semantic knowledge, overcoming the scarcity and cleaning cost of high-quality paired data [25]
- Long unpaired texts let the model learn complex world knowledge and reasoning, extending its understanding beyond image features alone [26]
- Models pre-trained with 2 million pure texts outperform those trained with 1 million real image-text pairs, at only 74% of the training cost [27][28]

Conclusion
- ReVision opens new avenues for training multimodal large models, demonstrating that paired data is not a hard constraint and that vast amounts of unpaired text can serve as effective visual training material [30]
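The anchor/centroid idea behind ReAlign can be illustrated with a toy moment-matching sketch on synthetic data. This is a crude analogue, not the paper's actual algorithm: the `realign` function, the embedding shapes, and the per-dimension statistics are all assumptions, and the trace-alignment step for the anisotropic residual is not modeled here.

```python
# Toy sketch: shift unpaired text embeddings toward the image distribution
# using only distribution statistics, never paired (image, text) examples.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP-style embeddings of *unpaired* text and images:
# the image cloud is shifted and rescaled relative to the text cloud.
text_emb = rng.normal(loc=0.0, scale=1.0, size=(1000, 64))
img_emb = rng.normal(loc=0.5, scale=1.3, size=(800, 64))

def realign(text: np.ndarray, img: np.ndarray) -> np.ndarray:
    """Move text features toward the image distribution by matching
    per-dimension first and second moments (a crude analogue of the
    anchor + centroid steps)."""
    t_mu, t_sd = text.mean(axis=0), text.std(axis=0)
    i_mu, i_sd = img.mean(axis=0), img.std(axis=0)
    # whiten per dimension, then re-color with the image statistics
    return (text - t_mu) / (t_sd + 1e-8) * i_sd + i_mu

pseudo_visual = realign(text_emb, img_emb)
gap_before = np.linalg.norm(text_emb.mean(0) - img_emb.mean(0))
gap_after = np.linalg.norm(pseudo_visual.mean(0) - img_emb.mean(0))
print(f"centroid gap: {gap_before:.3f} -> {gap_after:.6f}")
```

After the transform, the centroid of the pseudo-visual cloud coincides with the image centroid, which is the sense in which text features can stand in for visual ones during pre-training.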
GPT-5.4 accidentally leaked! OpenAI's newest model targets breakthroughs in these two capabilities
量子位· 2026-03-03 04:25
西风 from Aofeisi
量子位 | WeChat official account QbitAI

GPT-5.4 leaked?

People woke up to find this screenshot spreading widely:

In a pull request for OpenAI's coding assistant Codex, the string "GPT-5.4" appeared directly, including a /Fast command for its fast mode.

And this is not the first time traces of GPT-5.4 have surfaced.

A few days ago, an OpenAI developer submitted a pull request on GitHub and accidentally leaked it in the change notes for a version check:

Behind the still-in-development view_image_original_resolution feature flag, original-resolution support was added to the view_image interface. When the flag is enabled and the target model is gpt-5.4 or newer...

Shortly afterwards, "gpt-5.4" was hastily changed to "gpt-5.3-codex".

In addition, a GPT-5.4 entry has also appeared in Codex's model drop-down menu:

All these signs seem to suggest that GPT-5.4 is not far off.

A 2-million-token context window?

Beyond that, rumors claim GPT-5.4 will ship with a 2-million-token context window, enabling persistent memory over extremely long content.

Netizens point out that supporting "remembering very long content without forgetting it over time" would dramatically inflate the amount of data the model must cache at inference time, which is itself a formidable technical challenge.

And in the leaked pull ...
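To see why a 2-million-token window strains the inference cache, here is a back-of-envelope KV-cache estimate. Every hyperparameter below (layer count, number of KV heads, head dimension, fp16 cache) is an assumption chosen purely for illustration; nothing about GPT-5.4's architecture is known.

```python
# Back-of-envelope KV-cache size for a single 2M-token sequence.
# All model hyperparameters here are illustrative assumptions.

def kv_cache_gib(tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """GiB needed to cache keys AND values (factor of 2) across all layers."""
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_elem / 2**30

# A hypothetical 80-layer model with 8 KV heads of dimension 128, fp16 cache:
print(f"{kv_cache_gib(2_000_000, 80, 8, 128):.0f} GiB")  # ~610 GiB
```

Even with grouped-query attention (few KV heads), a single long sequence can demand hundreds of GiB of cache, which is why netizens call persistent 2M-token memory a hard systems problem.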
量子位 (QbitAI) is hiring editors and writers
量子位· 2026-03-03 04:25
Core Viewpoint
- Amid the ongoing AI boom, the article invites readers to join 量子位 (QbitAI), a company focused on tracking AI advancements that has established itself as a leading content platform in the industry [1]

Group 1: Job Opportunities
- The company is hiring in three directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]
- Positions are full-time and based in Beijing, with roles at various levels open for application [2][4]

Group 2: Job Responsibilities
- AI Industry: covers innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6]
- AI Finance: tracks venture capital and financial reporting in the AI sector, monitoring capital movements within the industry [6]
- AI Product: focuses on AI applications and hardware advances [6]

Group 3: Benefits and Growth Opportunities
- Employees can engage with the latest AI technologies, raise their productivity with new tools, and build personal influence by creating original content [6]
- The company offers competitive salaries and comprehensive benefits including social insurance, meal allowances, and performance bonuses [6]

Group 4: Company Growth Metrics
- By 2025, 量子位 had over 2.4 million WeChat subscribers and more than 7 million users across all platforms, with daily reading volume exceeding 2 million [12]
- Third-party data platforms rank it as the top new-media outlet in the AI and frontier-technology sector [12]
The world's first large-model earnings report! MiniMax forecasts three super PMFs for 2026 as the AI platform company sets out
量子位· 2026-03-03 01:59
Core Insights
- MiniMax has released its first annual report since its IPO, showing strong financial growth and offering a window into the commercialization of large models in the AI industry [2][4][23]
- Revenue for 2025 reached $79.04 million, up 158.9% year-on-year, with over 70% coming from international markets [4][8]
- The company runs a dual-driven business model spanning consumer (C-end) AI-native products and an enterprise (B-end) open platform, producing a stable and predictable revenue stream [6][13]

Financial Performance
- Adjusted net loss for the past year was $250 million, but the loss rate has narrowed significantly, indicating improving profitability [5][18]
- Gross profit for 2025 was $20.08 million, up a striking 437% year-on-year, with gross margin rising from -24.7% in 2023 to 25.4% in 2025 [14][15]
- R&D expenses were $250 million, a 33.8% increase over the previous year, while spending efficiency improved: the ratio of R&D expenses to total revenue fell from 619% in 2024 to 320% in 2025 [19]

Product Development and Market Position
- MiniMax has built comprehensive R&D capabilities across modalities including language, video, voice, and music, and iterates its models rapidly [23][24]
- The company has shipped multiple iterations of its language models, with M2.5 noted for its efficiency and integration into mainstream productivity tools [32][42]
- MiniMax aims to transition from a large-model company to an AI platform company, treating intelligence density and model throughput as its key metrics [47][49]

Future Outlook
- The company anticipates strong growth in demand for multimodal models, expecting a substantial increase in token volume [46]
- MiniMax is actively developing the M3 and Hailuo 3 series models to optimize reasoning architecture and computational efficiency, positioning itself for the shift from the "tool era" to the "ecosystem era" in AI [51][52]
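As a quick arithmetic cross-check of the figures above (all inputs are taken from the report summary; the small discrepancy in the R&D ratio presumably reflects rounding in the source):

```python
# Sanity-check the reported 2025 ratios from the raw dollar figures.
revenue = 79.04       # USD millions, 2025 revenue
gross_profit = 20.08  # USD millions
rnd = 250.0           # USD millions, R&D expenses

gross_margin = gross_profit / revenue * 100
rnd_ratio = rnd / revenue * 100

print(f"gross margin: {gross_margin:.1f}%")  # matches the reported 25.4%
print(f"R&D / revenue: {rnd_ratio:.0f}%")    # close to the reported ~320%
```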
This year's most noteworthy AI rankings are here! Applications open today
量子位· 2026-03-03 01:59
Organizing Committee, from Aofeisi
量子位 | WeChat official account QbitAI

China's generative AI is entering the deep waters of industrialization.

Over the past two years, AI has gone from a "new technology" to a "new tool," and from a "new tool" to a reality every enterprise must face. It is changing not only content production but also R&D efficiency, marketing, team collaboration, and even decision-making processes.

On the occasion of the 4th China AIGC Industry Summit, 量子位 will, based on the performance of and feedback on generative AI companies and products over the past year, combined with observations and predictions about 2026 technologies and scenarios, select:

- AIGC Companies to Watch in 2026
- AIGC Products to Watch in 2026

The "AIGC Companies to Watch in 2026" list will recognize the AI companies with the most innovative, most forward-looking, or most scalable deployment potential.

[Eligibility]
1. The company is incorporated in China or its main business is in China;
2. Its main business is generative AI or related, or it has widely applied AI to its main business;
3. It has performed outstandingly in technology/products or commercialization over the past year.

量子位 will combine in-depth company research with the opinions of dozens of well-known industry experts; results will be announced at the China AIGC Industry Summit in May 2026, where 量子位 will also invite millions of industry practitioners to witness these honors together.

△ Scan the code to nominate a product

How to apply
Applications open today and close on April 27; final results will be announced in May at the China AIGC Industry Summ ...
Want to get into VLA but don't know where to start? NTU & CUHK open-source the "ultimate recipe": from backbone to frequency-domain modeling, every step backed by experiments
量子位· 2026-03-02 16:00
Core Insights
- The article presents VLANeXt, a new model built by systematically analyzing the design space of Vision-Language-Action (VLA) models across 12 key dimensions, yielding a comprehensive "recipe" for effective model design [1][5][20]
- VLANeXt significantly outperforms state-of-the-art (SOTA) methods, including 7B-parameter models, achieving a 10% higher success rate under previously unseen conditions such as lighting changes and new camera angles [1][23]

Group 1: Background and Motivation
- The rise of large foundation models has highlighted the potential of VLA models, which leverage rich visual and language understanding for scalable robot learning [5]
- The VLA research landscape is fragmented, with many models claiming superior performance but no unified evaluation framework, necessitating a return to fundamental design principles [5]

Group 2: Model Development Process
- The team started from a baseline similar to RT-2, using LLaMA as the backbone and a simple architecture for action modeling [7]
- Key enhancements included an independent policy module, deeper policy modeling, and action chunking to improve inference speed and model performance [9][11]

Group 3: Foundational Components
- Decoupling the language and action spaces and using an independent policy head significantly outperformed reusing text tokens for action classification [9]
- The policy architecture was deepened to 29 layers to better capture action distributions, aligned with the backbone of the vision-language model (VLM) [9]

Group 4: Perception Essentials
- Redundant historical visual information did not improve performance, so only the current frame's image is used [14]
- Multi-view inputs, including third-person and wrist perspectives, provide complementary geometric cues and improve action accuracy [14]

Group 5: Action Modeling Perspectives
- The team explored world models for action learning but rejected them due to the added training time, opting for more efficient modeling techniques [16]
- They introduced frequency-domain modeling via the discrete cosine transform (DCT) to improve action prediction without significant additional training cost [16]

Group 6: Experimental Results
- VLANeXt delivered superior performance across benchmarks including LIBERO and LIBERO-plus, with an average score of 99.0 on LIBERO [21][22]
- Its robustness was validated on real-world tasks, showing strong adaptability in both single-arm and bimanual scenarios without specialized pre-training [25]
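The frequency-domain idea in Group 5 can be sketched in a few lines: a smooth action chunk is compressed by keeping only its low-frequency DCT coefficients, so the policy can predict a handful of coefficients instead of every timestep. The chunk length, test signal, and coefficient count below are illustrative assumptions, not VLANeXt's actual configuration.

```python
# Sketch: DCT-based compression of a smooth action trajectory.
import numpy as np

def dct2(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D signal (pure NumPy)."""
    n = len(x)
    k = np.arange(n)
    # basis[freq, time] = cos(pi * (2*time + 1) * freq / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    return scale * (basis @ x)

def idct2(c: np.ndarray) -> np.ndarray:
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[:, None] + 1) * k[None, :] / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    return basis @ (scale * c)

# A smooth 16-step action chunk (e.g. one joint's target positions).
t = np.linspace(0, 1, 16)
traj = 0.5 * np.sin(2 * np.pi * t) + 0.1 * t

coeffs = dct2(traj)
coeffs[6:] = 0.0          # keep only the 6 lowest-frequency coefficients
recon = idct2(coeffs)
print(np.max(np.abs(recon - traj)))  # small reconstruction error
```

Because robot trajectories are smooth, most of their energy sits in the low frequencies, so a short coefficient vector reconstructs the chunk with little error.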
A group of young researchers in Shanghai built an academic OpenClaw
量子位· 2026-03-02 16:00
Core Viewpoint
- The article covers the launch of "Da Sheng," a highly capable intelligent agent developed by the Shanghai Institute of Intelligent Science together with Fudan University, aimed at transforming scientific research through advanced AI capabilities [4][5]

Group 1: AI Capabilities and Applications
- Da Sheng can autonomously carry out research tasks such as analyzing single-cell transcriptomics data and generating the corresponding experimental designs, cutting the time required from weeks to minutes [2][19]
- It closes the loop in the life sciences, linking computational models with real-world biological experiments and improving efficiency 3 to 4 times over traditional methods [19]
- Its multimodal understanding lets it process complex scientific data such as RNA sequences and molecular structures and generate high-performance experimental designs without extensive conversion to text [20][26]

Group 2: Innovations in Scientific Research
- The agent integrates dry- and wet-lab workflows, addressing a major pain point in the life sciences where computational predictions often fail to translate into practical experiments [13][19]
- Da Sheng has participated in space-related scientific computation, successfully deploying a weather model in orbit, a notable advance in remote scientific data processing [30][33]
- Its capabilities extend to the humanities and social sciences, where it facilitates deep, Socratic-style discussion to sharpen students' critical thinking [36][38]

Group 3: Development and Infrastructure
- Da Sheng is backed by an infrastructure of over 400 scientific models and 22PB of high-value data, accumulated through collaborative efforts over the past year [40]
- Its architecture includes a multi-branch memory system that isolates information effectively, so both successful and failed experiments contribute to the overall knowledge base [50][54]
- A skills system of over 300 reusable skills, distilled from real research experience, strengthens its practical application across scientific fields [60]

Group 4: Safety and Security Measures
- Da Sheng incorporates a comprehensive safety framework balancing high autonomy, security, and resource efficiency, ensuring safe operation in collaborative environments [66][69]
- It executes code in a sandbox with real-time auditing, minimizing data leakage while maintaining high performance [69][71]

Group 5: Future Directions and Competitions
- The upcoming AI4S Intelligent Agent CNS Challenge will engage teams in building agents that tackle top-tier scientific problems, promoting the integration of AI into advanced research [84][87]
- The initiative aims to reduce researchers' repetitive workload so they can focus on more complex scientific questions [87][89]
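The multi-branch memory idea in Group 3 can be illustrated with a toy sketch: each research thread writes to its own isolated branch, while failed experiments remain queryable as lessons rather than being discarded. The class and method names below are hypothetical, not Da Sheng's actual API.

```python
# Toy multi-branch agent memory: per-thread isolation, shared lesson index.
from dataclasses import dataclass, field

@dataclass
class MemoryBranch:
    name: str
    records: list = field(default_factory=list)

    def log(self, outcome: str, success: bool) -> None:
        self.records.append({"outcome": outcome, "success": success})

class AgentMemory:
    def __init__(self) -> None:
        self.branches: dict[str, MemoryBranch] = {}

    def branch(self, name: str) -> MemoryBranch:
        # Each research thread writes only to its own branch (isolation).
        return self.branches.setdefault(name, MemoryBranch(name))

    def lessons(self) -> list:
        # Failed experiments are retained, not discarded: they inform later runs.
        return [r for b in self.branches.values() for r in b.records
                if not r["success"]]

mem = AgentMemory()
mem.branch("rna-seq").log("batch-effect correction removed signal", success=False)
mem.branch("weather").log("model deployed on orbital compute", success=True)
print(len(mem.lessons()))  # 1 failed-experiment record retained
```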