量子位
A 9B on-device open-source model runs million-token context, and 面壁's new sparse-linear hybrid attention architecture SALA made it happen!
量子位· 2026-02-11 12:49
henry, from 凹非寺. 量子位 | WeChat account QbitAI

The strongest large models have pushed scaling into a new dimension: million-token context.

A few days ago, Claude Opus 4.6 launched, and for the first time the emergent abilities of a million-token context felt real: ingesting 500,000 Chinese characters in a single pass, cross-document legal analysis, multi-turn agent planning... Users promptly voted with their feet, and Wall Street answered with the stock chart.

Meanwhile, MiniCPM-SALA, a model built on the SALA attention architecture, will be open-sourced as well. Beyond that, 面壁, under the banner of its OpenBMB community, has joined SGLang and NVIDIA to launch the 2026 sparse-operator acceleration competition (SOAR), putting this scaling capability directly into developers' hands and pushing for performance breakthroughs in on-device agent deployment.

Linear-sparse hybrid attention architecture

Too long, didn't read? Straight to the point: how exactly does 面壁's new hybrid architecture, SALA (Sparse Attention-Linear Attention), mix the two? In short, it combines 75% linear attention (Lightning Attention) with 25% sparse attention (InfLLM v2), and through hybrid positional encoding HyPE ...
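The 75/25 split above can be pictured as a per-layer interleaving scheme. A minimal sketch, assuming (this is an assumption for illustration, not a confirmed detail of SALA) that the ratio is realized by making every fourth transformer layer a sparse-attention layer:

```python
def sala_layer_plan(n_layers: int, sparse_every: int = 4) -> list:
    """Assign each layer an attention type: every `sparse_every`-th layer
    uses sparse attention (InfLLM v2 in SALA), the rest use linear
    attention (Lightning Attention)."""
    return [
        "sparse" if (i + 1) % sparse_every == 0 else "linear"
        for i in range(n_layers)
    ]

plan = sala_layer_plan(32)
print(plan.count("linear"), plan.count("sparse"))  # 24 8, i.e. 75% / 25%
```

Interleaving rather than stacking all sparse layers at one end is a common choice in hybrid stacks, since it lets the occasional full-context sparse layer periodically refresh what the cheap linear layers carry forward.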
Musk's xAI in meltdown! Two co-founders quit within 24 hours, three Chinese founding members lost in a month, and the 12-person dream team is down to half
量子位· 2026-02-11 04:10
Core Viewpoint - The recent wave of departures from xAI, particularly among its Chinese co-founders, raises concerns about the company's stability and future direction, especially in light of its impending IPO plans and the competitive landscape in the AI sector [1][5][40]. Group 1: Departures of Key Personnel - Within 24 hours, two Chinese co-founders, Tony Wu and Jimmy Ba, announced their departures from xAI, following the earlier exit of Greg Yang due to health issues [2][4][9]. - In total, three out of five Chinese scientists from the original founding team have left xAI within a month, contributing to a total of six departures from the initial twelve-member team since its inception [5][22][27]. - The departures include significant figures such as Wu, who was a key expert in mathematics and symbolic reasoning, and Ba, known for his contributions to deep learning theory [16][18][21]. Group 2: Reasons Behind Departures - The high-pressure work culture at xAI, characterized by intense workloads and direct pressure from Elon Musk, has been cited as a contributing factor to the departures, with Yang attributing his health issues to overwork [30][31]. - Strategic shifts within the company, including the recent acquisition by SpaceX and the potential for an IPO, may have led to disagreements among founding members regarding the company's future direction and roles [33][34][40]. - xAI is also facing regulatory pressures and challenges related to its products, such as Grok AI, which has been criticized for creating problematic synthetic images, further complicating its operational landscape [37][39]. Group 3: Implications for xAI - The loss of key personnel, particularly those involved in critical areas like mathematical reasoning and deep learning, poses a risk to xAI's competitive edge against major players like OpenAI and Google [39][40]. 
- The ongoing talent exodus raises questions about the company's ability to maintain its technological advantages and execute its ambitious plans, especially with half of its founding team now departed [40][41]. - The current environment in the AI sector, characterized as a golden age for startups, may provide opportunities for the departing scientists, although their future plans remain uncertain [42][43].
Surpassing CLIP! Peking University open-sources a fine-grained visual recognition model that needs only 4 training images per class
量子位· 2026-02-11 01:55
Core Viewpoint - The article discusses the limitations of current multimodal large models in fine-grained visual recognition tasks and introduces the Fine-R1 model developed by Professor Peng Yuxin's team at Peking University, which significantly improves recognition accuracy with minimal training data [1][2][5]. Group 1: Fine-Grained Visual Recognition Challenges - Current multimodal large models excel in complex tasks but lag in fine-grained visual recognition compared to their visual encoders like CLIP [1]. - Real-world objects exhibit fine-grained characteristics, with numerous subclasses, such as over 500 types of fixed-wing aircraft, highlighting the importance of fine-grained recognition in practical applications [3]. Group 2: Fine-R1 Model Overview - The Fine-R1 model aims to leverage the rich knowledge of fine-grained subclasses and a generative decoding paradigm to overcome the limitations of traditional recognition methods, enabling fine-grained recognition of any visual object in an open domain [5]. - Fine-R1 enhances the model's ability to reason about unseen subclasses using a small number of training images (only 4 per subclass), outperforming models like OpenAI's CLIP and Google DeepMind's SigLIP [5][15]. Group 3: Model Development Process - The development of Fine-R1 involves two main steps: 1. Chain-of-thought supervised fine-tuning, which simulates human reasoning to build inference capabilities [7]. 2. Triplet enhancement strategy optimization, which improves robustness to intra-class variations and inter-class distinctions by using positive and negative samples [8][10]. Group 4: Experimental Results - Fine-R1's performance was evaluated on six authoritative fine-grained image classification datasets, demonstrating superior accuracy in both seen and unseen categories compared to other models [15][17].
- The model's ability to utilize fine-grained subclass knowledge effectively was identified as the primary factor for its improved recognition accuracy, rather than enhancements in visual representation or knowledge storage [19]. Group 5: Conclusion and Future Work - The article concludes with the potential of Fine-R1 to excel in fine-grained visual recognition tasks, emphasizing its innovative approach to reasoning and knowledge application [21]. - The research has been accepted for ICLR 2026 and the code is open-sourced for further exploration [2][22].
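The triplet enhancement strategy in Group 3 can be sketched with a standard triplet margin loss on embeddings. This is a common formulation used purely for illustration; Fine-R1's exact objective may differ.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor embedding toward a positive sample (same subclass)
    and push it at least `margin` farther from a negative sample
    (a confusable neighboring subclass)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])   # anchor
p = np.array([0.9, 0.1])   # positive: same subclass, slight intra-class variation
n = np.array([-1.0, 0.0])  # negative: a different, distant subclass
print(triplet_margin_loss(a, p, n))  # 0.0: the negative is already far enough
```

The loss is zero once the negative is pushed beyond the margin, which is what makes the model robust to intra-class variation while still separating look-alike subclasses.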
A niche architecture wins big! An editing mechanism pushes a 100B diffusion model to 892 tokens/s!
量子位· 2026-02-11 01:55
Core Viewpoint - The article discusses the emergence of the LLaDA2.1 model from Ant Group, which has achieved a remarkable speed of 892 tokens per second in complex programming tasks, marking a significant advancement over traditional autoregressive models [1][3][11]. Group 1: Model Performance and Features - LLaDA2.1 operates on a 100 billion parameter scale and has transitioned from a research model to a practical tool, demonstrating superior efficiency [3][4]. - The model introduces a dual-mode decoding strategy, allowing users to switch between Speedy Mode and Quality Mode with a single configuration, thus enhancing usability [9][10]. - In Speedy Mode, LLaDA2.1 achieves a peak speed of 892 tokens per second on the HumanEval+ benchmark, while in Quality Mode, it surpasses previous models in various reasoning tasks [11][31]. Group 2: Technical Innovations - The model employs an Error-Correcting Editable (ECE) mechanism, enabling it to generate drafts quickly and then refine them, addressing the limitations of traditional diffusion models [16][21]. - LLaDA2.1 successfully implements reinforcement learning (RL) on a 100 billion scale, enhancing its performance in instruction-following tasks and demonstrating that diffusion models can achieve both speed and understanding [23][26]. - The introduction of the EBPO algorithm allows for efficient training and editing, marking a significant milestone in the application of RL to diffusion models [25][28]. Group 3: Competitive Advantage - LLaDA2.1's performance in benchmark tests shows a significant advantage over mainstream autoregressive architectures, achieving high speeds without compromising quality [29][30]. - The model's ability to maintain quality even in Speedy Mode demonstrates its robustness, achieving a balance between speed and accuracy [32]. 
- A lighter 16 billion parameter Mini version has been released, achieving peak speeds exceeding 1500 tokens per second, indicating potential for more lightweight deployments [33].
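The draft-then-refine idea behind the ECE mechanism can be sketched as a confidence-gated editing loop. The function roles, threshold, and round limit below are illustrative assumptions, not LLaDA2.1 internals:

```python
def draft_then_edit(draft, confidence, edit, threshold=0.9, max_rounds=3):
    """Produce a full draft in one shot (the fast, diffusion-style step),
    then iteratively re-edit only the low-confidence positions instead
    of regenerating the whole sequence."""
    tokens = list(draft)
    for _ in range(max_rounds):
        low = [i for i, t in enumerate(tokens) if confidence(t) < threshold]
        if not low:
            break  # every position is confident: stop early
        for i in low:
            tokens[i] = edit(tokens[i])
    return tokens

# Toy run: lowercase tokens count as "low confidence" and get re-edited.
fixed = draft_then_edit(
    ["Hello", "world", "Foo"],
    confidence=lambda t: 1.0 if t[0].isupper() else 0.0,
    edit=str.capitalize,
)
print(fixed)  # ['Hello', 'World', 'Foo']
```

Because most positions pass the confidence gate on the first pass, the expensive refinement touches only a small fraction of tokens, which is how an editing mechanism can buy quality without giving up the parallel-drafting speed.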
Humans spent 100 years mapping the brain; AI did it in hours, and charted new brain regions too
量子位· 2026-02-10 11:59
Core Viewpoint - The article discusses the innovative machine learning algorithm CellTransformer developed by a neuroscience team at the University of California, San Francisco, which can classify and map the brain of five mice in just a few hours, potentially applicable to human brains in the future [1][4]. Group 1: Technology Overview - CellTransformer is an encoder-decoder architecture that significantly simplifies the process of brain mapping, which traditionally required manual drawing by scientists [5][10]. - The algorithm processes gene data from 10.4 million cells across five mice, identifying known brain regions and discovering new areas [3][20]. - The model employs a self-supervised learning approach, predicting gene expression based on neighboring cells, allowing for efficient and accurate mapping [11][15]. Group 2: Performance and Results - CellTransformer completed spatial modeling of 10.4 million cells in hours, outperforming traditional methods in both time and scale [20]. - It accurately aligns known brain structures, defining between 25 to 1300 neural regions without using brain region labels, demonstrating high alignment with existing anatomical and functional partitions [21][22]. - The algorithm also identifies and maps previously unrecognized brain regions, enhancing the understanding of brain structure and function [26][30]. Group 3: Broader Implications - The technology is not limited to mouse brains; it can be extended to other animals and potentially to human brains, with researchers optimistic about future applications [35][38]. - The algorithm could also be utilized in mapping other organs, such as kidneys, aiding in the differentiation between healthy and diseased tissues [41].
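The self-supervised objective in Group 1, predicting a cell's gene expression from its spatial neighbors, can be sketched with a nearest-neighbor baseline. CellTransformer replaces the mean below with a learned encoder-decoder; the coordinates, expression matrix, and choice of k here are illustrative assumptions:

```python
import numpy as np

def neighbor_expression_baseline(coords, expr, k=2):
    """Predict each cell's gene-expression vector as the mean expression
    of its k spatially nearest neighbors (the cell itself is excluded)."""
    preds = np.empty_like(expr)
    for i in range(coords.shape[0]):
        d = np.linalg.norm(coords - coords[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip index 0: the cell itself
        preds[i] = expr[nbrs].mean(axis=0)
    return preds

coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
expr = np.ones((4, 3))  # 4 cells, 3 genes, identical expression
preds = neighbor_expression_baseline(coords, expr)
print(preds.shape)  # (4, 3)
```

Regions then fall out of this objective for free: cells whose expression is well predicted by the same neighborhood context cluster together, with no brain-region labels needed.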
A Chinese-language Nano Banana? Qwen-Image-2.0 stuns: it digests 1K-token prompts, and Chinese image generation finally stops contorting itself
量子位· 2026-02-10 11:59
梦瑶, from 凹非寺. 量子位 | WeChat account QbitAI

Long prompts turn to mush, complex instructions get ignored, and Chinese text warps into freestyle... who else feels the pain of AI image generation!!!

Stop struggling, because today's AI can now reliably digest super-long text prompts of 1K tokens.

Complex instructions are no problem either. OpenClaw has been blowing up lately, so I simply had the AI roll out a cyber infographic poster about it (tell me that isn't impressive).

Chinese text rendering holds up too. Given the Lanting Xu (《兰亭集序》), a famously difficult text, the AI reproduced the characters 1:1, with layout and brush strokes intact.

And you thought that was it? NONONO! It can also do multi-image editing. I tossed it a single photo and it handed back a studio-grade grid of nine portraits!! (suddenly feels like I just saved a pile of money...)

The model doing all this work is Qwen-Image-2.0, Alibaba's newly released next-generation image generation and editing model. 1K-token long prompts, complex instructions, Chinese rendering, image editing, and 2K resolution all in one go, and in international benchmarks it already ranks second only to Nano Banana Pro.

In AI image generation, the most maddening thing isn't writing prompts, it's writing so much that the AI can't take it in: a good prompt with nowhere to go! I wonder whether the Qwen team ...
Ant Group backs a Shanghai embodied-intelligence company
量子位· 2026-02-10 07:00
Core Viewpoint - The article discusses the rapid development and investment activities in the field of embodied intelligence, highlighting a significant investment by Ant Group in a Shanghai-based startup, Daxiao Robotics, marking a notable entry into this sector for 2026 [2][5][6]. Investment Activities - Ant Group has led a financing round for Daxiao Robotics, which has gained attention in both academic and industrial circles [2][3][8]. - The funding round included participation from various investors such as Qiming Venture Partners, JinJing Capital, and others, with the capital aimed at advancing Daxiao's ACE embodied full-stack R&D paradigm and accelerating the development of its Kairos 3.0 world model [8][9]. Market Trends - The investment events in the embodied intelligence sector surged from 173 in the previous year to 447 in 2025, with total funding increasing from 13.7 billion to 55.4 billion yuan, representing year-on-year growth rates of over 250% and 400%, respectively [5]. Daxiao Robotics' Approach - Daxiao Robotics has introduced the ACE embodied full-stack R&D paradigm, which emphasizes a human-centered approach, contrasting with the traditional robot-centric development paths [11][10]. - The ACE paradigm focuses on environmental data collection as a foundational capability, utilizing multi-modal hardware to gather diverse information for training embodied models [13]. Technological Innovations - The Kairos 3.0 model aims to create a unified understanding framework across robotic entities, integrating physical laws and human behavior patterns to enhance the system's predictive capabilities [14][16]. - Daxiao Robotics is addressing common challenges in the field, such as data scarcity and generalization difficulties, by prioritizing data entry and world modeling before advancing to the deployment of embodied brain modules [18]. 
Team Composition - Daxiao Robotics boasts a strong leadership team, including Xiaogang Wang, a top-ranked computer scientist, and Dacheng Tao, a distinguished professor with significant contributions to AI [19][21][27][31]. - The founding team includes researchers from prestigious institutions, enhancing the company's capability to tackle advanced AI challenges [33].
GLM-5 architecture revealed as Zhipu stock climbs 60% in two days: it uses the same sparse attention as DeepSeek
量子位· 2026-02-10 07:00
梦晨, from 凹非寺. 量子位 | WeChat account QbitAI

Whether or not Pony Alpha is Zhipu's, the next-generation flagship model GLM-5 is coming. GitHub code confirms it, and details of the new architecture have surfaced.

GLM-5 adopts the DeepSeek-V3/V3.2 architecture, including the sparse attention mechanism (DSA) and multi-token prediction (MTP), with 745B total parameters, twice the size of the previous-generation GLM-4.7. The corresponding vLLM code change (vllm/config/specu ...):

```diff
-        if model_arch == "DeepseekV32ForCausalLM":
+        if model_arch in ["DeepseekV32ForCausalLM", "GlmMoeDsaForCausalLM"]:
             from vllm.platforms import current_platform
             capability = current_platform.get_device_capability()
```
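Multi-token prediction (MTP), one of the two DeepSeek-style components named above, can be sketched as parallel output heads, where head j predicts the token at offset j+1 from each position. This is a generic illustration with made-up shapes, not GLM-5's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mtp_logits(hidden, heads):
    """One forward pass yields logits for several future tokens:
    heads[j] maps each position's hidden state to logits for the
    token at offset j+1, instead of only the next token."""
    # hidden: [seq, d_model]; each head: [d_model, vocab]
    return [hidden @ w for w in heads]

seq, d_model, vocab = 8, 16, 100
hidden = rng.normal(size=(seq, d_model))
heads = [rng.normal(size=(d_model, vocab)) for _ in range(3)]  # predict t+1..t+3
logits = mtp_logits(hidden, heads)
print(len(logits), logits[0].shape)  # 3 (8, 100)
```

At training time the extra heads densify the supervision signal; at inference time the extra predictions can serve as cheap speculative drafts to verify in parallel.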
ChatGPT begins testing ads! OpenAI finally chases the money
量子位· 2026-02-10 07:00
柚子, from 凹非寺. 量子位 | WeChat account QbitAI

It finally happened! ChatGPT + ads is now settled: OpenAI has just officially announced that it is testing ads in ChatGPT's free and Go tiers across the US.

Users have panned the move, and rival Anthropic poured fuel on the fire, spending millions of dollars on a Super Bowl ad just to mock OpenAI's decision: ads are invading AI, but Claude won't. (doge)

Predictably, the comment sections are still full of complaints...

So why push ads through anyway? The answer is hidden in the 25-minute podcast OpenAI released alongside the announcement: to support free users, AKA it needs money.

Launching the ad feature

As for why it is launching ads, OpenAI puts it this way: to make AGI accessible to everyone. Advertising is, after all, a thoroughly mature business model. Google, Facebook, and Instagram all started out free and later turned profitable through targeted ads. For low-spending users in particular, ad placement is key to conversion; Netflix, for example, introduced ads in its latest $8-per-month plan.

According to the official announcement, this rollout covers only the free tier and the Go tier ($8/month) in the US; all other subscription tiers are unaffected. In addition, the content contamination of ChatGPT that the public worries about has also received an explicit official answ ...
One brain for every modality: Baidu publishes the ERNIE 5.0 technical report
量子位· 2026-02-10 05:33
Core Insights - The article discusses the release of the technical report for Baidu's ERNIE 5.0 model, highlighting its innovative architecture and performance metrics [1][3]. Group 1: Model Architecture - ERNIE 5.0 utilizes an Ultra-Sparse MoE architecture with a parameter count reaching trillions, of which less than 3% are activated during inference, making it the first publicly available unified autoregressive model of this scale [3]. - The model achieves true native autoregressive unification across four modalities without relying on "splicing," allowing all modalities to operate within the same Transformer Backbone from the ground up [4]. - ERNIE 5.0 employs a modality-agnostic expert routing mechanism, breaking down barriers between different data modalities and eliminating the need for pre-labeled data [7]. Group 2: Expert Pool and Specialization - A shared expert pool is constructed in ERNIE 5.0, enabling data from all modalities to flow freely within a massive parameter network [8]. - The model exhibits emergent specialization, where experts autonomously develop roles such as visual experts or text logic specialists without any predefined instructions [12][13]. - This implicit collaboration enhances multimodal understanding and naturally extends the model's capabilities [14]. Group 3: Training Paradigm - ERNIE 5.0 introduces a flexible training paradigm that allows for the generation of multiple models from a single pre-training session, significantly saving time and computational resources [15]. - The model incorporates an Elastic Depth mechanism, allowing for random skipping of Transformer layers during training, enabling shallow networks to perform computations independently [17]. - It also supports dynamic adjustments in expert pool capacity and the number of active experts during inference, balancing between full-scale trillion-parameter models and lightweight deployments [18].
Group 4: Post-Training Optimization - The model implements a unified multimodal reinforcement learning strategy to optimize logical reasoning, instruction following, and multimodal generation tasks collaboratively [21]. - Techniques such as unbiased replay buffer and multi-granularity importance sampling are employed to enhance training efficiency and stability [23]. - Adaptive hint reinforcement learning is used to guide the model during the initial training phase, facilitating a smooth transition to independent problem-solving capabilities [23]. Group 5: Technical Details - The report details specific handling strategies for various modalities, including positional encoding variants for text, spatiotemporal patching for images and videos, and discrete coding for audio signals [24]. - Communication optimization strategies for the underlying PaddlePaddle framework on large clusters and efficient attention mechanisms for long contexts are also discussed [24].
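The modality-agnostic expert routing and shared expert pool described above can be sketched as standard top-k MoE gating, where every token, whatever its modality, is scored against one shared pool and dispatched to its k best experts. Shapes, softmax gating, and k are common MoE conventions assumed for illustration, not confirmed ERNIE 5.0 details:

```python
import numpy as np

def topk_route(hidden, gate_w, k=2):
    """Score each token against a single shared expert pool and send it
    to its k highest-scoring experts, with softmax mixing weights
    computed over the selected experts only."""
    logits = hidden @ gate_w                      # [tokens, n_experts]
    top = np.argsort(logits, axis=1)[:, -k:]      # ids of chosen experts
    sel = np.take_along_axis(logits, top, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # weights sum to 1 per token
    return top, w

rng = np.random.default_rng(0)
top, w = topk_route(rng.normal(size=(4, 8)), rng.normal(size=(8, 16)))
print(top.shape, w.shape)  # (4, 2) (4, 2)
```

Because the gate sees only hidden states, not modality labels, any specialization of experts into visual or text roles has to emerge from the data, which matches the emergent-specialization behavior reported in Group 2.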