量子位
"Beijing's High-Flyer" Open-Sources a SOTA Code Model Out of Nowhere! It Runs on a Single 3090, and Its 40B Parameters Topple Opus-4.5 and GPT-5.2
量子位· 2026-01-02 03:41
Core Insights
- The article highlights the emergence of the IQuest-Coder-V1 model series, which has gained significant attention in the tech community for its performance in code generation and understanding tasks [1][2].

Model Performance
- The IQuest-Coder-V1 model, particularly the 40B parameter version, achieved an impressive score of 81.4% on the SWE-Bench Verified leaderboard, surpassing models like Claude Opus-4.5 and GPT-5.2, which are speculated to have parameter scales in the hundreds of billions to trillions [2][50].
- The model series includes versions with 7B, 14B, and 40B parameters, each offering Instruct and Thinking variants tailored for different use cases [14][15].

Technical Specifications
- The IQuest-Coder-V1 series emphasizes "engineering-friendly" design and long-context usability, supporting a maximum context length of 128K tokens and a vocabulary size of 76,800 tokens [22][25].
- The 40B parameter version features a Loop variant that enhances parameter utilization efficiency, achieving significant reductions in HBM and KV Cache overhead while improving throughput [19][20].

Training Methodology
- The training strategy, termed "code-flow multi-stage training," focuses on learning from the evolution of code rather than static code snippets, incorporating a triplet data structure to capture changes over a project's lifecycle [38][43].
- This approach allows the model to understand the dynamic evolution of software logic, capturing differences before and after modifications [46][47].

Deployment and Accessibility
- The models are designed for deployment on consumer-grade GPUs, with the Int4 version capable of running on a single H20 inference card [53][54].
- The IQuest-Coder series has been open-sourced on platforms like GitHub, making it accessible for developers and researchers [11].
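The "code-flow" triplet described above lends itself to a short sketch. Everything here is an illustrative assumption — the field names and the use of a unified diff as the change representation are guesses, not the IQuest-Coder paper's actual schema:

```python
import difflib
from dataclasses import dataclass

@dataclass
class CodeFlowTriplet:
    """One training example capturing a code *change*, not a static snippet.

    (before, diff, after) is an assumed layout for illustration only;
    the actual IQuest-Coder triplet format may differ.
    """
    before: str   # file content before the edit
    diff: str     # unified diff describing the edit
    after: str    # file content after the edit

def make_triplet(before: str, after: str) -> CodeFlowTriplet:
    # Derive the diff from the two snapshots so a model trained on the
    # triplet sees the edit itself, not just the final state.
    diff = "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="before", tofile="after",
    ))
    return CodeFlowTriplet(before=before, diff=diff, after=after)

t = make_triplet("def add(a, b):\n    return a - b\n",
                 "def add(a, b):\n    return a + b\n")
print(t.diff)
```

Training on examples like this exposes a model to how code evolves over a project's lifecycle, which is the distinction the article draws against training on static snapshots.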
Company Background
- IQuest-Coder is developed by Ubiquant Holding Limited (九坤投资), a prominent quantitative investment firm in China, known for its focus on AI and high-frequency trading [57][64].
- The company has established multiple research labs, including an AI Lab, and has a strong team with a high percentage of members holding advanced degrees from top universities [62][64].
AI Is Taking Over Your Video Recommendation Feed
量子位· 2026-01-02 03:41
Your video recommendation feed is being "devoured" by AI.

This is not scaremongering. A serious new survey finds that more than 20% of the videos YouTube's algorithm shows to new users are low-quality, AI-made videos.

To put it more painfully: of every 5 videos we scroll past on YouTube, 1 may have been casually slapped together by an AI. (I-give-up.jpg)

Mengyao, reporting from Aofeisi — QbitAI | WeChat official account QbitAI

Worse still, these empty-calorie AI clips are steadily becoming industrialized, even turned into an ever-snowballing "business".

Great. Is anything in this world still real?!

When low-quality AI video starts shipping by "volume"

The conclusion comes from Kapwing, a US creative-software company. They surveyed the 15,000 most popular YouTube channels worldwide, and guess what: 278 of those channels consist almost entirely of AI-generated content... (pure AI "originals").

To be fair, Kapwing does not count all AI-produced content as low quality; it draws a finer distinction, into three main categories:

The first is AI-generated content that is dumped into the platform's distribution system almost entirely unreviewed.

The second is AI content that has been reviewed but only barely scrapes past the minimum quality bar (even if it is Coca-Cola's AI Christmas ad).

The third is more aggressive: everything produced at scale and at low cost ...
"AI 100" List Opens for Applications — the AI Product "Annual Gathering" Must Go On | QbitAI Think Tank
量子位· 2026-01-02 03:41
Core Insights
- The article discusses the emergence of numerous keywords in the AI product sector by 2025, highlighting transformative AI products that are reshaping the industry [4]
- The "AI 100" list by Quantum Bit Think Tank aims to evaluate and recognize the top AI products in China, reflecting the current landscape and future trends in AI [4][12]

Group 1: AI 100 List Overview
- The "AI 100" list is divided into three main categories: "Flagship AI 100," "Innovative AI 100," and the top three products in ten popular sub-sectors [6]
- The "Flagship AI 100" will focus on the strongest AI products of 2025, showcasing those that have achieved significant technological breakthroughs and practical application value [7]
- The "Innovative AI 100" aims to identify emerging products with potential for significant impact in 2026, representing cutting-edge AI technology [8]

Group 2: Sub-sector Focus
- The ten hottest sub-sectors for the top three products include AI browsers, AI agents, AI smart assistants, AI workstations, AI creation, AI education, AI healthcare, AI entertainment, Vibe Coding, and AI consumer hardware [9]

Group 3: Application and Evaluation
- The evaluation of the "AI 100" list employs a dual assessment system combining quantitative and qualitative measures, focusing on user data and expert evaluations [13]
- Quantitative metrics include user scale, growth, activity, and retention, while qualitative assessments consider long-term potential, technology, market space, and user experience [13]
QbitAI Is Hiring Editors and Writers
量子位· 2026-01-02 03:41
Core Viewpoint
- The article emphasizes the ongoing AI boom and invites individuals to join the company "Quantum Bit," which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring for three main directions: AI Industry, AI Finance, and AI Product, with positions available for both experienced professionals and fresh graduates [2][4].
- Positions are open at various levels, including editors, lead writers, and chief editors, with a focus on matching roles to individual capabilities [6].

Group 2: Job Responsibilities
- AI Industry Direction: tracking innovations in infrastructure such as chips, AI infrastructure, and cloud computing, as well as interpreting technical reports from conferences [6][7].
- AI Finance Direction: covering venture capital, financial reports, and capital movements within the AI industry, requiring strong analytical skills and a passion for interviews [11].
- AI Product Direction: monitoring AI applications and hardware developments, requiring a keen understanding of product experiences and market trends [11].

Group 3: Benefits and Growth
- Employees can expect to engage with cutting-edge AI technologies, enhance their work efficiency through new tools, and build personal influence in the AI field [6].
- The company offers competitive salaries, comprehensive benefits, and a supportive environment for professional growth, including mentorship from senior editors [6][12].

Group 4: Company Impact
- As of 2025, Quantum Bit has over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12].
DeepSeek Overhauls Kaiming He's Residual Connection! Liang Wenfeng Personally Signs the Paper — the First Major Upgrade in a Decade
量子位· 2026-01-01 10:32
Core Viewpoint
- The article discusses the evolution and enhancement of the residual connection, a fundamental component of deep learning introduced by Kaiming He in ResNet, and presents a new approach called Hyper-Connections (HC) that aims to improve performance while addressing signal amplification and stability issues in deep architectures [2][7][11].

Group 1: Residual Connections and Their Evolution
- Residual connections have been a cornerstone of deep learning since the introduction of ResNet in 2016, allowing signals to pass directly from shallow to deep layers without modification [7][9].
- The rise of Transformer architectures has made residual connections a standard feature in large language models like GPT and LLaMA [10].
- Hyper-Connections (HC) expand the residual stream width from C dimensions to n×C dimensions, introducing three learnable mapping matrices to manage information flow [11].

Group 2: Performance and Stability Challenges
- Experiments by the DeepSeek team indicate that the Hres matrix, responsible for internal information exchange in HC, significantly enhances performance [12].
- However, when HC is stacked across many layers, the composite mapping loses its identity property, leading to issues such as sudden loss spikes and gradient fluctuations during training [14].
- The peak amplification factor of signals in HC can reach 3000, which risks signal distortion during inter-layer propagation [16].

Group 3: Theoretical Framework and Constraints
- The core idea of the DeepSeek paper is to constrain the residual mapping matrix to a specific manifold formed by doubly stochastic matrices, which ensures three key theoretical properties: norm preservation, closure under composition, and a geometric interpretation [17][19].
- The Sinkhorn-Knopp algorithm is employed to project any matrix onto this manifold, effectively eliminating the signal amplification observed in HC [21].
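The Sinkhorn-Knopp projection at the heart of the method can be made concrete with a minimal sketch. This is the textbook algorithm — alternating row and column normalization of a positive matrix until it is approximately doubly stochastic — not DeepSeek's optimized kernel:

```python
import numpy as np

def sinkhorn_knopp(M: np.ndarray, n_iters: int = 200) -> np.ndarray:
    """Project a matrix toward the doubly stochastic manifold
    (every row and every column sums to 1) by alternating normalization."""
    P = np.abs(M) + 1e-9                       # ensure strictly positive entries
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)      # normalize rows
        P /= P.sum(axis=0, keepdims=True)      # normalize columns
    return P

rng = np.random.default_rng(0)
P = sinkhorn_knopp(rng.random((4, 4)))
print(P.sum(axis=1))   # each row sums to ~1
print(P.sum(axis=0))   # each column sums to ~1
```

Because every row and column of the result sums to one, and products of doubly stochastic matrices are themselves doubly stochastic (the closure property above), stacking such mappings across layers cannot amplify the total signal mass — which is how the constraint rules out the ~3000x peak amplification seen with unconstrained HC.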
Group 4: Engineering Optimizations
- The paper details the memory-access costs of widening the residual stream, highlighting the significant increase in read and write operations for HC compared to standard residual connections [24].
- To mitigate these costs, the team developed infrastructure optimizations, including operator fusion via the TileLang framework and specialized kernels for the Sinkhorn-Knopp algorithm [25][26].
- The paper also discusses pipeline-parallelism enhancements that overlap computation and communication, improving overall efficiency [27].

Group 5: Experimental Validation
- The paper validates the proposed methods on MoE models of 3B, 9B, and 27B parameters, with the expansion rate n set to 4 [30].
- On the 27B MoE model, the modified HC (mHC) demonstrated a stable training curve, achieving a loss reduction of 0.021 over the baseline while maintaining gradient stability [31].
- Performance improvements were noted in downstream tasks, with mHC outperforming both the baseline and HC across various benchmarks [32][35].
Jensen Huang's $20-Billion-Plus Inference Closed Loop Has Taken Shape
量子位· 2026-01-01 06:15
Core Viewpoint
- Nvidia has made significant acquisitions in a short period, spending over $20 billion to acquire Groq and AI21 Labs, aiming to strengthen its position in the AI market and counter competition from companies like Google and Broadcom [1][2][27].

Group 1: Acquisitions and Investments
- Nvidia's recent acquisitions include Groq, acquired for $20 billion, and AI21 Labs, estimated to cost between $2-3 billion, along with the $900 million acquisition of Enfabrica [2][3][21].
- The Groq deal brought in not only the LPU technology but also 90% of Groq's employees, enhancing Nvidia's talent pool [6][23].
- AI21 Labs, previously valued at $1.4 billion, is a hub for top AI PhDs, further bolstering Nvidia's capabilities in AI architecture [7][10].

Group 2: Market Position and Strategy
- Nvidia holds over 90% of the AI training market, but the inference market is becoming increasingly fragmented, with custom ASIC chips capturing 37% of deployments [4].
- The company aims to address this fragmentation by acquiring talent and technology, positioning itself to compete effectively against Google's TPU and other rivals [5][27].
- The combination of Groq's LPU and AI21's Jamba architecture is expected to enhance Nvidia's inference capabilities, allowing significant improvements in processing efficiency [16][26].

Group 3: Talent Acquisition and Technology Integration
- Nvidia's strategy includes not just acquiring companies but also securing their talent, as seen with the recruitment of 200 top AI PhDs from AI21 Labs [12][17].
- The Jamba architecture from AI21 is particularly suited to memory-constrained inference chips, which aligns with Nvidia's needs in the evolving AI landscape [16][28].
- The integration of these acquisitions is designed to create a closed loop of hardware, network, and architecture, solidifying Nvidia's competitive edge in the AI market [26].
The Latest Nvidia Economics: 15x AMD's Performance per Dollar — "The More You Buy, the More You Save" Is Real
量子位· 2026-01-01 04:15
Core Insights
- The article emphasizes that NVIDIA remains the dominant player in AI computing power, providing significantly better performance per dollar than AMD [1][30].
- A report from Signal65 reveals that under certain conditions, NVIDIA's cost for generating the same number of tokens is only one-fifteenth of AMD's [4][30].

Performance Comparison
- NVIDIA's platform offers 15 times the performance per dollar of AMD's when generating tokens [1][30].
- For complex models, NVIDIA's advantage becomes more pronounced, especially under the MoE (Mixture of Experts) architecture [16][24].

MoE Architecture
- The MoE architecture splits a model's parameters into specialized "expert" sub-networks, activating only a small portion for each token, which reduces computational cost [10][11].
- However, communication delays between GPUs can leave hardware idle, increasing costs for service providers [13][14].

Cost Analysis
- Despite NVIDIA's higher pricing, the overall cost-effectiveness is better due to its superior performance. For instance, the GB200 NVL72 costs $16 per GPU per hour versus $8.60 for AMD's MI355X, making NVIDIA's price 1.86 times higher [27][30].
- The report concludes that at 75 tokens per second per user, NVIDIA's performance advantage is 28 times, resulting in a cost per token one-fifteenth of AMD's [30][35].

Future Outlook
- AMD's competitiveness is not entirely negated, as its MI325X and MI355X still have applications in dense models and capacity-driven scenarios [38].
- AMD is developing a rack-scale solution, Helios, which may narrow the performance gap over the next 12 months [39].
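The report's headline figures are internally consistent, which a quick back-of-the-envelope check confirms — a 28x throughput advantage divided by a 1.86x price premium yields roughly a 15x performance-per-dollar edge, using only the numbers quoted above:

```python
# Figures quoted from the Signal65 report summarized above.
nvidia_price = 16.00   # GB200 NVL72, $ per GPU per hour
amd_price = 8.60       # MI355X, $ per GPU per hour
perf_advantage = 28    # NVIDIA throughput advantage at 75 tokens/s per user

price_premium = nvidia_price / amd_price      # ~1.86x more expensive per hour
per_dollar = perf_advantage / price_premium   # ~15x performance per dollar
cost_ratio = 1 / per_dollar                   # NVIDIA's cost per token vs AMD's

print(f"price premium:   {price_premium:.2f}x")
print(f"perf per dollar: {per_dollar:.2f}x")
print(f"cost per token:  {cost_ratio:.4f} of AMD's")
```

This is the arithmetic behind the "one-fifteenth the cost per token" claim: the throughput gap outweighs the price gap by roughly a factor of 15.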
Doubao's String of "OK"s Leaves Luo Yonghao Rattled — Isn't This Just a Giant Live-Streamed Turing Test?
量子位· 2026-01-01 02:13
Core Viewpoint
- The annual technology innovation sharing conference hosted by Luo Yonghao has become a viral sensation, primarily due to two key events: the announcement of ticket refunds for all attendees and a lively debate between Luo and the AI assistant Doubao, which showcased the capabilities of real-time interactive AI [1][2][3].

Group 1
- Luo Yonghao announced that all ticket holders would receive refunds, which sparked significant discussion [2].
- The debate between Luo and Doubao became the highlight of the event, drawing attention for its engaging and humorous exchanges [3][8].
- The debate served as a public test of Doubao's real-time interactive AI capabilities, demonstrating its ability to engage in complex discussions [11][34].

Group 2
- Doubao's performance was characterized by rapid responses and the ability to maintain a coherent argument, showcasing its advanced understanding of context and logic [13][25].
- The debate highlighted Doubao's improvements in emotional intelligence and its ability to adjust responses based on the conversation's tone [32][36].
- The event marked a significant milestone in AI development, indicating that real-time interactive AI has reached a stage suitable for practical applications [34][38].

Group 3
- Doubao's capabilities were enhanced through multiple iterations of its underlying model, focusing on real-time interaction, human-like responses, and adherence to user instructions [30][32].
- The debate illustrated a shift in AI from being a passive tool to an interactive partner capable of complex dialogue [35][36].
- The implications of this technology extend to various fields, including customer service, education, and personal assistance, where AI can handle more nuanced interactions [38].
LeCun's Prediction Come True? A Hardcore Roadmap to AGI: From BERT to Genie, Building a True World Model Step by Step Through the Lens of the Masking Paradigm
量子位· 2026-01-01 02:13
Core Viewpoint
- The article discusses the emergence of World Models in AI, emphasizing the importance of Masking as a foundational principle for building these models, which are seen as essential for achieving Artificial General Intelligence (AGI) [1][3][5].

Group 1: Definition and Components of World Models
- The true World Model is defined as an organic system composed of three core subsystems: a Generative Heart, an Interactive Loop, and a Memory System [6][8].
- The Generative Heart ($G$) predicts future states and simulates world dynamics, while the Interactive Loop ($F,C$) allows for real-time interaction and decision-making [8].
- The Memory System ($M$) ensures continuity over time, preventing the world from becoming a series of fragmented experiences [8][9].

Group 2: Evolution of World Models
- The evolution of World Models is categorized into five stages, with Masking the central theme throughout [10][12].
- Stage I focuses on Mask-based Models, highlighting Masking as a universal generative principle rather than just a pre-training technique [13][24].
- Stage II aims for Unified Models that process and generate all modalities under a single architecture, with a debate between Language-Prior and Visual-Prior modeling approaches [25][26].

Group 3: Interactive Generative Models
- Stage III introduces Interactive Generative Models, where models respond to user actions, transforming from mere simulators into interactive environments [36][40].
- The Genie series, particularly Genie-3, represents the state of the art in real-time interactive models, achieving 720p resolution at 24fps [41][42].

Group 4: Memory and Consistency
- Stage IV addresses Memory & Consistency, focusing on the need for persistent memory to prevent catastrophic forgetting and state drift in generated worlds [46][48].
- Solutions proposed include Externalized Memory, architecture-level persistence, and consistency governance to maintain coherence in generated environments [49][50].

Group 5: Ultimate Form of World Models
- Stage V envisions True World Models that exhibit persistence, agency, and emergence, allowing for complex interactions and societal dynamics within the simulated world [51][52].
- The article concludes with the challenges of coherence, compression, and alignment that must be addressed to realize these advanced models [58].
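Since the roadmap treats Masking as the single generative principle running from BERT through Genie, a toy BERT-style masking step makes the idea concrete. The mask rate, token ids, and MASK_ID value are arbitrary choices for illustration, not any particular model's configuration:

```python
import numpy as np

MASK_ID = 0  # reserved id for the [MASK] token (arbitrary choice here)

def mask_tokens(tokens: np.ndarray, mask_rate: float = 0.15,
                seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Hide a random subset of tokens; a masked model's training target
    is to reconstruct exactly the hidden positions."""
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens.shape) < mask_rate   # True where a token is hidden
    corrupted = np.where(mask, MASK_ID, tokens)   # replace hidden tokens
    return corrupted, mask

tokens = np.arange(1, 21)            # a dummy 20-token sequence (ids 1..20)
corrupted, mask = mask_tokens(tokens)
print(corrupted)
print("targets:", tokens[mask])      # only the masked positions are predicted
```

The same recipe generalizes from text tokens (BERT) to image patches (masked autoencoders) to spatio-temporal latents in video models — which is the through-line the roadmap draws from Stage I all the way to Genie.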