量子位
The destined Windows work AI is here! We tested the Chinese answer to Claude Cowork: absolutely top-tier
量子位· 2026-02-04 01:01
Yishui, from Aofei Temple | QbitAI official account

Speed is of the essence! Silicon Valley's Claude Cowork had barely gone viral when a homegrown counterpart appeared: the Kunlun Tech Skywork desktop edition.

When it first crossed our feed, we assumed it was just more trend-chasing. We stand corrected: the company had already launched the web-based Skywork Super Agents (a work squad composed of multiple agents) back in May last year, and it was quite popular. So yes, speed matters, but what matters even more is the groundwork laid long before.

After a quick download of the installer, one glance at the product layout reveals two standout features. In the spirit of "if it showed up in our feed, it was meant to be," we promptly dug in and put this desktop tool through its paces, and found that it really is friendly to us office workers: everyday jobs like making a PPT or a research report, or running data analysis on local documents, basically come down to a single sentence.

Key point: there is no manual file uploading anymore; the agent automatically reads the mass of files on your computer and gets to work. And all file processing happens locally, with nothing uploaded to the cloud, which goes a long way toward protecting privacy.

Most importantly of all, this time Windows users got access first (P.S. unlike Claude Cowork, which mainly targets macOS). So without further ado, here is our hands-on test.

Hands-on with the Skywork desktop edition ...
Yao Shunyu's first Tencent paper: charting the course for AI's second half with "context learning"
量子位· 2026-02-04 01:01
Core Insights
- The article discusses the launch of CL-bench, a benchmark designed to evaluate the ability of large models to learn from context, led by Yao Shunyu, Tencent's Chief AI Scientist [1][2][4]
- The research argues that the focus should shift from merely increasing model size to ensuring models can effectively learn and apply knowledge in real-world tasks [5][10]
- Current leading models disappoint: even the best performer, GPT-5.1, solves only 23.7% of tasks, indicating a significant gap in contextual learning capability [7][29]

Summary by Sections

Context Learning Importance
- While advanced models excel at standardized tests, they struggle in real-world applications where contextual learning is crucial [9][10]
- Human learning relies on real-time context rather than static knowledge, something current models fail to replicate [11][14]

CL-bench Design and Objectives
- CL-bench consists of 500 complex contexts, 1,899 tasks, and 31,607 validation criteria, designed so that models must learn new knowledge from context [15][19]
- The benchmark assesses models' ability to apply knowledge from unfamiliar domains, rule systems, and procedural tasks [18][22]

Model Performance Evaluation
- Ten leading models were evaluated on CL-bench, with an average task-solving rate of only 17.2%, underscoring their inability to learn from complex contexts (a toy scoring sketch follows this summary) [28][29]
- The best-performing model, GPT-5.1, reached at most 23.7%, revealing that weak contextual learning is a problem across all models [30]

Error Analysis
- Ignoring or misusing context is a primary cause of model failures, with many errors stemming from reliance on pre-trained static knowledge [31][32]
- Models performed especially poorly on tasks requiring inductive reasoning from experimental data, often below 10% success [32]

Future Directions
- The research team aims to advance contextual learning in AI, moving beyond merely providing context to ensuring models can genuinely learn from it [36][40]
- The collaboration between Tencent and Fudan University reflects a commitment to enhancing AI's practical applications in real-world scenarios [39]
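To make the task-solving numbers above concrete, here is a minimal sketch of how a rubric-based benchmark in the spirit of CL-bench could be scored. The task schema, the criterion checkers, and the all-criteria-must-pass rule are illustrative assumptions, not the paper's published harness:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical schema: each task carries a long context plus several
# validation criteria applied to the model's output. The real CL-bench
# harness and pass rule may differ.
@dataclass
class Task:
    context: str                            # context the model must learn from
    prompt: str
    criteria: List[Callable[[str], bool]]   # checks on the model's output

def task_solving_rate(tasks: List[Task],
                      model: Callable[[str, str], str]) -> float:
    """Fraction of tasks whose output satisfies every validation criterion."""
    solved = 0
    for t in tasks:
        output = model(t.context, t.prompt)
        if all(check(output) for check in t.criteria):
            solved += 1
    return solved / len(tasks)

# Toy usage: a "model" that ignores its context fails any criterion that
# requires context-specific knowledge.
if __name__ == "__main__":
    tasks = [Task(
        context="In this rulebook, 'blue' tokens outrank 'red' ones.",
        prompt="Which token ranks higher?",
        criteria=[lambda out: "blue" in out.lower()],
    )]
    ignore_context = lambda ctx, prompt: "red"  # answers from prior knowledge
    print(task_solving_rate(tasks, ignore_context))  # 0.0
```

A model that answers from pre-trained priors instead of the supplied context scores zero here, which is exactly the dominant failure mode the error analysis reports.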
Jensen Huang's 2026 large-model guest of honor: Yang Zhilin
量子位· 2026-02-03 10:35
Core Insights
- Yang Zhilin, founder and CEO of Moonshot AI, has been invited as a keynote speaker at NVIDIA's GTC 2026, a significant recognition for him and his Kimi models [1][2][27]
- The invitation reflects NVIDIA's strategic foresight in spotting emerging trends in the AI industry, as Jensen Huang typically selects speakers who align with where the market is heading [7][11]

Group 1: Yang Zhilin and Kimi's Journey
- Yang Zhilin's rise speaks to a new pain point in AI development: the industry is running into the limits of existing models and needs innovative solutions [28][30]
- Kimi faced significant challenges in 2025 from the impact of DeepSeek, which threatened its business model and user engagement [33][35]
- After a period of silence and strategic retreat, Kimi re-emerged with the launch of Kimi K2, showcasing advanced capabilities and reclaiming its position in the market [38][39]

Group 2: Market Position and Financial Developments
- Kimi K2.5, launched in January 2026, outperformed competitors such as GPT-5.2 and Claude 4.5 Opus across various benchmarks [41][42]
- Kimi closed a $500 million Series C at a post-money valuation of $4.3 billion, signaling strong investor confidence and financial stability [46]
- Cash reserves exceeding 10 billion yuan position Kimi well for continued research and development as it aims for leadership on the global SOTA (state-of-the-art) frontier [46]
Caught off guard: Adobe shuts down 2D animation software Animate to embrace AI! Hardest hit are students who say a semester of classes was for nothing
量子位· 2026-02-03 07:45
Core Viewpoint
- Adobe has announced the discontinuation of Adobe Animate, a 2D animation tool in use for over 25 years, primarily due to a strategic shift toward AI technologies [10][38]

Group 1: Announcement and User Reactions
- Adobe officially notified users that sales of Adobe Animate will cease on March 1, 2026, with support timelines that differ for enterprise and individual users [10][19]
- The announcement has been met with widespread disbelief and frustration, particularly among users who invested time in learning the software [3][5]
- Many users feel abandoned, citing a lack of communication and the absence of a suitable alternative from Adobe [28][29]

Group 2: Impact on Users and Industry
- Despite its decline, Adobe Animate remains essential for many web animators, game developers, and content creators, some of whom call it irreplaceable [11][13]
- Migrating to alternatives such as Toon Boom is complicated by high switching costs and the need to relearn workflows [16][17]
- Users worry that Adobe's decision will hurt the quality of their work and their existing projects [12][46]

Group 3: Adobe's Strategic Shift
- Adobe's rationale centers on technological change and a strategic pivot toward AI-driven tools [37][38]
- The company has focused on integrating AI features across its applications, leaving Animate neglected [39][41]
- Critics argue the shutdown reflects a broader pattern of prioritizing new technologies over established products, even those with a dedicated user base [44][46]

Group 4: Historical Context and Legacy
- Adobe Animate, originally launched as FutureSplash Animator in 1996, played a major role in transforming the internet by enabling rich multimedia content [48][50]
- At its peak, Flash Player was installed on over 98% of computers, making it a cornerstone of web animation and independent game development [52][54]
- Despite its historical significance, Animate failed to adapt to modern demands, leading to its eventual phase-out [62][67]
StepFun's new model is so fast it seems to "skip reasoning"! Yin Qi takes the helm, and the momentum is indeed new
量子位· 2026-02-03 07:45
Core Insights
- The article covers the launch of Step 3.5 Flash, a new open-source agent model with 196 billion total parameters, roughly 11 billion of them active per token, and a 256K context window [2][36]

Model Performance
- The model reaches a peak inference rate of 350 TPS (tokens per second), comparable to closed-source models in agent scenarios and mathematical tasks, and can handle complex, long-chain tasks [5][41]
- In benchmarks, Step 3.5 Flash scored 97.3 on AIME 2025, 74.4% on SWE-bench Verified coding tasks, and 88.2 on τ²-Bench for agent tasks, indicating strong performance across applications [7][6]

Technical Architecture
- Step 3.5 Flash uses a sparse Mixture-of-Experts (MoE) architecture, activating roughly 11 billion parameters per inference step to keep compute and deployment costs under control [36]
- It adopts a 3:1 sliding-window attention scheme to tame long-context costs, improving its handling of lengthy texts (a toy sketch of one common reading follows below) [37]
- A self-developed MIS-PO reinforcement learning framework improves inference and agent-execution capabilities, reducing data noise and gradient variance for stable optimization on long-sequence tasks [42]

Ecosystem Integration
- The model is designed to run on major AI accelerator platforms, including Ascend, Muxi (MetaX), and Alibaba's T-Head, covering today's mainstream domestic AI hardware [4]
- Step 3.5 Flash emphasizes cloud-edge collaboration: the cloud handles complex planning and reasoning while the edge focuses on secure data retrieval and local execution [30][32]

Future Developments
- The team is already working on Step 4, signaling continued advances in the model line [43]
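The report's "3:1 sliding window attention" is not spelled out further; one common reading in recent open models is three sliding-window attention layers for every full-attention layer. Below is a minimal sketch under that assumption, with an invented depth and window size rather than Step 3.5 Flash's actual configuration:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where each token attends only to the last `window` tokens."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

def full_causal_mask(seq_len: int) -> torch.Tensor:
    """Standard causal mask: every token attends to all earlier tokens."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def layer_masks(num_layers: int, seq_len: int, window: int):
    # Hypothetical 3:1 schedule: every 4th layer uses full attention,
    # the other three use the sliding window.
    for layer in range(num_layers):
        if (layer + 1) % 4 == 0:
            yield full_causal_mask(seq_len)
        else:
            yield sliding_window_mask(seq_len, window)

if __name__ == "__main__":
    masks = list(layer_masks(num_layers=8, seq_len=8, window=3))
    # Whether the last token can "see" the first token: True only on the
    # periodic full-attention layers (layers 4 and 8 here).
    print([m[-1, 0].item() for m in masks])
    # [False, False, False, True, False, False, False, True]
```

The point of the pattern is cost: sliding-window layers keep per-token attention proportional to the window size, while the periodic full layers preserve global information flow across the 256K context.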
量子位 is hiring editors and writers
量子位· 2026-02-03 04:52
Editorial team, from Aofei Temple | QbitAI official account

The AI wave is still surging. If you don't yet know how to take part in it, why not join 量子位?

We are a content platform built around tracking new developments in AI. Over eight years we have accumulated top-tier influence, broad and well-recognized industry resources, and the best vantage point for observing and learning at this moment in the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:

- AI Industry: infrastructure-layer innovation, covering chips, AI Infra, and cloud computing;
- AI Finance: venture capital and earnings in the AI field, tracking capital flows along the industry chain;
- AI Products: progress in AI applications and hardware devices.

All positions are full-time, based in Zhongguancun, Beijing.

Who the roles are open to:
- Experienced hires: editor, staff writer, and managing editor levels, matched to ability;
- Campus hires: fresh graduates, with internships that can convert to full-time.

By joining us, you will:
- Stand at the crest of the AI wave: get first access to the latest AI technologies and products and build a complete understanding of the field.
- Master new AI tools: apply new AI technologies and tools in your work to boost efficiency and creativity.
- Build personal influence: by writing exclusive original con ...

Positions at every seniority level are open in each track; you are welcome to apply based on your background and experience.
Musk's video generation model turns in its first answer sheet! Cinematic camera work plus sound effects, free to play with
量子位· 2026-02-03 04:52
Core Insights
- xAI has launched Grok Imagine 1.0, billed as its most powerful video and audio generation model to date [1]
- The model supports text-to-video and image-to-video generation, up to 10 seconds at 720P, with significantly improved audio quality [2]

Group 1: Model Capabilities
- Grok Imagine 1.0 can faithfully capture user creative intent, producing rich, coherent visuals, such as an AI rendition of "How to Train Your Dragon" [4]
- It excels at generating synchronized sound effects and expressions, enhancing the overall experience [5]
- Users can quickly assemble short videos by stringing generated clips together [6]

Group 2: Performance Metrics
- Over the past 30 days, Grok Imagine generated 1.245 billion videos [8]
- Its core capabilities split into video generation and video editing [9]
- The model shows a cinematic grasp of camera movement and smooth scene transitions [11][13]

Group 3: Editing Features
- Grok Imagine lets users replace objects and modify scenes, including changing the colors and details of objects [25][29]
- Users can apply different visual styles to existing footage and animate static black-and-white line drawings [33]
- Iterative optimization has focused on latency and cost control [35]

Group 4: Benchmarking and Rankings
- According to Artificial Analysis, Grok Imagine ranks first in text-to-video generation and leads on cost and latency metrics [36]
- Comparative evaluations from Artificial Analysis and LMArena confirm its lead in both latency and cost (a toy Elo sketch follows below) [39]
- In a blind evaluation of video editing, Grok Imagine beat competitors on overall performance, instruction adherence, and effect consistency [43]
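Arena leaderboards such as LMArena's are typically computed from blind pairwise votes with an Elo-style update; the sketch below shows that general mechanism, with the starting ratings, K-factor, and vote list invented for illustration rather than taken from LMArena's pipeline:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed blind-vote outcome."""
    e = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e)
    ratings[loser] -= k * (1 - e)

# Toy usage: every model starts equal; each blind vote nudges the table.
# Model names here are placeholders, not real leaderboard entries.
ratings = {"grok-imagine": 1000.0, "rival-a": 1000.0, "rival-b": 1000.0}
votes = [("grok-imagine", "rival-a"), ("grok-imagine", "rival-b"),
         ("rival-a", "rival-b")]
for winner, loser in votes:
    update(ratings, winner, loser)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```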
Clawdbot adapted for domestic Chinese chips! A Tsinghua Special Scholarship winner steps in with an open-source framework for one-click deployment
量子位· 2026-02-03 04:52
Core Viewpoint
- Clawdbot, now known as OpenClaw, has surged in popularity, reaching 120,000 stars on GitHub within a week; its Mac mini bundles have sold out, and major companies such as Alibaba and Tencent have rapidly integrated it [1][4]

Group 1: Clawdbot Features and Functionality
- Clawdbot turns AI from a standard chatbot into a 24/7 AI employee that keeps working while users are busy or asleep [5]
- It can reply to messages on mobile devices and proactively notify users when tasks complete [6]
- Users report high running costs: it can burn through hundreds of dollars in token fees for modest output [10]

Group 2: Introduction of Xuanwu CLI
- Xuanwu CLI is a new open-source framework that runs Clawdbot locally, with no Mac mini purchase or API costs, making it far more accessible [13][14]
- It simplifies local model deployment, offering an app-store-like experience for picking and using models without complex configuration [18]
- Its command system is highly compatible with Ollama, easing the transition for users familiar with that platform [20]

Group 3: Technical Advantages of Xuanwu CLI
- Xuanwu CLI supports local AI engines, integrating with Clawdbot for continuous operation and interaction [25]
- It is designed to be user-friendly, needing minimal setup; services often start within one minute [29]
- It is compatible with the OpenAI API standard, so existing applications integrate easily, lowering the cost of switching from cloud to local models (see the sketch after this list) [30]

Group 4: Adaptation to Domestic Chips
- Xuanwu CLI is uniquely adapted to domestic Chinese chips, offering a cost-effective local-inference option where most alternatives rely on NVIDIA hardware [34]
- It tackles common problems with domestic chips, such as configuration complexity and performance variability, by abstracting away hardware differences behind a unified resource pool [39]
- Its architecture enables intelligent scheduling and optimal resource allocation, ensuring stability and performance across hardware setups [46]

Group 5: Company Background
- Qingmiao Intelligent, founded in 2022, focuses on chip adaptation and the optimization of models, frameworks, and operators [48]
- The company has raised significant investment and aims to build a full optimization stack from hardware to intelligent agents [51]
- Qingmiao has shipped several domestic all-in-one machine solutions, achieving strong performance and adaptability across multiple chip platforms [52]
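The OpenAI-API compatibility claimed for Xuanwu CLI implies the standard pattern of re-pointing an OpenAI SDK client at a local server. Here is a sketch of that pattern; the port, path, and model name are placeholder assumptions, not documented Xuanwu CLI values:

```python
from openai import OpenAI

# Point the standard OpenAI SDK at a local, OpenAI-compatible server.
# Host, port, and model name are hypothetical; check the Xuanwu CLI docs
# for the actual values it exposes.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # local inference endpoint
    api_key="not-needed-locally",          # many local servers ignore this
)

resp = client.chat.completions.create(
    model="local-model",                   # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(resp.choices[0].message.content)
```

This is why switching costs drop: application code written against the cloud API keeps working once `base_url` points at the local deployment.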
Insane! With just 2,500 or so vehicles on the road, a new funding round closes at an 876-billion-yuan valuation
量子位· 2026-02-03 04:52
The following article comes from 智能车参考 (official account AI4Auto, QbitAI's automotive channel, tracking AI + automotive news), by 有车无人. Yifan, reporting from the passenger seat.

Waymo has officially announced a new funding round of 111.2 billion yuan, at a valuation above 876 billion yuan.

Just as Tesla comes knocking, the capital market has used real money to voice its recognition of, and belief in, Waymo and the "Waymo route," lifting Waymo to a $16 billion round. Google reportedly threw in some 90 billion, and Sequoia Capital placed its bet, pushing Waymo's valuation to a new high: tripled in 19 months.

Standing on this new plateau, Waymo has made clear that its current business focus is global expansion, declaring that "the era of large-scale autonomous driving has arrived."

Going abroad is also the direction China's robotaxi players, Apollo Go (萝卜快跑), WeRide (文远知行), and Pony.ai (小马智行), are pursuing; this year may see the Chinese and American robotaxi champions compete on the same stage. Global autonomous-driving players will also be revalued in step with Waymo.

Waymo announces a $16 billion round

Waymo announced via social media that its latest round raised $16 billion, about 111.24 billion yuan, more than its previous three rounds combined.

Waymo disclosed that the main investor in the latest round is once again Google, with the lead investors also including Sequoia Capital, Dragoneer Investment Group, and DST Global ...
Kimi K2.5 tops the open-source rankings! Its 15T training-data recipe made public, and Yang Zhilin teases K3
量子位· 2026-02-03 00:37
Core Insights
- Kimi K2.5 has achieved significant recognition, topping Hugging Face's Trending chart with over 53,000 downloads [2]
- The model excels in agent capabilities, outperforming flagship closed-source models such as GPT-5.2 and Claude 4.5 Opus in various benchmark tests [3]
- Kimi K2.5's technical report lays out its development process and innovations [5]

Group 1: Model Architecture and Training
- Kimi K2.5 is built on the K2 architecture and was continuously pre-trained on 15 trillion mixed visual and text tokens [6]
- The model takes a natively multimodal approach, processing visual signals and text logic within the same parameter space [7]
- This extensive data training yields synchronized gains in visual understanding and text reasoning, breaking the previous trade-off between the two [8]
- Kimi K2.5 is highly cost-effective, beating GPT-5.2's performance while consuming less than 5% of its resources [9]

Group 2: Visual Programming and Debugging
- The model unlocks "visual programming," inferring code directly from video streams [11]
- Kimi K2.5 can accurately capture the dynamics of visual elements in videos and translate them into executable front-end code [12]
- To catch execution and styling issues, K2.5 integrates a self-visual debugging mechanism that checks the rendered interface against expected outcomes [14]
- When discrepancies appear, the model autonomously queries documentation to identify and correct them [15]
- This "generate-observe-query-fix" loop emulates a senior engineer's debugging process, letting the model complete end-to-end software engineering tasks on its own [16]

Group 3: Agent Swarm Architecture
- Kimi K2.5 features an Agent Swarm architecture that can autonomously assemble digital teams of up to 100 agents for parallel task execution (a minimal scheduler sketch follows below) [17]
- The system decomposes complex tasks into many concurrent subtasks, sharply reducing processing time [18]
- The swarm is managed by the PARL (Parallel Agent Reinforcement Learning) framework, built around a core scheduler and multiple sub-agents [20][21]
- The scheduler oversees task distribution, while sub-agents focus on efficiently executing specific instructions [22]
- The design balances planning flexibility with the logical rigor that large-scale parallel operation requires [23]

Group 4: Training and Efficiency
- Training uses a phased reward-shaping strategy to encourage efficient division of labor among agents (see the toy schedule after the scheduler sketch) [25]
- Early training rewards the scheduler for parallel exploration, then the reward gradually shifts toward task success rate [26]
- This gradual schedule teaches the model to maximize concurrency while keeping results accurate [27]
- Efficiency evaluation uses critical-path steps as a core metric, emphasizing shorter end-to-end wait times [28]

Group 5: Future Developments and Community Engagement
- After K2.5's launch, Moonshot AI's founders held a three-hour AMA on Reddit, discussing the model's development and future plans [29]
- The team hinted that the next-generation Kimi K3 may be based on a linear attention mechanism, promising significant advances [31]
- They acknowledged they cannot guarantee a tenfold improvement, but K3 should be a qualitative leap over K2.5 [32]
- The team also addressed the model's occasional self-identification as Claude, attributing it to high-quality programming training data that included Claude's name [34]
- The lab emphasizes that achieving AGI is not solely about increasing computational power but also about developing more efficient algorithms and smarter architectures [38]
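The report describes PARL's scheduler and sub-agents only at a high level; the sketch below illustrates the generic scheduler-plus-parallel-workers pattern it implies. The decomposition, the sub-agent body, and the merge step are invented for illustration and are not Moonshot AI's implementation:

```python
import asyncio

# Hypothetical sub-agent: in a real system this would call a model with
# its own context and tools; here it just "works" on one subtask.
async def sub_agent(subtask: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for model/tool latency
    return f"result({subtask})"

async def scheduler(task: str, num_agents: int = 4) -> str:
    # The scheduler decomposes the task and fans subtasks out concurrently;
    # wall-clock time tracks the slowest subtask (the critical path), not
    # the sum of all subtasks.
    subtasks = [f"{task}/part-{i}" for i in range(num_agents)]
    results = await asyncio.gather(*(sub_agent(s) for s in subtasks))
    return " + ".join(results)  # naive merge step

print(asyncio.run(scheduler("market-research", num_agents=4)))
```

Because the subtasks run concurrently, end-to-end latency is governed by the critical path, which is what the efficiency metric described above measures.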
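The phased reward shaping is likewise described only qualitatively; one toy way to realize "reward parallel exploration first, task success later" is a linear anneal between the two terms. The functional form and constants below are invented:

```python
def shaped_reward(step: int, total_steps: int,
                  task_success: float, parallelism: float) -> float:
    """Blend a parallel-exploration bonus into task success, annealing
    the bonus away as training progresses (invented schedule)."""
    frac = min(step / total_steps, 1.0)
    w_parallel = 1.0 - frac   # early: reward fanning out
    w_success = frac          # late: reward getting the task right
    return w_parallel * parallelism + w_success * task_success

# Early on, a run that fans out widely but half-succeeds scores higher
# than a cautious serial run; by the end, only success matters.
print(shaped_reward(step=0, total_steps=100, task_success=0.5, parallelism=0.9))    # 0.9
print(shaped_reward(step=100, total_steps=100, task_success=0.5, parallelism=0.9))  # 0.5
```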