量子位
Just now: Microsoft launches an AI browser, and browsing the web will never be the same
量子位· 2025-07-29 00:40
Core Viewpoint
- Microsoft has transformed its Edge browser into an AI assistant with the introduction of "Copilot mode," marking a significant shift in how users interact with the web [1][24].

Group 1: Features of Copilot Mode
- Copilot mode allows the Edge browser to function as an AI agent, capable of reading and analyzing multiple open tabs simultaneously to perform complex tasks [3][15].
- Users interact with Copilot through a simplified, chat-like interface that supports searching, navigating, and conversing with the AI [6][8].
- The AI can group tabs for better organization, helping users maintain focus and quickly find needed content [12].

Group 2: Future Developments
- Upcoming features include a "thematic journey" function that organizes past and present browsing activity into a cohesive learning path and suggests next steps based on user interests [17].
- Future plans for Copilot include making restaurant reservations, managing itineraries, and even shopping, contingent on user authorization [20].

Group 3: Market Implications
- Microsoft's move is a direct challenge to Google Chrome, which dominates the browser market with over 60% share while integrating AI features less aggressively [24][25].
- The introduction of AI capabilities in browsers may shift business models toward subscriptions for enhanced features, suggesting that browsers may no longer be free software [30][32].
- This evolution points to a transition from traditional browsing to an intelligent, assistant-driven experience, fundamentally changing how users engage with the web [32][34].
Alibaba open-sources a cinematic-grade AI video model! MoE architecture, with a 5B version that runs on consumer GPUs
量子位· 2025-07-29 00:40
Core Viewpoint
- Alibaba has launched and open-sourced a new video generation model, Wan2.2, which uses an MoE architecture to achieve cinematic-quality video generation, including text-to-video and image-to-video capabilities [2][4][5].

Group 1: Model Features and Performance
- Wan2.2 is the first video generation model to implement an MoE architecture, allowing one-click generation of high-quality videos [5][24].
- The model shows significant improvements over its predecessor, Wan2.1, and the benchmark model Sora, with enhanced performance metrics [6][31].
- Wan2.2 offers a 5B version that can be deployed on consumer-grade graphics cards, generating 720P video at 24fps, making it the fastest basic model available [5][31].

Group 2: User Experience and Accessibility
- Users can create videos by selecting aesthetic keywords, replicating the styles of renowned directors such as Wong Kar-wai and Christopher Nolan without advanced filmmaking skills [17][20].
- The model supports real-time editing of text within videos, enhancing visual depth and storytelling [22].
- Wan2.2 is available through the Tongyi Wanxiang platform, GitHub, Hugging Face, and the ModelScope community, making it widely accessible [18][56].

Group 3: Technical Innovations
- The MoE architecture lets Wan2.2 handle larger token lengths without increasing computational load, addressing a key bottleneck in video generation models (a minimal illustrative sketch of the MoE idea follows this summary) [24][25].
- The model achieved the lowest validation loss, indicating minimal differences between generated and real videos and thus high quality [29].
- Wan2.2 significantly increased its training data, with image data up 65.6% and video data up 83.2%, with a focus on aesthetic refinement [31][32].

Group 4: Aesthetic Control and Dynamic Capabilities
- Wan2.2 features a cinematic aesthetic control system covering lighting, color, and camera language, allowing users to manipulate over 60 professional parameters [37][38].
- The model enhances the representation of complex movements, including facial expressions, hand movements, and interactions between characters, producing realistic and fluid animation [47][49][51].
- Its ability to follow complex instructions enables videos that adhere to physical laws and exhibit rich detail, significantly improving realism [51].

Group 5: Industry Impact and Future Prospects
- With the release of Wan2.2, Alibaba continues to build a robust open-source model ecosystem, with cumulative downloads of the Qwen series exceeding 400 million [52][54].
- The company is encouraging creators to explore Wan2.2's capabilities through a global creation contest, signaling a push to democratize video production [54].
- Advances in AI video generation suggest a transformative impact on the film industry, potentially opening a new era of AI-driven filmmaking from Hangzhou [55].
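To make the MoE idea concrete, below is a minimal top-k gated mixture-of-experts layer in PyTorch. This is a generic, illustrative sketch, not Wan2.2's actual architecture: the expert count, hidden sizes, and router are placeholder assumptions. The point it demonstrates is that only k of the N experts run for each token, so active compute stays roughly constant even as total parameters grow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k gated MoE feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)            # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        scores = self.gate(x)                                   # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)     # choose k experts per token
        weights = F.softmax(topk_scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e in idx.unique():                              # run only the selected experts
                mask = idx == e
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)   # torch.Size([16, 512])
```

With 8 experts and k=2, total parameters are roughly 4x those of a dense feed-forward block of the same width, while each token only pays for 2 experts' worth of compute, which is the trade-off the summary describes.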
Getting ahead of GPT-5: Zhipu open-sources a new SOTA model that, from a single sentence, builds a Bilibili-style site for watching videos and posting bullet comments!
量子位· 2025-07-28 14:44
Core Viewpoint
- The release of GLM-4.5 marks a significant advancement in open-source large models, achieving state-of-the-art (SOTA) performance across various benchmarks and demonstrating a unique integration of capabilities [1][3][49].

Evaluation Metrics
- The evaluation covered 12 representative benchmarks, including MMLU Pro, AIME 24, and MATH 500 [4].
- GLM-4.5 ranked third globally in overall average score, behind only the closed-source models o3 and Grok 4, while taking first place among both open-source and domestic models [5].

Model Architecture and Performance
- GLM-4.5 uses a Mixture of Experts (MoE) architecture with 355 billion total parameters and 32 billion active parameters [9].
- The model generates at 100 tokens per second, significantly outperforming other AI models [6].
- API pricing is competitive, at 0.8 yuan per million input tokens and 2 yuan per million output tokens (a worked cost example follows this summary) [8].

Practical Applications
- GLM-4.5 can perform complex tasks such as coding and generating educational materials, showcasing its practical utility in real-world scenarios [21][25].
- The model demonstrates superior performance in programming tasks compared with other open-source models, particularly in stability and task completion rates [24].

Training and Development
- Training proceeded in multiple stages, starting with 15 trillion tokens of general pre-training data followed by 7 trillion tokens of code- and reasoning-related data [35].
- The model incorporates techniques such as dynamic sampling temperature and adaptive pruning strategies to enhance stability and performance [48].

Community Engagement and Accessibility
- GLM-4.5 is available for free public testing on platforms such as chatglm.cn and Z.ai, promoting community engagement and feedback [12][50].
- The company has introduced a subscription plan for developers, allowing unlimited access to GLM-4.5 for a nominal fee [55].

Conclusion
- The launch of GLM-4.5 represents a technological leap for the company and injects new vitality into the domestic open-source large model sector, showcasing China's capability to set new standards in AI [52][53].
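As a quick sanity check on the quoted pricing, the snippet below estimates the cost of a single API call. The per-token prices are the figures cited above; the token counts are made-up example numbers, not measurements of any real workload.

```python
# Rough cost estimate for one GLM-4.5 API call at the quoted prices:
# 0.8 yuan per million input tokens, 2 yuan per million output tokens.
INPUT_PRICE_PER_M = 0.8    # yuan per 1M input tokens
OUTPUT_PRICE_PER_M = 2.0   # yuan per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan of a single call with the given token counts."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + \
           (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Example: a 20k-token prompt with a 4k-token response
print(f"{call_cost(20_000, 4_000):.4f} yuan")   # 0.0240 yuan
```

At these rates, even a fairly long prompt-and-response pair costs a fraction of a fen, which is what makes the pricing competitive for high-volume use.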
The "smartest" booth at WAIC! Conversational AI with its eyes and ears fully switched on
量子位· 2025-07-28 06:42
Core Viewpoint
- The article highlights advancements in Agora's conversational AI engine, showcasing new features that enhance real-time interaction and user experience across applications [4][5][31].

Group 1: Upgrades of the Conversational AI Engine
- The upgraded engine includes a selective attention-locking feature that accurately captures user commands in noisy environments, filtering out 95% of background noise [12][16].
- The engine now has visual understanding capabilities, enabling it to recognize and interpret images in real time and improving its contextual awareness during interactions [18][23].
- Integration with mainstream digital-human solutions allows for more human-like interactions, with digital avatars expressing emotions and gestures so that conversations feel more natural [25][30].

Group 2: Applications and Market Position
- The conversational AI engine has been successfully deployed across sectors including education and smart hardware, demonstrating its versatility and reliability [38][44].
- Agora's long-standing expertise in Real-Time Engagement (RTE) technology positions it favorably in the growing market for multimodal AI interactions that combine audio and visual inputs [49][50].
- A focus on user experience rather than raw technical specifications is expected to sharpen the competitive edge of Agora's products in the evolving AI landscape [51][52].
Robustness to interference improved by nearly 40%! No adversarial training needed: a new distillation method from Beihang University and Shanghai AI Lab boosts model robustness | ICML 2025
量子位· 2025-07-28 06:42
Contributed by the ROME team — 量子位 | WeChat official account QbitAI

As AI models continue to grow in scale, dataset distillation (DD) methods can achieve training results close to those obtained with the full dataset while using far less data, improving training efficiency and reducing training cost.

However, models trained via dataset distillation still struggle to withstand perturbations and maintain performance in safety-critical tasks such as medical diagnosis and autonomous driving.

A research team from Beihang University, Shanghai AI Laboratory, and the University of Liverpool has proposed a new method named ROME, the first to introduce information bottleneck theory into the dataset distillation task. Without any adversarial training, the method significantly improves adversarial robustness, by nearly 40% at most.

Experiments show that across different datasets, ROME substantially surpasses the previous best methods in robustness, with the top score jumping from 43.97% to 103.09%.

The work has been formally accepted at ICML 2025, a top international machine learning conference, and the project's code and data are fully open-sourced.

Its core idea is to minimize the redundant information between the input data and its intermediate latent representation while strengthening that representation's informativeness about the final label, thereby improving the adversarial robustness of the synthetic data at its source (a generic formulation is sketched below).

In addition, ROME introduces a conditional entropy bottleneck (Conditional ...
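The core idea described above matches the standard information bottleneck objective. As a generic sketch (the exact objective ROME optimizes, including its conditional entropy bottleneck variant, may differ), a latent representation Z of input X is learned to stay informative about the label Y while discarding redundant information about X:

\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0,

where I(·;·) denotes mutual information: the first term compresses away input redundancy, and the second term preserves label-relevant information, with β trading the two off.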
Top energy efficiency! After two more years grinding on compute-in-memory, he delivers a brand-new device-and-edge AI chip for large models
量子位· 2025-07-28 06:42
Core Viewpoint
- The article highlights the launch of the M50 AI chip by Houmo Intelligent, which claims the industry's highest energy efficiency for compute-in-memory (integrated storage and computing), marking a significant advance in AI hardware [3][4][8].

Group 1: Product Launch and Specifications
- The M50 delivers 160 TOPS of physical INT8 compute, 100 TFLOPS of BF16 floating-point compute, and 153.6 GB/s of bandwidth, at a typical power consumption of only 10W (a back-of-the-envelope efficiency estimate follows this summary) [4][8].
- The M50 is built on Houmo Intelligent's second-generation compute-in-memory technology, which enables significant improvements in energy efficiency [8][9].

Group 2: Technological Innovation
- Compute-in-memory merges computation and storage, eliminating data transfer between memory and processing units and thereby overcoming the "power wall" and "memory wall" of traditional architectures [11][12].
- The M50 uses SRAM-CIM technology, involving deep structural changes to SRAM arrays that enable parallel loading and computation, roughly doubling efficiency [12][15].

Group 3: Software and Ecosystem
- The M50 ships with a new compiler toolchain, Houmo Avenue®, which simplifies optimization for developers by automatically searching for the best strategies [24].
- The company has built a complete product matrix of hardware solutions for both device-side and edge computing, broadening access to AI capabilities across applications [28][36].

Group 4: Market Positioning and Future Outlook
- Houmo Intelligent's focus on compute-in-memory is a deliberate differentiation strategy in a competitive landscape dominated by giants such as NVIDIA and Huawei [37][40].
- The company aims to address the growing demand for compute and bandwidth in the era of large models, with a vision of making AI capabilities ubiquitous in everyday devices [41][42].
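A back-of-the-envelope check on the headline efficiency claim, using only the figures quoted above. This assumes "efficiency" means peak throughput divided by typical power, which may not match the vendor's own measurement methodology.

```python
# Rough energy-efficiency estimate for the M50 figures quoted above.
# Assumption: efficiency = peak throughput / typical power draw.
peak_int8_tops = 160      # TOPS @ INT8
peak_bf16_tflops = 100    # TFLOPS @ BF16
typical_power_w = 10      # watts (typical)

print(f"INT8: {peak_int8_tops / typical_power_w:.1f} TOPS/W")      # 16.0 TOPS/W
print(f"BF16: {peak_bf16_tflops / typical_power_w:.1f} TFLOPS/W")  # 10.0 TFLOPS/W
```

Under this simple reading, the quoted specs work out to roughly 16 TOPS/W at INT8 and 10 TFLOPS/W at BF16, which is the kind of figure behind the "highest energy efficiency" claim.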
LeCun responds to Shengjia Zhao's appointment as "Chief Scientist"
量子位· 2025-07-28 06:42
Core Viewpoint
- The appointment of Shengjia Zhao as Chief Scientist of Meta Superintelligence Labs signals a strategic shift in Meta's AI leadership and underscores the importance of young talent in the rapidly evolving AI landscape [1][29].

Group 1: Leadership Changes
- Shengjia Zhao, a Chinese scientist born in the 1990s and a key contributor to ChatGPT and o3, has been appointed Chief Scientist of Meta Superintelligence Labs [1][29].
- Yann LeCun, a Turing Award winner born in 1960, remains Chief Scientist of Meta's Fundamental AI Research (FAIR) and has confirmed his ongoing role [2][3][5].
- Zhao's appointment has prompted public speculation about LeCun's position and the dynamics within Meta's AI teams [11][28].

Group 2: Structural Changes in AI Teams
- FAIR, founded by LeCun in December 2013, has been Meta's core AI research institution, achieving significant breakthroughs across fields [17].
- FAIR has recently been folded into the newly formed Meta Superintelligence Labs, indicating a shift in its operational focus [15][19].
- The restructuring has led to a perceived marginalization of FAIR, which now operates alongside a separate team focused on consumer products and AGI research [22][23].

Group 3: Zhao's Background and Contributions
- Zhao graduated from Tsinghua University and earned a PhD from Stanford University, where he received multiple prestigious awards [30][32].
- He was a pivotal figure at OpenAI, contributing to the development of ChatGPT and other models, and is recognized for his work on chain-of-thought reasoning models [32][33][34].
- His leadership in Meta's AI strategy is expected to bring innovative advances to the company [35].
With a single round of instruction fine-tuning, a large model becomes an all-round team of experts, and an 8B model overtakes the full fine-tuning baseline | ACL 2025 Oral
量子位· 2025-07-28 06:42
Core Insights
- The article discusses the limitations of current methods for upgrading large language models (LLMs) and introduces a new framework, Sparse Interpolation Mixture of Experts (SIMoE), that enables efficient and effective model adaptation with minimal fine-tuning cost [1][4].

Group 1: Limitations of Current Methods
- Existing upgrade methods for LLMs face two main limitations: reliance on manual experience to select upgrade locations, and the lack of a systematic mechanism to balance expert specialization and collaboration [4][7].
- The first limitation stems from a static upgrade strategy that ignores the dynamic differences between model layers and task-specific requirements, leading to suboptimal performance [7][8].
- The second limitation is inefficient expert collaboration: traditional methods either force experts to collaborate or train them independently, resulting in knowledge redundancy and poor generalization [9][10].

Group 2: Introduction of SIMoE
- SIMoE offers a novel solution by automatically upgrading a standard LLM into a high-performance sparse expert model through a single stage of fine-tuning [4][6].
- The framework uses structured sparse optimization to identify neuron-level expert parameters, combining shared incremental parameters with orthogonality penalties to achieve gains in both performance and efficiency (a generic sketch of such regularizers follows this summary) [4][14].

Group 3: Performance Metrics
- SIMoE has demonstrated superior results, with an 8B model outperforming the fully fine-tuned baseline by 1.6% in ROUGE-L, a 10% gain on safety metrics, and a 30% reduction in inference memory [6][24].
- Across benchmark tests, SIMoE shows significant accuracy improvements on multiple tasks, including a 2.8% gain in zero-shot settings and 75.02% accuracy in few-shot scenarios [24][27].

Group 4: Innovations in SIMoE
- The framework introduces a structured sparse upgrade mechanism that turns the selection of upgrade locations into a learnable sparse optimization problem, improving global optimization [15][16].
- SIMoE also adds a "non-involution protocol" within expert teams to balance collaboration and specialization, ensuring efficient knowledge transfer while minimizing parameter redundancy [20][22].

Group 5: Experimental Validation
- SIMoE has been validated through extensive experiments on both vision and natural language models, demonstrating effectiveness in few-shot learning and cross-task generalization [22][25].
- Results show SIMoE consistently outperforming baseline models across datasets and tasks, reinforcing its potential as a leading framework for LLM adaptation [24][27].
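To illustrate the kind of objective the summary describes, here is a minimal sketch of regularizers that combine neuron-level group sparsity on per-expert parameter deltas with an orthogonality penalty that discourages redundancy between experts. This is a generic reconstruction from the description above, not SIMoE's actual implementation; the function name, tensor shapes, and coefficients are placeholders.

```python
import torch

def simoe_style_regularizers(expert_deltas, lambda_sparse=1e-3, lambda_ortho=1e-3):
    """Illustrative regularizers in the spirit of the summary above.

    expert_deltas: list of tensors, one per expert, each of shape
    (num_neurons, d); each row is that expert's incremental update for one neuron.
    """
    # Group (row-wise) sparsity: push entire neuron-level deltas toward zero so
    # each expert only "upgrades" a small, learned set of neurons.
    sparsity = sum(delta.norm(dim=1).sum() for delta in expert_deltas)

    # Orthogonality across experts: penalize overlap between flattened deltas of
    # different experts to reduce duplicated, redundant knowledge.
    flat = torch.stack([d.flatten() for d in expert_deltas])   # (E, num_neurons * d)
    gram = flat @ flat.T                                       # (E, E) similarity matrix
    off_diag = gram - torch.diag(torch.diagonal(gram))
    orthogonality = off_diag.pow(2).sum()

    return lambda_sparse * sparsity + lambda_ortho * orthogonality

# Example with 4 hypothetical experts, 256 neurons, hidden width 64
deltas = [torch.randn(256, 64, requires_grad=True) for _ in range(4)]
print(simoe_style_regularizers(deltas).item())
```

In training, a term like this would be added to the usual task loss, so that expert placement (which neurons get deltas) is learned jointly with the deltas themselves rather than chosen by hand.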
Who would have thought: this central state-owned enterprise got Shannon and Turing to "shake hands" once again
量子位· 2025-07-28 05:35
Jin Lei, reporting from WAIC — 量子位 | WeChat official account QbitAI

Interestingly, the one getting Shannon and Turing to shake hands once again turns out to be a central state-owned enterprise.

What does their handshake mean?

Of these two masters, one defined how "information" can be transmitted efficiently and accurately; the other opened up the exploration of how "intelligence" can be created and simulated.

Their handshake, then, represents a fusion of information technology and communication technology.

For example, out on the vast ocean, a "dead zone" for traditional communications where satellite signals are weak and expensive, sending even a single clear photo can take half a day, let alone making a video call.

Yet now, precisely because of this "handshake," smooth video calls at sea have become a reality.

Behind this, the telecom operator has not launched some super satellite or laid a transoceanic cable. In fact, the amount of data exchanged between a crew member's phone and the outside world is only one hundredth, or even one thousandth, of that of a traditional video call.

This is AI Flow (智传网), developed by the China Telecom Institute of Artificial Intelligence (TeleAI). It is not simple data transmission as you might assume, but a network in which agents connect to one another and collaborate efficiently, breaking through the performance limits of a single model and achieving emergent intelligence through connection and interaction.

Upon its release, the technology stunned observers abroad; one netizen commented: "A major breakthrough: an AI framework that could reshape how GenAI works."

So how exactly does AI Flow break through the "wall" of transmitting intelligence? ...
Unboxing the open-source Coze: the three core agent components go public, racking up 9K stars in 48 hours
量子位· 2025-07-28 03:25
Core Viewpoint
- The article discusses Coze's recent open-source release, which aims to make the development and deployment of AI agents more accessible and practical for developers [1][45].

Group 1: Open Source Products
- Coze has released two new open-source products, Coze Studio and Coze Loop, which together with the previously released Eino framework form a comprehensive open-source ecosystem for agent development [2][5][32].
- Coze Studio is a low-code platform designed to simplify the creation of AI workflows, while Coze Loop focuses on the development, evaluation, and monitoring of agents [12][21][25].
- The open-source products are released under the Apache 2.0 license, allowing commercial use and modification without any requirement to open-source changes [7][57].

Group 2: Market Trends and Challenges
- Agents are moving from novelties to practical tools, as shown by growing support from major companies and the emergence of successful agent applications [3][46].
- Despite the enthusiasm, widespread adoption still faces challenges such as inconsistent user experiences and high development barriers, which Coze aims to address through its open-source offerings [47][50].

Group 3: Development and Evaluation Capabilities
- Coze Studio provides a complete workflow engine that lets developers assemble agents by dragging and dropping functional components, lowering the technical barrier to entry (a minimal, generic sketch of an agent loop follows this summary) [16][19].
- Coze Loop offers an end-to-end solution for prompt development, evaluation, and monitoring, enabling developers to assess agent performance across multiple dimensions [25][30].
- Eino, released earlier, provides unified component abstraction and flexible orchestration capabilities, streamlining the development of AI applications [36][39].

Group 4: Future Implications
- The open-source initiative is expected to accelerate agent deployment across industries, particularly in internal automation, small teams, and vertical sectors such as healthcare and finance [43][42].
- Coze's open-source strategy is seen as a proactive move ahead of an expected surge in agent adoption, aiming to build a robust ecosystem that fosters collaboration and innovation among developers [45][56].
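As a rough illustration of what an agent "workflow engine" orchestrates, here is a minimal tool-using agent loop in Python. This is a generic sketch, not Coze Studio's or Eino's API: the `llm` function is a stand-in stub for a real model call, and the tool registry and stop condition are placeholder assumptions.

```python
# Minimal illustrative agent loop: the model either calls a registered tool or
# returns a final answer. Generic sketch only; not Coze's or Eino's actual API.
import json

def llm(messages):
    """Stand-in for a real model call; a real agent would call an LLM API here."""
    if messages[-1]["role"] == "tool":
        return json.dumps({"final": f"Answer based on: {messages[-1]['content']}"})
    return json.dumps({"tool": "search", "args": {"query": messages[-1]["content"]}})

TOOLS = {
    "search": lambda query: f"(fake search results for '{query}')",
}

def run_agent(user_input, max_steps=3):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        decision = json.loads(llm(messages))
        if "final" in decision:                    # model decided to answer
            return decision["final"]
        tool, args = decision["tool"], decision["args"]
        observation = TOOLS[tool](**args)          # execute the chosen tool
        messages.append({"role": "tool", "content": observation})
    return "(stopped after max_steps)"

print(run_agent("open-source Coze release"))
```

Platforms like the ones described above wrap this kind of loop in visual workflow nodes, plus the evaluation and monitoring layers needed to run it reliably in production.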