QbitAI
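AI playing jigsaw puzzles sharply boosts visual understanding: moving beyond text-centric training, an annotation-free post-training paradigm for multimodal large models
QbitAI · 2025-10-15 10:20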
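Submitted by the VisualJigsaw team | QbitAI (WeChat official account: QbitAI)
In the wave of post-training for multimodal large language models (MLLMs), reinforcement-learning-driven paradigms have become a key direction for improving models' reasoning and general capabilities. However, most existing methods remain text-centric, with visual input often serving passively as an auxiliary signal. By contrast, we argue that revisiting the potential of visual self-supervised learning in the post-training stage, and designing vision-centric post-training, is equally crucial for strengthening an MLLM's fine-grained, in-depth understanding of visual information itself.
To this end, the latest paper from MMLab@Nanyang Technological University, "Visual Jigsaw Post-Training Improves MLLMs", proposes a new post-training task for multimodal large models: Visual Jigsaw.
It recasts the classic self-supervised jigsaw task as the core objective of MLLM post-training, letting the model explicitly strengthen its own visual perception and understanding without extra annotations or a visual generation module. Its effectiveness has been verified across three visual modalities: images, videos, and 3D.
Visual Jigsaw method overview
For each visual modality, the Visual Jigsaw task is designed as follows:
Image Jigsaw: the image is divided in 2D space into  equally sized sub-images; after they are shuffled, the model must restore the correct spatial ...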
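To make the Image Jigsaw task concrete, here is a minimal Python sketch of how one such training sample could be generated and scored. It assumes a toy image stored as a 2D list of pixel values, an illustrative 2×2 grid (the source elides the actual grid size), and a hypothetical partial-credit reward. The function names `make_image_jigsaw`, `restore`, and `jigsaw_reward` are illustrative, not the paper's code.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # toy "image": a 2D grid of pixel values (assumption)


def make_image_jigsaw(image: Grid, rows: int, cols: int,
                      seed: int = 0) -> Tuple[List[Grid], List[int]]:
    """Split `image` into rows*cols equal patches in raster order, shuffle them,
    and return (shuffled_patches, answer), where answer[k] is the original
    raster index of the k-th shuffled patch."""
    h, w = len(image), len(image[0])
    if h % rows or w % cols:
        raise ValueError("image size must be divisible by the grid")
    ph, pw = h // rows, w // cols
    patches = [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(rows)
        for c in range(cols)
    ]
    order = list(range(rows * cols))
    random.Random(seed).shuffle(order)  # the label comes for free: no annotation needed
    return [patches[i] for i in order], order


def restore(shuffled: List[Grid], answer: List[int]) -> List[Optional[Grid]]:
    """Put each shuffled patch back at the raster position given by `answer`."""
    out: List[Optional[Grid]] = [None] * len(answer)
    for patch, idx in zip(shuffled, answer):
        out[idx] = patch
    return out


def jigsaw_reward(predicted: List[int], answer: List[int]) -> float:
    """Hypothetical partial-credit reward: fraction of positions predicted correctly;
    a malformed prediction (wrong length) earns zero."""
    if len(predicted) != len(answer):
        return 0.0
    return sum(p == a for p, a in zip(predicted, answer)) / len(answer)
```

A model whose output matches `answer` exactly earns a reward of 1.0, and each misplaced patch lowers it; because the ground truth is produced by the shuffle itself, the task needs no human labels.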
Boston Dynamics' robot dog is back, go go! "Five legs" pulling together
QbitAI · 2025-10-15 10:20
Core Insights
- The article discusses advances in Boston Dynamics' Spot robot, which can lift and manipulate a 15 kg tire in just 3.7 seconds, showcasing dynamic whole-body manipulation [3][31].
Group 1: Dynamic Whole-Body Manipulation
- The method combines sampling and learning: reinforcement learning and sampling-based control together enable coordinated tasks involving the arm, legs, and torso [11][12].
- A hierarchical control approach splits the control problem into two complementary layers: a low layer that directly controls motor torques and a high layer that handles task-specific strategy [12][13].
Group 2: Task Execution and Control Strategies
- For tasks such as tire alignment and stacking, the system uses sampling-based control to simulate possible future scenarios and discover effective strategies [14].
- Reinforcement learning keeps the robot stable during rolling tasks, capturing the required dynamics and reactive control behavior [15][26].
Group 3: Performance and Efficiency
- Spot's tire manipulation goes beyond what traditional static assumptions would allow, handling loads heavier than its 11 kg peak lifting capacity [35].
- Dynamically coordinating its movements lets the robot efficiently perform tasks that were previously limited to slower, static methods [36][33].
Group 4: Simplification of Control Problems
- Separating high-level and low-level control greatly simplifies the problem: the high-level controller can focus on completing the task without reasoning about joint torques or stability constraints [37][38].
- Learned motion abstractions give the high-level controller a simplified action space, making computation more tractable and task execution more efficient [38].
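The sampling-based control idea in Group 2 can be illustrated with a toy example. The Python sketch below is a generic random-shooting model-predictive controller on an assumed 1-D point mass, not Boston Dynamics' controller: at each step it samples candidate action sequences, simulates each with a dynamics model, keeps the cheapest, applies only its first action, and then replans. All names and parameters (`plan`, `horizon`, `samples`, `DT`, the cost weights) are illustrative assumptions. In the hierarchical setup described above, a planner like this would sit in the high layer, issuing commands that a learned low-level policy turns into motor torques; the toy omits that low layer.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity) of a toy 1-D point mass (assumption)

DT = 0.1  # simulation time step in seconds (assumed)


def step(state: State, accel: float) -> State:
    """Toy dynamics model, used both for planning and for 'executing' actions."""
    pos, vel = state
    vel += accel * DT
    pos += vel * DT
    return pos, vel


def rollout_cost(state: State, actions: List[float], target: float) -> float:
    """Simulate one candidate action sequence and score it (lower is better)."""
    cost = 0.0
    for a in actions:
        state = step(state, a)
        cost += (state[0] - target) ** 2 + 0.01 * a * a  # tracking error + effort
    return cost


def plan(state: State, target: float, rng: random.Random,
         horizon: int = 15, samples: int = 200, a_max: float = 1.0) -> float:
    """Random shooting: sample action sequences, keep the cheapest, return its first action."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(samples):
        seq = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        cost = rollout_cost(state, seq, target)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run(target: float = 1.0, steps: int = 60, seed: int = 0) -> State:
    """Receding-horizon loop: replan at every step and apply only the first action."""
    rng = random.Random(seed)
    state: State = (0.0, 0.0)
    for _ in range(steps):
        state = step(state, plan(state, target, rng))
    return state
```

Calling `run()` drives the mass from rest toward `target`. Raising `samples` or `horizon` trades computation for smoother control, which is one reason handing the hard, fast dynamics to a learned low layer can make a sampling-based high layer tractable.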
AI Annual Rankings registration now open! Five awards seeking the pioneers of the AI+ era
QbitAI · 2025-10-15 10:20
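Organizing Committee, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)
To let more practitioners feel the leap of the intelligence wave, and to give more applause and encouragement to peers and fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Rankings".
This year's selection covers three dimensions, enterprises, products, and people, with five award categories. Companies are warmly invited to apply!
Let us witness the stars of the year together and light the way forward.
Enterprise rankings · Product rankings · People rankings
2025 AI Annual Focus Person
Detailed selection criteria and registration instructions follow.
2025 AI Annual Leading Enterprise: selects the companies with the strongest overall strength in China's AI sector. Eligibility:
2025 AI Annual Leading Enterprise · 2025 AI Annual Promising Startup · 2025 AI Annual Outstanding Product · 2025 AI Annual Outstanding Solution
1. Registered in China, or main business primarily serving the Chinese market;
2. Main business is in AI or related industries, or AI is widely applied in the main business, with an industry-leading position in its segment;
Selection criteria:
2025 AI Annual Promising Startup: focuses on innovation and entrepreneurship in China's AI sector and selects the AI startups with the greatest investment value and growth potential. Eligibility:
Selection criteria:
3. Mature products or services that have real customer deployments and market recognition;
4. In the past year, in technology ...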
Tencent releases an ultra-low-cost AI training method! A 120-yuan approach beats a 70,000-yuan fine-tuning solution
QbitAI · 2025-10-15 06:27
Core Viewpoint
- Tencent proposes Training-Free GRPO, a new method for upgrading large-model agents that significantly reduces cost and improves performance without any parameter tuning [1][5][11].
Group 1: Methodology
- Training-Free GRPO improves performance by learning brief experiential knowledge that is injected into prompts, eliminating the need for parameter updates [2][11].
- Model parameters stay frozen while an external knowledge base is updated dynamically to optimize performance [14][22].
- The method keeps the core logic of conventional GRPO but turns it into a non-parametric reasoning process [13].
Group 2: Experimental Results
- Experiments show that DeepSeek-V3.1-Terminus with Training-Free GRPO improves significantly on mathematical reasoning and web search tasks [4][25].
- Compared with fine-tuning a 32B model, Training-Free GRPO needs less training data and costs far less: roughly $18, versus more than $10,000 for the conventional approach [5][28].
- On AIME24 and AIME25, the model's score rose from 80.0% to 82.7% and from 67.9% to 73.3%, respectively, using only a few training samples [28].
Group 3: Performance Evaluation
- The method reached a Pass@1 of 67.8% on the WebWalkerQA benchmark, up from a baseline of 63.2% [35].
- The learned experiences help the model avoid redundant tool calls and make decisions more efficiently [31][30].
- Its effectiveness depends on the underlying model's reasoning and tool-use abilities; it performs worse on less capable models [40].
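The core mechanism can be sketched in a few lines. GRPO scores a group of rollouts for the same query and standardizes each reward against the group's mean and standard deviation; the training-free variant described above keeps the model frozen and uses that group-relative signal to update a text experience library that is prepended to future prompts. This Python sketch is an assumption-laden illustration, not Tencent's implementation: the template-based `lesson` stands in for the step where an LLM would summarize why the better rollouts succeeded, and all function and field names (`update_experience_library`, `build_prompt`, `"summary"`) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize each rollout's reward within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def update_experience_library(library: List[str], rollouts: List[Dict],
                              eps: float = 1e-6) -> List[str]:
    """Hypothetical training-free step: no gradients and no parameter changes.
    The contrast between the best and worst rollout in the group becomes a
    text 'experience' appended to the library."""
    advs = group_relative_advantages([r["reward"] for r in rollouts], eps)
    if max(advs) - min(advs) < eps:  # all rollouts equally good: no learning signal
        return library
    best = rollouts[advs.index(max(advs))]
    worst = rollouts[advs.index(min(advs))]
    # Placeholder for an LLM-written summary of why `best` beat `worst`.
    lesson = f"Prefer: {best['summary']} | Avoid: {worst['summary']}"
    return library if lesson in library else library + [lesson]


def build_prompt(library: List[str], question: str) -> str:
    """Inject the accumulated experiences into the frozen model's prompt."""
    experiences = "\n".join(f"- {e}" for e in library)
    return f"Useful experiences:\n{experiences}\n\nQuestion: {question}"
```

The design point is that only `library` changes between iterations; the model weights are never touched, which is why the approach needs far fewer samples and much less compute than fine-tuning.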
Top 5 open-source models: all from Chinese vendors
QbitAI · 2025-10-15 06:27
Core Insights
- The article highlights the rapid rise of Chinese open-source large models, notably Alibaba's Qwen series and DeepSeek, which have had a profound impact on the open-source community since the second half of 2024 [1][6][20].
Model Rankings
- Chinese open-source models have moved from followers to leaders: in the LMArena rankings, models such as GLM-4.6 and DeepSeek-v3.2 closely trail top proprietary models such as GPT-5 and Gemini-2.5-pro [7][10].
- Qwen3-max-preview has reached the top three, although it is not yet open-sourced [8].
Performance in Various Domains
- In text generation, Chinese models such as DeepSeek-R1/V3.1 and GLM-4.6 compete closely with leading proprietary models [10].
- In web development tasks, DeepSeek-R1-0528 and Qwen3-Coder also rank in the top ten [11].
- In the vision domain, Tencent's Hunyuan-vision-1.5 and Qwen3 are among the strongest open-source models, although Hunyuan-vision-1.5 has only been announced for open-sourcing and is not yet released [12].
Popularity and Downloads
- Qwen3 is among the most downloaded models and leads open-source models at the scale of hundreds of billions of parameters [18].
- DeepSeek-R1 is currently the most popular model, indicating strong user engagement and preference [17].
Industry Trends
- The article suggests that this shift is not only about who leads the open-source landscape; it may redefine the global innovation landscape [21].
- The driving force behind this momentum is increasingly recognized as coming from China, pointing to a possible shift in the global AI development paradigm [20].
Wang Xingxing's master's thesis turns up on GitHub: the prototype of Unitree was already there
QbitAI · 2025-10-15 06:27
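Yishui, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)
Once you become famous, even your graduation thesis gets dug up (doge).
Sure enough, netizens have dug deep and found the master's thesis of Wang Xingxing, CEO of Unitree Robotics.
(It was found not on CNKI but on GitHub.)
Looking back now at this thesis from nearly 10 years ago, two points stand out:
First, the electrically actuated robot design that Wang boldly bet on back then has since been widely accepted by the industry. At the time, teams in China and abroad, including Boston Dynamics, concentrated on hydraulic designs; now the trend has reversed. (Boston Dynamics switched from hydraulic to electric actuation starting last year.)
Second, Unitree Robotics (already valued at over 10 billion yuan and about to go public) actually began with XDog, the little robot dog proposed in the thesis. Wang himself has mentioned this dog publicly on many occasions, and it is displayed prominently at the very front of Unitree's showroom.
More importantly, the "cost-effectiveness" mindset in the thesis later became something close to Unitree's founding principle:
Leaving aside the robot dogs now seen running everywhere, the G1 bipedal humanoid robot the company released last August was the first to bring the price of a humanoid robot down to the 100,000-yuan mark (starting at 99,000 yuan).
So if you want to know how the star unicorn Unitree was made, founder Wang Xingxing's thesis may offer some clues.
The thesis already shows a robot "cost-effectiveness" mindset
This thesis was completed in 2016 ...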
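OPPO's new AI operating system steps off the screen: point at something and it answers, and in noisy places it hears only your voice
QbitAI · 2025-10-15 04:00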
Core Viewpoint
- OPPO has launched its new-generation AIOS, ColorOS 16, with upgraded features such as "One-Click Flash Memory" and "One-Click Question Screen" that aim to improve the user experience and interaction with AI [1][50].
Group 1: One-Click Flash Memory
- "One-Click Flash Memory" lets users save key information with a single button press, and it has been significantly upgraded in ColorOS 16 [9][8].
- Users can now save multiple images at once, with key information and text extracted without browsing through each image [12].
- The AI can automatically summarize long videos and identify key timestamps for easier reference [14].
- The feature also remembers takeout pickup codes and payment details, recognizing and storing them automatically for later access [20][23].
- The system can generate personalized spending reports by recognizing spending categories and amounts [23].
- A "memory symbiosis" capability can recommend restaurants based on the user's health reports, avoiding unsuitable food options [26].
- Users can also photograph paper receipts with the camera for record-keeping [27].
Group 2: One-Click Question Screen
- "One-Click Question Screen" now supports voice recognition, so users can interact with the AI even in noisy environments [34][36].
- Users can simply point at real-world objects to have the AI provide information about them [38].
- The feature has also been extended through collaboration with popular review platforms to enrich the exploration experience [41].
Group 3: New AI Technology Architecture
- OPPO introduced a new AI technology architecture with three layers: new computing, new perception, and new ecosystem [43].
- New computing focuses on on-device (edge) intelligent computing with high-performance inference [44].
- The new perception layer features a memory symbiosis engine that provides continuous awareness of the physical world and lifelong memory [46].
- The new ecosystem layer aims to enable cross-application AI capabilities and improve interaction between devices and users [48].
- This architecture marks ColorOS's transition into a new AIOS era, debuting with the upcoming Find X9 series and OnePlus devices [50][52].
AI Annual Rankings registration now open! Five awards seeking the pioneers of the AI+ era
QbitAI · 2025-10-15 04:00
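Organizing Committee, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)
To let more practitioners feel the leap of the intelligence wave, and to give more applause and encouragement to peers and fellow travelers, we are officially opening registration for the "2025 Artificial Intelligence Annual Rankings".
This is the 8th year of QbitAI's AI Annual Rankings. Over these eight years, we have witnessed technological breakthroughs and real-world deployment, the convergence and reshaping of industries, and wave after wave of companies, people, and products driving the era forward.
People rankings: 2025 AI Annual Focus Person
Detailed selection criteria and registration instructions follow.
In an era when AI is redefining everything, intelligent technology is no longer a single tool but a driving force in the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries.
This year's selection covers three dimensions, enterprises, products, and people, with five award categories. Companies are warmly invited to apply!
Let us witness the stars of the year together and light the way forward.
Enterprise rankings · Product rankings
2025 AI Annual Leading Enterprise
2025 AI Annual Leading Enterprise · 2025 AI Annual Promising Startup · 2025 AI Annual Outstanding Product · 2025 AI Annual Outstanding Solution
Selects the companies with the strongest overall strength in China's AI sector. Eligibility:
Selection criteria:
2025 AI Annual Promising Startup: focuses on China's AI ...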
Google's new Gemini takes down UI overnight: a single HTML file replicates macOS, with a 100% success rate
QbitAI · 2025-10-15 01:08
Core Insights
- Google's AI, Gemini 3.0 Pro, has demonstrated the ability to build a fully functional macOS-like web operating system from simple prompts, showcasing advanced UI design and functionality [2][3][4].
- Its success in generating macOS, Windows, and Linux environments, each within a single HTML file, suggests a significant leap for coding models and could position Gemini 3.0 Pro as a leading tool in the field [10][12][15].
- Despite the impressive results, some experts caution that these creations are simulations rather than real operating systems, stressing the difference between emulation and actual implementation [18].
Group 1: Gemini 3.0 Pro Capabilities
- Gemini 3.0 Pro can replicate macOS UI features, including animations, window management, and bundled applications, all working correctly [4][10].
- It can also generate a web-based Windows environment with built-in Python and games, showing versatility across operating systems [12][11].
- A successful attempt at a Linux desktop environment further highlights its UI and functional capabilities [16][15].
Group 2: Community Reactions and Comparisons
- Users are excited about Gemini 3.0 Pro's potential, suggesting it could become the strongest coding model to date if the final release lives up to these results [9].
- Comparisons with other models such as Claude Sonnet 4.5 show Gemini 3.0 Pro producing more functional applications [13].
- The community acknowledges the impressive output while recognizing its current limits, particularly that it is not a true operating system [18].
Group 3: Future Prospects
- Although Google has not officially announced a release date for Gemini 3.0 Pro, industry insiders speculate, based on past patterns, that it may debut in the coming months [19][20].
- Demonstration videos from influencers point to a deliberate marketing strategy by Google, reminiscent of its past successful campaigns [22].
- The hype around Gemini 3.0 Pro raises concerns about disappointment if expectations are set too high, as happened with the reception of earlier AI models [22].
Hands-on with the new LiblibAI: models, image generation, and workflows finally packed into one bowl
QbitAI · 2025-10-15 01:08
Core Insights
- The article discusses the major upgrades in LiblibAI 2.0, which turn it from a model-discovery website into a comprehensive AIGC (AI-generated content) platform with richer functionality and a better user experience [11][36].
Group 1: Platform Upgrades
- LiblibAI 2.0 adds multiple models and video effects, going beyond interface changes to a more integrated creative workflow [3][12].
- Users can now create content without switching among multiple websites, streamlining the creative process [11][12].
- The interface now resembles a combination of ChatGPT and Canva, making it more user-friendly [12].
Group 2: Model Integration
- The platform keeps its core strength by integrating popular models such as Qwen-Image, Seedream 4.0, and the recently released Midjourney V7 [15][16].
- LiblibAI 2.0 also integrates a range of mainstream video models, giving users a comprehensive offering [17][18].
Group 3: User Experience
- Adding special effects to videos stands out as a highlight, enabling creative transformations [19][21].
- User experiences are mixed: some report page lag and limited editing of generated content [28][38].
- A global image-style library lets users choose models visually, simplifying the process for newcomers [33].
Group 4: Company Background
- LiblibAI has grown rapidly, completing four funding rounds in a single year, a record in China's AI application sector [39].
- Founder Chen Mian has a strong record in commercializing products, having previously worked on popular apps such as Jianying and CapCut [42][43].
- The company is transitioning from a model-sharing community to a comprehensive AI toolkit for creators, which makes maintaining user trust and engagement a challenge [45].