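Whoa! A Chinese video model's physics has reached a superhuman level | Hands-on with MiniMax Hailuo 02
QbitAI (量子位) · 2025-06-19 06:25
By Yuyang and Yishui, from Aofeisi. QbitAI (量子位) | WeChat official account QbitAI. With the whole arena watching, a gymnast steadily completes a leap, and then... suddenly breaks into ballet on the balance beam??? This is no special effect from a live-action sports film in the vein of The Prince of Tennis or Haikyu!!. Bear in mind that even Veo 3, which recently put Google in the spotlight, stumbled on this challenge, prompting netizens to declare: gymnastics is the Turing test for video generation models. The new model is called Hailuo 02, and its selling points are "ultra-clear image quality" and "precise response": it natively supports 1080p and can handle extremely complex physical scenes. Beyond gymnastics, it pulls off urban stunts with ease, and even the reflections in glass obey real-world physics. The footage above was generated entirely by AI. That's right: with this update to the MiniMax video generation model, "gymnastics", the long-standing hard problem of AI video generation, has actually been solved! △ Image source: @WuxiaRocks In short: the physics is almost too good. That level of quality made Hailuo 02 a sensation the moment it launched late at night, with users in China and abroad ignoring time zones to try it right away. Many said outright: better than Veo 3. Notably, right after release Hailuo 02 also jumped to second place on the AI video arena's image-to-video leaderboard, surpassing the much-hyped Veo 3 in benchmark tests. | | Text to Video ...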
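Meituan proposes the first voice-interactive GUI agent; end-to-end speech training outperforms traditional text training
QbitAI (量子位) · 2025-06-19 06:25
Contributed by the GUIRoboTron-Speech team. QbitAI (量子位) | WeChat official account QbitAI. Can a GUI agent be driven just by speaking? GUIRoboTron-Speech, jointly released by Meituan and Zhejiang University, frees users' hands and lets them give orders to the computer directly by voice. From text to speech: the next evolution of intelligent agents. Today, autonomous GUI agents built around large language models (LLMs) can already carry out complex cross-application, multi-step tasks automatically from text instructions, greatly improving users' productivity. But this reliance on text limits their use in broader scenarios. Consider a common household scenario: when someone tells the shared family computer to "open my browser", an agent that understands only text is at a loss. It cannot tell which family member gave the instruction, so it cannot know whose browser "my" refers to. An agent that processes speech directly, however, can analyze the speaker's distinctive voiceprint, accurately identify who issued the instruction, and open that member's personalized Google browser interface. This is the unique value of the speech modality: it conveys not only the content of an instruction but also rich non-verbal cues such as identity and emotion, which are essential for truly personalized and intelligent interaction. Traditional solutions, such as a cascade of "automatic speech recognition (ASR) model transcription + text GUI agent", not only increase ...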
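Tian Yuandong: Continuous chain of thought is more efficient, encoding multiple paths at once for "superposition"-style parallel search
QbitAI (量子位) · 2025-06-19 06:25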
Core Viewpoint - The article covers new research from a team led by AI researcher Tian Yuandong on continuous chains of thought (CoTs), which encode multiple candidate paths at once in a way the authors liken to quantum superposition, making them more efficient on complex tasks than traditional discrete CoTs [2][4].
Group 1: Research Findings
- Traditional large language models (LLMs) reason with discrete tokens, which can be inefficient for complex tasks, requiring O(n^2) decoding steps and often getting stuck in local optima [4].
- Recent studies indicate that reasoning with continuous hidden vectors can significantly improve performance, although a theoretical explanation was previously lacking [5].
- The team showed that a two-layer Transformer with D steps of continuous CoT can solve directed graph reachability, where D is the graph diameter and n the number of vertices, whereas discrete CoT models require O(n^2) decoding steps [7].
Group 2: Methodology
- A continuous CoT encodes multiple candidate graph paths simultaneously, akin to breadth-first search (BFS), which gives it a significant advantage over a discrete CoT, which resembles depth-first search (DFS) [8].
- A purpose-built attention selector mechanism lets the model attend to specific positions based on the current token, ensuring effective information extraction [11][12].
- The first Transformer layer organizes edge information, while the second layer explores all possible paths in parallel [21][22].
Group 3: Experimental Results
- The team ran experiments on a subset of the ProsQA dataset whose problems require 3-4 reasoning steps, with each node represented by a dedicated token [26].
- The COCONUT model, using a two-layer Transformer, reached close to 100% accuracy on ProsQA problems, while a 12-layer discrete CoT model reached only 83% and a baseline model solved approximately 75% of tasks [27][28].
- The model's behavior was further validated by analyzing attention patterns and continuous thought representations, supporting the theoretical hypothesis of superposition-style search [30].
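
The BFS-versus-DFS contrast in the summary above can be made concrete with a small sketch. It is not the paper's Transformer construction: the frontier set below merely stands in for a continuous thought that holds many candidate nodes at once, and the example graph and both function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]

# Hypothetical toy graph, used only for illustration.
EXAMPLE_GRAPH: Graph = {
    "a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": ["f"], "f": [],
}

def reachable_by_frontier(graph: Graph, start: str, target: str, max_steps: int) -> bool:
    """BFS-like search: each step expands every frontier node at once."""
    frontier: Set[str] = {start}
    for _ in range(max_steps):
        if target in frontier:
            return True
        # One "thought" step: follow all outgoing edges of all frontier nodes.
        frontier = frontier | {nxt for node in frontier for nxt in graph.get(node, [])}
    return target in frontier

def dfs_steps(graph: Graph, start: str, target: str) -> int:
    """DFS-like search: commits to one node per step; returns nodes visited, or -1."""
    stack, visited, steps = [start], set(), 0
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        steps += 1
        if node == target:
            return steps
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        stack.extend(reversed(graph.get(node, [])))
    return -1

if __name__ == "__main__":
    print(reachable_by_frontier(EXAMPLE_GRAPH, "a", "f", max_steps=3))
    print(dfs_steps(EXAMPLE_GRAPH, "a", "f"))
```

In this toy graph the frontier search needs as many steps as the target's distance from the start, while the path-at-a-time search visits dead ends first, which is the intuition behind the step-count gap the summary reports.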
Sign-ups open for the AI glasses salon: forging consensus from the industry front line | QbitAI AI Salon
QbitAI (量子位) · 2025-06-19 02:56
Core Viewpoint - The article discusses the emergence and development of AI glasses, highlighting the competitive landscape and the challenges of creating a successful product in this space [1].
Group 1: Market Overview
- In the past month, nearly ten new AI glasses have been launched by various companies, signaling strong interest in AI hardware in 2025 [1].
- AI glasses are becoming lighter, longer-lasting on battery, and more fashionable, making them better suited to everyday use [1].
Group 2: Industry Challenges
- The article asks how the market has responded to the first generation of AI glasses and what stands in the way of a blockbuster product [10].
- It stresses the need to identify the "killer application" that would drive widespread adoption of AI glasses [10].
Group 3: Event Information
- An AI salon is scheduled for June 25, featuring discussions with representatives of AI glasses manufacturers and ecosystem participants such as Xiaomi and Baidu Smart Cloud [1][10].
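MiniMax AI super agent released! Strong coding and multimodal capabilities, seamless MCP tool integration, no invitation code required to try it
QbitAI (量子位) · 2025-06-19 02:56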
Core Viewpoint - The article introduces the launch of MiniMax Agent, an AI super agent designed to work as a reliable teammate, capable of expert-level multi-step planning, flexible task decomposition, and end-to-end execution [1][2].
Summary by Sections
MiniMax Agent Features
- MiniMax Agent offers programming capabilities, letting users build interactive web pages such as an "online Louvre" in just three minutes [9].
- It has multimodal understanding and generation capabilities: it can comprehend audio and video and output formats such as images, audio, PPT, and animations [13].
- The agent supports seamless MCP integration; users invoke MCP tools simply by typing the "@" symbol in the input box [14].
User Engagement and Accessibility
- MiniMax Agent is now fully open for public testing with no invitation needed, and users can try it for free on the web [6].
- New users receive 1,000 points; a monthly subscription costs 19 yuan for approximately 15 tasks, and a professional version costs 69 yuan for about 60 tasks [17].
MiniMax Week Highlights
- On the first day of MiniMax Week, the company released MiniMax-M1, described as the world's first open-source large-scale hybrid-architecture reasoning model, which supports an input length of 1 million tokens and an output length of 80,000 tokens [20].
- The second day featured the release of Hailuo 02 (海螺02), which handles extremely complex physical scenes, natively supports 1080p, and achieves top-tier instruction following and generation quality [21].
- The third day marked the launch of MiniMax Agent, with more releases anticipated in the coming days [22].
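Livestream preview: Can a niche segment reach a million users? Talking with Kapi Accounting (咔皮记账) about how startup AI apps break through | AI Product Time
QbitAI (量子位) · 2025-06-18 09:17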
Core Insights - The article emphasizes the importance of acquiring active, highly engaged users for AI products, highlighting a group of startup AI apps that are growing rapidly even though they neither come from major internet companies nor target large markets [1][2].
Group 1: AI Product Growth
- A number of startup AI apps are growing unusually fast thanks to clever feature design, personalized marketing strategies, and distinctive user growth tactics [1].
- The "AI Product Time" series by QbitAI Think Tank explores the success factors behind these products through in-depth interviews with product leaders [2].
Group 2: Case Study - Kapi Accounting
- Kapi Accounting, a top AI accounting app in China, has passed one million users in just six months, despite operating in a niche market with existing competitors [3].
- Unlike competitors that focus on emotional value, Kapi emphasizes AI upgrades to become a financial assistant, introducing features such as a consumption MBTI and smart budgeting [3].
Group 3: Upcoming Insights
- Product leader Zhang Yang will give a comprehensive analysis of Kapi Accounting, sharing insights from the product with the audience [5].
Group 4: Ongoing Research
- QbitAI Think Tank continuously tracks and analyzes domestic AI products through formats such as monthly reports, AI 100 rankings, panoramic maps, and in-depth reports [6].
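Every large model scores 0! A Chinese-led team headed by Xie Saining releases a new competitive programming benchmark, with problems updated daily to block memorization
QbitAI (量子位) · 2025-06-18 09:17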
Core Viewpoint - The recent LiveCodeBench Pro benchmark showed leading large language models (LLMs) performing poorly, with every model scoring zero on the hard problems, indicating that they have not yet reached the level of human experts in competitive programming [1][2][8].
Group 1: Benchmark Overview
- LiveCodeBench Pro is a live benchmark platform that includes competitive programming problems from IOI, Codeforces, and ICPC [3].
- The problem bank is updated daily to prevent LLMs from memorizing problems, keeping the evaluation challenging [4][15].
- The benchmark consists of 584 top-tier competition problems, categorized by cognitive focus and difficulty level, with automatic selection based on normal distribution [15][17].
Group 2: Model Performance
- The best-performing model achieved a pass rate of only 53% on medium-difficulty problems, while the pass rate on hard problems was 0% [9][10].
- Performance metrics showed that the models did well on knowledge-intensive and logic-intensive problems but struggled with observation-intensive ones [26][29].
- LLMs demonstrated advanced skill in precise implementation but fell short in algorithm design and complex case analysis [28][29].
Group 3: Testing Methodology
- The team categorized problems by their underlying algorithmic concepts and recorded the official Codeforces difficulty ratings [19].
- Each model's submissions were evaluated against human expert solutions, with results indicating that LLMs often failed to make effective use of the provided sample inputs [30][32].
- The team plans to release a completely new evaluation set every quarter to keep the benchmark relevant and challenging [38].
Group 4: Team Composition
- The LiveCodeBench Pro team includes several Olympiad winners, a significant portion of whom are of Chinese descent [40].
- Key team members come from prestigious institutions and have previously interned at major tech companies, adding to the project's credibility and expertise [41][44].
At the home turf of 618, I chatted with three top technical PhDs
QbitAI (量子位) · 2025-06-18 07:49
Core Viewpoint - The article discusses how technology has evolved and reshaped the shopping experience during the 618 shopping festival, highlighting advances in logistics, customer service, and product recommendation systems that improve user experience and operational efficiency [1][2][3].
Group 1: Technology and User Experience
- Technology exists to improve quality of life rather than complicate it, as shown by advances in logistics and customer service [3][4].
- Improvements in user experience and error reduction are attributed to technical teams working behind the scenes to optimize systems and processes [4][20].
- A "same product identification system" enables better product comparison and competitive pricing, improving the shopping experience [8][9].
Group 2: Case Studies of Technical Teams
- Chang Lin from the retail division optimizes the same product identification system using model distillation to improve efficiency and reduce costs [11][13][16].
- Xing Yan from the logistics division emphasizes understanding specific operational scenarios, which led to intelligent distribution models for delivery personnel [33][38].
- Chu Xue from the technology division works on voice recognition systems, ensuring high accuracy in applications such as smart customer service and AI-driven calls [42][51].
Group 3: Talent Development and Company Culture
- The company emphasizes a solid technical foundation and long-term investment in talent, as seen in the TGT (Tech Genius Team) program for recruiting top technical talent [57][59].
- The TGT program offers no salary cap based on potential, mentorship from experienced professionals, and access to real-world data for practical applications [59][60].
- The company fosters a collaborative environment in which technical staff are encouraged to work on real-world problems while developing their skills [61][62].
The real Luo Yonghao's livestream loses to the fake Luo Yonghao? Netizens: Is Lao Luo just pretending to be an AI?
QbitAI (量子位) · 2025-06-18 07:49
Core Viewpoint - The article discusses the successful debut of Luo Yonghao's digital avatar during the 618 shopping festival, which set records for sales and viewer engagement and showcased Baidu's advances in digital human technology [1][7][11].
Group 1: Digital Human Performance
- Luo Yonghao's digital avatar drew over 13 million views and generated a GMV (Gross Merchandise Volume) exceeding 55 million yuan during the livestream [7].
- The digital avatar outsold the real Luo Yonghao's livestream in May, indicating significant improvements in digital human capabilities [7][28].
- The interaction between the digital avatars of Luo Yonghao and Zhu Xiaomu was seamless, effectively mimicking real human interaction [2][15].
Group 2: Technological Advancements
- Baidu's digital human technology, known as Huibo Star, incorporates a high-persuasion digital human model that combines image, perception, decision-making, and action [11][30].
- The first dual digital human interactive livestreaming room allows natural dialogue and interaction, improving the viewer experience [13][14].
- The technology uses advanced language models and multimodal systems to create dynamic interactions, making the digital human appear more lifelike and engaging [30][31].
Group 3: Market Impact and Accessibility
- Advances in digital human technology have lowered the entry barrier for new streamers, enabling even those without prior experience to livestream with digital avatars [50].
- Small and medium-sized businesses have reported significant increases in order volume after adopting digital human technology for continuous livestreaming [51][54].
- Baidu's initiatives, such as the Dream Butterfly and Starry Sky plans, aim to support merchants by increasing the number of digital human avatars and providing financial backing for their use [59][60].
Group 4: Broader Industry Implications
- Digital humans are spreading across sectors, with over 100,000 businesses using the technology, resulting in an average GMV increase of 62% and an 80% reduction in operational costs [58].
- The article emphasizes that digital human technology is not exclusive to top streamers but is a new form of shared productivity accessible to a wider audience [61].
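ByteDance Seed proposes sequential policy optimization, breaking through the "quality-latency" trade-off in simultaneous interpretation
QbitAI (量子位) · 2025-06-18 07:49
Contributed by the SeqPO-SiMT team. QbitAI (量子位) | WeChat official account QbitAI. This decision process can be formally expressed as: A key flexibility of this framework is that, if the model decides to wait for more context, the output can be empty, with its length determined entirely by the policy model itself. AI subtitles always lag half a beat behind, and balancing quality against latency is a long-standing industry problem. To address it, a research team from The Chinese University of Hong Kong, ByteDance Seed, and Stanford University has proposed a sequential policy optimization framework for simultaneous interpretation (Sequential Policy Optimization for Simultaneous Machine Translation, SeqPO-SiMT). It achieves SOTA at the 7-billion-parameter (7B) scale. Experimental results show that SeqPO-SiMT's translation quality not only surpasses supervised fine-tuned (SFT) offline models and LLaMA-3-8B, but even matches or exceeds the offline translation quality of Qwen-2.5-7B. Method: SeqPO-SiMT sequential policy optimization. To tackle the difficulties above, the research team proposes the SeqPO-SiMT framework. Its core idea is to model simultaneous interpretation as a sequential decision problem, jointly evaluating the translation quality and latency of the whole translation process and optimizing the entire decision sequence end to end. The main feature of the method is that it no longer judges each decision step in isolation ...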
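
The sequential-decision framing described above can be sketched as a loop in which a policy sees the source one chunk at a time and, at each step, either emits target tokens or emits nothing in order to wait for more context. This is only a toy under stated assumptions: `wait_k_policy`, the uppercase "translation", the lag measure, and the `lam` weight in `episode_reward` are hypothetical stand-ins, not the SeqPO-SiMT policy model or its actual reward.

```python
from typing import Callable, List, Tuple

# A policy maps (source read so far, target emitted so far, source finished?)
# to the next piece of target output, which may be empty (i.e. "wait").
Policy = Callable[[List[str], List[str], bool], List[str]]

def wait_k_policy(k: int) -> Policy:
    """Hypothetical toy policy: stay k - 1 tokens behind the source."""
    def policy(src: List[str], tgt: List[str], done: bool) -> List[str]:
        allowed = len(src) if done else max(0, len(src) - k + 1)
        # Toy "translation": uppercase each source token. An empty slice means wait.
        return [w.upper() for w in src[len(tgt):allowed]]
    return policy

def run_episode(source: List[str], policy: Policy) -> Tuple[List[str], List[int]]:
    """Feed the source chunk by chunk; record the step at which each token is emitted."""
    seen: List[str] = []
    target: List[str] = []
    emit_steps: List[int] = []
    for step, chunk in enumerate(source, start=1):
        seen.append(chunk)
        out = policy(seen, target, step == len(source))
        target.extend(out)
        emit_steps.extend([step] * len(out))
    return target, emit_steps

def average_lag(emit_steps: List[int]) -> float:
    """Mean number of extra source tokens read before each target token appeared."""
    if not emit_steps:
        return 0.0
    return sum(s - i for i, s in enumerate(emit_steps, start=1)) / len(emit_steps)

def episode_reward(quality: float, emit_steps: List[int], lam: float) -> float:
    """Assumed whole-episode score: quality minus a latency penalty weighted by lam."""
    return quality - lam * average_lag(emit_steps)

if __name__ == "__main__":
    src = ["the", "cat", "sat", "down", "today"]
    for k in (1, 3):
        tgt, steps = run_episode(src, wait_k_policy(k))
        print(k, tgt, steps, episode_reward(1.0, steps, lam=0.5))
```

Scoring the whole episode at once, rather than each step separately, is what lets a policy trade a short wait now for better output later, which is the trade-off the excerpt says the framework optimizes end to end.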