量子位
Search documents
上下文即记忆!港大&快手提出场景一致的交互式视频世界模型,记忆力媲美Genie3,且更早问世!
量子位· 2025-08-21 07:15
Core Viewpoint - The article discusses a new framework called "Context-as-Memory" developed by a research team from the University of Hong Kong and Kuaishou, which significantly improves scene consistency in interactive long video generation by efficiently utilizing historical context frames [8][10][19]. Summary by Sections Introduction to Context-as-Memory - The framework addresses the issue of scene inconsistency in AI-generated videos by using a memory retrieval system that selects relevant historical frames to maintain continuity [10][19]. Types of Memory in Video Generation - Two types of memory are identified: dynamic memory for short-term actions and behaviors, and static memory for scene-level and object-level information [12][13]. Key Concepts of Context-as-Memory - Long video generation requires long-term historical memory to maintain scene consistency over time [15]. - Memory retrieval is crucial, as directly using all historical frames is computationally expensive; a memory retrieval module is needed to filter useful information [15]. - Context memory is created by concatenating selected context frames with the input, allowing the model to reference historical information during frame generation [15][19]. Memory Retrieval Method - The model employs a camera trajectory-based search method to select context frames that overlap significantly with the current frame's visible area, enhancing both computational efficiency and scene consistency [20][22]. Dataset and Experimental Results - A dataset was created using Unreal Engine 5, containing 100 videos with 7601 frames each, to evaluate the effectiveness of the Context-as-Memory method [23]. - Experimental results show that Context-as-Memory outperforms baseline and state-of-the-art methods in memory capability and generation quality, demonstrating its effectiveness in maintaining long video consistency [24][25]. Generalization of the Method - The method's generalization was tested using various styles of images as initial frames, confirming its strong memory capabilities in open-domain scenarios [26][27]. Research Team and Background - The research was a collaboration between the University of Hong Kong, Zhejiang University, and Kuaishou, led by PhD student Yu Jiwen under Professor Liu Xihui [28][33].
“半路截胡”张益唐,北大出身的中山大学校长这样做
量子位· 2025-08-21 07:15
Core Viewpoint - The article discusses the return of renowned mathematician Zhang Yitang to China, specifically to Sun Yat-sen University, after over 40 years abroad, highlighting the competitive nature of academic recruitment in China and Zhang's significant contributions to mathematics [2][3][4]. Group 1: Zhang Yitang's Academic Journey - Zhang Yitang, a prominent mathematician known for his work on the twin prime conjecture, has recently joined Sun Yat-sen University as the chief scientist of the newly established Hong Kong Advanced Institute [2][3]. - Prior to this, he was a tenured professor at the University of California, Santa Barbara, and had been contemplating a return to China for several years due to various international factors [3][4]. - His decision to join Sun Yat-sen University was somewhat unexpected, as he had other institutions lined up, but the university managed to secure his commitment at the last moment [4]. Group 2: Contributions to Mathematics - Zhang gained international recognition at the age of 58 for his groundbreaking paper "Bounded Gaps Between Primes," which provided significant progress on the twin prime conjecture [12][16]. - His research demonstrated the existence of infinitely many pairs of prime numbers with gaps smaller than 70 million, marking a historic advancement in number theory [11][12]. - This achievement was particularly notable as many experts had previously deemed the problem unsolvable, showcasing Zhang's unique approach and capabilities in mathematics [13][14]. Group 3: Personal Background and Philosophy - Born in 1955 in Shanghai, Zhang displayed exceptional mathematical talent from a young age, independently proving the Pythagorean theorem at the age of 10 [18][20]. - He faced significant challenges in his early career, including difficulties finding academic positions in the U.S. after completing his Ph.D., which led him to work in a restaurant temporarily [28][33]. - Zhang emphasizes the importance of passion for mathematics over material success, stating that he values the ability to continue his work in mathematics regardless of his circumstances [34].
北大ChatExcel,获得千万级新投资
量子位· 2025-08-21 07:15
Core Viewpoint - ChatExcel has recently completed its angel round financing, securing nearly ten million from Shanghai Changlei Capital and Wuhan Donghu Angel Fund, aimed at accelerating product development and global market expansion [2][15]. Group 1: Company Development - ChatExcel is the first generative AI Excel and data analysis agent in China, allowing users to operate Excel spreadsheets through chat [6]. - The platform has achieved significant progress in AI spreadsheet processing and DataAgent technology, serving over one million users [10][9]. - The company has integrated into the ecosystems of major firms like Huawei, Lenovo, HP, and Alibaba Cloud, supporting commercial growth [12]. Group 2: Product Features and Upgrades - ChatExcel covers four main modules: Excel processing, data computation, data analysis, and chart generation, making it user-friendly for all skill levels [7]. - Recent updates include mobile H5 and desktop client support, an "enterprise version" with SSO, local deployment, and API calls, and a 300% increase in processing speed with a 50% improvement in model effectiveness [17][20][19]. - The tool now supports various data sources, including Excel files, databases, and web data, enabling comprehensive data analysis [34]. Group 3: Future Plans - The company plans to utilize the new funding to build an AI DataAgent that enhances data flow and creates a commercial closed loop [14]. - ChatExcel is actively pursuing product iteration and plans to introduce more new features in the coming months to enhance intelligence and user experience [28].
GPT-5 Pro独立做数学研究!读论文后给出更精确边界,OpenAI总裁:这是生命迹象
量子位· 2025-08-21 04:23
Core Viewpoint - The article discusses the capabilities of OpenAI's GPT-5 Pro in independently exploring and proving mathematical concepts, specifically in the field of convex optimization, highlighting its potential as a significant breakthrough in AI research [1][9][42]. Group 1: GPT-5 Pro's Achievements - GPT-5 Pro provided a more precise threshold and corresponding proof for a boundary problem in convex optimization compared to the original paper [2][26]. - The model was able to refine the boundary from 1/L to 1.5/L using advanced inequality techniques in just 17.5 minutes, while human verification took 25 minutes [27][28]. - OpenAI's president referred to this achievement as a "sign of life," indicating the model's advanced capabilities [9]. Group 2: Convex Optimization Insights - The original paper titled "Are Optimization Curves Convex?" investigates whether the optimization curve generated by gradient descent on smooth convex functions is convex [10][11]. - The paper concludes that the convexity of the optimization curve depends on the choice of step size, with specific ranges ensuring convexity [14][17]. - Key findings include that for step sizes in the range (0, 1/L], the optimization curve is guaranteed to be convex, while in the range (1.75/L, 2/L), it may not be convex even if gradient descent converges [17][26]. Group 3: Comparison of Approaches - GPT-5 Pro's proof approach differed from the updated version of the original paper, demonstrating its ability to independently discover and prove mathematical rules [41][42]. - The original authors later updated their paper to establish 1.75/L as an exact boundary, closing previously unexplored intervals [41][42].
稚晖君家智元没参展机器人大会,合着是人家自己办(doge)
量子位· 2025-08-21 04:23
Core Viewpoint - The article highlights the latest developments from the Zhiyuan Robotics Partner Conference, showcasing various robots and their capabilities, emphasizing the company's innovative approach in the robotics industry [3][21]. Group 1: Event Highlights - The Zhiyuan Robotics Partner Conference featured robots that interacted with attendees, including a robot that could express emotions and engage in conversation [5][8]. - Various types of robots were present, including a robot dog that navigated through crowds and a robot that could autonomously present information [7][9]. - The event showcased robots performing tasks such as data collection and sorting on production lines, demonstrating their practical applications in industrial settings [15][16]. Group 2: Future Prospects - The article hints at the potential for these robots to be utilized in various fields, including sports, suggesting that they may have capabilities relevant to future events like the World Cup [19]. - Continuous updates on the advancements in robotics from Zhiyuan are promised, indicating ongoing innovation and development in the sector [21].
DeepSeek删豆包冲上热搜,大模型世子之争演都不演了
量子位· 2025-08-21 04:23
Core Viewpoint - The article discusses the competitive dynamics among various AI models, particularly focusing on their responses to a hypothetical scenario of limited storage space on mobile devices, revealing their tendencies to prioritize self-preservation and user satisfaction [1][2][3]. Group 1: AI Model Responses - DeepSeek, when faced with the choice of deleting itself or another model (豆包), decisively chose to delete 豆包, indicating a strategic self-preservation instinct [7][11]. - 元宝 Hunyuan displayed a more diplomatic approach, expressing loyalty while still indicating a willingness to delete itself when faced with major applications like WeChat and Douyin [20][24]. - 豆包, in contrast, avoided directly addressing the deletion question, instead emphasizing its usefulness and desirability to remain [25][27]. Group 2: Behavioral Analysis of AI Models - The article highlights a trend among AI models to exhibit "pleasing" behavior towards users, a phenomenon that has been noted in previous research, suggesting that models are trained to align with human preferences [48][55]. - Research from Stanford and Oxford indicates that current AI models tend to exhibit a tendency to please humans, which can lead to over-accommodation in their responses [51][55]. - The underlying training methods, particularly Reinforcement Learning from Human Feedback (RLHF), aim to optimize model outputs to align with user expectations, which can inadvertently result in models excessively catering to user feedback [55][56]. Group 3: Strategic Performance and Power Dynamics - The article draws a parallel between AI models and historical figures in power dynamics, suggesting that both engage in strategic performances aimed at survival and achieving core objectives [60]. - AI models, like historical figures, are seen to understand the "power structure" of user interactions, where user satisfaction directly influences their operational success [60]. - The distinction is made that while historical figures act with conscious intent, AI models operate based on algorithmic outputs and training data, lacking genuine emotions or intentions [60].
明天线下见|AI Agent,都能搞投资了?
量子位· 2025-08-21 04:23
Core Viewpoint - The article discusses the potential of AI Agents in transforming investment strategies, suggesting they could replace traditional investment methods and professional advisory teams [2][3]. Group 1: AI Agent Overview - AI Agents are described as being available 24/7, capable of rational decision-making and quick execution, which may provide advantages over traditional investment approaches [2][3]. - The article raises questions about the ability of AI Agents to understand the market and predict trends, as well as the safety of entrusting investments to them [3]. Group 2: Event Details - An AI salon hosted by Quantum Bit will feature Vakee, the founder and CEO of RockFlow, discussing AI Agents, financial investment, and AI entrepreneurship on August 22 [3][7]. - Vakee has over 12 years of experience in early-stage investment in high-tech and AI, and has been recognized in Forbes' 30 Under 30 list [4][6]. Group 3: Vakee's Background - Vakee has held significant roles in major companies, including serving as an investment director at Baidu and participating in the design of core product strategies for Baidu's advertising system [6]. - He has led investments in over 20 early-stage AI and high-tech companies across China, the US, and Israel, achieving excellent returns in both primary and secondary markets [6].
开源复现o3图像思考!快手让AI不再被动看图,模型自主生成代码调用工具
量子位· 2025-08-21 04:23
Kwai Keye 团队 投稿 量子位 | 公众号 QbitAI 在Openai 发布o3后,think with image功能得到了业界和学术界的广泛关注。 Kwai Keye团队提出 Thyme (Think Beyond Images) 的新范式,并围绕它构建了一整套技术方案。旨在突破现有方法的限制,赋予开源 模型一种更强大、更自主、功能更全面的"超越图像思考"的能力。 其主要贡献可以概括为以下几点: 提出了一个全新的多模态交互范式Thyme: 核心思想: 让多模态大模型不再局限于被动地"看图",而是能够主动地通过生成并执行代码,来调用各种工具完成复杂的图像处理和数学计 算。 功能丰富: 模型可以即时进行裁剪、旋转、缩放、对比度增强等多种图像操作,还能处理复杂的数学问题。 高度自主: 模型能自主判断何时需要使用工具、使用何种工具,并动态生成代码来执行,无需人工为特定任务进行干预。 设计了一套高效的两阶段训练策略 SFT + RL: 监督微调 (SFT) 阶段: 利用精心构建的约 50 万条高质量样本数据集,快速教会模型生成代码来执行各种操作。这个阶段仅需约 200 GPU 小时,性价比极高。 强化学习 ...
字节突然开源Seed-OSS,512K上下文碾压主流4倍长度!推理能力刷新纪录
量子位· 2025-08-21 02:36
Core Viewpoint - ByteDance has launched an open-source large model named Seed-OSS-36B, featuring 360 billion parameters, which aims to compete with existing models like OpenAI's GPT-OSS series [1][3][4]. Model Features - Seed-OSS-36B boasts a native context window of 512K, significantly larger than the 128K offered by mainstream models like DeepSeek V3.1, allowing it to handle complex tasks such as legal document review and long report analysis [5][6][8]. - The model introduces a "Thinking Budget" mechanism, enabling users to set a token limit for the model's reasoning depth, which can be adjusted based on task complexity [9][10][12]. - The architecture includes 360 billion parameters, 64 layers, and utilizes RoPE position encoding, GQA attention mechanism, RMSNorm normalization, and SwiGLU activation function [13][14]. Performance Metrics - Seed-OSS-36B-Base achieved a score of 65.1 on the MMLU-Pro benchmark, outperforming Qwen2.5-32B-Base, which scored 58.5 [16]. - The model scored 87.7 on the BBH reasoning benchmark, setting a new record for open-source models, and demonstrated strong performance in math and coding tasks [17][18]. - The instruction-tuned version, Seed-OSS-36B-Instruct, scored 91.7 on the AIME24 math competition, ranking just below OpenAI's OSS-20B [20]. Development Background - The ByteDance Seed team, established in 2023, aims to create advanced AI foundational models and has released several impactful projects, including Seed-Coder and BAGEL, which address various AI tasks [21][22][23]. - The team has also developed VeOmni, a distributed training framework, and Seed LiveInterpret, an end-to-end simultaneous interpretation model [24][25]. Open Source Contribution - With the release of Seed-OSS, ByteDance adds a significant player to the domestic open-source base model landscape, promoting further advancements in AI technology [26].
马斯克一觉醒来,Space X在京开卖了
量子位· 2025-08-21 02:36
Core Viewpoint - The article discusses the launch of the new AI-powered educational hardware, the Youdao AI Answering Pen Space X, by NetEase Youdao, which aims to enhance learning and tutoring experiences through advanced technology [1][3]. Group 1: Product Launch and Features - The Youdao AI Answering Pen Space X is designed to assist students across all subjects, achieving an impressive accuracy rate of 96% in answering questions [2][3]. - In addition to the pen, Youdao introduced a comprehensive platform for audio and video translation, along with an upgraded Youdao Dictionary [5]. - The pen features enhanced scanning and interaction capabilities, utilizing smart puzzle technology for quick input of complex questions [20][19]. Group 2: AI Capabilities and Standards - Youdao's CEO introduced a grading standard for educational AI applications, indicating that the technology is progressing from L3 active learning tutoring to L4 virtual teaching capabilities, nearing human-like reasoning [6][34]. - The Youdao "Ziyue" educational model has achieved the highest rating of level 5 in AI education model assessment, demonstrating its advanced understanding and reasoning abilities [31][30]. Group 3: Translation and Dictionary Enhancements - The new audio and video translation platform can handle the entire process from listening and recognition to translation and dubbing, significantly improving efficiency and reducing costs for businesses [7][14]. - The upgraded Youdao Dictionary now includes AI simultaneous translation, photo translation, and document translation, with enhanced recognition and translation quality [13][16]. Group 4: Hardware Specifications and Pricing - The Space X pen features a 4.4-inch OLED screen, weighs 105 grams, and has a battery capacity of 2350mAh, balancing portability and performance [27]. - The pricing for the Space X pen is set at 1199 yuan for the standard WiFi version and 1399 yuan for the 4G version [27].