Tencent Research Institute AI Express 20250926

Group 1: Qualcomm's AI Chip Launch
- Qualcomm has released the Snapdragon 8 Elite Gen 5 mobile chip, featuring a 20% increase in CPU performance, a 23% increase in GPU performance, and a 37% increase in NPU performance [1]
- The Snapdragon X2 Elite series PC processors offer 80 TOPS of NPU compute and run stably at 5GHz on the Arm architecture, with AI performance 5.7 times that of competing Intel products [1]
- The focus is on AI agent technology, enabling cross-device collaborative processing for seamless interaction among smartphones, glasses, watches, and other devices [1]

Group 2: Meta's Code World Model
- Meta has launched the first open-source Code World Model (CWM), applying world models to code generation so the model can predict execution outcomes and improve generation quality [2]
- The 32-billion-parameter model scored 65.8% on SWE-bench Verified, placing it in the top tier of open-source models and close to the closed-source Gemini-2.5-Thinking [2]
- CWM currently serves as a proof-of-concept demo, simulating Python program execution and agent interaction to validate the improvement in code generation quality [2]

Group 3: Google's Neural Operating System
- Google has introduced a prototype "neural operating system" driven by Gemini 2.5 Flash, whose interface is generated by AI in real time without pre-coding and adjusts dynamically to user interactions [3]
- The core technology uses a dual-input mechanism of "UI charter + UI interaction," combined with interaction tracking and streaming generation for near-instantaneous response [3]
- A generated UI map addresses the statelessness problem by providing a session-specific memory cache, opening new research directions for intelligent human-computer interfaces [3]

Group 4: Shengshu Technology's Vidu Q2
- Shengshu Technology has launched the Vidu Q2 video generation model, marking a transition from "video generation" to "performance generation," with accurate rendering of complex facial expressions and action scenes [4][5]
- The new model shows significant improvements in camera language and semantic understanding, supporting complex camera transitions and precise prompt adherence for a "point-and-shoot" creative experience [5]
- It offers flexible durations of 2-8 seconds and a lightning mode that generates 5 seconds of 1080P video in just 20 seconds, balancing creative flexibility with production speed [5]

Group 5: JD's JoyAgent Update
- JD has fully open-sourced its AI technology stack, including the enterprise-level agent JoyAgent 3.0, the multi-agent framework OxyGent, and the medical large model Jingyi Qianxun 2.0 [6]
- JoyAgent 3.0 adds DataAgent data analysis capabilities and reaches 77% accuracy on the GAIA validation set; its GitHub repository has received 10.1k stars [6]
- JD aims to build a technology ecosystem through systematic open-sourcing, lowering the barriers to AI adoption in enterprises and promoting industry standardization and collaborative development [6]

Group 6: Quark's AI Creation Platform
- Quark has launched the "ZaoDian AI" creation platform, integrating Midjourney V7 and Tongyi Wanxiang Wan2.5, with MJ V7 offered at half price and Wan2.5 available on a 7-day free trial [7]
- The platform supports AI-generated images and videos, preserving the original output quality of MJ V7 while lowering usage barriers; Quark Image 1.0 specializes in Asian portraits and Chinese-language content generation [7]
- Wan2.5 has been upgraded to support audio-visual synchronization, 10-second 1080P video output, and audio-driven generation, significantly improving character consistency and practical usefulness for creators [7]

Group 7: Jieyue's AI Desktop Companion
- Jieyue AI has introduced a desktop companion, "Xiao Yue," which sits in the upper right corner of the desktop and supports multi-task execution and local file operations, with a "Miao Ji" feature for reusing operation steps [8]
- Xiao Yue can plan tasks autonomously, handling complex jobs such as interview preparation, e-commerce tracking, and invoice organization, with support for scheduled tasks and system reminders [8]
- The Mac version is currently in invitation-only testing, and the Windows version is under development; users can download the app and apply for an invitation to try it [8]

Group 8: Zhiyuan's RoboBrain-Audio
- Zhiyuan Research Institute has released RoboBrain-Audio, the first large model supporting native full-duplex voice dialogue, enabling "listen while speaking" interaction with response latency reduced to 80ms [10]
- It uses a "natural monologue alignment" mechanism instead of word-level alignment, combined with dual training paradigms (post-training + supervised fine-tuning), to reach industry-leading levels with only 1 million hours of data [10]
- The model performs strongly on ASR, TTS, and full-duplex dialogue tasks, and will be integrated with the RoboBrain series to advance voice interaction for embodied intelligence [10]

Group 9: Skild AI's Skild Brain
- Skild AI, valued at $4.5 billion, has launched the Skild Brain robot control system, trained in a virtual environment on 100,000 types of robot bodies and able to adapt to various faults and to robots it has never seen [11]
- The system adapts to sudden failures such as limb loss and motor faults, quickly adjusting its control strategy through in-context learning, with a memory window 100 times longer than that of traditional systems [11]
- Founded by two CMU professors, the company has raised $414 million, with investors including SoftBank, NVIDIA, and Sequoia Capital [11]

Group 10: Terence Tao's Insights on Community
- Terence Tao presents a four-layer analytical framework for modern society, arguing that current technologies and incentive mechanisms empower individuals and large organizations while severely eroding the ecological niche of small organizations [12]
- Small organizations can provide genuine social and emotional connection and give individuals real influence, while large organizations, despite their economic advantages, leave individuals feeling alienated and powerless [12]
- He suggests recognizing the value of emerging grassroots organizations, which can offer individuals a sense of belonging and serve as meaningful channels connecting them with larger systems [12]