腾讯研究院AI速递 20260126

Group 1 - OpenAI CEO Altman announced the release of significant Codex-related content starting next week, with a technical blog revealing the core architecture of Codex CLI, specifically the intelligent agent loop [1] - The intelligent agent loop coordinates user instructions, model inference, and local tool execution through the Responses API, employing a "consistent prompt prefix" strategy to trigger cache optimization [1] - Codex supports zero data retention configurations to ensure privacy and utilizes automatic compression technology to manage context windows, with further details on tool invocation and sandbox models to be introduced later [1] Group 2 - Google DeepMind released D4RT, which unifies 3D reconstruction, camera tracking, and dynamic object capture into a single "query" action, achieving speeds 18 to 300 times faster than existing state-of-the-art methods [2] - The core innovation is a unified spatiotemporal query interface, where AI first globally "reads" videos to generate scene representations and then searches for 3D trajectories, depth, and poses of any pixel on demand [2] - This technology is significant for embodied intelligence, autonomous driving, and AR, although training still requires a 1 billion parameter model and 64 TPUs [2] Group 3 - Claude Code upgraded its internal "Todos" to "Tasks," enabling multi-session or sub-agent collaboration on long-term complex projects across multiple context windows [3] - Tasks are stored in a file system for easy collaboration among multiple sessions, with updates in one session broadcasting to all sessions handling the same task list [3] - The new feature is compatible with Opus 4.5, enhancing autonomous operation capabilities, allowing users to enable multiple sessions to collaborate on the same task list through environment variables [3] Group 4 - Baidu's Wenxin 5.0 officially launched with a parameter count of 2.4 trillion, utilizing native multimodal unified modeling technology to support understanding and generation of text, images, audio, and video [4] - It has topped the LMArena text and visual understanding leaderboard five times, entering the global first tier, with language and multimodal understanding capabilities leading internationally [4] - Practical tests show the model excels in complex emotional understanding, subtext analysis, and creative writing tasks, earning the title of "strongest liberal arts student" [4] Group 5 - The open-source project Clawdbot has gained popularity in Silicon Valley, capable of running on Mac mini, serving as both a local AI agent and chat gateway, allowing conversations via WhatsApp, iMessage, etc. [5] - Clawdbot addresses the memory limitations of large models, capable of recalling conversations from two weeks ago, proactively sending emails, reminders, and executing tasks on the computer [5] - The project has received 9.2k stars on GitHub, with a minimum monthly cost of approximately $25, though it requires some technical knowledge for deployment, and users report it can automate business management and code writing, replacing paid services like Zapier [5] Group 6 - Turing Award winner LeCun announced that AMI Labs' core direction is "world models," aiming to build intelligent systems that understand the real world, possess persistent memory, and have reasoning and planning capabilities [6] - This approach argues that merely predicting the next token does not lead to true understanding of reality, necessitating predictions and reasoning at a higher representational level to filter out unpredictable noise [6] - AMI Labs is reportedly seeking financing at a valuation of $3.5 billion, targeting applications in industrial control, robotics, and healthcare, where reliability is crucial [6] Group 7 - Anthropic launched the Claude in Excel plugin, available for Pro, Max, Team, and Enterprise users, based on the Opus 4.5 model, which can be installed and activated via Microsoft Marketplace [7] - The plugin can search the internet and automatically fill in spreadsheets, supporting formula reading, debugging errors, zero-based modeling, and pivot table creation, compatible with .xlsx and .xlsm formats [7] - Currently, it does not support conditional formatting, macros, or VBA, and the company warns of prompt injection risks, advising users to only use files from trusted sources, with high-risk functions triggering confirmation prompts [7] Group 8 - Claude Code's creator Boris Cherny provided a detailed tutorial on using Cowork, emphasizing its role as an "executor" rather than a chat tool, capable of directly manipulating documents, browsers, and various tools [8] - He reiterated that the core workflow involves running multiple tasks in parallel while overseeing Claude instances, starting with "planning mode" for communication until satisfaction is achieved, then switching to "auto-accept edits" mode for execution [8] - Cherny highlighted the importance of Claude.md as a team compounding knowledge base, where any mistakes made by Claude should be documented, and methods for validating Claude's outputs can significantly enhance quality [8] Group 9 - Google Cloud AI Director Addy Osmani warned that programmers who only write prompts will be eliminated by 2026, stating that AI can handle 70% of preliminary work, but the remaining 30% requires experienced engineers [9] - A Stack Overflow survey indicated that developer trust in AI accuracy dropped from 40% to 29%, with 73% of respondents encountering issues with code comprehension due to "ambient coding" [9] - By 2026, the true core competency will be transforming vague problems into clear execution intentions, designing appropriate contextual structures, and distinguishing what is truly important [9] Group 10 - At the Davos Forum, tech giants shared notable insights, with Musk predicting that AI will surpass human intelligence by the end of 2026 and be smarter than the collective intelligence of humanity by 2030, with Tesla set to launch the humanoid robot Optimus next year [10] - Microsoft CEO Nadella warned that if AI only consumes resources without improving outcomes, society will lose tolerance, while Huang Renxun stated that embodied intelligence represents a "once-in-a-generation opportunity" [10] - DeepMind CEO Hassabis believes AGI will still require 5-10 years, while Anthropic CEO Dario claimed that models are just 6-12 months away from being able to complete software development end-to-end [10]