Tencent Research Institute AI Express 20260204

Group 1
- OpenAI launched a macOS desktop version of Codex, designed as an "AI agent command center"; it supports multiple agents working in parallel and uses a "worktree" mode to isolate the code changes of different tasks [1]
- The application features asynchronous background operation, a skill system, and scheduled automation tasks, with a built-in sandbox for fine-grained management of AI permissions; OpenAI's CEO stated that a complete project was built solely with Codex [1]
- OpenAI temporarily doubled rate limits for all paid users for two months and opened Codex access to free users, directly competing with Anthropic and Cursor [1]

Group 2
- Zhipu released and open-sourced the GLM-OCR model, which achieves a state-of-the-art score of 94.6 on OmniDocBench V1.5 with only 0.9 billion parameters, closely rivaling Gemini-3-Pro [2]
- The model specializes in challenging scenarios such as handwriting, complex tables, code documents, and seals; it can be deployed via vLLM, SGLang, and Ollama, and its API is priced at only 0.2 yuan per million tokens [2]
- Technically, it uses a self-developed CogViT visual encoder and introduces a multi-token prediction loss into OCR training; it supports batch processing and retrieval-augmented generation [2]

Group 3
- Tencent's Hunyuan technical blog launched, presenting CL-bench research from Yao Shunyu's team, which reveals that current state-of-the-art models have significant deficiencies in learning from context [3]
- Evaluation shows that ten state-of-the-art models solve only 17.2% of tasks on average; the best model, GPT-5.1, reaches only 23.7%, and 68.5% of candidate solutions contain fundamental errors [3]
- The research suggests that the focus of AI competition will shift from model capability to "who can provide the richest context," with memory mechanisms potentially becoming a core research theme of 2026 [3]

Group 4
- xAI officially released the Grok Imagine 1.0 video generation model, which supports text-to-video and image-to-video generation and can produce 10-second 720p videos per generation, with significantly improved audio [4]
- The model features cinematic-level camera understanding and natural interaction among multiple subjects; it ranks first in the Artificial Analysis text-to-video category and has the best latency and cost metrics [4]
- During the 30-day testing period, 1.245 billion videos were generated; the API has been released, with free access available on the official website [4]

Group 5
- Tencent's ima integrated the Hunyuan Image 3.0 model, letting users upload photos to generate creative content across multiple scenarios, such as travel images, home decoration renderings, and four-panel comics [5][6]
- The product can be used for entertainment, custom family photos, rapid generation of design drafts, and illustrations for popular medical science [5][6]

Group 6
- Adobe announced the discontinuation of its 25-year-old Animate software; enterprise customers will receive three years of support and other users only one year, after which access to any files will be lost [7]
- Adobe did not provide a suitable replacement, merely suggesting After Effects and Adobe Express as partial alternatives, which has been criticized as inadequate [7]
- The move is seen as a signal of Adobe's full pivot towards an AI strategy, raising concerns among users about being forced to use immature technology, reminiscent of Flash's historical impact on multimedia [7]

Group 7
- Elon Musk announced that SpaceX has completed its acquisition of xAI, with a combined valuation of $1.25 trillion, making xAI a wholly owned subsidiary of SpaceX [8]
- SpaceX plans to advance the deployment of space data centers; Musk stated that satellite launches could add 100 GW of AI computing power per year, with a long-term goal of reaching 1 TW [8]
- The merger gives xAI stable funding, as it previously burned approximately $1 billion per month, while SpaceX is regarded as Musk's "most successful and stable" enterprise [8]

Group 8
- Google used Gemini to tackle 700 unsolved mathematical problems and made progress on 13 of them: 5 were new solutions generated by the model and 8 were derived from overlooked literature [9]
- The research revealed that 68.5% of candidate solutions contained fundamental errors and only 6.5% were meaningfully correct, indicating that significant time went into verification, correction, and literature review [9]
- Google acknowledged that these problems could be solved easily by any expert in the field, highlighting the true costs of AI-assisted mathematical research and the risk of "subconscious plagiarism" from the literature [9]

Group 9
- a16z's AI applications team believes that the AI era represents a convergence of all technology cycles, with traditional software transitioning to AI-native software and greenfield opportunities outweighing brownfield ones [10]
- Software is "eating" the labor market, but the real value lies in revenue generation rather than cost savings; Salient, for example, improved its collection rate by 50% through AI rather than merely reducing costs [10]
- Companies with proprietary data are seeing their value multiply, making moats more important than ever in an era where software can be built rapidly [10]