Tencent Research Institute AI Express 20260213
Tencent Research Institute · 2026-02-12 16:13
Group 1
- Zhipu released the open-source GLM-5 model with a parameter scale expanded to 744 billion (40 billion activated), ranking fourth globally on the Artificial Analysis leaderboard and first among open-source models, with coding and agent capabilities approaching Claude Opus 4.5 [1]
- The model scored 77.8 on SWE-bench-Verified and 56.2 on Terminal Bench 2.0, setting new open-source SOTA records and excelling in complex systems engineering and long-range agent tasks [1]
- GLM-5 has been adapted to domestic chips such as Huawei Ascend, Cambricon, and Kunlun, and introduced the Z Code full-process programming tool and the AutoGLM universal agent assistant [1]

Group 2
- MiniMax launched the M2.5 model with only 10 billion activated parameters, achieving flagship-level reasoning at three times the speed of Opus [2]
- The model completed a full-stack learning website in 9 minutes and can independently perform physical simulations and enterprise-level CMS setups, supporting cross-platform development for PC/App/React Native [2]
- It uses a native agent RL training framework and the CISPO algorithm, achieving roughly 40x training acceleration, and is compatible with mainstream development tools such as Claude Code and OpenClaw [2]

Group 3
- Xiaohongshu's foundation model team released the open-source FireRed-Image-Edit, achieving SOTA on multiple authoritative benchmarks such as ImgEdit and GEdit, with code and a technical report now available [3]
- The model employs a three-stage training process and innovatively introduces a Layout-Aware OCR-based Reward, significantly improving text-editing accuracy and style retention [3]
- It supports complex editing scenarios including instruction-following consistency, text editing, style transfer, multi-image fusion, and old-photo restoration, with model weights set to be open-sourced [3]

Group 4
- Xiaomi released the open-source VLA model Xiaomi-Robotics-0 with 4.7 billion parameters, excelling in visual-language understanding and real-time execution, achieving the best results in comparisons across 30 models on LIBERO, CALVIN, and SimplerEnv [4]
- The model uses a Mixture-of-Transformers architecture in which a VLM brain understands instructions and a Diffusion Transformer generates high-frequency, smooth actions [4]
- It addresses action-discontinuity issues through asynchronous reasoning and Λ-shaped attention masks, enabling real-time inference on consumer-grade graphics cards, and has been open-sourced on GitHub and HuggingFace [4]

Group 5
- Gaode launched the ABot series of embodied base models, with ABot-M0 responsible for manipulation and ABot-N0 for navigation, achieving comprehensive SOTA across 10 global authoritative evaluations [5][6]
- ABot-M0 integrates 6 million cross-platform trajectory samples through an action language and proposes an action-manifold learning algorithm, achieving an 80.5% success rate on Libero-Plus and surpassing pi0 by nearly 30% [6]
- ABot-N0 unifies five core navigation tasks within a single VLA architecture, building 8,000 high-fidelity 3D scenes and 17 million expert demonstrations, with a 40.5% improvement in SocNav success rate [6]

Group 6
- Rokid Glasses launched a "customizable agent" feature on the Lingzhu platform, allowing integration with OpenClaw or privately deployed models such as DeepSeek R1 and Qwen3 through a standard SSE interface [7]
- Users can keep private data in a local closed loop and switch model backends with one click, leveraging the ClawHub skill ecosystem to invoke capabilities such as file systems, browsers, and IM messaging [7]
- Users can summon their private agents via voice commands or shortcuts, creating a 24/7 intelligent assistant [7]

Group 7
- Google DeepMind released the AI mathematician Aletheia, based on Gemini Deep Think, which scored 91.9% on IMO-ProofBench, setting a new SOTA, and is capable of independently writing and publishing academic papers [8]
- Aletheia systematically evaluated 700 open problems in the Erdős conjecture database and autonomously solved 4 previously unsolved ones, demonstrating self-correction and acknowledgment of its own limitations [8]
- Gemini Deep Think collaborated with experts on 18 long-stagnant research challenges, resolving a decade-old submodel optimization conjecture, with one paper accepted at ICLR 2026 [8]

Group 8
- HyperWrite's CEO published an article that garnered 70 million views, arguing that the release of GPT-5.3-Codex and Claude Opus 4.6 marks a qualitative change in AI [9]
- AI can now independently complete the workload of a human expert in 5 hours, with this capability doubling every 4-7 months; GPT-5.3 plays a crucial role in its own training process, initiating a recursive self-improvement cycle [9]
- Almost all cognitive work performed in front of screens will be affected; the author advises spending one hour daily experimenting with AI, as the current cognitive window will not last long [9]

Group 9
- Anthropic released a 53-page report warning that the risks of Claude Opus 4.6 are approaching ASL-4 levels, outlining 8 potential risk pathways that could lead to catastrophic harm, including autonomous escape and autonomous operation [10][11]
- The report concludes that current models do not exhibit "sustained, consistent malicious intent" and that the risk of catastrophic damage is "very low but not zero," entering a "gray area" of capability assessment [10]
- The head of Anthropic's safety research team resigned, stating that "the world is in crisis," and an xAI co-founder predicts that recursive self-improvement cycles may be launched within 12 months [11]
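The Λ-shaped attention mask mentioned in Group 4 can be illustrated with a generic sketch. The briefing does not specify Xiaomi's exact masking scheme; the version below is the common "attention sink plus sliding window" pattern used in streaming attention, and the `sink` and `window` parameters are illustrative assumptions, not values from the model:

```python
import numpy as np

def lambda_mask(seq_len: int, sink: int = 4, window: int = 8) -> np.ndarray:
    """Build a Λ-shaped attention mask: each query attends to the first
    `sink` tokens (the vertical leg of the Λ) plus a sliding window of
    the most recent `window` tokens (the diagonal leg), under causality.

    Note: `sink` and `window` are illustrative hyperparameters; the
    actual mask shape used by Xiaomi-Robotics-0 is not public here.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for q in range(seq_len):
        mask[q, : min(sink, q + 1)] = True              # global prefix tokens
        mask[q, max(0, q - window + 1) : q + 1] = True  # recent local window
    return mask
```

The intuition for why such a mask helps real-time action generation: every new chunk of actions attends to the same fixed prefix plus a bounded recent context, so per-step attention cost stays constant instead of growing with the trajectory, which suits asynchronous inference on consumer-grade GPUs.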
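The "standard SSE interface" in Group 6 refers to the Server-Sent Events wire format. Rokid's actual event schema is not described in the briefing, so the parser below only sketches the generic format (`event:` and `data:` fields, blank-line event delimiters) that any such model integration would consume; field handling is simplified relative to the full specification:

```python
def parse_sse(stream_lines):
    """Minimal Server-Sent Events parser: collects `event:` and `data:`
    fields and yields one dict per event, where a blank line marks the
    end of an event, per the SSE wire format."""
    event, data = {}, []
    for line in stream_lines:
        line = line.rstrip("\n")
        if not line:                        # blank line terminates the event
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
        elif line.startswith("event:"):
            event["event"] = line[6:].strip()
```

Because SSE is plain `text/event-stream` over HTTP, a privately deployed model only needs to emit this framing for a client like the Lingzhu platform to stream its tokens, which is why it serves as a vendor-neutral integration point.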