Aletheia
Tencent Research Institute AI Express 20260213
Tencent Research Institute · 2026-02-12 16:13
Group 1
- Zhipu released the open-source GLM-5 model, scaled up to 744 billion parameters (40 billion activated). It ranks fourth globally and first among open-source models on the Artificial Analysis leaderboard, with coding and agent capabilities approaching Claude Opus 4.5 [1]
- The model scored 77.8 on SWE-bench-Verified and 56.2 on Terminal Bench 2.0, both new open-source SOTA results, and excels at complex systems engineering and long-horizon agent tasks [1]
- GLM-5 has been adapted to domestic chips such as Huawei Ascend, Cambricon, and Kunlun, and Zhipu also introduced the Z Code full-process programming tool and the AutoGLM general-purpose agent assistant [1]

Group 2
- MiniMax launched the M2.5 model with only 10 billion activated parameters, achieving flagship-level inference speed, three times faster than Opus [2]
- The model built a full-stack learning website in 9 minutes and can independently carry out physics simulations and enterprise-level CMS setups, supporting cross-platform development for PC, App, and React Native [2]
- It uses a native agent RL training framework and the CISPO algorithm, achieving a roughly 40x training speedup, and is compatible with mainstream development tools such as Claude Code and OpenClaw [2]

Group 3
- Xiaohongshu's foundation model team released the open-source FireRed-Image-Edit, which achieves SOTA on multiple authoritative leaderboards such as ImgEdit and GEdit; the code and technical report are now available [3]
- The model uses a three-stage training process to build up its capabilities and introduces a Layout-Aware OCR-based Reward, significantly improving text-editing accuracy and style retention [3]
- It supports a range of complex editing scenarios, including instruction-following consistency, text editing, style transfer, multi-image fusion, and old photo restoration, with model weights set to be open-sourced [3]

Group 4
- Xiaomi released the open-source VLA model Xiaomi-Robotics-0 with 4.7 billion parameters. It excels at vision-language understanding and real-time execution, achieving the best results in comparisons against 30 models on benchmarks including LIBERO, CALVIN, and SimplerEnv [4]
- The model uses a Mixture-of-Transformers architecture, in which the VLM "brain" understands instructions and a Diffusion Transformer generates smooth, high-frequency actions [4]
- It addresses action discontinuity through asynchronous inference and Λ-shape attention masks, enabling real-time inference on consumer-grade graphics cards, and has been open-sourced on GitHub and HuggingFace [4]

Group 5
- Gaode launched the ABot series of embodied foundation models, with ABot-M0 responsible for manipulation and ABot-N0 for navigation, achieving SOTA across 10 authoritative global evaluations [5][6]
- ABot-M0 integrates 6 million cross-platform trajectories through an action language and proposes an action manifold learning algorithm, achieving an 80.5% success rate on Libero-Plus, surpassing pi0 by nearly 30% [6]
- ABot-N0 unifies five core navigation tasks within a single VLA architecture, building 8,000 high-fidelity 3D scenes and 17 million expert examples, with a 40.5% improvement in SocNav success rate [6]

Group 6
- Rokid Glasses launched a "customizable agent" feature on the Lingzhu platform, allowing integration with OpenClaw or privately deployed models such as DeepSeek R1 and Qwen3 through a standard SSE interface [7]
- Users can keep private data processing in a local closed loop and switch the underlying model with one click, drawing on the ClawHub skill ecosystem for capabilities such as file system access, browsing, and IM messaging [7]
- Users can summon their private agents via voice commands or shortcuts, creating a 24/7 intelligent assistant [7]

Group 7
- Google DeepMind released the AI mathematician Aletheia, based on Gemini Deep Think. It scored 91.9% on IMO-ProofBench, a new SOTA, and is capable of independently writing and publishing academic papers [8]
- Aletheia systematically evaluated 700 open problems in the Erdős conjecture database and autonomously solved 4 of them, demonstrating self-correction and acknowledgment of its own limitations [8]
- Gemini Deep Think collaborated with experts to tackle 18 long-stalled research problems, resolving a decade-old submodular optimization conjecture, with one paper accepted by ICLR 2026 [8]

Group 8
- HyperWrite's CEO published an article that garnered 70 million views, stating that the release of GPT-5.3-Codex and Claude Opus 4.6 marks a qualitative change in AI [9]
- AI can now independently complete work that takes a human expert 5 hours, with this capability doubling every 4-7 months, and GPT-5.3 played a key role in its own training, initiating a recursive self-improvement cycle [9]
- Almost all cognitive work done in front of a screen will be affected, and the article advises spending one hour a day experimenting with AI, as the current window will not last long [9]

Group 9
- Anthropic released a 53-page report warning that the risks associated with Claude Opus 4.6 are approaching ASL-4 levels, outlining 8 potential risk pathways that could lead to catastrophic harm, including autonomous escape and autonomous operation [10][11]
- The report concludes that current models do not exhibit "sustained consistent malicious intent" and that the risk of catastrophic damage is "very low but not zero," placing capability assessment in a "gray area" [10]
- The head of Anthropic's safety research team resigned, stating that "the world is in crisis," and an xAI co-founder predicts that recursive self-improvement cycles may begin within 12 months [11]
Google AI Publishes 6 Math Papers in a Row as Gemini Breaks into PhD-Level Research, Setting a New SOTA at 91.9%
36Kr · 2026-02-12 02:50
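Today, Google DeepMind's "AI mathematician" Aletheia went on a tear, cracking mathematical conjectures and writing papers on its own. More striking still, the gold-medal-winning Gemini swept through 18 core research problems in one go.

Gemini has already put its name down for the next Nobel Prize!

Google DeepMind has dropped another bombshell on the global research community, releasing two major papers at once:

Gemini Deep Think has become a "research partner," solving research-level problems across mathematics, physics, and computer science.

On the IMO-ProofBench benchmark, Aletheia pulled far ahead of the field, scoring 91.9% and setting a new SOTA.

Previously, AI winning gold medals at international competitions such as the IMO and ICPC was impressive enough...

This time, Gemini has gone a step further and started doing real research!

Google has built a Gemini-based "AI mathematician," codenamed Aletheia. It has reached several research milestones on PhD-level problems.

These include independently writing and publishing an academic paper in geometry, and completing a systematic evaluation of 700 open problems in the Erdős conjecture database.

| MODEL | ADVANCED PROOFBENCH | | BREAKDOWN | | QUERY DATE |
| --- | --- | --- | --- | --- | --- |
| | | NO ...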