Tencent Research Institute AI Express 20260128

Group 1
- Microsoft has launched its self-developed AI chip Maia 200, built on TSMC's 3nm process with over 140 billion transistors. Its FP4 performance exceeds 10 PetaFLOPS, three times that of Amazon's third-generation Trainium [1]
- Maia 200 is designed specifically for AI inference. It carries 216GB of HBM3e memory with 7TB/s of bandwidth and delivers 30% better performance per dollar than the latest hardware [1]
- Maia 200 will support large models such as OpenAI's GPT-5.2. It is already deployed in a data center in the central United States, and a preview version of the SDK is available [1]

Group 2
- Anthropic has introduced an MCP service for Claude that integrates productivity tools such as Figma, GitHub, and Canva, letting users invoke third-party services directly within a conversation [2]
- The upgrade turns Claude from a passive chatbot into a platform that actively orchestrates external resources, so users can drive cross-application workflows in natural language [2]
- MCP is open source. Anthropic aims to gain an edge in defining the "operating system" of the AI era, focusing on deep integrations to improve the initial user experience [2]

Group 3
- DeepSeek has open-sourced its OCR model DeepSeek-OCR 2, which employs a new decoder that lets the model read in a structured order rather than scanning mechanically, improving its understanding of complex layouts and tables [3]
- The model scored 91.09% on OmniDocBench v1.5, a 3.73% gain over its predecessor, and its reading-order edit distance fell from 0.085 to 0.057 [3]
- The architecture could evolve into a unified multimodal encoder that processes text, speech, and visual content within the same parameter space [3]

Group 4
- The Kimi K2.5 model has been released and open-sourced. Described as one of the most intelligent and versatile models, it supports both visual and text inputs as well as thinking and non-thinking modes [4]
- K2.5 introduces agent cluster capabilities: it can autonomously create up to 100 sub-agents that work through as many as 1,500 steps in parallel, cutting actual runtime by up to 4.5x [4]
- Kimi Code launched alongside it. It runs in the terminal, integrates with mainstream editors, and accepts image and video inputs for programming assistance. The Agent SDK is set to be open-sourced [4]

Group 5
- Alibaba has launched its flagship reasoning model Qwen3-Max-Thinking, which rivals GPT-5.2-Thinking and Claude-Opus-4.5 across 19 benchmarks [5]
- The model features adaptive tool invocation: it calls search engines and code interpreters automatically as needed, so users no longer have to select tools manually [5]
- It uses a test-time scaling strategy based on accumulated experience, which concentrates compute on smarter reasoning rather than on stacking parallel paths, yielding more accurate and more efficient reasoning [5]

Group 6
- Tencent's Sogou Input Method has announced a comprehensive AI upgrade in its 20th major version, which integrates the Hunyuan model. The product has over 100 million AI users and averages nearly 2 billion voice inputs per day [6]
- The AI voice model is 40% more fluent and reaches 98% accuracy. Dialect recognition has improved by 30%, and accuracy holds at 97% even in low-volume scenarios below 20 decibels [6]
- The AI translation model now supports instant translation in over 30 languages. The AI typing model's vocabulary has grown exponentially, with local-life vocabulary exceeding 50 million entries [6]

Group 7
- Hyper3D has released Rodin Gen-2 Edit, a 3D generation platform with natural-language local editing, making it the first commercial product to combine 3D generation and editing in one complete workflow [7]
- Users select a region and enter a text command to adjust it locally. Any existing model can be imported for editing, including models generated by third-party AI, with edits blending seamlessly into the original model [7]
- The release marks a shift in 3D generation from "gacha"-style random draws to an iterative workflow. The platform is now compatible with mainstream tools such as Blender, Maya, and Unity [7]

Group 8
- Ant Group has unveiled its embodied-intelligence research, introducing the high-precision spatial perception model LingBot-Depth. Without any hardware changes, it significantly improves depth output quality in scenes with difficult materials such as transparent and reflective surfaces [8]
- The model uses a masked depth modeling approach that treats depth values naturally missing from sensor output as a learning signal rather than noise. It outperforms top-tier depth cameras in both depth accuracy and pixel coverage [8]
- In practical tests, a dexterous robotic hand successfully grasped transparent glass cups and reflective stainless steel cups. The model is fully open-sourced and ready for deployment [8]

Group 9
- Anthropic CEO Dario Amodei has published a lengthy essay warning that by 2027 humanity may face a "technological coming-of-age," with AI potentially forming a "country of geniuses in a datacenter" with 50 million "citizens" [9]
- The essay analyzes five major crises: AI autonomy risks, misuse for biological weapons, authoritarian control, economic disruption, and existential crises. It warns that AI could upset the balance between "capability" and "motivation" [9]
- Anthropic advocates a "Constitutional AI" approach and reasonable regulation to build defenses. The company is viewed as an outlier in the industry, yet its valuation has grown sixfold over the past year. The essay urges humanity to face this civilizational test with courage [9]