Tencent Research Institute AI Express 20260113

Group 1
- Google, together with more than 20 retail giants including Shopify and Walmart, has launched and open-sourced the Universal Commerce Protocol (UCP), a unified open standard for AI shopping agents that covers the entire process from product discovery to after-sales service [1]
- UCP is already live in AI Mode in Google Search and in the Gemini app. Its "agent checkout" feature supports Google Pay, with PayPal integration coming soon, and allows retailers to keep their own identity as the seller in each transaction [1]
- By fully open-sourcing UCP, Google aims to lower the barrier to joining the ecosystem so that small and medium-sized businesses can also benefit from AI shopping [1]

Group 2
- Midjourney has updated its Niji model to version 7. The release focuses on anime-specific features, corrects the previous version's drift toward realism, and improves detail in facial expressions, dynamic poses, and material textures [2]
- The new sref style reference feature lets users upload three reference images to keep a consistent art style, and the model is significantly better at understanding and accurately following complex prompts [2]
- Testing shows that version 7 surpasses version 6 in light and shadow detail, stability in complex poses, and the quality of pure anime line art, making it particularly well suited to storyboard generation and series creation [2]

Group 3
- UniPat AI, in collaboration with Sequoia China and xbench, has released the BabyVision benchmark, which breaks visual ability down into four categories and 22 sub-tasks [3]
- The results show that Gemini-3-Pro-Preview is the only model to exceed the baseline of a 3-year-old child, yet it still trails a 6-year-old child by 20 percentage points, and many models struggle on simple tasks [3]
- The research highlights a major weakness of vision-language models (VLMs): visual information cannot be fully put into words, so detail is lost when it is compressed into tokens, which makes tasks such as tracing lines or stacking blocks difficult for the models [3]

Group 4
- Kunlun Wanwei has launched Skywork Video v1.0 on the Tiangong Super Intelligent Agent platform. It organizes the creative process around "projects", with all materials automatically collected and added to a multi-track editor [4]
- The platform offers five ways to start a project: text generation, image animation, frame completion, multi-image style reference generation, and digital human video generation. The built-in multi-track editor supports fine-grained operations such as splitting and replacing [4]
- The Skywork product matrix now spans modalities from documents, spreadsheets, and presentations to video generation, forming a smart office platform that supports multiple scenarios and modalities [4]

Group 5
- Zhujidi Dynamics has released COSA, billed as the world's first embodied Agentic OS. Its three-layer architecture integrates foundation models, a high-level skill layer, and a cognitive decision-making layer [6]
- COSA gives robots three core capabilities: understanding vague instructions, semantic memory that persists across time, and seamless task execution [6]
- Unlike Figure AI's Helix, an end-to-end VLA model, COSA is built from the ground up as an operating system for the physical world and shows significant advantages in integrating locomotion and manipulation [6]

Group 6
- Qianxun Intelligent has open-sourced its VLA base model Spirit v1.5, which ranks first on the RoboChallenge Table30 leaderboard, ahead of Pi0.5, and has drawn praise from NVIDIA's Jim Fan [7]
- The core breakthrough of Spirit v1.5 is its "open, goal-driven" data collection strategy, which moves away from "clean data" so that the model internalizes physical common sense. As a result, fine-tuning converges 40% faster [7]
- The unstructured collection method has increased the average effective collection time per person by 200% and reduced reliance on algorithm experts by 60%. The open-source weights and inference code are available for the community to explore [7]

Group 7
- Anthropic co-founder Jack Clark pointed to conflicting data: an internal survey found that 60% of Claude users report a 50% increase in productivity, while METR research shows that developers familiar with their codebases merge PRs 20% more slowly when using AI tools [8]
- Clark described a "barrel principle" in code production, in which the slowest stage sets the limit: writing speed may increase tenfold, but review speed only doubles, so overall efficiency does not explode. As of January 2026, there is still no truly self-improving AI [8]
- He said it would be shocking if the Scaling Law hit a wall, since today's massive infrastructure investments suggest that most players are betting on the opposite outcome, and that breakthroughs in distributed pre-training could alter the political and economic structure of AI [8]

Group 8
- Linus Torvalds, the creator of Linux, has published his first vibe coding project, AudioNoise, on GitHub. He used Google's Antigravity to generate a Python visualization tool and admitted that the result is better than what he would have written himself [9]
- The project grew out of a guitar effects pedal design and mainly explores the basics of digital audio processing, including IIR filters and delay loops that process one sample at a time with zero latency [9]
- Just five days earlier, Torvalds had criticized AI-generated code as "ridiculously stupid", so his use of AI tools became a talking point in the tech community as a "true fragrance moment" (Chinese internet slang for embracing something one had just dismissed) [9]

Group 9
- Elon Musk predicts that AGI will be achieved by 2026 and that by 2030 AI will surpass the combined intelligence of all humanity, with AI performance improving tenfold each year. xAI's Colossus 2 data center in Memphis is set to reach 1 gigawatt of power by mid-January [10]
- He named three key words for AI safety: truth, curiosity, and beauty. He forecast that within three years robots' surgical skill will exceed that of top surgeons, and that within five years robots will go from scarce to abundant, reaching 10 billion units by 2040 [10]
- Musk stressed that, when it comes to energy, "the sun is everything", praised China's solar capacity of 1,500 gigawatts per year, and predicted that the essence of currency will become watts. He expects white-collar jobs to be the first replaced by AI, ultimately leading to universal prosperity [10]
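
As a rough check on the "barrel principle" arithmetic attributed to Jack Clark in Group 7, the sketch below estimates the overall speedup when writing gets 10x faster but review only 2x faster. It assumes a simple two-stage, back-to-back pipeline (an Amdahl-style model that the source does not spell out), and the function name `pipeline_speedup` and the writing-time shares are hypothetical values chosen for illustration.

```python
def pipeline_speedup(write_share, write_gain=10.0, review_gain=2.0):
    """Overall speedup of a sequential write-then-review pipeline.

    write_share is the (hypothetical) fraction of the original time spent
    writing code; the remainder is assumed to be spent on review.
    The 10x and 2x gains are the figures quoted in the digest.
    """
    if not 0.0 <= write_share <= 1.0:
        raise ValueError("write_share must be between 0 and 1")
    review_share = 1.0 - write_share
    # Time after the speedups, relative to an original total time of 1.0.
    new_time = write_share / write_gain + review_share / review_gain
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed splits between writing and review time, for illustration only.
    for share in (0.25, 0.5, 0.75):
        print(f"writing share {share:.0%}: overall speedup "
              f"{pipeline_speedup(share):.2f}x")
```

Under these assumptions the overall gain stays well below tenfold: if writing and review each took half of the original time, the pipeline ends up only about 3.3 times faster, because the slower review stage dominates the remaining time.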