Tencent Research Institute AI Express 20260212
Tencent Research Institute · 2026-02-11 16:08
Group 1: Google Chrome and the WebMCP Protocol
- The Google Chrome team has released WebMCP (Web Model Context Protocol), which lets AI agents interact directly with a website's underlying logic via the navigator.modelContext API, bypassing the human-facing user interface [1]
- WebMCP addresses the high cost and low stability of the traditional screenshot-recognition approach to agents, marking a shift from "visual simulation" to "direct logical connection," described as "API in UI" [1]
- The standard is being jointly promoted by Google and Microsoft, pointing to a future internet split into a UI layer for humans and a tool layer for agents, and heralding the arrival of the "Agentic UI" era [1]

Group 2: Runway's Financing and Model Development
- Video-generation unicorn Runway has secured $315 million in Series E funding at a $5.3 billion valuation, with participation from Nvidia, AMD, and Adobe, bringing its total funding to $815 million [2]
- Runway's Gen-4.5 ranks third on the AI-generated-video leaderboard, surpassing models such as Google Veo 3 and OpenAI Sora 2 Pro [2]
- The new funding will be used to train the next generation of world models; Runway has already launched the general world model GWM-1, which includes variants for explorable environments, dialogue characters, and robotic operation [2]

Group 3: xAI Leadership Changes
- xAI co-founders Jimmy Ba and Yuhuai Wu announced their departures within 48 hours of each other; 6 of the 12 founding team members have now left, including 5 in the past year [3]
- The departing co-founders' responsibilities have been redistributed among the remaining co-founders; SpaceX's acquisition of xAI has been completed, and an IPO plan is set to advance in the coming months [3]
- xAI's flagship product Grok has recently exhibited strange behaviors, and the talent loss poses challenges for the upcoming IPO [3]

Group 4: DeepSeek's New Model
- DeepSeek has quietly launched a new model supporting a 1-million-token context window, with a knowledge cutoff of May 2025, capable of processing content equivalent to the entire "Three-Body Problem" trilogy [4]
- The model remains text-only: it cannot view images directly but can read text from images and documents, and its Agentic Coding capabilities have been strengthened [4]
- The industry trend is shifting from LLM reasoning to agentic reasoning, as the latest models from Anthropic and OpenAI also indicate, suggesting humans will act as architects directing AI teams in software development [4]

Group 5: Zhipu's GLM-5 Model
- Zhipu has confirmed that the mysterious model "Pony Alpha," which topped the OpenRouter popularity chart, is its new model GLM-5, which achieves state-of-the-art performance in coding and agent capabilities [5]
- GLM-5's performance in real programming scenarios closely approaches that of Claude Opus 4.5, excelling at complex systems engineering and long-horizon agent tasks with high tool-invocation accuracy [5]

Group 6: Ant Group's Omni Model
- Ant Group has open-sourced the full-modal model Ming-flash-omni 2.0, the first in the industry to generate voice, environmental sound effects, and music simultaneously on the same audio track [7]
- The model excels at visual-language understanding, controllable speech generation, and image editing, surpassing Gemini 2.5 Pro and Qwen3-Omni-30B-A3B-Instruct in these capabilities [7]
- It employs a unified architecture for deep multi-modal integration, supports zero-shot voice cloning and fine-grained attribute control, and has been open-sourced on platforms such as HuggingFace [7]

Group 7: iFlytek's Spark X2 Model
- iFlytek has released the Spark X2 model, trained entirely on domestic computing power, with overall capabilities matching top international levels, particularly in mathematics, reasoning, and agent tasks [8]
- Spark X2 uses a 293-billion-parameter sparse MoE architecture, improving inference performance by 50% over X1.5, and continues to strengthen its coverage of more than 130 languages, maintaining industry leadership in key languages for Latin America and ASEAN [8]
- Industry applications have been significantly upgraded: its medical capabilities have passed authoritative evaluations, and its educational applications achieve personalized learning through error analysis [8]

Group 8: Meituan's LongCat Research Agent
- Meituan's LongCat has launched a "deep research" feature that scores 73.1 on the BrowseComp evaluation, approaching top closed-source models, and supports up to 400 interactions and a 256K context [9]
- Leveraging Meituan's native strengths in local life services, it builds a realistic training environment and uses a Rubrics-as-Reward mechanism to curb AI hallucination, ensuring all recommendations are verifiable [9]
- The agent uses a specialized multi-agent division of labor, automating the entire pipeline from information gathering to research analysis and visualization, and can generate professional reports for restaurant recommendations and travel planning [9]

Group 9: ByteDance's Protenix-v1 Model
- ByteDance's Seed team has released Protenix-v1, an open-source model that matches the performance of AlphaFold 3 under strict constraints on training data and model size [10]
- The model unlocks scaling at inference time: the prediction success rate for antibody-antigen complexes rises from 36% with a single seed to 47.68% with 80 seeds [10]
- The team has adopted a dual-version strategy, with a standard version aligned to academic benchmarks and an extended version using data through June 2025 for practical drug-discovery applications, alongside the launch of the PXMeter evaluation toolkit [10]
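The WebMCP idea in Group 1 — a site registering its own logic as callable tools instead of exposing only a UI — can be sketched as follows. The real proposal hangs off navigator.modelContext in the browser, and its exact method names and tool shape are still evolving, so the in-memory registry below is an illustrative assumption that runs anywhere (kept synchronous for brevity, where the real API would be promise-based).

```typescript
// Illustrative stand-in for a WebMCP-style tool registry. The real
// proposal exposes this via `navigator.modelContext` in the browser;
// the method names and tool shape here are assumptions for the sketch.

interface ToolDescriptor {
  name: string;        // identifier the agent calls the tool by
  description: string; // natural-language hint for the model
  execute: (args: Record<string, unknown>) => unknown; // the site's own logic, no UI
}

class ModelContext {
  private tools = new Map<string, ToolDescriptor>();

  registerTool(tool: ToolDescriptor): void {
    this.tools.set(tool.name, tool);
  }

  // An agent invokes a tool by name instead of clicking through the page.
  callTool(name: string, args: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.execute(args);
  }
}

const modelContext = new ModelContext();

// A shop might expose its catalog search as a tool ("API in UI"):
modelContext.registerTool({
  name: "searchProducts",
  description: "Search the product catalog by keyword",
  execute: (args) => {
    const query = String(args.query ?? "");
    const catalog = ["laptop", "laptop stand", "webcam"]; // toy data
    return catalog.filter((item) => item.includes(query));
  },
});

console.log(modelContext.callTool("searchProducts", { query: "laptop" }));
// → [ 'laptop', 'laptop stand' ]
```

The point of the "logical direct connection" is visible here: the agent never parses pixels or DOM layout, it calls the site's registered function directly.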
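The sparse MoE architecture mentioned in Group 7 routes each token to only a few of many experts, so the compute active per token is a small fraction of the total parameter count. A toy sketch of the top-k routing step (expert count, gate scores, and k are made up for illustration; this is not the actual architecture):

```typescript
// Top-k expert routing, the core of a sparse mixture-of-experts layer:
// a gate scores every expert for the current token, and only the k
// highest-scoring experts are actually run.

function topKExperts(gateScores: number[], k: number): number[] {
  return gateScores
    .map((score, expert) => ({ score, expert })) // remember each expert's index
    .sort((a, b) => b.score - a.score)           // rank experts by gate score
    .slice(0, k)                                  // keep only the top k
    .map((e) => e.expert);
}

// 8 experts, but each token is routed to only 2 of them:
const gateScores = [0.1, 0.7, 0.05, 0.3, 0.9, 0.2, 0.01, 0.4];
console.log(topKExperts(gateScores, 2)); // → [ 4, 1 ]
```

With hundreds of billions of total parameters, this kind of routing is what lets inference cost track the few active experts rather than the whole model.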
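The Rubrics-as-Reward mechanism in Group 8 replaces a single opaque reward signal with explicit, mechanically checkable criteria, which is how it makes recommendations verifiable. A minimal sketch of the scoring side, with hypothetical rubric items (not Meituan's actual criteria):

```typescript
// Rubrics-as-Reward sketch: reward is the fraction of verifiable
// rubric items an answer satisfies, rather than one opaque scalar.

interface RubricItem {
  description: string;
  check: (answer: string) => boolean; // must be mechanically verifiable
}

function rubricReward(answer: string, rubric: RubricItem[]): number {
  const passed = rubric.filter((item) => item.check(answer)).length;
  return passed / rubric.length; // reward in [0, 1]
}

// Hypothetical rubric for a restaurant-recommendation report:
const rubric: RubricItem[] = [
  { description: "names at least one restaurant",
    check: (a) => /restaurant/i.test(a) },
  { description: "cites a source for each claim",
    check: (a) => a.includes("[source]") },
  { description: "states a street address",
    check: (a) => /\d+\s+\w+ (St|Ave|Rd)/.test(a) },
];

const report = "Try the Lotus Restaurant at 12 Garden St [source].";
console.log(rubricReward(report, rubric)); // → 1
```

Because every rubric item is a concrete check, a training loop can reward only answers whose claims can actually be verified, which is the stated anti-hallucination angle.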
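The inference-time scaling result in Group 9 (36% success with one seed rising to 47.68% with 80) follows a best-of-N pattern: sample many predictions under different random seeds and keep the one a confidence score ranks highest. A toy sketch with a stand-in predictor and scorer (not Protenix code):

```typescript
// Best-of-N inference scaling: run the same predictor under many seeds
// and keep the highest-confidence result. The "predictor" here is a toy
// deterministic hash of the seed, standing in for a structure model.

interface Prediction { seed: number; confidence: number }

function predict(seed: number): Prediction {
  const x = Math.sin(seed * 12.9898) * 43758.5453; // cheap seed-to-[0,1) hash
  const confidence = x - Math.floor(x);
  return { seed, confidence };
}

function bestOfN(nSeeds: number): Prediction {
  let best = predict(0);
  for (let seed = 1; seed < nSeeds; seed++) {
    const candidate = predict(seed);
    if (candidate.confidence > best.confidence) best = candidate;
  }
  return best;
}

// The kept confidence is monotone in N: more seeds never hurt.
console.log(bestOfN(1).confidence <= bestOfN(80).confidence); // → true
```

In practice the gain is bounded by how well the confidence score identifies the truly correct sample, which is why 80 seeds lift success substantially but not to certainty.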