Tencent Research Institute AI Express 20260212
Tencent Research Institute · 2026-02-11 16:08
Group 1: Google Chrome and the WebMCP Protocol
- The Google Chrome team has released WebMCP (Web Model Context Protocol), which lets AI agents call a website's underlying functionality directly through the navigator.modelContext API, bypassing the human-facing user interface [1]
- WebMCP addresses the high cost and low stability of traditional screenshot-based agents, marking a shift from "visual simulation" to "direct logical connection," described as "API in UI" [1]
- The standard is being jointly promoted by Google and Microsoft, pointing to a possible future split of the internet into a UI layer for humans and a tool layer for agents, and heralding the "Agentic UI" era [1]

Group 2: Runway's Financing and Model Development
- Video generation unicorn Runway has raised $315 million in Series E funding at a $5.3 billion valuation, with participation from Nvidia, AMD, and Adobe, bringing its total funding to $815 million [2]
- Runway's Gen-4.5 ranks third on the AI video generation leaderboard, ahead of models such as Google Veo 3 and OpenAI Sora 2 Pro [2]
- The new funding will go toward training the next generation of world models; Runway has already launched its general world model GWM-1, which includes variants for explorable environments, conversational characters, and robotic manipulation [2]

Group 3: xAI Leadership Changes
- xAI co-founders Jimmy Ba and Yuhuai (Tony) Wu announced their departures within 48 hours of each other; 6 of the 12 founding team members have now left, 5 of them in the past year [3]
- The departing co-founders' responsibilities have been redistributed among the remaining co-founders; SpaceX's acquisition of xAI has been completed, and an IPO plan is set to advance in the coming months [3]
- xAI's flagship product Grok has recently exhibited strange behavior, and the talent loss poses challenges for the upcoming IPO [3]

Group 4: DeepSeek's New Model
- DeepSeek has quietly launched a new model with a 1-million-token context window and a knowledge cutoff of May 2025, capable of processing content equivalent to the entire "Three-Body Problem" trilogy [4]
- The model remains text-only: it cannot view images directly, but it can read text from images and documents, and its Agentic Coding capabilities have been enhanced [4]
- The industry trend is shifting from LLM reasoning to Agentic reasoning, as the latest models from Anthropic and OpenAI indicate, suggesting that humans will act as architects directing AI teams in software development [4]

Group 5: Zhipu's GLM-5 Model
- Zhipu has confirmed that the mysterious model "Pony Alpha," which topped the OpenRouter popularity chart, is its new model GLM-5, which achieves state-of-the-art performance in coding and agent capabilities [5]
- GLM-5's performance in real-world programming scenarios comes close to that of Claude Opus 4.5; it excels at complex systems engineering and long-horizon agent tasks, with high tool-invocation accuracy [5]

Group 6: Ant Group's Omni Model
- Ant Group has open-sourced the omni-modal model Ming-flash-omni 2.0, described as the first in the industry to generate speech, environmental sound effects, and music simultaneously on the same audio track [7]
- The model excels at vision-language understanding, controllable speech generation, and image editing, and is reported to surpass Gemini 2.5 Pro and Qwen3-Omni-30B-A3B-Instruct in these areas [7]
- The model uses a unified architecture for deep multi-modal integration, supports zero-shot voice cloning and fine-grained attribute control, and has been open-sourced on platforms such as HuggingFace [7]

Group 7: iFlytek's Spark X2 Model
- iFlytek has released the Spark X2 model, trained entirely on domestic computing power, with overall capabilities on par with leading international models, particularly in mathematics, reasoning, and agent tasks [8]
- Spark X2 uses a sparse MoE architecture with 293 billion parameters, improves inference performance by 50% over X1.5, and continues to strengthen its capabilities across more than 130 languages, maintaining an industry lead in key languages of Latin America and ASEAN [8]
- Industry applications have been significantly upgraded: medical capabilities have passed authoritative evaluations, and educational applications deliver personalized learning through error analysis [8]

Group 8: Meituan's LongCat Research Agent
- Meituan's LongCat has launched a "deep research" feature that scores 73.1 on the BrowseComp evaluation, approaching top closed-source models, and supports up to 400 interactions and a 256K context [9]
- Drawing on Meituan's native capabilities in local services, it builds a realistic training environment and uses a Rubrics-as-Reward mechanism to address AI hallucination, so that all recommendations are verifiable [9]
- The model relies on a specialized division of labor among multiple agents, automating the entire process from information gathering to research analysis and visualization, and can generate professional reports for restaurant recommendations and travel planning [9]

Group 9: ByteDance's Protenix-v1 Model
- ByteDance's Seed team has released Protenix-v1, an open-source model that matches the performance of AlphaFold 3 under strict constraints on training data and model size [10]
- The model unlocks inference-time scaling: the prediction success rate for antibody-antigen complexes rises from 36% with a single seed to 47.68% with 80 seeds [10]
- The team has adopted a dual-version strategy: the standard version aligns with academic benchmarks, while the extended version uses data up to June 2025 for practical drug discovery applications. The team has also launched the PXMeter evaluation toolkit [10]
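
To put the Protenix-v1 seed-scaling figures from Group 9 in perspective, the short calculation below compares the reported gain with a naive ceiling. This is only a sketch: the helper `independent_best_of_n` is a hypothetical name, and it assumes every seed is an independent attempt and that a single success counts, neither of which the source states.

```python
# Reported figures from the summary above (antibody-antigen complexes).
REPORTED_SINGLE_SEED = 0.36
REPORTED_80_SEEDS = 0.4768


def independent_best_of_n(p_single: float, n_seeds: int) -> float:
    """Chance that at least one of n attempts succeeds.

    Assumption (not from the source): attempts are independent and
    each succeeds with the same probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_seeds


# Absolute gain in percentage points and relative gain over one seed.
abs_gain_pp = (REPORTED_80_SEEDS - REPORTED_SINGLE_SEED) * 100
rel_gain = REPORTED_80_SEEDS / REPORTED_SINGLE_SEED - 1

# Naive ceiling if all 80 seeds were independent coin flips.
naive_ceiling = independent_best_of_n(REPORTED_SINGLE_SEED, 80)

print(f"absolute gain: {abs_gain_pp:.2f} percentage points")
print(f"relative gain: {rel_gain:.1%}")
print(f"naive independent ceiling: {naive_ceiling:.6f}")
```

The reported gain is 11.68 percentage points, while the naive ceiling is close to 100%. One plausible reading is that seeds are far from independent across hard targets and that the final prediction still has to be selected from the samples, so extra seeds help but with diminishing returns.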
What is the wildly popular "Agentic reasoning"? How is it used? Where are the future opportunities? One article to understand it all
36Kr · 2026-01-27 10:56
Core Insights
- The article discusses the evolution of Agentic reasoning in AI, emphasizing the transition from passive large language models (LLMs) to interactive autonomous agents capable of real-time planning, action, and learning [1][6].

Group 1: Definition and Levels of Agentic Reasoning
- Agentic reasoning is defined as the core mechanism of intelligent agents, encompassing foundational abilities (planning, tool use, and search), self-evolution (feedback- and memory-driven adaptation), and collective collaboration (multi-agent cooperation) [5][8].
- The three levels of Agentic reasoning are:
  1. **Basic Agentic Reasoning**: Completes complex tasks in stable environments through task decomposition, external tool use, and active search [8].
  2. **Self-evolving Agentic Reasoning**: Adapts to changing environments and uncertainty by integrating feedback and memory-driven mechanisms, allowing dynamic updates without complete retraining [9].
  3. **Collective Multi-agent Reasoning**: Expands agents into collaborative ecosystems in which multiple agents work together through defined roles and communication protocols to achieve common goals [10].

Group 2: Optimization Modes
- There are two complementary optimization modes for building Agentic reasoning systems: in-context reasoning and post-training reasoning.
- **In-context Reasoning**: Relies on inference-time computation without modifying model parameters, allowing agents to respond dynamically to complex problem spaces [11].
- **Post-training Reasoning**: Modifies model weights to internalize successful reasoning patterns, so that the model can draw on them more efficiently when it meets similar problems [11].

Group 3: Applications of Agentic Reasoning
- Agentic reasoning is reshaping problem-solving approaches across various fields:
  1. **Mathematics and Code Generation**: Systems like OpenHands can write, execute, and debug code, turning complex logic into verifiable program outputs [14].
  2. **Scientific Discovery**: Agents autonomously design experiments and analyze vast datasets, improving research scalability and interdisciplinary knowledge integration [15].
  3. **Embodied Agents**: These agents convert natural language instructions into physical actions, which requires spatial and physical reasoning for tasks like navigation and object manipulation [16].
  4. **Healthcare**: In high-risk medical environments, Agentic reasoning assists in diagnosis, drug discovery, and personalized treatment plans by integrating multimodal patient data [17].
  5. **Autonomous Web Exploration**: Agents can browse the internet on their own, extract information, and conduct market research, handling complex tasks that require multiple rounds of search [18].

Group 4: Future Challenges
- The development of truly intelligent, reliable, and safe agent systems faces several challenges:
  1. **Personalization**: Adapting agents to individual user preferences and workflows remains a significant hurdle [20].
  2. **Long-term Interaction**: Maintaining focus and coherence over extended periods while handling interruptions is difficult [21].
  3. **World Modeling**: Agents need to build accurate internal models of their environments to make robust decisions [22].
  4. **Multi-agent Training**: Training many agents to collaborate effectively raises scalability and communication challenges [23].
  5. **Governance Frameworks**: Effective governance that keeps agents' actions aligned with human values and manages risk is crucial for real-world deployment [24].
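
As a concrete reading of the basic level described under Group 1 (task decomposition, external tool use, and active search), the following is a minimal plan-act-observe sketch. It is illustrative only and not taken from the article. The `Agent` class, the `search` and `calculator` tools, and the hard-coded planner are all hypothetical stand-ins; a real system would ask an LLM to produce the plan.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def search(query: str) -> str:
    """Hypothetical search tool backed by a tiny in-memory corpus."""
    corpus = {"speed of light in m/s": "299792458"}
    return corpus.get(query, "no result")


def calculator(expression: str) -> str:
    """Hypothetical calculator tool; accepts only 'a op b' (no eval)."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    return str(ops[op](float(left), float(right)))


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search,
    "calculator": calculator,
}


@dataclass
class Agent:
    # Each memory entry is (tool name, tool input, observation).
    memory: List[Tuple[str, str, str]] = field(default_factory=list)

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Stand-in for an LLM planner: a fixed decomposition of one task.
        # "{prev}" is filled with the previous step's observation.
        return [
            ("search", "speed of light in m/s"),
            ("calculator", "{prev} * 2"),
        ]

    def run(self, task: str) -> str:
        previous = ""
        for tool_name, template in self.plan(task):
            tool_input = template.replace("{prev}", previous)
            observation = TOOLS[tool_name](tool_input)   # act
            self.memory.append((tool_name, tool_input, observation))
            previous = observation                        # observe
        return previous


agent = Agent()
answer = agent.run("How many metres does light travel in 2 seconds?")
print(answer)
```

The `memory` list is the hook where the self-evolving level would plug in: feedback on past steps could be stored there and consulted by the planner on later tasks, without retraining the underlying model.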