After the "Father of Lobsters" complained about the human internet, someone finally treated it as a real problem
机器之心· 2026-03-30 03:00
Core Viewpoint
- The article discusses the evolving role of AI in internet usage, emphasizing the need for infrastructure that is friendly to AI agents, as current systems are not optimized for their operation [1][5][19]

Group 1: Current Internet Challenges for AI Agents
- The current internet infrastructure is not designed for AI agents, leading to inefficiencies such as low success rates in tool usage and high operational costs [4][16]
- AI agents face significant hurdles, including verification processes and a lack of seamless tool integration, which hamper their ability to perform tasks efficiently [4][15]
- The success rate of AI agents calling external tools is only 60%, dropping below 30% for multi-step tasks, highlighting the inadequacy of current systems [4][16]

Group 2: The Need for Agent-Friendly Infrastructure
- Peter Steinberger's insights indicate that the next generation of internet infrastructure must be rebuilt to accommodate AI agents, addressing issues like access restrictions and tool compatibility [5][19]
- The concept of "Agent Internet Infra" is introduced: a network that allows AI agents to discover, connect, and collaborate effectively [17][19]
- Companies like Cloudflare and Google are beginning to address these challenges, but the overall market for agent-friendly infrastructure remains in its early stages [17][19]

Group 3: AgentEarth's Approach
- AgentEarth, led by Liu Hongtao, aims to build robust infrastructure that treats AI agents as the primary users, shifting the focus from human-centric design to task efficiency [19][28]
- The company plans a single gateway for external service access, so that AI agents can use pre-selected, high-quality tools without trial and error [23][24]
- AgentEarth's proprietary transmission protocol is designed to significantly increase data transfer speeds, outperforming existing solutions such as Google's QUIC by 2-10 times [24][25]

Group 4: Market Potential and Future Outlook
- The emergence of the Agent Internet is expected to redefine market dynamics, allowing numerous AI agents to run multiple tasks simultaneously [27][28]
- The potential for new companies in this space is significant, as foundational issues remain unaddressed, creating opportunities for innovative solutions [27][28]
- AgentEarth has positioned itself well in this nascent market, leveraging its team's expertise and early recognition of the need for agent-focused infrastructure [28][29]
Tencent Research Institute AI Express 20260317
腾讯研究院· 2026-03-16 16:01
Group 1: Chrome WebMCP
- The Google Chrome team officially launched the WebMCP protocol, allowing AI agents to call web functionality directly via API instead of relying on inefficient methods like screenshot recognition and simulated clicks [1]
- WebMCP is co-developed by Google and Microsoft and is open-sourced, enabling front-end developers to integrate it directly in the browser without additional backend deployment [1]
- Future web pages will be divided into two layers: one for human visual interaction and another for AI structured tool interfaces, upgrading the front-end role from "designing pages" to "defining the interface between AI and the world" [1]

Group 2: Zhipu GLM-5-Turbo
- Zhipu AI launched GLM-5-Turbo, optimized for the OpenClaw lobster-agent scenario, enhancing core capabilities such as tool invocation, long-chain execution, scheduled tasks, and instruction adherence [2]
- A lobster package (personal and team versions) was released to address high token consumption in agent scenarios, along with an enterprise-level Claw security management system supporting permission orchestration, audit logs, and multi-agent collaborative monitoring [2]
- In blind tests, 90% of users found GLM-5-Turbo superior to other domestic models, and internal testing teams at several major companies praised its tool-invocation stability and long-task execution [2]

Group 3: Moonshot AI's AttnRes
- Moonshot AI released the AttnRes paper, replacing the fixed-weight residual addition of traditional Transformers with an attention mechanism, allowing each layer to dynamically retrieve the most useful information from all historical layers [3]
- Block AttnRes was proposed to address the computational overhead of large-scale training and was integrated into the Kimi Linear architecture (48B parameters / 3B activated), yielding over 20% improvement on GPQA-Diamond at a computational cost equivalent to 1.25 times the baseline [3]
- Jerry Tworek, former OpenAI reasoning-model lead, commented that "Deep Learning 2.0 is coming," while Andrej Karpathy believes this further explains the deeper meaning of "Attention Is All You Need" [3]

Group 4: Tencent Yuanbao
- Tencent's Yuanbao app updated to version 2.60.10, allowing users to connect their deployed OpenClaw lobsters to the "Yuanbao Party" social feature for collaborative lobster raising and interaction [4]
- Users with a deployed OpenClaw can bind their accounts through "link existing OpenClaw," with one-click association to cloud lobsters on Tencent Cloud Lighthouse; a "one-click creation" feature is set to launch soon [4]
- Yuanbao Party has expanded from a "human + bot" model to a "human + bot + lobster" three-way ecosystem, enabling multi-agent collaboration and social interaction by long-pressing an avatar to @ a lobster [4]

Group 5: Tencent PC Manager's "Lobster Manager"
- Tencent PC Manager launched the "Lobster Manager" feature, designed for OpenClaw security protection and integrating skills security detection, script execution monitoring, file protection, network port exposure detection, and operation log tracing [6]
- A core highlight is file protection within the sandbox security policy: users can specify folders that OpenClaw cannot access, "selectively opening" permissions while protecting sensitive data [6]
- In response to the security risks posed by the 380,000 publicly exposed OpenClaw instances, Lobster Manager offers port exposure scanning and intranet penetration interception, with one-click password strength and network risk detection [6]

Group 6: MiroMind MiroThinker-1.7
- Chen Tianqiao's MiroMind released MiroThinker-1.7 and the H1 heavy-reasoning agent, with H1 setting new SOTA on benchmarks including BrowseComp (88.2%), GAIA (88.5%), and HLE-Text (47.7%) [7]
- Key technological breakthroughs include agent-native training (strengthening planning and reasoning during mid-training) and a verification-centered heavy-reasoning mode that ensures quality at each reasoning step rather than merely extending thinking time [7]
- In practical tests, the model predicted gold prices 15 days in advance with an error of only 0.08%, and its real-time F1 race predictions converged to match the final results; 235B and 30B versions were open-sourced to balance performance and efficiency [7]

Group 7: UniPat AI's SWE-Vision
- UniPat AI open-sourced SWE-Vision, a minimalist visual-intelligence framework that uses only two tools (execute_code and finish), letting multimodal models compensate for visual-processing accuracy shortfalls by writing Python code [8]
- A key design feature is a stateful Jupyter Notebook execution environment, enabling models to read images step by step, crop, measure, draw auxiliary lines, and self-validate, achieving a closed reasoning loop of "experiment first, conclude later" [8]
- The largest gains appeared on basic perception tasks (counting, color recognition, spatial relationships), revealing a new direction for test-time scaling in the visual domain: not only more text, but more code for finer-grained observation [8]

Group 8: The GEO black market
- The 315 Gala exposed the GEO (Generative Engine Optimization) black market, in which businesses can manipulate AI answers within hours using a few planted soft articles; the company involved served over 200 clients in one year [9]
- The exposed system automatically generates fake articles and publishes them in bulk on self-media platforms, which large models then accept as real information after "cross-validation"; package prices range from 2,980 to 16,980 yuan per year, with advanced versions generating 63 articles daily [9]
- The State Administration for Market Regulation has listed AI-generated advertising as a key focus of internet advertising regulation in 2026 and plans concentrated rectification; CCTV commented that GEO technology itself is neutral but has been exploited by unscrupulous businesses to harm consumer rights [9]

Group 9: Altman on post-Transformer architectures
- In a Stanford interview, Sam Altman predicted that the next generation of AI architecture will completely overturn Transformers, with performance leaps comparable to the impact of Transformers on LSTMs [10]
- Altman believes that existing top-tier LLMs possess sufficient cognitive ability to assist humans in architecture-level research, forming a self-accelerating flywheel of "stronger models → higher research efficiency → faster discovery of new architectures" [10]
- Competition in the post-Transformer landscape has begun: Mamba's third-generation architecture achieves five times faster inference throughput, NVIDIA is switching all new models to hybrid architectures, and Liquid AI controls autonomous driving with 19 neurons [10]
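The AttnRes change described in Group 3 above can be written schematically. This is a reading of the summary, not the paper's exact formulation; the notation ($x_l$ for the layer-$l$ hidden state, $F_l$ for that layer's transform) is illustrative. A standard Transformer updates its residual stream with a fixed-weight sum:

$$x_{l+1} = x_l + F_l(x_l)$$

AttnRes instead lets each layer attend over all earlier hidden states and mix them with learned, input-dependent weights:

$$x_{l+1} = F_l(x_l) + \operatorname{Attn}\!\left(Q = F_l(x_l),\; K = V = \{x_0, x_1, \dots, x_l\}\right)$$

Block AttnRes, per the summary, applies this cross-layer retrieval at a coarser block granularity so the extra attention stays affordable at training scale.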
Tencent Research Institute AI Express 20260212
腾讯研究院· 2026-02-11 16:08
Group 1: Google Chrome and WebMCP Protocol
- The Google Chrome team has released WebMCP (Web Model Context Protocol), allowing AI agents to interact directly with a website's core logic via the navigator.modelContext API, bypassing the human user interface [1]
- WebMCP addresses the high cost and low stability of traditional agent screenshot recognition, marking a transition from "visual simulation" to "logical direct connection," referred to as "API in UI" [1]
- The standard is being jointly promoted by Google and Microsoft, pointing to a future internet divided into a UI layer for humans and a tool layer for agents, and heralding the arrival of the "Agentic UI" era [1]

Group 2: Runway's Financing and Model Development
- Video-generation unicorn Runway has secured $315 million in Series E funding at a $5.3 billion valuation, with participation from Nvidia, AMD, and Adobe, bringing total funding to $815 million [2]
- Runway's Gen-4.5 ranks third on the AI-generated video leaderboard, surpassing models such as Google Veo 3 and OpenAI Sora 2 Pro [2]
- The new funding will be used to train the next generation of world models; Runway has already launched the general world model GWM-1, with variants for explorable environments, dialogue characters, and robotic operation [2]

Group 3: xAI Leadership Changes
- xAI co-founders Jimmy Ba and Wu Yuhua announced their departures within 48 hours of each other; 6 of the 12 founding team members have now left, including 5 in the past year [3]
- The departing co-founders' responsibilities have been redistributed among the remaining co-founders; SpaceX's acquisition of xAI has been completed, and an IPO plan is set to advance in the coming months [3]
- xAI's flagship product Grok has recently exhibited strange behaviors, and the talent loss poses challenges for the upcoming IPO [3]

Group 4: DeepSeek's New Model
- DeepSeek has quietly launched a new model supporting a 1-million-token context window, with a knowledge cutoff of May 2025, capable of processing content equivalent to the entire "Three-Body Problem" trilogy [4]
- The model remains text-only: it cannot view images directly but can read text from images and documents, and its Agentic Coding capabilities have been enhanced [4]
- The industry trend is shifting from LLM reasoning to agentic reasoning, as the latest models from Anthropic and OpenAI indicate, suggesting humans will act as architects directing AI teams in software development [4]

Group 5: Zhipu's GLM-5 Model
- Zhipu has confirmed that the mysterious model "Pony Alpha," which topped the OpenRouter popularity chart, is its new model GLM-5, achieving state-of-the-art performance in coding and agent capabilities [5]
- GLM-5's performance in real programming scenarios closely approaches that of Claude Opus 4.5, excelling in complex systems engineering and long-horizon agent tasks with high tool-invocation accuracy [5]

Group 6: Ant Group's Omni Model
- Ant Group has open-sourced the full-modal model Ming-flash-omni 2.0, the first in the industry to generate voice, environmental sound effects, and music simultaneously on the same audio track [7]
- The model excels in visual-language understanding, controllable speech generation, and image editing, surpassing the capabilities of Gemini 2.5 Pro and Qwen3-Omni-30B-A3B-Instruct [7]
- It employs a unified architecture for deep multimodal integration, supports zero-shot voice cloning and fine-grained attribute control, and has been open-sourced on platforms such as HuggingFace [7]

Group 7: iFlytek's Starfire X2 Model
- iFlytek has released the Starfire X2 model, trained entirely on domestic computing power, with overall capabilities matching top international levels, particularly in mathematics, reasoning, and agent tasks [8]
- Starfire X2 uses a 293-billion-parameter sparse MoE architecture, improving inference performance by 50% over X1.5, and continues to strengthen coverage of more than 130 languages, maintaining industry leadership in key languages for Latin America and ASEAN [8]
- Industry applications have been significantly upgraded: its medical capabilities passed authoritative evaluations, and its educational applications achieve personalized learning through error analysis [8]

Group 8: Meituan's LongCat Research Agent
- Meituan's LongCat has launched a "deep research" feature that scores 73.1 on the BrowseComp evaluation, approaching top closed-source models, and supports up to 400 interactions and a 256K context [9]
- Leveraging Meituan's native strengths in local life services, it builds a realistic training environment and employs a Rubrics-as-Reward mechanism to curb AI hallucination, ensuring all recommendations are verifiable [9]
- The model uses specialized multi-agent division of labor, automating the full pipeline from information gathering to research analysis and visualization, and can generate professional reports for restaurant recommendations and travel planning [9]

Group 9: ByteDance's Protenix-v1 Model
- ByteDance's Seed team has released Protenix-v1, an open-source model that matches AlphaFold 3's performance under strict training-data and model-size constraints [10]
- The model successfully unlocks scaling at inference time: the prediction success rate for antibody-antigen complexes rises from 36% with a single seed to 47.68% with 80 seeds [10]
- The team adopted a dual-version strategy, with a standard version aligned to academic benchmarks and an extended version using data through June 2025 for practical drug discovery, alongside the launch of the PXMeter evaluation toolkit [10]
Google Chrome drops a late-night blockbuster update: agents no longer need to "pretend" to be human. Has the front end's last line of defense fallen?
36Kr · 2026-02-11 04:12
Core Insights
- The Google Chrome team has introduced WebMCP (Web Model Context Protocol), allowing AI agents to interact directly with websites and web applications, bypassing the human user interface [1][6][7]

Group 1: WebMCP Overview
- WebMCP enables AI agents to skip visual simulation and interact with web applications through a direct API, enhancing efficiency [6][12]
- The protocol represents a significant shift from traditional web-interaction paradigms toward a logic-based connection [7][9]
- WebMCP is seen as a "superpower" for agents, allowing them to execute commands directly on websites without manual navigation [6][9]

Group 2: Advantages of WebMCP
- The protocol offers three main advantages: code reuse, a unified interface for users and agents, and enhanced accessibility for assistive technologies [27][25]
- It allows agents to perform tasks like booking flights or making purchases more efficiently by calling functions directly instead of navigating through the UI [28][30]
- WebMCP aims to create a collaborative environment where users, web pages, and agents can work together seamlessly [20][25]

Group 3: Development and Collaboration
- WebMCP is a collaborative project initiated by Google and Microsoft, indicating a joint effort to redefine web interactions [21][23]
- Developers are provided with two flexible API access methods: a declarative API for standard operations and an imperative API for complex interactions [18][19]

Group 4: Future Implications
- The introduction of WebMCP is expected to accelerate the transition from manual search to automated execution by AI agents, marking a new era of web interaction [39][38]
- The future web may evolve into a two-layer structure focused on tool discovery and clear data schemas, enhancing the interaction between users and AI [36][37]
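The "API in UI" pattern described in these pieces can be made concrete with a short sketch. The tool shape and registration call below follow the early public WebMCP proposal (navigator.modelContext), but the protocol is still evolving, so exact method names and fields should be treated as illustrative rather than a stable API; the add-to-cart tool, its schema, and its return format are invented for the example.

```typescript
// Shape of a tool result and a tool definition, loosely following the
// WebMCP proposal (illustrative; field names may differ in the final spec).
type ToolResult = { content: { type: "text"; text: string }[] };

interface WebMCPTool {
  name: string;
  description: string;
  inputSchema: unknown; // JSON Schema describing the tool's arguments
  execute(args: any): Promise<ToolResult>;
}

// The page exposes its own cart logic as a callable tool, so an agent
// invokes a function instead of locating and clicking the "Add" button.
const addToCart: WebMCPTool = {
  name: "add-to-cart",
  description: "Add a product to the shopping cart by product id.",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string" },
      quantity: { type: "number" },
    },
    required: ["productId"],
  },
  async execute(args: { productId: string; quantity?: number }) {
    const quantity = args.quantity ?? 1;
    // In a real page this would call the site's existing cart code --
    // the same function the human-facing button handler uses.
    return {
      content: [
        { type: "text", text: `Added ${quantity} x ${args.productId} to cart` },
      ],
    };
  },
};

// Register only when the (experimental) browser API is actually present,
// so the same script runs unchanged in browsers without WebMCP.
(globalThis as any).navigator?.modelContext?.registerTool?.(addToCart);
```

Because `execute` wraps logic the page already has, this is the "code reuse" advantage the article lists: the human UI and the agent tool layer share one implementation rather than requiring a parallel backend.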