Tencent Research Institute AI Express 20260311
Tencent Research Institute · 2026-03-10 16:01
Group 1
- Anthropic has introduced a multi-agent code review system for Claude Code; after deployment, the share of PRs receiving substantive review feedback rose from 16% to 54% [1]
- Among large PRs exceeding 1,000 lines, 84% receive review comments, with an average of 7.5 issues found; fewer than 1% of review findings are marked incorrect [1]
- The review system is billed by token usage, costing $15 to $25 per review, and team and enterprise users can customize the review rules [1]

Group 2
- AMI Labs, founded by Turing Award winner Yann LeCun, has completed a $1.03 billion seed round at a valuation of $3.5 billion; former FAIR engineering director Alex LeBrun serves as CEO [2]
- The company aims to build a world model based on the JEPA architecture, targeting high-reliability scenarios in industrial control, robotics, wearables, and healthcare [2]
- Alexey Sutskever, the proposer of the DiT architecture, has joined as Chief Scientist; the first practical application is expected to require at least a year of research [2]

Group 3
- Microsoft has launched Copilot Cowork, which integrates fully with Excel, Word, PowerPoint, and Outlook and uses Anthropic's Claude model for reasoning [3]
- Key functions include automatically organizing a weekly schedule, preparing an entire client meeting agenda from a single command, and executing a complete plan from competitive analysis through product launch [3]
- Pricing is an additional $30/month on top of the M365 enterprise version, with a new E7 package available for $99/month; the product is currently in a limited customer research preview [3]

Group 4
- Tencent's Hunyuan 3D team has open-sourced WorldCompass, the first reinforcement learning post-training framework for world models, which addresses cases where pre-trained world models fail to follow instructions [4]
- The framework features three core innovations: slice-level sampling to reduce computational complexity, interaction-following scoring based on a 3D base model, and efficient RL optimization algorithms [4]
- Interaction accuracy in composite action scenarios has improved from 20% to 55%, with better scores on the Stanford WorldScore benchmark [4]

Group 5
- Zhipu has launched AutoClaw, a one-click local installer for macOS and Windows that provides full OpenClaw capabilities and automatic integration with instant messaging tools [6]
- The tool includes the Pony-Alpha-2 model, optimized for OpenClaw scenarios, which improves task execution, and it integrates AutoGLM Browser-Use capabilities [6]
- It ships with over 50 mainstream skills and APIs covering content creation, office tasks, coding, marketing, and finance, and supports various model APIs [6]

Group 6
- Reports indicate that the U.S. military used Palantir's Maven system, embedded with the Claude model, during the U.S.-Iran conflict, analyzing over 150 information streams on the first day [7]
- The Maven system integrates data from satellite images, drone footage, and intercepted communications, allowing Claude to generate target suggestions and precise coordinates in real time [7]
- The military has reportedly struck over 3,000 targets; a Georgetown University study shows that a workload previously requiring 2,000 personnel can now be handled by just 20 [7]

Group 7
- Figure has released an update on its robot, which autonomously tidies a living room using the Helix 02 system, performing tasks such as disinfecting surfaces and organizing items [8]
- The Helix 02 system has a three-layer architecture for semantic reasoning, perception conversion, and control, trained on extensive human motion data [8]
- The team did not develop new algorithms or customize scenarios; the system learns new tasks simply from supplementary data [8]

Group 8
- The AI system OALL has launched O-DataMap, which maps experimental data from papers worldwide into a navigable two-dimensional coordinate system [9]
- The map lets users assess how active and mature a research field is, trace the knowledge lineage of an individual study, and evaluate research gaps based on an input idea [9]
- The map grows in real time as the AI pipeline continuously analyzes new papers, and it provides insight into researchers' influence across fields [9]

Group 9
- The latest a16z global AI product Top 100 report shows ChatGPT leading with 900 million weekly users, while Claude's paid subscriptions have increased by over 200% [10]
- ChatGPT is expanding into over 85 categories, including travel and shopping, while Claude focuses on professional users with integrated financial terminals and developer infrastructure [10]
- OpenClaw has become the highest-starred project on GitHub, surpassing React and Linux, indicating a shift in the competitive landscape of AI products [10]

Group 10
- A discussion between Fields Medalist Terence Tao and OpenAI's Mark Chen highlighted that AI is turning mathematics into a more industrialized field, with significant reductions in error rates [11]
- Tao noted that AI has become a daily research tool to which he outsources complex calculations, and that it has already solved several long-standing mathematical problems with minimal human oversight [11]
- Chen emphasized that formal verification systems in mathematics serve as natural judges for reinforcement learning, enabling a mechanism for "infinite cheap trial and error" [11]