Tencent Research Institute AI Express 20250815

Group 1: US AI Chip Tracking Measures
- US authorities have secretly installed tracking devices in shipments of advanced AI chips considered at high risk of illegal transfer to China, primarily targeting Nvidia and AMD chips in servers from companies such as Dell and Supermicro [1]
- Some trackers are roughly the size of a smartphone and are attached to shipping boxes; smaller, hidden devices are placed inside the packaging or even inside the servers themselves [1]
- The US Department of Commerce's Bureau of Industry and Security, Homeland Security Investigations, and the FBI are involved, and there are proposals to require US chip companies to build location verification technology into their chips [1]

Group 2: Claude Code New Features
- Claude Code has added a new option called "Opus Planning Mode" to its model selector, which uses the Claude Opus 4.1 model during the planning phase and the Claude Sonnet 4 model for other tasks [2]
- The feature combines the strengths of both models: Opus 4.1's stronger reasoning for complex problem analysis and high-quality development planning, and Sonnet 4's efficiency in generating the actual code [2]
- Users can enable it through the model selector, or press Shift+Tab to switch between working modes; it is available to all users with access to the Opus model after updating to the latest version [2]

Group 3: Kunlun Wanwei's Skywork Deep Research Agent v2
- Kunlun Wanwei (Kunlun Tech) has officially released Skywork Deep Research Agent v2, which adds multimodal deep research capabilities, integrating multimodal retrieval, understanding, and generation to overcome the limitations of traditional text-only retrieval [3]
- The new multimodal deep browsing agent can run intelligent searches, analyze multimodal information, and draw insights from community content, and performs well in content analysis on platforms such as Xiaohongshu [3]
- On the BrowseComp search benchmark, the standard mode reached an accuracy of 27.8%, rising to 38.7% with the self-developed "parallel thinking" mode enabled, which the company reports as a new state-of-the-art result [3]

Group 4: Tencent's Hunyuan-GameCraft
- Tencent Hunyuan has launched the open-source tool Hunyuan-GameCraft, which generates high-definition dynamic game videos from an input image, a text description, and action instructions [4]
- The tool has three main advantages: a unified continuous action space for smooth and flexible movement, memory enhancement for maintaining scene consistency, and significantly lower cost, since no manual modeling is needed [4]
- It supports both first-person and third-person perspectives and can generate diverse scenes (e.g., villages, castles, roads), making it suitable for game development prototyping, video creation, and 3D design presentations [4]

Group 5: Microsoft's AI Agent Design Patterns
- Microsoft has published five core agent design patterns: the tool use pattern, reflection pattern, planning pattern, multi-agent pattern, and ReAct pattern, aimed at helping users quickly build capable automated AI employees [5][6]
- The tool use pattern lets agents interact directly with enterprise systems, the reflection pattern lets agents identify errors and self-correct, and the planning pattern breaks high-level goals down into actionable tasks [6]
- The multi-agent pattern builds a network of specialized agents, and the ReAct pattern lets agents solve problems dynamically in real-time environments; Microsoft's Azure AI Foundry supports these patterns with over 1,400 connectors [6]

Group 6: OpenCUA Framework by HKU and Moonshot AI
- The XLANG Lab at the University of Hong Kong and Moonshot AI have jointly released the open-source OpenCUA framework, designed to help users efficiently develop agents that operate computers autonomously [7]
- The framework includes an annotation infrastructure for capturing demonstrations of human computer use, the AgentNet dataset covering three major operating systems and over 200 applications, and workflows featuring reflective long-chain reasoning [7]
- The flagship model OpenCUA-32B achieved an average success rate of 34.8% on the computer-use agent (CUA) benchmark OSWorld-Verified, surpassing other open-source models and exceeding OpenAI's CUA (GPT-4o), paving the way for computer-use agents to be applied at scale [7]

Group 7: Apple's AI Home Products
- Apple is developing three AI smart home products: a desktop robot (code-named J595, resembling the Pixar lamp), a HomePod with a screen (code-named J490), and a smart security camera (code-named J450) [8]
- The desktop robot has a 7-inch screen and a 15 cm motorized arm and can automatically turn to follow a person's movement; it is expected to launch in 2027. The HomePod with a screen will serve as a smart home hub and is expected to launch in mid-2026 [8]
- Apple is developing a new AI Siri (code-named Linwood) for these products that can actively take part in multi-person conversations, and is giving it a new visual identity (code-named "Bubbles"), running on a new operating system named "Charismatic" [8]

Group 8: Zhiyuan's Genie Envisioner
- Zhiyuan Robotics (AgiBot) has launched Genie Envisioner (GE), a unified world model platform for real-world robot control that integrates future frame prediction, policy learning, and simulation-based evaluation into a closed-loop architecture centered on video generation [9]
- The platform consists of three core components: GE-Base (a multi-view video world foundation model), GE-Act (a parallel flow-matching action model), and GE-Sim (a hierarchical action-conditioned simulator), trained on 3,000 hours of real-robot data [9]
- GE-Act shows strong cross-platform generalization, needing only one hour (approximately 250 demonstrations) of teleoperation data to transfer to a new platform, and significantly outperforms existing SOTA methods on long-sequence tasks (e.g., folding boxes) [9]

Group 9: Baichuan Intelligence's Strategic Shift
- Baichuan Intelligence has undergone a major restructuring, cutting its team from 450 to fewer than 200 and compressing its average management layers from 3.6 to 2.4, and refocusing on its original mission of "creating doctors for humanity and building models for life" [10]
- Baichuan has released the Baichuan-M2 medical large model, which outperforms OpenAI's newly open-sourced model and is second only to GPT-5; it scored 34 in the HealthBench evaluation, above OpenAI's claimed score of 32 [10]
- The founder believes AI family doctors will arrive sooner than autonomous driving, because healthcare is a necessity and AI doctors can collaborate efficiently with human doctors; Baichuan plans to launch consumer-facing services in 2026 [11]