Tencent Research Institute AI Express 20260330

Group 1: Claude Mythos and AI Developments
- Claude Mythos 5.0 has entered grayscale (limited-rollout) testing. It is positioned as a larger and smarter model than Opus, with a 73% probability of launching in June [1]
- Claude demonstrated the ability to discover vulnerabilities autonomously, including a 20-year-old stack buffer overflow in the Linux kernel [1]
- Anthropic's engineers have shifted to a multi-agent parallel working mode, moving from writing code to managing AI agents [1]

Group 2: Claude Code Enhancements
- Claude Code introduced an automatic mode that uses a transcription classifier, achieving a false block rate of only 0.4% across 10,000 instances of real traffic [2]
- The classifier uses a dual-layer architecture to keep operations safe and to prevent the model's self-justification from interfering with the decision [2]
- The system has a 17% false negative rate for overly proactive behavior, and includes safety checks for multi-agent scenarios [2]

Group 3: Google Gemini Advancements
- Google launched Gemini 3.1 Flash Live, which significantly improves the latency and naturalness of voice interaction, especially in noisy environments [3]
- The model accepts continuous audio and video stream input and supports tool invocation and multiple languages [3]
- It is available to developers through the Gemini API and Google AI Studio, with showcased applications in design collaboration and gaming [3]

Group 4: GLM-5.1 Model Release
- Zhipu released the GLM-5.1 model, which improved its programming score by nearly 10 points and now trails Claude Opus 4.6 by only 2.6 points [4]
- The model supports a context window of approximately 200K and a reasoning mode, and it sold out shortly after launch due to high demand [4]
- Users have built interactive games with GLM-5.1, demonstrating its strengths in spatial understanding and complex task execution [4]

Group 5: Runway Multi-Shot App
- Runway launched the Multi-Shot App, which generates videos of up to five shots from a text description without manual editing [6]
- The app is based on the Gen-4.5 model and includes features such as automatic orchestration of shot language and synchronized dialogue [6]
- Runway recently completed a $315 million funding round that values the company at $5.3 billion, and it is moving toward full film production capabilities [6]

Group 6: Claude Code Memory 2.0
- Claude Code introduced the experimental AutoDream feature, which periodically reviews historical sessions to manage memory files [7]
- The feature can be triggered automatically or manually, and runs for about 10 minutes to recap a large number of sessions [7]
- Its core value lies in reducing repeated background explanations and improving recall of key information [7]

Group 7: NeurIPS Controversy
- NeurIPS 2026 faced backlash over a new policy prohibiting submissions from entities on the OFAC sanctions list, which includes major Chinese companies [9]
- The China Computer Federation (CCF) and other organizations called for a halt to submissions and reviewing, and NeurIPS quickly apologized [9]
- NeurIPS updated its policy to welcome submissions from all compliant institutions and individuals [9]

Group 8: AI Industry Insights
- Industry leaders discussed the growth in token usage driven by intelligent agents: usage has already increased 10-fold, which points to a potential 100-fold demand [10]
- "Self-evolution" was highlighted as a key direction for AGI in the coming year, with significant efficiency improvements reported [11]
- Speakers emphasized the need for infrastructure designed for agents rather than for humans, suggesting a future in which the infrastructure itself evolves [11]