Tencent Research Institute AI Express 20251009

Group 1: OpenAI Developments
- OpenAI released the AgentKit toolkit, which includes a visual Agent Builder, a Connector Registry, and ChatKit. It provides drag-and-drop workflow orchestration and safety features, and poses a competitive threat to startups [1]
- Codex reached general availability with a new Slack integration and an SDK. Daily active usage grew more than tenfold in three months, and GPT-5-Codex has processed over 40 trillion tokens [1]
- New model interfaces were released, including the Sora 2 API, gpt-realtime-mini, and gpt-image-1-mini, and ChatGPT opened an Apps SDK for third-party application integration [1]

Group 2: Gemini 3.0 Pro Insights
- Internal testing of Gemini 3.0 Pro shows strong front-end and web programming capabilities, with accurate execution of complex tasks such as physics engine simulations and SVG graphic generation [2]
- In benchmark tests, it achieved over 20% accuracy on ARC-AGI-2 in thinking mode and surpassed GPT-5 and Grok 4 with a score of 32.4% on Humanity's Last Exam [2]
- Google is expected to release the Gemini 3.0 series (including Pro and Flash versions) next week, competing directly with recently released models from OpenAI and Anthropic [2]

Group 3: Thinking Machines Lab Product Launch
- Thinking Machines Lab launched its first product, Tinker, which simplifies fine-tuning of large models and lets researchers retain 90% control without dealing with complex infrastructure [3]
- Tinker uses LoRA so that multiple training tasks can share GPU resources. It supports Qwen3 and Llama 3 models, and switching models requires changing only a single string parameter [3]
- The founder, Murati, aims to recreate the way early OpenAI operated, focusing on open research sharing and granting researchers more freedom, in contrast with OpenAI's move toward social products [3]

Group 4: Claude Sonnet 4.5 Features
- Claude Sonnet 4.5 was released at an unchanged price, achieving industry-leading results on the SWE-bench Verified coding benchmark and sustaining focus on complex tasks for over 30 hours [4]
- The Claude Agent SDK was introduced. It exposes the infrastructure underlying Claude Code and offers memory management, a permission system, and sub-agent coordination for a wide range of tasks [4]
- An experimental feature, "Imagine with Claude," generates software in real time without pre-written code and is set to be available to Max subscribers for five days [4]

Group 5: GLM-4.6 Model Release
- Zhipu released its flagship GLM-4.6 model, improving coding capability by 27% over GLM-4.5 and putting it on par with Claude Sonnet 4 as the strongest coding model in China. The context window was expanded from 128K to 200K [5]
- In tests on 74 real programming tasks, GLM-4.6 outperformed Claude Sonnet 4 while consuming over 30% fewer tokens than GLM-4.5. All test questions and trajectories are publicly available for verification [5]
- GLM-4.6 achieved FP8+Int4 mixed-precision deployment on domestic chips from Cambricon and Moore Threads. Zhipu also launched a Coding Plan subscription starting at 20 yuan per month that supports over 10 mainstream programming tools [5]

Group 6: Sora's Market Performance
- Sora topped the US App Store charts within three days of launch with 164,000 downloads, surpassing Google Gemini and ChatGPT. The new "Cameo" feature maintains character consistency and audio-visual synchronization, and the Pro version generates high-quality 15-second videos [6]
- Testing indicated that Sora 2 scored 55% on the GPQA science benchmark, compared with GPT-4o's 72%, which suggests that a language model is integrated for prompt rewriting and content understanding [6]
- Altman announced plans for an "interactive fan creation" mode and revenue-sharing mechanisms. Experts warned that Sora's realistic video generation could be misused for forgery and fraud, making authenticity difficult to discern [6]

Group 7: Tencent's Hunyuan Image 3.0
- Tencent's Hunyuan Image 3.0 topped the LMArena text-to-image leaderboard, surpassing Google's Nano Banana and ByteDance's Seedream 4. It is now the strongest open-source image generation model globally and is completely free [7]
- The model uses an 80B-parameter MoE architecture with a native multimodal design. It supports world-knowledge reasoning, long-text understanding at the 1,000-token level, and precise text rendering in Chinese and English, achieving commercial-grade aesthetics [7]
- Tencent plans to open-source Hunyuan series models intensively during 2025, maintain its lead in 3D and video generation, and build a comprehensive AI system covering text, image, video, and 3D applications [7]

Group 8: Google Nano Banana Updates
- Google officially opened the Nano Banana API, pricing image generation at approximately 0.28 yuan per image, so developers can embed it in their products for large-scale content production [8]
- New features include aspect ratio selection, with over ten ratios such as 16:9, 9:16, 4:3, and 3:2, as well as an image-only output mode, making it suitable for e-commerce displays and design tools [8]
- Users can create applications manually in Google AI Studio or integrate via the Gemini API. Image generation is priced at 12 times the text mode rate, and the maximum image size is 1024x1024 pixels [8]

Group 9: Insights from Former Google CEO
- Former Google CEO Schmidt believes that while the US will win the AGI race, China will dominate the humanoid robot market, much as it does the electric vehicle market, citing examples such as the $6,000 robot from Unitree [9]
- US AI leadership faces an energy bottleneck: the country needs to add 92 gigawatts of power generation capacity by 2030, and failure to address this could prevent it from fully exploiting its technological advantages [9]
- The barrier to starting a company has dropped to zero, but competition is fierce. Success hinges on moving quickly and building systems around "learning" that create self-reinforcing learning loops and network lock-in effects, which is how platform-level companies are established [9]