Tencent Research Institute AI Express 20251014

Group 1: OpenAI and Chip Partnerships
- OpenAI has announced a strategic partnership with Broadcom to deploy 10 gigawatts of custom AI chips designed by OpenAI, with deployment starting in the second half of 2026 and completing by the end of 2029 [1]
- This is OpenAI's third major deal with a chip giant in a month, following a $100 billion investment from NVIDIA and a 6-gigawatt GPU deployment agreement with AMD [1]
- Sam Altman revealed that the two companies have spent the past 18 months designing the new chip, using OpenAI's own models in the design process; Broadcom's stock price rose more than 10% after the announcement [1]

Group 2: Google Gemini 3.0 Update
- Google is set to release Gemini 3.0 on October 22, showcasing strong front-end development capabilities: it can generate web pages, games, and original music with a single click [2]
- Gemini 3.0 uses a mixture-of-experts (MoE) architecture with more than a trillion parameters, activating 15-20 billion parameters per query, and handles contexts from 1 million to several million tokens, enough to process entire books and codebases [2]
- Internal tests indicate that Gemini 3.0 excelled in front-end tests, including generating 3D pixel art; a year-on-year growth rate of 46.24% is expected by September 2025 [2]

Group 3: LiblibAI 2.0 Upgrade
- LiblibAI 2.0 has integrated more than 10 popular video models and numerous image models, allowing users to complete all AI creative tasks within the platform [3]
- The upgrade includes a one-click video effect feature and seamless switching between image generation and video creation, incorporating models such as Midjourney V7 and Qwen-image [3]
- New asset management and AI toolbox features have been added, providing a comprehensive AI experience for both new and existing users [3]

Group 4: Mamba-3 Development
- The third generation of Mamba, Mamba-3, has entered blind review for ICLR 2026, featuring innovations such as trapezoidal rule discretization, complex state spaces, and a multi-input multi-output (MIMO) design [4][5]
- Mamba-3 introduces complex-valued hidden states to handle periodic patterns and parity checks, and its MIMO design significantly raises arithmetic intensity to make fuller use of GPU hardware [5]
- It has shown excellent performance in long-context information retrieval tests with reduced inference latency, making it suitable for long-text processing, real-time interaction, and edge computing applications [5]

Group 5: SAM 3 Concept Segmentation
- The SAM 3 paper, suspected to come from Meta, has been submitted to ICLR 2026. It introduces Promptable Concept Segmentation (PCS), which lets users segment all matching instances using short noun phrases or image exemplars [6]
- SAM 3 has demonstrated at least a twofold performance improvement on the SA-Co benchmark and achieves an average precision of 47.0 on the LVIS dataset, surpassing the previous record of 38.5 [6]
- It uses a dual encoder-decoder transformer architecture trained on a high-quality dataset containing 4 million unique phrases and 52 million masks, and processes an image containing more than 100 objects in just 30 milliseconds on a single H200 GPU [6]

Group 6: Google's ReasoningBank Framework
- Google has introduced the ReasoningBank memory framework, which distills memory items from both the successes and the failures of agents to form a closed-loop, self-evolving system that learns without ground-truth labels [7]
- The framework incorporates memory-aware test-time scaling (MaTTS), which generates diverse explorations through parallel and sequential setups and so helps synthesize more general memories [7]
- ReasoningBank has shown a 34.2% improvement in effectiveness and a 16.0% reduction in interaction steps in benchmark tests such as WebArena, Mind2Web, and SWE-Bench-Verified [7]

Group 7: AI Performance in Astronomy
- Recent studies indicate that GPT-5 and Gemini 2.5 Pro achieved gold-medal results in the International Olympiad on Astronomy and Astrophysics (IOAA), with GPT-5 scoring an average of 84.2% in theoretical exams [8]
- Both models outperformed the best students in theoretical exams, although their accuracy on geometric/spatial problems (49-78%) was notably lower than on physics/mathematics problems (67-91%) [8]
- This highlights AI's strong reasoning capabilities not only in mathematics but also in astronomy and astrophysics, approaching top human-level performance across multiple scientific domains [8]

Group 8: Unitree G1 Robot Developments
- The Unitree G1 robot has demonstrated advanced movements such as aerial flips and kung fu techniques, showcasing its agility [10]
- Unitree plans to launch a humanoid robot standing 1.8 meters tall in the second half of this year and has applied for nearly 10 patents related to humanoid robots [10]
- The domestic robotics industry saw an average growth rate of 50%-100% in the first half of this year, with algorithm upgrades enabling robots to theoretically perform a wide range of dance and martial arts movements [10]

Group 9: Apple AI Glasses
- Bloomberg reports that Apple's smart glasses may run a full version of visionOS when paired with a Mac and switch to a lightweight mobile interface when connected to an iPhone, with a planned release between 2026 and 2027 [11]
- Apple has shifted focus from developing a lighter "Vision Air" headset to smart glasses, competing directly with Meta's Ray-Ban Display [11]
- The first generation of the product will not feature a display but will include speakers, cameras, voice control, and potential health functions, with a multi-tiered product line planned for the future [11]

Group 10: Sam Altman's Insights on AI and Work
- Sam Altman stated in a recent interview that AI will change the nature of work but will not eliminate real work, suggesting that future work may become easier while humans' intrinsic motivation remains [12]
- For GPT-6, development will focus on smarter models with longer context and better memory; Codex is already capable of completing full-day tasks [12]
- OpenAI currently has 800 million weekly active users. Altman believes that voice will not be the ultimate form of AI interaction, and the team is working on a new voice interaction device that will not be revealed in the short term [12]
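
For readers unfamiliar with the "trapezoidal rule discretization" mentioned in the Mamba-3 item, the sketch below compares it with forward Euler on a scalar linear state equation h'(t) = a·h(t) + b·x(t). This is the textbook trapezoidal rule only, shown as an illustration; it is not Mamba-3's actual formulation, and the function names, the constant input, and the values of `a`, `b`, and `dt` are all assumptions chosen for the example.

```python
import math


def euler_step(h, x_k, a, b, dt):
    """Forward Euler: uses the derivative at the start of the step only."""
    return h + dt * (a * h + b * x_k)


def trapezoidal_step(h, x_k, x_next, a, b, dt):
    """Textbook trapezoidal rule: averages the derivative at both ends of the step.

    Solving h_next = h + dt/2 * (a*h + b*x_k + a*h_next + b*x_next) for h_next.
    """
    numerator = (1.0 + 0.5 * dt * a) * h + 0.5 * dt * b * (x_k + x_next)
    return numerator / (1.0 - 0.5 * dt * a)


def simulate(a=-1.0, b=1.0, dt=0.1, steps=10):
    """Integrate from h(0) = 0 with constant input x = 1 (illustrative values).

    Returns the absolute error of each scheme against the exact solution.
    """
    xs = [1.0] * (steps + 1)
    h_euler = 0.0
    h_trap = 0.0
    for k in range(steps):
        h_euler = euler_step(h_euler, xs[k], a, b, dt)
        h_trap = trapezoidal_step(h_trap, xs[k], xs[k + 1], a, b, dt)
    t = steps * dt
    # Exact solution for constant x = 1 and h(0) = 0.
    exact = (b / a) * (math.exp(a * t) - 1.0)
    return abs(h_euler - exact), abs(h_trap - exact)


if __name__ == "__main__":
    euler_err, trap_err = simulate()
    print(f"Euler error: {euler_err:.6f}")
    print(f"Trapezoidal error: {trap_err:.6f}")
```

With the same step size, the trapezoidal update tracks the continuous system more closely than forward Euler because it is second-order accurate rather than first-order. That is the general motivation for preferring it when discretizing a state space model.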