Tencent Research Institute AI Express 20251218

Group 1: OpenAI Developments
- OpenAI launched ChatGPT Images, a new image generation model that generates images 4 times faster and supports precise edits while preserving detail [1]
- The model supports a range of edit types, such as adding, removing, and combining elements, and renders text better, including dense and small text [1]
- The new Images feature is available to all ChatGPT users, and the API is priced 20% lower than the previous version [1]

Group 2: Meta Innovations
- Meta has open-sourced SAM Audio, an audio segmentation model that can separate any sound from a complex audio mix using text, visual, and time-span prompts [2]
- Its core engine, PE-AV, is built on Perception Encoder, was trained on over 100 million videos, and processes audio faster than real time [2]
- Meta also released SAM Audio-Bench and SAM Audio Judge for benchmarking and evaluation; SAM Audio achieves state-of-the-art performance across a range of audio separation tasks [2]

Group 3: Xiaomi's AI Model
- Xiaomi released and open-sourced MiMo-V2-Flash, a model with 309 billion total parameters and 15 billion active parameters; its SWE-bench Verified score of 73.4% is the highest among open-source models [3]
- Key innovations include a 5:1 hybrid sliding window attention mechanism and lightweight multi-token prediction, improving inference speed by 2 to 2.6 times [3]
- Post-training uses a multi-teacher online distillation strategy that needs only 1/50 of the compute to reach the teachers' peak performance [3]

Group 4: Tencent's Real-Time Model
- Tencent officially released and open-sourced HY WorldPlay, a model that creates real-time interactive 3D worlds from text or image inputs at 24 FPS and 720p [4]
- Innovations include a memory reconstruction mechanism that maintains geometric consistency and reinforcement learning tailored to 3D autoregressive diffusion models [4]
- The release provides a complete training system for real-time world models, covering data, training, and streaming inference deployment [4]

Group 5: Vidu Agent Launch
- Vidu Agent has opened a global beta focused on "one-click video creation": users upload product images and information to generate ready-to-launch advertisements [6]
- Highlights include storyboard-level control, fine-grained editing, and multi-language customization [6]
- The platform supports video replication, enabling bulk production of high-quality videos from popular one-minute videos and product images [6]

Group 6: Google's Gemini Updates
- Google introduced Super Gems in Gemini, integrating Opal applications into the Gems manager so that Opal workflows are directly accessible in the Labs area [7]
- The new Workflow Builder automatically generates complete workflow steps and visual elements from a description of the scenario [7]
- Workflows can be shared via links without depending on Google Drive permissions, making them easier to access [7]

Group 7: OpenAI's FrontierScience Benchmark
- OpenAI launched FrontierScience, a benchmark of over 700 physics, chemistry, and biology questions for assessing expert-level scientific capabilities [8]
- GPT-5.2 scored 77% on the Olympiad track and 25% on the research track, outperforming other leading models [8]
- The research track is graded on a 10-point scale that emphasizes the correctness of reasoning; the results reveal weaknesses in logical reasoning and in understanding specialized concepts [8]

Group 8: Xiaomi's Future Plans
- Xiaomi's Luo Fuli made her first public appearance, discussing the core directions behind MiMo-V2-Flash and emphasizing the need for models that can interact with the physical world [9]
- She argued that compute and data are not the ultimate moat; the real moat is a scientific research culture and the ability to turn unknown problems into usable products [9]
- Xiaomi plans to invest over 200 billion yuan in R&D over the next five years, with an estimated 40 billion yuan allocated for 2026 [9]
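The 5:1 hybrid sliding window attention mentioned for MiMo-V2-Flash in Group 3 can be illustrated with a toy Python sketch. It assumes that "5:1" means five sliding-window attention (SWA) layers for every one global (full causal) attention layer; the window size of 4, the layer count, and the helper names `layer_schedule` and `attention_mask` are hypothetical toy choices, not Xiaomi's configuration or implementation.

```python
# Toy illustration of a hybrid attention layout, ASSUMING "5:1" means
# five sliding-window attention (SWA) layers per global attention layer.
# Names and sizes here are hypothetical, not MiMo-V2-Flash's actual config.


def layer_schedule(num_layers: int, swa_per_global: int = 5) -> list[str]:
    """Return the attention type for each layer: every (swa_per_global + 1)-th
    layer is global, the rest use a sliding window."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len: int, kind: str, window: int) -> list[list[bool]]:
    """Build a causal mask; mask[q][k] is True if query q may attend to key k.
    Global layers see every earlier token; SWA layers only the last `window`."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q            # no attending to future tokens
            in_window = q - k < window  # key lies within the sliding window
            row.append(causal and (kind == "global" or in_window))
        mask.append(row)
    return mask


if __name__ == "__main__":
    print(layer_schedule(12))
    # Per-query key counts: SWA layers cap each query at `window` keys,
    # while global layers grow linearly with position.
    for kind in ("swa", "global"):
        m = attention_mask(seq_len=8, kind=kind, window=4)
        print(kind, [sum(row) for row in m])
```

Under this assumed layout, only one layer in six attends over the full context; the usual motivation for such hybrids is that SWA layers bound per-token attention cost and key/value cache size by the window length.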