Tencent Research Institute AI Express 20250924

Group 1: Nvidia and OpenAI Partnership
- Nvidia announced a strategic partnership with OpenAI and plans to invest up to $100 billion; OpenAI will deploy up to 10 gigawatts of Nvidia systems, equivalent to 4-5 million GPUs [1]
- The first phase is scheduled to come online in the second half of 2026 on Nvidia's Vera Rubin platform [1]
- The two companies will jointly optimize their technical roadmaps for models, infrastructure software, and hardware in support of OpenAI's mission to build artificial general intelligence. Nvidia's stock rose nearly 4% following the announcement [1]

Group 2: Wuwen Xinqun's Agentic Infra
- Wuwen Xinqun launched an infrastructure agent swarm built on a multi-agent collaborative architecture that covers model selection, resource operations, troubleshooting, and cluster operation and maintenance [2]
- The solution reshapes the traditional production chain that runs from IaaS to PaaS to MaaS to Agent applications into a highly collaborative system centered on intelligent agents, which the company says significantly improves resource utilization and operational efficiency [2]
- Collaborations with clients such as Nia TA and Soul have yielded a fivefold increase in iteration speed and a hundredfold expansion in operational capabilities, advancing the shift from the "AI infrastructure paradigm" to "Agentic Infra" [2]

Group 3: Alibaba's Qwen3-Omni Model
- Alibaba's Tongyi open-sourced the Qwen3-Omni multimodal model, which processes text, image, audio, and video inputs and supports real-time streaming responses with simultaneous text and voice output [3]
- The model achieved state-of-the-art (SOTA) results on 32 of 36 audio and audio-video benchmarks, surpassing strong closed-source models such as Gemini-2.5-Pro, and supports 119 text languages, 19 speech understanding languages, and 10 speech generation languages [3]
- Alibaba also open-sourced the Qwen3-TTS-Flash speech synthesis model and the Qwen-Image-Edit-2509 image editing model. The former supports 17 voices and 10 languages; the latter adds multi-image editing and improved single-image consistency [3]

Group 4: Kimi's Agent Membership Service
- Kimi introduced an Agent membership service; users receive a full refund of their previous tipping amounts when they first subscribe [4]
- The tiers are named after musical tempos: the free tier is Adagio, the paid tiers are Andante at 49 yuan and Moderato at 99 yuan, and an overseas tier, Vivace, costs $199 [4]
- The main difference between paid and free users is the number of Agent uses. Mid- and high-tier subscriptions include API vouchers of equivalent value, and higher-tier members receive priority access during peak times [4]

Group 5: MiniCPM-V 4.5 Model Release
- Tsinghua University's NLP lab and Mianbi Intelligence released the MiniCPM-V 4.5 technical report. With 8 billion parameters, the model surpasses larger models such as GPT-4o-latest and Qwen2.5-VL-72B [5]
- The model introduces three techniques: a unified 3D-Resampler architecture for high-density video compression, a unified document-oriented OCR and knowledge learning paradigm, and multimodal reinforcement learning with controllable hybrid fast/deep thinking [6]
- MiniCPM-V 4.5 scored an average of 77.0 on the OpenCompass comprehensive evaluation and shows high inference efficiency, with a time cost on VideoMME only one-tenth that of comparable models. It has been downloaded more than 220,000 times on HuggingFace and ModelScope [6]

Group 6: ZhiYuan Robot's GO-1 Model
- ZhiYuan Robot open-sourced GO-1, a general-purpose embodied foundation model built on what it describes as the world's first Vision-Language-Latent-Action (ViLLA) architecture, which bridges the semantic gap between image-text input and robot action execution [8]
- The model uses a three-layer collaborative design: a multimodal understanding layer based on InternVL-2B, an implicit planner, and a diffusion-based action expert. It has been validated across a range of robots and simulation environments [8]
- ZhiYuan Robot also launched Genie Studio, a one-stop development platform offering a full-stack solution that spans data collection, data management, model training, fine-tuning, evaluation, and deployment. It supports the LeRobot universal data format for compatibility with other robot platforms [8]

Group 7: OpenAI's Future AI Development
- Lukasz Kaiser, a co-author of the Transformer paper who now works at OpenAI, is involved in developing GPT-5 and related reasoning models and emphasizes the potential of large models for cross-domain learning [9]
- Kaiser proposed the concept of "One Model To Learn Them All" in 2017 and predicts that the next phase of AI will focus on teaching models to "think" [9]
- He forecasts a paradigm shift in AI computation from large-scale pre-training to massive reasoning computation on a small amount of high-quality, task-specific data, which he sees as closer to how human intelligence works [9]