Tencent Research Institute AI Digest 20250923
Tencent Research Institute · 2025-09-22 16:01
Group 1
- MediaTek launched its new flagship 5G AI chip, the Dimensity 9500, built on a third-generation 3nm process with an all-big-core architecture and over 30 billion transistors; NPU performance is up 111% while power consumption is down 56% [1]
- The chip features a dual-NPU architecture for both peak performance and high efficiency, introduces an in-memory computing design and a BitNet 1.58-bit quantized inference framework, and supports on-device model training [1]
- In practical use it supports 128K long-text processing and 4K image generation, and upcoming flagship devices from manufacturers such as vivo and OPPO will use the chip for personalized AI scenarios [1]

Group 2
- OpenAI has invested $16 billion in computing resources and plans to spend $350 billion on leased compute from 2024 to 2030, with annual expenditure expected to reach $100 billion by 2030 [2]
- The company signed a 5-year, $300 billion compute contract with Oracle, plus an extra $100 billion for backup servers, breaking with the traditional tech-giant model of keeping R&D costs at 10%-20% of revenue [2]
- OpenAI announced a compute-intensive new product launching in the coming weeks, but even Pro users will have to pay extra for it, prompting user dissatisfaction [2]

Group 3
- Google has introduced a new research paradigm for agents that moves beyond the traditional "plan-retrieve-generate" pipeline, letting the agent draft first and then iteratively learn and self-correct [3]
- The new framework runs a "diffusion denoising" process: the agent identifies information gaps in its own draft, searches for external evidence, and repeatedly refines the research content, as shown in the sketch after this group [3]
- Google also adds multi-version intelligent self-critique and report-level denoising, outperforming OpenAI's Deep Research on tasks such as GAIA; the system is available for trial in Google Agentspace [3]
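The draft-then-denoise loop described in Group 3 can be pictured as iterative refinement: write a rough draft, let the model name its own gaps, retrieve evidence, and revise. Below is a minimal schematic sketch of that loop, where `llm` and `search` are hypothetical stand-ins for a model call and a retrieval backend; it illustrates the paradigm only and is not Google's actual implementation.

```python
# Schematic "diffusion-denoising" research loop: start from a rough draft and
# iteratively refine it with retrieved evidence. `llm(prompt)` and `search(query)`
# are hypothetical callables (model call, retrieval backend), not Google's code.
def deep_research(question: str, llm, search, steps: int = 3) -> str:
    # Draft first, instead of planning and retrieving before any writing.
    draft = llm(f"Write a first-pass draft report answering: {question}")
    for _ in range(steps):
        # Self-critique: the model reads its own draft and lists what is
        # missing, unsupported, or likely wrong.
        gaps = llm(f"List the biggest information gaps in this draft:\n{draft}")
        # Turn each gap into a search and gather external evidence.
        evidence = "\n".join(
            search(gap) for gap in gaps.splitlines() if gap.strip()
        )
        # "Denoise": revise the draft against the evidence.
        draft = llm(
            f"Revise the draft to close the gaps using this evidence.\n"
            f"Draft:\n{draft}\nEvidence:\n{evidence}"
        )
    return draft
```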
Group 4
- DeepSeek released DeepSeek-V3.1-Terminus, the final version of its V3.1 model, addressing user feedback with improvements in two main areas [4][5]
- The new version alleviates language-consistency issues such as mixed Chinese-English output and further optimizes the performance of the Code Agent and Search Agent [5]
- DeepSeek-V3.1-Terminus is now live across the official app, web platform, mini-program, and DeepSeek API, with open-source weights downloadable from Hugging Face and ModelScope; a minimal API-call sketch appears at the end of this digest [5]

Group 5
- The Keling 2.5 video model achieves major breakthroughs in motion and expressive performance, accurately depicting subtle facial-expression changes and complex emotions while keeping characters consistent across scenes [6]
- The model seamlessly chains actions such as falling, running, and riding a motorcycle, preserving realistic environmental-interaction details and understanding complex causal relationships [6]
- Keling 2.5 excels at action scenes, generating high-quality parkour, jumping, combat, and explosion shots with greatly improved continuity and physical realism; it is currently in gray-release testing for super creators [6]

Group 6
- Meituan's LongCat team released the efficient reasoning model LongCat-Flash-Thinking, which reaches advanced levels in logic, mathematics, coding, and agent capabilities while remaining extremely fast [7]
- The model pioneers a domain-parallel reinforcement-learning training method, achieves a threefold speedup through an asynchronous elastic shared-card system, and adds a dual-path reasoning framework to strengthen agent capabilities [7]
- In reasoning benchmarks it outperforms open-source models and matches top closed-source models such as GPT-5 on tests like AIME and LiveCodeBench, and its formal-reasoning scores on the MiniF2F-test benchmark lead all participating models by a clear margin [7]

Group 7
- Baidu's Qianfan-VL visual-understanding model has been fully open-sourced in three sizes (3B, 8B, and 70B), supporting OCR recognition and educational applications [8]
- The model was developed by Baidu's team on top of open-source models, with all computation completed on Baidu's self-developed Kunlun P800 chips and single-task parallelism scaling to 5,000 cards [8]
- The Qianfan-VL series demonstrates chain-of-thought reasoning, full-scene OCR, and complex document understanding, performs strongly across multiple benchmarks, and is available for free trial on Baidu Smart Cloud [8]

Group 8
- MIT Technology Review released the 2025 "35 Innovators Under 35" Asia-Pacific list, featuring 35 innovators from fields such as AI, robotics, and materials [10]
- Honorees such as Xia Fei and Min Shiyuan have made breakthroughs in artificial intelligence, including embodied intelligence and non-parametric large language models [10]
- China has the most honorees, with 82 individuals selected across 11 editions as of 2024, surpassing Singapore's 76, reflecting the Asia-Pacific region's shift from technology follower to innovation leader [10]

Group 9
- The core team behind Nano Banana argues that image-generation quality is nearing its ceiling, and that the next challenge is injecting the "world knowledge" of LLMs into image models so they understand user intent [11]
- While the quality ceiling of existing image models is close to being reached, there is still significant room to raise the "floor," with future work focused on model expressiveness and performance in complex scenarios [11]
- Future interfaces will integrate text, images, and voice, but user expectations of instant "finished products" are unrealistic; AI models and traditional tools will coexist in professional workflows for a long time [11]
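On the DeepSeek API mentioned in Group 4: DeepSeek's public API is OpenAI-compatible, so the updated model can be called with the standard OpenAI SDK. A minimal sketch follows, assuming the public `deepseek-chat` model name maps to the V3.1-Terminus build; check the official API docs for the current mapping.

```python
# Minimal sketch: calling DeepSeek through its OpenAI-compatible API.
# Assumes `deepseek-chat` currently serves the V3.1-Terminus build.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # your DeepSeek API key
    base_url="https://api.deepseek.com",     # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # non-thinking mode; "deepseek-reasoner" for thinking mode
    messages=[
        {"role": "system", "content": "Reply in English only."},
        {"role": "user", "content": "Summarize what changed in V3.1-Terminus."},
    ],
)
print(response.choices[0].message.content)
```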
Hands-on with 可灵 AI's new video model: the action scenes it generates are jaw-droppingly good.
数字生命卡兹克 · 2025-09-22 01:33
Core Viewpoint
- The article discusses the advancements of the AI video generation model 可灵2.5, highlighting its significant improvements in motion and performance capabilities over its predecessor, 可灵2.1, and its potential impact on creative freedom for young creators [1][54]

Group 1: Motion Evolution
- 可灵2.5 demonstrates a substantial enhancement in motion capabilities, allowing seamless transitions between complex actions such as falling, running, and riding a motorcycle with a high level of realism [2][5]
- The model can generate dynamic, fluid movement across scenarios including parkour and sports, achieving effects comparable to professional films [10][18][20]
- In contrast, 可灵2.1 struggled to maintain realistic interaction with the environment, often producing disjointed or unrealistic movements [6][12]

Group 2: Performance Evolution
- 可灵2.5 shows a marked improvement in the accuracy of emotional expression and character performance, allowing nuanced portrayals of complex emotions [29][45]
- The model can effectively convey subtle emotional transitions, such as a character's shift from anger to calmness, which 可灵2.1 handled less successfully [29][42]
- The range of emotional expression has been significantly enhanced, making character interactions more relatable and engaging [35][50]

Group 3: Overall Improvements
- The 可灵2.5 update not only elevates motion and performance capabilities but also deepens the model's understanding of context and detail, addressing previous limitations in generating coherent narratives [54][56]
- The advancements in text-to-video capabilities allow creators to generate content with minimal input, fostering greater creative freedom [55][57]