DeepMind
DeepMind is first to propose CoF: video models have their own chain of thought
量子位· 2025-09-28 03:39
Core Viewpoint
- DeepMind introduces the concept of Chain-of-Frames (CoF) for video models, paralleling Chain-of-Thought (CoT) in language models and suggesting a shift in machine vision towards general-purpose visual understanding [1][3][28].

Group 1: Introduction of CoF
- The CoF concept arose from the question of whether video generation models can achieve general-purpose capabilities similar to those of large language models (LLMs) without specialized training [6][7].
- The goal is to validate the hypothesis that video models can perform various visual tasks using a single underlying logic learned from vast data [7][8].

Group 2: Capabilities of Veo 3
- Veo 3 demonstrates four progressive capabilities:
  1. Perception: it can handle many classic visual tasks without specialized training [10][11].
  2. Modeling: it can establish the rules of the visual world [13][14].
  3. Manipulation: it can perform creative modifications and simulations [16].
  4. Reasoning: it can carry out visual reasoning across time, which embodies the CoF concept [18][21].

Group 3: Performance Analysis
- An analysis of 62 qualitative tasks and 7 quantitative tasks showed that Veo 3 can solve many tasks it was not specifically trained for, indicating its general-purpose potential [23].
- Veo 3 improves significantly on its predecessor, Veo 2, suggesting rapid development in video model capabilities [24][25].

Group 4: Future Outlook
- DeepMind predicts that general-purpose models like Veo 3 will eventually replace specialized models in the video domain, similar to the evolution seen in LLMs [25][26].
- The cost of video generation is currently higher than that of specialized models, but it is expected to decrease over time, paralleling trends observed in LLMs [25][26].
X @Decrypt
Decrypt· 2025-09-27 20:50
Google DeepMind's updated Gemini Robotics models mark a shift from single-task machines to robots that plan multi-step missions. https://t.co/hrR2MIib1a ...
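News flash | XPeng Motors publishes humanoid robot patent; Unitree Robotics to launch a 1.8-meter-tall humanoid robot in the second half of the year; DeepMind releases models to power robots and embodied intelligence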
机器人大讲堂· 2025-09-26 12:14
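1. XPeng Motors publishes humanoid robot patent that may improve human-likeness
According to Qichacha, a "humanoid robot" patent from Guangzhou Xiaopeng Motors Technology Co., Ltd. was recently published. The patent abstract states that the humanoid robot consists of a robot body, a robotic arm, and an arm drive structure. The arm's connecting bracket is movably mounted on the robot body, the arm body is mounted on the connecting bracket, and the arm drive structure drives the connecting bracket, which in turn moves the arm body. By using a movable connecting bracket, the design gives the arm more degrees of freedom and flexibility, distributes the movable joints more evenly, and makes the arm's shape more coordinated. Together these improve the robot's human-like appearance in several respects and offer the industry a new approach.
2. Unitree Robotics to launch a 1.8-meter-tall humanoid robot in the second half of the year
Wang Xingxing, founder and CEO of Unitree Robotics, recently shared several updates at the 4th Global Digital Trade Expo. He said that Unitree's robot algorithms have gone through several iterations this year and that the company expects to release a humanoid robot standing 1.8 meters tall in the second half of the year. Wang also noted that China's robotics industry was very active in the first half of this year, with the average growth rate of Chinese intelligent-robot companies between 50% and 100%. In addition, Unitree recently updated its algorithms again and introduced an "anti-gravity mode" that greatly improves stability, allowing the robot to return to a standing position on its own after being disturbed. With the upgraded algorithms, the robot can in theory perform all kinds of dance and martial-arts moves, which raises expectations for the upcoming humanoid robot.
3. D ...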
Top Ten Keywords of the 2025 AI Industry
机器人圈· 2025-09-26 09:29
Core Insights
- The 2025 Artificial Intelligence Industry Conference highlighted ten key trends in AI, emphasizing the convergence of technology, applications, and ecosystems, leading to a clearer vision of a smart-native world [1].

Group 1: Foundation Super Models
- In 2025, foundational models and reasoning models are advancing simultaneously, with a comprehensive capability increase of over 30% from late 2024 to August 2025 [3][4].
- Key features of leading large models include the integration of thinking and non-thinking modes, enhanced understanding and reasoning abilities, and built-in agent capabilities for real-world applications [4][6].
- The emergence of foundational super models simplifies user interaction, enhances workflow precision, and raises new data supply requirements [6].

Group 2: Autonomous Intelligent Agents
- Highly encapsulated intelligent agent products are unlocking the potential of large models, performing better on complex tasks than single models [9][10].
- Current intelligent agents still have significant room for improvement, particularly in long-duration task execution and interconnectivity [12].

Group 3: Embodied Intelligence
- Embodied intelligence is transitioning from laboratory settings to real-world applications, with models being deployed in practical scenarios [15][16].
- Challenges remain in data quality, model generalization, and software-hardware coordination for effective task execution [18].

Group 4: World Models
- World models are emerging as a core pathway to artificial general intelligence (AGI), focusing on capabilities like data generation, action interpretation, environment interaction, and scene reconstruction [21][22].
- The development of world models faces challenges such as unclear definitions, diverse technical routes, and limited application scope [22].

Group 5: AI Reshaping Software
- AI is transforming the software development lifecycle, with significant increases in token usage for programming tasks and the introduction of advanced AI tools [25][28].
- The work of software developers is becoming more complex, leading to the emergence of "super individuals" [28].

Group 6: Open Intelligent Computing Ecosystem
- The intelligent computing landscape is shifting towards an open-source model, fostering collaboration and innovation across various sectors [30][32].
- The synergy between software and hardware is improving, with domestic hardware achieving performance parity with leading systems [30].

Group 7: High-Quality Industry Data Sets
- The focus of AI data set construction is shifting from general-purpose to high-quality industry-specific data sets, addressing critical quality issues [35][38].
- New data supply chains are needed to support advanced technologies like reinforcement learning and world models [38].

Group 8: Open Source as Standard
- Open-source initiatives are reshaping the AI landscape, with significant adoption of domestic open-source models and a growing number of active developers [40][42].
- The business model is evolving towards "open-source free + high-level service charges," promoting cloud services and chip demand [42].

Group 9: Mitigating Model Hallucinations
- Hallucination in large models is becoming a significant barrier to application, with ongoing research into mitigation strategies [44][46].
- Various approaches are being explored to enhance data quality, model training, and user-side testing to reduce hallucination rates [46].

Group 10: AI as an International Public Good
- Global AI development is uneven, necessitating international cooperation to promote equitable access to AI technologies [49][51].
- Strategies are being implemented to address challenges in cross-border compliance and data flow, aiming to make AI a truly shared international public good [51].
Google DeepMind researchers react to Nano Banana demos 🍌
Google DeepMind· 2025-09-24 17:26
I think the fact that people surprise us with a model we built is the best idea. So, so this is like a demo with nano banana hooked up into I think it's an studio demo. It's hooked onto a canvas and you can like drag these isometric shapes around. Oh, and you're so cool. I mean, we often thought of like Nano Banana as a single tool, as a single thing, but now actually this becomes more part of a pipeline. Wait, San Francisco. They merged San Francisco, New York halfway. What. Oh, no way. Oh, wow. Is that the B ...
Results are out! NeurIPS 2025 paper roundup (autonomous driving / large models / embodied AI / RL, etc.)
自动驾驶之心· 2025-09-22 23:34
Core Insights
- The article surveys the newly announced NeurIPS 2025 accepted papers, focusing on autonomous driving, visual perception reasoning, large model training, embodied intelligence, reinforcement learning, video understanding, and code generation [1].

Autonomous Driving
- The article highlights research papers on autonomous driving, including "FutureSightDrive" and "AutoVLA," which explore visual reasoning and end-to-end driving models [2][4].
- It provides a collection of papers and code from institutions such as Alibaba, UCLA, and Tsinghua University, showcasing the latest developments in the field [6][7][13].

Visual Perception Reasoning
- The article mentions "SURDS," which benchmarks spatial understanding and reasoning in driving scenarios using vision-language models [11].
- It also references "OmniSegmentor," a flexible multi-modal learning framework for semantic segmentation [16].

Large Model Training
- The article discusses advances in large model training, including papers on scaling offline reinforcement learning and on fine-tuning techniques [40][42].
- It emphasizes the importance of adaptive methods for improving model performance in various applications [44].

Embodied Intelligence
- Research on embodied intelligence is highlighted, including "Self-Improving Embodied Foundation Models" and "ForceVLA," which enhance models for contact-rich manipulation [46][48].

Video Understanding
- The article covers advances in video understanding, particularly the "PixFoundation 2.0" project, which investigates the use of motion in visual grounding [28][29].

Code Generation
- The article mentions developments in code generation, including "Fast and Fluent Diffusion Language Models" and "Step-By-Step Coding for Improving Mathematical Olympiad Performance" [60].
X @Decrypt
Decrypt· 2025-09-22 19:30
Updates to Google DeepMind's Frontier Safety Framework highlight concerns that advanced AI can evade human control and sway user beliefs. https://t.co/56K1FGNKav ...
Nobel laureate David Baker unveils RFdiffusion3, upending the protein design landscape and opening a new era of all-atom biomolecular design
生物世界· 2025-09-22 04:14
Core Viewpoint
- The article discusses advances in protein design using generative artificial intelligence, focusing on RFdiffusion3, which allows atomic-level precision in designing proteins that can interact with specific small molecules, DNA, and other biomolecules [9][24].

Group 1: RFdiffusion3 Overview
- RFdiffusion3 represents a significant advance in protein design, enabling the design of proteins with atomic-level precision, including interactions with non-protein components [9][10].
- The model builds on the previous versions, RFdiffusion and RFdiffusion2, and offers improvements in accuracy, efficiency, and versatility [10][28].
- RFdiffusion3 can handle complex atomic constraints, such as hydrogen bonds and solvent accessibility, and can design various interactions, including protein-protein, protein-small molecule, and protein-nucleic acid interactions [10][28].

Group 2: Performance and Applications
- In benchmark tests, RFdiffusion3 demonstrated superior performance at only one-tenth the computational cost of previous methods [3][10].
- The model has shown excellent results in designing DNA-binding proteins and enzymes, achieving a binding activity of 5.89±2.15 μM for a designed DNA-binding protein and a kcat/Km value of 3557 for a designed cysteine hydrolase [21][28].
- RFdiffusion3 has outperformed its predecessor in multiple target designs, producing an average of 8.2 unique successful clusters compared to 1.4 from RFdiffusion [15].

Group 3: Technical Innovations
- The core innovation of RFdiffusion3 is its all-atom diffusion model, which allows simultaneous simulation of the protein backbone and side chains, as well as interactions with non-protein components [9][10].
- The model employs a unified representation of amino acids, standardizing them to 14 atoms, which makes it easier to handle varying side-chain atom counts [13][14].
- The architecture is based on a Transformer U-Net, which includes downsampling, sparse transformer modules, and upsampling to predict coordinate updates [14].

Group 4: Future Implications
- The introduction of RFdiffusion3 marks a paradigm shift in protein design, enabling unprecedented control over complex functionalities, such as specifying enzyme active sites and controlling hydrogen bond states [24][25].
- As the technology continues to evolve, it is expected to lead to innovative therapies, new types of proteases, and biomaterials, fulfilling the vision of "designing life molecules" [25].
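The 14-atom unified representation mentioned under Group 3 can be pictured with a small sketch. This is only an illustration of the general idea of padding every residue to a fixed number of atom slots; it is not RFdiffusion3's code. The function name `pad_residue`, the zero-coordinate padding, and the boolean mask are assumptions, since the summary does not say how the unused slots are filled. The atom lists are the standard heavy-atom names for three residues.

```python
from typing import Dict, List, Tuple

MAX_ATOMS = 14  # fixed slot count, taken from the summary above

Coord = Tuple[float, float, float]

# Standard heavy-atom names for three residues (illustrative subset only).
RESIDUE_ATOMS: Dict[str, List[str]] = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "O", "CB"],
    "TRP": ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2",
            "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"],
}


def pad_residue(name: str, coords: List[Coord]) -> Tuple[List[Coord], List[bool]]:
    """Pad one residue's atom coordinates to MAX_ATOMS slots.

    Returns the padded coordinates and a mask that is True for real atoms
    and False for padding slots. Zero-padding is an assumption made for
    this sketch, not something the summary specifies.
    """
    expected = len(RESIDUE_ATOMS[name])
    if len(coords) != expected:
        raise ValueError(f"{name} expects {expected} atoms, got {len(coords)}")
    n_pad = MAX_ATOMS - expected
    padded = list(coords) + [(0.0, 0.0, 0.0)] * n_pad
    mask = [True] * expected + [False] * n_pad
    return padded, mask


if __name__ == "__main__":
    # Glycine has 4 heavy atoms, so 10 of its 14 slots are padding.
    gly_coords = [(float(i), 0.0, 0.0) for i in range(4)]
    padded, mask = pad_residue("GLY", gly_coords)
    print(len(padded), sum(mask))
```

With every residue occupying the same number of slots, a sequence of residues becomes a regular array, and the mask tells downstream code which slots to ignore. Tryptophan, the largest standard residue by heavy-atom count, fills all 14 slots.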
Microsoft AI CEO: When AI's marginal cost approaches zero, the distribution of power changes
36Kr· 2025-09-22 00:58
Group 1
- The core argument presented by Mustafa Suleyman is that AI's marginal cost is approaching zero, leading to a decentralization of power and making AI accessible to everyone [1][11][30].
- AI is evolving from a "predictive system" to an "action agent," which means that the ability to execute tasks is being redistributed from large organizations to individuals [2][15][20].
- The ability to use AI effectively is becoming a new form of power, as it allows individuals to perform tasks that previously required teams and resources [14][21][30].

Group 2
- Suleyman emphasizes that AI is not just a tool but a form of actionable execution power that can perform tasks autonomously, such as sending emails, generating contracts, and managing schedules [10][12][30].
- The reduction in marginal costs means that individuals can now achieve the same level of influence as large companies, as AI can handle complex tasks with minimal input [11][13][30].
- The traditional power structures based on resource allocation and organizational processes are being disrupted, allowing individuals to take direct action with AI [20][22][30].

Group 3
- The future of power will depend on who can effectively use AI to turn ideas into reality, as the ability to predict and execute tasks becomes a key determinant of influence [15][21][30].
- Suleyman predicts that AI will soon possess perfect memory and the ability to make long-term, accurate predictions based on user interactions, further enhancing individual capabilities [21][30].
- The shift in power dynamics signifies a move away from centralized control towards individual empowerment through AI [30].
X @Elon Musk
Elon Musk· 2025-09-20 02:44
🚀Dustin Tran (@dustinvtran):I departed Google DeepMind after 8 years. So many fond memories—from early foundational papers in Google Brain (w/ @noamshazeer @ashvaswani @lukaszkaiser on Image Transformer, Tensor2Tensor, Mesh TensorFlow) to lead Gemini posttraining evals to catch up & launch in 100 days, then ...