QbitAI
DeepMind's reinforcement learning chief David Silver leaves to start his own company! Creator of the Alpha-series AIs and Hassabis's right-hand man
QbitAI · 2026-01-31 01:34
MengChen, from Aofeisi. QbitAI | Official account QbitAI

Reinforcement learning heavyweight David Silver has left DeepMind.

The veteran researcher, who spent a full 15 years at DeepMind, has departed to found his own AI company, Ineffable Intelligence. Registration filings show the company was quietly incorporated as early as November 2025, and Silver was formally appointed as a company director on January 16, 2026. In the months before formally leaving DeepMind, he had been on leave.

Ineffable Intelligence is headquartered in London and is actively recruiting AI research talent and seeking venture funding. A Google DeepMind spokesperson confirmed Silver's departure and thanked him for his contributions during his tenure.

Beyond his work at Google DeepMind, Silver is also a professor at University College London, a post he will keep. He joined DeepMind at its founding in 2010, when it was still a small team; Silver and Demis Hassabis were old friends from their student days at Cambridge and had earlier co-founded the games company Elixir Studios together. In 2016, AlphaGo, developed under his leadership, defeated Go world champion Lee Sedol, a landmark event in the history of AI ...
Google's Genie 3 hammers game companies' market caps! GTA's developer shrinks 10%, game-engine maker Unity plunges 21%
QbitAI · 2026-01-31 01:34
Core Viewpoint
- Google has officially launched the experimental research prototype Project Genie, which allows users to create and interact with 3D worlds using AI technology [1][17].

Group 1: Project Genie Overview
- Project Genie is an experimental research prototype that separates out the core capabilities of Genie 3, combining features from Genie 3, Nano Banana Pro, and Gemini into a web application [18][19].
- Its main functions are to "build" worlds from text or images, "enter" generated worlds for exploration, and "modify" existing worlds via prompts [20][22][24].

Group 2: User Experience and Community Engagement
- Users engaged with Project Genie quickly, showcasing their creativity by generating varied 3D models and scenes, such as a flight simulator and a realistic wolf hunting in a jungle [26][38].
- The platform supports high levels of detail and interaction; users can create dynamic environments and characters that respond to user inputs [40][41].

Group 3: Community Feedback and Limitations
- While many users praised Project Genie's capabilities, some were disappointed by its handling of specialized content such as CAD [44][45].
- The project is still in its experimental phase and is not yet a fully mature, stable AI tool [47].
Uh-oh! Robots have learned to predict the future
QbitAI · 2026-01-30 13:34
Core Viewpoint
- The article discusses Ant Group's LingBot-VA, a significant leap in robot control: the robot predicts the outcome of future actions before executing them, enhancing its decision-making capabilities [2][11][56].

Group 1: Technological Innovations
- LingBot-VA introduces a causal video-action world model that lets robots visualize future scenarios before acting, moving beyond the traditional "observe-react" loop [6][12].
- The model retains memory of previous actions across long sequences and adapts to new settings with minimal training samples [8][10].
- Its architecture separates visual understanding from action control, improving sample efficiency and generalization [14][15].

Group 2: Performance and Testing
- In real-world tests, LingBot-VA handled complex tasks such as preparing breakfast and manipulating delicate objects, showing stability and precision [34][36].
- On the RoboTwin 2.0 benchmark it achieved a 92.93% success rate on easy tasks, outperforming competitors by 4.2% [40].
- On the LIBERO benchmark it set a new state-of-the-art record with a 98.5% average success rate [42].

Group 3: Industry Impact
- The continued open-sourcing of LingBot-VA and related projects signals a shift toward a video-centric approach in robotics, where video becomes a medium for reasoning and action [46][48].
- These advances position world models as a central capability in robotics, evolving robots from mere action toward deliberate decision-making [49][56].
- The ripple effects are already visible in increased attention from global tech companies and media, marking a strategic move in the competitive robotics landscape [52][56].
The world has long suffered under CUDA, and another homegrown alternative has come to the table
QbitAI · 2026-01-30 13:34
Core Viewpoint
- While domestic computing infrastructure has improved, the real challenge for developers is usability, particularly in AI development, where the software ecosystem remains heavily reliant on established foreign tools and frameworks [1][2].

Group 1: Current State of AI Development
- The AI landscape is vibrant, with numerous models being released, yet the immaturity of the underlying software ecosystem is a significant bottleneck for deployment efficiency [11][12].
- Developing high-performance operators (算子) is crucial: they act as the "translators" between AI algorithms and hardware, affecting inference speed, energy consumption, and compatibility [13][14].

Group 2: KernelCAT Introduction
- KernelCAT is a local AI agent designed to accelerate computation and ease model migration, capable of handling both specialized operator work and general software-engineering tasks [17].
- Unlike traditional tools, it combines intelligent code understanding and optimization with operations-research algorithms to automate parameter tuning, sharply reducing the time and effort that optimization requires [21][22].

Group 3: Performance and Competitive Edge
- In tests, KernelCAT outperformed both open-source and commercial operators, reaching execution times as low as 0.0077 ms on 1M-scale tasks, corresponding to acceleration ratios exceeding 200% [26].
- Its approach lets it optimize operators effectively, showing its potential to compete with established solutions in the market [25][27].

Group 4: Ecosystem Challenges
- Over 90% of significant AI training tasks currently run on NVIDIA GPUs, whose developer ecosystem counts over 5.9 million users and more than 400 operators, a substantial barrier for domestic alternatives [28][30].
- NVIDIA's success is attributed to its comprehensive control over software and algorithms, underscoring that hardware performance is only fully realized within a mature ecosystem [32].

Group 5: Future Directions
- KernelCAT represents a shift toward self-evolving computational foundations, moving away from reliance on existing ecosystems to capabilities that can adapt and grow independently [39].
- The article closes with an invitation to try KernelCAT, signaling its ongoing development and potential for broader industry adoption [40].
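The article does not disclose how KernelCAT's automated parameter tuning works, but the general idea named in Group 2 can be illustrated with a deliberately minimal sketch: time every candidate configuration of a tunable operator and keep the fastest. Everything below (`toy_kernel`, the candidate block sizes) is hypothetical and only stands in for a real kernel; real tuners replace the exhaustive loop with smarter search, such as the operations-research methods the article mentions.

```python
import time

def toy_kernel(data, block):
    """Hypothetical stand-in for a tunable operator: reduce `data`
    in chunks of `block` elements. The tunable parameter is `block`."""
    total = 0.0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def autotune(data, candidate_blocks, repeats=3):
    """Exhaustively time each candidate configuration and keep the
    fastest, the simplest form of the search that autotuners automate."""
    best_block, best_time = None, float("inf")
    for block in candidate_blocks:
        start = time.perf_counter()
        for _ in range(repeats):
            toy_kernel(data, block)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block, best_time

data = list(range(100_000))
block, seconds = autotune(data, candidate_blocks=[64, 256, 1024, 4096])
print(f"best block size: {block} ({seconds * 1e3:.3f} ms)")
```

The sketch shows only the outer search loop; the hard part a tool like KernelCAT automates is pruning the configuration space so that far fewer candidates need to be timed.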
Four 2K images in 5 seconds! Alibaba proposes a 2-step generation scheme that blows past the AI image-generation progress bar
QbitAI · 2026-01-30 11:02
Core Insights
- The article covers advances in AI image generation, focusing on the Qwen model, which has cut generation time from nearly one minute to just 5 seconds for 4 high-definition images [1][3].

Group 1: Model Performance Improvements
- The latest open-source Qwen release achieves a state-of-the-art (SOTA) level of step compression, cutting forward computation from 80-100 steps to just 2, a roughly 40-fold speedup [2].
- The DMD2 algorithm shifts the distillation constraint from sample space to probability space, improving the quality of generated images by addressing detail loss [8][10].
- DMD2's reverse-KL loss lets the student model generate images independently while receiving guidance from the teacher model, improving detail and realism [11][12].

Group 2: Challenges and Solutions
- Traditional trajectory-distillation methods struggled to generate high-quality images at low step counts, often yielding blurry outputs because detailed features were insufficiently learned [6][7].
- To mitigate distribution-degradation issues, the team used PCM distillation as a "warm start", which significantly improved the model's ability to generate realistic shapes [14][17].
- Adding adversarial learning (a GAN loss) further improved the texture and detail of the student model's outputs [20][26].

Group 3: Future Directions
- The team plans to keep releasing faster, more capable generative models, addressing remaining limitations in complex scenes where denoising steps may still need improvement [32].
- Work will continue on developing and iterating diffusion-acceleration techniques, with an emphasis on open-source contributions to the community [33][35].
- The advances will be made available on the Wuli AI platform, providing accessible creative tools for designers, content creators, and AI enthusiasts [36].
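As a point of reference for the reverse-KL loss named in Group 1, here is a minimal sketch of reverse KL divergence over discrete distributions. This is not the actual DMD2 training objective (which operates on diffusion score functions), only an illustration of why the reverse direction matters: minimizing D_KL(student || teacher) is mode-seeking, so the student is pushed toward high-density regions of the teacher rather than forced to cover all of them.

```python
import numpy as np

def reverse_kl(q, p, eps=1e-12):
    """Reverse KL divergence D_KL(q || p) = sum_i q_i * log(q_i / p_i).

    In distribution-matching distillation, q plays the role of the
    student's sample distribution and p the teacher's; `eps` guards
    against log(0) for empty bins.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(q * np.log((q + eps) / (p + eps))))

# Identical distributions: divergence is zero.
uniform = [0.25, 0.25, 0.25, 0.25]
print(round(reverse_kl(uniform, uniform), 6))  # 0.0

# A student concentrated on one teacher mode pays only a small penalty,
# which is the mode-seeking behavior reverse KL is known for.
student = [0.7, 0.1, 0.1, 0.1]
teacher = [0.4, 0.2, 0.2, 0.2]
print(reverse_kl(student, teacher) > 0)  # True
```

Swapping the argument order gives the forward KL, which is mode-covering and penalizes the student for assigning near-zero mass to any teacher mode; that asymmetry is the design choice the summary alludes to.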
A first in China! 360 releases the "Nano Comic Drama Production Line" as AI comic-drama generation enters the industrial era
QbitAI · 2026-01-30 11:02
Core Insights
- The AI comic-drama industry is growing rapidly, at roughly 80% per year, with market size projected to exceed 200 billion by 2025 [7].
- Despite this growth, the industry faces significant challenges: content-generation success rates average only 15%, leading to inefficiency and waste [8].
- 360 has launched the "Nano Comic Drama Production Line", which streamlines the production process and raises the material-generation success rate above 90% [3][12].

Industry Challenges
- Traditional AI tools suffer from "black-box generation, quality-control failures, and content homogenization", hindering innovation and sustainable development in the industry [8].
- Current production methods struggle to balance high output with quality, often yielding either low-quality products or time-consuming processes [10].

360's Solution
- The "Nano Comic Drama Production Line" integrates script analysis, asset generation, storyboard creation, and dynamic composition into a unified workflow, enhancing content quality and production capacity [11].
- The platform has partnered with leading companies in the film and comic-drama sectors to explore new production models built on industrialized processes [11].
- Public testing of the platform has begun, letting users try the new production capabilities [13].

Production Efficiency
- The platform sharply reduces production time: a single episode takes 30 minutes to 1 hour, three times faster than mainstream tools [14].
- A dual-mode interaction of "production-line advancement + intelligent canvas adjustment" supports efficient output of high-quality content [14].

Creative Control
- A dedicated "video world model" keeps style and narrative consistent throughout production, letting creators focus on storytelling and creativity [14].
- The platform supports film-grade storyboard design and dynamic storytelling, giving creators 100% control over the creative process while allowing distinctive visual styles [14].
QbitAI is hiring editors and writers
QbitAI · 2026-01-30 11:02
Editorial Team, from Aofeisi. QbitAI | Official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part... why not join QbitAI? We are a content platform centered on tracking the latest developments in AI. After eight years of accumulation, we have top-tier influence, broad and widely recognized industry resources, and the best vantage point for observing and learning at this moment of the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:
- AI industry: infrastructure-layer innovation, covering chips, AI Infra, and cloud computing;
- AI finance: venture capital and earnings in the AI field, tracking capital flows along the industry chain;
- AI products: progress of AI in applications and hardware devices.

All positions are full-time, based in Zhongguancun, Beijing. Openings span all ability levels; please apply according to your background and experience.
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: fresh graduates, with internships that can convert to full-time.

By joining us, you can:
- Stand at the crest of the AI wave: be among the first to encounter the latest AI technologies and products, and build a complete AI knowledge system.
- Master new AI tools: apply new AI technologies and tools to your work, boosting efficiency and creativity.
- Build personal influence: by writing exclusive original content ...
This live-action "Naruto" was actually made by AI, from China's new AI-video champion Vidu Q3
QbitAI · 2026-01-30 11:02
Core Viewpoint
- The article highlights rapid advances in AI video generation, focusing on Vidu Q3, which can generate 16 seconds of audio and video in one seamless pass, with significant gains in narrative and visual quality [2][5][40].

Group 1: Vidu Q3 Features
- Vidu Q3 is the first AI model worldwide to support simultaneous generation of 16 seconds of audio and video, producing outputs that closely resemble original anime scenes [2][5].
- The model supports multiple languages, including Chinese, English, and Japanese, broadening its usability across markets [3].
- It has been recognized by Artificial Analysis, ranking first in China and second globally, ahead of competitors such as Elon Musk's Grok and Google's Veo [5].

Group 2: Technical Capabilities
- The AI generates video and audio in one pass, with free switching of camera angles and transitions, and outputs 1080P resolution that can be enhanced to 4K [6].
- The model demonstrates complete narrative capability, with precise text rendering and the ability to understand and incorporate contextual audio effects such as background sounds and character expression [19][22].

Group 3: Industry Evolution
- AI video generation has evolved rapidly, with these advances arriving in under nine months, a sharp contrast to the historical timeline of human cinema [33][35].
- Audio-video integration marks the shift from visual-only generation to a multi-modal approach, indicating a deeper understanding of the relationship between sound and image [38][40].
- Producing a coherent narrative within a 16-second clip represents a leap in AI storytelling, suggesting that future developments in AI video generation may come even faster than anticipated [40][41].
LeCun is starting more than one venture after leaving! Betting on a route different from large models, he joins the board of a Silicon Valley startup
QbitAI · 2026-01-30 04:23
Hengyu, from Aofeisi. QbitAI | Official account QbitAI

After leaving the walled city of Meta, Yann LeCun seems to have internalized "don't put all your eggs in one basket".

On one side, he has built his own startup, AMI, aiming to make big moves on the world-model track; at the same time, his gaze has turned to another corner of Silicon Valley.

Just recently, LeCun formally announced that he is joining a startup called Logical Intelligence as founding chairman of its technical research committee.

It's an interesting choice, because Logical Intelligence has taken a technical route sharply different from today's mainstream large language models (LLMs). The company's flagship approach is an energy-based reasoning model that it says is "better at learning, reasoning, and self-correction".

In a Sudoku test, Logical Intelligence's model Kona correctly completed the grid in under 1 second, while GPT 5.2, Claude Opus 4.5, and Claude Sonnet 4.5 had each been running for 100 seconds with still no result...

[Table remnant: KONA 1.0 EBM, done in 0.72s; GPT 5.2, still running at 99.10s] ...
Spent millions on a launch event and nobody cared? Maybe you should take a look at this...
QbitAI · 2026-01-30 04:23
Yunzhong, from Aofeisi. QbitAI | Official account QbitAI

In 2025, tech companies finally got it: no matter how strong the technology is, if no one sees it, it might as well never have happened.

Launch events no longer land, white papers don't break out of the bubble, and spec sheets only excite industry insiders. Meanwhile, another path is widening: autonomous driving is being filmed on real streets, AI has slipped into creators' daily lives, and frontier technology is no longer a knowledge point to be "popularized" but a lifestyle to be tested on the spot, and even casually roasted.

Tech content is no longer kept on a pedestal. Like fragments of everyday life, it can be scrolled past, memed, and remixed. A new question then surfaces: is this still tech communication as we knew it?

The answer is very likely no. What happens next looks more like a competition over "who can give technology a human warmth for the masses", and in that competition, Douyin has unexpectedly moved to center stage.

Short video has pulled technology back to the scene. Many people actually don't know how far China's autonomous driving has come. If you come across tech creator Lin Yi's video on Douyin, you'll get concrete coordinates for that question. In "Chinese robotaxis sweeping Abu Dhabi? A deep dive: how Chinese tech is winning over Middle Eastern tycoons", he traveled in person to Abu Dhabi in the UAE to experience the local Robotaxi operations of WeRide, a leading Chinese autonomous-driving company. Not a showroom, not a PPT, not a launch event, but the ordinary, everyday scene of hailing a ride. ...