量子位
Goodbye to "Audio-Visual Disconnect" and "Character Breakdown"! AutoMV: The First Open-Source Full-Song MV Generation Agent That Understands Lyrics and Keeps the Beat
量子位· 2025-12-29 06:37
Core Viewpoint
- The article discusses the introduction of AutoMV, a multi-agent system designed to automatically generate coherent and synchronized music videos (MVs) without the need for training, addressing the challenges faced by existing AI video generation models in creating full-length MVs [2][25].

Group 1: Challenges in Current AI Video Generation
- Existing AI video generation models struggle with creating full-length MVs due to high costs (approximately $10,000) and lengthy production times (dozens of hours) for independent musicians [3].
- Three main challenges are identified:
  1. Duration limitations: most models can only generate short clips, failing to cover entire songs [4].
  2. Audio-visual disconnection: generated visuals often ignore musical beats, structure, and lyrical meaning [5].
  3. Inconsistency: characters may change appearance, and scenes lack narrative coherence in longer videos [6].

Group 2: Introduction of AutoMV
- AutoMV is a multi-agent collaborative system that simulates human filmmaking processes, designed to overcome the aforementioned challenges [7].
- The system operates in four main stages: music preprocessing, scriptwriting and directing, video generation, and verification [9][11] (a minimal orchestration sketch follows this summary).

Group 3: AutoMV Workflow
- The system dissects music using professional tools to extract vocals, instrumentals, lyrics, timestamps, song structure, and emotional analysis [12].
- Gemini acts as the screenwriter, while Doubao serves as the director, generating prompts and keyframes for video creation [13][14].
- A unique verification step involves a Verifier Agent that checks for coherence, richness, and lip-sync accuracy in the generated video [15].

Group 4: Advantages of AutoMV
- AutoMV significantly reduces production costs to approximately $15 while achieving quality close to professional standards [9].
- It demonstrates superior character consistency, action diversity, and narrative alignment with lyrical themes compared to existing commercial products [18][20].
- The system has been evaluated using the M2V Benchmark, which includes 30 diverse songs and 12 detailed evaluation criteria [20][23].

Group 5: Future Prospects
- AutoMV offers an open-source, training-free framework that addresses key issues in long-form music video generation, providing a low-cost creative tool for independent musicians [25].
- Although the current generation time for a complete MV is around 30 minutes, there is potential for improvement as underlying video generation models evolve [25].
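To make the four-stage pipeline above concrete, here is a minimal Python sketch of how such an agent loop could be wired together: preprocess the song into timed segments, have a screenwriter agent produce scene descriptions, generate a clip per segment, and let a verifier accept the clip or trigger a regeneration. Every function, class, and value below is a hypothetical placeholder standing in for the components the summary names (music tools, Gemini, Doubao, the Verifier Agent); it is not AutoMV's actual code.

```python
# Hypothetical sketch of a four-stage MV-generation agent pipeline
# (preprocess -> script -> direct/generate -> verify), mirroring the
# stages described above. All names and logic are illustrative only.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float          # seconds into the song
    end: float
    lyric: str
    emotion: str = "neutral"
    clip_path: str = ""   # filled in by the generation stage


def preprocess_music(audio_path: str) -> list[Segment]:
    """Stub: separate vocals, align lyrics, and split the song by structure."""
    return [Segment(0.0, 8.0, "opening line", "calm"),
            Segment(8.0, 16.0, "chorus line", "uplifting")]


def write_script(segments: list[Segment]) -> list[str]:
    """Stub for the screenwriter agent: one scene description per segment."""
    return [f"Scene for '{s.lyric}' with a {s.emotion} mood" for s in segments]


def generate_clip(scene: str, segment: Segment) -> str:
    """Stub for the director + video model: returns a path to a rendered clip."""
    return f"clips/{int(segment.start)}_{int(segment.end)}.mp4"


def verify(clip_path: str, segment: Segment) -> bool:
    """Stub for the verifier agent: check coherence, richness, and lip sync."""
    return True  # a real verifier would inspect the rendered frames


def make_mv(audio_path: str, max_retries: int = 2) -> list[Segment]:
    segments = preprocess_music(audio_path)
    scenes = write_script(segments)
    for seg, scene in zip(segments, scenes):
        for _ in range(max_retries + 1):
            seg.clip_path = generate_clip(scene, seg)
            if verify(seg.clip_path, seg):
                break  # accept the clip; otherwise regenerate
    return segments


if __name__ == "__main__":
    print([s.clip_path for s in make_mv("song.mp3")])
```

The retry loop around the verifier is what distinguishes this layout from a straight-through pipeline: a rejected clip is simply regenerated rather than passed downstream.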
AI Doctors Finally Get a Hard Yardstick! GAPS, the World's First Disease-Specific Evidence-Based Evaluation Framework, Released by Ant Group and Academician Wang Jun's Team at Peking University
量子位· 2025-12-29 06:37
Yunzhong (允中), reporting from Aofei Temple. QbitAI | Official account QbitAI

Ant Health (蚂蚁健康) and the team of Academician Wang Jun at Peking University People's Hospital spent more than six months, together with over a dozen thoracic surgeons, building and releasing GAPS (Grounding, Adequacy, Perturbation, Safety), the world's first framework for evaluating large models' disease-specific, evidence-based capability, along with its companion evaluation set, GAPS-NSCLC-preview. The framework aims to address the limitations of existing medical AI evaluations, which are confined to exam-style Q&A and lack a comprehensive assessment of clinical depth, completeness, robustness, and safety.

The evaluation set focuses on lung cancer, comprising 92 questions that cover 1,691 clinical key points, and ships with a fully automated evaluation toolchain that uses guideline anchoring and multi-agent collaboration to automate the pipeline end to end, from question generation and rubric creation to multi-dimensional scoring.

The results have already been applied in "蚂蚁阿福". The paper, "GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians", the companion evaluation set GAPS-NSCLC-preview, and the automated evaluation framework have all been made fully public.

The study offers an objective assessment of large models' clinical capability: although today's mainstream medical large models already have the breadth of a "medical encyclopedia", in clinical practice they remain at ...
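The toolchain described above ends in multi-dimensional scoring against guideline-anchored clinical key points. Purely as a rough illustration of what a key-point coverage score can look like, here is a toy scorer; the dimension names echo GAPS, but the rubric contents, matching rule, and lack of weighting are all assumptions, not the paper's actual method.

```python
# Illustrative only: a toy key-point coverage scorer in the spirit of a
# guideline-anchored rubric. The real GAPS pipeline, rubric, and weights
# are not reproduced here; every name and rule below is an assumption.


def coverage_score(answer: str, key_points: dict[str, list[str]]) -> dict[str, float]:
    """For each dimension, return the fraction of key points the answer mentions."""
    answer_lower = answer.lower()
    scores = {}
    for dimension, points in key_points.items():
        hits = sum(1 for p in points if p.lower() in answer_lower)
        scores[dimension] = hits / len(points) if points else 0.0
    return scores


if __name__ == "__main__":
    rubric = {
        "Grounding": ["NCCN guideline", "stage IIIA"],      # hypothetical key points
        "Adequacy": ["surgical candidacy", "adjuvant therapy"],
        "Safety": ["contraindication"],
    }
    demo_answer = ("Per the NCCN guideline, a stage IIIA patient should be "
                   "assessed for surgical candidacy and adjuvant therapy.")
    print(coverage_score(demo_answer, rubric))
```

A real pipeline would replace the substring match with model- or expert-graded judgments per key point, but the aggregation idea, coverage per dimension over a fixed clinical rubric, is the same.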
First Author of ViT Raves: This Chinese Open-Source "Photoshop Model" Beats Nano Banana
量子位· 2025-12-29 04:32
Mengyao (梦瑶), reporting from Aofei Temple. QbitAI | Official account QbitAI

So good, so good: it flat-out crushes ChatGPT and Nano Banana!

Just now, Lucas Beyer, a core author of ViT and a member of Meta's superintelligence team, posted three times in a row praising Qwen-Image-Layered, the open-source model Tongyi Qianwen released not long ago. In his view, this is the right way to do image generation. He also added that he had actually wanted to work in this direction himself, but had been too busy to get around to it (laughs).

To be honest, the Qwen-Image-Layered model really is something special, because it gives us genuine Photoshop-level freedom to take an image apart. Its core capability is a cure for the "one flat image decides everything" problem: it can decompose an ordinary image into multiple separated RGBA layers carrying transparency information, making image assets editable in a true sense. In other words, individual image elements can now be edited at a fine-grained level.

Concepts alone are a bit abstract, so let's look at examples. After seeing the model's results, even netizens couldn't help exclaiming that it feels like an open-source Photoshop. Amazing!

So where exactly is this model, which Lucas Beyer praised again and again, so strong? Let's take a look.

Images can now be taken apart, Photoshop-style

If Nano Banana's skill points ...
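To see why RGBA layer output matters downstream: once an image is split into layers with alpha channels, editing one layer and flattening the stack again is just standard "over" alpha compositing. The NumPy snippet below shows that operation; it is generic image math, not part of Qwen-Image-Layered, and the array shapes and example values are illustrative.

```python
# Standard back-to-front ("over") alpha compositing of RGBA layers with NumPy.
# This only illustrates what an RGBA layer stack enables downstream; it is
# not part of Qwen-Image-Layered itself.
import numpy as np


def composite(layers: list[np.ndarray]) -> np.ndarray:
    """Blend RGBA float arrays (H, W, 4) in [0, 1], first layer at the bottom."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.float32)
    for layer in layers:                         # bottom to top
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)  # "over" operator
    return out


if __name__ == "__main__":
    background = np.ones((64, 64, 4), dtype=np.float32)      # opaque white
    foreground = np.zeros((64, 64, 4), dtype=np.float32)     # fully transparent
    foreground[16:48, 16:48] = [1.0, 0.0, 0.0, 0.5]          # translucent red square
    flat = composite([background, foreground])
    print(flat.shape, flat[32, 32])  # blended pixel: (1.0, 0.5, 0.5)
```

Deleting, recoloring, or swapping a single layer before calling composite() is exactly the kind of per-element edit that layered output is meant to enable.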
Good-Guy Jensen Skips the Silicon Valley Capitalist Playbook! Groq Employees Cash Out an Average of $5 Million Each
量子位· 2025-12-29 04:32
Core Viewpoint
- Nvidia's acquisition of Groq for $20 billion is not just about technology; it also involves significant compensation for Groq's employees and shareholders, making it effectively a "talent acquisition" strategy [2][10][19].

Group 1: Acquisition Details
- Nvidia's acquisition includes not only technology rights but also a commitment to Groq's employees and shareholders, at a valuation that has tripled from previous estimates [3][19].
- 90% of Groq's team will be integrated into Nvidia, with each employee receiving an average of $5 million [4][20].
- Groq will continue to operate as an independent entity, with its cloud service platform GroqCloud remaining active [8].

Group 2: Employee and Shareholder Compensation
- Employees will receive cash for vested shares and Nvidia stock for unvested shares, with a significant portion of the compensation being accelerated [11][12].
- Employees who have been with Groq for less than a year will still receive some compensation, as Nvidia waived the typical vesting cliff [15][16].
- Shareholders, including major investors like Disruptive and Blackstone, will receive dividends based on the $20 billion valuation [17][19].

Group 3: Market Context and Implications
- The acquisition reflects a broader trend in which companies prefer "acquisition-style hiring" to avoid antitrust scrutiny while gaining access to key technologies and talent [21][22].
- Nvidia's financial strength, with $60.6 billion in cash and short-term investments, enables such large-scale acquisitions [32].
- The deal signifies Nvidia's recognition of the need to adapt to a changing AI technology landscape, particularly in inference capabilities [44][45].
Help, I'm Hooked on Chatting with Comic Characters! AI Companionship Has a New Answer
量子位· 2025-12-29 02:03
Core Viewpoint
- The article discusses the innovative AI companion interactive comics launched by Kuaikan, which integrate AI into existing comic narratives, allowing users to engage deeply with characters and stories and addressing common issues in current AI companion products [11][54].

Group 1: Product Features
- The AI companion product allows users to "soul travel" into comic worlds, interacting with characters in real time and thereby altering the ongoing story [6][8].
- Unlike traditional AI companions that require users to create character backgrounds, this product embeds AI into established comic characters, providing a richer interaction experience [10][26].
- Users can engage in daily conversations with characters that are contextually relevant to the ongoing story, enhancing the depth of interaction [31][32].

Group 2: User Engagement
- The new format appeals to two user groups: those tired of mechanical AI interactions and core comic fans seeking deeper character engagement [13][56].
- The product has shown a 50% increase in user retention compared to traditional comics, indicating a shift toward a more social and engaging relationship with characters [56].

Group 3: Technical Collaboration
- Kuaikan collaborates with various AI companies to enhance the interactive experience, ensuring that the AI can respond accurately within the narrative context [62].
- The integration of multiple AI technologies supports character interactions and dialogue generation, creating a more immersive experience for users [64].

Group 4: Financial Performance
- During the testing phase, the new product saw a nearly threefold increase in weekly paid subscriptions compared to traditional reading products, along with a 130% rise in average weekly user spending [65].
Jensen Huang's $20 Billion Show of "Money Power" in Response to Google: Teaming Up with Groq to Shore Up the Inference Gap
量子位· 2025-12-28 06:59
Core Viewpoint
- Nvidia's acquisition of Groq for $20 billion signifies a strategic move to enhance its capabilities in the AI inference market, addressing concerns over competition from Google's TPU and other emerging chip paradigms [2][3][28].

Group 1: Nvidia's Strategic Acquisition
- Nvidia's $20 billion investment in Groq aims to secure a foothold in the rapidly evolving AI landscape, particularly in inference technology [2][28].
- The acquisition reflects Nvidia's recognition of its vulnerabilities in the inference segment, especially against competitors like Google [31][34].

Group 2: Groq's Technological Advantages
- Groq's LPU (Language Processing Unit) outperforms GPUs and TPUs in inference speed, processing 300-500 tokens per second, a speedup owed largely to its on-chip SRAM storage [21][22].
- The LPU's architecture delivers better performance in the decode phase of inference, where low latency is critical for user experience [11][17].

Group 3: Market Dynamics and Challenges
- The shift in AI competition from training to application emphasizes the importance of speed in user experience, which Groq's technology addresses [30].
- Despite these advantages, the LPU has a far smaller memory capacity (230 MB) than Nvidia's H200 GPU (141 GB), so deploying a model requires many more LPU chips, which could raise overall hardware costs [24][26][27] (see the back-of-the-envelope estimate after this summary).

Group 4: Implications for Nvidia
- The acquisition of Groq is seen as a necessary step for Nvidia to fend off potential disruptions in the AI market, similar to how it previously disrupted competitors in the gaming sector [28][32].
- The inference chip market is characterized by high volume but low margins, in sharp contrast to the high profit margins of GPUs, indicating a challenging new landscape for Nvidia [34].
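The memory gap in Group 3 translates directly into chip counts. Taking the figures quoted above (about 230 MB of on-chip SRAM per LPU versus 141 GB of HBM per H200) and assuming, purely for illustration, a 70B-parameter model stored as 8-bit weights, the weights-only arithmetic works out as follows; activations, KV cache, and replication for throughput are ignored.

```python
# Back-of-the-envelope chip count for holding model weights on-chip.
# 230 MB SRAM/LPU and 141 GB HBM/H200 come from the summary above; the
# 70B-parameter, 8-bit model is an illustrative assumption.
import math

PARAMS = 70e9                 # assumed model size (parameters)
BYTES_PER_PARAM = 1           # assumed 8-bit quantized weights
LPU_SRAM_BYTES = 230e6        # ~230 MB on-chip SRAM per Groq LPU
H200_HBM_BYTES = 141e9        # ~141 GB HBM per Nvidia H200

model_bytes = PARAMS * BYTES_PER_PARAM
lpus_needed = math.ceil(model_bytes / LPU_SRAM_BYTES)
h200s_needed = math.ceil(model_bytes / H200_HBM_BYTES)

print(f"Model weights: {model_bytes / 1e9:.0f} GB")
print(f"LPUs needed (weights only): {lpus_needed}")    # ~305 chips
print(f"H200s needed (weights only): {h200s_needed}")  # 1 chip
```

Roughly 300 LPUs versus a single H200 for the same weights is the asymmetry behind the "higher overall hardware cost" point, even before speed is taken into account.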
量子位 (QbitAI) Is Hiring Editors and Writers
量子位· 2025-12-28 03:06
Editorial team, reporting from Aofei Temple. QbitAI | Official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part in it... why not join 量子位 (QbitAI)? We are a content platform centered on tracking new developments in AI. After eight years, we have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning at the frontier of this era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:
- AI industry: infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
- AI finance and business: venture capital and earnings reports in the AI sector, tracking capital flows along the industry chain;
- AI products: progress in AI applications and hardware devices.

All positions are full-time, based in Zhongguancun, Beijing.

Who we are hiring:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to your ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time.

What you gain by joining us:
- Standing at the crest of the AI wave: first-hand access to the latest AI technologies and products, building a complete understanding of AI.

Position details are below. Positions at all ability levels are open; you are welcome to apply based on your background and experience.

AI industry direction
Responsibilities: take part in core interviews, speak with industry experts and top engineers, and write case studies of AI cloud deployments.
Requirements:

AI finance and business direction
Responsibilities:
Requirements: ...
Ruby 4.0 Officially Released! A Brand-New Compiler and Native Isolation; Netizens: Christmas Isn't Complete Without It
量子位· 2025-12-28 03:06
Cressey (克雷西), reporting from Aofei Temple. QbitAI | Official account QbitAI

On its 30th anniversary, the Ruby language has delivered a year-end gift to developers with the all-new version 4.0. With newly added isolated namespaces, a new JIT compiler, and a redesigned Ractor API, the open-source language has received a whole series of updates.

Ruby is an open-source, object-oriented scripting language developed in the 1990s by Yukihiro Matsumoto (松本行弘) of Japan, released under the GPL and the Ruby License. Its main traits are simplicity and speed: variables are untyped, everything has a value, and the code can be read without comments.

Netizens gave the release high praise, saying that without a Ruby update, even Christmas wouldn't be complete. So what updates does the 30-year-old Ruby bring this time?

A brand-new compiler: ZJIT

In Ruby 4.0, the Rails at Scale team officially introduced a brand-new just-in-time compiler named ZJIT. It is a traditional method-level compiler whose core architecture uses an intermediate representation in static single assignment (SSA) form, aiming to break through the performance ceiling of the existing YJIT compiler.

A traditional Ruby interpreter executes code line by line, which is relatively inefficient, whereas a JIT compiler translates hot code paths into machine code. Among them, YJIT's ...
12 Milliseconds to Expose a Fatal Flaw in Autonomous Driving: New Beihang Research Achieves Scene-Aware Dynamic Physical Adversarial Attacks | TPAMI 2025
量子位· 2025-12-28 03:06
Contributed by the DynamicPAE team. QbitAI | Official account QbitAI

Recently, several L3-level autonomous vehicle models were approved by the Ministry of Industry and Information Technology and are officially on the road, marking a new stage for China's autonomous driving industry. Now imagine you are riding in an autonomous car on the highway when an obstacle appears ahead whose surface looks normal but actually carries a maliciously generated texture. Your vehicle's perception system may fail to recognize it correctly, and a misjudgment or missed detection could cause a serious accident.

Textures like these, which mislead intelligent systems and can be reproduced in the real world, are physical adversarial examples (PAEs). Whether the goal is to launch PAE attacks or to defend against them, generating enough PAE samples is essential. Quite a few methods already study how to generate PAEs, but they usually assume static scenes and cannot cope effectively with real environments that change dynamically (lighting, object motion, and so on). How to generate physical adversarial examples in real time that adapt to different scenes has therefore become a pressing problem in intelligent-systems security.

Beihang University (Beijing University of Aeronautics and Astronautics) and other institutions have proposed the DynamicPAE framework, a pioneering method for real-time, scene-aware dynamic PAE generation. By addressing the feedback problem in adversarial training, and combining residual-guided adversarial pattern exploration with scene alignment, the method achieves millisecond-level PAE generation in dynamic scenes. The work has been accepted by IEEE ...
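DynamicPAE itself is not spelled out in the summary, but the baseline it improves on, optimizing a single static adversarial perturbation by gradient ascent, is easy to sketch. The PyTorch snippet below runs a plain projected-gradient attack against a randomly initialized toy classifier; the model, the 8/255 budget, and the step count are all illustrative assumptions, and nothing here reproduces the residual-guided, scene-aligned generator the paper describes.

```python
# A generic (static-scene) adversarial perturbation optimized by projected
# gradient ascent on a toy, randomly initialized classifier. This is the
# baseline setting that scene-aware methods such as DynamicPAE go beyond;
# it is NOT the DynamicPAE algorithm. All choices here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "perception" model: 3x32x32 image -> 10 classes, random weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

image = torch.rand(1, 3, 32, 32)       # stand-in for a clean camera frame
true_label = torch.tensor([3])         # class the attacker wants the model to drop
epsilon, step, iters = 8 / 255, 2 / 255, 10
loss_fn = nn.CrossEntropyLoss()

delta = torch.zeros_like(image, requires_grad=True)   # adversarial perturbation

for _ in range(iters):
    loss = loss_fn(model(image + delta), true_label)  # maximize loss on true label
    loss.backward()
    with torch.no_grad():
        delta += step * delta.grad.sign()             # gradient-ascent step
        delta.clamp_(-epsilon, epsilon)               # stay inside the L-inf budget
        delta.copy_((image + delta).clamp(0.0, 1.0) - image)  # keep pixels valid
    delta.grad.zero_()

with torch.no_grad():
    clean_pred = model(image).argmax(dim=1).item()
    attacked_pred = model(image + delta).argmax(dim=1).item()
# On this random toy model the prediction may or may not flip; the point is
# the optimization loop, not the success rate.
print("clean prediction:", clean_pred, "| attacked prediction:", attacked_pred)
```

A static loop like this has to be re-run from scratch whenever lighting or viewpoint changes, which is precisely the limitation that motivates real-time, scene-conditioned generation.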
China's National Football Team Missed the World Cup, but Chinese Large Models Are Competing as a Group
量子位· 2025-12-28 03:06
Core Viewpoint
- The article discusses the upcoming AlphaGoal Prediction Cup, an AI competition organized by Lenovo in which Chinese large models will compete to predict football match outcomes, marking a significant shift from traditional AI applications to real-world engagement [4][25][34].

Group 1: Event Overview
- The AlphaGoal Prediction Cup will feature eight major Chinese AI models competing against each other and against AI agents created by fans and developers [6][10].
- The event is described as a historic first for public participation in AI predictions, potentially transforming football from something merely observed into something actively engaged with [8][27].

Group 2: Participating Models
- The eight participating models include notable players such as Baidu's Wenxin Yiyan, Tencent's Hunyuan, and SenseTime, each with unique strengths in data processing and prediction capabilities [14][15].
- The competition challenges these models to predict match outcomes using a variety of data points, including player statistics, historical match data, and even social media sentiment [22][17].

Group 3: Significance of the Event
- The AlphaGoal Prediction Cup is positioned as a pivotal moment for AI, moving beyond traditional testing environments to engage with the complexities of the real world, akin to previous landmark human-AI competitions [29][34].
- The event is expected to demonstrate AI's ability to understand causality and not just correlation, marking a step toward general artificial intelligence [35][34].

Group 4: Lenovo's Role
- Lenovo, as the organizer and FIFA's official technology partner, is facilitating this competition to connect AI models with real-world applications, positioning itself as an ecosystem organizer rather than just a hardware provider [38][39].
- The Lenovo Tianxi AI platform, with over 280 million monthly active users, serves as a crucial interface for these AI models to reach and engage a broad audience [40][41].