QbitAI (量子位)
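Generating to a layout without memorizing layouts: layout control for multi-instance generation is finally "controllable with no face mix-ups" | Zhejiang University team
QbitAI · 2025-12-19 07:20
Contributed by the ReLER team, Zhejiang University | QbitAI (WeChat official account: QbitAI)
Although diffusion models have matured for single-image generation, challenges emerge once the task is upgraded to highly customized Multi-Instance Image Generation (MIG): how can spatial layout be controlled while keeping each subject's identity highly consistent with its reference image? Existing methods often face a dilemma on complex tasks that need both macro-level layout control and micro-level identity injection. Methods that control layout explicitly usually cannot use reference images to customize instances. Methods guided by reference images struggle to control layout precisely and suffer severe identity loss as the number of instances grows. To address this bottleneck in customized image generation, the ReLER team at Zhejiang University has released ContextGen, a new DiT-based framework. The framework decouples context hierarchically to resolve the tension between layout control and identity fidelity, and it achieves SOTA results on several key metrics. Mechanism innovation: coordinated control of layout and identity. The core of ContextGen is a dual context attention mechanism that separates the complex tasks of global control and local injection and deploys them at different levels of the DiT. Contextual Layout Anchoring (CLA): macro-level layout anchoring. The CLA mechanism takes the ...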
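QbitAI is hiring editors and writers
QbitAI · 2025-12-19 07:20
From the editorial team, 凹非寺 | QbitAI (WeChat official account: QbitAI)
The AI boom is still surging, but if you don't yet know how to take part, why not come to QbitAI? We are a content platform centered on tracking new developments in AI. After 8 years of building, we now have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning from the defining trend of this era.
We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:
- AI industry: covers innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
- AI finance and business: covers venture investment and earnings reports in the AI field, and tracks capital movements along the industry chain;
- AI product: covers progress in AI applications and hardware devices.
All positions are full-time and based in Zhongguancun, Beijing.
Open to:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: new graduates; internships are accepted and can convert to full-time roles.
Position details: roles at every ability level are open in all three directions, and you are welcome to apply based on your own background and experience.
By joining us, you can get:
- A place at the crest of the AI wave: first-hand exposure to the latest AI technologies and products, and a complete framework for understanding AI.
- Mastery of new AI tools: putting all kinds of new AI technologies ...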
The veteran that first took you surfing the web has gone all in on AI this time
QbitAI · 2025-12-19 07:20
Core Viewpoint
- QQ Browser has successfully transformed into an AI browser, leveraging Tencent's self-developed large model capabilities to enhance user experience across various scenarios, including AI search, browsing, learning, and office tasks [2][57].

Group 1: Transformation and Features
- QQ Browser has shifted its product direction towards AI, introducing the QBot intelligent agent and achieving comprehensive AI integration [2][3].
- The browser has ranked highly in multiple authoritative lists in the AI Agent sector, indicating its strong performance in the industry [3].
- The evolution of QQ Browser over the past fifteen years reflects a consistent logic of simplifying complex capabilities and returning control to users [8][57].

Group 2: User Experience and AI Integration
- The upgraded QQ Browser interface prioritizes AI functionality, allowing users to seamlessly switch between traditional search engines and AI dialogue [14][15].
- The AI+Mini Window feature integrates over ten AI capabilities, enhancing user efficiency without disrupting browsing flow [18][20].
- Key functionalities like webpage summarization, mind mapping, and translation are designed to assist users in managing lengthy content and improving reading efficiency [23][25][29].

Group 3: Agent Capabilities
- The QBot Agent Center consolidates various agents capable of completing tasks, addressing traditional browser limitations [34].
- The AI Video Assistant offers features such as multi-language subtitle generation and content summarization, enhancing the video viewing experience [36][38].
- The AI Subscription Assistant efficiently aggregates and tracks relevant information, significantly reducing the time spent on manual searches [41][42].

Group 4: Mobile Expansion and Ecosystem
- QQ Browser's AI capabilities have expanded to mobile platforms, providing comprehensive document handling and educational tools tailored for students [51][53].
- The integration with Tencent's ecosystem allows users to access various services without switching applications, streamlining the user experience [55].
- The shift towards AI in browsers reflects a broader industry trend of moving from simple information retrieval to task completion [56].
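The first RL paradigm for text-to-3D generation arrives, tackling geometric and physical plausibility
QbitAI · 2025-12-19 07:20
Contributed by the 3DGenR1 team | QbitAI (WeChat official account: QbitAI)
In large language models and text-to-image generation, reinforcement learning (RL) has become a key method for improving a model's chain-of-thought reasoning and generation quality. But does the approach still work when we turn to the more complex task of text-to-3D generation? A recent study, a collaboration among Northwestern Polytechnical University, Peking University, The Chinese University of Hong Kong, Shanghai AI Laboratory, and The Hong Kong University of Science and Technology, systematically explores this question.
Paper: https://arxiv.org/pdf/2512.10949
Code: https://github.com/Ivan-Tang-3D/3DGen-R1
Can reinforcement learning be applied to Text-to-3D generation to strengthen the step-by-step reasoning and generation process of 3D autoregressive models? In LLM reasoning and 2D text-to-image generation, RL has already been shown to significantly improve CoT reasoning and generation quality. But 3D objects are longer, denser, and more geometrically constrained. Research in this direction therefore often runs into the following questions:
1. How can a reward capture semantic alignment, geometric consistency, and visual quality at the same time?
2. Are existing RL algorithms suited to autoregressive 3D generation?
3. There is a lack of ... dedicated to examining "3D reasoning ability ...
Progressive Investigation: breaking down Text-to-3D + RL at four levels
1. Reward design level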
DeepMind's chief explains the road to AGI in a ten-thousand-character deep dive
QbitAI · 2025-12-19 07:20
Core Viewpoint
- Achieving AGI requires a balanced approach of technological innovation and scaling, with both aspects being equally important [2][55].

Group 1: Path to AGI
- Demis Hassabis outlines a realistic path to AGI, emphasizing that 50% of efforts should focus on model scaling and 50% on scientific breakthroughs [5].
- The success of AlphaFold demonstrates AI's potential to solve fundamental scientific problems, with ongoing research expanding into materials science and nuclear fusion [5][9].
- Current AI models rely heavily on human knowledge, and the next goal is to develop autonomous learning capabilities similar to AlphaZero [5][27].

Group 2: AI Performance and Limitations
- AI exhibits a "jagged intelligence" phenomenon, performing well in complex tasks like the International Mathematical Olympiad but struggling with basic logical problems [5][19].
- The need for models to improve self-reflection and verification capabilities is highlighted, as current systems often provide incorrect answers when uncertain [5][57].
- The introduction of confidence mechanisms is necessary to address the hallucination problem, where models generate plausible but incorrect responses [5][56].

Group 3: World Models and Simulation
- World models enhance understanding of physical dynamics and sensory experiences, which language models struggle to convey [5][69].
- The use of simulation environments for training AI agents can lead to infinite task generation and complex behavior training, potentially aiding in the exploration of life and consciousness origins [5][80].
- The Genie project exemplifies the potential of interactive world models, which could be applied in robotics and general assistance [5][70].

Group 4: Commercialization and Social Risks
- The commercialization of AI poses social risks, and there is a need to avoid the pitfalls of social media's focus on user engagement [5][101].
- Building AI personas that support scientific reasoning and personalized feedback is essential to prevent echo chambers [5][105].

Group 5: Scaling and Innovation
- Despite discussions of scaling challenges, the release of Gemini 3 indicates that significant progress continues to be made [5][50].
- The combination of top-tier research capabilities and infrastructure, such as TPUs, positions the company favorably for ongoing innovation and scaling [5][54].

Group 6: Future of AI and AGI
- The integration of various projects, including Gemini and world models, is crucial for developing a unified system that could serve as a candidate for AGI [5][114].
- The potential societal impacts of AGI necessitate proactive planning for labor transitions and economic adjustments, similar to lessons learned from the Industrial Revolution [5][118].
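LeCun's startup targets a 24.7 billion yuan valuation in its first round! Alexandre to be CEO
QbitAI · 2025-12-19 01:01
By 克雷西, from 凹非寺 | QbitAI (WeChat official account: QbitAI)
LeCun's last day at Meta has not even arrived, and more details about his new company have already leaked. Shortly after LeCun himself announced the company's name on a podcast, its fundraising and valuation targets were reported by the Financial Times. The new company, named Advanced Machine Intelligence Labs (AMI Labs), plans to make its official debut next January with a target valuation of 3 billion euros (about 24.7 billion yuan). AMI Labs' research direction is the "world model" that LeCun has long championed. It will take an open-source route, and his former employer Meta will continue to cooperate with it. The report also reveals that the CEO of AMI Labs will not be LeCun himself but one of his former subordinates.
LeCun won't be CEO
AMI Labs is scheduled to launch officially in Paris in January 2026. With Meta gradually shifting toward a closed ecosystem, LeCun has chosen the open-source route he has always upheld in academia. On the technical side, AMI Labs has chosen a path more challenging than mainstream LLMs: going all in on "world models". In LeCun's view, LLMs built on autoregressive mechanisms have a fundamental logical flaw: they merely predict the next token by statistical probability and do not truly understand how the physical world works. To that end, the new company will, through ...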
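Cognitive bias, implementation gaps, and fragmented experiences are the three major pain points of today's AI products | Baidu's Wang Ying @ MEET2026
QbitAI · 2025-12-19 01:01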
Core Insights
- The article discusses the evolution of AI from a conversational partner to an action assistant, highlighting the increasing complexity of tasks that users face despite advancements in AI capabilities. It identifies three main challenges: cognitive bias, implementation gaps, and fragmented experiences [1][5][14].

Group 1: AI Challenges
- Cognitive bias, implementation gaps, and fragmented experiences are identified as the three major pain points for users of AI products [5][14].
- Users often experience a disconnect between AI's capabilities and their ability to execute complex tasks, leading to frustration [14].

Group 2: GenFlow and AI Development
- GenFlow serves as the core scheduling hub for Baidu's super personal intelligent agent framework, achieving a monthly active user base in the tens of millions, making it the largest general-purpose intelligent agent globally [5][10].
- The recently updated GenFlow 3.0 integrates a memory system that allows it to retain user interactions and preferences, enhancing personalization [13][17].

Group 3: Product Innovations
- Baidu Wenku launched the AI learning platform OREATE AI, which has surpassed 1.4 million monthly active users within a month of its launch and topped the ProductHunt global daily rankings [37].
- Baidu Wangpan has expanded its services to 175 countries, featuring multilingual subtitles and AI camera functionalities, receiving positive feedback globally [39].

Group 4: Future Vision
- The vision is to create a super personal intelligent agent that empowers users to become super individuals, enhancing their capabilities in various tasks [8][9].
- The integration of Office Agent and GenX aims to facilitate seamless collaboration between users and AI, enhancing productivity and creativity [20][28].
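Wait, who could still tell this video was acted by AI?
QbitAI · 2025-12-18 09:26
By 金磊, from 凹非寺 | QbitAI (WeChat official account: QbitAI)
This time, I really can't tell whether a video is AI-generated. Let's start with this clip, where the acting has taken a serious leap:
Prompt: A woman sobs uncontrollably and delivers the line: "Jiang Chen... you must come back alive, okay? ... Promise me." As she speaks, she raises her right hand to caress the man's face. Sad background music. Cinematic quality.
The lines, the acting, the look in the eyes, the lip sync: unless told it was AI-generated, most people would assume it was a clip from some film. But the realism is not even the main point, because in this 10-second clip the dialogue voice-over, the background music, and the sound effects were all produced in a single pass from the prompt above. This is Seedance 1.5 Pro, the latest Doubao video generation model, which Volcano Engine (火山引擎) has just launched at its FORCE conference (FORCE原动力大会). Its headline feature is high-precision audio-visual synchronization that gets into character in a single take. With this capability, producing a fun short film takes only minutes. For example, taking this AI female lead as the starting point: we can then use Seedance 1.5 Pro to make a "Sichuan opera", 《至辣园》: From these two hands-on cases, it is easy to see that the overall highlights of Seedance 1.5 Pro can be summarized as: Seedance 1.5 Pro is now live on 即梦AI and 豆包 ...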
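The first general-purpose GPU stock on the Hong Kong market is making its IPO push too! Led by a Harvard PhD, valued at 20.9 billion yuan
QbitAI · 2025-12-18 09:26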
Core Viewpoint
- The article highlights the emergence of domestic GPU companies in China, particularly focusing on Biren Technology, which is set to become the first domestic GPU company listed on the Hong Kong Stock Exchange with a valuation of 20.9 billion yuan [1][40].

Company Overview
- Biren Technology, founded in 2019 by Zhang Wen, a Harvard Law PhD, specializes in developing general-purpose GPU chips and intelligent computing solutions for AI training and inference, providing full-stack support from cloud to edge [2][3].
- The company has attracted significant investment, completing over 10 funding rounds in six years, with notable investors including Qiming Venture Partners, IDG Capital, and Hillhouse Capital [2][39].

Product Offerings
- Biren's core products include a hardware system based on its self-developed GPGPU architecture, designed specifically for AI workloads, and a software platform called BIRENSUPA for developing AI applications [3][10].
- The hardware system consists of various configurations, including PCIe boards and GPGPU servers, with key chips like the Biren 106 and 110 designed for training and edge inference, respectively [5][8].

Financial Performance
- The company's revenue has shown significant growth, increasing from 500,000 yuan in 2022 to 337 million yuan in 2024, with a 50% year-on-year increase in the first half of 2023 [19][20].
- The main revenue source is the intelligent computing solutions, which started contributing to revenue in 2023, while the company also generates income from product sales and rental of intelligent computing clusters [20][21].

Profitability and Expenses
- Despite revenue growth, Biren Technology remains unprofitable, with losses of 1.474 billion yuan in 2022 and 1.744 billion yuan in 2023, although adjusted net losses have decreased [27][28].
- Research and development expenses are a significant portion of the budget, amounting to 1.018 billion yuan in 2022 and 886 million yuan in 2023, with a notable increase in the first half of 2024 [29][30].

Market Position and Future Plans
- Biren Technology aims to launch the second-generation Biren 20X series chips for cloud training and inference by 2026, with further developments planned for 2028 [8].
- The company has established a strong customer base, including major enterprises in high-computing industries, with nine Fortune China 500 companies among its clients [15].
Not bad, AI PC! It can now measure blood pressure and analyze skin without any contact
QbitAI · 2025-12-18 09:26
Core Insights
- The article discusses the innovative capabilities of AI PCs, particularly in non-contact health monitoring and skin analysis, showcasing how technology can enhance personal health management and beauty care [1][17].

Group 1: Health Monitoring
- The AI PC can perform non-contact health assessments, including measuring heart rate, blood oxygen levels, blood pressure, and even blood glucose concentration using a connected camera [5][24].
- The technology relies on remote photoplethysmography (rPPG), which detects subtle changes in skin color due to blood flow, allowing for accurate health metrics without physical sensors [20][24].
- The system can also evaluate vascular health risks and potential heart conditions, providing a comprehensive health overview [25][26].

Group 2: Skin Analysis
- The AI PC offers a skin analysis feature that generates a detailed report on skin conditions, including hydration levels and sensitivity, based solely on visual input from the camera [10][30].
- The skin analysis utilizes optical imaging and spectral analysis to assess skin health, identifying issues like redness and pigmentation [29][30].
- Personalized skincare and beauty recommendations are provided based on the analysis, including specific product suggestions for targeted skin concerns [16][30].

Group 3: Technology and Performance
- The AI PC's capabilities are powered by Intel's Core Ultra processor, which includes a neural processing unit (NPU) designed for efficient AI tasks, ensuring low power consumption and high performance [31][33].
- The integration of NPU allows for real-time processing of complex algorithms, enabling immediate health and beauty assessments without the need for internet connectivity [38][39].
- This local processing enhances user privacy and security, distinguishing AI PCs from traditional computing devices [39][44].

Group 4: Market Potential
- The article highlights the growing ecosystem of AI applications tailored for AI PCs, driven by the unique capabilities of Intel's xPU+ architecture [41][43].
- As more developers create specialized applications for AI PCs, the technology is set to redefine user interaction across various domains, from health management to creative production [44].
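
The rPPG idea mentioned in this summary (reading pulse from subtle skin-color changes) can be illustrated with a minimal sketch. It is not the method used by the AI PC software, whose pipeline the article does not describe. It assumes that a per-frame mean green-channel value for a skin region has already been extracted from the camera, and it only estimates heart rate; blood pressure, blood oxygen, and glucose are out of scope. The function name `estimate_heart_rate_bpm` and all parameters are hypothetical.

```python
import math


def estimate_heart_rate_bpm(green_means, fps, min_bpm=42.0, max_bpm=180.0, step_bpm=1.0):
    """Pick the pulse rate whose frequency carries the most signal power.

    green_means: per-frame mean green value of a skin region (assumed input).
    fps: camera frame rate in frames per second.
    """
    n = len(green_means)
    if n < 2 or fps <= 0:
        raise ValueError("need at least two samples and a positive frame rate")

    # Remove the constant skin tone so only the small periodic variation remains.
    mean = sum(green_means) / n
    centered = [v - mean for v in green_means]

    best_bpm, best_power = None, -1.0
    bpm = min_bpm
    while bpm <= max_bpm:
        freq_hz = bpm / 60.0
        # Single-frequency Fourier projection at this candidate pulse rate.
        re = sum(c * math.cos(2 * math.pi * freq_hz * i / fps) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * freq_hz * i / fps) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
        bpm += step_bpm
    return best_bpm


if __name__ == "__main__":
    # Synthetic 10-second recording at 30 fps with a 1.2 Hz (72 bpm) pulse component.
    fps = 30.0
    signal = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
    print(estimate_heart_rate_bpm(signal, fps))  # expected to be close to 72
```

A real system would also need face or skin-region tracking, motion and lighting compensation, and band-pass filtering before the frequency search; those steps are omitted here to keep the core idea visible.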