QbitAI (量子位)
A 30-Year-Old Math Problem Cracked in 6 Hours: Aristotle Becomes Famous Overnight
QbitAI · 2025-12-01 05:45
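By Yishui, reporting from Aofeisi | QbitAI (WeChat: QbitAI)
Has a math problem that stood open for 30 years really just been proved by AI?! Sebastien Bubeck, a former AI vice president at Microsoft who now works on AGI at OpenAI, excitedly shared the news. Right now, X (formerly Twitter) is buzzing: Harmonic's math AI model independently proved Erdős Problem #124, a problem mathematicians had been forced to set aside for nearly 30 years. The solution was 100% AI-generated and took 6 hours in total.
Even top mathematicians such as Terence Tao joined the discussion. After comparing it with the deep-research tools of Gemini and ChatGPT, he found that the Harmonic model did better at proving this problem.
So what exactly is this problem, and how did the Harmonic model pull it off? Read on.
AI proved a simplified version of Erdős Problem #124
A caveat first: only after following the experts' discussion did we realize that the Harmonic model did not prove the original Erdős Problem #124, but a simplified version of it.
Erdős Problem #124 is built around the following condition:
$$\sum_{1\leq i\leq k}\frac{1}{d_i-1}\geq 1.$$
In plain terms: suppose you have ...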
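To make the quoted condition concrete, here is a minimal Python sketch that checks whether a given list of integers d_1, ..., d_k satisfies it. It uses exact rational arithmetic because the interesting cases sit exactly on the boundary value 1. The function name and the d_i ≥ 2 guard are our own assumptions, and the sketch only tests the inequality; it says nothing about the problem Harmonic's model actually proved.

```python
from fractions import Fraction


def satisfies_erdos124_condition(bases):
    """Check sum(1 / (d - 1) for d in bases) >= 1 with exact rational arithmetic.

    Assumption: `bases` holds integers d_i >= 2 (so no term divides by zero).
    This only tests the inequality quoted above; it does not attempt the problem.
    """
    if any(d < 2 for d in bases):
        raise ValueError("every d_i must be at least 2")
    total = sum((Fraction(1, d - 1) for d in bases), Fraction(0))
    return total >= 1


print(satisfies_erdos124_condition([3, 4, 7]))  # 1/2 + 1/3 + 1/6 = 1  -> True
print(satisfies_erdos124_condition([3, 4]))     # 1/2 + 1/3 = 5/6      -> False
```

`Fraction` is used instead of floats because a floating-point sum of terms like 1/3 and 1/6 can land just below or just above 1, which would misclassify exactly the boundary cases.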
Free China-Made "Banana" Is Too Good: I Want to Uninstall Photoshop
QbitAI · 2025-12-01 05:45
Core Viewpoint
- The article discusses the upgrades in Vidu Q2, a product from Shengshu Technology, highlighting its consistency and new features for AI-generated images and videos and positioning it as a competitive alternative to established players such as OpenAI and Google [8][9][57].
Group 1: Product Features
- Vidu Q2 has upgraded its reference-image generation and claims the industry's strongest consistency, allowing repeated edits while preserving the integrity of characters and objects [8].
- The new features include text-to-image generation and image editing, letting users create images from simple prompts with results the article compares to professional editing software [9][35].
- Vidu Q2's image editing lets users change aspect ratios and details without complex workflows, making it easy and efficient to use [37][46].
Group 2: Performance Comparison
- Vidu Q2 ranked fourth on the latest AA leaderboard, ahead of OpenAI and competing closely with major companies such as Google and ByteDance [9].
- The article emphasizes that Vidu Q2 maintains high consistency in image generation, outperforming competitors such as Nano Banana Pro at preserving background and structural details [20][29].
Group 3: User Experience and Accessibility
- Vidu Q2 offers a one-month free membership for its new features, so users can explore its capabilities [11].
- The platform gives creators a streamlined workflow with seamless transitions between image and video generation, reducing the trial-and-error costs of content creation [52][57].
China Unicom Breaks the Speed-Quality Zero-Sum Game in Diffusion Models, Boosting Inference Speed 5x | CVPR 2025 Highlight
QbitAI · 2025-12-01 04:26
Core Insights
- The article discusses advances in diffusion models, focusing on the ShortDF and LeMiCa papers, which represent significant breakthroughs in image and video generation [1][2][4].
Group 1: Technical Evolution
- ShortDF is the theoretical pioneer, optimizing diffusion models through online training, while LeMiCa extends the theory to offline mapping for higher-dimensional tasks [4].
- The core challenge for diffusion models is their expensive inference, which hinders real-time applications [8].
- The non-linear denoising trajectory of diffusion models is identified as a primary reason for their slow inference [9].
Group 2: ShortDF's Mechanisms
- ShortDF introduces a "shortest path optimization" approach that straightens the denoising trajectory directly during training, aiming to break the trade-off between speed and quality [12].
- Its core insight is that denoising is fundamentally a correction of the initial error, so minimizing that error improves overall performance [13][14].
- ShortDF employs a three-pronged strategy:
  1. Locking in an "error upper bound" to optimize at the source [14][15].
  2. Using graph theory to relax and compress paths, thereby minimizing the error upper bound [20][21].
  3. Applying multi-state optimization to keep training stable under random noise [28][29].
Group 3: Performance Metrics
- ShortDF improves both speed and quality, achieving a 5.0x speedup over DDIM while improving image quality (FID 9.08 vs. 11.14 for DDIM; lower is better) [36].
- The model is robust in complex scenes, restoring object contours faster than competing methods [37].
- Across datasets, ShortDF maintains a good balance between quality and speed, showing its potential for real-world applications [40].
Group 4: Industry Implications
- ShortDF and LeMiCa show that refined mathematical modeling, rather than raw compute alone, is key to speeding up diffusion models [41].
- These advances matter for deploying AIGC in resource-constrained settings such as mobile devices and real-time interactive design [42].
ChatGPT Ad Code Leaked! Altman's Three About-Faces in a Year: From "Ads Are Unsettling" to "Not Entirely Off the Table"
QbitAI · 2025-12-01 04:26
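By Mengchen, reporting from Aofeisi | QbitAI (WeChat: QbitAI)
Code for ChatGPT ads has leaked. Right at its third anniversary, is ChatGPT finally about to start monetizing this way?
Multiple ad-related references were found in the code of the ChatGPT Android app beta.
As recently as May 2024, Altman called adding ads to ChatGPT a "last resort" and "unsettling." By October 2025, he had changed his tune, saying he "finds ads somewhat distasteful, but not entirely off the table."
Judging by the technical details, the ad system is already quite mature and not far from an official launch.
OpenAI's operating model is under enormous financial pressure, and ad monetization has finally made it onto the agenda. HSBC recently ran the numbers for OpenAI: of the costs of running ChatGPT, maintaining its compute infrastructure alone could require hundreds of billions of dollars a year. ChatGPT Plus currently costs $20 per month, and together with API licensing revenue that falls far short of covering these costs. OpenAI is expected to remain loss-making until 2029, with cumulative losses that could exceed $100 billion.
ChatGPT monetization, Altman's about-face
Developer Tibor first broke the news. He found multiple ad-related references in the code of the ChatGPT Android app beta, version 1.2025.329. The code explicitly contains the string "ads feature," along with ...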
A 6B Text-to-Image Model Tops Hugging Face Right After Launch
QbitAI · 2025-12-01 04:26
Core Viewpoint
- The article covers the launch and performance of Alibaba's new image generation model, Z-Image, which quickly gained popularity and recognition in the AI community for its capabilities and efficiency [1][3].
Group 1: Model Overview
- Z-Image is a 6-billion-parameter image generation model; it recorded 500,000 downloads on its first day and topped two Hugging Face charts within two days of launch [1][3].
- It comes in three versions: Z-Image-Turbo (open-source), Z-Image-Edit (not open-source), and Z-Image-Base (not open-source) [8].
Group 2: Performance and Features
- Z-Image shows state-of-the-art (SOTA) image quality, text rendering, and semantic understanding, comparable to contemporaneous models such as FLUX.2 [3][8].
- It excels at realistic images and complex text rendering, including mixed-language content and mathematical formulas [6][15].
- Users report high-quality outputs, including detailed portraits and creative visual interpretations, showing the model's versatility [11][14][32].
Group 3: Technical Innovations
- Z-Image's speed and efficiency come from architectural optimization and model distillation, which reduce computation without sacrificing quality [34][39].
- It uses a single-stream architecture (S3-DiT) that handles text and image processing in one stream, simplifying the workflow and improving performance [35].
- Distillation lets Z-Image generate high-quality images with only eight function evaluations, greatly speeding up generation [40][42].
Group 4: Market Position and Future Prospects
- The release was timed strategically to coincide with the launch of FLUX.2, reflecting a competitive AI image generation market [44].
- Its open-source availability on platforms such as Hugging Face and ModelScope positions it well for wider adoption and experimentation in the AI community [45].
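"Eight function evaluations" means the denoising network is called eight times per image. The minimal Python sketch below shows what that budget looks like in a generic fixed-step Euler sampler. It is not Z-Image's sampler and does not implement distillation; the "network" is a hypothetical oracle (`oracle_velocity`) that returns the exact velocity of a perfectly straight noise-to-data path, and the scalar `DATA`/`NOISE` values are made up.

```python
DATA = 3.0    # hypothetical "clean sample" (a scalar for simplicity)
NOISE = -1.0  # hypothetical starting noise


def oracle_velocity(x, t):
    """Toy stand-in for a trained network: the exact velocity of the straight
    path x_t = t * NOISE + (1 - t) * DATA, expressed as a function of (x, t)."""
    return (x - DATA) / t


def euler_sample(velocity, x_start, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0 (data)
    with fixed Euler steps. Each step makes exactly one velocity call, i.e. one
    function evaluation (NFE)."""
    x, nfe = x_start, 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt          # current time; never reaches 0 inside the loop
        v = velocity(x, t)
        nfe += 1
        x = x - dt * v            # move from t to t - dt
    return x, nfe


x, nfe = euler_sample(oracle_velocity, NOISE, num_steps=8)
print(x, nfe)  # 3.0 8
```

Because the toy path is straight, eight Euler steps land exactly on the data point. Real diffusion trajectories are curved, so a few-step sampler accumulates error. Per the summary, distillation is the technique Z-Image relies on to keep quality high at this small step count.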
QbitAI Is Hiring Editors and Writers
QbitAI · 2025-12-01 04:26
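By the Editorial Team, reporting from Aofeisi | QbitAI (WeChat: QbitAI)
The AI boom is still surging, but if you don't yet know how to take part... why not join QbitAI?
We are a content platform built around tracking new developments in AI. Over 8 years we have built top-tier influence, broad and well-recognized industry resources, and an ideal vantage point for observing and learning at the forefront of the era.
We are hiring in three areas and hope you are (or can become) a content expert in one of them:
- AI Industry: infrastructure-layer innovation, including chips, AI infra, and cloud computing;
- AI Finance: venture investment and earnings reports in AI, tracking capital flows along the industry chain;
- AI Products: progress of AI in applications and hardware devices.
All positions are full-time, based in Zhongguancun, Beijing.
Open to:
- Experienced hires: editor, lead writer, and editor-in-chief levels, matched to ability;
- Campus hires: new graduates; internships accepted, with conversion to full-time possible.
Positions at every level are open; you are welcome to apply based on your background and experience.
By joining us, you get:
- A place at the crest of the AI wave: first-hand access to the latest AI technology and products, building a complete understanding of AI.
- Hands-on use of new AI tools: applying new AI technologies and tools in your work to boost efficiency and creativity.
- Personal influence: by writing exclusive original content ...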
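Taking Responsibility for Merchants' Ad ROI: Where Does This Video Marketing Agent's Confidence Come From? | A Conversation with Boolvector
QbitAI · 2025-11-30 11:30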
Core Insights
- The article discusses Temvideo, an AI video agent developed by Boolvector to address the marketing challenges facing cross-border e-commerce businesses. The product makes video production more efficient and cheaper while maintaining high ROI for users [6][11].
Group 1: Product Overview
- Temvideo is billed as the world's first AI video agent built specifically for marketing, targeting the low efficiency and high cost of video production in cross-border e-commerce [11].
- Its core function is batch video generation from templates with proven high ROI, cutting production time significantly while matching the quality of human-made videos [11][12].
- It is aimed at e-commerce sellers with annual revenue between 10 million and 100 million, focusing on their advertising needs and on high click-through and conversion rates [12][22].
Group 2: Unique Features and Advantages
- Temvideo builds in industry know-how, generating effective marketing videos based on past successful campaigns to raise output quality [12][36].
- It combines large models with industry-specific algorithms to improve video understanding and production accuracy, addressing the limitations of generic AI models [30][32].
- It automatically segments video clips and matches background music, improving overall video quality to meet merchants' high standards for detail [29][30].
Group 3: Market Context and Trends
- Only about 10% of e-commerce businesses currently use AI video and image generation, leaving significant room for growth [71].
- Demand for high-quality video on social media is rising, with platforms such as TikTok and Meta requiring more engaging and effective video ads [75][76].
- The potential market for AI-generated video is substantial, with two main business models: charging per video produced or sharing revenue based on performance [78][79].
Group 4: Challenges and Future Directions
- Many AI products struggle to retain users because expectations are high and AI capabilities are complex, which can lead to unsatisfying results [86].
- Boolvector aims to balance result delivery with cost control, optimizing the video generation process to keep users satisfied and retained [92][93].
- Its vision for Temvideo is to move from a pay-per-video model to performance-based payment, a sustainable business model aligned with user success [95][98].
Transformer Co-Author Reveals the Inside Story of GPT-5.1! OpenAI's Internal Naming Rules Have Become a Mess
QbitAI · 2025-11-30 11:30
Core Insights
- The article argues that AI is undergoing a significant paradigm shift: development is not slowing down but entering a new phase of growth [1][7][12].
Group 1: AI Development Trends
- There are two contrasting views: one holds that AI progress is slowing, while the other points to continuous advances, with new models such as GPT-5.1 and Gemini 3 being released [3][12].
- Łukasz Kaiser argues that the perception of a slowdown is wrong: AI capability grows along a smooth exponential curve, similar to Moore's Law [15][16].
- A key factor is the shift from pre-training to reasoning models: pre-training is in the later stage of its S-curve, while reasoning models are still early in theirs [18][19].
Group 2: Reasoning Models and Their Impact
- The industry is focusing on smaller, cheaper models that maintain quality, which has fed the misconception that pre-training has stalled [21].
- Reasoning models, which support more complex chains of thought and tool use during inference, are expected to improve rapidly because they are still new [22][27].
- The evolution of ChatGPT shows a qualitative leap: newer versions use reasoning and external tools to give more accurate answers [23][24].
Group 3: GPT-5.1 Insights
- GPT-5.1 is not a minor update but a significant iteration focused on stability, strengthening reasoning through reinforcement learning and synthetic data [34][35].
- Version names now reflect the user experience rather than technical details, giving the team more flexibility in development [38].
- GPT-5.1 still has limitations, particularly in multimodal reasoning, as shown by its struggles with basic tasks that require contextual understanding [41][42].
Group 4: Future of AI and Robotics
- AI is expected to change the nature of work rather than eliminate jobs, since human expertise will still be needed in high-stakes scenarios [62][66].
- Home robots are expected to be the next visible AI revolution, driven by advances in multimodal capabilities and general reinforcement learning [67][69].
- Combining these technologies should produce a major leap in home-robot capabilities, one that people will perceive more directly than the progress of current AI models like ChatGPT [69].
As Many as 21% of ICLR 2026 Reviews Were Entirely AI-Generated…
QbitAI · 2025-11-30 06:45
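By Hengyu, reporting from Aofeisi | QbitAI (WeChat: QbitAI)
At ICLR 2026, were 21% of the reviews really generated entirely by AI?!
That rather painful answer comes from an analysis report by Pangram Labs.
How it came to light is somewhat dramatic: Graham Neubig, an AI researcher at CMU, felt that the peer reviews he received read strongly like AI.
He grew suspicious because the reviews were "very verbose, with lots of symbols," and asked for analyses that were not "the standard statistical analyses reviewers usually request in AI or ML papers."
Still, gut feeling is not enough; you need hard evidence. Unable to do the work himself, Neubig posted a bounty on X, asking someone to run a systematic check of how much AI-generated text was mixed into ICLR papers and reviews: "I'm willing to offer $50 to the first person who does this."
Pangram Labs took up the challenge. One of the lab's lines of business happens to be developing tools that detect AI-generated text.
The conclusion was blunt: at a top AI academic conference, large-scale AI ghostwriting appeared on both sides, in reviews and in submissions...
How was the "AI flavor" detected?
Pangram systematically analyzed every ICLR submission and every review, and published the whole process on its blog. First, on OpenReview, they took I ...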
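Goodbye to the GUI Agent Infrastructure Nightmare: StepFun Open-Sources a 4B Agent Model That Runs on All Android Devices, with One-Click Deployment for DIY Builders
QbitAI · 2025-11-30 06:45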
Core Insights
- The article covers the launch of GELab-Zero, an open-source GUI Agent model that is easy to deploy and aims to make mobile agents more scalable across applications [1][8].
Group 1: Model Performance and Capabilities
- The 4B version of the GUI Agent model achieves state-of-the-art (SOTA) results on multiple GUI benchmarks for both mobile and desktop platforms [2][11].
- GELab-Zero-4B-preview outperforms other mainstream models, including much larger ones such as GUI-Owl-32B, while being easier to deploy [13][11].
- The model handles complex tasks and vague instructions effectively, showing its versatility across applications [19][24].
Group 2: Development and Deployment
- The article stresses the need to lower the barriers to developing and using mobile agents, so developers can focus on creating value rather than setting up infrastructure [7][30].
- GELab-Zero includes a complete technical stack with one-click deployment, giving developers a seamless experience [25][26].
- The model supports lightweight local inference, so it can run on consumer-grade hardware with low latency while preserving privacy [26].
Group 3: Evaluation Standards
- The research team has established a new benchmark, AndroidDaily, which focuses on real-world applications and user scenarios rather than traditional productivity benchmarks [5][31].
- AndroidDaily evaluates models across six core dimensions of modern life: dining, travel, shopping, housing, information consumption, and entertainment [33].
- The framework includes both static and end-to-end tests for a comprehensive assessment of model capabilities [35][38].
Group 4: Future Directions
- The team plans to keep optimizing model performance, expand cross-platform support, and enrich the tool ecosystem while adhering to the principles of openness, controllability, and privacy [41].