量子位
30-Year-Old Math Problem Cracked in 6 Hours: Aristotle Becomes Famous Overnight
量子位· 2025-12-01 05:45
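By Yishui, from Aofeisi | QbitAI (WeChat official account: QbitAI)

Sebastien Bubeck, a former Microsoft VP of AI who now works on AGI at OpenAI, excitedly shared the news and remarked: a math problem left open for 30 years, proved by AI just like that?! Right now, X (formerly Twitter) is buzzing with discussion: a math AI model from Harmonic has independently proved Erdős Problem #124, a problem mathematicians had reluctantly shelved for nearly 30 years. The solution was 100% AI-generated and took 6 hours in total. Even top mathematicians such as Terence Tao joined the discussion; after comparing the deep research tools of Gemini and ChatGPT, he found that the Harmonic model performed better on the proof of this problem. So what kind of problem is this, and how did the Harmonic model pull it off? Read on.

AI proved a simplified version of Erdős Problem #124

A caveat first: only after following the experts' discussion did we realize that what the Harmonic model proved is not the original Erdős Problem #124 but a simplified version. The proof that Erdős Problem #124 calls for is as follows:

$$\sum_{1\leq i\leq k}{\frac{1}{d_{i}-1}}\geq1.$$

In plain terms: suppose you have ...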
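To make the inequality above concrete, here is a small worked instance. The values $d = (3, 4, 7)$ are chosen here purely for illustration and are not taken from the article:

$$\sum_{1\leq i\leq 3}\frac{1}{d_{i}-1}=\frac{1}{3-1}+\frac{1}{4-1}+\frac{1}{7-1}=\frac{1}{2}+\frac{1}{3}+\frac{1}{6}=1\geq1.$$

By contrast, the illustrative pair $d = (3, 5)$ gives $\frac{1}{2}+\frac{1}{4}=\frac{3}{4}<1$, so it would not satisfy the inequality.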
A Free Homegrown "Banana" That Really Delivers! I Want to Uninstall Photoshop
量子位· 2025-12-01 05:45
Core Viewpoint - The article discusses the advancements in Vidu Q2, a product from Shengshu Technology, highlighting its superior consistency and new features in AI-generated images and videos, positioning it as a competitive alternative to established players like OpenAI and Google [8][9][57].

Group 1: Product Features
- Vidu Q2 has upgraded its reference image generation capabilities, claiming to have the industry's strongest consistency, allowing for repeated edits while maintaining character and object integrity [8].
- The new features include text-to-image generation and image editing, enabling users to create images with simple prompts, comparable to advanced editing software [9][35].
- Vidu Q2's image editing function allows users to change image proportions and details without complex processes, making it user-friendly and efficient [37][46].

Group 2: Performance Comparison
- In a performance comparison, Vidu Q2 ranked fourth in the latest AA leaderboard, surpassing OpenAI and competing closely with major companies like Google and ByteDance [9].
- The article emphasizes that Vidu Q2 maintains high consistency in image generation, outperforming competitors like Nano Banana Pro in preserving background and structural details [20][29].

Group 3: User Experience and Accessibility
- Vidu Q2 offers a one-month free membership for its new features, making it accessible for users to explore its capabilities [11].
- The platform provides a streamlined workflow for creators, allowing seamless transitions between image and video generation, which reduces the trial-and-error costs associated with content creation [52][57].
ChatGPT Ad Code Leaked! Altman Changes His Tune Three Times in a Year: From "Ads Are Unsettling" to "Not Entirely Undesirable"
量子位· 2025-12-01 04:26
Core Viewpoint - OpenAI is preparing to monetize ChatGPT through advertising, as indicated by the discovery of ad-related code in the Android app, marking a significant shift in its operational strategy [1][11].

Group 1: Advertising Implementation
- The code in the ChatGPT Android app contains multiple references to advertising features, including "ads feature," "bazaar content," and "search ads carousel," suggesting at least three different advertising formats [12][13].
- The formats include search ads targeting specific queries, a carousel format for multiple ads, and a marketplace-style content display for promoting products or services [18].

Group 2: Financial Pressures
- OpenAI faces substantial financial pressure, with estimates suggesting that operating ChatGPT could require several hundred billion dollars annually just to maintain its computational infrastructure [8].
- Current revenue from ChatGPT Plus subscriptions and API licensing is insufficient to cover these operational costs, leading to projections of continued losses exceeding $100 billion by 2029 [9][10].

Group 3: User Engagement and Trust
- ChatGPT has built a remarkable user base, with 800 million weekly active users and 2.5 billion daily interactions; that is eight times the 100 million users it had in November 2023 [14].
- The potential for advertising revenue is significant even without counting the advanced contextual understanding of large models, since traditional internet advertising revenue can be estimated from active user numbers and average ad impressions [15][16].

Group 4: Leadership Perspectives
- OpenAI's CEO, Sam Altman, has expressed concerns about balancing profitability with user trust, questioning the ethics of paid rankings in search results [17][20].
- Altman believes that if ChatGPT can provide the best answers without bias from paid advertisements, it could maintain user trust, suggesting a model where commissions are earned from bookings rather than direct ad placements [22].

Group 5: Organizational Influence
- There are indications that OpenAI's shift towards advertising is influenced by the hiring of former Meta employees, who are accustomed to a business model heavily reliant on ad revenue [23].
- User feedback suggests that some believe advertising is already present in ChatGPT, with internal discussions at OpenAI considering the integration of ads based on user interactions and preferences [25].
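The summary notes that traditional internet ad revenue can be estimated from active users and average ad impressions. A minimal sketch of that arithmetic follows. Only the 800 million weekly active users figure comes from the summary; the impressions-per-user and CPM (price per thousand impressions) values are hypothetical placeholders, not figures from the article or from OpenAI.

```python
def estimate_ad_revenue(active_users: float,
                        impressions_per_user: float,
                        cpm_usd: float) -> float:
    """Estimate ad revenue as users x impressions per user x (CPM / 1000)."""
    total_impressions = active_users * impressions_per_user
    return total_impressions * cpm_usd / 1000.0


WEEKLY_ACTIVE_USERS = 800_000_000         # figure quoted in the summary
HYPOTHETICAL_IMPRESSIONS_PER_USER = 5     # assumption: ad views per user per week
HYPOTHETICAL_CPM_USD = 2.0                # assumption: USD per 1,000 impressions

weekly_revenue = estimate_ad_revenue(
    WEEKLY_ACTIVE_USERS, HYPOTHETICAL_IMPRESSIONS_PER_USER, HYPOTHETICAL_CPM_USD
)
print(f"Estimated weekly ad revenue: ${weekly_revenue:,.0f}")
```

The point of the sketch is only that the estimate scales linearly in each factor; the real inputs would have to come from actual ad load and pricing data.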
China Unicom Breaks the Speed-Quality Zero-Sum Game in Diffusion Models, Speeding Up Inference 5x | CVPR 2025 Highlight
量子位· 2025-12-01 04:26
Core Insights - The article discusses advances in diffusion models, focusing on the ShortDF and LeMiCa papers, which represent significant breakthroughs in image and video generation [1][2][4].

Group 1: Technical Evolution
- ShortDF serves as a theoretical pioneer in optimizing diffusion models through online training, while LeMiCa expands this theory into offline mapping for higher-dimensional tasks [4].
- The core challenge in diffusion models is the high cost of inference, which hinders real-time applications [8].
- The non-linear denoising trajectory of diffusion models is identified as a primary reason for slow progress in the field [9].

Group 2: ShortDF's Mechanisms
- ShortDF introduces a "shortest path optimization" approach that directly straightens the denoising trajectory during training, aiming to break the trade-off between speed and quality [12].
- The model's core insight is that the denoising process is fundamentally a correction of the initial error, which can be minimized to improve overall performance [13][14].
- ShortDF employs a three-pronged strategy:
  1. Locking the "error upper bound" to optimize from the source [14][15].
  2. Utilizing graph theory to relax and compress paths, thereby minimizing the error upper bound [20][21].
  3. Implementing multi-state optimization to ensure training stability amidst random noise [28][29].

Group 3: Performance Metrics
- ShortDF demonstrates superior speed and quality, achieving a 5.0x speedup over DDIM while improving image quality (an FID of 9.08 versus DDIM's 11.14; lower is better) [36].
- The model is robust in complex scenarios, restoring object contours faster than competing methods [37].
- Across various datasets, ShortDF maintains a balance between performance and speed, showing its potential for real-world applications [40].

Group 4: Industry Implications
- The advancements in ShortDF and LeMiCa highlight the importance of refined mathematical modeling over mere computational power in speeding up diffusion models [41].
- These developments are crucial for applying AIGC technology in resource-constrained environments, such as mobile devices and real-time interactive design [42].
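The "relax and compress paths" idea in Group 2 borrows from classical shortest-path algorithms. The sketch below shows generic Bellman-Ford-style edge relaxation on a toy graph of denoising states. It is not ShortDF's training procedure: the function name, the graph, and the edge costs (standing in for per-transition error) are all hypothetical and chosen only to illustrate how a cheaper shortcut replaces a chain of small steps.

```python
from typing import Dict, List, Tuple


def shortest_denoising_path(
    num_states: int, edge_cost: Dict[Tuple[int, int], float]
) -> Tuple[float, List[int]]:
    """Relax edges until no distance improves; state 0 is the start, the last state the goal."""
    inf = float("inf")
    dist = [inf] * num_states
    prev = [-1] * num_states
    dist[0] = 0.0
    for _ in range(num_states - 1):
        updated = False
        for (u, v), cost in edge_cost.items():
            # Relaxation: keep the cheaper of the known route and the route via u.
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                prev[v] = u
                updated = True
        if not updated:
            break
    # Walk predecessors back from the goal to recover the chosen path.
    path, node = [], num_states - 1
    while node != -1:
        path.append(node)
        node = prev[node]
    path.reverse()
    return dist[-1], path


# Hypothetical costs: four unit-cost single steps plus two cheaper shortcut edges.
edges = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0,
    (0, 2): 1.5, (2, 4): 1.25,
}
total_cost, path = shortest_denoising_path(5, edges)
print(total_cost, path)  # prints: 2.75 [0, 2, 4]
```

In this toy graph the two shortcuts cut the path from four steps (cost 4.0) to two steps (cost 2.75), which mirrors the summary's claim that fewer, straighter steps can lower the accumulated error bound.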
6B Text-to-Image Model Tops Hugging Face Right at Launch
量子位· 2025-12-01 04:26
Core Viewpoint - The article discusses the launch and performance of Alibaba's new image generation model, Z-Image, which has quickly gained popularity and recognition in the AI community due to its impressive capabilities and efficiency [1][3].

Group 1: Model Overview
- Z-Image is a 6 billion parameter image generation model that has achieved significant success, including 500,000 downloads on its first day and topping two charts on Hugging Face within two days of launch [1][3].
- The model is available in three versions: Z-Image-Turbo (open-source), Z-Image-Edit (not open-source), and Z-Image-Base (not open-source) [8].

Group 2: Performance and Features
- Z-Image demonstrates state-of-the-art (SOTA) performance in image quality, text rendering, and semantic understanding, comparable to contemporaneous models like FLUX.2 [3][8].
- The model excels in generating realistic images and handling complex text rendering, including mixed-language content and mathematical formulas [6][15].
- Users have reported high-quality outputs, including detailed portraits and creative visual interpretations, showcasing the model's versatility [11][14][32].

Group 3: Technical Innovations
- Z-Image's speed and efficiency are attributed to its architecture optimization and model distillation techniques, which reduce computational load without sacrificing quality [34][39].
- The model employs a single-stream architecture (S3-DiT) that integrates text and image processing, streamlining the workflow and enhancing performance [35].
- The distillation process allows Z-Image to generate high-quality images with only eight function evaluations, significantly improving generation speed [40][42].

Group 4: Market Position and Future Prospects
- The timing of Z-Image's release is strategic, coinciding with the launch of FLUX.2, indicating a competitive landscape in the AI image generation market [44].
- The model's open-source availability on platforms like Hugging Face and ModelScope positions it favorably for further adoption and experimentation within the AI community [45].
QbitAI Is Hiring Editors and Writers
量子位· 2025-12-01 04:26
Core Viewpoint - The article emphasizes the ongoing AI boom and invites individuals to join QbitAI (量子位) to track AI advancements and become content experts in various AI-related fields [1].

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4].
- Positions are full-time and based in Beijing, with editorial roles at various levels, including editor, lead writer, and chief editor [6].

Group 2: Job Responsibilities
- **AI Industry Direction**: Focuses on innovations in infrastructure, including chips, AI infrastructure, and cloud computing [6].
- **AI Finance Direction**: Involves tracking venture capital and financial reports in the AI sector and analyzing capital movements within the industry [6].
- **AI Product Direction**: Concentrates on AI applications and hardware advancements, including software products and hardware implementations [6].

Group 3: Benefits and Growth
- Employees will have access to the latest AI technologies and tools, enhancing work efficiency and creativity [6].
- The company offers a vibrant team environment, professional mentorship, and competitive compensation packages, including various benefits [6][12].
- The company aims to help employees build personal influence in the AI field through original content creation and networking opportunities [6].

Group 4: Company Overview
- As of 2025, QbitAI has over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with a daily reading volume exceeding 2 million [12].
- The company is recognized as the top new media outlet in the AI and frontier technology sectors according to third-party data platforms [12].
Accountable for Merchants' Ad ROI: Where Does This Video Marketing Agent Get Its Confidence? | A Conversation with Boolvector
量子位· 2025-11-30 11:30
Core Insights - The article discusses the emergence of Temvideo, an AI video agent developed by Boolvector, aimed at addressing the marketing challenges faced by cross-border e-commerce businesses. The product enhances video production efficiency and reduces costs while maintaining high ROI for users [6][11].

Group 1: Product Overview
- Temvideo is the world's first AI video agent designed specifically for marketing scenarios, targeting the pain points of low efficiency and high costs in video production for cross-border e-commerce [11].
- The core functionality of Temvideo includes batch video generation using verified high ROI templates, significantly reducing production time while achieving quality comparable to human-made videos [11][12].
- The product is designed to cater to e-commerce users with annual revenues between 10 million and 100 million, focusing on their advertising needs and ensuring high click-through and conversion rates [12][22].

Group 2: Unique Features and Advantages
- Temvideo's design incorporates industry know-how, allowing it to generate effective marketing videos based on successful past campaigns, thus enhancing the quality of output [12][36].
- The product utilizes a combination of large models and industry-specific algorithms to improve video content understanding and production accuracy, addressing the limitations of generic AI models [30][32].
- Temvideo's ability to automatically segment video clips and match background music enhances the overall video quality, meeting the high detail requirements of merchants [29][30].

Group 3: Market Context and Trends
- The article highlights that only about 10% of e-commerce businesses currently utilize AI video and image generation technologies, indicating significant room for growth in this sector [71].
- The demand for high-quality video content on social media platforms is increasing, with platforms like TikTok and Meta requiring more engaging and effective video advertisements [75][76].
- The potential market for AI-generated video content is substantial, with two primary business models: charging per video produced or sharing revenue based on performance metrics [78][79].

Group 4: Challenges and Future Directions
- The article notes that many AI products face challenges in user retention due to high expectations and the complexity of AI capabilities, which can lead to unsatisfactory results [86].
- Boolvector aims to balance result delivery and cost control, focusing on optimizing the video generation process to ensure user satisfaction and retention [92][93].
- The future vision for Temvideo includes transitioning from a pay-per-video model to a performance-based payment system, fostering a sustainable business model that aligns with user success [95][98].
Transformer Co-Author Reveals the Inside Story of GPT-5.1! OpenAI's Internal Naming Rules Have Gotten Messy
量子位· 2025-11-30 11:30
Core Insights - The article discusses a significant paradigm shift in AI, indicating that the development of AI is not slowing down but rather transitioning to a new phase of growth [1][7][12].

Group 1: AI Development Trends
- There are two contrasting views on AI development: one claims that AI growth is slowing down, while the other highlights continuous advancements with new models like GPT-5.1 and Gemini 3 being released [3][12].
- Łukasz Kaiser argues that the perception of slowing growth is incorrect, stating that AI's capability growth follows a smooth exponential curve, akin to Moore's Law [15][16].
- The shift from pre-training to reasoning models is a key factor in this transition, with pre-training being in a later stage of its S-curve while reasoning models are still in their early stages [18][19].

Group 2: Reasoning Models and Their Impact
- The industry is focusing on smaller, cost-effective models that maintain quality, leading to the misconception that pre-training has stalled [21].
- Reasoning models, which allow for more complex thought processes and the use of tools during inference, are expected to progress rapidly due to their emerging nature [22][27].
- The evolution of models like ChatGPT demonstrates a qualitative leap in performance, with newer versions incorporating reasoning and external tool usage for more accurate responses [23][24].

Group 3: GPT-5.1 Insights
- GPT-5.1 is not merely a minor update but represents a significant stability iteration, enhancing reasoning capabilities through reinforcement learning and synthetic data [34][35].
- The naming convention for versions has shifted to focus on user experience rather than technical details, allowing for greater flexibility in development [38].
- Despite improvements, GPT-5.1 still has limitations, particularly in multi-modal reasoning, as illustrated by its struggles with basic tasks that require contextual understanding [41][42].

Group 4: Future of AI and Robotics
- AI is expected to change the nature of work without eliminating jobs, as human expertise will still be needed in high-stakes scenarios [62][66].
- Home robots are anticipated to be the next visible AI revolution, driven by advancements in multi-modal capabilities and general reinforcement learning [67][69].
- The integration of these technologies is expected to lead to a significant leap in the capabilities of home robots, making them more intuitive and perceptible compared to current AI models like ChatGPT [69].
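The S-curve remark in Group 1 can be illustrated with a logistic function. The sketch below is only a toy model: the midpoints and rate are hypothetical values picked for illustration, not numbers from Kaiser or the article. It shows why a technology late on its S-curve appears to stall while one early on its curve still grows quickly in relative terms.

```python
import math


def logistic(t: float, midpoint: float, rate: float = 1.0) -> float:
    """S-curve value in (0, 1) at time t."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def relative_growth(t: float, midpoint: float, rate: float = 1.0) -> float:
    """Relative growth rate f'(t) / f(t); for a logistic curve this equals rate * (1 - f(t))."""
    return rate * (1.0 - logistic(t, midpoint, rate))


NOW = 0.0
PRETRAINING_MIDPOINT = -3.0   # assumption: its steepest phase is already in the past
REASONING_MIDPOINT = 3.0      # assumption: its steepest phase is still ahead

print(f"pre-training relative growth: {relative_growth(NOW, PRETRAINING_MIDPOINT):.3f}")
print(f"reasoning relative growth:    {relative_growth(NOW, REASONING_MIDPOINT):.3f}")
```

Under these assumed parameters the late-stage curve has a much smaller relative growth rate than the early-stage one, which is the qualitative contrast the summary describes.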
A Full 21% of ICLR 2026 Reviews Were Generated Entirely by AI…
量子位· 2025-11-30 06:45
Core Insights - A significant 21% of reviews for ICLR 2026 are suspected to be entirely AI-generated, highlighting a growing trend in AI involvement in academic peer review processes [1][21][26].

Group 1: Discovery and Analysis
- The investigation began when CMU researcher Graham Neubig noticed an unusual AI-like quality in the peer reviews he received, prompting him to seek a systematic analysis of ICLR submissions and reviews [2][3].
- Pangram Labs conducted a comprehensive analysis of approximately 19,490 submitted papers and 75,800 reviews from ICLR 2026, revealing that 15,899 reviews (21%) were highly suspected to be AI-generated [8][9][21].
- The analysis utilized advanced OCR and text classification models to accurately assess the content of both submissions and reviews, ensuring minimal interference from formatting issues [11][12][13].

Group 2: AI Involvement in Submissions and Reviews
- Over half of the reviews exhibited varying degrees of AI participation, while 61% of the papers were human-written, with 199 papers (1%) being entirely AI-generated [22][24].
- The study found that AI-generated content in papers correlated with lower average review scores, indicating that AI writing may not yet match the quality of human-authored work [34].
- Conversely, reviews with higher AI involvement tended to receive more favorable scores, suggesting a lenient bias in AI-generated reviews [38].

Group 3: Ethical Considerations and Guidelines
- ICLR has established clear guidelines regarding the use of AI in submissions and reviews, emphasizing the need for disclosure and adherence to ethical standards [29][31].
- Authors can utilize AI to assist in writing but must acknowledge its use, while reviewers are discouraged from relying solely on AI for their evaluations due to confidentiality and authenticity concerns [32][31].
- The emergence of AI-generated reviews raises questions about the integrity of the peer review process and the importance of maintaining human judgment in academic evaluations [51].
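The headline percentages can be checked against the counts quoted in the summary. The snippet below is a simple sanity check; note that the summary gives the totals as approximate, so the computed shares are approximate too.

```python
def share_percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Counts as quoted in the summary (totals are described as approximate).
TOTAL_REVIEWS = 75_800
AI_GENERATED_REVIEWS = 15_899
TOTAL_PAPERS = 19_490
AI_GENERATED_PAPERS = 199

review_share = share_percent(AI_GENERATED_REVIEWS, TOTAL_REVIEWS)
paper_share = share_percent(AI_GENERATED_PAPERS, TOTAL_PAPERS)
print(f"AI-generated reviews: {review_share:.1f}%")
print(f"AI-generated papers:  {paper_share:.1f}%")
```

Both results round to the figures in the summary: about 21% of reviews and about 1% of papers.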
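Goodbye to the GUI Agent Infrastructure Nightmare: StepFun Open-Sources a 4B Agent Model That Runs on All Android Devices, with One-Click Deployment for DIY Builders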
量子位· 2025-11-30 06:45
Core Insights - The article discusses the launch of GELab-Zero, an open-source GUI Agent model that allows for easy deployment and aims to enhance the scalability of mobile agents in various applications [1][8].

Group 1: Model Performance and Capabilities
- The 4B version of the GUI Agent model has achieved state-of-the-art (SOTA) performance across multiple GUI benchmarks on both mobile and desktop platforms [2][11].
- GELab-Zero-4B-preview outperforms other mainstream models, including larger parameter models like GUI-Owl-32B, demonstrating superior performance and easier deployment [13][11].
- The model is designed to handle complex tasks and vague instructions effectively, showcasing its versatility in various applications [19][24].

Group 2: Development and Deployment
- The article emphasizes the need to lower development and usage barriers for mobile agents, allowing developers to focus on value creation rather than infrastructure setup [7][30].
- GELab-Zero includes a complete technical architecture that enables one-click deployment, facilitating a seamless experience for developers [25][26].
- The model supports lightweight local inference, enabling it to run on consumer-grade hardware while maintaining low latency and privacy [26].

Group 3: Evaluation Standards
- The research team has established a new evaluation standard called AndroidDaily, which focuses on real-world applications and user scenarios, moving beyond traditional productivity benchmarks [5][31].
- AndroidDaily assesses the model across six core dimensions of modern life, including dining, travel, shopping, housing, information consumption, and entertainment [33].
- The evaluation framework includes both static and end-to-end testing methodologies to ensure comprehensive assessment of the model's capabilities [35][38].

Group 4: Future Directions
- The research team aims to continue optimizing model performance, expanding cross-platform support, and enriching the ecosystem of tools while adhering to principles of openness, control, and privacy [41].