Teaching Large Models to "Spot the Difference" in High Dimensions: New China Unicom Research Tackles the Pain Points of Long-Text Image Retrieval | AAAI 2026 Oral
量子位· 2025-12-01 05:45
Core Insights
- The article discusses a new state-of-the-art (SOTA) model for long-text image retrieval called HiMo-CLIP, developed by the China Unicom Data Science and AI Research Institute, which addresses limitations in existing models like CLIP by effectively capturing semantic differences in context [2][4].

Group 1: Model Limitations
- Existing models, including Long-CLIP, struggle with long text descriptions, often resulting in decreased alignment scores as the text becomes more detailed, indicating a failure to process the hierarchical structure of language [6][9].
- The phenomenon where longer descriptions lead to lower alignment scores highlights the inadequacy of current models in distinguishing core semantics from detailed information [6][9].

Group 2: HiMo-CLIP Framework
- HiMo-CLIP introduces a plug-and-play representation framework that includes two core components: Hierarchical Decomposition (HiDe) and Monotonicity-aware Contrastive Loss (MoLo) [10][12].
- HiDe dynamically extracts semantic components using PCA within batches, while MoLo enforces alignment between the full text and its semantic components, ensuring monotonicity [12][17].

Group 3: Performance and Efficiency
- HiMo-CLIP demonstrates significant advantages in both long and short text retrieval tasks, outperforming models trained on much larger datasets and achieving SOTA with only 1 million training samples [17][20].
- The model's ability to extract unique features from complex scenes allows it to maintain high performance across various retrieval benchmarks [18][22].

Group 4: Evaluation Metrics
- The research team constructed the HiMo-Docci dataset and introduced the HiMo@K metric to quantify the model's understanding of hierarchical structure, achieving a high monotonicity correlation coefficient of 0.88 and surpassing comparative methods [22][25].
- As text descriptions become more complete, HiMo-CLIP's scores show a consistent upward trend, while other models exhibit significant fluctuations [25][26].
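The Group 2 item above says MoLo enforces monotonic alignment between a full caption and its progressively less detailed semantic components. The following is a minimal sketch of what such a monotonicity penalty could look like, assuming PyTorch and assuming the nested text views of one image are already encoded; the function name `monotonicity_penalty`, the margin, and the way it would be combined with the CLIP contrastive loss are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a monotonicity-style penalty in the spirit of MoLo.
import torch
import torch.nn.functional as F

def monotonicity_penalty(image_emb, text_embs, margin=0.0):
    """image_emb: (d,) embedding of one image.
    text_embs: (k, d) embeddings of k nested text views of that image,
    ordered from core semantics (index 0) to the full detailed caption (index k-1).
    Penalizes any case where adding detail *lowers* image-text similarity."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    sims = text_embs @ image_emb               # (k,) cosine similarities
    # For each adjacent pair, the more complete view should score at least as high.
    gaps = sims[:-1] - sims[1:] + margin        # positive where monotonicity is violated
    return torch.clamp(gaps, min=0).mean()

# Hypothetical usage: add the penalty to the usual CLIP loss for each batch item.
# total_loss = clip_loss + lambda_mono * monotonicity_penalty(img_e, txt_views_e)
```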
Flash Update: MEET2026 Speaker Lineup Refreshed Again; Audience Registration Open, Act Fast
量子位· 2025-12-01 05:45
Core Insights
- The MEET2026 Smart Future Conference will focus on cutting-edge technologies and industry developments that have garnered significant attention throughout the year [1].
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies penetrate various industries, disciplines, and scenarios, becoming a core driving force for societal evolution [2].

Group 1: Conference Highlights
- The conference will cover this year's hot topics in the tech circle, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI going global [3].
- It will feature the latest collisions between academic frontiers and commercial applications, showcasing leading technological achievements across infrastructure, models, and product industries [4].
- The event will also include the authoritative release of the annual AI rankings and the annual AI trend report [5][116].

Group 2: Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, has extensive experience in AI and digital video technologies [11][12].
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national projects in AI research [15].
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has a strong background in AI core technology development and has published over 100 papers [19].

Group 3: Industry Impact
- The annual AI rankings initiated by Quantum Bit (QbitAI) have become one of the most influential lists in the AI industry, evaluating companies, products, and individuals across three dimensions [117].
- The annual AI trend report will analyze ten significant AI trends based on technology maturity, implementation status, and potential value, highlighting representative organizations and best cases [118].
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual barometer for the smart technology industry [122].
A 30-Year-Old Math Problem Cracked in 6 Hours: Aristotle Becomes Famous Overnight
量子位· 2025-12-01 05:45
Yishui, from Aofeisi
QbitAI | WeChat official account QbitAI

Sebastien Bubeck, former Microsoft VP of AI who now works on AGI research at OpenAI, excitedly shared the news and remarked: a math problem that had been open for 30 years was just proven by AI, just like that?!

Right now, a wave of discussion is sweeping across X (formerly Twitter): a mathematical AI model from Harmonic has independently proven Erdős Problem #124, a problem that mathematicians had reluctantly shelved for nearly 30 years.

The solution was generated entirely by AI and took a total of 6 hours.

Even top mathematicians such as Terence Tao came by to join the discussion; after comparing the deep-research tools of Gemini and ChatGPT, he found that the Harmonic model's proof of this problem performed better.

So what exactly is this problem, and how did the Harmonic model pull it off? Read on.

AI proved a simplified version of Erdős Problem #124

A caveat first: only after following the discussion among experts did we realize that what the Harmonic model proved is not the original Erdős Problem #124 but a simplified version.

The statement that Erdős Problem #124 asks to prove is:

$$\sum_{1\leq i\leq k}\frac{1}{d_{i}-1}\geq 1.$$

In plain terms: suppose you have ...
This Free Chinese-Made "Banana" Is So Good, I Want to Uninstall Photoshop
量子位· 2025-12-01 05:45
Core Viewpoint
- The article discusses the advancements in Vidu Q2, a product from Shengshu Technology, highlighting its superior consistency and new features in AI-generated images and videos, positioning it as a competitive alternative to established players like OpenAI and Google [8][9][57].

Group 1: Product Features
- Vidu Q2 has upgraded its reference image generation capabilities, claiming the industry's strongest consistency and allowing repeated edits while maintaining character and object integrity [8].
- The new features include text-to-image generation and image editing, enabling users to create images with simple prompts, comparable to advanced editing software [9][35].
- Vidu Q2's image editing function allows users to change image proportions and details without complex processes, making it user-friendly and efficient [37][46].

Group 2: Performance Comparison
- In a performance comparison, Vidu Q2 ranked fourth on the latest AA leaderboard, surpassing OpenAI and competing closely with major companies like Google and ByteDance [9].
- The article emphasizes that Vidu Q2 maintains high consistency in image generation, outperforming competitors like Nano Banana Pro in preserving background and structural details [20][29].

Group 3: User Experience and Accessibility
- Vidu Q2 offers a one-month free membership for its new features, making it accessible for users to explore its capabilities [11].
- The platform provides a streamlined workflow for creators, allowing seamless transitions between image and video generation, which reduces the trial-and-error costs associated with content creation [52][57].
China Unicom Cracks the Speed-Quality Zero-Sum Game in Diffusion Models, with a 5x Boost in Inference Speed | CVPR 2025 Highlight
量子位· 2025-12-01 04:26
Core Insights
- The article discusses advancements in diffusion models, focusing on the ShortDF and LeMiCa papers, which represent significant breakthroughs in the field of image and video generation [1][2][4].

Group 1: Technical Evolution
- ShortDF serves as a theoretical pioneer in optimizing diffusion models through online training, while LeMiCa expands this theory into offline mapping for higher-dimensional tasks [4].
- The core challenge in diffusion models is the expensive inference cost, which hinders real-time applications [8].
- The non-linear denoising trajectory of diffusion models is identified as a primary reason for slow progress in the field [9].

Group 2: ShortDF's Mechanisms
- ShortDF introduces a "shortest path optimization" approach to directly straighten the denoising trajectory during training, aiming to break the trade-off between speed and quality [12].
- The model's core insight is that the denoising process is fundamentally a correction of the initial error, which can be minimized to improve overall performance [13][14].
- ShortDF employs a three-pronged strategy:
  1. Locking the "error upper bound" to optimize from the source [14][15].
  2. Utilizing graph theory to relax and compress paths, thereby minimizing the error upper bound [20][21].
  3. Implementing multi-state optimization to ensure training stability amidst random noise [28][29].

Group 3: Performance Metrics
- ShortDF demonstrates superior performance in speed and quality, achieving a 5.0 times speed increase over DDIM while improving image quality (FID score of 9.08 compared to DDIM's 11.14) [36].
- The model shows robustness in complex scenarios, effectively restoring object contours faster than competing methods [37].
- Across various datasets, ShortDF maintains a balance between performance and speed, showcasing its potential for real-world applications [40].

Group 4: Industry Implications
- The advancements in ShortDF and LeMiCa highlight the importance of refined mathematical modeling over mere computational power in enhancing diffusion model speeds [41].
- These developments are crucial for the application of AIGC technology in resource-constrained environments, such as mobile devices and real-time interactive design [42].
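The summary above describes ShortDF's graph-theoretic "relax and compress paths" idea only at a high level. Purely as a rough illustration of what path relaxation over denoising timesteps can look like, here is a toy dynamic-programming planner that picks a shortened schedule minimizing a user-supplied per-jump error estimate. The function `plan_schedule` and its cost model are hypothetical and are not ShortDF's actual (training-time) procedure.

```python
# Toy sketch: treat timesteps as graph nodes and pick a short denoising schedule
# by relaxing "jump" edges, under an assumed per-jump error estimate.
import math

def plan_schedule(T, n_steps, jump_cost):
    """T: total number of diffusion timesteps (e.g., 1000).
    n_steps: number of denoising jumps allowed at inference time.
    jump_cost(t_from, t_to): estimated error of denoising directly from t_from to t_to.
    Returns a schedule [T, ..., 0] with exactly n_steps jumps minimizing summed cost."""
    dp = {(T, 0): 0.0}          # dp[(t, s)] = minimal cost to reach timestep t in s jumps
    parent = {}
    for s in range(n_steps):
        for (t, steps), cost in list(dp.items()):
            if steps != s:
                continue
            for t_next in range(t - 1, -1, -1):      # relax every forward jump from t
                c = cost + jump_cost(t, t_next)
                key = (t_next, s + 1)
                if c < dp.get(key, math.inf):
                    dp[key] = c
                    parent[key] = (t, s)
    # Backtrack from timestep 0 reached with exactly n_steps jumps.
    path, key = [0], (0, n_steps)
    while key in parent:
        key = parent[key]
        path.append(key[0])
    return list(reversed(path))

# Hypothetical toy cost: longer jumps accumulate more error (superlinearly).
schedule = plan_schedule(T=50, n_steps=5, jump_cost=lambda a, b: (a - b) ** 1.5)
print(schedule)   # e.g., [50, 40, 30, 20, 10, 0]
```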
ChatGPT Ad Code Leaked! Altman's Stance Shifts Three Times in a Year: From "Ads Are Unsettling" to "Not Entirely Undesirable"
量子位· 2025-12-01 04:26
Core Viewpoint
- OpenAI is preparing to monetize ChatGPT through advertising, as indicated by the discovery of ad-related code in the Android app, marking a significant shift in its operational strategy [1][11].

Group 1: Advertising Implementation
- The code in the ChatGPT Android app reveals multiple references to advertising features, including "ads feature," "bazaar content," and "search ads carousel," suggesting at least three different advertising formats [12][13].
- The advertising formats include search ads targeting specific queries, a carousel format for multiple ads, and a marketplace-style content display for promoting products or services [18].

Group 2: Financial Pressures
- OpenAI faces substantial financial pressures, with estimates suggesting that operating ChatGPT could require several hundred billion dollars annually just to maintain its computational infrastructure [8].
- Current revenue from ChatGPT Plus subscriptions and API licensing is insufficient to cover these operational costs, leading to projections of continued losses exceeding $100 billion by 2029 [9][10].

Group 3: User Engagement and Trust
- ChatGPT has achieved a remarkable user base, with 800 million weekly active users and 2.5 billion daily interactions, a sevenfold increase from 100 million users in November 2023 [14].
- The potential for advertising revenue is significant, even without considering the advanced contextual understanding of large models, as traditional internet advertising revenue can be estimated from active user numbers and average ad impressions [15][16].

Group 4: Leadership Perspectives
- OpenAI's CEO, Sam Altman, has expressed concerns about balancing profitability with user trust, questioning the ethics of paid rankings in search results [17][20].
- Altman believes that if ChatGPT can provide the best answers without bias from paid advertisements, it could maintain user trust, suggesting a model where commissions are earned from bookings rather than direct ad placements [22].

Group 5: Organizational Influence
- There are indications that OpenAI's shift toward advertising is influenced by the hiring of former Meta employees, who are accustomed to a business model heavily reliant on ad revenue [23].
- User feedback suggests that some believe advertising is already present in ChatGPT, with internal discussions at OpenAI considering the integration of ads based on user interactions and preferences [25].
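Group 3 above notes that a rough advertising-revenue estimate needs only active-user counts and average ad impressions. As a back-of-the-envelope sketch: only the 800-million weekly-active-user figure comes from the article; the session, ads-per-session, and CPM numbers below are hypothetical placeholders, not reported values.

```python
# Back-of-the-envelope ad-revenue estimate with placeholder assumptions.
weekly_active_users = 800_000_000      # from the article
sessions_per_user_per_week = 5         # hypothetical
ads_per_session = 2                    # hypothetical
cpm_usd = 10.0                         # hypothetical revenue per 1,000 impressions

weekly_impressions = weekly_active_users * sessions_per_user_per_week * ads_per_session
annual_revenue = weekly_impressions / 1000 * cpm_usd * 52
print(f"~${annual_revenue / 1e9:.1f}B per year")   # about $4.2B with these placeholder inputs
```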
A 6B Text-to-Image Model Tops Hugging Face Right After Launch
量子位· 2025-12-01 04:26
Core Viewpoint
- The article discusses the launch and performance of Alibaba's new image generation model, Z-Image, which has quickly gained popularity and recognition in the AI community due to its impressive capabilities and efficiency [1][3].

Group 1: Model Overview
- Z-Image is a 6-billion-parameter image generation model that has achieved significant success, including 500,000 downloads on its first day and topping two Hugging Face charts within two days of launch [1][3].
- The model is available in three versions: Z-Image-Turbo (open-source), Z-Image-Edit (not open-source), and Z-Image-Base (not open-source) [8].

Group 2: Performance and Features
- Z-Image demonstrates state-of-the-art (SOTA) performance in image quality, text rendering, and semantic understanding, comparable to contemporaneous models like FLUX.2 [3][8].
- The model excels at generating realistic images and handling complex text rendering, including mixed-language content and mathematical formulas [6][15].
- Users have reported high-quality outputs, including detailed portraits and creative visual interpretations, showcasing the model's versatility [11][14][32].

Group 3: Technical Innovations
- Z-Image's speed and efficiency are attributed to its architecture optimization and model distillation techniques, which reduce computational load without sacrificing quality [34][39].
- The model employs a single-stream architecture (S3-DiT) that integrates text and image processing, streamlining the workflow and enhancing performance [35].
- The distillation process allows Z-Image to generate high-quality images with only eight function evaluations, significantly improving generation speed [40][42].

Group 4: Market Position and Future Prospects
- The timing of Z-Image's release is strategic, coinciding with the launch of FLUX.2, indicating a competitive landscape in the AI image generation market [44].
- The model's open-source availability on platforms like Hugging Face and ModelScope positions it favorably for further adoption and experimentation within the AI community [45].
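Group 3 above attributes Z-Image's speed to distillation down to eight function evaluations. As a generic illustration of what "eight function evaluations" means for a diffusion- or flow-style sampler (one denoiser forward pass per step), here is a minimal Euler-style sketch assuming PyTorch; the `denoiser` interface and update rule are common flow-matching conventions used for illustration, not Z-Image's actual sampler or distillation recipe.

```python
# Minimal sketch of a few-step sampler: the denoiser network is called once per
# step, so an image is produced with exactly num_steps forward passes.
import torch

@torch.no_grad()
def sample(denoiser, shape, num_steps=8, device="cpu"):
    x = torch.randn(shape, device=device)                    # start from pure noise
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = denoiser(x, t_cur)                                # one function evaluation
        x = x + (t_next - t_cur) * v                          # Euler step toward the data
    return x
```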
QbitAI Is Hiring Editors and Writers
量子位· 2025-12-01 04:26
Editorial Team, from Aofeisi
QbitAI | WeChat official account QbitAI

The AI wave is still surging, but if you don't yet know how to take part in it... why not join QbitAI?

We are a content platform centered on tracking new developments in AI. After 8 years of accumulation, we have top-tier influence, broad and widely recognized industry resources, and an ideal vantage point for observing and learning at the forefront of the era.

We are currently hiring in three directions, and we hope you are (or can become) a content expert in one of them:
- AI Industry: innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
- AI Finance: venture capital and earnings reports in the AI field, tracking capital flows along the industry chain;
- AI Products: progress of AI in applications and hardware devices.

All positions are full-time; the workplace is Zhongguancun, Beijing. Openings at every capability level are available, and you are welcome to apply based on your background and experience.

Who the positions are open to:
- Experienced hires: editors, lead writers, and editors-in-chief at every level, matched to your abilities;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time roles.

What you gain by joining us:
- Stand at the crest of the AI wave: be the first to learn about the latest technologies and products in AI and build a complete AI knowledge system.
- Master new AI tools: apply new AI technologies and tools in your work to boost efficiency and creativity.
- Build personal influence: by writing exclusive, original ...

Position details follow, starting with the AI Industry direction and its responsibilities: ...
Accountable for Merchants' Ad ROI: Where Does This Video-Marketing Agent Get Its Confidence? | A Conversation with Boolvector
量子位· 2025-11-30 11:30
Core Insights
- The article discusses the emergence of Temvideo, an AI video agent developed by Boolvector, aimed at addressing the marketing challenges faced by cross-border e-commerce businesses. The product enhances video production efficiency and reduces costs while maintaining high ROI for users [6][11].

Group 1: Product Overview
- Temvideo is the world's first AI video agent designed specifically for marketing scenarios, targeting the pain points of low efficiency and high costs in video production for cross-border e-commerce [11].
- The core functionality of Temvideo includes batch video generation using verified high-ROI templates, significantly reducing production time while achieving quality comparable to human-made videos [11][12].
- The product is designed to cater to e-commerce users with annual revenues between 10 million and 100 million, focusing on their advertising needs and ensuring high click-through and conversion rates [12][22].

Group 2: Unique Features and Advantages
- Temvideo's design incorporates industry know-how, allowing it to generate effective marketing videos based on successful past campaigns, thus enhancing the quality of its output [12][36].
- The product utilizes a combination of large models and industry-specific algorithms to improve video content understanding and production accuracy, addressing the limitations of generic AI models [30][32].
- Temvideo's ability to automatically segment video clips and match background music enhances overall video quality, meeting merchants' high requirements for detail [29][30].

Group 3: Market Context and Trends
- The article highlights that only about 10% of e-commerce businesses currently utilize AI video and image generation technologies, indicating significant room for growth in this sector [71].
- The demand for high-quality video content on social media platforms is increasing, with platforms like TikTok and Meta requiring more engaging and effective video advertisements [75][76].
- The potential market for AI-generated video content is substantial, with two primary business models: charging per video produced or sharing revenue based on performance metrics [78][79].

Group 4: Challenges and Future Directions
- The article notes that many AI products face challenges in user retention due to high expectations and the complexity of AI capabilities, which can lead to unsatisfactory results [86].
- Boolvector aims to balance result delivery and cost control, focusing on optimizing the video generation process to ensure user satisfaction and retention [92][93].
- The future vision for Temvideo includes transitioning from a pay-per-video model to a performance-based payment system, fostering a sustainable business model that aligns with user success [95][98].
Transformer Co-Author Reveals the Inside Story of GPT-5.1: OpenAI's Internal Naming Conventions Have Become a Mess
量子位· 2025-11-30 11:30
Core Insights
- The article discusses a significant paradigm shift in AI, indicating that the development of AI is not slowing down but rather transitioning to a new phase of growth [1][7][12].

Group 1: AI Development Trends
- There are two contrasting views on AI development: one claims that AI growth is slowing down, while the other highlights continuous advancements with new models like GPT-5.1 and Gemini 3 being released [3][12].
- Łukasz Kaiser argues that the perception of slowing growth is incorrect, stating that AI's capability growth follows a smooth exponential curve, akin to Moore's Law [15][16].
- The shift from pre-training to reasoning models is a key factor in this transition, with pre-training being in a later stage of its S-curve while reasoning models are still in their early stages [18][19].

Group 2: Reasoning Models and Their Impact
- The industry is focusing on smaller, cost-effective models that maintain quality, leading to the misconception that pre-training has stalled [21].
- Reasoning models, which allow for more complex thought processes and the use of tools during inference, are expected to progress rapidly due to their emerging nature [22][27].
- The evolution of models like ChatGPT demonstrates a qualitative leap in performance, with newer versions incorporating reasoning and external tool usage for more accurate responses [23][24].

Group 3: GPT-5.1 Insights
- GPT-5.1 is not merely a minor update but represents a significant stability iteration, enhancing reasoning capabilities through reinforcement learning and synthetic data [34][35].
- The naming convention for versions has shifted to focus on user experience rather than technical details, allowing for greater flexibility in development [38].
- Despite improvements, GPT-5.1 still has limitations, particularly in multi-modal reasoning, as illustrated by its struggles with basic tasks that require contextual understanding [41][42].

Group 4: Future of AI and Robotics
- AI is expected to change the nature of work without eliminating jobs, as human expertise will still be needed in high-stakes scenarios [62][66].
- Home robots are anticipated to be the next visible AI revolution, driven by advancements in multi-modal capabilities and general reinforcement learning [67][69].
- The integration of these technologies is expected to lead to a significant leap in the capabilities of home robots, making them more intuitive and perceptible compared to current AI models like ChatGPT [69].