VAREdit
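Image editing too slow and too crude? New open-source autoregressive model delivers precise edits in seconds | HiDream.ai (智象未来)
QbitAI (量子位) · 2025-09-02 10:45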
Core Viewpoint - AI image editing has advanced rapidly, but diffusion-based editors often alter the entire image when only a detail should change, and their slow generation hinders real-time interaction [1][2].
Group 1: Introduction of VAREdit
- HiDream.ai has introduced VAREdit, a new autoregressive image editing framework that aims to address these challenges [2][3].
- VAREdit brings the Visual Autoregressive (VAR) architecture to image editing, significantly improving editing accuracy and generation speed [3][5].
Group 2: Technical Details
- VAREdit formulates image editing as a next-scale prediction problem: it autoregressively generates the target feature residuals of each successive scale to achieve precise edits [5].
- The model encodes an image into a multi-scale sequence of residual visual tokens; image features are accumulated from these tokens through codebook lookups and upsampling [6].
Group 3: Model Design Challenges
- A core design challenge is how to feed source image information into the backbone network as a reference for generating each target scale [12].
- Two initial schemes were explored: full-scale conditioning, which increased computational cost, and maximum-scale conditioning (using only the finest scale), which led to scale mismatches [13][14].
Group 4: Scale Alignment Reference Module
- The Scale Alignment Reference (SAR) module was proposed as a hybrid: it provides multi-scale aligned references in the first layer, while subsequent layers focus on the finest-scale features [17].
- This improves performance by allowing attention to be distributed better across scales [15].
Group 5: Benchmark Performance
- VAREdit performs strongly in benchmark tests, outperforming competitors on both CLIP and GPT metrics, which indicates higher editing accuracy [18][19].
- VAREdit-8.4B improved the GPT-Balance metric by 41.5% over ICEdit and 30.8% over UltraEdit, and the lightweight VAREdit-2.2B also achieved significant improvements [19].
Group 6: Speed and Efficiency
- VAREdit has a clear speed advantage: the 8.4B model edits a 512×512 image in 1.2 seconds, 2.2 times faster than comparable diffusion models [20].
- The 2.2B model needs only 0.7 seconds, giving a near-instant editing experience while maintaining high quality [20].
Group 7: Versatility and Future Directions
- VAREdit achieves the best results across most editing types, and the larger model compensates for the smaller model's weaknesses in global style and text editing [23].
- The HiDream.ai team plans to keep exploring next-generation multimodal image editing architectures to improve the quality, speed, and controllability of instruction-guided image generation [27].
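The accumulation step described under Group 2 (codebook lookup, upsampling, summation across scales) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the scalar `CODEBOOK`, the three token maps, and the use of nearest-neighbor upsampling in place of whatever interpolation VAREdit actually uses. It is not the released implementation.

```python
from typing import List

Grid = List[List[float]]

# Hypothetical scalar codebook: token id -> feature value (real codebooks hold vectors).
CODEBOOK: List[float] = [0.0, 1.0, -0.5, 0.25]

def lookup(codebook: List[float], tokens: List[List[int]]) -> Grid:
    """Replace every token id in a square token map with its codebook entry."""
    return [[codebook[t] for t in row] for row in tokens]

def upsample_nearest(grid: Grid, size: int) -> Grid:
    """Nearest-neighbor upsampling of a square grid to size x size (an assumed stand-in)."""
    src = len(grid)
    return [[grid[i * src // size][j * src // size] for j in range(size)]
            for i in range(size)]

def accumulate(codebook: List[float], token_maps: List[List[List[int]]], size: int) -> Grid:
    """Sum the upsampled codebook lookups of all scales into one feature map."""
    total = [[0.0] * size for _ in range(size)]
    for tokens in token_maps:
        up = upsample_nearest(lookup(codebook, tokens), size)
        for i in range(size):
            for j in range(size):
                total[i][j] += up[i][j]
    return total

# Hypothetical residual token maps at 1x1, 2x2 and 4x4, coarse to fine.
TOKEN_MAPS = [
    [[1]],
    [[0, 2], [3, 0]],
    [[0, 0, 0, 0], [0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2]],
]

FEATURES = accumulate(CODEBOOK, TOKEN_MAPS, 4)
```

In this toy setup the coarsest scale sets a base value everywhere and each finer scale adds a smaller correction, which is the sense in which the finer tokens are residuals.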
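HiDream.ai (智象未来) releases new autoregressive image editing framework VAREdit; Doubao launches protection mode for minors | AIGC Daily
Cyzone (创业邦) · 2025-08-27 00:12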
Group 1
- HiDream.ai (Zhixiang Future) has launched VAREdit, a new autoregressive image editing framework that follows user instructions accurately and cuts editing time to 0.7 seconds [2]
- Doubao has introduced a protection mode for minors that, when activated by parents, disables recommended videos, browsing of third-party websites, and interactions with AI outside of Doubao, while still allowing translation and research functions [2]
- Saudi AI company Humain has begun construction of its first data center, scheduled to begin operating by early 2026, and plans to import chips from suppliers such as Nvidia to establish Saudi Arabia as a regional AI hub [2]
- Alibaba Cloud's model service platform, Bailian, has announced a price cut for context caching on certain models, lowering the cost of cached input tokens from 40% to 20% of the standard input token price [2]
Musk formally sues OpenAI and Apple; e-commerce becomes a top-level entry point on Xiaohongshu | 蓝媒GPT
Sou Hu Cai Jing· 2025-08-26 10:48
Group 1: Legal Actions and Allegations
- Musk's company xAI has filed a lawsuit against OpenAI and Apple, accusing them of illegally colluding to hinder competition in the AI sector [1]
- Musk claims that Apple is violating antitrust laws by favoring OpenAI in its App Store rankings, making it difficult for other AI companies to compete [1]
- Musk had previously threatened immediate legal action against Apple and questioned why Apple had not included xAI's applications in its recommended section [1]
Group 2: E-commerce Developments
- Xiaohongshu has launched a new version of its app that makes e-commerce a top-level entry point, signaling a significant investment in its e-commerce business [1]
- The new version adds a "Market" page to the bottom navigation bar, next to the main "Home" page, showcasing Xiaohongshu's lifestyle e-commerce offerings [1]
Group 3: Technology and Product Innovations
- HiDream.ai (Zhixiang Future) has introduced VAREdit, a new autoregressive image editing framework that follows user instructions accurately and cuts editing time to 0.7 seconds [1]
- Several smartphone manufacturers, including Vivo, Honor, Xiaomi, Huawei, and OPPO, are entering the mixed reality (MR) and augmented reality (AR) markets, a strategic shift toward new human-computer interaction opportunities [2]
- Meta Platforms plans to unveil smart glasses with a built-in display at its upcoming Connect conference, priced at around $800, continuing its expansion in augmented reality and wearable technology [3]
Precise image editing in 0.7 seconds! HiDream.ai (智象未来) team proposes new autoregressive image editing framework VAREdit
Mei Ri Jing Ji Xin Wen· 2025-08-25 07:35
Core Insights - HiDream.ai (Zhixiang Future) has introduced VAREdit, a new image editing framework aimed at the "loss of control" and inefficiency of existing image editing methods [1]
Group 1: VAREdit Framework
- VAREdit brings the Visual Autoregressive (VAR) architecture to image editing, forming a novel instruction-guided editing framework [1]
- In benchmark tests the framework leads on the traditional CLIP metrics and shows markedly higher editing precision on the GPT metrics [1]
Group 2: Performance Metrics
- VAREdit-8.4B achieved a 41.5% improvement over ICEdit and a 30.8% improvement over UltraEdit on the GPT-Balance metric [1]
- The lightweight version, VAREdit-2.2B, can perform high-fidelity editing of a 512×512 image in just 0.7 seconds [1]
Group 3: Availability
- VAREdit has been fully open-sourced on GitHub and Hugging Face [1]
HiDream.ai (智象未来) releases new autoregressive image editing framework VAREdit, completing high-fidelity image edits in 0.7 seconds
Ge Long Hui· 2025-08-25 06:26
Core Insights
- VAREdit is presented as a significant breakthrough in image editing technology and as the world's first purely autoregressive image editing model [1]
- VAREdit cuts editing time to 0.7 seconds, enabling real-time interaction and efficient creation [1]
Group 1: Technology and Innovation
- VAREdit addresses limitations of diffusion models in image editing, such as imprecise modifications and the inefficiency of multi-step iteration [1]
- The framework introduces a visual autoregressive (VAR) architecture and defines editing as "next-scale prediction" to achieve precise local modifications while preserving the overall structure [1]
- The Scale Alignment Reference (SAR) module resolves the scale-matching problem, further improving editing quality and efficiency [1]
Group 2: Performance Metrics
- On the EMU-Edit and PIE-Bench benchmarks, VAREdit outperforms competitors across various metrics, including CLIP and GPT [1]
- The VAREdit-8.4B model improves the GPT-Balance metric by 41.5% and 30.8% compared to ICEdit and UltraEdit, respectively [1]
- The lightweight VAREdit-2.2B model achieves high-fidelity editing of a 512×512 image within 0.7 seconds, a several-fold speedup [1]
Group 3: Future Developments
- VAREdit is fully open-sourced on GitHub and Hugging Face, indicating a commitment to community engagement and collaboration [2]
- The company plans to explore applications in video editing and multimodal generation, aiming to make AI image editing more efficient, controllable, and real-time [2]
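The scale-matching trade-off that the SAR module targets can be made concrete with a rough token-count sketch. The scale schedule `SCALES` is hypothetical, and `sar_reference_side` encodes only one reading of the summaries (a scale-aligned source reference in the first layer, the finest-scale reference in later layers). It is an assumption for illustration, not the published design.

```python
from typing import List

# Hypothetical side lengths of the token maps, coarse to fine.
SCALES: List[int] = [1, 2, 4, 8, 16]

def tokens(side: int) -> int:
    """Number of tokens in a square token map with the given side length."""
    return side * side

def full_scale_reference_tokens(scales: List[int]) -> int:
    """Source tokens when every source scale is used as the reference."""
    return sum(tokens(s) for s in scales)

def finest_scale_reference_tokens(scales: List[int]) -> int:
    """Source tokens when only the finest source scale is used as the reference."""
    return tokens(scales[-1])

def sar_reference_side(layer: int, target_side: int, scales: List[int]) -> int:
    """Assumed SAR rule: layer 0 sees a source reference matched to the
    target scale; all later layers see the finest-scale source features."""
    return target_side if layer == 0 else scales[-1]

FULL = full_scale_reference_tokens(SCALES)      # 341 with the schedule above
FINEST = finest_scale_reference_tokens(SCALES)  # 256 with the schedule above

# When generating the 4x4 target scale under the assumed rule:
FIRST_LAYER_SIDE = sar_reference_side(0, 4, SCALES)  # 4, matches the target scale
LATER_LAYER_SIDE = sar_reference_side(3, 4, SCALES)  # 16, the finest scale
```

Under these assumptions, full-scale conditioning lengthens the reference sequence that attention must cover, while finest-only conditioning pairs a coarse 4×4 target with a 16×16 reference. The hybrid keeps the shorter reference in most layers and removes the mismatch in the first one.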