AI Image Editing
Image editing too slow and too coarse? A new open-source autoregressive model delivers precise edits in seconds | HiDream.ai
QbitAI · 2025-09-02 10:45
Core Viewpoint
- AI image editing is advancing rapidly, but diffusion models in particular still face two problems: modifying a single detail tends to alter the entire image, and slow generation hinders real-time interaction [1][2].

Group 1: Introduction of VAREdit
- HiDream.ai has introduced VAREdit, a new autoregressive image editing framework that aims to address the challenges faced by existing models [2][3].
- VAREdit brings the Visual Autoregressive (VAR) architecture to image editing, significantly improving editing accuracy and generation speed and marking a new phase in image editing [3][5].

Group 2: Technical Details
- VAREdit frames image editing as a next-scale prediction problem: it autoregressively generates the target feature residuals of the next scale to achieve precise edits [5].
- The model encodes an image into a multi-scale sequence of residual visual tokens; features are accumulated through codebook lookups and upsampling operations [6].

Group 3: Model Design Challenges
- A core design challenge is how to feed source image information into the backbone network as a reference for generating each target scale [12].
- Two initial schemes were explored: full-scale conditioning (referencing every scale of the source image), which increased computational cost, and maximum-scale conditioning (referencing only the largest scale), which led to scale mismatches [13][14].

Group 4: Scale Alignment Reference Module
- The Scale Alignment Reference (SAR) module was proposed as a hybrid solution: it provides scale-aligned, multi-scale references in the first layer and uses only the finest-scale features in subsequent layers [17].
- This approach improves performance by allowing attention to be distributed more effectively across scales [15].

Group 5: Benchmark Performance
- VAREdit outperforms competing models in benchmark tests on both CLIP and GPT metrics, indicating higher editing accuracy [18][19].
- The VAREdit-8.4B model improved the GPT-Balance metric by 41.5% over ICEdit and 30.8% over UltraEdit; the lightweight VAREdit-2.2B also achieved significant improvements [19].

Group 6: Speed and Efficiency
- VAREdit has a clear speed advantage: the 8.4B model edits a 512×512 image in 1.2 seconds, 2.2 times faster than comparable diffusion models [20].
- The 2.2B model requires only 0.7 seconds, providing a near-instant editing experience while maintaining high quality [20].

Group 7: Versatility and Future Directions
- VAREdit achieves the best results across most editing types; the larger model compensates for the smaller model's weaknesses in global style and text editing [23].
- The HiDream.ai team plans to continue exploring next-generation multimodal image editing architectures to improve the quality, speed, and controllability of instruction-guided image generation [27].
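
The multi-scale residual tokens described under "Technical Details" in the VAREdit summary can be illustrated with a toy sketch. This is not the VAREdit implementation: it assumes a 1-D feature vector, a hypothetical scalar codebook, average-pool downsampling, and nearest-neighbour upsampling, purely to show how tokens at each scale are looked up, upsampled, and summed.

```python
# Toy 1-D illustration of multi-scale residual tokens (NOT the VAREdit code).
# Assumptions: scalar codebook, average-pool downsampling, nearest-neighbour upsampling.

CODEBOOK = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]  # hypothetical codebook values
SCALES = [1, 2, 4, 8]  # number of tokens per scale, coarse to fine


def downsample(signal, size):
    """Average-pool a list down to `size` entries (length must be divisible by size)."""
    step = len(signal) // size
    return [sum(signal[i * step:(i + 1) * step]) / step for i in range(size)]


def upsample(values, size):
    """Nearest-neighbour upsample a list to `size` entries."""
    step = size // len(values)
    return [v for v in values for _ in range(step)]


def nearest_token(value):
    """Codebook query: index of the codebook entry closest to `value`."""
    return min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - value))


def encode(features):
    """Turn a feature vector into one token list per scale, coarse to fine."""
    residual = list(features)
    tokens_per_scale = []
    for size in SCALES:
        tokens = [nearest_token(v) for v in downsample(residual, size)]
        tokens_per_scale.append(tokens)
        # Subtract what this scale already explains; finer scales encode the rest.
        approx = upsample([CODEBOOK[t] for t in tokens], len(features))
        residual = [r - a for r, a in zip(residual, approx)]
    return tokens_per_scale


def decode(tokens_per_scale, length):
    """Accumulate features: look up each scale's tokens, upsample, and sum."""
    accumulated = [0.0] * length
    for tokens in tokens_per_scale:
        approx = upsample([CODEBOOK[t] for t in tokens], length)
        accumulated = [a + x for a, x in zip(accumulated, approx)]
    return accumulated


features = [0.9, 1.1, 0.4, 0.6, -0.2, -0.3, 0.1, 0.0]
tokens = encode(features)
reconstruction = decode(tokens, len(features))
```

In a next-scale prediction model, each list in `tokens` would be predicted in one step conditioned on the coarser scales, which is why only a handful of steps are needed instead of one step per token.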
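Alibaba Cloud revenue grows 26% to a three-year high; Computer ETF (159998) ranks first among funds tracking the same index by year-to-date share growth; Cloud Computing ETF Shanghai-Hong Kong-Shenzhen (517390) jumps more than 4% intraday
21 Shi Ji Jing Ji Bao Dao · 2025-09-01 02:25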
Group 1
- The cloud computing ETF (517390) rose more than 4.5% at one point and closed with a gain of 1.55%, with trading volume exceeding 19 million yuan and a premium rate of 0.13% [1].
- Among the ETF's key holdings, Alibaba rose more than 15%, and Data Harbor and Zhongji Xuchuang also gained [1].
- The computer ETF (159998) edged up 0.09% on trading volume of more than 22 million yuan, after a net inflow of more than 36 million yuan on the previous trading day [1][2].

Group 2
- The computer ETF (159998) tracks the CSI Computer Theme Index, which covers companies in IT services, application software, system software, and computer hardware [2].
- Alibaba's latest financial report showed record capital expenditure of 38.6 billion yuan on AI and cloud, with cloud revenue growth accelerating to 26%, a three-year high [2].
- The computer industry is showing signs of recovery, with median revenue growth projected at 3.17% and median profit growth at 7.6% for the first half of 2025, indicating a sustained upward trend [3].
Google wins again: after nano banana is "forced" to change its name, users come up with 7 brilliant ways to use it
Jiqizhixin · 2025-08-28 10:40
Core Viewpoint
- Google has confirmed that the AI image editing model "nano banana" is its own and has renamed it "Gemini-2.5-flash-image"; the model has become hugely popular, generating excitement comparable to that around GPT-4o [2][5].

Group 1: Model Features and Capabilities
- Gemini-2.5-flash-image is faster, cheaper, and more capable at image generation and editing than its competitors, and has been widely praised as the best AI photo editor [5].
- Users can try the model for free in the Gemini apps and Google AI Studio by uploading images and entering text prompts [5][10].
- The model can create isometric models by isolating buildings or objects, and can turn night scenes into daytime images while filling in missing architectural details [9][12].

Group 2: Innovative Use Cases
- Users have developed a variety of creative applications, such as location-based augmented reality experiences generated by annotating real-world images [15][18].
- The model can produce multiple views of a subject in a consistent isometric perspective, which is useful for product modeling and industrial design [12].
- It can generate detailed natural landscape images from digital elevation models (DEMs) that accurately reflect terrain features [26].

Group 3: Fashion and Style Applications
- Users can upload outfit photos and instantly get a list of the clothing items, a feature that appeals to fashion enthusiasts [27].
- The model can also change the outfits of both real and animated characters, although minor inaccuracies may occur [31].

Group 4: Creative Content Generation
- Users can create film storyboard frames by uploading character portraits and providing simple prompts, in a range of styles [37].
- The model can recognize hand-drawn content and generate complex action scenes based on specified poses [40].
- It can convert photographs into black-and-white manga styles with added dynamic effects, and can create humorous comic panels from prompts [43][44].

Group 5: Restoration and Enhancement
- The model excels at restoring old photographs and colorizing black-and-white images, demonstrating its strength in traditional photo editing tasks [50].
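Google releases image generation model nano banana; White House announces stake in Intel; JD.com officially announces entry into group broadcasting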
Guan Cha Zhe Wang· 2025-08-27 01:04
Group 1: Google
- Google officially launched its advanced image generation and editing model Gemini 2.5 Flash Image, codenamed "nano banana" [1].
- The model ranks first on the LMArena benchmark for AI image editing and features character consistency, precise natural-language editing, and multi-image fusion [1].
- Users can access the model through the Gemini App and the API; API pricing is $30 per million output tokens, which puts the cost of generating a single image at approximately $0.039 [1].

Group 2: Alibaba Cloud
- Alibaba Cloud's model service platform, Bailian, announced a price reduction for context caching on certain models [2].
- Under the new pricing, cached tokens are charged at 20% of the input token price, down from 40% [2].

Group 3: Apple
- Apple announced that its annual fall product launch event will take place on September 10, where it is expected to unveil the new iPhone 17 series [4].
- The event will also serve as a platform for Apple to showcase its latest advances in AI technology [4].

Group 4: Intel
- The U.S. government announced an investment of $8.9 billion for a 9.9% stake in Intel, funded primarily by subsidies from the CHIPS Act [5].
- The move departs from the government's usual practice of intervening only in crises and has raised concerns about its potential impact on corporate governance [5].

Group 5: Cambricon
- Cambricon reported a 4,347% year-on-year increase in revenue for the first half of the year, to 2.881 billion yuan [5].
- The company posted a net profit of 1.038 billion yuan, reversing a loss of 530 million yuan in the same period last year [5].

Group 6: BYD
- BYD announced the first export of electric vehicles from its Thailand factory to Europe, with more than 900 units shipped to the UK, Germany, and Belgium [6].

Group 7: Douyin
- Douyin launched a protection mode for minors that, once activated by parents, disables certain features such as video recommendations and third-party interactions [3].

Group 8: JD.com
- JD.com announced its entry into group broadcasting, set to launch during the Qixi Festival on August 28 and featuring well-known idol groups [8][9].

Group 9: Xiaohongshu
- Xiaohongshu is testing a new version of its app that makes e-commerce a primary entry point, adding a "market" option to the main interface [10].
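
The two pricing items above (Gemini API output pricing and Bailian context caching) lend themselves to a quick arithmetic check. The sketch below uses only the figures quoted in the summary, except for `HYPOTHETICAL_INPUT_PRICE`, which is an assumed placeholder because the summary does not state an input token price.

```python
# Back-of-the-envelope arithmetic on the pricing figures quoted in the summary.

GEMINI_PRICE_PER_MILLION_OUTPUT_TOKENS = 30.0  # USD, from the summary
GEMINI_COST_PER_IMAGE = 0.039                  # USD, from the summary

# Assumption: placeholder input price per million tokens (not given in the summary).
HYPOTHETICAL_INPUT_PRICE = 1.0


def implied_tokens_per_image(cost_per_image, price_per_million):
    """Number of output tokens per image implied by the two quoted prices."""
    return cost_per_image / price_per_million * 1_000_000


def cached_input_cost(cached_tokens, input_price_per_million, cache_rate):
    """Cost of cached tokens billed at `cache_rate` times the input token price."""
    return cached_tokens / 1_000_000 * input_price_per_million * cache_rate


tokens_per_image = implied_tokens_per_image(
    GEMINI_COST_PER_IMAGE, GEMINI_PRICE_PER_MILLION_OUTPUT_TOKENS
)

# Cached-token cost for one million cached tokens, before and after the price cut.
old_cost = cached_input_cost(1_000_000, HYPOTHETICAL_INPUT_PRICE, 0.40)
new_cost = cached_input_cost(1_000_000, HYPOTHETICAL_INPUT_PRICE, 0.20)

print(f"Implied output tokens per image: about {round(tokens_per_image)}")
print(f"Cached-token cost after the cut: {new_cost / old_cost:.0%} of the old cost")
```

Under these figures, one image corresponds to roughly 1,300 output tokens, and the cached-token charge is halved regardless of the actual input price, since both rates scale with it.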
HiDream.ai releases VAREdit, a new autoregressive image editing framework that completes high-fidelity image edits in 0.7 seconds
Ge Long Hui· 2025-08-25 06:26
Core Insights
- The launch of VAREdit marks a significant breakthrough in image editing technology: it is described as the world's first purely autoregressive image editing model [1].
- VAREdit cuts editing time to as little as 0.7 seconds, enabling real-time interaction and efficient creation [1].

Group 1: Technology and Innovation
- VAREdit addresses limitations of diffusion models in image editing, such as imprecise modifications and the inefficiency of multi-step iteration [1].
- The framework introduces a visual autoregressive (VAR) architecture and defines editing as "next-scale prediction," achieving precise local modifications while preserving overall structure [1].
- The Scale Alignment Reference (SAR) module resolves the scale-matching problem, further improving editing quality and efficiency [1].

Group 2: Performance Metrics
- On the EMU-Edit and PIE-Bench benchmarks, VAREdit outperforms competitors across metrics including CLIP and GPT [1].
- The VAREdit-8.4B model improves the GPT-Balance metric by 41.5% over ICEdit and 30.8% over UltraEdit [1].
- The lightweight VAREdit-2.2B model completes high-fidelity edits of 512×512 images within 0.7 seconds, a several-fold speedup [1].

Group 3: Future Developments
- VAREdit is fully open-sourced on GitHub and Hugging Face, reflecting a commitment to community engagement and collaboration [2].
- The company plans to explore applications in video editing and multimodal generation, aiming to make AI image editing more efficient, controllable, and real-time [2].