AI Image Editing
Image editing too slow and too crude? A new open-source autoregressive model delivers precise edits in seconds | HiDream.ai (智象未来)
量子位 (QbitAI) · 2025-09-02 10:45
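Contributed by the HiDream.ai (智象未来) team. 量子位 | WeChat official account: QbitAI

AI image editing is advancing rapidly, and diffusion models, with their strong generative capability, have become the industry mainstream. In practice, however, these models face two persistent problems. First, touching one detail disturbs everything: even when only a single detail should change, the system may alter the whole image. Second, generation is slow, which makes real-time interaction hard to support.

To address these pain points, the HiDream.ai team took a different path and proposed VAREdit, a new autoregressive image editing framework. It adopts the visual autoregressive (VAR) architecture and edits exactly what the instruction points at, substantially improving both editing precision and generation speed. The model and code are open source; the article gives the links at its end.

VAREdit, a new autoregressive image editing framework

VAREdit brings visual autoregressive modeling to instruction-guided image editing. It formulates image editing as a next-scale prediction problem and autoregressively generates the target feature residuals of the next scale to achieve precise edits.

Multi-scale quantized encoding: the image representation is encoded into a sequence of multi-scale residual visual tokens R₁, R₂, …, Rₖ, where the spatial size (hₖ, wₖ) of Rₖ increases with k. The continuous cumulative feature that fuses the residual information of the first k scales is obtained by looking up each scale's tokens in the codebook, upsampling the result, and summing. Although this approach provides a scale-by-scale reference, ...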
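The multi-scale residual encoding described above can be sketched in a few lines. This is only an illustrative toy under loud assumptions, not VAREdit's implementation: features are scalars rather than vectors, resizing is nearest-neighbour, the codebook is hand-written, and the names `resize`, `nearest_code`, `encode_multiscale`, and `l1_error` are hypothetical.

```python
# Toy sketch of multi-scale residual quantization.
# Assumptions (not from the article): scalar features, nearest-neighbour
# resizing, and a tiny hand-written codebook.

def resize(grid, h, w):
    """Nearest-neighbour resize of a 2-D list to h x w (down- or upsampling)."""
    src_h, src_w = len(grid), len(grid[0])
    return [[grid[i * src_h // h][j * src_w // w] for j in range(w)]
            for i in range(h)]


def nearest_code(value, codebook):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda idx: abs(codebook[idx] - value))


def encode_multiscale(feature, scales, codebook):
    """Return (tokens, cumulative): one token map per scale, and the
    accumulated full-resolution feature after each scale."""
    H, W = len(feature), len(feature[0])
    accumulated = [[0.0] * W for _ in range(H)]
    tokens, cumulative = [], []
    for h, w in scales:
        # The part of the feature that coarser scales have not explained yet.
        residual = [[feature[i][j] - accumulated[i][j] for j in range(W)]
                    for i in range(H)]
        # Quantize that residual at this scale's resolution.
        r_k = [[nearest_code(v, codebook) for v in row]
               for row in resize(residual, h, w)]
        # Codebook lookup, upsample to full size, add to the running sum.
        looked_up = resize([[codebook[idx] for idx in row] for row in r_k], H, W)
        accumulated = [[accumulated[i][j] + looked_up[i][j] for j in range(W)]
                       for i in range(H)]
        tokens.append(r_k)
        cumulative.append(accumulated)
    return tokens, cumulative


def l1_error(a, b):
    """Sum of absolute differences between two equally sized 2-D lists."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


feature = [[0.0, 0.5, 1.0, 1.5],
           [0.5, 1.0, 1.5, 2.0],
           [1.0, 1.5, 2.0, 2.5],
           [1.5, 2.0, 2.5, 3.0]]
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
tokens, cumulative = encode_multiscale(feature, [(1, 1), (2, 2), (4, 4)], codebook)
errors = [l1_error(feature, c) for c in cumulative]
print(errors)
```

In this toy run the reconstruction error shrinks as finer scales are added, which is the intuition behind accumulating residuals from coarse to fine.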
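Alibaba Cloud revenue grows 26%, a three-year high; the Computer ETF (159998) ranks first among same-index funds in year-to-date share growth; the Cloud Computing ETF Shanghai-Hong Kong-Shenzhen (517390) jumps more than 4% intraday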
Group 1
- The cloud computing ETF (517390) experienced a significant increase, rising over 4.5% at one point and closing with a gain of 1.55%, with a trading volume exceeding 19 million yuan and a premium rate of 0.13% [1]
- Key stocks within the cloud computing ETF included Alibaba, which rose over 15%, and other companies like Data Harbor and Zhongji Xuchuang also saw gains [1]
- The computer ETF (159998) reported a slight increase of 0.09% with a trading volume over 22 million yuan, and it had a net inflow of over 36 million yuan in the previous trading day [1][2]

Group 2
- The computer ETF (159998) tracks the CSI Computer Theme Index, which includes companies involved in IT services, application software, system software, and computer hardware [2]
- Alibaba's recent financial report indicated a record capital expenditure of 38.6 billion yuan in AI and cloud investments, with cloud revenue growth accelerating to 26%, marking a three-year high [2]
- The computer industry is showing signs of recovery, with median revenue growth rates projected at 3.17% and profit growth at 7.6% for the first half of 2025, indicating a sustained upward trend [3]
Google wins again: after nano banana is "forced" to change its name, users come up with 7 remarkable ways to use it
机器之心· 2025-08-28 10:40
Core Viewpoint
- Google has claimed the AI image editing model "nano banana," renaming it to "Gemini-2.5-flash-image," which has gained significant popularity, comparable to the excitement generated by GPT-4o [2][5].

Group 1: Model Features and Capabilities
- The Gemini-2.5-flash-image model is faster, cheaper, and more capable in image generation and editing compared to competitors, receiving widespread praise as the best AI photo editor [5].
- Users can experience the model for free through Gemini applications and Google AI Studio, allowing for easy image uploads and text prompts [5][10].
- The model can create isometric models by easily isolating buildings or objects, transforming night scenes into daytime images while adding missing architectural details [9][12].

Group 2: Innovative Use Cases
- Users have developed various creative applications, such as generating location-based augmented reality experiences by annotating real-world images [15][18].
- The model can produce multiple views of a subject in a consistent isometric perspective, useful for product modeling and industrial design [12].
- It can generate detailed natural landscape images based on digital elevation models (DEMs), accurately reflecting terrain features [26].

Group 3: Fashion and Style Applications
- The model allows users to upload outfit photos and instantly generate a clothing list, appealing to fashion enthusiasts [27].
- It can also transform outfits of both real and animated characters, although some minor inaccuracies may occur [31].

Group 4: Creative Content Generation
- Users can create storyboard frames for films by uploading character portraits and providing simple prompts, showcasing versatility in style [37].
- The model can recognize hand-drawn content and generate complex action scenes based on specified poses [40].
- It can convert photographs into black-and-white manga styles while adding dynamic effects and even create humorous comic panels based on prompts [43][44].

Group 5: Restoration and Enhancement
- The model excels in restoring old photographs and adding color to black-and-white images, demonstrating its capabilities in traditional photo editing tasks [50].
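Google releases image generation model nano banana; the White House announces a stake in Intel; JD.com announces its entry into group broadcasting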
Guan Cha Zhe Wang· 2025-08-27 01:04
Group 1: Google
- Google officially launched its advanced image generation and editing model Gemini 2.5 Flash Image, codenamed "nano banana" [1]
- The model ranks first in the LMArena benchmark for AI image editing, featuring character consistency, precise natural language editing, and multi-image fusion capabilities [1]
- Users can access the model through the Gemini App and API, with API pricing set at $30 per million output tokens and a cost of approximately $0.039 to generate a single image [1]

Group 2: Alibaba Cloud
- Alibaba Cloud's model service platform, Bailian, announced a price reduction for context caching on certain models [2]
- Under the new pricing, cached tokens are charged at 20% of the input token price, down from 40% [2]

Group 3: Apple
- Apple announced its annual fall product launch event, scheduled for September 10, where it is expected to unveil the new iPhone 17 series [4]
- The event will also serve as a platform for Apple to showcase its latest advancements in AI technology [4]

Group 4: Intel
- The U.S. government announced an investment of $8.9 billion for a 9.9% stake in Intel, primarily funded by subsidies from the CHIPS Act [5]
- This move marks a departure from the government's usual practice of only intervening in crises, raising concerns about potential impacts on corporate governance [5]

Group 5: Cambricon
- Cambricon reported a 4347% year-on-year increase in revenue for the first half of the year, totaling 2.881 billion yuan [5]
- The company achieved a net profit of 1.038 billion yuan, reversing a loss of 530 million yuan in the same period last year [5]

Group 6: BYD
- BYD announced its first export of electric vehicles from its Thailand factory to Europe, with over 900 units shipped to the UK, Germany, and Belgium [6]

Group 7: Douyin
- Douyin launched a minor protection mode, which disables certain features such as video recommendations and third-party interactions when activated by parents [3]

Group 8: JD.com
- JD.com announced its entry into group broadcasting, set to launch during the Qixi Festival on August 28, featuring well-known idol groups [8][9]

Group 9: Xiaohongshu
- Xiaohongshu is testing a new version of its app that positions e-commerce as a primary entry point, with a "market" option added to the main interface [10]
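The two pricing figures quoted for Google and Alibaba Cloud above lend themselves to a quick back-of-the-envelope check. The sketch below uses only the numbers reported in the summary ($30 per million output tokens, about $0.039 per image, cached tokens at 20% instead of 40% of the input price); the function names are hypothetical, and the input token price of 1.0 is a placeholder unit, since the summary does not state an actual price.

```python
# Back-of-the-envelope arithmetic on the quoted prices.
# Figures come from the summary above; function names are illustrative only.

PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0   # quoted API price
COST_PER_IMAGE_USD = 0.039                   # quoted approximate cost per image


def implied_tokens_per_image(cost_per_image, price_per_million):
    """Output tokens per image implied by the two quoted figures."""
    return cost_per_image / (price_per_million / 1_000_000)


def images_within_budget(budget_usd, cost_per_image):
    """Whole number of images that fit in a budget at the quoted per-image cost."""
    return int(budget_usd // cost_per_image)


def cached_token_price(input_token_price, cache_ratio):
    """Price of a cached token as a fraction of the regular input token price."""
    return input_token_price * cache_ratio


tokens = implied_tokens_per_image(COST_PER_IMAGE_USD,
                                  PRICE_PER_MILLION_OUTPUT_TOKENS_USD)
print(round(tokens))                                   # about 1300 tokens per image
print(images_within_budget(1.0, COST_PER_IMAGE_USD))   # images per US dollar

# Placeholder input price of 1.0 (arbitrary unit): only the ratio matters here.
old_price = cached_token_price(1.0, 0.40)
new_price = cached_token_price(1.0, 0.20)
print(new_price / old_price)                           # cached tokens now cost half as much
```

The implied figure of roughly 1,300 output tokens per image is derived from the two quoted numbers, not stated in the summary itself.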
HiDream.ai (智象未来) releases VAREdit, a new autoregressive image editing framework that completes high-fidelity image edits in 0.7 seconds
Ge Long Hui· 2025-08-25 06:26
Core Insights
- The launch of VAREdit marks a significant breakthrough in image editing technology, being the world's first purely autoregressive image editing model [1]
- VAREdit enhances editing speed to 0.7 seconds, facilitating real-time interaction and efficient creation [1]

Group 1: Technology and Innovation
- VAREdit addresses limitations of diffusion models in image editing, such as imprecise modifications and low efficiency in multi-step iterations [1]
- The framework introduces a visual autoregressive (VAR) architecture, defining editing as "next-scale prediction" to achieve precise local modifications while maintaining overall structure [1]
- The innovative Scale Alignment Reference (SAR) module effectively resolves scale matching issues, further improving editing quality and efficiency [1]

Group 2: Performance Metrics
- In authoritative benchmarks EMU-Edit and PIE-Bench, VAREdit outperforms competitors across various metrics, including CLIP and GPT [1]
- The VAREdit-8.4B model shows a 41.5% and 30.8% improvement in the GPT-Balance metric compared to ICEdit and UltraEdit, respectively [1]
- The lightweight VAREdit-2.2B model can achieve high-fidelity editing of 512×512 images within 0.7 seconds, resulting in multiple speed enhancements [1]

Group 3: Future Developments
- VAREdit is fully open-sourced on GitHub and Hugging Face platforms, indicating a commitment to community engagement and collaboration [2]
- The company plans to explore applications in video editing and multimodal generation, aiming to advance AI image editing into a new era of efficiency, control, and real-time capabilities [2]
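To make "next-scale prediction" more concrete, here is a minimal sketch of a coarse-to-fine generation loop. It is an assumption-laden illustration, not VAREdit's code: `next_scale_edit`, `resize`, and `toy_predictor` are hypothetical names, features are scalars, and the "model" is a stand-in that ignores the instruction and simply returns whatever part of the source has not been reconstructed yet. The point is only the control flow: one predictor call per scale, each conditioned on the source, the instruction, and the feature accumulated so far.

```python
# Illustrative coarse-to-fine loop for next-scale prediction.
# All names and the toy predictor are hypothetical; this is not the released model.

def resize(grid, h, w):
    """Nearest-neighbour resize of a 2-D list to h x w (down- or upsampling)."""
    src_h, src_w = len(grid), len(grid[0])
    return [[grid[i * src_h // h][j * src_w // w] for j in range(w)]
            for i in range(h)]


def next_scale_edit(source, instruction, scales, predict_residual):
    """Build the edited feature scale by scale, one predictor call per scale."""
    H, W = len(source), len(source[0])
    accumulated = [[0.0] * W for _ in range(H)]
    for size in scales:
        # The predictor sees the source, the instruction, and everything
        # generated at coarser scales, and returns a residual at this scale.
        residual = predict_residual(source, instruction, accumulated, size)
        upsampled = resize(residual, H, W)
        accumulated = [[accumulated[i][j] + upsampled[i][j] for j in range(W)]
                       for i in range(H)]
    return accumulated


calls = []


def toy_predictor(source, instruction, accumulated, size):
    """Stand-in for a learned model: ignores the instruction and returns the
    not-yet-reconstructed part of the source at the requested resolution."""
    calls.append(size)
    H, W = len(source), len(source[0])
    gap = [[source[i][j] - accumulated[i][j] for j in range(W)] for i in range(H)]
    return resize(gap, size[0], size[1])


source = [[0.0, 0.5, 1.0, 1.5],
          [0.5, 1.0, 1.5, 2.0],
          [1.0, 1.5, 2.0, 2.5],
          [1.5, 2.0, 2.5, 3.0]]
scales = [(1, 1), (2, 2), (4, 4)]
edited = next_scale_edit(source, "keep the image unchanged", scales, toy_predictor)
print(len(calls), edited == source)
```

With this toy predictor the loop reproduces the source after three calls, one per scale; a real model would instead predict residuals that realize the requested edit.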