Autoregressive models
10,000 Tokens in 12 Seconds! Google Unveils Text "Diffusion Model" Gemini Diffusion; Researcher: Demos Have to Be Slowed Down to Follow
QbitAI (量子位) · 2025-05-21 10:39
Core Viewpoint - Google DeepMind has introduced Gemini Diffusion, a new language model that uses diffusion to substantially improve text-generation speed and quality compared with traditional autoregressive models [1][4][9].
Group 1: Technology and Performance
- Gemini Diffusion can generate text at 2,000 tokens per second, faster than Google's earlier Gemini 2.0 Flash-Lite [7][11].
- The model learns to generate output by progressively refining noise, which lets it iterate quickly and correct errors during generation [6][10][15].
- Unlike traditional models that generate one token at a time, Gemini Diffusion generates entire blocks of tokens at once, which yields more coherent responses [9][14].
Group 2: Benchmarking and Comparisons
- Benchmark tests show that Gemini Diffusion performs comparably to larger models and outperforms Gemini 2.0 Flash-Lite on several coding tasks [8].
- For example, on the HumanEval benchmark Gemini Diffusion scored 76.0%, slightly higher than Gemini 2.0 Flash-Lite's 75.8% [8].
Group 3: Implications and Future Directions
- Bringing diffusion into language models points to a possible shift toward more hybrid models, a direction also seen in similar research at other institutions [19][20].
- Because the model can reason non-causally during generation (it is not bound to strict left-to-right order), it opens new possibilities for complex problem-solving tasks that traditional autoregressive models struggle with [16][17].
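The contrast between generating one token per step and refining a whole block in parallel can be illustrated with a toy sketch. Both "models" here (`next_token`, `denoise`) are hypothetical stand-ins that simply read the answer from a target list, and the masked-token, confidence-based unmasking scheme is an assumption chosen for illustration, not Gemini Diffusion's published algorithm. The only thing the sketch demonstrates is how the number of forward passes differs between the two decoding styles.

```python
import math
from typing import List, Optional, Tuple

MASK: Optional[str] = None  # marks a position whose token is not decided yet


def next_token(prefix: List[str], target: List[str]) -> str:
    """Hypothetical autoregressive model: conditions only on the prefix and
    returns the next token (this toy simply reads it from `target`)."""
    return target[len(prefix)]


def denoise(block: List[Optional[str]], target: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical diffusion model: predicts every position of the block in
    a single pass and returns (token, confidence) pairs. Confidence rises
    next to positions that are already decided."""
    preds = []
    for i in range(len(block)):
        neighbours = [block[j] for j in (i - 1, i + 1) if 0 <= j < len(block)]
        decided = sum(n is not MASK for n in neighbours)
        preds.append((target[i], 0.5 + 0.25 * decided))
    return preds


def autoregressive_decode(target: List[str]) -> Tuple[List[str], int]:
    """Left to right: one forward pass per generated token."""
    out: List[str] = []
    passes = 0
    while len(out) < len(target):
        out.append(next_token(out, target))
        passes += 1
    return out, passes


def diffusion_decode(target: List[str], steps: int) -> Tuple[List[str], int]:
    """Start from an all-masked block; at each step predict all positions in
    parallel and commit the most confident still-masked ones."""
    block: List[Optional[str]] = [MASK] * len(target)
    per_step = math.ceil(len(target) / steps)
    passes = 0
    while any(t is MASK for t in block):
        preds = denoise(block, target)  # one pass covers the whole block
        passes += 1
        masked = [i for i, t in enumerate(block) if t is MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            block[i] = preds[i][0]
    return [t for t in block if t is not MASK], passes


tokens = "the model refines a whole block of tokens at once".split()
print(autoregressive_decode(tokens)[1])      # 10 forward passes
print(diffusion_decode(tokens, steps=3)[1])  # 3 forward passes
```

In this toy a committed token is never revisited. Variants that re-mask low-confidence tokens so they can be corrected in a later step are one way to realise the "error correction" mentioned in the summary, but they are not shown here.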
StepFun Open-Sources Image-Editing Model Step1X-Edit; Alibaba's Flagship AI App Quark Launches New "AI Camera" | AIGC Daily
Cyzone (创业邦) · 2025-04-27 23:48
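3. [Meta Debuts Token-Shuffle: Autoregressive Models Break Through a Bottleneck to Generate 2048×2048 Images] According to reports, Meta AI has introduced Token-Shuffle, which aims to solve the difficulty autoregressive (AR) models face in scaling to high-resolution image generation. AR models have excelled at language generation and in recent years have also been widely explored for image synthesis, but they hit a bottleneck with high-resolution images. Whereas text generation needs only a small number of tokens, a high-resolution image often requires thousands of tokens, and computational cost soars accordingly. As a result, many AR-based multimodal models can handle only low- to medium-resolution images, which limits their use in fine-grained image generation. Diffusion models perform strongly at high resolution, but their complex sampling process and slower inference speed are limitations of their own. (Sohu)
4. [Adobe Releases Firefly Image Model 4: Another Upgrade for AI Image Generation] In a blog post, Adobe announced two text-to-image AI models, Firefly Image Model 4 and Firefly Image Model 4 Ultra, and previewed, for Photoshop and Illustrator, the Crea ...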
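The cost problem described in item 3 is easy to quantify with a little arithmetic. This sketch assumes, purely for illustration, a tokenizer that turns each 16×16 pixel patch into one token, and it counts only the quadratic term of self-attention. The `merge_neighbours` helper shows the generic idea of fusing neighbouring tokens to shorten the sequence; it is a hypothetical illustration, not Meta's Token-Shuffle implementation, whose details the snippet does not give.

```python
def image_tokens(height: int, width: int, patch: int = 16) -> int:
    """Visual tokens for one image, assuming (hypothetically) a tokenizer
    that maps each patch x patch pixel region to a single token."""
    return (height // patch) * (width // patch)


def relative_attention_cost(n_tokens: int, baseline_tokens: int) -> float:
    """Self-attention compares every token with every other token, so its
    cost grows with the square of the sequence length."""
    return (n_tokens / baseline_tokens) ** 2


def merge_neighbours(n_tokens: int, window: int) -> int:
    """Generic token merging: fusing each window x window neighbourhood of
    tokens into one token divides the sequence length by window**2."""
    return n_tokens // (window * window)


base = image_tokens(512, 512)      # 32 * 32 = 1024 tokens
hi_res = image_tokens(2048, 2048)  # 128 * 128 = 16384 tokens
print(base, hi_res, relative_attention_cost(hi_res, base))  # 1024 16384 256.0

merged = merge_neighbours(hi_res, window=2)  # 4096 tokens
print(merged, relative_attention_cost(merged, base))  # 4096 16.0
```

Under these assumptions, going from 512×512 to 2048×2048 multiplies the token count by 16 and the attention cost by about 256, while merging 2×2 neighbourhoods brings the attention cost back down by a factor of 16.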
"Computer Vision Has Been Killed by GPT-4o" (tongue in cheek)
QbitAI (量子位) · 2025-03-29 07:46
Core Viewpoint - The article discusses the advances in computer vision (CV) and image generation brought by the new GPT-4o model, highlighting its potential to disrupt existing tools and methods in the field [1][2].
Group 1: Technological Advancements
- GPT-4o introduces native multimodal image generation, extending AI tools beyond their traditional applications [2][12].
- GPT-4o's image generation is based on an autoregressive model rather than the diffusion model used in DALL·E, which allows better instruction following and stronger image-editing capabilities [15][19].
- Observations suggest that generation may combine autoregression across multiple scales: a rough image is produced first, and details are then filled in while the rough shape continues to evolve [17][19].
Group 2: Industry Impact
- GPT-4o's new capabilities have raised concerns among designers and computer vision researchers, signalling a significant shift in the competitive landscape of AI tools [6][10].
- That OpenAI reached these capabilities by scaling up its foundation models surprised many in the industry and suggests a new trend in AI development [12][19].
- GPT-4o's potential to improve autonomous-driving applications has also been noted, with implications for future development in that sector [10].
Group 3: Community Engagement
- The article invites community members to share their experiences and creative uses of GPT-4o, fostering collaborative exploration of AI applications [26].
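The "rough image first, details later" behaviour the article speculates about can be pictured as autoregression over scales rather than over individual pixels: each resolution is generated conditioned on all coarser ones. The toy below rests on several assumptions. Images are plain grids of floats, `toy_add_detail` is a hypothetical stand-in for a learned network, and nothing here reflects how GPT-4o is actually implemented, which the article only infers from observation.

```python
from typing import Callable, List

Grid = List[List[float]]


def upsample(grid: Grid, factor: int = 2) -> Grid:
    """Nearest-neighbour upsampling: each cell becomes a factor x factor block."""
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)
    ]


def generate_coarse_to_fine(
    add_detail: Callable[[Grid, List[Grid]], Grid], num_scales: int
) -> List[Grid]:
    """Generate scale by scale. Each new scale starts from the upsampled
    previous scale and is refined conditioned on all earlier scales, which
    makes the procedure autoregressive over scales."""
    scales: List[Grid] = [[[0.5]]]  # 1x1 starting "rough image"
    for _ in range(num_scales - 1):
        rough = upsample(scales[-1])
        scales.append(add_detail(rough, scales))
    return scales


def toy_add_detail(rough: Grid, history: List[Grid]) -> Grid:
    """Hypothetical stand-in for a learned predictor: adds a checkerboard
    residual whose amplitude halves with every finer scale."""
    amp = 0.5 ** len(history)
    return [
        [v + (amp if (r + c) % 2 == 0 else -amp) / 2 for c, v in enumerate(row)]
        for r, row in enumerate(rough)
    ]


scales = generate_coarse_to_fine(toy_add_detail, num_scales=4)
print([len(s) for s in scales])  # [1, 2, 4, 8]
print(scales[1])                 # [[0.75, 0.25], [0.25, 0.75]]
```

The design choice worth noting is that the "token" being predicted at each step is an entire grid at a new resolution, so the number of sequential steps grows with the number of scales rather than with the number of pixels.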