FLUX.1

Jensen Huang: At 240 W for Home Use, This Is the Real "First AI" Handed to Musk
36Kr · 2025-10-17 00:24
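On October 15, 2025, NVIDIA CEO Jensen Huang personally handed Elon Musk a device as compact as a paperback book. The setting was the Starship launch base in Texas. He said: "Imagine the smallest supercomputer working next to the largest rocket." This was no ordinary equipment delivery but a formal ceremony: with engineers lining the way to welcome it, Musk solemnly accepted the machine, named DGX Spark. Meanwhile, on the other side of the world, an acquisition of unprecedented scale had just closed: a consortium of BlackRock, Microsoft, and NVIDIA acquired Aligned, one of the world's largest data center operators, for $40 billion. Behind the deal lies the consensus underpinning the AI industry's frenzied expansion: compute is the core resource. But just as capital was betting heavily on a 5-gigawatt-scale "battlefield in the cloud", Huang quietly opened another door. DGX Spark is not a bigger GPU, nor is it the most powerful host machine. It can run 200-billion-parameter large models locally, connects to desktop systems from Dell, Lenovo, and HP, and supports running private models with Ollama, Roboflow, and LM Studio. It represents AI moving from the cloud center to the personal edge: no longer just infrastructure built somewhere far away, but for the first time genuinely placed on your desktop. The significance of this 1.2 kg supercomputer goes far beyond a product launch. Because what really matters is not that he handed an AI supercomputer to Musk. ...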
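Letting Multimodal Large Models "Think It Through Before Drawing"! HKU and Others Open-Source GoT-R1: Reinforcement Learning Unlocks a New Paradigm for Visual Generation Reasoning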
机器之心 (Jiqizhixin) · 2025-06-25 06:50
Core Viewpoint
- The article discusses the significant advancements in multimodal large models for generating high-fidelity images from complex text prompts, while also highlighting the challenges faced in accurately interpreting spatial relationships and multi-object attributes [1][2].

Group 1: Introduction of GoT-R1
- A research team from the University of Hong Kong, Chinese University of Hong Kong, and SenseTime has introduced GoT-R1, an important advancement following the Generation Chain-of-Thought (GoT) framework [2].
- GoT-R1 enhances the semantic-spatial reasoning capabilities of multimodal large models by applying reinforcement learning, allowing the model to autonomously explore and learn better reasoning strategies [3][5].

Group 2: Limitations of GoT Framework
- The GoT framework improves image generation accuracy and controllability by explicitly planning semantic content and spatial layout before image generation, but its reasoning capabilities are limited by supervised fine-tuning data based on predefined templates [4][13].
- GoT-R1 aims to overcome these limitations by introducing reinforcement learning into the semantic-spatial reasoning process, enabling the model to learn and optimize reasoning paths independently [5][13].

Group 3: Reward Mechanism in GoT-R1
- GoT-R1 constructs a comprehensive reward mechanism for visual generation tasks, evaluating multiple dimensions of the generated results, including semantic consistency, spatial accuracy, and overall aesthetic quality [13][14].
- The reward framework includes:
  1. Reasoning Process Evaluation Reward (RPR) [14]
  2. Reasoning-to-Image Alignment Reward (RRI), which quantifies adherence to the reasoning chain using Intersection over Union (IoU) [15]
  3. Semantic Alignment Reward (Rsem) and Spatial Alignment Reward (Rspa), which assess the completeness and accuracy of the reasoning chain against the original text prompt [16]
  4. Text-to-Image Alignment Reward (RPI), which evaluates the overall consistency of the generated image with the original text prompt [17].

Group 4: Performance Evaluation of GoT-R1
- GoT-R1 was evaluated on the challenging T2I-CompBench, where it established new state-of-the-art (SOTA) performance, achieving the highest scores in five out of six evaluation categories [21][23].
- The model demonstrated significant advantages in handling complex, multi-layered instructions, particularly in the "Complex" benchmark [23].
- Compared to the baseline model, GoT-R1-7B achieved up to a 15% improvement in evaluation metrics, showcasing the effectiveness of reinforcement learning in enhancing the model's reasoning capabilities [24][25].

Group 5: Comparison of Reasoning Chains
- A comparative analysis using GPT-4o revealed that reasoning chains generated by GoT-R1 were preferred over those from the baseline model across all evaluation categories, particularly in spatial relationship understanding [25][26].
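
The summary says the Reasoning-to-Image Alignment Reward (RRI) uses Intersection over Union (IoU) to quantify how closely the generated image follows the reasoning chain, but it does not give the formula. The sketch below shows only the standard IoU between two axis-aligned boxes, plus a hypothetical helper, `mean_layout_iou`, that averages IoU over objects matched by label. The `(x1, y1, x2, y2)` box format, the label-based matching, and the zero score for a missing object are illustrative assumptions, not details confirmed by the article.

```python
from typing import Dict, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_layout_iou(planned: Dict[str, Box], detected: Dict[str, Box]) -> float:
    """Hypothetical helper: average IoU between planned and detected boxes.

    Objects are matched by label; a planned object with no detection
    contributes 0. This matching rule is an assumption for illustration.
    """
    if not planned:
        return 0.0
    scores = [
        box_iou(box, detected[label]) if label in detected else 0.0
        for label, box in planned.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    planned = {"cat": (0, 0, 2, 2), "dog": (0, 0, 1, 1)}
    detected = {"cat": (1, 1, 3, 3)}
    print(mean_layout_iou(planned, detected))
```

In this sketch, a score near 1 means the detected objects sit almost exactly where the reasoning chain planned them, and a score near 0 means the layout was largely ignored.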