量子位
Letting AI think while it paints, like a human artist: CUHK and Meituan make models "look before each step"
量子位· 2025-12-22 04:41
Core Viewpoint - The article discusses the introduction of a new paradigm called Thinking-while-Generating (TwiG), which interleaves textual reasoning with visual generation to enhance the capabilities of models in generating complex images and videos, addressing limitations of existing models in handling spatial relationships and object interactions [5][19].

Group 1: Existing Challenges
- Current diffusion and autoregressive models, such as FLUX.1 and Emu3, struggle with generating accurate representations of complex spatial relationships and interactions, often resulting in errors like misplacing objects or incorrect quantities [1].
- Two main approaches have been previously explored: "Think-before-Generation," which lacks flexibility, and "Think-after-Generation," which incurs high computational costs and delays [4].

Group 2: Introduction of TwiG
- TwiG allows models to pause during the generation process to evaluate and plan the next steps, mimicking human artistic processes [5][7].
- The framework breaks down visual generation into a cycle of "generate-think-regenerate," enabling models to incorporate reasoning at multiple points during the creation process [7].

Group 3: Core Dimensions of TwiG
- The framework consists of three key dimensions:
1. **When to Think**: The model creates a "thinking schedule" based on user prompts, optimizing the generation process into three stages that align with the semantic structure of images [8].
2. **What to Say**: At each pause, the model generates a "thought chain" that guides the next steps in a more precise manner than traditional prompts [9].
3. **How to Refine**: After completing a section, the model performs self-reflection to correct any mistakes immediately, enhancing efficiency [10].

Group 4: Empirical Research and Results
- The research team conducted experiments on a unified multimodal model (Janus-Pro) to validate the TwiG framework, demonstrating its potential through various stages of testing [12].
- **Zero-Shot Performance**: The TwiG-ZS model showed remarkable "think-while-generating" capabilities without parameter updates, outperforming baseline models in multiple dimensions [13][14].
- **Supervised Fine-Tuning (SFT)**: A dataset of 50K was used for SFT, which improved the model's coherence and control over generated thought chains [16].
- **Reinforcement Learning (RL)**: The TwiG-RL model, optimized with a specific RL strategy, demonstrated competitive performance against existing models like Emu3 and FLUX.1 in key metrics [17].

Group 5: Conclusions and Future Implications
- The introduction of TwiG represents a shift in how visual generation models operate, emphasizing the need for logical reasoning in generation processes [19].
- Key conclusions include the necessity of explicit reasoning for complex logic, the efficiency of local corrections over complete rewrites, and the critical role of reinforcement learning in enhancing model capabilities [20].
- The TwiG framework is designed to be compatible with diffusion models, suggesting potential applications in more complex fields such as video generation and 3D modeling [21].
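The "generate-think-regenerate" cycle and its three dimensions (when to think, what to say, how to refine) can be illustrated with a toy control loop. This is only a sketch of the control flow as the summary describes it: every function name here (`plan_schedule`, `think`, `generate_region`, `reflect`) is hypothetical, the "canvas" is a list of strings rather than visual tokens, and nothing below reflects the actual TwiG or Janus-Pro implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Toy stand-in for a partially generated image: a list of region descriptions."""
    regions: list = field(default_factory=list)


def plan_schedule(prompt: str, num_stages: int = 3) -> list:
    """'When to think': split generation into stages (hypothetical fixed split)."""
    return [f"stage {i + 1}/{num_stages} of: {prompt}" for i in range(num_stages)]


def think(stage: str, canvas: Canvas) -> str:
    """'What to say': produce a textual thought conditioned on progress so far."""
    return f"given {len(canvas.regions)} finished regions, draw {stage}"


def generate_region(thought: str) -> str:
    """Toy generator; a real model would emit visual tokens here."""
    return f"<region for: {thought}>"


def reflect(region: str) -> bool:
    """'How to refine': toy critic that accepts any non-empty region."""
    return bool(region)


def think_while_generating(prompt: str, max_retries: int = 1) -> Canvas:
    canvas = Canvas()
    for stage in plan_schedule(prompt):
        thought = think(stage, canvas)
        region = generate_region(thought)
        retries = 0
        # Local correction: regenerate only this region, not the whole image.
        while not reflect(region) and retries < max_retries:
            region = generate_region(thought + " (revised)")
            retries += 1
        canvas.regions.append(region)
    return canvas


if __name__ == "__main__":
    for region in think_while_generating("a cat to the left of a dog").regions:
        print(region)
```

The point of the sketch is the placement of the reasoning step: the thought is produced inside the loop, after earlier regions exist, rather than once before generation or once after it.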
MiniMax Hailuo (海螺) video team open-sources for the first time: tokenizers also follow a clear scaling law
量子位· 2025-12-22 04:41
Core Viewpoint - The MiniMax Hailuo (海螺) video team has introduced a new scalable visual tokenizer pre-training framework (VTP) that addresses the limitations of traditional tokenizers in generating high-quality outputs from generative models, emphasizing the importance of understanding over mere pixel reconstruction [5][15][58].

Group 1: Traditional Tokenizer Limitations
- Traditional tokenizers focus on pixel-level reconstruction, which does not necessarily translate to improved generation quality, leading to a saturation point where increased computational resources yield diminishing returns [4][15].
- The "pre-training scaling problem" indicates that better reconstruction accuracy can paradoxically lead to poorer generation performance, as traditional methods often overlook high-level semantic understanding [12][15].

Group 2: VTP's Approach and Innovations
- VTP shifts the focus from pixel-level reconstruction to a more holistic understanding of visual semantics, integrating various representation learning methods to enhance the tokenizer's capabilities [26][30].
- The framework employs a multi-task loss function that combines understanding, reconstruction, and generation, allowing the tokenizer to produce semantically rich latent representations that improve downstream model performance [34][35].

Group 3: Empirical Findings and Performance Metrics
- VTP demonstrates that injecting "understanding" into the tokenizer significantly enhances generation quality, with empirical evidence showing a positive correlation between understanding capabilities and generation performance [40][41].
- The VTP model achieved a zero-shot classification accuracy of 78.2% on ImageNet, surpassing the original CLIP's 75.5%, and exhibited superior reconstruction and generation capabilities compared to existing models [44].

Group 4: Scaling Law and Industry Implications
- VTP reveals a scaling law for tokenizers, indicating that performance can improve with increased computational resources, data, and parameters, challenging the traditional view that scaling benefits only apply to main models [50][54].
- The findings suggest that investing in tokenizer development is crucial for enhancing overall generative system performance, positioning tokenizers as a core component worthy of long-term investment in the industry [58].
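The multi-task loss mentioned in Group 2 is described only as combining understanding, reconstruction, and generation. One common way to combine several objectives is a weighted sum; the form below is an assumption for illustration, and the weights $\lambda$ and the subscript names are not taken from the article.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{und}}\,\mathcal{L}_{\text{und}}
  + \lambda_{\text{rec}}\,\mathcal{L}_{\text{rec}}
  + \lambda_{\text{gen}}\,\mathcal{L}_{\text{gen}},
\qquad \lambda_{\text{und}},\ \lambda_{\text{rec}},\ \lambda_{\text{gen}} \ge 0
```

Under this reading, a traditional reconstruction-only tokenizer corresponds to $\lambda_{\text{und}} = \lambda_{\text{gen}} = 0$, which is the setting the summary says saturates.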
量子位 (QbitAI) is hiring editors and writers
量子位· 2025-12-22 04:41
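From the editorial desk at Aofeisi
量子位 | WeChat official account QbitAI

The AI boom is still surging, but if you don't yet know how to take part... why not come to 量子位?

We are a content platform centered on tracking new developments in AI. After eight years of accumulation, we have top-tier influence, broad and well-recognized industry resources, and one of the best vantage points for observing and learning from the trends of the era.

We are currently hiring in three directions, and hope you are (or can become) a content expert in one of them:
- AI industry: infrastructure-layer innovation, including chips, AI Infra, and cloud computing;
- AI finance: venture investment and earnings reports in the AI field, tracking capital movements along the industry chain;
- AI products: progress of AI in applications and hardware devices.

All positions are full-time and based in Zhongguancun, Beijing.

Positions are open to:
- Experienced hires: all levels from editor to lead writer to editor-in-chief, matched to ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time.

Join us and you can:
- Stand at the crest of the AI wave: be among the first to encounter and understand the latest AI technologies and products, and build a complete understanding of AI.
- Master new AI tools: apply new AI technologies and tools to your work, ...

Position details follow. Roles at every ability level are open for all positions; you are welcome to apply based on your own background and experience.

AI industry direction
Responsibilities: take part in core interviews, talk with industry experts and leading technologists, and write case studies on AI cloud deployments.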
The world has suffered under SaaS long enough: enterprise AI has to let "results" do the talking
量子位· 2025-12-22 04:41
Core Viewpoint - The article discusses the shift from traditional SaaS models to RaaS (Result as a Service) in the AI industry, highlighting the challenges and opportunities in deploying AI solutions for enterprises [2][35].

Group 1: Challenges in SaaS and AI Deployment
- Service providers are struggling with high inference costs and inconsistent delivery quality, leading to a decline in the attractiveness of SaaS in the AI era [2][8].
- Traditional paths for deploying AI involve high upfront costs and significant trial-and-error expenses, which deter many potential customers from adopting AI solutions [11][15].
- The complexity of integrating new AI systems with existing infrastructure adds to the challenges faced by enterprises [12][17].

Group 2: Emergence of RaaS
- RaaS is seen as a promising alternative to SaaS, focusing on paying for results rather than just tools, which aligns better with customer needs [39][40].
- The Results Cloud by BaiRongYunChuang offers a comprehensive solution that includes infrastructure, an operating system, and an application store, addressing the pain points of traditional AI deployment [16][34].
- RaaS encourages a collaborative relationship between service providers and clients, transforming the dynamic from a client-vendor relationship to a partnership [42][44].

Group 3: Results Cloud Architecture
- The Results Cloud is structured in three layers: BaiJi (infrastructure), BaiGong (operating system), and BaiHui (application store), each serving a specific purpose in the AI deployment process [19][29].
- BaiJi provides a marketplace for AI infrastructure, offering pre-packaged models and computing power without exposing the underlying complexity to clients [20][21].
- BaiGong acts as a central hub that filters and optimizes the combination of models and computing resources, significantly reducing decision-making costs for clients [25][26].

Group 4: Performance Measurement and Compensation
- The Results Cloud aligns the performance metrics of AI employees with human employees, allowing for a more straightforward evaluation of effectiveness [46].
- Compensation models for AI employees can include task-based pricing, value-sharing agreements, or fixed salaries, ensuring that clients only pay for actual results [48][49].
- This approach mitigates concerns about upfront costs, encouraging clients to trial AI solutions without financial risk [52].

Group 5: Ecosystem Development
- BaiRongYunChuang emphasizes the importance of building an ecosystem for AI solutions, inviting third-party developers to contribute to the platform [57][59].
- The company aims to create a "Silicon-based Productivity Alliance" to foster collaboration and innovation in the AI space [59][60].
- By leveraging its established technology and client base, BaiRongYunChuang seeks to facilitate market opportunities for developers and enhance the overall AI ecosystem [62][63].
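AI Infra truly built for large models must understand models, systems, and industry all at once | Xuan Shanming of SenseTime SenseCore (商汤大装置) @MEET2026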
量子位· 2025-12-22 01:40
Core Viewpoints - The core strategy of the company is "1+X," where "1" represents core businesses including the SenseCore AI infrastructure (大装置, literally "large device"), large models, and AI applications, while "X" encompasses innovative businesses such as smart driving, healthcare, and retail [6][10].

AI Infrastructure Development
- The company emphasizes that AI infrastructure must not only address the availability of computing power but also ensure efficient, stable, and scalable support for models and industries [3][4].
- The total computing power of the company has reached 32,000 PetaFLOPS, showcasing its commitment to building a robust AI infrastructure [6][13].

Energy Efficiency and Carbon Reduction
- The AI computing center has implemented a power consumption prediction system that can accurately forecast power needs within 15 minutes, achieving a 7% annual reduction in electricity costs and over 3,000 tons of annual carbon reduction [6][21].
- The center's Power Usage Effectiveness (PUE) has reached 1.267, with a 15% improvement in overall computing efficiency [21].

Collaboration and Resource Sharing
- The company has launched the "SenseTime Computing Power Mall" in collaboration with over ten domestic manufacturers, allowing clients to freely combine and allocate diverse domestic computing resources and industry model services [6][22].
- The platform supports seamless implementation of algorithms across various chips, enhancing the overall capabilities of the PaaS platform [22].

Industry Applications and Partnerships
- The company has established partnerships with top-tier research institutions and various industries, including internet technology, AIGC, and traditional sectors, providing comprehensive end-to-end solutions [25][26].
- Notable collaborations include working with major clients in traditional industries to develop industry-specific AI models, demonstrating the feasibility of AI applications even in complex traditional sectors [29][30].
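To make the PUE figure concrete: Power Usage Effectiveness is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.267 means about 0.267 units of overhead (cooling, power distribution, and so on) for every unit consumed by the computing hardware. The short sketch below applies that standard definition; the function names are illustrative and the only input taken from the summary is the 1.267 figure.

```python
def pue(total_facility_energy: float, it_equipment_energy: float) -> float:
    """Standard PUE definition: total facility energy / IT equipment energy."""
    return total_facility_energy / it_equipment_energy


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that does NOT reach IT equipment."""
    return (pue_value - 1.0) / pue_value


if __name__ == "__main__":
    reported = 1.267  # PUE quoted in the summary
    print(f"overhead per unit of IT energy: {reported - 1.0:.3f}")
    print(f"share of total energy spent on overhead: {overhead_share(reported):.1%}")
```

An ideal facility would have a PUE of 1.0, meaning every unit of energy goes to computation.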
The AI sports coach is here! A Chinese team builds SportsGPT, making the intelligent shift from numerical scoring to professional guidance
量子位· 2025-12-22 01:40
Core Insights
- The article discusses the current state of "intelligent" sports systems, highlighting that most remain at the "scoring + visualization" stage, lacking actionable insights for athletes and coaches [1].
- It introduces the SportsGPT framework, which aims to provide a complete intelligent loop from "motion assessment" to "professional diagnosis" and "training prescription" [5][37].

Group 1: Limitations of Current Models
- General large models like GPT-5 struggle with specialized sports biomechanics analysis due to their lack of fine-grained visual perception, leading to generic and sometimes physically infeasible suggestions [3][9].
- A comparative evaluation shows that SportsGPT outperforms other models in accuracy (3.80) and feasibility (3.77), indicating its unique advantages in generating precise, actionable training guidance [8][9].

Group 2: Motion Analysis Techniques
- MotionDTW is a two-stage time series alignment algorithm designed for sports motion analysis, addressing traditional DTW's limitations by constructing a high-dimensional feature space [10][21].
- The algorithm employs a weighted multi-modal feature space to eliminate errors caused by athlete body differences and incorporates dynamic features like angular velocity to enhance motion phase representation [12][18].

Group 3: Diagnostic Capabilities
- KISMAM serves as a bridge between raw biomechanical data and interpretable diagnostics, establishing a quantitative benchmark based on data from 100 youth sprinters [25][26].
- The model quantifies deviations from standard thresholds and constructs a high-dimensional mapping matrix to understand complex relationships between motion anomalies and technical issues [28][30].

Group 4: Training Guidance
- SportsRAG, built on a large external knowledge base, enhances the generation of training guidance by integrating domain knowledge with diagnostic results, ensuring actionable recommendations [33][34].
- The absence of the RAG module significantly reduces the feasibility of the model's outputs, demonstrating its critical role in transforming diagnostic insights into professional training prescriptions [34].

Group 5: Conclusion
- The SportsGPT framework represents a significant advancement in intelligent sports training, moving from mere data presentation to providing executable, expert-level guidance [37].
- It establishes a new standard in smart sports by effectively addressing the challenges of motion analysis, diagnosis, and training instruction [37].
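For readers unfamiliar with the alignment step behind MotionDTW, the sketch below shows textbook dynamic time warping over a small weighted feature space of joint angle plus angular velocity. It is a baseline illustration only, not the two-stage MotionDTW algorithm: the feature choice, the weights `(1.0, 0.5)`, and all function names are assumptions made for the example.

```python
import math


def add_angular_velocity(angles, dt=1.0):
    """Pair each joint angle with a finite-difference angular velocity."""
    feats = []
    for i, a in enumerate(angles):
        prev = angles[i - 1] if i > 0 else a
        feats.append((a, (a - prev) / dt))
    return feats


def weighted_distance(f1, f2, weights=(1.0, 0.5)):
    """Weighted Euclidean distance between two feature tuples."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, f1, f2)))


def dtw(seq_a, seq_b, dist=weighted_distance):
    """Classic O(n*m) dynamic time warping; returns total cost and warping path."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = add_angular_velocity([0.0, 10.0, 20.0, 30.0])
    athlete = add_angular_velocity([0.0, 0.0, 10.0, 20.0, 30.0])  # slower start
    total, path = dtw(reference, athlete)
    print(total, path)
```

The warping path pairs each athlete frame with a reference frame, which is what lets later stages compare joint angles phase by phase even when two athletes move at different speeds.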
Breaking analysis of the MiniMax prospectus! A globally leading large model at just 1% of OpenAI's cost: youth really does have the edge
量子位· 2025-12-21 15:10
Core Viewpoint - MiniMax, a leading AI model unicorn, has successfully passed the Hong Kong Stock Exchange hearing, signaling its IPO ambitions amidst discussions about the bubble in large AI models like OpenAI [1][3].

Group 1: Company Overview
- MiniMax has raised over $1.5 billion in funding within four years, attracting investments from notable firms such as MiHoYo, Alibaba, Tencent, and others [3][62].
- The company has a global presence, serving over 200 countries, with 70% of its revenue coming from international markets [6][42].
- MiniMax aims to achieve Artificial General Intelligence (AGI) and views scalability as a core driver towards this goal [8][7].

Group 2: Technological Advancements
- MiniMax is one of the few companies that invested in multimodal model development from its inception [10].
- The company has released several models, including the M1 and M2 text models, with M2 achieving top rankings in performance and cost efficiency [16][17].
- MiniMax has also developed leading models in voice, music, and video, with its video model Hailuo ranking in the top tier of international tests [20][25][26].

Group 3: Financial Performance
- MiniMax's revenue surged from $3.46 million in 2023 to $30.52 million in 2024, marking a 782.2% increase [39].
- By the first nine months of 2025, revenue reached $53.44 million, significantly surpassing the previous year's total [40].
- The company has achieved a gross margin improvement from -24.7% in 2023 to 23.3% in the first nine months of 2025 [45][46].

Group 4: Operational Efficiency
- MiniMax's R&D expenses have increased significantly, but the efficiency of these investments has improved, with training-related cloud computing costs as a percentage of revenue decreasing from over 1365% in 2023 to 266.5% in 2025 [52][54].
- The company has a cash reserve of $1.102 billion, sufficient to sustain operations for over 53 months without additional fundraising [58][59].
- MiniMax's team is young, with an average age of 29, and a high proportion of R&D personnel, which contributes to its innovative and efficient operational model [70][71].
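The headline financial figures can be cross-checked with simple arithmetic. The sketch below takes only numbers quoted in the summary (2024 revenue of $30.52 million, a 782.2% year-over-year increase, $1.102 billion in cash, a runway of over 53 months) and derives what they imply; the function names are illustrative, and the monthly figure is an upper bound inferred from "over 53 months," not a number from the prospectus.

```python
def implied_base(new_value: float, growth_pct: float) -> float:
    """Back out the prior-period value from the new value and a % increase."""
    return new_value / (1.0 + growth_pct / 100.0)


def growth_pct(old_value: float, new_value: float) -> float:
    """Percentage increase from old_value to new_value."""
    return (new_value - old_value) / old_value * 100.0


def max_monthly_burn(cash: float, months: float) -> float:
    """If cash lasts at least `months`, average monthly burn is at most this."""
    return cash / months


if __name__ == "__main__":
    revenue_2024 = 30.52   # USD millions, from the summary
    base_2023 = implied_base(revenue_2024, 782.2)
    print(f"implied 2023 revenue: ${base_2023:.2f}M")
    print(f"check: growth from that base = {growth_pct(base_2023, revenue_2024):.1f}%")
    print(f"max average burn: ${max_monthly_burn(1102.0, 53):.2f}M per month")
```

A 782.2% increase means the new figure is 8.822 times the old one, so the 2023 base implied by the 2024 revenue is on the order of a few million dollars.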
Moore Threads is no longer hiding its ambitions
量子位· 2025-12-21 14:13
Core Viewpoint - The article highlights the significant advancements made by Moore Threads in the GPU sector, particularly through the launch of the MUSA architecture and its associated products, which aim to enhance the developer ecosystem and position domestic GPUs at a competitive level in the global market [1][4][19].

Group 1: MUSA Architecture and Innovations
- MUSA stands for Meta-computing Unified System Architecture, representing a comprehensive framework that encompasses chip architecture, instruction sets, programming models, and software libraries [6][7].
- The latest GPU architecture, Huagang, boasts a 50% increase in density and a 10-fold improvement in efficiency, with three new chips focusing on AI training, graphics rendering, and intelligent SoC [8][10].
- The MUSA architecture has been iteratively developed over five years, culminating in the latest iteration that optimizes low-precision computing for AI applications [11][13].

Group 2: New Product Launches
- Moore Threads introduced three new chips: Huashan, Lushan, and Yangtze, along with two hardware products, AIBOOK and AICube, and the KUAE 2.0 AI Foundry cluster [20][21].
- The Huashan chip targets AI training and high-performance computing, supporting full precision from FP4 to FP64 and significantly enhancing Transformer throughput [22][25][27].
- The Lushan chip focuses on graphics computing, achieving a 64-fold increase in AI performance and a 15-fold improvement in AAA game rendering performance [28][30][31].
- The Yangtze chip is designed for edge computing, providing 50 TOPS of heterogeneous AI computing power for various applications [32][34].

Group 3: Software Ecosystem and Developer Engagement
- The MUSA software stack 5.0 was launched, offering a complete toolchain from compilers to AI frameworks, with plans to open-source key components to foster community engagement [15][16].
- Moore Threads aims to build a robust developer ecosystem through the establishment of the Moore Academy, targeting a community of 1 million developers by 2025 [59][61].
- The company emphasizes the importance of a comprehensive ecosystem that integrates software, hardware, and developer trust to create a sustainable competitive advantage in the GPU market [56][58].
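A breakthrough in AI-generated operating systems! Shanghai Jiao Tong University proposes a new paradigm for file system development: from now on, just write the specification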
量子位· 2025-12-21 14:13
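Compiled by Feiyang (非羊) from Aofeisi
量子位 | WeChat official account QbitAI

Remember the 550W quantum computer in The Wandering Earth 2?

In the film, what made MOSS most memorable, beyond its enormous computing power, was its ability to generate the underlying operating system in real time, on demand.

What if we told you that a key step has already been taken toward generating "underlying systems" from "human requirements"?

A research team from the IPADS lab at Shanghai Jiao Tong University has taken a fresh approach to the hard problem of automatically generating core operating system components. The work will appear at USENIX FAST'26, a top academic conference in file systems and storage.

Operating systems: the heavy burden of keeping up with the times

The operating system (OS) is the bedrock of the entire digital world.

Below, it manages and schedules hardware resources (CPU, memory, disks, and so on); above, it gives application software a stable, reliable runtime environment. Whether it is an app on your phone or a powerful AI model in the cloud, everything is built on this bedrock.

Yet the OS must keep evolving to satisfy the twin demands of hardware and applications:

On one hand, hardware advances rapidly. Storage devices, for example, went from mechanical hard disks to flash and even non-volatile memory within just a few years, and the OS must iterate quickly to squeeze the full performance out of this new hardware;

On the other hand, new applications keep emerging, such as big-data analytics and AI training, and the arrival of each new type of application may, for the OS's ...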