Inference Acceleration

Founder Park· 2025-06-10 12:59

Core Viewpoint
- The article discusses the journey of Cheng Zeyi, who transitioned from working in a large tech company to founding WaveSpeedAI, a startup focused on AI infrastructure, emphasizing the importance of inference acceleration in the AI industry and the potential for significant market growth in AI video generation [4][39].

Group 1: Background and Motivation
- Cheng Zeyi initially did not plan to start a business but felt constrained in a large company environment after rapid promotions [1][6].
- Technical professionals often seek to prove their value and find better opportunities to utilize their skills [2][3].
- After leaving a large company, Cheng Zeyi validated his skills by creating a new model that gained significant attention on GitHub, leading him to realize the market demand for his expertise [8][11].

Group 2: Company Formation and Strategy
- WaveSpeedAI was founded to provide inference acceleration for image and video generation, with early revenue growth indicating strong market demand [4][26].
- The company adopted a unique approach of prioritizing revenue generation before expansion, focusing on a lean team structure to maintain agility and responsiveness [27][30].
- Cheng Zeyi emphasized the importance of infrastructure in AI, likening it to a vehicle's transmission system that affects performance and user experience [15][20].

Group 3: Market Insights and Opportunities
- The AI video generation market is projected to grow significantly, with a compound annual growth rate that could lead to billions in revenue by 2030 [42].
- Current high costs of AI video generation limit widespread adoption, creating a demand for more cost-effective solutions [43][44].
- WaveSpeedAI aims to reduce costs to one-fifth of existing platforms while maintaining high quality and low latency, addressing a critical need in the market [46].

Group 4: Collaborative Ecosystem and Future Plans
- The company collaborates with various partners to enhance its service offerings and expand its market reach, focusing on creating a symbiotic ecosystem rather than competing directly with larger firms [32][48].
- WaveSpeedAI is committed to empowering global creators by providing resources and support for developers, aiming to foster innovation in the AI space [55][56].
- The company aspires to be a model for Chinese AI enterprises in global markets, encouraging confidence and ambition among local entrepreneurs [57][58].

ICLR 2025 | The first dynamic vision-text sparsification framework arrives, cutting computational overhead by 50%-75%

机器之心· 2025-04-29 03:22

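This work was completed jointly by East China Normal University and Xiaohongshu. The co-first authors are Huang Wenxuan (黄文轩) and Zhai Zijie (翟子杰), master's students at East China Normal University and interns on the Xiaohongshu NLP team. The corresponding authors are Cao Shaosheng (曹绍升), head of the Xiaohongshu NLP team, and Lin Shaohui (林绍辉), a researcher at East China Normal University.

Multimodal large language models (MLLMs) have achieved remarkable results in areas such as visual understanding and reasoning. However, as the decoding stage keeps generating new tokens, the computational complexity and GPU memory footprint of inference grow steadily, which lowers the inference efficiency of MLLMs. Existing methods accelerate inference by reducing visual token redundancy in the prefill stage. Unfortunately, the speedup gained from visual token sparsification in the prefill stage gradually diminishes during decoding: as the number of output text tokens grows, these methods still run into a performance bottleneck.

To address this problem, the team proposes Dynamic-LLaVA, a new dynamic vision-text context sparsification framework for inference acceleration. The framework designs tailored sparsification schemes for the different inference modes of MLLMs (the prefill stage, and the decoding stage with or without KV Cache) to achieve efficient MLLM inference. Experimental results show that Dynamic-LLaVA, with almost no loss in visual understanding and generation ability ...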
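To make the diminishing-speedup argument concrete, here is a minimal token-counting sketch. It is not Dynamic-LLaVA's implementation: the function names, the token counts (576 visual tokens, 64 prompt text tokens), and the keep ratios are illustrative assumptions, and the cost of a decoding step is approximated simply by the number of cached tokens it attends to.

```python
def attended_tokens(step, n_visual, n_prompt_text,
                    visual_keep=1.0, output_keep=1.0):
    """Approximate number of cached tokens one decoding step attends to.

    step          -- number of text tokens generated so far
    visual_keep   -- fraction of visual tokens kept after prefill sparsification
    output_keep   -- fraction of generated-token cache entries kept during decoding
    All values are hypothetical; this is a counting model, not a benchmark.
    """
    return n_visual * visual_keep + n_prompt_text + step * output_keep


def relative_saving(step, n_visual, n_prompt_text,
                    visual_keep, output_keep=1.0):
    """Fraction of attended tokens removed relative to the dense baseline."""
    dense = attended_tokens(step, n_visual, n_prompt_text)
    sparse = attended_tokens(step, n_visual, n_prompt_text,
                             visual_keep, output_keep)
    return 1.0 - sparse / dense


if __name__ == "__main__":
    N_VISUAL, N_PROMPT = 576, 64  # assumed sizes, for illustration only
    for step in (0, 1000, 4000):
        prefill_only = relative_saving(step, N_VISUAL, N_PROMPT, visual_keep=0.2)
        both = relative_saving(step, N_VISUAL, N_PROMPT,
                               visual_keep=0.2, output_keep=0.5)
        print(f"step={step:5d}  prefill-only saving={prefill_only:.2f}  "
              f"vision+text saving={both:.2f}")
```

Under these assumptions, the saving from pruning only visual tokens shrinks as the generated sequence grows, because the fixed number of removed visual tokens becomes a smaller share of the context. Sparsifying the generated text context as well keeps the saving from collapsing, which is the motivation the article gives for handling the decoding stage too.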