Multimodal AI
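DeepSeek Forces vLLM to Upgrade, Chipmakers Compete Fiercely, MoE Sweeps Across Models: A vLLM Core Maintainer Responds Exclusively on How It Holds the Inference "Iron Throne" with PyTorch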
36Kr · 2025-12-15 00:36
Core Insights
- vLLM has rapidly become a preferred inference engine for global tech companies, with GitHub stars increasing from 40,000 to 65,000 in just over a year, driven by the open-source PagedAttention technology [1]
- Neural Magic played a crucial role in vLLM's success, utilizing a "free platform + open-source tools" strategy to build a robust enterprise-level inference stack and maintain a library of pre-optimized models [1]
- Red Hat's acquisition of Neural Magic in November 2024, including key team members like Michael Goin, is expected to enhance vLLM's competitive edge in the AI large model sector [1][2]

Development and Optimization
- The vLLM core team, led by Michael Goin, has shifted focus from optimizing Llama models to enhancing features related to the DeepSeek model, particularly with the release of DeepSeek R1 [3]
- The development cycle for version 0.7.2 was tight, efficiently supporting Qwen 2.5 VL and introducing a Transformers backend for running Hugging Face models [3]
- Version 0.7.3 marked a significant update with numerous contributors involved, enhancing DeepSeek with multi-token prediction and MLA attention optimizations, as well as expanding support for AMD hardware [4]

Hardware Compatibility and Ecosystem
- The vLLM team is committed to building an open and efficient hardware inference ecosystem, supporting various mainstream chips and collaborating closely with hardware teams like NVIDIA and AMD [8]
- The integration of PyTorch as a foundational layer allows vLLM to support a wide range of hardware, simplifying the adaptation process for hardware vendors [10][11]
- The team's collaboration with hardware partners ensures that vLLM can maintain high performance across different platforms, with a focus on optimizing the architecture for new hardware like the Blackwell chip [8][9]

Multi-Modal Capabilities
- vLLM has evolved from a text-only inference engine to a unified service platform supporting multi-modal generation and understanding, including text, images, audio, and video [17][19]
- The introduction of multi-modal prefix caching significantly improves efficiency in processing various input types, while the decoupling of encoders enhances resource utilization for large-scale inference [18][19]
- The release of vLLM-Omni marks a milestone in multi-modal inference, allowing for seamless integration and resource allocation across different modalities [19][21]

Community and Feedback Loop
- The growing trend of companies contributing modifications back to the upstream vLLM project reflects a positive feedback loop driven by the speed of community version iterations [22][23]
- Collaboration with leading model labs and companies enables rapid feedback collection, ensuring that vLLM remains competitive and aligned with industry developments [23][24]
- The vLLM team is actively addressing developer concerns, such as startup speed, by implementing tracking projects and optimizing performance through community engagement [24][25]

Strategic Positioning
- Red Hat's deep involvement in vLLM is rooted in the strategic understanding that inference is a critical component of AI application costs, aiming to integrate cutting-edge model optimizations [26][27]
- The governance structure of vLLM is decentralized, with contributions from multiple organizations, allowing Red Hat to influence the project while adhering to open-source principles [26][27]
- The collaboration with the PyTorch team has led to significant improvements in supporting new hardware and models, reinforcing vLLM's position as a standard in inference services [27]
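The multi-modal prefix caching mentioned in the digest can be pictured with a toy sketch. This is not vLLM's implementation: `BLOCK_SIZE`, `block_hashes`, and `PrefixCache` are hypothetical names, and seeding the whole hash chain with a digest of the attached image is a deliberate simplification. The idea it illustrates is only that a request can reuse cached work for a leading run of token blocks when both the tokens and the non-text input are identical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block length, in tokens


def block_hashes(tokens, mm_digest=""):
    """Return one chained hash per full block of tokens.

    Each hash covers the block's tokens plus the previous hash, so two
    requests share a hash only if their entire prefix up to that block
    (including the multimodal digest that seeds the chain) is identical.
    """
    hashes = []
    parent = mm_digest
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        payload = parent + "|" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class PrefixCache:
    """Toy cache that only records which block hashes have been seen."""

    def __init__(self):
        self._blocks = set()

    def match_and_insert(self, tokens, mm_digest=""):
        """Return how many leading tokens hit the cache, then cache all blocks."""
        hashes = block_hashes(tokens, mm_digest)
        hits = 0
        for h in hashes:
            if h not in self._blocks:
                break
            hits += 1
        self._blocks.update(hashes)
        return hits * BLOCK_SIZE


cache = PrefixCache()
image_a = hashlib.sha256(b"image-bytes-A").hexdigest()
image_b = hashlib.sha256(b"image-bytes-B").hexdigest()
shared_prompt = [1, 2, 3, 4, 5, 6, 7, 8]

first = cache.match_and_insert(shared_prompt + [9, 10, 11, 12], image_a)    # 0: cold cache
second = cache.match_and_insert(shared_prompt + [20, 21, 22, 23], image_a)  # 8: shared prefix reused
third = cache.match_and_insert(shared_prompt + [20, 21, 22, 23], image_b)   # 0: different image
```

Chaining each hash to its parent keeps lookups cheap, since a hit on block N implies the whole prefix before it also matched. A real engine would store computed key/value tensors per block and evict old entries, which this sketch leaves out.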
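Zhiyuan Robotics Denies Bidding War with Yushu for Spring Festival Gala Sponsorship; Xiaomi Denies Move into AI Education; Musk Says He Is the Prototype for Iron Man; Second-Hand Doubao Phones Bid Up to 36,000 Yuan | Bang Morning Report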
创业邦· 2025-12-11 00:11
Group 1
- Embodied intelligence companies are reportedly competing to sponsor the 2026 Spring Festival Gala, with Zhiyuan Robotics said to have offered 60 million yuan and Yushu Technology to have raised its bid to 100 million yuan; Zhiyuan says the reports are untrue [4]
- Meituan has hired Pan Xin, former head of ByteDance's visual model AI platform, to lead multi-modal AI innovation, including the development of applications like LongCat App [4]
- Xiaomi clarified that its recruitment for AI education roles has been misinterpreted and is primarily aimed at enhancing services for specific products like the Redmi Pad 2 and Xiaomi Mitu children's watch [5]

Group 2
- Pop Mart announced the appointment of Wu Yue, LVMH's Greater China President, as a non-executive director, effective December 10, 2025 [7]
- Quark AI glasses S1 are experiencing high demand, with resale prices reaching 4,000 to 5,000 yuan, and the product is sold out on major e-commerce platforms [9][10]
- JD.com is set to acquire a 50% stake in a Hong Kong office building for approximately 3.473 billion HKD, indicating continued investment in the region [15]

Group 3
- Bill Gates warned of an AI valuation bubble, stating that many companies with high valuations will face declines, but emphasized the transformative potential of AI in sectors like health and education [18][19]
- Refly.AI completed a multi-million dollar seed financing round led by Sequoia Capital and Hillhouse Capital, and launched its V1.0 version for public testing [19]
- Snapmaker announced a multi-hundred-million Series B financing round led by Hillhouse Capital and Meituan, aimed at advancing consumer-grade 3D printing technology [19]
Former ByteDance AI Lead Pan Xin Joins Meituan to Head Multimodal Innovation
36Kr · 2025-12-10 07:11
Core Insights
- Pan Xin, former head of visual model AI platform at ByteDance, has joined Meituan to lead multimodal AI innovation [1]
- Meituan's strategic focus for 2025 is on the competition in food delivery and advancements in AI technology [1]
- The company aims for an aggressive approach in AI technology rather than a defensive one, as stated by founder Wang Xing [1]

Group 1: Personnel Changes
- Pan Xin has a strong background in AI, having previously worked at Google DeepMind, Baidu, Tencent, and ByteDance [1]
- His roles included leading the optimization of PaddlePaddle at Baidu and overseeing AIGC and visual model AI platforms at Tencent and ByteDance [1]

Group 2: AI Development
- At Meituan, Pan Xin is responsible for the development of applications related to multimodal AI, including the LongCat App [1]
- The LongCat AI model's progress was first disclosed by Wang Xing during a conference call in Q1 2025 [1]
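Another Chinese Multimodal AI Goes Open Source: Hands-On with Screenshot-to-Webpage and Image-Search Shopping, at Half the Price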
36Kr · 2025-12-09 12:04
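In addition, this morning Zhipu also open-sourced AutoGLM, which is similar to the "Doubao phone assistant". When the agent was released last October, the industry regarded it as "the world's first AI Agent capable of operating a phone".

In terms of performance, at comparable parameter scales, the GLM-4.6V series achieves SOTA results on key capabilities such as multimodal interaction, logical reasoning, and long context.

Zhidongxi reported on December 9 that last night Zhipu open-sourced its GLM-4.6V series of multimodal large models, including the base version GLM-4.6V (106B-A12B) for cloud and high-performance cluster scenarios and the lightweight version GLM-4.6V-Flash (9B) for local deployment and low-latency applications.

▲ GLM-4.6V open-source page (image source: Hugging Face)
▲ AutoGLM open-source page (image source: Hugging Face)

According to Zhipu, GLM-4.6V can handle tasks such as intelligent interleaved image-text layout and content creation, image-recognition shopping and shopping guidance, front-end replication and multi-turn visual interaction development, and long-context document and video understanding. Zhidongxi tried it out right away.

In hands-on use, GLM-4.6V was fairly stable at image search, web-wide price comparison, and long-text and video understanding, and it generated text and web pages quickly and accurately. In interleaved image-text layout, however, the images it generated never displayed. For ambiguous instructions, GLM-4.6V's understanding was slightly off.

GLM ...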
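Research Report Picks | Bohai Securities: Initiates Coverage of ArcSoft with an "Overweight" Rating; Deep Focus on AI Vision Algorithms, with Multiple Curves Driving Growth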
Gelonghui APP · 2025-12-09 08:22
Core Viewpoint
- A Bohai Securities report highlights that ArcSoft Technology is deeply engaged in AI visual algorithms, with multiple curves driving growth [1]

Group 1: Business Focus
- The company specializes in the field of computer vision, providing algorithm licensing and system solutions [1]
- The primary revenue source for the company is mobile intelligent terminal visual solutions [1]
- The intelligent automotive solutions segment has shown rapid growth in recent years as an emerging business area [1]

Group 2: Strategic Initiatives
- The company is actively aligning with the trends of multimodal AI and AIGC, exploring cutting-edge businesses such as AI glasses and AI commercial photography [1]

Group 3: Financial Performance
- In the first three quarters of 2025, the company achieved a net profit attributable to shareholders of 142 million yuan, representing a year-on-year growth of 60.51% [1]

Group 4: Market Position
- The company has established a product matrix of visual AI algorithms covering current mainstream smartphone models [1]
- As a global leader in visual AI, the company is expected to enable deep empowerment across multiple business scenarios in the future [1]

Group 5: Investment Rating
- The report initiates coverage with an "Overweight" rating for the company [1]
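Recommended Multi-Function Generative AI Platforms with Text-to-Image and Text-to-Video Capabilities: A Panoramic View from Multimodal Fusion to Content System Building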
Jin Tou Wang· 2025-12-08 04:26
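As generative AI technology continues to evolve, enterprises are moving from "partial use" to the stage of "systematic build-out". In content production in particular, text-to-image and text-to-video are becoming key capabilities in enterprise digital content strategy. In the past, enterprises tended to treat these capabilities as supplementary creative tools. Today, as marketing channels fragment, global expansion deepens, and demand for visual knowledge bases climbs, a new trend is emerging:

What enterprises need is not "a tool that can generate" but "a platform that can build a multimodal content system".

Against this backdrop, platforms with cross-modal capabilities, enterprise-grade governance, scalable architecture, and stable APIs are becoming the core standard by which enterprises evaluate generative AI. Based on structural changes in industry demand, this article systematically analyzes the innovation directions of current multi-function generative AI platforms and explains why cloud providers with platform-level capabilities, such as AWS, are drawing close attention from enterprises.

I. The commercial value of text-to-image and text-to-video is rising markedly, and enterprise demand for multimodal AI is upgrading across the board

Overseas ad creatives
Domestic short-video content
Visual components for official websites and social platforms
Product demo and packaging materials
Livestream scripts and storyboards

As AI search and AI assistants spread rapidly, enterprises need to prepare visual content for multiple channels that is consistent in style, coherent in logic, and precisely positioned. Traditional, labor-dependent content production therefore struggles to support expansion at scale.

2. Within the enterprise ...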
Zhongyin Fashion Rises 0.26% on Turnover of 26.7447 Million Yuan: Is There Opportunity Ahead?
Xin Lang Cai Jing· 2025-12-05 12:37
Core Viewpoint
- Zhejiang Zhongyin Fashion Co., Ltd. is experiencing fluctuations in stock performance and is involved in various business segments including fashion design and supply chain integration, with a significant focus on overseas revenue and emerging technologies like virtual digital humans.

Company Overview
- Zhejiang Zhongyin Fashion Co., Ltd. was established on October 21, 2011, and went public on October 29, 2020. The company is primarily engaged in creative design, focusing on footwear design and supply chain integration services [7]
- The revenue composition includes 77.12% from supply chain integration, 6.93% from footwear production, 6.61% from design services, 4.59% from brand operations, and 1.46% from cultural tourism services [7]
- As of November 28, the number of shareholders was 7,800, a decrease of 8.24%, while the average circulating shares per person increased by 8.97% [7]

Financial Performance
- For the period from January to September 2025, the company reported revenue of 264 million yuan, a year-on-year decrease of 8.48%, and a net profit attributable to shareholders of -12.32 million yuan [7]
- Cumulative cash dividends since the company's A-share listing amount to 83.33 million yuan, with 59.33 million yuan distributed over the past three years [9]

Market Activity
- On December 5, the stock price of Zhongyin Fashion increased by 0.26%, with turnover of 26.74 million yuan and a turnover rate of 0.71%, resulting in a total market capitalization of 3.768 billion yuan [1]
- The stock saw a net outflow of 61,700 yuan from major investors on the day, with a continuous reduction in major funds over the past three days [4][5]

Industry Context
- The company is positioned within the textile and apparel industry, specifically in non-sports clothing, and is associated with several concept sectors including NVIDIA concept, financing and margin trading, and virtual digital humans [7]
- The company has established a footwear production base in Xinjiang to support the national initiative for developing the western region's real economy [2]

Technological Advancements
- The company has made significant advancements in virtual human technology through its subsidiary, with capabilities in 3D digital human generation and AIGC multi-modal content generation [3]
- The first-generation digital human product "Chuangshiyuan" supports rapid recognition and generation of various content formats, enhancing operational efficiency [3]

Investment Insights
- The average trading cost of the stock is 16.80 yuan, with recent reductions in holdings but at a slowing rate. The current stock price is between resistance at 16.47 yuan and support at 14.95 yuan, indicating potential for range trading [6]
PalFish Develops In-House AI Agent "Coco Teacher"; Overseas Business Reaches 20% of Revenue
Xin Lang Cai Jing· 2025-12-05 08:41
Core Insights
- PalFish has been recognized as a "global unicorn company" on its tenth anniversary, highlighting its significant growth and market position [1][2]
- The CEO of PalFish, Huang He, emphasized that the traditional barriers established by previous education companies through massive funding are being rapidly dismantled by the new generation of multimodal AI [1][2]

AI-Driven Product Design
- PalFish is fully transitioning to "AI-native" product design, with a key outcome being the integration of the AI teaching assistant "Coco Teacher" into its platform [1][2]
- "Coco Teacher" is designed as a comprehensive teaching system that not only explains knowledge but also engages in heuristic interactions based on student selections or questions, dynamically generating targeted exercises and understanding students' emotional states [1][2]

Nano-Level Knowledge System
- PalFish has developed a unique "nano-level knowledge system" that breaks down knowledge points to an atomic level, for instance, detailing "fraction operations" into 12 sub-items [1][2]
- This system achieves a diagnostic accuracy rate of 96.9% for knowledge blind spots within 30 hours, significantly surpassing the industry average [1][2]
- The AI's understanding accuracy of the new curriculum standards is reported to be 98.5% [1][2]

International Market Expansion
- PalFish, under its brand, is positioned in the international market with a "high price, high quality" strategy, currently operating in over 20 countries and regions including Southeast Asia, North America, and the Middle East [1][2]
- The company's overseas business has an annual growth rate exceeding 150%, with revenue contribution reaching 20% [1][2]
SanTai Shares Rise 0.47% on Turnover of 71.8946 Million Yuan; Net Main-Force Inflow over the Past 3 Days of -16.4182 Million Yuan
Xin Lang Cai Jing· 2025-12-05 07:35
Core Viewpoint
- Shenzhen SanTai E-commerce Co., Ltd. is benefiting from the depreciation of the RMB and is focused on cross-border e-commerce retail and logistics, with a significant portion of its revenue coming from overseas operations [3][7].

Group 1: Company Overview
- Shenzhen SanTai E-commerce Co., Ltd. was established on January 7, 2008, and went public on September 28, 2023 [7].
- The company's main business segments include cross-border e-commerce retail (76.14% of revenue) and cross-border e-commerce logistics (23.80%), with minimal contributions from technology services and other business [7].
- As of November 28, the company had 28,500 shareholders, a decrease of 1.84% from the previous period, with an average of 7,690 circulating shares per person, an increase of 1.88% [8].

Group 2: Financial Performance
- For the period from January to September 2025, the company achieved revenue of 1.252 billion yuan, representing year-on-year growth of 0.15%, while the net profit attributable to shareholders decreased by 25.94% to 31.8471 million yuan [8].
- The company has distributed a total of 110 million yuan in dividends since its A-share listing [9].

Group 3: Market Position and Trends
- The company operates within the trade retail sector, specifically in the internet e-commerce and cross-border e-commerce segments, and is associated with concepts such as intellectual property and AIGC [8].
- The company has developed an AI-driven risk detection tool named "RuiGuan·ERiC," aimed at providing flexible and cost-effective risk monitoring solutions for cross-border e-commerce businesses [2][3].
- The company is also involved in the development of an AIGC project that utilizes Stable Diffusion for generating high-quality images, enhancing operational efficiency and reducing production costs [2].

Group 4: Stock Performance
- On December 5, the company's stock rose by 0.47%, with turnover of 71.8946 million yuan and a turnover rate of 3.83%, resulting in a total market capitalization of 6.776 billion yuan [1].
- The average trading cost of the stock is 9.06 yuan, with recent trading activity indicating a decrease in holdings by major investors, suggesting a dispersed ownership structure [5][6].
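Right After Ilya's Prediction, NEO, the World's First Native Multimodal Architecture, Arrives: Vision and Language Welded Together for Good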
量子位· 2025-12-05 05:33
Core Insights
- The AI industry is experiencing a paradigm shift, moving away from merely scaling models to focusing on smarter architectures, as highlighted by Ilya Sutskever's statement that the era of scaling laws is over [1][2][20].
- A new native multimodal architecture called NEO has emerged from a Chinese research team, which is the first scalable open-source model that integrates visual and language understanding at a fundamental level [4][19].

Group 1: Current State of Multimodal Models
- The mainstream approach to multimodal models has relied on modular architectures that simply concatenate pre-trained visual and language components, leading to inefficiencies and limitations in understanding [6][8].
- Existing modular models face three significant technical gaps: efficiency, capability, and fusion, which hinder their performance in complex tasks requiring deep semantic understanding [14][15][17].

Group 2: NEO's Innovations
- NEO introduces a unified model that inherently integrates visual and language processing, eliminating the distinction between visual and language modules [19].
- The architecture features three core innovations: Native Patch Embedding for high-fidelity visual representation, Native-RoPE for adaptive spatial encoding, and Native Multi-Head Attention for enhanced interaction between visual and language tokens [22][24][29][33].

Group 3: Performance and Efficiency
- NEO demonstrates remarkable data efficiency, achieving competitive performance with only 3.9 million image-text pairs for training, which is one-tenth of what other leading models require [39].
- In various benchmark tests, NEO has outperformed other models, showcasing superior performance in tasks related to visual understanding and multimodal capabilities [41][42].

Group 4: Implications for the Industry
- NEO's architecture not only enhances performance but also lowers the barriers for deploying multimodal AI in edge devices, making advanced visual perception capabilities accessible beyond cloud-based systems [43][45][50].
- The open-sourcing of NEO models signals a shift in the AI community towards more efficient and unified architectures, potentially setting a new standard for multimodal technology [48][49].

Group 5: Future Directions
- NEO's design philosophy aims to bridge the semantic gap between visual and language processing, paving the way for future advancements in AI, including video understanding and 3D spatial perception [46][51].
- The emergence of NEO represents a significant contribution from a Chinese team to the global AI landscape, emphasizing the importance of architectural innovation over mere scaling [53][54].