Multimodal Models

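GPU Rental Price Survey
是说芯语 · 2025-04-27 06:54
The following article is from 傅里叶的猫, by CC. 傅里叶的猫: a senior engineer at a major chip EDA vendor who previously built satellites at the Chinese Academy of Sciences, with code still flying in orbit.
The content comes from a Guosheng Securities research report that analyzes current trends in the GPU cloud industry, the competitive landscape among the major vendors, and current conditions in the GPU rental market.
Industry trends overview: The coordinated development of AI and cloud computing has formed a tight flywheel effect, whose core logic is a positive feedback loop among technology iteration, application expansion, and compute demand. Rapid gains in large-model capability (such as the multimodal upgrades and logical-reasoning optimizations in Qwen3 and Llama4) are pushing AI from an auxiliary tool toward a core productive force, a process that depends heavily on cloud providers continuing to upgrade underlying capabilities such as compute, storage, and operations.
Take Alibaba Cloud as an example: its ninth-generation ECS instances deliver 20% more compute at a 5% lower price. Hardware performance optimization and economies of scale spread out costs and lower the barrier to AI development for enterprises, which in turn stimulates more application scenarios. For example, Google's Gemini 2.5 Pro surpassing human performance on complex reasoning tasks, and Alibaba's Qwen2.5-Omni delivering full-modality interaction on phones with a lightweight model, both show AI applications penetrating the enterprise and consumer markets at once.
Meanwhile, although model efficiency gains (such as GPT-4o's response-speed optimization) reduce the compute consumed per inference, in user scale and call frequency the exponential ...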
GPU Rental Price Survey
傅里叶的猫 · 2025-04-26 11:15
Industry Trends Overview
- The synergy between AI and cloud computing has created a tight feedback loop driven by technological iteration, application expansion, and computing power demand [3]
- The rapid enhancement of AI large model capabilities is pushing AI from being an auxiliary tool to a core productivity driver, heavily relying on cloud service providers for continuous upgrades in computing power, storage, and operations [3]
- For instance, Alibaba Cloud's ninth-generation ECS instance has seen a 20% increase in computing power while prices have decreased by 5%, lowering the AI development threshold for enterprises [3]

Cloud Service Providers' Technological Upgrades and Competitive Landscape
- Cloud service providers are engaged in intense competition centered around AI computing power demands, with leading firms building competitive advantages through differentiated technological paths [5]
- Alibaba Cloud focuses on end-to-end optimization, achieving a 20% improvement in AI preprocessing efficiency and a 92% reduction in response time for its PAI platform [5][6]
- Huawei Cloud emphasizes architectural innovation, with its CloudMatrix 384 super node achieving three times the GPU density of traditional servers, addressing enterprise needs for customized AI solutions [6]

AI Model Progress and Multimodal Breakthroughs
- The current phase of AI model iteration is driven by "multimodal + deep thinking," with significant breakthroughs transitioning from laboratories to commercial applications [7]
- Upcoming releases like Qwen3 and Llama4 are expected to enhance logical reasoning and voice interaction capabilities, while Alibaba's Qwen2.5-Omni demonstrates end-to-end processing across four modalities [7][8]
- The competition among AI models is intensifying, with Google’s Gemini 2.5 Pro showcasing its potential in complex reasoning tasks, while GPT-4o aims to improve image generation precision for enterprise needs [7]

Computing Power Demand Surge and Price Transmission in the Industry Chain
- The explosive growth of AI technology is leading to a significant surge in computing power demand, creating a structural shortage on the supply side [9]
- For example, the price of H100 calls has jumped 22% within two weeks, reflecting the scarcity of computing resources [11]
- In North America, IDC rents have increased by over 60% due to high demand and limited supply, while in China, the upgrade of AI-specific data centers has raised unit cabinet costs [15][16]

Rise of Computing Power Leasing Models
- The emergence of computing power leasing models is becoming a new variable to balance supply and demand contradictions, with companies like CoreWeave reducing marginal costs [17]
- However, the sustainability of this business model depends on the downstream application side's ability to pay, as some startups face losses due to high inference costs [17]
- Overall, the price transmission in the computing power industry chain is shifting from short-term spikes to long-term structural inflation, reinforcing the barriers for leading firms while posing risks for smaller players [17]
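
As a rough check on the Alibaba Cloud figures cited in this summary (20% more computing power at a 5% lower price), the sketch below works out the implied change in price per unit of compute. It assumes both percentages apply to the same instance tier and that the two figures can be combined as a simple ratio; the function name is hypothetical and does not come from the report.

```python
def price_per_compute_change(compute_gain: float, price_change: float) -> float:
    """Return the relative change in price per unit of compute.

    compute_gain: fractional change in compute (0.20 means +20%).
    price_change: fractional change in price (-0.05 means -5%).
    """
    # New price-per-compute relative to the old one (old ratio normalized to 1).
    new_ratio = (1.0 + price_change) / (1.0 + compute_gain)
    return new_ratio - 1.0


# Figures from the summary: +20% compute, -5% price (assumed comparable tiers).
change = price_per_compute_change(0.20, -0.05)
print(f"Price per unit of compute changes by {change:.1%}")
```

Under these assumptions the price per unit of compute falls by roughly 20.8%, a larger drop than the 5% headline price cut alone suggests.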
Update on Cambricon (寒武纪) and Hygon Information (海光信息)
2025-04-16 15:46
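Update on Cambricon and Hygon Information 20250116
Summary
• Nvidia recorded a US$5.5 billion charge because of policy uncertainty, covering roughly 550,000 H20 cards. It is expected to deliver more than 800,000 cards to China in the first half of 2025 and 1.2 million for the full year. This may push customers toward domestic AI chips, lifting both their volume and price.
• Domestic AI chip makers such as Cambricon and Hygon Information showed growth certainty in early 2025. Cambricon plans to begin small-batch supply of inference chips dedicated to the public security sector in the third quarter, and Hygon Information is also expected to launch a new high-compute inference product.
• The domestic compute market is clearly stratified, and the growing variety of domestic cards allows a more precise match to customer needs. Overseas and domestic supply capacity for domestic cards has exceeded expectations; Cambricon's second-quarter shipments are expected to rise sharply quarter over quarter, benefiting from both demand and supply.
• Domestic vendors are currently tied mainly to a single large customer, which is unfavorable on the demand side. Cambricon needs to win new customers and is expected to see incremental volume from new customers such as telecom operators in 2025; procurement for the external leasing S business may rise further in the second half.
• Large customers prioritize text models because early inference cards had relatively low memory bandwidth and were better suited to text-model inference. As multimodal models develop and demand for image and video understanding inference rises, the advantages of domestic cards will become more apparent.
Q&A
What impact does Nvidia's latest H20 chip licensing policy have on the market?
In a public filing, Nvidia stated that the United States ...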
Expert Discussion on Humanoid Robots
2025-09-26 02:28
Summary of Conference Call on Humanoid Robots

Industry Overview
- The discussion revolves around the humanoid robot industry, particularly focusing on advancements in technology and market dynamics in China and globally [1][2][3].

Key Points and Arguments
1. **Technological Advancements**: The introduction of models like DeepSeek has significantly improved the capabilities of humanoid robots, especially in natural language processing and understanding Chinese better than competitors like ChatGPT [3].
2. **Market Interest**: There is a growing interest in humanoid robots among the general public, driven by advancements in AI and robotics, leading to a peak in domestic investment [1].
3. **Application Scenarios**: The potential application scenarios for humanoid robots include industrial use, healthcare, and personal assistance, with a focus on identifying which applications can be implemented immediately versus those that will take longer [2][16].
4. **Challenges in Mass Production**: The industry faces bottlenecks in large-scale production and deployment of humanoid robots, particularly in achieving cost-effectiveness and reliability [2][15][26].
5. **Data Requirements**: The development of humanoid robots requires extensive datasets, including real-world data, synthetic data, and multimodal data to enhance their learning and operational capabilities [5][6][8].
6. **Safety Standards**: The safety of humanoid robots is a critical concern, especially in human environments, necessitating the establishment of international safety standards [25][26].
7. **Investment Return**: The discussion highlights the importance of achieving a favorable return on investment (ROI) for the deployment of humanoid robots, which is essential for widespread adoption [26][30].

Additional Important Insights
1. **Industry Competition**: The humanoid robot market is becoming increasingly competitive, with many companies developing similar technologies, leading to a homogenization of products [15].
2. **Technical Pathways**: There are different technical pathways being explored by companies in the humanoid robot space, with some focusing on cost reduction while others emphasize advanced functionalities [9][10].
3. **Future Projections**: The speaker predicts that humanoid robots will see significant breakthroughs in the next five years, particularly in educational and industrial applications, while widespread household adoption may take longer [22].
4. **Material and Power Challenges**: The limitations in battery life and motor power density are significant hurdles that need to be addressed for practical applications of humanoid robots [24][40].
5. **Human Skill Transfer**: The ability to transfer human skills to robots is seen as a crucial factor in enhancing their functionality and adaptability in various tasks [19].

This summary encapsulates the key discussions and insights from the conference call regarding the humanoid robot industry, highlighting both the opportunities and challenges faced by companies in this rapidly evolving field.
SenseTime (商汤), Valued at 54 Billion, Plays a New Card
21st Century Business Herald (21世纪经济报道) · 2025-04-15 02:35
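As soon as he took the stage, SenseTime chairman and CEO Xu Li (徐立) remarked, "If you don't update your understanding for three months, you may be eliminated."
On April 10, SenseTime held its 2025 Technology Exchange Day, where Xu Li officially released the newly upgraded "日日新 SenseNova V6" large-model system (hereafter "SenseNova V6").
In Xu Li's view, the development of multimodal models is roughly equivalent to the development of artificial general intelligence. For SenseTime, which started out in computer vision, the move from vision capabilities to native multimodal models is a natural extension.
Lin Dahua (林达华), SenseTime co-founder and chief scientist for large models, told a 21CBR reporter that the company began exploring multimodality in May and June of last year, and by September and October the technical route was largely proven.
Lin said the company focuses on multimodal reasoning rather than competing in the pure-text track because it firmly believes that future interaction will inevitably be multimodal.
SenseNova V6 is a native multimodal general-purpose MoE large model with more than 600 billion parameters, and a single model can handle text, multimodal, and other tasks.
Its technical breakthroughs center on four areas: long chain of thought (more than 200B of high-quality multimodal long chain-of-thought data, with chains of thought up to 64K); mathematical and scientific ability (data analysis capability far ahead of GPT-4o); reasoning (ranked first in China for multimodal deep reasoning, benchmarked against OpenAI o1); and global memory (the first in China to achieve a breakthrough in long-video understanding, supporting understanding of and deep reasoning over 10-minute videos).
Long memory deserves particular mention. Lin Dahua ...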
"Computer Vision Has Been Ended by GPT-4o" (doge)
量子位 · 2025-03-29 07:46
Core Viewpoint
- The article discusses the advancements in computer vision (CV) and image generation capabilities brought by the new GPT-4o model, highlighting its potential to disrupt existing tools and methodologies in the field [1][2].

Group 1: Technological Advancements
- GPT-4o introduces native multimodal image generation, expanding the functionalities of AI tools beyond traditional applications [2][12].
- The image generation process in GPT-4o is based on an autoregressive model, unlike the diffusion model used in DALL·E, which allows for better adherence to instructions and enhanced image editing capabilities [15][19].
- Observations suggest that the image generation may combine autoregression across multiple scales: a rough image is generated first, and details are then filled in while the rough shape continues to evolve [17][19].

Group 2: Industry Impact
- The advancements in GPT-4o's capabilities have raised concerns among designers and computer vision researchers, indicating a significant shift in the competitive landscape of AI tools [6][10].
- OpenAI's approach of scaling foundational models to achieve these capabilities has surprised many in the industry, suggesting a new trend in AI development [12][19].
- The potential for GPT-4o to enhance applications in autonomous driving has been noted, with implications for future developments in this sector [10].

Group 3: Community Engagement
- The article encourages community members to share their experiences and innovative uses of GPT-4o, fostering a collaborative environment for exploring AI applications [26].
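32B, Deployable Locally! Alibaba Open-Sources Its Latest Multimodal Model: Focused on Vision-Language, Also Strong at Mathematical Reasoning
量子位 · 2025-03-25 00:59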
Core Viewpoint
- The article discusses the release of the Qwen2.5-VL-32B-Instruct model by Alibaba's Tongyi Qwen, highlighting its advancements in performance and capabilities compared to previous models and competitors.

Group 1: Model Specifications
- The Qwen2.5-VL family includes three sizes: 3B, 7B, and 72B, with the new 32B version balancing size and performance for local operation [2][3].
- The 32B version has undergone reinforcement learning optimization, achieving state-of-the-art (SOTA) performance in pure text capabilities, even surpassing the 72B model in several benchmarks [4].

Group 2: Performance Improvements
- The Qwen2.5-VL-32B demonstrates enhanced mathematical reasoning abilities, image analysis, content recognition, and visual logic deduction, providing clearer and more human-like responses [5].
- For example, it can analyze a traffic sign image and accurately calculate travel time based on distance and speed, showcasing its reasoning process [5][6].

Group 3: Open Source and Community Engagement
- The model has been open-sourced and is available for testing on platforms like Hugging Face, allowing users to experience its capabilities directly [14][15].
- The rapid community engagement is evident, with users already running the model in various forums and discussions, indicating a strong interest in its applications [16][17].
Zhipu (智谱) Launches the GLM-4-Voice End-to-End Emotional Speech Model: The Latest Step on the Road to AGI
IPO早知道 · 2024-10-26 02:12
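Open-sourced on release, this is also Zhipu's first open-source end-to-end multimodal model.
This article is an IPO早知道 original. Author | C叔. WeChat official account | ipozaozhidao
According to IPO早知道, Zhipu launched the GLM-4-Voice end-to-end emotional speech model on October 25.
GLM-4-Voice can understand emotion, express emotion and respond with empathy, adjust its speaking rate, and support multiple languages and dialects, with lower latency and the ability to be interrupted at any time.
As an end-to-end speech model, GLM-4-Voice avoids the information loss and error accumulation introduced by the traditional cascaded "speech to text, then back to speech" pipeline, and in theory has a higher modeling ceiling.
Overall, GLM-4-Voice has the following features:
1. Emotional expression and empathy: the voice carries different emotions and subtle variations, such as happiness, sadness, anger, and fear.
2. Adjustable speaking rate: within the same conversation turn, you can ask it to speak faster or slower.
3. Interruptible at any time, with flexible instruction input: it adjusts the content and style of its speech output according to real-time user instructions, supporting more flexible conversational interaction.
4. Multilingual and multi-dialect support: GLM-4-Voice currently supports Chinese and English speech as well as regional Chinese dialects, and is particularly strong in Cantonese, the Chongqing dialect, and the Beijing dialect, among others.
5. Combined with video calls, it can see as well as speak: a video call feature is launching soon, building ...
While keeping its usual "live on release" style, GLM-4- ...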