Multimodal Large Models
SigmaStar Technology (301536) - 301536 SigmaStar Technology Investor Relations Management Information 20250430
2025-04-30 00:02
Group 1: Business Performance
- In Q1 2025, all business lines achieved over 20% year-on-year growth, with significant contributions from the smart IoT and automotive sectors [2]
- For the full year 2024, the company reported a net profit of approximately CNY 256 million, a year-on-year increase of about 25.18% [3]
- Q1 2025 net profit was approximately CNY 51.18 million, a year-on-year increase of about 0.48% [3]
Group 2: Product Development and Market Strategy
- The company has launched the SSC309QL SoC chip for AI glasses, with shipments expected in the second half of 2025 [3]
- In the humanoid robot sector, the company's 2024 shipment volume and revenue each grew more than threefold compared with 2023 [4]
- The company is focusing on advanced IP technologies, including high-performance chips for applications such as smart robots and smart glasses [5]
Group 3: Research and Development Investment
- In 2024, R&D investment was approximately CNY 602 million, a year-on-year increase of about 21.95%, with an R&D investment rate of approximately 25.59% [6]
- Q1 2025 R&D investment was about CNY 168 million, a year-on-year increase of approximately 19.8% [6]
Group 4: Market Position and Future Outlook
- The company has established a global sales strategy, with over half of sales coming from overseas markets [7]
- The company aims to become a leading SoC chip supplier to the smart robot industry within the next two to three years [5]
- Future AI SoC chips are expected to focus on higher efficiency, lower power consumption, and smaller sizes to meet growing demand from smart devices [11]
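The 2024 figures above also imply the company's approximate revenue, assuming the "R&D investment rate" means R&D spending as a share of revenue (the summary does not define it). This is a back-of-envelope derivation, not a reported number:

```latex
\text{Revenue}_{2024} \approx \frac{\text{R\&D}_{2024}}{\text{R\&D rate}_{2024}} = \frac{602\ \text{million CNY}}{0.2559} \approx 2{,}352\ \text{million CNY}
```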
ICLR 2025 | The first dynamic vision-text sparsification framework arrives, cutting computational overhead by 50%-75%
Synced (机器之心)· 2025-04-29 03:22
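This work was completed jointly by East China Normal University and Xiaohongshu. The co-first authors are Huang Wenxuan and Zhai Zijie, master's students at East China Normal University and interns on Xiaohongshu's NLP team; the corresponding authors are Cao Shaosheng, head of Xiaohongshu's NLP team, and Lin Shaohui, a researcher at East China Normal University. Multimodal large language models (MLLMs) have achieved remarkable results in visual understanding and reasoning. However, as the decoding stage keeps generating new tokens, the computational complexity and GPU memory footprint of inference grow steadily, which reduces the inference efficiency of multimodal large models. Existing methods accelerate inference by reducing vision-token redundancy in the prefill stage. Unfortunately, the speedup gained from sparsifying vision tokens during prefill gradually fades during decoding: as the number of generated text tokens grows, these methods still run into performance bottlenecks. To address this, the team proposed Dynamic-LLaVA, a new dynamic vision-text context sparsification framework for accelerating inference. The framework designs a tailored sparsification scheme for each inference mode of a multimodal large model (the prefill stage, and the decoding stage with and without a KV Cache) to achieve efficient inference. Experimental results show that Dynamic-LLaVA, with almost no loss of visual understanding and generation capability ...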
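The decoding bottleneck described above can be illustrated with a toy cost model. The summary does not say how Dynamic-LLaVA decides which tokens to drop, so this sketch swaps in plain top-k selection over made-up importance scores for the prefill stage and a fixed "keep every other generated token" rule for decoding with a KV cache. `keep_top_k` and `decode_attention_cost` are hypothetical names, and the 576 vision tokens, 32 prompt tokens, and 100 generated tokens are arbitrary assumptions. The point is only that pruning vision tokens shrinks the fixed prefix, while the per-step cost still grows with every cached text token unless decoding is sparsified too.

```python
from typing import Callable, List, Sequence


def keep_top_k(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the highest-scoring tokens, returned in original sequence order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def decode_attention_cost(prefix_len: int, new_tokens: int,
                          keep_new: Callable[[int], bool]) -> int:
    """Count key/value entries attended across a decoding run with a KV cache.

    At each step the current token attends to every cached entry plus itself;
    keep_new(step) decides whether that token's KV pair is then added to the cache.
    """
    cache_len = prefix_len
    total = 0
    for step in range(new_tokens):
        total += cache_len + 1
        if keep_new(step):
            cache_len += 1
    return total


# Made-up importance scores for 576 vision tokens (an assumption, not model output).
vision_scores = [((i * 37) % 101) / 100 for i in range(576)]
kept_vision = keep_top_k(vision_scores, keep_ratio=0.25)
prompt_len = 32

dense = decode_attention_cost(576 + prompt_len, 100, lambda step: True)
sparse = decode_attention_cost(len(kept_vision) + prompt_len, 100, lambda step: step % 2 == 0)
print(len(kept_vision), dense, sparse)  # 144 65850 20200
```

Dropping decoded tokens as well as vision tokens is what keeps the cache from growing at full speed; a real method would choose which tokens to keep from the model's own signals rather than a fixed rule.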
Gemini-2.0 takes first place! The world's first benchmark dedicated to geometric reasoning is released, from Taotian Group
QbitAI (量子位)· 2025-04-28 03:43
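When solving a geometry problem, a person first identifies the geometric principles it requires and then applies them flexibly to derive the answer. Current evaluation methods, however, mostly check only whether the final answer is correct or simply score each reasoning step, overlooking the key factors in the reasoning process: the identification and application of geometric principles.
Although prior studies found that models' weak perception of geometric diagrams limits their subsequent reasoning, experiments show that another major bottleneck for multimodal large models is correctly matching geometric principles to the geometric elements in the image and applying them.
To fill this gap, GeoSense was introduced, offering a new direction for improving reasoning in complex visual scenes.
Contributed by the GeoSense team | QbitAI (WeChat official account: QbitAI)
Which multimodal large model is best at solving geometry problems? Here is the first comprehensive bilingual benchmark that evaluates multimodal large models' geometry problem solving from the perspective of geometric principles!
GeoSense systematically evaluates how well multimodal large models identify and apply geometric principles; both the benchmark data and the evaluation code have been open-sourced.
The team behind it is the Future Life Lab team of Taotian Group's Algorithm Technology division.
A 5-level knowledge architecture + 1,789 geometry problems
GeoSense aims to systematically evaluate the ability of multimodal large models to identify and apply geometric principles when solving geometry problems.
The benchmark establishes a 5-level knowledge architecture of geometric principles, including definitions, theorems, and formulas, covering both plane and solid geometry and supporting both Chinese and English; it carefully constructs and manually annotates in detail ...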
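GeoSense's premise is that a solution should be judged on which principles it identifies and whether it applies them, not only on the final answer. The summary does not give GeoSense's scoring formula, so the following is a hypothetical illustration of that idea only: given the principles a problem requires and per-principle judgements of a model's solution (which in practice would come from annotation or a grader), it reports the share of required principles identified and the share both identified and correctly applied. `Judgement`, `principle_scores`, and the principle names are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Judgement:
    identified: bool  # did the solution name the principle?
    applied: bool     # was it matched to the right diagram elements and used correctly?


def principle_scores(required: Set[str], judgements: Dict[str, Judgement]) -> Dict[str, float]:
    """Toy per-problem scores over the principles a problem requires.

    identification: share of required principles the solution names.
    application: share it both names and applies correctly.
    Principles the problem does not require are ignored.
    """
    if not required:
        raise ValueError("a problem must require at least one principle")
    found = [judgements[p] for p in required if p in judgements]
    identified = sum(1 for j in found if j.identified)
    applied = sum(1 for j in found if j.identified and j.applied)
    return {"identification": identified / len(required),
            "application": applied / len(required)}


required = {"Pythagorean theorem", "area formula for a triangle"}
judgements = {
    "Pythagorean theorem": Judgement(identified=True, applied=True),
    "area formula for a triangle": Judgement(identified=True, applied=False),
    "inscribed angle theorem": Judgement(identified=True, applied=True),  # not required: ignored
}
scores = principle_scores(required, judgements)
print(scores)  # {'identification': 1.0, 'application': 0.5}
```

Separating the two scores mirrors the article's finding: a model can recognize the right theorem yet fail to map it onto the diagram.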
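Li Yanhong: DeepSeek is no cure-all; its biggest problems are that it is slow and expensive, and most large models are faster and cheaper than the full-strength DeepSeek [with a market analysis of the multimodal large model industry]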
Sou Hu Cai Jing· 2025-04-27 06:28
Core Insights
- Baidu founder Li Yanhong emphasized that DeepSeek is not a panacea and highlighted its current limitations, particularly in processing multimedia content [2][3]
- DeepSeek achieved notable success, becoming the fastest application to surpass 30 million daily active users and topping the App Store charts in multiple regions, including the US [2]
- The AI industry is trending toward multimodal models, which are expected to become standard in future foundation models [6]
Industry Overview
- Training costs for mainstream large models in China typically range from tens of millions to hundreds of millions of dollars, with major players such as Baidu, Alibaba, and Tencent investing over $200 million [4]
- Startups such as Kimi and DeepSeek have reduced training costs to between $30 million and $60 million through technical optimizations [4]
- Revenue from multimodal large models in China is concentrated among leading companies, with Alibaba Cloud generating over 110 billion yuan, about 15% of its group's revenue [5]
Application Insights
- Li Yanhong stressed that applications matter more than models and chips, asserting that the real value lies in the applications built on these technologies [6]
- Despite DeepSeek's shortcomings, the focus remains on finding the right scenarios and models to build lasting applications [6]
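Technological breakthroughs drive industrial upgrading: DeepGlint advances on two fronts, multimodal large models and domestically produced AI PCs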
Cai Jing Wang· 2025-04-25 14:50
Group 1
- The company reported R&D investment of 188.97 million yuan, a year-on-year increase of 3%, and added 90 new patents, strengthening its technological moat [1]
- The company launched a "multimodal large model technology and application R&D project" to prepare for future product development and deployment [1]
- Its self-developed visual large model Unicom outperformed OpenAI's CLIP and Google's SigLIP in academic evaluations, with results published at the ECCV 2024 conference [1]
Group 2
- The company has established six key technology directions in computer vision, including multimodal large model technology and 3D stereo vision technology [2]
- The company is integrating AI technology into end products, developing new-generation intelligent hardware such as AI PCs and unmanned computing equipment [2]
- The company plans to increase investment in AI R&D, focusing on vertical fields to develop controllable multimodal large models and complex AIGC systems [2]
Wang Xiaogang: Physical world models are important for training driver-assistance systems
Xin Lang Cai Jing· 2025-04-24 09:04
Core Insights
- The Shanghai Auto Show, which opened on April 23, focuses on innovation and the future of the automotive industry, showcasing traditional fuel vehicles, new energy vehicles, smart driving, and supply chain technologies [1]
- The event highlights the rapid advancement of technologies such as high-level intelligent driving, AI models, and multimodal perception, with many new technologies and products set to be unveiled [1]
Group 1: Industry Trends
- The ongoing price war in the automotive sector has spread to supply chain companies, forcing them to balance pricing against cost management [3]
- Industry leaders increasingly agree on platform-based sensor design, which reduces repetitive development and adaptation for specific vehicle models [4]
Group 2: Technological Innovations
- Generative intelligent driving is seen as a significant opportunity for the industry, addressing a limitation of current end-to-end models: they require vast amounts of high-quality data [5]
- The concept of a "world model" is introduced: it reconstructs physical driving scenarios so that models can be trained through simulation and reinforcement learning [5][6]
- Multimodal large models are transforming user interaction in smart cabins, enabling richer, more engaging conversations rather than simple one-on-one exchanges [6][10]
Group 3: Data Utilization
- As much as 99% of real user data may be of little use for training models, because most driving scenarios offer minimal information gain [7]
- High-quality data is essential, with a focus on capturing complex driving behavior in challenging scenarios [7][8]
Group 4: Future Developments
- Proactive interaction capabilities in smart cabins are expected to significantly enhance user experience, enabling multi-party conversations and engagement [10][12]
- The integration of AI with hardware is viewed as a trend that could lower costs and improve the overall ecosystem, with a focus on building a robust software environment [13]
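The observation that most real-world driving data adds little can be made concrete with loss-based hard-example mining, a common data-selection technique swapped in here purely for illustration; the summary does not say how the speaker's team actually selects data. The sketch ranks clips by the current model's loss (assumed here as a rough proxy for information gain) and keeps only the top fraction. `select_informative` and the clip names are hypothetical.

```python
from typing import List, Tuple


def select_informative(clips: List[Tuple[str, float]], keep_fraction: float) -> List[str]:
    """Return the IDs of the highest-loss clips, hardest first.

    clips: (clip_id, loss) pairs; loss is the current model's error on the clip,
    used here as a rough proxy for how much new information the clip carries.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(len(clips) * keep_fraction))
    ranked = sorted(clips, key=lambda clip: clip[1], reverse=True)
    return [clip_id for clip_id, _ in ranked[:k]]


# 99 routine cruising clips the model already handles well, plus one hard clip.
clips = [(f"highway_cruise_{i:02d}", 0.01) for i in range(99)]
clips.append(("unprotected_left_turn_in_rain", 0.80))
selected = select_informative(clips, keep_fraction=0.01)
print(selected)  # ['unprotected_left_turn_in_rain']
```

A world model, as described above, complements this kind of filtering: instead of waiting for rare hard cases to appear in real data, it reconstructs and varies them in simulation.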
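Outlook 2025! China's audio industry value chain, market size, and key companies: AI technology leads the transformation of the audio industry as multimodal large models and generative AI reshape content creation [Charts]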
Chan Ye Xin Xi Wang· 2025-04-23 01:36
Industry Overview
- The audio industry is experiencing significant growth driven by technological advancements, particularly in AI, which enhances content creation and consumption [1][10]
- The market size of China's audio industry is projected to reach 28.7 billion yuan in 2024, a year-on-year increase of 14.80% [10]
Industry Development History
- The audio industry in China has evolved through four main phases:
  1. The nascent phase (1996-2005) began with the first online broadcasting platform in China [4]
  2. The exploratory phase (2006-2015) saw the emergence of various audio platforms and regulatory frameworks [5]
  3. The expansion phase (2016-2019) marked the introduction of live audio streaming features by major platforms [6]
  4. The maturity phase (2020-present) is characterized by the listing of major companies and the integration of AI technologies [6]
Industry Value Chain
- The audio industry value chain consists of upstream content creation, materials, and components; midstream audio platforms; and downstream listening channels such as smartphones and smart speakers [8]
Market Size
- The application of AI and multimodal large models is transforming audio content creation, enhancing user experience and personalization [10]
Key Companies
- Major listed companies in the audio sector include Tencent Music, NetEase Cloud Music, and others, with various enterprises involved in digital publishing and audio technology [2]
Industry Development Trends
1. **Intelligent and Integrated Solutions**: The future of audio will focus on smart and integrated solutions, leveraging AI for automatic processing and system integration [21]
2. **Immersive Audio Experiences**: Technologies such as 3D and spatial audio will enhance user experiences in gaming, film, and entertainment [22][23]
3. **Environmental Sustainability**: The industry will increasingly prioritize eco-friendly materials and energy-efficient technologies in product design [24]
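As a quick consistency check, the stated 2024 market size and growth rate imply a 2023 base of roughly 25 billion yuan. This value is derived here from the two reported figures, not stated in the source:

```latex
\text{Size}_{2023} \approx \frac{\text{Size}_{2024}}{1 + g} = \frac{28.7\ \text{billion yuan}}{1 + 0.1480} \approx 25.0\ \text{billion yuan}
```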
StepFun (Jieyue Xingchen) multimodal large model provides technical support for OPPO's new phone
news flash· 2025-04-22 08:05
Core Viewpoint
- The article highlights the collaboration between OPPO and Jieyue Xingchen (StepFun), which provides technical support for OPPO's flagship smartphone, the Find X8 Ultra. Its "One-Click Flash Memory" function intelligently recognizes on-screen content and organizes the information into memory collections [1]
Group 1
- The Find X8 Ultra is the first phone to launch the "One-Click Flash Memory" feature, which uses Jieyue Xingchen's multimodal model to recognize and summarize content [1]
- The "One-Click Flash Memory" function sorts fragmented information into different memory collections stored in the "Xiao Bu Memory" application [1]
- Jieyue Xingchen previously helped OPPO develop two other AI features: "One-Click Universal Search" and "One-Click Screen Inquiry" [1]
Xu Peng, Ant Group Vice President and former head of foundation large models, leaves the company
Securities Times (证券时报)· 2025-04-14 11:01
Core Insights
- Ant Group Vice President Xu Peng, previously in charge of foundation large models, has recently left the company [1][2]
- Xu Peng led the AI innovation and application department NextEvo, which handled all core technology R&D for Ant's AI initiatives, including the Ant Bailing large model [2]
- NextEvo published over 30 papers in major international AI journals and conferences in 2023 [2]
Company Developments
- Ant Group has undergone a significant organizational restructuring, establishing two core business groups, the Digital Payment Business Group and the Alipay Business Group, with a rotating president system to strengthen strategy execution [3]
- Each rotating presidency lasts six months, with the first term running from the announcement until June 30, 2025 [3]
- A major personnel change was also announced, under which President Han Xinyi would officially take over as CEO on March 1, 2025, overseeing business operations and day-to-day management [3]
AI Initiatives
- Ant Group is also developing a multimodal large model that can process and understand several types of information simultaneously, suggesting broader application prospects [2]
- The company has open-sourced significant AI tools, including the DLRover distributed deep learning system and the GLake project for GPU memory and transfer optimization, filling a gap in domestic vertical AI technology [2]
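2025 Large Model Research Series: Multimodal Large Model Insights: Large models are moving toward multimodality, reaching into vertical industry scenarios to unlock technical value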
Tou Bao Yan Jiu Yuan· 2025-04-09 13:52
Market Overview
- The Chinese multimodal large model market reached CNY 9.09 billion in 2023 and is projected to grow to CNY 66.23 billion by 2028, a compound annual growth rate (CAGR) of 48.76% [24]
- The rapid growth is driven by continuous technological innovation and strong industry demand [24]
Industry Insights
- Major players in China's multimodal large model sector include Baidu, Alibaba, Tencent, and SenseTime, with significant advances in model capabilities [31]
- Multimodal models are applied across many sectors: digital humans account for 24% of applications, followed by gaming and advertising at 13% each [33]
Technological Development
- Multimodal models have evolved from task-specific to more general architectures, improving efficiency and flexibility [22]
- Key components of multimodal models include modality encoders, input projectors, large model backbones, output projectors, and modality generators, which work together to process and generate diverse data types [9][12][14][15][16]
Training and Evaluation
- Training typically involves two phases: pre-training on multimodal data, then instruction fine-tuning to improve interaction with users [34]
- Evaluation of generation capabilities focuses on aspects such as semantic understanding, coherence, and the ability to handle complex scenes [40][41]
Future Trends
- Future advances in multimodal models will focus on improving generation consistency, in-context learning, and complex reasoning [46]
- Addressing challenges such as multimodal hallucination and improving model robustness will be critical for practical applications in fields such as healthcare and autonomous driving [46][50]
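The five components listed under Technological Development can be pictured as a chain of stages. The sketch below is a toy under stated assumptions: each stage is a stand-in callable over plain Python lists (tiny linear maps and a ReLU), not a real encoder, LLM, or decoder, and `MultimodalPipeline` and `linear` are hypothetical names. In many published designs the output projector and modality generator are needed only when the model must produce non-text outputs such as images or audio.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


def linear(weights: List[List[float]]) -> Callable[[Vector], Vector]:
    """Return a function applying a dense matrix (one row per output dimension)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


@dataclass
class MultimodalPipeline:
    modality_encoder: Callable[[Vector], Vector]    # raw input -> modality features
    input_projector: Callable[[Vector], Vector]     # features -> LLM embedding space
    llm_backbone: Callable[[Vector], Vector]        # reasoning over the projected tokens
    output_projector: Callable[[Vector], Vector]    # LLM states -> generator conditioning
    modality_generator: Callable[[Vector], Vector]  # conditioning -> non-text output

    def run(self, raw_input: Vector) -> Vector:
        features = self.modality_encoder(raw_input)
        tokens = self.input_projector(features)
        hidden = self.llm_backbone(tokens)
        condition = self.output_projector(hidden)
        return self.modality_generator(condition)


pipeline = MultimodalPipeline(
    modality_encoder=linear([[1, 0], [0, 1]]),              # 2 -> 2 (identity stand-in)
    input_projector=linear([[2, 0], [0, 2], [1, 1]]),       # 2 -> 3
    llm_backbone=lambda v: [max(0.0, x) for x in v],        # ReLU stand-in for the LLM
    output_projector=linear([[1, 1, 1]]),                   # 3 -> 1
    modality_generator=lambda v: [10 * v[0]],               # scalar "decoder" stand-in
)
result = pipeline.run([1.0, -1.0])
print(result)  # [20.0]
```

Keeping each stage behind the same callable interface reflects why the projectors matter: they are the adapters that let independently built encoders, backbones, and generators be swapped without changing the rest of the chain.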
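The report's 48.76% CAGR can be checked from its two endpoint figures over the five compounding years from 2023 to 2028. `cagr` is a hypothetical helper name; the formula is the standard compound-growth definition.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


rate = cagr(9.09, 66.23, 2028 - 2023)  # CNY billions, 2023 -> 2028
print(f"{rate:.2%}")  # 48.76%
```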