Multimodal Large Models

Wang Xiaogang: Physical World Models Are Important for Driver-Assistance Training
Xin Lang Cai Jing· 2025-04-24 09:04
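On April 23, the biennial Shanghai Auto Show officially opened. As a top-tier auto show in the new automotive revolution, this year's event carries the theme "Embracing Innovation, Win-Win Future," and exhibitors span traditional fuel vehicles, new energy vehicles, intelligent driving, supply chain technology, and other fields. In the era of smart vehicles, technology is advancing rapidly: frontier technologies such as advanced intelligent driving, AI large models, and multimodal perception are being deployed at an accelerating pace, and more new technologies and products will debut at the show. Wang Xiaogang, CEO of SenseAuto and co-founder and chief scientist of SenseTime ... gradually forming some industry consensus. When some automakers design their solutions, they also emphasize platformization of sensor models. This greatly reduces our repeated development and adaptation work for specific vehicle models.
Sina Auto: Looking ahead three to five years, which technological breakthroughs do you think are most worth watching in the auto industry?
Wang Xiaogang: I think the development of large models, and the generative AI built on them, has brought a very big opportunity to the whole industry. One area is intelligent driving: today we also proposed "generative intelligent driving," because what everyone is doing now is end-to-end. End-to-end has data limitations: it needs large amounts of high-quality data, and it works by imitating humans. End-to-end also has uncertainty. For example, when a problem occurs, the problem scenario cannot be reproduced; there are many similar scenarios, but there is no guarantee that this particular scenario can be solved. But today we want to use ...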
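Outlook 2025! Analysis of China's Audio Industry Chain, Market Size, and Key Companies: AI Technology Leads the Transformation of the Audio Industry as Multimodal Large Models and Generative AI Reshape Content Creation [Charts]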
Chan Ye Xin Xi Wang· 2025-04-23 01:36
Industry Overview
- The audio industry is experiencing significant growth driven by technological advancements, particularly in AI, which enhances content creation and consumption [1][10]
- The market size of China's audio industry is projected to reach 28.7 billion yuan in 2024, representing year-on-year growth of 14.80% [10]
Industry Development History
- The audio industry in China has evolved through four main phases:
  1. The nascent phase (1996-2005) began with the launch of China's first online broadcasting platform [4]
  2. The exploratory phase (2006-2015) saw the emergence of various audio platforms and regulatory frameworks [5]
  3. The expansion phase (2016-2019) marked the introduction of live audio streaming features by major platforms [6]
  4. The maturity phase (2020-present) is characterized by the listing of major companies and the integration of AI technologies [6]
Industry Value Chain
- The audio industry value chain consists of upstream content creation, materials, and components; midstream audio platforms; and downstream listening channels such as smartphones and smart speakers [8]
Market Size
- The application of AI and multimodal large models is transforming audio content creation, enhancing user experience and personalization [10]
Key Companies
- Major listed companies in the audio sector include Tencent Music, NetEase Cloud Music, and others, with various enterprises involved in digital publishing and audio technology [2]
Industry Development Trends
1. **Intelligent and Integrated Solutions**: The future of audio will focus on smart and integrated solutions, leveraging AI for automatic processing and system integration [21]
2. **Immersive Audio Experiences**: Technologies like 3D and spatial audio will enhance user experiences in gaming, film, and entertainment [22][23]
3. **Environmental Sustainability**: The industry will increasingly prioritize eco-friendly materials and energy-efficient technologies in product design [24]
StepFun's Multimodal Large Model Provides Technical Support for OPPO's New Phone
news flash· 2025-04-22 08:05
Core Viewpoint
- The article highlights the collaboration between OPPO and Jieyue Xingchen (StepFun), which provides technical support for OPPO's flagship smartphone, the Find X8 Ultra. The phone's "One-Click Flash Memory" function intelligently recognizes on-screen content and organizes the information into memory collections [1]
Group 1
- OPPO's Find X8 Ultra is the first phone to launch the "One-Click Flash Memory" feature, which uses Jieyue Xingchen's multimodal model for content recognition and summarization [1]
- The "One-Click Flash Memory" function sorts fragmented information into different memory collections stored in the "Xiao Bu Memory" application [1]
- Jieyue Xingchen previously helped OPPO develop two other AI features: "One-Click Universal Search" and "One-Click Screen Inquiry" [1]
Xu Peng, Ant Group Vice President and Former Head of Foundation Large Models, Departs
Securities Times· 2025-04-14 11:01
Core Insights
- Ant Group Vice President Xu Peng, previously in charge of foundation large models, has recently left the company [1][2]
- Xu Peng headed NextEvo, Ant's AI innovation and application department, which handled all core technology R&D for Ant's AI initiatives, including the Ant Bailing large model [2]
- In 2023, NextEvo published over 30 papers in major international journals and conferences in the AI field [2]
Company Developments
- Ant Group has undergone a significant organizational restructuring, establishing two core business groups, the Digital Payment Business Group and the Alipay Business Group, with a rotating president system to strengthen strategy execution [3]
- Each rotating presidency lasts six months, with the first term running from the announcement until June 30, 2025 [3]
- In a major personnel change, President Han Xinyi was announced to officially take over as CEO on March 1, 2025, overseeing business operations and daily management [3]
AI Initiatives
- Ant Group is also developing a multimodal large model, which can process and understand multiple types of information simultaneously and therefore has broader application prospects [2]
- The company has open-sourced significant AI tools, including the DLRover distributed deep learning system and the GLake project for GPU memory and transmission optimization, filling a gap in domestic vertical AI technology [2]
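2025 Large Model Research Series: Multimodal Large Model Insights: Large Models Move Toward Multimodality, Unlocking Technical Value in Vertical Industry Scenarios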
Tou Bao Yan Jiu Yuan· 2025-04-09 13:52
Market Overview
- The Chinese multimodal large model market reached CNY 9.09 billion in 2023 and is projected to grow to CNY 66.23 billion by 2028, a compound annual growth rate (CAGR) of 48.76% [24]
- The rapid growth is driven by continuous technological innovation and strong industry demand [24]
Industry Insights
- Major players in the Chinese multimodal large model sector include Baidu, Alibaba, Tencent, and SenseTime, with significant advancements in model capabilities [31]
- The application of multimodal models spans various sectors, with digital humans accounting for 24% of applications, followed by gaming and advertising at 13% each [33]
Technological Development
- The evolution of multimodal models has transitioned from task-specific to more general architectures, enhancing efficiency and flexibility [22]
- Key components of multimodal models include modality encoders, input projectors, large model backbones, output projectors, and modality generators, which work together to process and generate diverse data types [9][12][14][15][16]
Training and Evaluation
- The training process for multimodal models typically involves two phases: pre-training with multimodal data and instruction fine-tuning to enhance user interaction capabilities [34]
- Evaluation of generation capabilities focuses on aspects such as semantic understanding, coherence, and the ability to handle complex scenes [40][41]
Future Trends
- Future advancements in multimodal models will focus on improving generation consistency, contextual learning, and complex reasoning capabilities [46]
- Addressing challenges like multimodal hallucination and enhancing model robustness will be critical for practical applications in fields such as healthcare and autonomous driving [46][50]
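The reported growth rate can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is only an arithmetic check using the rounded figures quoted above (CNY 9.09 billion in 2023, CNY 66.23 billion in 2028); the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market size in CNY billions, as quoted in the summary above.
rate = cagr(9.09, 66.23, 2028 - 2023)
print(f"Implied CAGR 2023-2028: {rate:.2%}")
```

With these inputs the result comes out at roughly 48.8%, consistent with the reported 48.76%.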
Beating DeepSeek V3? Meta Makes a Splash as Its Strongest-Ever Llama 4 Goes Open Source!
Ge Long Hui· 2025-04-06 06:22
Core Viewpoint
- The launch of Meta's Llama 4 series marks a significant advance in open-source AI models, positioning the company to compete with leading tech giants in the AI arms race [1][2].
Group 1: Llama 4 Series Launch
- Meta introduced its most powerful open-source AI model, Llama 4, a multimodal model capable of integrating various data types and converting content across different formats [3][4].
- The Llama 4 series uses a mixture-of-experts (MoE) architecture, supports 12 languages, and is touted as the strongest open-source multimodal model available [4].
Group 2: Model Specifications
- The Llama 4 series includes two versions: Scout and Maverick [5].
- Scout has 17 billion active parameters, 16 experts, and 109 billion total parameters, and supports a context window of up to 10 million tokens, outperforming OpenAI's models [6][8].
- Maverick also has 17 billion active parameters but uses 128 experts and 400 billion total parameters, matching the reasoning capabilities of DeepSeek-v3-0324 with less than half the active parameters [7][10].
Group 3: Performance Metrics
- In extensive benchmark tests, Scout outperformed models such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 [9].
- Maverick excelled in programming, reasoning, multilingual, long-context, and image benchmarks, surpassing GPT-4o and Gemini 2.0 [11].
Group 4: Future Developments
- Meta is training a new model, Llama 4 Behemoth, which will have 2 trillion parameters and is expected to be released in the coming months [14].
- This model will have 288 billion active parameters and 16 experts, and is expected to outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on various STEM benchmarks [15][16].
Group 5: Strategic Goals
- Meta aims to establish itself as a leader in AI by making its models open-source and widely accessible so that users worldwide can benefit [17].
- The company plans to invest $65 billion in expanding its AI infrastructure, including a nearly $1 billion data center project in Wisconsin [19].
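The distinction between "active" and "total" parameters comes from mixture-of-experts routing: a small router scores every expert for each token, and only the top-scoring experts are evaluated, so most expert weights sit idle in any single forward pass. The pure-Python sketch below is a generic top-k MoE layer with toy scalar experts, written for illustration only; it is not Meta's Llama 4 implementation (it omits, for example, the shared experts and alternating dense layers Meta describes), and the names `moe_forward` and `active_fraction` are hypothetical. It also computes the active-to-total ratios implied by the figures above.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, router: Sequence[Vector],
                experts: Sequence[Expert], k: int = 1) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs.

    router[i] is a hypothetical gating vector for expert i; the router
    logit for expert i is its dot product with the token.
    """
    logits = [sum(t * w for t, w in zip(token, gate)) for gate in router]
    # Keep the k highest-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the router scores over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for weight, idx in zip(weights, chosen):
        expert_out = experts[idx](token)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token (both counts in billions)."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Three toy experts that scale the input by 1, 2 and 3.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, used = moe_forward([1.0, 2.0], router, experts, k=1)
    print(out, used)
    # Active/total ratios implied by the figures quoted above.
    print(f"Scout: {active_fraction(17, 109):.1%}")
    print(f"Maverick: {active_fraction(17, 400):.1%}")
```

Under these figures, Scout touches roughly 16% of its parameters per token and Maverick only about 4%, which is why the two models can share the same active size despite very different totals.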
Hong Kong Stock Weekly Report - 2025-04-02
BOCOM International· 2025-04-02 06:52
Market Strategy
- The report recommends a balanced allocation strategy, suggesting that investors wait for elastic-rebound opportunities after recent market pressure from tariff policies and economic uncertainty [2][4].
- The report notes that the market currently lacks a clear narrative, leading to divergent capital flows and a technical adjustment in the Hang Seng Tech Index, which has fallen more than 10% from its peak [4][5].
- The anticipated U.S. tariff announcement is expected to include global tariffs as high as 20% on all trading partners, increasing global risk aversion [4][5].
Sector Performance
- The healthcare sector has shown resilience, with pharmaceutical companies gaining momentum on strong earnings, particularly CDMO/CMO companies with significant overseas revenue [7][21].
- The materials sector has benefited from a rotation of funds into high-dividend stocks, with coal stocks gaining as risk appetite for the technology and consumer sectors weakened [7][21].
- The consumer sector is showing structural divergence: companies such as Pop Mart reported strong earnings growth, while others such as Miniso saw their share prices decline after underwhelming results [7][21].
AI and Technology Developments
- OpenAI and Alibaba have made significant updates to their AI models, enhancing multimodal capabilities that integrate text, images, audio, and video, which are expected to drive commercial applications [10][16].
- The report notes that AI infrastructure and cloud computing service providers are entering a valuation re-rating phase, with domestic chip design companies in particular benefiting from localization trends [7][10].
Consumer Sector Insights
- The discretionary consumer sector has outperformed consumer staples in profit growth, with reported net profit up 39.4% versus a 2.76% decline for consumer staples [21][32].
- Discretionary consumer companies such as Pop Mart have reported significant revenue growth, with annual revenue up 106.9%, driven by strong performance in overseas markets [35][36].
- The consumer staples sector is under pressure, but marginal improvement is expected as consumption stimulus policies are implemented in 2025 [32][35].
Market Overview
- The Hong Kong stock market has continued to pull back, particularly in the technology sector, with valuations nearing the highs of October 2024 [40][54].
- The report indicates that the risk premium for the Hang Seng Index has rebounded, reflecting a shift in market sentiment and a potential opportunity for investors [54][60].
- The report also notes that overall market momentum has weakened, with most sectors entering a lagging phase, except for the discretionary consumer and healthcare sectors, which are improving [69][70].
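Wang Zhongyuan, President of the Beijing Academy of Artificial Intelligence (BAAI): Multimodal Large Models Will Bring New Variables to Embodied Intelligence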
Xin Jing Bao· 2025-03-30 10:00
Core Insights
- Embodied intelligence is a major focus at the 2025 Zhongguancun Forum, where the RoboOS framework and the open-source RoboBrain model were introduced [1][3]
- Multimodal large model technology is expected to make robots more intelligent, allowing them to better understand and interact with the physical world [2][3]
Group 1: Multimodal Large Models
- Multimodal large models enable AI to perceive and understand the world through various data types, such as medical imaging and sensor data, facilitating the transition from digital to physical environments [2]
- Performance gains in large language models have slowed as available internet text data is being exhausted, making the integration of multimodal capabilities necessary [2]
Group 2: RoboBrain and RoboOS
- RoboBrain and RoboOS are designed to support cross-scenario, multi-task deployment and collaboration among different types of robots, enhancing their general intelligence [3]
- RoboBrain can interpret human commands and visual inputs to generate actionable plans based on real-time feedback, supporting various robot configurations [3]
Group 3: Industry Development and Challenges
- The open-source approach is seen as a key driver of rapid development in the AI industry, enabling collaboration among hardware, model, and application vendors [4]
- Despite the potential of humanoid robots, their industrial application faces significant challenges, with many still in the early stages of development [5]
- Achieving Artificial General Intelligence (AGI) is projected to take another 5-10 years, depending on advances in embodiment capabilities and data accumulation [5]
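SenseTime's Revenue Grew About 10% Last Year; Xu Li: The Goal Is to Cut Large Model Training and Inference Costs by at Least an Order of Magnitude Every Year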
Peng Pai Xin Wen· 2025-03-27 00:41
Core Viewpoint
- SenseTime reported 10.8% year-on-year revenue growth for 2024, driven primarily by generative AI, which grew 103.1% to become the company's largest business segment [2][5].
Financial Performance
- Total revenue for 2024 reached 3.77 billion RMB, up 10.8% from 3.41 billion RMB in 2023 [4].
- The net loss narrowed by 33.7% to 4.31 billion RMB, from 6.49 billion RMB the previous year [4].
- Gross profit was 1.62 billion RMB, with a gross margin of 42.9%, down from 44.1% in 2023 [4].
Business Segments
- Generative AI revenue surged to 2.40 billion RMB, accounting for 63.7% of total revenue, up from 1.18 billion RMB (34.8%) in 2023 [8].
- The smart automotive segment generated 256 million RMB, down 33.2% year-on-year, attributed to a shift in strategic focus [6].
- Visual AI revenue fell 39.5% to 1.11 billion RMB, as the company concentrated on high-quality clients and introduced generative AI capabilities [7].
Strategic Focus
- The company is committed to a "1+X" organizational restructuring to sharpen its focus on core businesses, particularly generative AI and visual AI, while fostering vertical ecosystem companies [9].
- CEO Xu Li emphasized the goal of cutting large model training and inference costs by at least an order of magnitude every year, in preparation for a surge in large model applications [9][11].
- SenseTime plans to leverage its accumulated resources in computer vision and multimodal reasoning to develop innovative applications for future AI agents [10].
Future Developments
- The company will hold a technology day on April 10 to officially upgrade its "日日新" (SenseNova) large model to version 6.0 [12].
- SenseTime aims to unlock the potential of its ecosystem companies, focusing on specific market demands and sharing infrastructure and model development results [11].
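The segment figures above can be cross-checked with simple share and growth calculations. The sketch below uses only the rounded numbers quoted in the summary (in RMB billions), so its results may differ from the reported percentages in the last decimal place; the helper names `share` and `yoy_growth` are illustrative.

```python
def share(part: float, total: float) -> float:
    """Fraction of `total` contributed by `part`."""
    return part / total


def yoy_growth(current: float, previous: float) -> float:
    """Year-over-year growth rate."""
    return current / previous - 1


# Rounded 2024 figures from the summary, in RMB billions.
revenue_2024 = 3.77
segments_2024 = {"generative_ai": 2.40, "visual_ai": 1.11, "smart_auto": 0.256}

for name, value in segments_2024.items():
    print(f"{name}: {share(value, revenue_2024):.1%} of revenue")

print(f"Segments sum: {sum(segments_2024.values()):.3f} vs total {revenue_2024}")
print(f"Generative AI growth: {yoy_growth(2.40, 1.18):.1%}")
print(f"Gross margin: {share(1.62, revenue_2024):.1%}")
```

The three segments add up to roughly the reported total, and generative AI works out to about 64% of revenue, in line with the reported 63.7%.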
Hi Robot | "Brain" and "Cerebellum" Evolve Again as Humanoid Robots Achieve New Breakthroughs
Sou Hu Cai Jing· 2025-03-26 14:53
Core Insights
- The evolution of humanoid robots has accelerated significantly this year, with advances in motion control, specialized assembly-line functions, and potential for household care and companionship [3]
- Recent achievements include humanoid robots setting world records in acrobatics, such as flips that demand extreme bursts of energy and pressure tolerance [4]
- Breakthroughs in power systems, intelligent algorithms, and perception technologies, integrated into one system, have enabled robots to perform complex movements, including dance and martial arts [5]
Group 1
- Humanoid robots can now perform complex tasks such as riding a bike, sewing, and engaging in human-like interactions, thanks to the integration of motion intelligence, operational intelligence, and interactive intelligence [6]
- The development of multimodal large models has enabled robots not only to communicate but also to perceive and make judgments, significantly improving their responsiveness [7]
- Innovations in model design have cut processing time to millisecond-level response speeds by mapping image and voice inputs directly to voice outputs [8]