Multimodal AI
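LLaSO Arrives: Logic Intelligence (逻辑智能) Releases the World's First Fully Open-Source Speech LLM Framework, Setting a New Benchmark for LSLM Research
机器之心 (Jiqizhixin) · 2025-09-14 05:16
Paper title: LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model. Amid the wave of large language models (LLMs), multimodal AI has advanced rapidly, and the vision-language (LVLM) field in particular has settled into a mature research paradigm. By contrast, large speech-language models (LSLMs) have developed slowly and in a fragmented way. The field has long been hampered by fragmented architectures, opaque training data, and missing evaluation standards, which makes fair comparison between studies difficult and seriously impedes reproducibility and systematic community progress. Many studies release model weights, but the training data and configuration details that their success depends on are often withheld. To break this deadlock, Beijing Deep Logic Intelligence Technology Co., Ltd. (北京深度逻辑智能科技有限公司) has released LLaSO, the first fully open, end-to-end research framework for speech-language models. LLaSO aims to give the whole community a unified, transparent, and reproducible infrastructure. Its contribution is an all-in-one package of open-source data, benchmarks, and models, intended to accelerate community-driven innovation in the LSLM field. Paper link: https://arxiv.org/abs/2508.1 ...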
AI Industry Tracking: Google Releases New Image Model Gemini 2.5 Flash Image; Watch the Rollout of Multimodal AI Applications
Changjiang Securities· 2025-09-05 08:44
Investment Rating - The report maintains a "Positive" investment rating for the industry [7] Core Insights - On August 26, 2025, Google released Gemini 2.5 Flash Image, an image generation and editing model code-named "Nano-Banana." It supports a 32k context, with text priced at $0.3 (input) / $2.5 (output) and images at $0.3 (input) / $30 (output). The report expects a significant turning point for domestic models and applications in Q4 and is strongly positive on the monetization, scaling, and commercialization of domestic AI applications [2][5] Summary by Sections Event Description - Google launched the Gemini 2.5 Flash Image model, which supports image generation and editing over a 32k context, at the text and image prices listed above [5] Event Commentary - The model shows strong character consistency and creativity, with five core functions: text-to-image, image-to-text, multi-image generation, iterative refinement, and high-fidelity text rendering. The report suggests that these advances could shift AI from a productivity tool to a creative partner and open up new application scenarios [10] - Key technological highlights include interleaved generation, which produces consistent yet varied images from user instructions, and pixel-perfect editing, which lets users refine outputs easily. Generating a single image costs approximately $0.039, significantly less than with previous models, which strengthens the model's competitive position [10] - The report sees a stronger investment case for domestic AI agents and predicts a pivotal moment for AI application monetization and commercialization in Q4. It recommends focusing on AI agent-related companies, the Chinese computing power industry chain, cloud service providers, and IDC (internet data center) firms collaborating with major players such as Alibaba [10]
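The per-image figure of about $0.039 can be reproduced from the quoted image output price with a short calculation. This is a minimal sketch under two assumptions that the report summary does not state: that the $30 price is per one million output tokens, and that one generated image consumes roughly 1,290 output tokens (a figure taken from Google's launch materials, not from the report). The function name is hypothetical.

```python
# Assumption: the $30 image output price is quoted per 1,000,000 tokens.
IMAGE_OUTPUT_PRICE_PER_M_TOKENS = 30.0  # USD, from the report summary

# Assumption: one generated image uses about 1,290 output tokens.
TOKENS_PER_IMAGE = 1290


def cost_per_image(price_per_m_tokens: float, tokens_per_image: int) -> float:
    """Return the USD cost of one image given a per-million-token price."""
    return price_per_m_tokens * tokens_per_image / 1_000_000


cost = cost_per_image(IMAGE_OUTPUT_PRICE_PER_M_TOKENS, TOKENS_PER_IMAGE)
print(f"Estimated cost per image: ${cost:.4f}")
```

Under these assumptions the estimate is consistent with the roughly $0.039 per image cited in the report. If the token count per image differs, the result scales linearly.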
Lion Group Holdings (狮腾控股, 2562.HK) Jumps Nearly 12% After Launching the Geene M2 Multimodal AI Platform
Gelonghui APP · 2025-09-04 03:28
Core Viewpoint - Lion Group Holdings (2562.HK) experienced a nearly 12% increase in stock price, reaching HKD 17.9, following the announcement of its new multi-model large language model (LLM) platform, Geene M2 [1]. Company Summary - The newly launched Geene M2 platform integrates various large language models, including Geene R1, Geene TurboGT, OpenAI's ChatGPT, Alibaba's Qwen, ByteDance's SkyLark, and other LLMs [1].
Lion Group Holdings (狮腾控股) Launches the Geene M2 Multimodal AI Platform
Core Viewpoint - Lion Group announced the launch of its multi-model large language model platform, Geene M2, integrating various models including Geene R1, Geene TurboGT, OpenAI's ChatGPT, Alibaba's Qwen, and ByteDance's SkyLark [1] Company Summary - Lion Group's new platform, Geene M2, aims to consolidate multiple large language models into a single offering, enhancing its capabilities in the AI space [1]
Google's nano-banana Model Breaks Out on Strong Consistency; Bullish on Faster Adoption of Multimodal Applications
Orient Securities· 2025-09-02 01:47
Investment Rating - The industry investment rating is maintained as "Positive" [4] Core Insights - Google's latest model, gemini-2.5-flash-image-preview (nano-banana), demonstrates state-of-the-art (SOTA) image understanding and editing capabilities, significantly improving production efficiency and accelerating AI penetration in e-commerce and advertising [1][7] - The high consistency of its image generation and editing is expected to ease pain points in AI video creation workflows, pointing to investment opportunities in downstream multimodal AI applications [1][7] Summary by Sections Investment Recommendations and Targets - The report emphasizes opportunities in vertical multimodal AI applications in the second half of the year, where technological breakthroughs and cost optimization are expected to drive user growth and commercialization [2] - Companies whose multimodal AI applications target overseas markets are highlighted for their rapid growth potential, including Kuaishou-W (01024, Buy), Meitu Inc. (01357, Not Rated), Wondershare Technology (300624, Not Rated), and MiniMax (Not Listed) [2] - The report also recommends watching for a Meta-style dynamic, in which stronger model capabilities translate into revenue growth, and suggests following Alibaba-W (09988, Buy), Tencent Holdings (00700, Buy), and Kuaishou-W (01024, Buy) [2] Industry Overview - The report covers the media industry, particularly in China, and was published on September 2, 2025 [4] - The report maintains a positive stance on the industry's growth potential [4]
SanTai Shares Rise 0.85% on Turnover of 114 Million Yuan; Main-Force Net Inflow of -41.4415 Million Yuan Over the Past 3 Days
Xin Lang Cai Jing· 2025-09-01 08:00
Core Viewpoint - Shenzhen SanTai E-commerce Co., Ltd. is benefiting from the depreciation of the RMB and is actively developing AI-driven tools for risk detection in cross-border e-commerce [2][3]. Company Overview - Shenzhen SanTai E-commerce Co., Ltd. specializes in export cross-border e-commerce retail and third-party logistics, with a revenue composition including hobbies (28.88%), international dedicated lines (24.71%), home living (23.64%), and others [7]. - The company was established on January 7, 2008, and went public on September 28, 2023 [7]. Financial Performance - For the first half of 2025, the company achieved a revenue of 827 million yuan, representing a year-on-year growth of 3.27%, while the net profit attributable to shareholders decreased by 48.75% to 23.26 million yuan [8]. - The company has distributed a total of 110 million yuan in dividends since its A-share listing [9]. Product and Service Development - The company launched its AI-based intellectual property risk detection tool "RuiGuan·ERiC" on September 28, 2023, aimed at providing flexible and cost-effective risk monitoring solutions [2][3]. - The company is also developing an AIGC project that utilizes Stable Diffusion for generating high-quality images, enhancing operational efficiency and reducing production costs [2]. Market Position and Trends - The company's overseas revenue accounts for 99.98% of its total revenue, benefiting from the depreciation of the RMB [3]. - The company operates within the internet e-commerce sector, specifically in cross-border e-commerce, and is involved in various concept sectors including small-cap stocks, intellectual property, smart logistics, and AIGC [8]. Shareholder Information - As of August 20, the number of shareholders decreased by 5.71% to 31,200, with an average of 7,023 circulating shares per person, an increase of 6.06% [8]. - Major shareholders include Hong Kong Central Clearing Limited and several ETFs, indicating a diversified ownership structure [9].
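The two shareholder figures above are linked: if the number of circulating shares is unchanged, a 5.71% drop in the shareholder count implies a rise of about 6.06% in average shares per holder. The sketch below checks this under that assumption, which the summary does not state. The shareholder count is rounded in the source, so the implied float is approximate, and the variable names are illustrative.

```python
# Figures from the summary above.
holders_now = 31_200          # shareholders as of August 20 (rounded in the source)
holder_change = -0.0571       # 5.71% decrease in the number of shareholders
avg_shares_now = 7_023        # average circulating shares per shareholder

# Implied circulating shares (approximate, since the holder count is rounded).
implied_float = holders_now * avg_shares_now

# Assumption: the float did not change over the period. Then the average
# holding scales by the inverse of the change in the number of holders.
implied_avg_change = 1 / (1 + holder_change) - 1

print(f"Implied circulating shares: {implied_float:,}")
print(f"Implied change in average holding: {implied_avg_change:.2%}")
```

The implied change matches the reported 6.06% increase, so the two figures are consistent with a constant float.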
SanTai Shares Fall 0.10% on Turnover of 235 Million Yuan; Main-Force Net Inflow of -29.86 Million Yuan Today
Xin Lang Cai Jing· 2025-08-28 08:13
Core Viewpoint - Shenzhen SanTai E-commerce Co., Ltd. focuses on cross-border e-commerce retail and logistics and is using AI technology to improve operational efficiency and reduce costs [2][8]. Group 1: Company Overview - Shenzhen SanTai E-commerce Co., Ltd. was established on January 7, 2008, and listed on September 28, 2023 [8]. - The company's main business is cross-border e-commerce retail and logistics services, with overseas revenue accounting for 99.98% of total revenue [3][9]. - The revenue composition includes interests and hobbies (28.88%), international dedicated lines (24.71%), home living (23.64%), tool accessories (10.62%), trendy fashion (8.66%), digital technology (2.99%), international postal (0.33%), commercial express (0.16%), and other income (0.02%) [8]. Group 2: Financial Performance - For the period from January to March 2025, the company achieved a revenue of 403 million yuan, representing a year-on-year growth of 3.48%, while the net profit attributable to shareholders decreased by 53.47% to 14.0044 million yuan [9]. - The company has distributed a total of 110 million yuan in dividends since its A-share listing [10]. Group 3: Market Position and Trends - The company is a small-cap stock associated with concepts such as AIGC, intellectual property, smart logistics, and e-commerce [8]. - The company is benefiting from the depreciation of the RMB, which raises the RMB value of its overseas revenue [3].
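InternVL 3.5 Is Here! Shanghai AI Lab's Latest Open-Source Release Takes On GPT-5 Head-On and Gets Efficiency Right
自动驾驶之心 (Heart of Autonomous Driving) · 2025-08-27 23:33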
Core Viewpoint - Shanghai AI Lab has launched the open-source multimodal model InternVL 3.5, which significantly advances the performance of the InternVL series in terms of generality, reasoning ability, and inference efficiency compared to its predecessors [2]. Model Architecture - InternVL 3.5 consists of three core components: a dynamic high-resolution text tokenizer, an InternViT visual encoder, and a connector that integrates visual and language modalities [5]. - The model employs a two-stage training paradigm, including a large-scale pre-training phase and a multi-stage post-training phase [5][6]. Training Objectives - The pre-training phase utilizes a large-scale multimodal corpus to learn general visual-language representations, with a total of approximately 1.16 billion samples corresponding to about 250 billion tokens [7]. - The post-training strategy includes three stages: Supervised Fine-Tuning (SFT), Cascade Reinforcement Learning (Cascade RL), and Visual Consistency Learning (ViCO) [9]. Performance Metrics - InternVL 3.5 has shown superior performance across various benchmarks, achieving notable scores in tasks such as MMStar, MMVet, and MMBench V1.1 [14]. - The model's performance is competitive with top commercial models like GPT-5, demonstrating significant improvements in multimodal reasoning and mathematical tasks [14][15]. Testing and Deployment - The model incorporates a test-time scaling method to enhance reasoning capabilities, particularly for complex tasks requiring multi-step reasoning [11]. - The Decoupled Vision-Language Deployment (DvD) framework optimizes hardware costs and facilitates seamless integration of new modules without modifying the language server deployment [12].
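The idea behind decoupling vision and language deployment can be shown with a toy sketch: the language side only consumes feature vectors, so the vision side can be swapped or scaled without touching it. This is not the DvD implementation, which the summary does not describe. All class and function names below are hypothetical, and both "servers" are stubs that return placeholder values instead of running a model.

```python
from dataclasses import dataclass
from typing import List, Protocol


class VisionEncoder(Protocol):
    """Interface the language side depends on (hypothetical)."""

    def encode(self, image_bytes: bytes) -> List[float]: ...


@dataclass
class StubVisionServer:
    """Stands in for the vision side (e.g. visual encoder plus connector)."""

    dim: int = 4

    def encode(self, image_bytes: bytes) -> List[float]:
        # Placeholder "features": leading byte values scaled to [0, 1],
        # zero-padded to a fixed length. A real encoder would run a model.
        values = [b / 255 for b in image_bytes[: self.dim]]
        return values + [0.0] * (self.dim - len(values))


@dataclass
class StubLanguageServer:
    """Stands in for the language side; it only ever sees feature vectors."""

    def generate(self, prompt: str, visual_features: List[float]) -> str:
        # A real server would condition an LLM on the features.
        return f"[{len(visual_features)} visual features] {prompt}"


def answer(vision: VisionEncoder, language: StubLanguageServer,
           image: bytes, prompt: str) -> str:
    """Route an image through the vision side, then the language side."""
    features = vision.encode(image)
    return language.generate(prompt, features)


print(answer(StubVisionServer(), StubLanguageServer(),
             b"\x00\xff", "Describe the image."))
```

Because `answer` depends only on the `VisionEncoder` interface, a different vision component can be passed in without any change to `StubLanguageServer`, which mirrors the summary's claim that new modules can be added without modifying the language server deployment.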
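Today's Top Ten Hot Stocks: Huasheng Tiancheng's Computing-Power Theme Stays Hot, Helitai Hits Limit-Up 4 Times in 5 Days as the E-Paper Theme Takes Off, Goertek Leads the Consumer Electronics Rally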
Jin Rong Jie· 2025-08-27 03:15
Market Overview - On August 26, A-shares were mixed, with the Shanghai Composite Index down 0.39%, the Shenzhen Component Index up 0.26%, and the ChiNext Index down 0.75% [1] - Total trading volume in the Shanghai and Shenzhen markets was 2.71 trillion yuan, a decrease of approximately 460 billion yuan from the previous day [1] - Over 2,800 stocks rose, with 92 stocks hitting the daily limit, primarily in the computer and machinery sectors [1] Hot Stocks - The top ten popular stocks included Liou Co., Huasheng Tiancheng, Lingyi Intelligent Manufacturing, and others, with significant interest in sectors like AI and electronic components [2] Company Highlights - Liou Co. reported a strong turnaround with an expected net profit of 350-450 million yuan in the first half of the year, benefiting from fair value changes and the sale of shares in Li Auto [3] - Huasheng Tiancheng's popularity stems from its deep involvement in AI computing and a projected net profit increase of 148%-172% year-on-year, supported by its role in national AI computing centers [3] - Lingyi Intelligent Manufacturing's rise is attributed to its diversification into new technology sectors, including electric vehicle components and humanoid robot parts [3] - Tuowei Information gained attention due to its partnership with Huawei and a profit increase of over 2200% year-on-year [3] - Cambricon-U (寒武纪-U) is recognized for its leadership in AI chips, with a nearly 100% increase in orders due to global demand for large model training [4] - Goertek's focus on AI consumer electronics and AR/VR has led to a 110% increase in smart glasses shipments, with a notable 250% growth in AI glasses [4] - Fenda Technology's market interest is driven by its production optimization and breakthroughs in AI hardware, with a 35.9% increase in R&D investment [5] - Helitai's success is linked to its debt restructuring and the growth of its electronic paper business, with a significant reduction in its debt ratio and increased revenue from electronic paper [5]
Multimodal AI Concept Stocks Rally Across the Board; iFlytek Up More Than 5%
Ge Long Hui· 2025-08-27 03:15
Group 1 - The A-share market saw a collective surge in multi-modal AI concept stocks, with notable performances including Kaipu Cloud hitting a 20% limit up, Zhongke Chuangda rising over 16%, and Yanshan Technology and Runjian Shares both reaching a 10% limit up [1][2] - The State Council issued an opinion on the implementation of the "Artificial Intelligence +" action plan, which includes six key actions aimed at accelerating the integration of AI with various sectors by 2027 [1] - By 2030, the plan envisions AI fully empowering high-quality development, with the penetration rate of new intelligent terminals and agents exceeding 90%, positioning the AI economy as a significant growth driver for China's economic development [1] Group 2 - Specific stock performances include Kaipu Cloud with a market cap of 7.677 billion, Zhongke Chuangda at 36.8 billion, and Yanshan Technology at 41.5 billion, reflecting significant year-to-date gains [2] - Other notable stocks include Runjian Shares with a market cap of 16 billion and a year-to-date increase of 68.31%, and Entropy Technology with a market cap of 7.969 billion and a 53.58% increase [2] - The overall trend indicates a strong investor interest in AI-related companies, driven by government policy support and anticipated growth in the AI sector [1][2]