Multimodal (多模态) - filings, earnings calls, financial reports, news - Reportify

Multimodal

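A conversation with StepFun (阶跃星辰) CEO Jiang Daxin: 16 multimodal models released in two years; DeepSeek shows the paid-traffic growth model does not hold | TMTPost AGI

TMTPost App (钛媒体APP) · 2025-05-08 08:33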

Core Insights
- Jiang Daxin, CEO of StepFun (阶跃星辰), announced that the full version of the reasoning model Step R1 and a more advanced Step image editing model will be released within the next two to three months [2]
- StepFun regards "integrated multimodal understanding and generation" as a key path toward building a world model and progressing toward Artificial General Intelligence (AGI) [2][3]
- Jiang argued that the traditional logic of buying traffic to grow AI products needs to be re-evaluated, as the performance of DeepSeek and other AI products has shown [2]

Company Overview
- StepFun, founded in April 2023, is a startup focused on general-purpose AI models and has released the Step series of foundation models [5]
- The company raised several hundred million dollars in its Series B round; key investors include Shanghai State-owned Capital Investment Co., Tencent Investment, and Qiming Venture Partners [5]
- StepFun has launched 22 self-developed foundation models, more than 70% of them multimodal, which positions it as a leader in multimodal AI [5]

Product Development
- The company has made significant advances in multimodal models, covering applications such as image understanding, video generation, and music generation [5][7]
- StepFun has established deep collaborations with industry leaders in the automotive, mobile, and IoT sectors, strengthening its product capabilities [7]
- Recent releases include the Step R-mini reasoning model and open-sourced video models, indicating a commitment to expanding its model capabilities [7]

Strategic Focus
- StepFun is concentrating on intelligent terminal agents that improve the user experience by understanding environmental context [11]
- The company believes that combining pre-trained foundation models with reinforcement learning can significantly improve reasoning capabilities [12]
- Jiang asserts that achieving AGI requires a multimodal approach, because human intelligence is diverse and relies on multiple modalities [8]

Competitive Positioning
- StepFun differentiates itself from competitors such as OpenAI and Google by focusing on foundation model development and multimodal capabilities [13]
- The company aims to build an ecosystem that integrates models with intelligent agents, bridging cloud and edge computing [13]

Seek (US:SKLTY)

Artificial General Intelligence (AGI)

AI Agents

Artificial Intelligence

Step series foundation model matrix

Why do AI video tools look more and more alike?

36Kr · 2025-05-07 07:50

Core Insights
- Attention in the AI video sector has shifted from OpenAI's Sora to newer players such as Kling (可灵) and Jimeng (即梦), and the industry now prioritizes narrowing the gap between AI video production and consumption [4][5][6]
- Competition among AI video players is intensifying, with frequent updates and new model releases expected in 2025, indicating rapid evolution in the industry [4][12][26]
- Mid-tier AIGC entrepreneurs are increasingly concerned about the commercial viability of AI video, as production costs remain high while client budgets shrink [4][16][18]

Group 1: Industry Dynamics
- The AI video landscape is becoming increasingly crowded, with numerous players emerging and competing for market share [23][26]
- The focus of competition has shifted from model parameters to three key dimensions: consistency, usability, and playability [6][13][14]
- Many AI video products are converging in functionality, which pushes competition toward quality, cost, and interaction forms [5][16]

Group 2: Technological Advancements
- AI video players are improving generation consistency through better frame transitions and scene realism, both critical for quality [9][11]
- Major players iterate their foundation models regularly, with updates at least every six months to maintain a competitive advantage [11][12]
- New features such as dynamic editing and end-to-end production tools are being developed to improve usability for creators [13][14]

Group 3: Market Challenges
- Despite the proliferation of tools and features, many creators express anxiety over rising production costs and shrinking project budgets [16][18][21]
- Pricing strategies in the AI video market are not producing significant cost reductions, and many companies keep prices high for advanced models [20][21]
- The complexity of video creation demands a multi-platform approach, as no single company currently meets all needs in the market [27]

Kling (可灵) large model

Multimodality and agents become the new match point for big tech's AI

Cyzone (创业邦) · 2025-05-01 02:54

Core Viewpoint
- The article discusses the evolution of large models in consumer-facing applications, focusing on richer user interaction and complex task execution through multimodal capabilities and agent product ecosystems [4][6].

Multimodal Capabilities
- Major companies including ByteDance, Baidu, Google, and OpenAI have recently launched advanced multimodal models, enabling innovative applications [4].
- Alibaba's AI product Quark introduced a new feature called "Photo Ask Quark" (拍照问夸克), which uses multimodal capabilities for richer user interaction [4][10].
- Multimodal reasoning is now evident in products such as ByteDance's Doubao 1.5 and OpenAI's o3 and o4-mini, which can analyze images and generate content [9][10].

Agent Execution Capabilities
- General-purpose agent products aim to execute complex tasks from natural language commands, with recent launches from companies such as ByteDance and Baidu [4][5].
- The article highlights three key capabilities that agents need: integration with third-party data and tools, coding ability, and strong task understanding [20][23].
- Manus has set a direction for agent products, showcasing a framework that combines user task understanding with tool integration [17].

Future of Agents
- The ultimate form of agents remains uncertain, and their development and application are still being explored [7].
- Integrating multimodal capabilities with agent execution is crucial for creating a foundational entry point for future applications [25].
- OpenAI anticipates that AI agents will surpass ChatGPT in sales by the end of 2025, projecting revenues of $3 billion, with further growth expected by 2029 [25].

Artificial Intelligence

Photo Ask Quark (拍照问夸克)

Multimodality and agents become the new match point for big tech's AI

36Kr · 2025-04-29 23:29

Core Insights
- The article discusses the evolving landscape of AI applications, focusing on multimodal capabilities and agent execution as the two pillars of current development in the industry [1][2][3]

Multimodal Capabilities
- Major companies including ByteDance, Baidu, Google, and OpenAI have recently launched advanced multimodal models, driving application innovation [1][5]
- Alibaba's AI product Quark introduced a new feature called "Photo Ask Quark" (拍照问夸克), which uses multimodal capabilities for user interaction [1][6]
- OpenAI's latest models, o3 and o4-mini, have achieved significant multimodal understanding, allowing for image analysis and generation [5][16]
- Multimodal capabilities are expected to transform user experiences in work, study, and daily life, although current products are still at an early exploratory stage [2][3]

Agent Execution
- General-purpose agent products that execute complex tasks from natural language commands are emerging, with notable examples including ByteDance's Coze Space (扣子空间) and Baidu's Xinxiang App [1][12]
- The effectiveness of these agents relies on three key capabilities: connecting to third-party data and tools, coding ability, and task understanding [12][16]
- OpenAI is exploring the acquisition of AI programming startup Windsurf to strengthen coding capabilities for agents [16][17]
- Revenue from AI agents is projected to exceed $3 billion by the end of 2025, with a potential contribution of $29 billion by 2029 [17]

Future Directions
- The article suggests that the future of agents may involve a more human-like ecosystem, with agents developed for specific professional roles [17]
- Integrating multimodal capabilities with agent execution is seen as crucial for establishing a foundational entry point for future AI applications [17]

Artificial Intelligence

Doubao 1.5 deep thinking model

ERNIE 4.5 Turbo (文心4.5 Turbo)

Tongyi Qianwen Qwen3 released: a conversation with Alibaba's Zhou Jingren

LatePost (晚点) · 2025-04-29 08:43

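The following article is from LatePost Dialogue (晚点对话), by Cheng Manqi. Zhou Jingren, CTO of Alibaba Cloud and head of Tongyi Lab: "Large models have moved from the beginning of the early stage into the middle of the early stage; improving single-point capabilities alone is no longer enough." The Qwen3 flagship, the MoE (mixture-of-experts) model Qwen3-235B-A22B, has 235 billion total parameters and 22 billion active parameters, and on several major benchmarks it surpasses the full version of DeepSeek-R1, which has 671 billion total parameters and 37 billion active parameters. The smaller MoE model Qwen3-30B-A3B activates only 3 billion parameters in use, less than one-tenth of QwQ-32B, the earlier dense pure-reasoning model in the Qwen series, yet it performs better. Fewer parameters with better performance mean that developers can get better results at lower deployment and usage cost. Image from the official Tongyi Qianwen blog. (Note: an MoE model activates only part of its parameters on each use, which makes it more efficient, so it is described by two figures: total parameters and active parameters.) Before the Qwen3 release, we interviewed Zhou Jingren, Alibaba's top person for large-model R&D, CTO of Alibaba Cloud, and head of Tongyi Lab. He is also the main decision-maker behind Alibaba's open-source large models. To date, the Qwen series of large models has been downloaded a cumulative 3 ...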
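The parameter figures quoted in the excerpt above can be sanity-checked with a short calculation. The sketch below is illustrative only: the numbers are copied from the excerpt, while the `MODELS` table and the `active_fraction` helper are hypothetical names introduced here, not part of any Qwen or DeepSeek tooling.

```python
# Parameter counts in billions, copied from the excerpt above.
# The dictionary layout and helper name are illustrative assumptions.
MODELS = {
    "Qwen3-235B-A22B": {"total": 235, "active": 22},
    "DeepSeek-R1": {"total": 671, "active": 37},
    "Qwen3-30B-A3B": {"total": 30, "active": 3},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of an MoE model's parameters that are activated on each use."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(params["total"], params["active"])
    print(f"{name}: {share:.1%} of parameters active")

# QwQ-32B is described as a dense model, so all 32B parameters are active.
QWQ_ACTIVE_B = 32
ratio_vs_qwq = MODELS["Qwen3-30B-A3B"]["active"] / QWQ_ACTIVE_B
print(f"Qwen3-30B-A3B active parameters vs QwQ-32B: {ratio_vs_qwq:.3f}")
```

Under these figures, the ratio of 3 billion to 32 billion is just under 0.1, which is consistent with the excerpt's "less than one-tenth" claim.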

Hybrid reasoning model

Artificial Intelligence

Domestic computing power remains buoyant; watch the Ascend supply chain

2025-04-28 15:33

Summary of Conference Call Records

Industry Overview
- The conference call primarily discusses the domestic computing power industry and the optical communication sector, highlighting the performance of various companies within these industries [1][4][8].

Key Points and Arguments

Domestic Computing Power Industry
- The Ascend 910C chip has shown performance improvements, narrowing the gap with NVIDIA's H100, and is primarily used in Huawei's cloud infrastructure. Strong demand from downstream internet companies is expected to lead to large-scale shipments by May 2025, using a dual 910B chip packaging solution [1][2].
- The overall performance of domestic graphics cards has improved, with increased customer acceptance and a positive outlook for the upstream supply chain, including connectors, liquid cooling, and servers [2].

Optical Communication Sector
- The optical communication segment has exceeded expectations, with companies like NewEase and Shijia Photon showing strong performance. Source Technology's CW light source shipments have significantly improved revenue and profitability, with new product gross margins exceeding 80% [1][4].
- Domestic optical module companies, such as Guangxun Technology, experienced a slight decline in Q1 but showed significant improvement in profitability. Demand for domestic optical modules remains high, with production capacity expected to ramp up to 700,000 to 800,000 units per month this year [1][4].

Company Performance Highlights
- NewEase and Shijia Photon have reported strong revenue and profit growth, driven by overseas demand for passive devices and corresponding chip products. Their revenue and gross margins for AWG, MPO connectors, and indoor optical cable products have significantly improved [5].
- In contrast, Invec's performance in the liquid cooling segment fell short of expectations, leading to a stock price decline. However, revenue met expectations, and the company faces increased margin pressure due to intensified competition in domestic temperature control orders [8].

Market Trends and Future Outlook
- The communication sector's overall performance has been mixed, with some companies meeting expectations while others, like Invec, have struggled. The industry remains optimistic due to high investment from major players like ByteDance, Alibaba, and Tencent, which is expected to drive growth [8][9].
- AI large models continue to evolve, with significant increases in computing power demand. For instance, Baidu's new model has reduced costs to about one-fourth per million tokens, indicating a growing need for computing resources [12].
- Investment recommendations focus on three areas: self-controlled supply chains (including high-speed connectors and liquid cooling), domestic computing power and AI data center industry trends, and advancements in AI applications, particularly in IoT smart modules and controllers [13].

Additional Important Insights
- The optical communication sector is expected to see rapid growth in domestic and international capacity releases over the next few years, particularly in overseas DCI business, which will contribute to significant revenue growth [5].
- The overall sentiment in the communication sector is optimistic, with expectations of continued improvement in profitability and growth trajectories for companies involved in new product releases and increased shipments [6][7].

Ascend (昇腾) 910C chip

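A new open-source SOTA in image editing, from multimodal "king of the grind" StepFun! The large-model industry is entering "multimodal time"

QbitAI (量子位) · 2025-04-28 03:43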

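Heng Yu, reporting from Aofeisi. QbitAI | WeChat official account QbitAI. As intelligence emerges from large AI models worldwide, the field is now entering "multimodal time". On one hand, technical progress across the global industry is unfolding around multimodality at full speed. On the other hand, among the needs of AI applications and deployment, multimodality is also the most important capability. Without multimodal technology, how can anyone talk about applications and deployment? In fact, the pioneering consensus and trend around multimodality can also be seen by connecting the dots of representative players' progress ... Take StepFun (阶跃星辰), the industry's acknowledged multimodal "king of the grind": the three models it rolled out over the past month are all multimodal, namely an open-source image-to-video model, a multimodal reasoning model, and an open-source image editing model. Rich in modalities, frequent in releases, strong in performance. The reason to read StepFun's releases as connected dots is also that StepFun has been strongly oriented toward deployment and applications from the very start. Currently, 70% of the models StepFun has released are multimodal. Since multimodality is an essential ingredient of agents, StepFun's turn this year into a "deployment-oriented player" has become increasingly clear: it is pushing into intelligent terminal agents. What has the grind king ground out over the past month? According to QbitAI's review, StepFun launched three models in succession over the past month: they cover several of the most in-demand directions for current multimodal models, and two of them, Step1X-Edit and Step-Video-TI2V, have been open-sourced to developers. How to put it: this is very StepFun, and it also fits technologists' and industry players' take on "multimodal ...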

Intelligent terminal agents

Artificial Intelligence

Step-Video-TI2V

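Major release | Fudan's "Large Language Models: From Theory to Practice (2nd Edition)" fully upgraded, focusing on the AI frontier

Jiqizhixin (机器之心) · 2025-04-28 01:26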

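Released by Jiqizhixin, Jiqizhixin editorial team. "Large Language Models: From Theory to Practice (2nd Edition)" is a professional technical book that balances theory and practice, and an indispensable reference for the AI era. Anyone can find their own growth path in this book. As the wave of artificial intelligence sweeps the world, large language models are driving technological progress and industrial change at unprecedented speed. From ChatGPT to all kinds of industry applications, LLMs have not only reshaped human-computer interaction but also become a key technology driving academic research and industrial innovation. Faced with this rapidly evolving technology, systematically understanding its theoretical foundations and mastering its core algorithms and engineering practice has become required learning for every AI practitioner, researcher, and university student. In September 2023, the Fudan University research team of Zhang Qi, Gui Tao, Zheng Rui, and Huang Xuanjing officially released "Large Language Models: From Theory to Practice" to academia and industry worldwide. In just two years, large language models have made important progress in theoretical research, pre-training methods, post-training techniques, and interpretability. Research on large language models has deepened across the industry, gradually revealing many characteristics that differ from the traditional deep learning and natural language processing paradigms. For example, a large language model needs only 60 examples to learn and exhibit strong question-answering ability, showing striking generalization. However, the book's authors also found that large language models have a certain fragility. For example, in a model with 13 billion parameters ...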

Large language models

Mixture-of-experts (MoE) models

Robin Li (李彦宏) calls DeepSeek expensive and slow; netizens: that is a bit of "wanting it both ways"

程序员的那些事 · 2025-04-26 15:13

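The following article is from MaxAIBox, by Max. On February 14, Baidu announced that its ERNIE (文心) large model would be not only free but also open source. On the evening of February 16, Baidu Search and the ERNIE agent platform each announced full integration of the latest deep search features of DeepSeek and the ERNIE large model. On February 18, the full version of DeepSeek-R1 went live in Baidu App search. In addition, on the evening of February 18, Robin Li spoke on the earnings call for the fourth quarter and full year of 2024. (1) As is well known, Baidu once insisted on a closed-source route, but after DeepSeek's breakout popularity, and as many companies across industries integrated the full version of DeepSeek-R1, Baidu followed suit. One thing we learned from DeepSeek is that open-sourcing the best models for everyone to use can greatly promote adoption, because people will naturally want to try an open-source model out of curiosity, which in turn drives broader use. (2) On April 25, Baidu held an AI developer conference in Wuhan, where Robin Li gave a speech titled "The World of Models, the Realm of Applications" (《模型的世界，应用的天下》). He said: "As long as you find the right scenario, choose the right base model, and learn a bit about how to tune models, the applications you build will not become obsolete." + "Without applications, chips and models have no ...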

Software and Internet

ERNIE (文心) large model

Coocaa (酷开) unveils 6 super agents in one go! CEO: we must be AI-native, and cost-effectiveness is the main direction we pursue

AI前线 · 2025-04-25 13:48

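All kinds of agents are now springing up on the market, but because they lack breadth and depth of application, and because device interaction cannot carry the needs of real scenarios, their application value has not been fully realized. The market has no shortage of agents; what it lacks is agents that deliver satisfying service. According to Wang Zhiguo, after launching the super agent, Coocaa's plan proceeds in steps. First, close the loop on user data, observing for about three months, with particular attention to user retention, activity data, and feature satisfaction rate. Second, proactive service is the next focus: the plan is to switch the super agent's intent recognition model from a 7B model to a 32B model and turn it into a tool for emotional conversation with users. Third, always keep working with the most advanced large models in the industry and be AI-native: as long as a human sits in between, the large model's capability is greatly attenuated. Meanwhile, the Coocaa super agent and its six specialized agents support cooperation models including software sales, device licensing, PaaS services, and ecosystem win-win partnerships, with the aim of building an open intelligent ecosystem. Wang Zhiguo disclosed that in Q1 this year, Coocaa's contracted agent sales (software sales) reached the point where software and hardware each account for half. Author | Hua Wei. On April 22, at its 2025 spring launch event themed "大爱 AI", Coocaa released its super agent, comprising six agents for audio-video, health, lifestyle, devices, creation, and education, along with agent hardware such as the Coocaa learning machine Y41 Air and the Coocaa 闺蜜机 C20 series ...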

Artificial Intelligence

Coocaa super agent

Coocaa 闺蜜机 C20 series
