Multimodal Reasoning Models

The "Multimodal Overachiever" Fires Three Arrows in a Row!
中国基金报 (China Fund News) · 2025-07-26 08:44
Core Insights
- StepFun (阶跃星辰) announced three major developments: the launch of its new-generation foundation model Step 3, a strategic partnership with Shanghai State-owned Capital Investment Co., and the founding of the Model Ecosystem Innovation Alliance together with nearly ten chip manufacturers and computing-power platforms [2][3][6].

Group 1: New Model Launch
- Step 3 is designed to balance intelligence with efficiency, aiming to be the model best suited to the inference era; it will be open-sourced to enterprises and developers worldwide on July 31 [3].
- On domestic chips, Step 3's decoding efficiency reaches up to 300% of DeepSeek-R1's, and the model is compatible with all major chip types [3].

Group 2: Strategic Partnership
- The collaboration with Shanghai State-owned Capital Investment Co. marks a significant step in StepFun's commercialization, focusing on capital linkage, ecosystem construction, business synergy, and application empowerment [6][9].
- Shanghai State-owned Capital Investment Co. has a registered capital of 10 billion yuan and engages in strategic equity management and market-oriented investment projects [9].

Group 3: Ecosystem Alliance
- The Model Ecosystem Innovation Alliance aims to improve model adaptability and computing efficiency through collaborative innovation among foundational technology vendors [11].
- Initial members include Huawei Ascend, MuXi, and other major vendors, with the goal of delivering efficient, easy-to-use large-model solutions [11][13].
The "Multimodal Overachiever" Fires Three Arrows in a Row!
中国基金报 (China Fund News) · 2025-07-26 08:31
Core Viewpoint
- StepFun (阶跃星辰) announced three major developments: the launch of its new-generation foundation model Step 3, a strategic partnership with Shanghai State-owned Capital Investment Co., and the founding of the Model Ecosystem Innovation Alliance together with nearly ten chip manufacturers and computing-power platforms [1][7][14].

Group 1: New-Generation Foundation Model
- Step 3 is designed to balance intelligence with efficiency, aiming to be the model best suited to the inference era and to contribute a powerful multimodal reasoning model to the open-source community [1][2].
- On domestic chips, Step 3 achieves decoding efficiency of up to 300% of DeepSeek-R1's, reflecting substantial system- and architecture-level innovation [2].
- In distributed inference on NVIDIA Hopper-architecture chips, Step 3 delivers more than 70% higher throughput than DeepSeek-R1 [4].

Group 2: Strategic Partnership
- The partnership with Shanghai State-owned Capital Investment Co. marks a significant step in StepFun's commercialization efforts, focusing on capital linkage, ecosystem construction, business collaboration, and application empowerment [7].
- Shanghai State-owned Capital Investment Co. is a large state-owned capital investment platform with a registered capital of 10 billion yuan, engaged in strategic equity management and market-oriented investment [8].

Group 3: Commercialization Progress
- More than half of domestic smartphone manufacturers now work with StepFun, and the company has partnered with Geely Auto on smart-cockpit solutions [10].
- Driven by rapid growth in the first half of 2025, the company aims to reach 1 billion yuan in annual revenue in 2025 [10].
Group 4: Model Ecosystem Innovation Alliance
- The alliance, initiated by StepFun together with nearly ten chip and infrastructure manufacturers, aims to improve model adaptability and computing efficiency through collaborative innovation [14][15].
- Initial members include Huawei Ascend, MuXi, and several other technology firms, with the goal of providing efficient, easy-to-use large-model solutions for enterprises and developers [14][15].
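The efficiency claims above can be sanity-checked with simple arithmetic. A minimal sketch, reading "300%" as a 3x decoding-efficiency ratio and ">70% improvement" as a 1.7x throughput ratio (these are interpretations of the article's percentages, not figures it states in this form):

```python
# Convert the article's percentage claims into speedup ratios and the
# implied relative decoding cost per token (cost ~ 1 / throughput).
claims = {
    "domestic chips, decoding efficiency": 3.0,   # "300% of DeepSeek-R1"
    "NVIDIA Hopper, distributed inference": 1.7,  # ">70% improvement"
}

relative_cost = {}
for setting, speedup in claims.items():
    relative_cost[setting] = 1.0 / speedup  # cost per token vs. DeepSeek-R1
    print(f"{setting}: {speedup:.1f}x speed, "
          f"{relative_cost[setting]:.2f}x cost per token")
```

Under these readings, a 3x decoding-efficiency gain implies roughly one third of the baseline's serving cost per token on domestic chips.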
StepFun (阶跃星辰) Releases Its New-Generation Foundation Model Step 3, a Native Multimodal Reasoning Model with Open-Source SOTA Performance
Founder Park· 2025-07-26 04:53
Core Viewpoint
- The article covers the launch of Step 3, StepFun's new-generation foundation model, aimed at advancing intelligent applications and efficiency in the reasoning era, with an emphasis on meeting customer needs and real-world application scenarios [3][6].

Group 1: Step 3 Model Overview
- Step 3 is positioned as the company's primary foundation model for global enterprises and developers, and will be open-sourced on July 31 [3][20].
- The model has 321 billion total parameters, of which 38 billion are active, and shows strong visual perception and complex reasoning capabilities [9].
- Step 3 balances performance and cost, achieving state-of-the-art (SOTA) results among open-source models on multimodal reasoning tasks [9][18].

Group 2: Technological Innovations
- Step 3 uses a Mixture-of-Experts (MoE) architecture, delivering large performance gains while keeping operating costs low [9][18].
- On domestic chips, decoding efficiency reaches up to 300% of previous models'; on NVIDIA Hopper-architecture chips, throughput improves by more than 70% [18][20].

Group 3: Industry Collaboration
- StepFun has launched the "MoXin" (Model-Chip) Ecosystem Innovation Alliance with leading chip and platform manufacturers to drive joint innovation across the model and chip industries [5][22].
- A strategic partnership with Shanghai State-owned Capital Investment Co., Ltd. has been established to strengthen capital linkage and ecosystem-level business cooperation [5][22].

Group 4: Application and Market Focus
- The company is focusing on key application scenarios such as automobiles, smartphones, and IoT devices, with significant collaborations across domestic smartphone manufacturers and the automotive industry [23].
- It aims to build scenario-based applications in vertical industries, working with leading firms in finance, content creation, and retail [23].
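Step 3's 321B-total / 38B-active split is the signature of Mixture-of-Experts inference: a router activates only a few experts per token, so per-token compute tracks the active parameter count rather than the total. A minimal top-k routing sketch in NumPy (the expert count, top-k, and dimensions here are illustrative, not Step 3's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, k = 8, 2   # activate 2 of 8 experts per token (illustrative)
d_model = 16          # hidden size (illustrative)

# Router: a linear layer scoring every expert for each token.
router_w = rng.standard_normal((d_model, n_experts))
# Experts: modeled here as simple per-expert linear maps.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), top-k expert mixture."""
    logits = x @ router_w                       # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                    # softmax over selected experts
        for gate, e in zip(gates, topk[t]):
            out[t] += gate * (x[t] @ experts[e])
    return out

y = moe_layer(rng.standard_normal((4, d_model)))
print(y.shape)  # (4, 16)
# Only k / n_experts of the expert parameters run per token -- the reason a
# 321B-parameter model can decode with only ~38B parameters active.
print(f"active fraction: {k / n_experts:.2f}")
```

The same principle scales up: routing cost is small, so serving cost follows the active fraction, which is how MoE models keep inference cheap relative to dense models of equal total size.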
Latest from Stanford! Analyzing Hallucinations in Large Models: Does Over-Thinking Make the Truth Disappear?
自动驾驶之心· 2025-06-19 10:47
Core Viewpoint
- The paper examines the relationship between reasoning capability and hallucination in multimodal reasoning models, asking whether stronger reasoning comes at the cost of visual perception accuracy [2][3][37].

Group 1: Reasoning Models and Hallucinations
- As their reasoning capabilities improve, multimodal reasoning models tend to amplify hallucinations, increasing the risk of misinterpreting visual data [2][3][5].
- The study introduces a new metric, RH-AUC, to quantify the balance between reasoning length and perception accuracy; longer reasoning chains are associated with more hallucination [4][30].

Group 2: Attention Mechanism and Performance
- Attention analysis shows reasoning models attend markedly less to visual elements, relying on language-based priors rather than visual evidence [5][18].
- Experiments show reasoning models underperform non-reasoning models on perception tasks, with higher hallucination rates regardless of model size [8][37].

Group 3: Training Paradigms and Data Quality
- Two main training paradigms are compared: pure reinforcement learning (RL-only) and supervised fine-tuning followed by reinforcement learning (SFT+RL); RL-only models generally balance reasoning and perception better [10][35].
- Data quality matters more than quantity: models trained on high-quality, domain-specific data better maintain the reasoning-hallucination balance [39][42].

Group 4: Evaluation Metrics and Future Directions
- The paper introduces RH-Bench, a benchmark of 1,000 multimodal tasks for jointly evaluating reasoning and perception [30][32].
- Future directions include exploring broader model architectures and mechanisms for dynamically adjusting reasoning length to improve reliability [44].
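RH-AUC, as described above, compresses the accuracy-vs-reasoning-length trade-off into one number. A simplified sketch, assuming the metric is the length-normalized area under a perception-accuracy curve sampled at increasing reasoning lengths (the paper's exact construction may differ; the two model curves below are hypothetical):

```python
def rh_auc(lengths, accuracies):
    """Area under the accuracy-vs-reasoning-length curve, normalized by the
    length range, computed with the trapezoidal rule."""
    assert len(lengths) == len(accuracies) >= 2
    area = 0.0
    for i in range(1, len(lengths)):
        dx = lengths[i] - lengths[i - 1]
        area += dx * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area / (lengths[-1] - lengths[0])

# Hypothetical models: B holds perception accuracy as chains grow, A degrades.
lengths = [64, 128, 256, 512]       # reasoning tokens (illustrative)
model_a = [0.80, 0.72, 0.60, 0.45]  # accuracy collapses with longer chains
model_b = [0.78, 0.77, 0.75, 0.74]  # accuracy roughly stable

print(f"model A: {rh_auc(lengths, model_a):.3f}")
print(f"model B: {rh_auc(lengths, model_b):.3f}")  # higher -> better balance
```

A model that sustains accuracy across reasoning lengths scores higher than one whose perception collapses as chains grow, which matches the paper's intent of rewarding reasoning gains that do not trade away visual grounding.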