Omnimodal Large Models
HIT Shenzhen team launches Uni-MoE-2.0-Omni: a new SOTA in omnimodal understanding, reasoning, and generation
机器之心 (Jiqizhixin) · 2025-11-25 09:37
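Omnimodal Large Models (OLMs) can understand, generate, process, and relate multiple real-world data types, enabling richer understanding and deeper interaction with a complex world. The evolution of artificial intelligence toward omnimodal large models marks a key point in its shift from "specialist" to "generalist" and from "tool" to "partner".

However, how to combine strong multimodal understanding with high-quality generation in a single model, how to build an efficient and unified model architecture, and how to design sound training methods and data-mixing schemes remain shared challenges for academia and industry.

Recently, the Lychee large model team at the Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, officially released Uni-MoE-2.0-Omni, the second-generation "立知" (Lychee) omnimodal large model. It builds on the "立知" large language model developed in 2023 (dual-certified by the Ministry of Industry and Information Technology and the Cyberspace Administration of China) and on the original Uni-MoE omnimodal large model architecture proposed in May 2024.

The model takes a large language model as its core. Through progressive architecture evolution and training-strategy optimization, it extends a dense large language model into an efficient omnimodal large model driven by a mixture-of-experts (MoE) architecture, moving from "language understanding" to "multimodal understanding" and then to "both understanding and generation".

The team focuses on language-centered artificial general intelligence; by introducing omnimodal 3D RoPE positional encoding, designing a dynamic-capacity MoE architecture, and ...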
Guotai Haitong: MiniMax releases omnimodal AI "全家桶" suite; M2 tops global open-source models
智通财经网 · 2025-11-11 11:58
Core Viewpoint
- MiniMax, a Shanghai-based AI unicorn, has launched a comprehensive multimodal model suite dubbed the "全家桶" (literally "family bucket", i.e. an all-in-one bundle), marking a significant breakthrough for Chinese AI companies in multimodal technology and opening new avenues for commercialization [1][2].

Group 1: Investment Insights
- MiniMax's "全家桶" covers text, vision, speech, and music, and its text model M2 ranks among the top globally in authoritative evaluations [2].
- M2 balances performance, speed, and cost, setting a new benchmark for model efficiency and cost control [3].

Group 2: Model Performance
- M2's inference cost is as low as $0.53 per million tokens, only 8% of Claude 4.5 Sonnet's cost, while its inference speed is nearly double that of the latter [3].
- Within five days of release, M2's API call volume surged to fourth place globally and first among domestic models, reflecting its balance of high performance and low cost [3].

Group 3: Product Matrix and Technical Layout
- The suite includes Hailuo 2.3 for video generation, which supports native 1080p videos of up to 10 seconds, and Speech 2.6, optimized for voice agent scenarios with response time reduced to 250 milliseconds [4].
- Music 2.0 can generate complete songs of up to 5 minutes, and the company uses a full attention mechanism in pursuit of high-quality, stable generation [4].
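The cost comparison above can be turned into a quick back-of-the-envelope calculation. The sketch below takes the two figures quoted in the summary ($0.53 per million tokens, and "8%" of the comparison model's cost) at face value; the function names and the 250-million-token workload are hypothetical, and the summary does not say whether the price is for input tokens, output tokens, or a blend, so the result is only a rough illustration.

```python
# Figures quoted in the summary above.
M2_PRICE_PER_MTOK = 0.53       # USD per million tokens
M2_SHARE_OF_REFERENCE = 0.08   # M2 cost as a fraction of the comparison model's cost


def implied_reference_price(price_per_mtok: float, share: float) -> float:
    """Price per million tokens of the comparison model implied by the quoted share."""
    return price_per_mtok / share


def workload_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    reference_price = implied_reference_price(M2_PRICE_PER_MTOK, M2_SHARE_OF_REFERENCE)
    tokens = 250_000_000  # hypothetical monthly workload, not from the article
    print(f"implied comparison price: ${reference_price:.2f} per million tokens")
    print(f"M2 cost for workload:     ${workload_cost(tokens, M2_PRICE_PER_MTOK):.2f}")
    print(f"comparison cost:          ${workload_cost(tokens, reference_price):.2f}")
```

Under these assumptions the implied comparison price is roughly $6.6 per million tokens, so the gap scales linearly with token volume.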
NVIDIA's new architecture ignites an omnimodal large model revolution; open-sourced 9B model passes 10,000 downloads
36Kr · 2025-11-07 10:48
Core Insights
- OmniVinci, NVIDIA's latest multimodal model, boasts 9 billion parameters and significantly outperforms competitors in video and audio understanding, showcasing a training data efficiency six times greater than rivals [1][5][7].

Group 1: Model Performance
- OmniVinci demonstrates superior performance across multiple benchmarks in multimodal understanding, audio comprehension, and video analysis, establishing itself as a leading model in the field [3][5][9].
- The model's architecture includes innovations such as OmniAlignNet, which enhances the precision of temporal alignment between visual and auditory signals [9][11].

Group 2: Competitive Landscape
- The release of OmniVinci marks NVIDIA's strategic entry into the open-source model arena, positioning itself alongside Chinese models like DeepSeek and Qwen, which have rapidly gained traction in the AI community [1][18][22].
- The competitive dynamics are shifting, with NVIDIA leveraging its hardware dominance to influence model development and ecosystem growth, rather than merely supporting it [7][18].

Group 3: Applications and Use Cases
- OmniVinci's capabilities extend to various applications, including video content understanding, speech transcription, and robotic navigation, indicating a broad potential for real-world implementation [1][11][14].
- The model's ability to integrate audio and visual data enhances its performance in understanding complex scenarios, leading to significant advancements in multimodal learning [8][9].

Group 4: Community Impact
- The open-source release of OmniVinci has generated substantial interest, with over 10,000 downloads on platforms like Hugging Face, indicating a strong community response and engagement [19][22].
- NVIDIA's commitment to open-source models is seen as a strategic move to foster a collaborative ecosystem, ultimately benefiting its hardware sales as more developers utilize its GPUs [18][22].
Alibaba's Tongyi Qianwen releases Qwen3-Omni, a native omnimodal large model
Zhi Tong Cai Jing · 2025-09-26 06:18
Core Insights
- Alibaba's subsidiary Tongyi Qianwen has officially launched Qwen3-Omni, a fully multimodal large model capable of seamlessly processing various input forms including text, images, audio, and video while generating text and natural speech output in real-time [1]

Model Architecture
- Qwen3-Omni utilizes the Thinker-Talker architecture, where the Thinker is responsible for text generation and the Talker focuses on streaming speech token generation, directly receiving high-level semantic representations from the Thinker [1]
- To achieve ultra-low latency streaming generation, the Talker predicts multi-codebook sequences in an autoregressive manner, with the MTP module outputting the residual codebook for the current frame during each decoding step [1]
- The Code2Wav component synthesizes the corresponding waveform, enabling frame-by-frame streaming generation [1]
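The data flow described above (Thinker to Talker to MTP module to Code2Wav, one frame at a time) can be illustrated with a toy pipeline. This is only a sketch of the streaming structure, not Qwen3-Omni's implementation: every function body is a deterministic stand-in for a neural network, the constants (codebook count, codebook size, frame length) are hypothetical, and emitting exactly one audio frame per text token is a simplification made for readability.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# All three constants are hypothetical; the summary gives no such figures.
NUM_CODEBOOKS = 4
CODEBOOK_SIZE = 256
SAMPLES_PER_FRAME = 8


@dataclass
class CodecFrame:
    codes: List[int]  # one code per codebook for a single audio frame


def thinker(prompt: str) -> Iterator[Tuple[str, int]]:
    """Stand-in Thinker: yields (text token, toy 'semantic representation')."""
    for token in prompt.split():
        yield token, sum(ord(ch) for ch in token) % CODEBOOK_SIZE


def talker_step(semantic: int, previous_code: int) -> int:
    """Stand-in autoregressive step: the new first-codebook code depends on the previous one."""
    return (semantic + previous_code) % CODEBOOK_SIZE


def mtp_residual(first_code: int) -> List[int]:
    """Stand-in MTP module: fills in the remaining (residual) codebooks of the current frame."""
    return [(first_code * (k + 2)) % CODEBOOK_SIZE for k in range(NUM_CODEBOOKS - 1)]


def code2wav(frame: CodecFrame) -> List[float]:
    """Stand-in Code2Wav: maps one frame of codes to a short block of samples."""
    level = sum(frame.codes) / (CODEBOOK_SIZE * NUM_CODEBOOKS)
    return [level] * SAMPLES_PER_FRAME


def stream_speech(prompt: str) -> Iterator[Tuple[str, List[float]]]:
    """Yield (text token, audio samples) frame by frame, without waiting for the full reply."""
    previous_code = 0
    for token, semantic in thinker(prompt):
        first_code = talker_step(semantic, previous_code)
        frame = CodecFrame([first_code] + mtp_residual(first_code))
        yield token, code2wav(frame)
        previous_code = first_code


if __name__ == "__main__":
    for token, samples in stream_speech("hello from the thinker"):
        print(token, len(samples))
```

Because `stream_speech` is a generator, a caller can start playing the first block of samples while later frames are still being produced, which is the property the summary attributes to frame-by-frame streaming.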
Alibaba (09988)'s Tongyi Qianwen releases Qwen3-Omni, a native omnimodal large model
智通财经网 · 2025-09-26 06:12
Core Insights
- Alibaba's subsidiary Tongyi Qianwen has officially launched Qwen3-Omni, a native multimodal large model capable of seamlessly processing various input forms including text, images, audio, and video while generating text and natural speech output in real-time [1]

Group 1: Model Features
- Qwen3-Omni is designed as a fully multimodal model that maintains intelligence across different modalities without degradation [1]
- The model architecture utilizes the Thinker-Talker framework, where Thinker is responsible for text generation and Talker focuses on streaming voice token generation [1]
- To achieve ultra-low latency in streaming generation, Talker predicts multiple codebook sequences in an autoregressive manner, outputting the residual codebook for the current frame [1]

Group 2: Technical Implementation
- The Code2Wav module synthesizes the corresponding waveform for each frame, enabling frame-by-frame streaming generation [1]
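36Kr Evening News | AliExpress Mexico "overseas custody" service officially launches; ByteDance Seed open-sources the VeOmni framework; Tencent Yuanbao integrates JD
36Kr · 2025-08-14 11:03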
Group 1: Company Financial Performance
- Zeekr Group reported total revenue of 27.431 billion yuan in Q2, down 0.9% year-on-year but up 24.6% quarter-on-quarter. The net loss for Q2 was 287 million yuan, narrowing by 88.8% year-on-year and 62.4% quarter-on-quarter [1]
- Weibo's Q2 revenue reached 444.8 million USD (approximately 3.2 billion yuan), with an adjusted operating profit of 161.8 million USD (approximately 1.158 billion yuan), exceeding Wall Street expectations. Advertising revenue for Q2 was 383.4 million USD (approximately 2.756 billion yuan) [3]
- Vipshop reported Q2 net revenue of 25.8 billion yuan, with a GMV of 51.4 billion yuan, reflecting year-on-year growth of 1.7% [7][8]
- JD Group's Q2 revenue was 356.7 billion yuan, up 22.4% year-on-year. The net profit attributable to ordinary shareholders was 16.2 billion yuan, with retail revenue growth of 20.6% [9]

Group 2: Strategic Partnerships and New Initiatives
- RayNeo (雷鸟创新) and Ant Group announced a strategic partnership to create digital payment solutions for the global market, launching the RayNeo X3 Pro AI glasses, which enable payment through visual recognition [1]
- AliExpress launched an "overseas custody" service in Mexico, allowing local merchants to stock products and gain promotional benefits [2]
- Tencent's Yuanbao introduced a new feature allowing users to directly purchase physical books from JD, enhancing its e-commerce capabilities [5]
- NetEase Cloud Music reported net revenue of 3.82 billion yuan in the first half of the year, with a significant increase in long audio content consumption [6]

Group 3: Market Developments
- China Evergrande announced a hearing on September 16 regarding its liquidation process, with its stock continuing to be suspended [4]
- Jingwei Hengrun plans to mass-produce its urban NOA solution based on NVIDIA's Orin-X chip by the end of this year, indicating advancements in autonomous driving technology [10]
- Yuxin Technology won a bid for an overseas digital currency project, emphasizing its focus on cross-border payment as part of its international strategy [11]
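Transfer payments exceed 10 trillion yuan for the third consecutive year; Alibaba open-sources omnimodal large model | 财经日日评 (Daily Finance Review)
吴晓波频道 (Wu Xiaobo Channel) · 2025-03-27 16:49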
Group 1: Industrial Enterprises Profit
- In January-February, the total profit of industrial enterprises above designated size decreased by 0.3% year-on-year to 910.99 billion yuan, with a profit margin of 4.53%, down 0.14 percentage points [1][2]
- The manufacturing sector improved significantly, with profit growth of 4.8%, contributing 3.2 percentage points to the profit growth of all industrial enterprises [1]
- Revenue for industrial enterprises reached 20.09 trillion yuan, a year-on-year increase of 2.8%, while operating costs grew by 2.9% to 17.1 trillion yuan [1]

Group 2: Central Government Transfers
- The central government's transfer payments to local governments exceeded 10 trillion yuan for the third consecutive year, totaling 10,341.5 billion yuan in 2025, an increase of 3% from 2024 [3][4]
- The highest transfer payment was allocated to Sichuan province at approximately 598.3 billion yuan, followed by Henan at about 527.2 billion yuan [3]
- Increasing transfer payments are seen as a short-term solution to alleviate local government financial pressures, but they may also strain central fiscal sustainability in the long run [4]

Group 3: Microsoft Data Center Projects
- Microsoft has abandoned new data center projects in the US and Europe, originally planned to consume 2 gigawatts of power, due to an oversupply of computing clusters supporting AI operations [5][6]
- This decision reflects a strategic adjustment in infrastructure investment, allowing Microsoft to focus on future growth areas [5]
- The AI industry is facing challenges with high expectations but low willingness to pay from users, leading to a reduction in capital expenditure growth [6]

Group 4: Alibaba's Multimodal Model
- Alibaba has launched and open-sourced its first end-to-end multimodal large model, Qwen2.5-Omni-7B, capable of processing various inputs like text, images, audio, and video [7][8]
- The model has achieved record performance in multimodal tasks, significantly outperforming similar models from competitors [7]
- With only 7 billion parameters, the model has low hardware requirements, making it deployable on smartphones, potentially boosting market confidence in AI functionalities [8]

Group 5: Haidilao's Financial Performance
- Haidilao reported revenue of 42.755 billion yuan for the year ending December 31, 2024, a 3.1% increase, with net profit rising by 4.6% to 4.708 billion yuan [13][14]
- The company has seen consistent revenue and profit growth over the past two years, although the growth rate has slowed [13]
- Haidilao is focusing on closing underperforming restaurants and adapting to market changes, as consumer preferences shift towards value [14]

Group 6: Stock Market Performance
- On March 27, the stock market experienced a slight rebound, with the Shanghai Composite Index rising by 0.15% and total trading volume reaching 1.19 trillion yuan [15][16]
- Chemical stocks saw significant gains due to rising prices, while other sectors like deep-sea technology and nuclear fusion faced declines [15][17]
- The market remains in a state of fluctuation, with investors showing interest in buying on dips, indicating underlying support for the market [16][17]
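The industrial profit figures in Group 1 of this summary can be cross-checked against each other with simple arithmetic. The sketch below assumes the reported profit margin is total profit divided by operating revenue, which is a common definition but is not spelled out in the summary; the variable and function names are illustrative.

```python
# Figures from Group 1 of the summary, converted to billions of yuan.
TOTAL_PROFIT_BN = 910.99       # total profit, January-February
REVENUE_BN = 20_090.0          # 20.09 trillion yuan in revenue
OPERATING_COST_BN = 17_100.0   # 17.1 trillion yuan in operating costs


def share_pct(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return part / whole * 100


profit_margin = share_pct(TOTAL_PROFIT_BN, REVENUE_BN)
cost_per_100_revenue = share_pct(OPERATING_COST_BN, REVENUE_BN)

if __name__ == "__main__":
    print(f"profit margin: {profit_margin:.2f}%")                         # 4.53%
    print(f"operating cost per 100 yuan of revenue: {cost_per_100_revenue:.2f}")  # 85.12
```

Under that assumption the computed margin matches the 4.53% quoted in the summary, so the profit and revenue figures are consistent with each other.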