Multimodal Reasoning
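As good as o3 at guessing locations from photos! ByteDance releases the Seed1.5-VL multimodal reasoning model, taking first place in 38 of 60 mainstream benchmarks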
QbitAI (量子位)· 2025-05-14 06:07
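Yishui, reporting from Aofeisi. QbitAI | WeChat official account QbitAI. First place in 38 of 60 mainstream benchmarks! ByteDance has released Seed1.5-VL, a lightweight multimodal reasoning model that, with only a 532M vision encoder and 20 billion active parameters, can go head to head with a range of larger top-tier models, and it can think deeply with images as well. The accompanying technical report was made public at the same time. Overall, although this is a case of a small model taking on bigger ones, the new model performs well in complex puzzle reasoning, OCR, chart understanding, 3D spatial understanding, and other areas. For example, it can count the cats in the picture below, where the human eye easily mistakes the black cat on the ground for a shadow: It can also solve complex reasoning puzzles, which is good news for civil-service exam candidates (just kidding). It can play "spot the difference" too, beating humans on both speed and accuracy: All of this, of course, also rests on its strong OCR capability. Even a very long shopping receipt that mixes Chinese and English can be converted into a table in no time. So how does it do it? 532M vision encoder + 20B mixture-of-experts language model. A close reading of the technical report shows that the key lies mainly in the model architecture and the training details. As described there, Seed1.5-VL consists of three core components: SeedViT, which encodes images and video; an MLP adapter, which projects visual features into multimodal tokens; and a large language model, which processes the multimodal input and performs reasoning. The model supports image inputs at multiple resolutions, and through ...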
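The three-component pipeline described in the excerpt (vision encoder, MLP adapter, language model) can be illustrated with a toy data-flow sketch. This is not ByteDance's implementation: the dimensions, the random weights, the two-layer shape of the adapter, and the choice to place visual tokens before the text tokens are all assumptions made purely for illustration, and the function names are hypothetical.

```python
import random

random.seed(0)

# All sizes below are hypothetical toy values, not Seed1.5-VL's real dimensions.
PATCH_DIM = 12   # length of one flattened image patch
VISION_DIM = 8   # width of the vision encoder's output features
LLM_DIM = 6      # width of the language model's token embeddings


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def relu(x):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in x]


W_VIT = rand_matrix(PATCH_DIM, VISION_DIM)
W_MLP1 = rand_matrix(VISION_DIM, VISION_DIM)
W_MLP2 = rand_matrix(VISION_DIM, LLM_DIM)


def vision_encoder(patches):
    """Stand-in for SeedViT: maps each patch to a visual feature vector."""
    return matmul(patches, W_VIT)


def mlp_adapter(features):
    """Assumed two-layer MLP that projects visual features to the LLM width."""
    return matmul(relu(matmul(features, W_MLP1)), W_MLP2)


def build_llm_input(visual_tokens, text_embeddings):
    """Assumed ordering: visual tokens first, then text tokens."""
    return visual_tokens + text_embeddings


patches = rand_matrix(4, PATCH_DIM)       # 4 image patches
text_embeddings = rand_matrix(3, LLM_DIM)  # 3 text tokens
sequence = build_llm_input(mlp_adapter(vision_encoder(patches)), text_embeddings)
print(len(sequence), len(sequence[0]))    # 7 tokens, each of width LLM_DIM
```

The point of the sketch is only the shape contract: whatever the encoder produces, the adapter must emit vectors of the language model's embedding width so that visual and text tokens can share one input sequence.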
Kunlun Wanwei: Q1 revenue up sharply by 46%; breakthrough progress in AI computing chips
Core Viewpoint
- Kunlun Wanwei (300418.SZ) reported revenue growth of 46% year-on-year in Q1 2025, driven by advancements in AI computing chips and applications [1]
Group 1: Financial Performance
- The company achieved operating revenue of 1.76 billion yuan in Q1 2025, a 46% increase over the same period of the previous year [1]
- R&D expenses reached 430 million yuan, a 23% year-on-year increase [1]
- The annual recurring revenue (ARR) for AI music reached approximately 12 million USD, with monthly revenue of about 1 million USD [1]
- The ARR for the short drama platform Dramawave was approximately 120 million USD, with monthly revenue of around 10 million USD [1]
- Overseas business revenue amounted to 1.67 billion yuan, a 56% increase year-on-year, and accounted for 94% of total revenue [1]
Group 2: Technological Advancements
- The company launched several disruptive technologies in multi-modal reasoning, video generation, and audio generation, achieving state-of-the-art (SOTA) status in various models [2]
- The Skywork R1V multi-modal reasoning model reached open-source SOTA, while the SkyReels-V1 model and SkyReels-A1 algorithm led the global video generation field [2]
- In the AI music sector, the Mureka V6 and Mureka O1 models demonstrated a competitive edge, with Mureka O1 surpassing competitors in performance [2]
Group 3: AI Chip Development
- The company made significant progress in the R&D of AI computing chips, moving towards the goal of "Chinese chips, Kunlun manufacturing" [3]
- Kunlun Wanwei acquired a controlling stake in Beijing Aijietek Technology Co., Ltd., completing a full industry chain layout from computing infrastructure to AI applications [3]
- The R&D team for AI chips has expanded to nearly 200 employees, covering various fields such as chip design and algorithm development [3]
Group 4: Future Prospects
- The company plans to launch the Skywork.ai platform in mid-May 2025, which will feature a system of five expert-level AI agents for optimizing various professional tasks [3]
- The Opera business segment, including overseas information distribution and metaverse operations, saw a revenue increase of 41% driven by Opera Ads [4]
- The company aims to continue advancing AI computing chip development and innovate its AI application matrix to provide leading AI product experiences globally [4]
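The ARR and monthly revenue figures quoted in the summary can be checked against each other with a quick calculation. The sketch below assumes the common convention that ARR is monthly recurring revenue multiplied by 12. The summary does not state which definition the company uses, and the figures themselves are given as approximate, so this is a consistency check and not a reconstruction of the company's accounting.

```python
def annualized(monthly_revenue_usd: float) -> float:
    """ARR under the assumed convention: monthly recurring revenue x 12."""
    return monthly_revenue_usd * 12


# Approximate figures as quoted in the summary (USD).
reported = {
    "AI music": {"monthly": 1_000_000, "arr": 12_000_000},
    "Dramawave": {"monthly": 10_000_000, "arr": 120_000_000},
}

for name, figures in reported.items():
    implied = annualized(figures["monthly"])
    # True when the quoted ARR equals 12x the quoted monthly revenue.
    print(name, implied, implied == figures["arr"])
```

Under that assumption, both quoted ARR figures are exactly twelve times the quoted monthly revenue.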
AI Developments Tracking Series (6): OpenAI o3 and new Doubao products debut; focus on native agents and multimodal reasoning
Ping An Securities· 2025-04-17 13:10
Investment Rating
- The industry investment rating is "Outperform the Market" [1][38].
Core Insights
- OpenAI's latest models, o3 and o4-mini, introduce significant advancements in image reasoning and agent capabilities, enhancing the AI programming ecosystem [3][4].
- Competition in the global large model field remains intense, with a strong emphasis on native agent capabilities and multimodal reasoning [34].
- Chinese AI computing solutions are expected to gain acceptance and market share in the domestic AI computing power market as global trade tensions continue [34].
Summary by Sections
OpenAI's New Models
- OpenAI released o3 and o4-mini, which it describes as its most intelligent models to date, featuring breakthroughs in image reasoning and agent capabilities [3][4].
- The o3 model has set new state-of-the-art benchmarks in coding, mathematics, and visual perception tasks, making 20% fewer errors than its predecessor o1 on complex tasks [5][7].
- The o4-mini model is optimized for fast and cost-effective reasoning, excelling in non-STEM tasks and data science [5].
Doubao 1.5 Model
- Doubao 1.5 has reached or is close to the top tier globally in reasoning tasks across mathematics, coding, and science, with enhanced visual understanding capabilities [17][21].
- The Doubao APP, based on the Doubao 1.5 model, can perform "thinking while searching," providing detailed recommendations based on user needs [24][27].
- Doubao's daily token usage has surged to over 12.7 trillion, indicating significant growth and market penetration [18].
Investment Recommendations
- The report suggests focusing on AI applications in enterprise services, programming, and office automation, as well as on domestic AI computing power companies [34].
- Recommended AI application stocks include Fanwei Network and Kingdee International; recommended AI computing power stocks include Haiguang Information and Inspur Information [34].
StepFun releases multimodal reasoning model Step-R1-V-Mini
news flash· 2025-04-08 12:30
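STAR Market Daily (《科创板日报》), 8th: StepFun (阶跃星辰) has officially released the multimodal reasoning model Step-R1-V-Mini. It accepts image and text input and produces text output, has good instruction-following and general capabilities, and can perceive images with high precision and complete complex reasoning tasks. ...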