Multimodal

Fine-grained visual reasoning chains come to mathematics, accuracy jumps 32%: CUHK MMLab breaks the multimodal math-reasoning bottleneck
量子位· 2025-06-16 10:30
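Contributed by the MINT-CoT team. 量子位 | WeChat official account QbitAI. Chain-of-thought (CoT) reasoning has been shown to significantly improve the performance of large language models (LLMs) on complex tasks, and it shows great potential in multimodal large language models (MLLMs) as well. However, when visual information meets mathematical reasoning, conventional CoT methods fall short: mathematical details in the visual input are often overlooked, which leads to inaccurate reasoning. Recently, the MMLab team at the Chinese University of Hong Kong released MINT-CoT, a new visual reasoning approach designed specifically for the difficulties of multimodal mathematical reasoning. Why is visual mathematical reasoning so hard? Although some work has tried to bring visual information into CoT reasoning, for example Visual-CoT, Visual SKETCHPAD, VPT, and ICoT, three major bottlenecks remain in mathematical settings: 1. Coarse-grained image region selection. Most methods rely on bounding boxes to crop image regions, but the elements of a mathematical figure (such as axes, geometric shapes, and annotation text) are highly ... 3. Over-reliance on external capabilities. Methods such as MVoT or Visual SKETCHPAD need external tools or capabilities to generate or modify images, which makes training and inference costly and hard to generalize.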
A breakthrough in industrial anomaly detection: multimodal fusion detection from Fudan and others accepted to CVPR 2025
量子位· 2025-06-16 06:59
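Multimodal fusion: what is new in Real-IAD D³. Contributed by the Real-IAD D³ team. 量子位 | WeChat official account QbitAI. Multimodal fusion detection marks a new breakthrough in industrial anomaly detection! Fudan University, 荣旗工业科技, Tencent YouTu Lab, Shanghai Jiao Tong University, Shanghai Ocean University, and other institutions have jointly released the high-precision multimodal dataset Real-IAD D³ and, building on it, proposed a novel multimodal fusion detection method. The work has been accepted to CVPR 2025, a top computer vision conference. In industrial production, anomaly detection is a key step in ensuring product quality and safety. Yet existing anomaly detection methods often fall short in complex industrial environments because of the limitations of the available datasets. To break through this bottleneck, the researchers built the Real-IAD D³ dataset, which covers not only high-resolution RGB images but also pseudo-3D photometric stereo images and 3D point clouds with micrometer-level precision, giving anomaly detection richer information. Real-IAD D³ was inspired by real industrial quality-inspection scenarios. In actual production, inspectors must quickly and accurately identify surface defects such as scratches, dents, and cracks. These defects are not only varied but also look different under different lighting and material backgrounds. Faced with such complex defects, traditional 2D image-based detection methods often ...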
139 points on the gaokao math paper! Xiaomi's 7B model rivals Qwen3-235B and OpenAI o3
机器之心· 2025-06-16 05:16
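Reported by 机器之心. 机器之心 editorial team. The 2025 gaokao wrapped up the week before last, and across the AI field large models took on the math paper. In 机器之心's test, seven large models scored as follows on the 2025 New Curriculum Standard Paper I in mathematics: Gemini 2.5 Pro scored 145, ranking first; Doubao and DeepSeek R1 followed closely at 144, tied for second; o3 and Qwen3 were just one point apart, ranking third and fourth respectively. Dragged down by the free-response questions, hunyuan-t1-latest and ERNIE (文心) X1 Turbo finished in the last two places. Other models took on this year's math paper too, among them Xiaomi MiMo-VL, a small model with only 7B parameters. It sat the same paper and scored 139 in total, the same as Qwen3-235B and just one point below OpenAI o3. It also scored a full 56 points higher than Qwen2.5-VL-7B, a multimodal large model with the same 7B parameter count. MiMo-VL-7B and Qwen2.5-VL-7B were evaluated as multimodal models on uploaded screenshots of the questions; the rest were given text input lat ...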
Securities research report, industry weekly: the 2025 summer film season approaches, ByteDance releases Doubao large model 1.6-20250615
GOLDEN SUN SECURITIES· 2025-06-15 07:53
Investment Rating - The report maintains an "Increase" rating for the media industry, indicating a positive outlook for the sector [6]. Core Insights - The media sector has shown a 1.38% increase during the week of June 9-13, driven by themes such as new consumption [10][18]. - Key areas of growth for 2025 include AI applications, IP monetization, and mergers and acquisitions, with a focus on multi-modal industry directions and companies with IP advantages [1][18]. - The report highlights the upcoming summer film season in 2025, with over 60 films scheduled for release, including a diverse range of genres [2][20]. - ByteDance's release of the Doubao model 1.6, a leading multi-modal model, marks a significant advancement in AI capabilities within the industry [3][20]. Summary by Sections Market Overview - The media sector's performance is buoyed by new consumption trends, with a notable increase in stock prices for companies like Yuanlong Yatu and Chuanwang Media [10][13]. - The report identifies the top-performing stocks in the media sector, with Yuanlong Yatu leading at a 42.9% increase [13][16]. Sub-sector Insights - **Resource Integration**: Companies such as China Vision Media and Guangxi Broadcasting are highlighted for their potential in resource consolidation [18]. - **AI Focus**: Companies like Rongxin Culture and Aofei Entertainment are noted for their advancements in AI applications [18]. - **Gaming Sector**: Strong recommendations are made for companies with solid performance, including Shenzhou Taiyue and Giant Network [18]. - **State-owned Enterprises**: Companies like Ciweng Media and Anhui New Media are emphasized for their growth potential [18]. - **Education Sector**: Xueda Education is mentioned as a key player in the education sub-sector [18]. Key Events Recap - The report discusses the launch of the "China Film Consumption Year" initiative, aimed at boosting audience engagement during the summer film season [20]. - The performance of the domestic film market is highlighted, with significant box office figures reported for recent releases [22][24]. Data Tracking - The report provides insights into the gaming sector, noting popular upcoming games and their expected impact on the market [21]. - It also tracks viewership data for television series and variety shows, indicating audience preferences and trends [25][26].
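[Big news] Tesla's humanoid robot on show! The Hangzhou Grand Convention and Exhibition Center invites you to the premier event of the humanoid robot industry!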
机器人大讲堂· 2025-06-15 04:41
Core Viewpoint - The article highlights the debut of Tesla Bot at the 2025 Hangzhou International Humanoid Robot and Robotics Technology Expo, showcasing advancements in humanoid robotics and the participation of over 200 leading companies in the industry [1][3][5]. Group 1: Event Overview - The expo will take place from June 20 to June 22, 2025, at the Hangzhou Grand Convention and Exhibition Center, featuring a combination of forums, exhibitions, and interactive experiences [1]. - The event is organized by the Zhejiang Robot Industry Development Association and aims to present cutting-edge humanoid robot technologies and future living scenarios [1]. Group 2: Key Exhibitors and Technologies - Notable exhibitors include Alibaba Cloud, Hangzhou Six Little Dragons, and various other leading companies, showcasing technologies such as embodied intelligence, multimodal interaction, and brain-computer interfaces [5]. - The expo will cover the entire industry chain, including complete robots, key components, and application scenarios [5]. Group 3: Forums and Networking Opportunities - The event will host several forums, including the Hangzhou Humanoid Robot Conference focusing on industry trends and policy analysis, and a connection conference aimed at fostering business cooperation and technology commercialization [9][10]. - A dedicated forum for investment and technology innovation in the humanoid robotics sector will also take place, providing opportunities to explore new investment avenues [10]. Group 4: Interactive Experiences - The expo will feature interactive activities, including a talent show and educational events aimed at engaging families and promoting technology awareness [11][13]. - Attendees will have the chance to win limited gifts through participation in interactive sessions [11].
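Chinese scholars published 4 Cell papers this week: enforced mitophagy; a multimodal genetic screening platform; anti-aging mesenchymal progenitor cell therapy; the switch for complement protein attack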
生物世界· 2025-06-15 01:12
Core Insights - This week, four research papers authored by Chinese scholars were published in the prestigious journal Cell, covering topics such as mitochondrial influence on pluripotency, a multimodal genetic screening platform, anti-aging mesenchymal progenitor cell therapy, and a key switch in complement protein attack [1][2][3][4]. Group 1: Mitochondrial Influence on Pluripotency - A study led by Professor Wu Jun from the University of Texas Southwestern Medical Center developed a new technique for enforced mitophagy, revealing the impact of mitochondria on cell pluripotency and demonstrating that reduced mitochondrial numbers delay pre-implantation mouse embryo development [3]. Group 2: Perturb-Multi Genetic Screening Platform - Professor Zhuang Xiaowei from Harvard University introduced Perturb-Multi, a novel platform that combines imaging and sequencing technologies to enable parallel perturbation of hundreds of genes in intact mammalian tissues, facilitating the discovery of genetic bases for complex cellular and tissue physiology [7]. Group 3: Anti-Aging Mesenchymal Progenitor Cell Therapy - Researchers Liu Guanghui, Wang Si, and Qu Jing from the Chinese Academy of Sciences and Capital Medical University developed engineered human anti-aging mesenchymal progenitor cells (SRC) that exhibit resistance to aging, stress, and malignant transformation, significantly delaying multi-organ aging in primate models [11]. Group 4: Key Switch in Complement Protein Attack - A study by Zhicheng Wang from the University of Pennsylvania focused on the complement system, identifying a critical parameter—the surface density of potential complement attachment sites—that triggers a significant increase in complement activation, providing insights for the design of long-lasting drug carriers and biocompatible implants [15][17].
Investment and financing trends among China's AIGC companies: early-stage projects are in hot demand with investors
Sou Hu Cai Jing· 2025-06-14 09:35
Core Insights - The AIGC industry in China is experiencing a significant early-stage investment trend, with total financing reaching billions of RMB in the first months of 2025, marking a 60% year-on-year increase [1] - Angel round financing events account for the highest proportion at 60%, indicating a preference for early-stage investments [3] Group 1: Current Situation - Early-stage projects have become the core area for capital allocation, with 60% of financing events occurring in the angel round, significantly higher than A rounds and strategic investments [3] - Startups established in 2025 account for 60% of the AIGC companies, with notable examples like "月之暗面" and "生数科技" completing significant financing within a year of establishment [4] Group 2: Driving Factors Behind Capital Preferences - Accelerated technological iteration is driving capital to focus on application-layer tools, allowing for quick validation of business models [6] - Policy support and market demand are also pushing the AIGC market, which is expected to exceed trillions by 2025, despite being only billions in 2025 [7] Group 3: Industry Participation - Major industry players like Tencent and Baidu are deeply involved in the ecosystem through strategic investments, with Tencent investing billions in 2025 [9] Group 4: Challenges and Pressures - Investors are increasingly demanding early-stage projects to demonstrate monetization pathways, with examples like "妙鸭相机" showcasing rapid customer acquisition through low-cost services [11] - There are signs of industry bubbles, with global AIGC financing exceeding hundreds of billions, but domestic projects facing challenges due to high levels of homogeneity [12] Group 5: Future Trends - Investment focus is shifting towards the middle layer of the industry, such as AI training tools and data annotation platforms, which are expected to enable scalable applications [15] - Global expansion is accelerating, with leading companies like "月之暗面" initiating overseas user growth plans, attracting capital interest in cross-language models and localization capabilities [15]
“Multimodal approaches cannot achieve AGI”
AI前线· 2025-06-14 04:06
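Author | Benjamin Translator | 王强 Planning | 褚杏娟 "When we project language back as the model for thought, we overlook the tacit embodied cognition that underpins our intelligence." The recent success of generative AI models has convinced some people that artificial general intelligence (AGI) is imminent. While these models appear to capture the essence of human intelligence, they defy even our most basic intuitions about it. They emerged not because they are well-considered solutions to the problem of intelligence, but because they scaled effectively on the hardware we already had. Some, steeped in the fruits of scaling, have come to believe that it provides a path to AGI ... First, although Othello moves can provably be used to infer the full state of an Othello board, we have no reason to believe there is a way to infer a complete picture of the physical world from linguistic descriptions. What sets the game of Othello apart from many tasks in the physical world is that Othello lives essentially in the symbolic domain and is merely implemented with physical tokens for the convenience of human players. A complete game of Othello can be played with pen and paper, but one cannot sweep the floor, wash the dishes, or drive a car with pen and paper. To solve those tasks, you need to go beyond the conception of the physical world that humans describe in language. Whether that conception of the world is encoded in a formal world model or, say, in a value function is open to debate, but it is clear that many problems in the physical world cannot be fully represented by a symbolic system and solved by pure symbol manipulation.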
Cell: Zhuang Xiaowei's team achieves the first multimodal genetic screen combining imaging and sequencing in intact mammalian tissue
生物世界· 2025-06-14 01:47
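Written by 王聪 | Edited by 王多鱼 | Layout by 水成文. Life in multicellular organisms depends on thousands of genes working in concert across spatially organized cell types. Understanding the basis of tissue function requires dissecting the genetic control of diverse cell and tissue phenotypes in vivo. This, however, has long been a major challenge: conventional methods either measure only gene expression (single-cell strategies) or observe only cell morphology (microscopy imaging), and have never been able to capture several dimensions of information at once. On June 12, 2025, Professor Zhuang Xiaowei's team at Harvard University published a paper in the leading journal Cell titled: Perturb-Multimodal: A platform for pooled genetic screens with imaging and sequencing in intact mammalian tissue. The study developed a new technique called Perturb-Multi that combines imaging with sequencing. For the first time, it perturbs hundreds of genes in parallel in intact mammalian tissue while simultaneously resolving three dimensions: gene expression profiles, subcellular morphology, and spatial location. Imaging identifies the perturbation in each individual cell while measuring that cell's gene expression profile and subcellular morphology. Single-cell sequencing measures the full transcriptomic response to the same perturbations. The team applied ...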
New models, lower prices: Volcano Engine pushes hard to get AI applications deployed
21 Shi Ji Jing Ji Bao Dao· 2025-06-14 00:55
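Who will be the core driver of large-scale Agent deployment? Volcano Engine's answer: we will. Tiered pricing. "If 2024 was the first year of large-model applications in China, 2025 is very likely to be the first year of Agent deployment in China." In the view of Volcano Engine president Tan Dai (谭待), across the PC, mobile, and AI eras the technical mainstay keeps changing: the web in the PC era, the app in the mobile era, and the Agent in the AI era. Agents are steadily entering every business process in the enterprise. To let that single spark start a prairie fire and clear the bottlenecks to large-scale deployment, Volcano Engine has aimed its fire at "tiered pricing". "Agents consume a great many tokens," Tan Dai said in an interview with 21CBR and other media after the event. Having an Agent carry out a single task may take 200,000 tokens, so bringing down the cost of using models is critical. The newly released Doubao large model 1.6 pioneers pricing by "input length" range, with deep thinking, multimodal capability, and the base language model all priced the same. Tan Dai said that for models with the same architecture and parameter count, what really drives cost is context length, not whether thinking or multimodal features are switched on. Most model calls today have inputs under 32K. From this observation, the team realized that optimizing inference scheduling through bucketed scheduling would let the requests that make up the bulk of traffic enjoy lower cost and faster speed. In enterprise ...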
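The idea of pricing by input-length range can be illustrated with a minimal Python sketch. It is not Volcano Engine's actual price list or scheduler: only the 32K boundary comes from the article, while the other tier boundaries, all prices, and the names `TIERS`, `pick_tier`, and `input_cost` are hypothetical placeholders chosen for illustration.

```python
from bisect import bisect_left

# Hypothetical tiers: (max input tokens, price per million input tokens).
# Only the 32K boundary is taken from the article; every other number
# here is a made-up placeholder, not a real price.
TIERS = [(32_000, 1.0), (128_000, 2.0), (256_000, 4.0)]


def pick_tier(input_tokens: int) -> tuple[int, float]:
    """Return the (upper bound, unit price) of the bucket covering the input."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    bounds = [bound for bound, _ in TIERS]
    # bisect_left finds the first bucket whose upper bound is >= input_tokens.
    index = bisect_left(bounds, input_tokens)
    if index == len(TIERS):
        raise ValueError("input exceeds the largest bucket")
    return TIERS[index]


def input_cost(input_tokens: int, thinking: bool = False,
               multimodal: bool = False) -> float:
    """Cost of the input tokens for one request.

    The thinking and multimodal flags are accepted but deliberately ignored,
    mirroring the claim that only context length drives the price.
    """
    _, price_per_million = pick_tier(input_tokens)
    return input_tokens / 1_000_000 * price_per_million
```

Under these placeholder numbers, a 10,000-token request costs the same whether or not thinking is enabled, while a 200,000-token Agent task falls into the most expensive bucket. A real scheduler would presumably route each bucket to separately tuned serving capacity, which this sketch does not model.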