Multimodal

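On the ground at CVPR: crowds pack the booths of Chinese exhibitors, and Tencent's 40+ accepted papers stand out
量子位· 2025-06-17 07:41
By Bai Jiao, from Aofeisi. 量子位 | WeChat official account QbitAI. CVPR 2025 has wrapped up, and this year both the attention it drew and the level of social engagement ran deep. Spotting Kaiming He in the hallway, for example, instantly turned into a fan meet-up. In an exhibition area dominated by international giants such as Google and Meta, Chinese companies had a record-size presence, and the large booths of Tencent, ByteDance and others were packed. A few interesting findings stand out. The technical demos that people queued up to try at the booths are a reliable indicator of where the technology is heading. First, multimodal and 3D generation were the hot directions in both accepted papers and on-site discussion. 3D generation was a particular highlight, and Gaussian splatting, the technique behind it, was among the five most frequent keywords in paper titles. Second, the discussion of foundation models went much deeper than in previous years and extended to industrial deployment. Embodied intelligence and robotics AI were given a major track of their own in the workshop agenda. Finally, Chinese companies were deeply involved this year, although participation is still concentrated in large, commercially mature companies. Multimodal became a high-frequency word in accepted paper titles, and 3D is developing quickly with impressive results. One attendee compiled 2,878 paper titles and derived the high-frequency words from them. What other highlights were there? Here is the full rundown. Touring the CVPR 2025 expo. CVPR is gaining in prestige. CVPR is unquestionably a top conference in computer vision, and is arguably slightly better known than ICCV and ECCV, the two other top conferences it is usually grouped with ...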
MiniMax releases a reasoning model to rival DeepSeek, with a compute cost of only about $530,000
Di Yi Cai Jing· 2025-06-17 07:26
Core Insights
- MiniMax, one of the "Six Little Dragons," has announced a series of updates, starting with the release of its first open-source reasoning model, MiniMax-M1 [1]
- MiniMax-M1 has shown competitive performance in benchmark tests, comparable to leading models such as DeepSeek-R1 and Qwen3 [3]
- The model's training was completed in just three weeks using 512 H800 GPUs, with a total computing cost of only $534,700, which is an order of magnitude lower than initially expected [3][8]

Performance Metrics
- MiniMax-M1's context window is 1 million tokens long, eight times that of DeepSeek R1 and equal to that of Google's Gemini 2.5 Pro, which supports strong performance on long-context understanding tasks [5]
- In the TAU-bench evaluation, MiniMax-M1 outperformed DeepSeek-R1-0528 and Google's Gemini 2.5 Pro, ranking just below OpenAI o3 and Claude 4 Opus globally [7]
- The model excels at coding, significantly surpassing most open-source models and trailing the latest DeepSeek R1 only slightly [7]

Innovations and Cost Efficiency
- MiniMax-M1 uses a hybrid architecture based on a lightning attention mechanism, which improves efficiency on long-text input and deep reasoning tasks [7]
- The CISPO reinforcement learning algorithm converges faster than ByteDance's recent DAPO algorithm, contributing to the low training cost [8]
- MiniMax's pricing is tiered by input length, ranging from $0.8 to $2.4 per million input tokens and from $8 to $24 per million output tokens, which is competitive with DeepSeek's pricing [8]

Competitive Landscape
- Concurrently, a competitor, Moonshot AI (月之暗面), has released its programming model Kimi-Dev-72B, which reportedly achieved the highest score among open-source models on SWE-bench, surpassing the new DeepSeek-R1 [8]
- However, Kimi-Dev-72B faced scrutiny for potential overfitting, as it generated less code than required for certain tasks, raising questions about its performance reliability [9]
- The AI industry is witnessing renewed competition among the "Six Little Dragons," with MiniMax expected to release further updates in the coming days, potentially impacting the multimodal AI landscape [9]
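
As a rough sanity check on the training-cost figure above, the implied price per GPU-hour can be worked out from the three numbers the summary gives ($534,700, 512 H800 GPUs, three weeks). This is only a back-of-the-envelope sketch: it assumes "three weeks" means 21 full days with all 512 GPUs in use the whole time, which the source does not state, and the function name is made up for illustration.

```python
def implied_gpu_hour_cost(total_cost_usd: float, num_gpus: int, days: float) -> tuple[float, float]:
    """Return (total GPU-hours, implied USD per GPU-hour)."""
    gpu_hours = num_gpus * days * 24  # every GPU assumed busy 24 hours a day
    return gpu_hours, total_cost_usd / gpu_hours


# Figures quoted in the summary: $534,700, 512 H800 GPUs, three weeks.
# Assumption (not from the source): "three weeks" = 21 days of continuous use.
gpu_hours, usd_per_gpu_hour = implied_gpu_hour_cost(534_700, 512, 21)
print(f"{gpu_hours:,.0f} GPU-hours, about ${usd_per_gpu_hour:.2f} per GPU-hour")
# prints "258,048 GPU-hours, about $2.07 per GPU-hour"
```

Under those assumptions the quoted cost works out to roughly two dollars per H800 GPU-hour; a shorter effective run or partial utilization would imply a higher hourly rate.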
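Fine-grained visual reasoning chains come to mathematics, accuracy jumps 32%: CUHK MMLab breaks the bottleneck in multimodal math reasoning
量子位· 2025-06-16 10:30
Contributed by the MINT-CoT team. 量子位 | WeChat official account QbitAI. Chain of Thought (CoT) reasoning has been shown to significantly improve the performance of large language models (LLMs) on complex tasks, and it shows great potential in multimodal large language models (MLLMs) as well. 3. Over-reliance on external functions: methods such as MVoT or Visual SKETCHPAD need external tools or capabilities to generate or modify images, which makes training and inference costly and hard to generalize. However, when visual information is combined with mathematical reasoning, conventional CoT methods fall short: mathematical details in the visual input are often overlooked, which leads to inaccurate reasoning. Recently, the MMLab team at The Chinese University of Hong Kong officially released a new visual reasoning approach, MINT-CoT, designed specifically for the difficulties of multimodal mathematical reasoning. Why is visual mathematical reasoning so hard? Some studies have tried to bring visual information into CoT reasoning, for example Visual-CoT, Visual SKETCHPAD, VPT and ICoT, but in mathematical settings three major bottlenecks remain: 1. Coarse-grained image region selection: most methods rely on bounding boxes to crop image regions, but the elements of a mathematical image (such as axes, geometric figures and annotation text) are highly ...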
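A new breakthrough in industrial anomaly detection: multimodal fusion detection from Fudan and partners accepted to CVPR 2025
量子位· 2025-06-16 06:59
Multimodal fusion: what is new in Real-IAD D³. Contributed by the Real-IAD D³ team. 量子位 | WeChat official account QbitAI. Multimodal fusion detection marks a new breakthrough in industrial anomaly detection. Fudan University, Rongcheer Industrial Technology (荣旗工业科技), Tencent Youtu Lab, Shanghai Jiao Tong University, Shanghai Ocean University and other institutions have jointly released Real-IAD D³, a high-precision multimodal dataset, and proposed a new multimodal fusion detection method based on it. The work has been accepted to CVPR 2025, a top computer vision conference. In industrial production, anomaly detection is a key step in ensuring product quality and safety. In complex industrial environments, however, existing anomaly detection methods often fall short of the desired performance because of the limitations of available datasets. To break through this bottleneck, the researchers built the Real-IAD D³ dataset, which covers high-resolution RGB images and adds pseudo-3D photometric stereo images and 3D point cloud data with micrometer-level precision, giving anomaly detection richer information to work with. Real-IAD D³ is inspired by real industrial quality-inspection scenarios. In real production, inspectors must quickly and accurately identify surface defects such as scratches, dents and cracks. These defects are varied, and they look different under different lighting and material backgrounds. Faced with such complex defects, traditional 2D image detection methods often ...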
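139 points on the gaokao math exam! Xiaomi's 7B model rivals Qwen3-235B and OpenAI o3
机器之心· 2025-06-16 05:16
Report by 机器之心, from the 机器之心 editorial team. The 2025 gaokao wrapped up the week before last, and in the AI world the major large models took on the math paper. In 机器之心's own test, seven large models scored as follows on the "2025 Mathematics New Curriculum Standard Paper I": Gemini 2.5 Pro scored 145 and ranked first; Doubao and DeepSeek R1 followed closely with 144, tied for second; o3 and Qwen3 were also just one point apart, ranking third and fourth respectively. Dragged down by the free-response questions, hunyuan-t1-latest and ERNIE X1 Turbo (文心 X1 Turbo) finished in the last two places overall. Other models also took on this year's math paper, among them Xiaomi MiMo-VL, a small model with only 7B parameters. It attempted the same 2025 New Curriculum Standard Paper I and scored 139 in total, the same as Qwen3-235B and only one point below OpenAI o3. Compared with Qwen2.5-VL-7B, a multimodal large model that also has 7B parameters, MiMo-VL scored a full 56 points higher. MiMo-VL-7B and Qwen2.5-VL-7B were evaluated as multimodal models by uploading screenshots of the questions, while the rest were given text input lat ...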
Securities Research Report, Industry Weekly: the 2025 summer film season approaches, ByteDance releases Doubao large model 1.6 (2025-06-15)
GOLDEN SUN SECURITIES· 2025-06-15 07:53
Investment Rating
- The report maintains an "Increase" rating for the media industry, indicating a positive outlook for the sector [6].

Core Insights
- The media sector rose 1.38% during the week of June 9-13, driven by themes such as new consumption [10][18].
- Key areas of growth for 2025 include AI applications, IP monetization, and mergers and acquisitions, with a focus on multimodal industry directions and companies with IP advantages [1][18].
- The report highlights the upcoming 2025 summer film season, with over 60 films scheduled for release across a diverse range of genres [2][20].
- ByteDance's release of the Doubao model 1.6, a leading multimodal model, marks a significant advancement in AI capabilities within the industry [3][20].

Summary by Sections

Market Overview
- The media sector's performance is buoyed by new consumption trends, with a notable increase in stock prices for companies like Yuanlong Yatu and Chuanwang Media [10][13].
- The report identifies the top-performing stocks in the media sector, with Yuanlong Yatu leading at a 42.9% increase [13][16].

Sub-sector Insights
- **Resource Integration**: Companies such as China Vision Media and Guangxi Broadcasting are highlighted for their potential in resource consolidation [18].
- **AI Focus**: Companies like Rongxin Culture and Aofei Entertainment are noted for their advancements in AI applications [18].
- **Gaming Sector**: Companies with solid performance, including Shenzhou Taiyue and Giant Network, are strongly recommended [18].
- **State-owned Enterprises**: Companies like Ciwen Media and Anhui New Media are emphasized for their growth potential [18].
- **Education Sector**: Xueda Education is mentioned as a key player in the education sub-sector [18].

Key Events Recap
- The report discusses the launch of the "China Film Consumption Year" initiative, aimed at boosting audience engagement during the summer film season [20].
- The performance of the domestic film market is highlighted, with significant box office figures reported for recent releases [22][24].

Data Tracking
- The report provides insights into the gaming sector, noting popular upcoming games and their expected impact on the market [21].
- It also tracks viewership data for television series and variety shows, indicating audience preferences and trends [25][26].
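[Big news] A Tesla humanoid robot show! The Hangzhou Grand Convention and Exhibition Center invites you to the humanoid robot industry's flagship event!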
机器人大讲堂· 2025-06-15 04:41
Core Viewpoint
- The article highlights the debut of Tesla Bot at the 2025 Hangzhou International Humanoid Robot and Robotics Technology Expo, showcasing advancements in humanoid robotics and the participation of over 200 leading companies in the industry [1][3][5].

Group 1: Event Overview
- The expo will take place from June 20 to June 22, 2025, at the Hangzhou Grand Convention and Exhibition Center, featuring a combination of forums, exhibitions, and interactive experiences [1].
- The event is organized by the Zhejiang Robot Industry Development Association and aims to present cutting-edge humanoid robot technologies and future living scenarios [1].

Group 2: Key Exhibitors and Technologies
- Notable exhibitors include Alibaba Cloud, Hangzhou Six Little Dragons, and various other leading companies, showcasing technologies such as embodied intelligence, multimodal interaction, and brain-computer interfaces [5].
- The expo will cover the entire industry chain, including complete robots, key components, and application scenarios [5].

Group 3: Forums and Networking Opportunities
- The event will host several forums, including the Hangzhou Humanoid Robot Conference focusing on industry trends and policy analysis, and a connection conference aimed at fostering business cooperation and technology commercialization [9][10].
- A dedicated forum for investment and technology innovation in the humanoid robotics sector will also take place, providing opportunities to explore new investment avenues [10].

Group 4: Interactive Experiences
- The expo will feature interactive activities, including a talent show and educational events aimed at engaging families and promoting technology awareness [11][13].
- Attendees will have the chance to win limited gifts through participation in interactive sessions [11].
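Chinese scholars published 4 Cell papers this week: enforced mitophagy; a multimodal genetic screening platform; anti-aging mesenchymal progenitor cell therapy; the switch for complement protein attack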
生物世界· 2025-06-15 01:12
Core Insights
- This week, four research papers authored by Chinese scholars were published in the prestigious journal Cell, covering topics such as mitochondrial influence on pluripotency, a multimodal genetic screening platform, anti-aging mesenchymal progenitor cell therapy, and a key switch in complement protein attack [1][2][3][4].

Group 1: Mitochondrial Influence on Pluripotency
- A study led by Professor Wu Jun from the University of Texas Southwestern Medical Center developed a new technique for enforced mitophagy, revealing the impact of mitochondria on cell pluripotency and demonstrating that reduced mitochondrial numbers delay pre-implantation mouse embryo development [3].

Group 2: Perturb-Multi Genetic Screening Platform
- Professor Zhuang Xiaowei from Harvard University introduced Perturb-Multi, a novel platform that combines imaging and sequencing technologies to enable parallel perturbation of hundreds of genes in intact mammalian tissues, facilitating the discovery of genetic bases for complex cellular and tissue physiology [7].

Group 3: Anti-Aging Mesenchymal Progenitor Cell Therapy
- Researchers Liu Guanghui, Wang Si, and Qu Jing from the Chinese Academy of Sciences and Capital Medical University developed engineered human senescence-resistant mesenchymal progenitor cells (SRCs) that resist aging, stress, and malignant transformation, and that significantly delay multi-organ aging in primate models [11].

Group 4: Key Switch in Complement Protein Attack
- A study by Zhicheng Wang from the University of Pennsylvania focused on the complement system, identifying a critical parameter, the surface density of potential complement attachment sites, that triggers a significant increase in complement activation, providing insights for the design of long-lasting drug carriers and biocompatible implants [15][17].
Investment and financing trends among China's AIGC companies: early-stage projects are favored by capital
Sou Hu Cai Jing· 2025-06-14 09:35
Core Insights
- The AIGC industry in China is experiencing a significant early-stage investment trend, with total financing reaching billions of RMB in the first months of 2025, marking a 60% year-on-year increase [1]
- Angel round financing events account for the highest proportion at 60%, indicating a preference for early-stage investments [3]

Group 1: Current Situation
- Early-stage projects have become the core area for capital allocation, with 60% of financing events occurring in the angel round, significantly higher than A rounds and strategic investments [3]
- Startups established in 2025 account for 60% of the AIGC companies, with notable examples like Moonshot AI (月之暗面) and Shengshu Technology (生数科技) completing significant financing within a year of establishment [4]

Group 2: Driving Factors Behind Capital Preferences
- Accelerated technological iteration is driving capital to focus on application-layer tools, allowing for quick validation of business models [6]
- Policy support and market demand are also pushing the AIGC market, which is expected to exceed trillions by 2025, despite being only billions in 2025 [7]

Group 3: Industry Participation
- Major industry players like Tencent and Baidu are deeply involved in the ecosystem through strategic investments, with Tencent investing billions in 2025 [9]

Group 4: Challenges and Pressures
- Investors are increasingly demanding that early-stage projects demonstrate monetization pathways, with examples like Miaoya Camera (妙鸭相机) showcasing rapid customer acquisition through low-cost services [11]
- There are signs of industry bubbles, with global AIGC financing exceeding hundreds of billions, but domestic projects facing challenges due to high levels of homogeneity [12]

Group 5: Future Trends
- Investment focus is shifting towards the middle layer of the industry, such as AI training tools and data annotation platforms, which are expected to enable scalable applications [15]
- Global expansion is accelerating, with leading companies like Moonshot AI (月之暗面) initiating overseas user growth plans, attracting capital interest in cross-language models and localization capabilities [15]
"Multimodal approaches cannot achieve AGI"
AI前线· 2025-06-14 04:06
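Author | Benjamin. Translator | Wang Qiang. Planning | Chu Xingjuan. "In projecting language back onto a model of thought, we lose sight of the tacit embodied cognition that underpins our intelligence." First, although Othello moves can provably be used to infer the full state of an Othello board, we have no reason to believe there is a way to infer a complete picture of the physical world from linguistic descriptions. What separates Othello from many tasks in the physical world is that Othello fundamentally lives in the symbolic domain and is only implemented with physical tokens to make it easier for humans to play. A full game of Othello can be played with pen and paper, but one cannot sweep a floor, wash dishes, or drive a car with pen and paper. To solve such tasks, you need a conception of the physical world that goes beyond what humans can describe in language. Whether that conception is encoded in a formal world model or, for example, in a value function is open to debate, but it is clear that many problems in the physical world cannot be fully represented by a symbolic system and solved by pure symbol manipulation. The recent success of generative AI models has convinced some people that artificial general intelligence (AGI) is imminent. Although these models appear to capture the essence of human intelligence, they defy even our most basic intuitions about it. They emerged not because they are carefully considered solutions to the problem of intelligence, but because they scaled effectively on the hardware we already had. Some, basking in the results of scaling, have come to believe that it offers a path to AGI ...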