Video Reasoning
Computer Industry Weekly Report: Xiaohongshu's Video-Thinker Breaks Tool Dependence, DeepSeek Releases mHC (20260106)
Huaxin Securities· 2026-01-06 12:34
Investment Rating
- The report maintains a "Buy" rating for several companies in the AI and computing sectors, including Weike Technology (301196.SZ), Nengke Technology (603859.SH), Hehe Information (688615.SH), and Maixinlin (688685.SH) [9].

Core Insights
- The report highlights the introduction of the Video-Thinker model by Xiaohongshu, which breaks the dependency on external tools for video reasoning, achieving state-of-the-art (SOTA) performance with a 7B parameter version [3][22].
- DeepSeek's new architecture, mHC, shows significant performance improvements with only a 6.7% increase in training time, marking a breakthrough in model efficiency [31][32].
- Kimi, a Chinese AI startup, completed a $500 million Series C funding round, with a post-money valuation of $4.3 billion, focusing on the development of its K3 model and talent incentives for 2026 [4][44].

Summary by Sections

1. Computing Dynamics
- The report notes stable pricing in computing power leasing, with specific rates for various configurations [21].
- Xiaohongshu's Video-Thinker model integrates key capabilities such as temporal grounding and visual description, achieving new benchmarks in video reasoning [22][23].
- The model's training paradigm includes a two-stage process that enhances its reasoning capabilities while reducing reliance on external tools [26][27].

2. AI Application Dynamics
- Character.AI experienced an 8.32% increase in weekly traffic, indicating growing interest in AI applications [30].
- DeepSeek's mHC architecture addresses traditional bottlenecks in model efficiency, providing a robust framework for enhancing model capabilities [31][32].

3. AI Financing Trends
- Kimi's recent funding round will support the development of its K3 model and expansion of its talent pool, following significant technological advancements in 2025 [4][44].
- Meta's acquisition of Manus for $4-5 billion underscores the strategic importance of AI applications and the integration of advanced AI capabilities into its ecosystem [5][6].

4. Market Performance
- The report provides comparative performance metrics for various AI models, showcasing the advancements made by Video-Thinker over existing solutions [28][29].
- The overall market sentiment remains positive, with a focus on the long-term growth potential of AI applications and computing technologies [7].
Letting the Model Find Key Frames and Visual Cues on Its Own: Xiaohongshu's Video-Thinker Cracks the Video Reasoning Impasse
机器之心· 2026-01-02 03:12
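With the rapid development of multimodal large language models (MLLMs), the "Thinking with Images" paradigm has achieved revolutionary breakthroughs in image understanding and reasoning tasks: models no longer passively receive visual information but have learned to actively locate and think.

However, this ability has not yet been effectively extended to video reasoning tasks that involve complex temporal dependencies and dynamic narratives. Existing video reasoning methods are often limited by their reliance on external tools or preset prompting strategies. This makes it hard for models to develop an intrinsic ability to navigate time sequences autonomously and understand them in depth, so they struggle with long videos or complex logic.

To tackle this problem, a research team from Xiaohongshu proposed Video-Thinker: a new "Thinking with Videos" paradigm that aims to elicit the intrinsic intelligence of MLLMs in video reasoning through reinforcement learning.

Unlike traditional methods, Video-Thinker does not rely on building and calling external tools. Instead, it internalizes two core capabilities, temporal grounding ("Grounding") and visual description ("Captioning"), in the model's chain of thought (CoT), so that the model can autonomously find key frames and extract visual cues during reasoning.

The team carefully constructed the Video-Thinker-10K dataset, which contains 10K high-quality samples, and adopted a "supervised fine-tuning + reinforcement learning" ...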
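To make the idea of grounding and captioning living inside the chain of thought more concrete, here is a minimal sketch of how such a reasoning trace could be parsed. The excerpt does not show Video-Thinker's actual output format, so the `<think>`, `<time>`, `<caption>`, and `<answer>` tags, the `Clue` type, and the sample trace are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple


class Clue(NamedTuple):
    """One grounded visual clue: a time span in seconds plus its description."""
    start: float
    end: float
    caption: str


# Hypothetical tag format (an assumption, not Video-Thinker's documented syntax):
# a <time>start-end</time> span immediately followed by a <caption>...</caption>.
CLUE_PATTERN = re.compile(
    r"<time>\s*([\d.]+)\s*-\s*([\d.]+)\s*</time>\s*<caption>(.*?)</caption>",
    re.DOTALL,
)


def extract_clues(trace: str) -> List[Clue]:
    """Collect every (time span, caption) pair the model wrote while reasoning."""
    return [
        Clue(float(start), float(end), caption.strip())
        for start, end, caption in CLUE_PATTERN.findall(trace)
    ]


# Made-up reasoning trace in which grounding and captioning happen inline,
# with no external tool call.
trace = (
    "<think>The question asks who opened the door. "
    "<time>12.0-15.5</time><caption>A man in a red coat walks to the door.</caption> "
    "<time>40.0-42.0</time><caption>The door is open and the man is gone.</caption> "
    "So the man in the red coat opened it.</think><answer>B</answer>"
)

clues = extract_clues(trace)
for clue in clues:
    print(f"{clue.start:.1f}s to {clue.end:.1f}s: {clue.caption}")
```

Because the time spans and captions are plain text in the model's own output, a training pipeline could in principle score them directly, which is one reason an inline format is convenient for reinforcement learning.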
A Well-Known Tech Fund Manager's Latest Moves!
券商中国· 2025-10-28 23:33
Core Viewpoint
- The article discusses the significant performance of overseas computing power sectors, represented by optical modules and PCBs, which have provided substantial returns for heavily invested funds, but have also led to increased divergence after substantial price increases [1].

Summary by Sections

Fund Performance
- On October 28, the third-quarter report of well-known fund manager Jin Zicai from Caitong Fund was released, showing that the net value growth rate of the Caitong Growth Preferred A class share reached 90.4% in Q3, outperforming the benchmark by over 80 percentage points [2][3].

Portfolio Adjustments
- Jin Zicai made significant adjustments to his holdings, drastically reducing positions in leading optical module companies like NewEase and Tianfu Communication, while increasing investments in core PCB industry players such as Shenzhen South Circuit, Shengyi Technology, and Huitian Technology [2][3].
- After the adjustments, the top five holdings of the fund included Industrial Fulian, Shenzhen South Circuit, Shengyi Technology, Huitian Technology, and Zhongji Xuchuang [3].

Market Insights
- Jin Zicai noted that the market's understanding of the optical communication sector has improved, leading to a reduction in the fund's holdings in this area. He believes that the PCB industry may experience unexpected price increases due to structural supply-demand imbalances by 2026 [3].
- Despite reducing exposure to optical modules, Jin Zicai continues to heavily overweight the overseas computing power sector, indicating that the growth certainty of overseas AI has increased, and demand for computing power is expected to grow rapidly in 2026 and 2027 [4][5].

Investment Strategy
- The fund's management scale increased from 4.618 billion yuan to 6.525 billion yuan, with a focus on maintaining research and tracking of other sectors, aiming for proactive and replicable investments in high-quality companies aligned with industry trends [5].
Sweeping Six Benchmarks! TW-GRPO Raises the Ceiling for Video Reasoning, with CLEVRER Accuracy Breaking 50.4%!
机器人大讲堂· 2025-07-06 05:23
Core Viewpoint
- The rapid development of multi-modal large language models (MLLMs) is significantly enhancing video reasoning capabilities, driven by reinforcement learning (RL) as a key engine for this technological revolution [1].

Group 1: TW-GRPO Framework Introduction
- The TW-GRPO framework is proposed to address challenges in reasoning quality and reward granularity in video reasoning tasks, inspired by the traditional GRPO framework [2].
- TW-GRPO integrates focused thinking and multi-level soft reward mechanisms for multi-choice QA tasks [3].

Group 2: Key Improvements in TW-GRPO
- The framework enhances information weighting and reward mechanism design, applying a soft reward mechanism from video localization to video reasoning tasks [4].
- A dynamic weighting mechanism prioritizes high information density tokens, improving reasoning accuracy and efficiency by focusing on key content [4].
- The multi-level reward mechanism redefines rewards, allowing for partial correctness in answers, thus improving training stability and efficiency [5].

Group 3: Data Augmentation and Training Efficiency
- TW-GRPO introduces a question-answer inversion (QAI) data augmentation technique to convert single-choice tasks into multi-choice formats, effectively expanding the training data pool [6].
- This approach disrupts traditional equal treatment of tokens, enhancing training efficiency and reasoning performance through differentiated information processing [6].

Group 4: Experimental Validation
- Extensive experiments demonstrate TW-GRPO's effectiveness in video reasoning and general understanding tasks, outperforming Video-R1 by 18.8%, 1.8%, and 1.6% in various benchmarks [12][15].
- The framework shows faster convergence and more stable learning processes compared to traditional GRPO, with shorter output sequences indicating more efficient reasoning [11][17].

Group 5: Qualitative Analysis of Reasoning Paths
- A qualitative comparison of reasoning paths between T-GRPO and TW-GRPO illustrates significant improvements in accuracy and efficiency in dynamic visual cue reasoning tasks [22].
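The multi-level reward and question-answer inversion ideas summarized above can be illustrated with a small sketch. The summary does not give TW-GRPO's exact formulas, so the IoU-style scoring (borrowed from how overlap is commonly scored in video localization), the function names, and the wording of the inverted question are all assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, Set, Tuple


def binary_reward(predicted: Set[str], correct: Set[str]) -> float:
    """All-or-nothing baseline: credit only for an exact match."""
    return 1.0 if predicted == correct else 0.0


def soft_reward(predicted: Set[str], correct: Set[str]) -> float:
    """Assumed IoU-style partial credit: |P & G| / |P | G|.

    Partially correct answers land between 0 and 1 instead of collapsing to 0.
    """
    union = predicted | correct
    if not union:
        return 1.0  # both sets empty: treat as a trivial match
    return len(predicted & correct) / len(union)


def invert_question(
    question: str, options: Dict[str, str], answer: str
) -> Tuple[str, Set[str]]:
    """Hypothetical question-answer inversion for a single-choice item.

    The question is negated, so the new answer is every option except the
    original one, which turns a single-choice item into a multi-choice item.
    """
    inverted_question = f"{question} Select all options that are NOT correct."
    inverted_answer = set(options) - {answer}
    return inverted_question, inverted_answer


# Made-up single-choice item.
options = {"A": "red cube", "B": "blue sphere", "C": "green cylinder", "D": "none"}
question, gold = invert_question("Which object moves first?", options, "B")

# A prediction that gets two of the three inverted answers right.
prediction = {"A", "C"}
print(binary_reward(prediction, gold))  # all-or-nothing gives no credit
print(soft_reward(prediction, gold))    # soft reward gives partial credit
```

Under these assumptions the binary reward treats a nearly correct answer the same as a completely wrong one, while the soft reward separates them, which matches the summary's point that partial credit gives a smoother training signal.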
The "Sherlock Holmes Test" of Video Reasoning: Every Large Model Fails | Paper and Code Open-Sourced
量子位· 2025-05-29 07:19
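Compiled by Jin Lei from Aofeisi | 量子位 | Official account QbitAI

A new benchmark has left every large model failing at complex video reasoning!

This is Video-Holmes, newly released by Tencent ARC Lab and City University of Hong Kong. As its name suggests, it can be called the "Sherlock Holmes test" of video reasoning: it has multimodal large models take on highly difficult reasoning tasks such as "deducing the murderer" and "analyzing the criminal's intent" in order to reveal the limits of their complex video reasoning ability.

Video-Holmes also avoids a pain point of existing benchmarks in the field: their video sources and questions are both fairly simple, so they cannot reflect the gap between reasoning models and non-reasoning models.

It is worth noting that a "one-click evaluation kit" for the benchmark is now available on GitHub and HuggingFace, so anyone working on video reasoning can give it a try (links at the end of the article).

A new benchmark that wipes out every large model

As mentioned above, existing video reasoning benchmarks (such as VCR-Bench and MVBench) mainly evaluate a model's visual perception and grounding abilities.

Take an example.

In this example, to find the man's true cause of death, the model needs to actively think about which visual information to attend to, and to logically link, across different video clips, multiple related ...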