Video Reasoning
Large models have learned to drag the progress bar when watching videos: Alibaba's new research rids video reasoning of guesswork and enables evidence-chain thinking
36Kr · 2026-01-29 09:29
Core Insights
- The research team from Alibaba's Future Life Lab highlights that the effectiveness of models in video reasoning tasks is significantly influenced by how they are taught to "think," in contrast with mathematical reasoning, where reinforcement learning (RL) alone yields substantial performance improvements [1][11]

Group 1: ReWatch Dataset
- The ReWatch dataset consists of 10,000 videos, 170,000 question-answer pairs, and 135,000 reasoning chains, addressing three main issues in existing training data: rough video descriptions, overly simplistic Q&A, and a heavy reliance on textual common sense rather than video content [2][4]
- Key features of the ReWatch dataset include high-fidelity temporal captions, high-difficulty video Q&A whose answers require detailed video content, and video-grounded reasoning chains that simulate human-like "rewatch and confirm" behavior [2][4]

Group 2: ReWatch-R1 Model
- The ReWatch-R1 model employs an SFT+RL paradigm with an innovative reward mechanism that emphasizes the reasoning process itself, rather than just the final answer [6][8]
- The process reward is computed from observation and reasoning rewards, ensuring that the model learns to derive answers from accurate observations and effective reasoning actions [8]

Group 3: Experimental Results
- ReWatch-R1 achieved state-of-the-art (SOTA) performance across five mainstream video reasoning benchmarks, significantly outperforming all comparable open-source models and validating the proposed methodology [9]
- A critical insight from the experiments is that while supervised fine-tuning (SFT) alone does not surpass the direct-answering mode, the RL phase produces a remarkable performance leap for the "thinking mode," underscoring the necessity of explicit, evidence-based reasoning in complex video tasks [11]

Group 4: Conclusion
- The work on ReWatch-R1 contributes valuable insights and resources to the field of video understanding, addressing the core bottleneck of high-quality video reasoning data and successfully teaching models to engage in deep thinking grounded in video evidence [13]
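The observation/reasoning reward split described above can be sketched in a few lines. This is a hypothetical illustration only: the set-based fact matching, the exact-match reasoning check, and the equal 0.5/0.5 weighting are all assumptions standing in for whatever judge models the authors actually use.

```python
# Hypothetical sketch of a process reward in the spirit of ReWatch-R1:
# blend an observation score (are the model's stated observations backed
# by the ground-truth captions?) with a reasoning score (does the answer
# follow from those observations?). Names and weights are assumptions.

def observation_reward(observed: set[str], caption_facts: set[str]) -> float:
    """Fraction of the model's stated observations that appear among the
    high-fidelity caption facts (a stand-in for a learned judge)."""
    if not observed:
        return 0.0
    return len(observed & caption_facts) / len(observed)

def reasoning_reward(final_answer: str, answer_from_observations: str) -> float:
    """1.0 if the final answer matches what the model's own observations
    support (approximated here by exact match), else 0.0."""
    return 1.0 if final_answer == answer_from_observations else 0.0

def process_reward(observed, caption_facts, final_answer,
                   answer_from_observations, w_obs=0.5, w_reason=0.5):
    """Weighted blend of the two components; the real shaping may differ."""
    return (w_obs * observation_reward(observed, caption_facts)
            + w_reason * reasoning_reward(final_answer,
                                          answer_from_observations))
```

A response whose observations are fully grounded and whose answer follows from them scores 1.0; hallucinated observations pull the reward down even when the final answer happens to be right, which is the point of supervising the process rather than only the outcome.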
Large models have learned to drag the progress bar when watching videos! Alibaba's new research rids video reasoning of guesswork and enables evidence-chain thinking | ICLR 2026
量子位 · 2026-01-29 08:27
Core Insights
- The research team from Alibaba's Future Life Lab highlights that the effectiveness of models in video reasoning tasks is significantly influenced by how they are taught to "think" [1]
- They propose a high-quality video reasoning dataset called ReWatch and a state-of-the-art model named ReWatch-R1, which can "rewatch" videos like humans to enhance its reasoning capabilities [1]

Group 1: ReWatch Dataset
- The ReWatch dataset consists of 10,000 videos, 170,000 question-answer pairs, and 135,000 reasoning chains, addressing three main issues in existing training data: rough video descriptions, overly simplistic Q&A, and a heavy reliance on textual common sense rather than video content [2][4]
- Key features of the ReWatch dataset include:
  1. High-fidelity temporal captions that provide detailed event descriptions with precise timestamps, forming a solid factual basis for complex reasoning [2]
  2. High-difficulty video Q&A that ensures questions depend on video details, preventing models from relying on guessing or common sense [2]
  3. Video-grounded reasoning chains that simulate the human behavior of "rewatching and confirming" through a multi-agent framework, ensuring reasoning steps are closely tied to video content [2]

Group 2: ReWatch-R1 Model
- The training of ReWatch-R1 employs an SFT+RL paradigm with an innovative reward mechanism that emphasizes the importance of the reasoning process [6]
- The core of the training method is the process reward mechanism (GRPO with O&R Reward), which supervises and rewards the model's intermediate reasoning steps rather than just the final answer [6][8]
- The process reward is calculated from:
  1. An Observation Reward, which evaluates the accuracy of the model's observations against the high-fidelity captions [8]
  2. A Reasoning Reward, which assesses the effectiveness of the model's reasoning actions based solely on its own observations [8]

Group 3: Experimental Results and Insights
- ReWatch-R1 achieved state-of-the-art performance across five mainstream video reasoning benchmarks, significantly outperforming all comparable open-source models [9]
- A key insight is that reinforcement learning (RL) is crucial for unlocking the "thinking" potential of models, enabling a substantial performance leap in the reasoning mode over the direct-answering mode [11][12]
- The study emphasizes that explicit, step-by-step reasoning supported by evidence is vital for tackling complex video tasks, with RL being the key to fostering this capability [12][14]
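Both write-ups name GRPO as the RL algorithm behind the O&R Reward. The core of GRPO-style training is normalizing each sampled response's reward against its own group, so no separate value network is needed. A minimal sketch follows; the epsilon value and the use of the population standard deviation are implementation assumptions, not details from the article.

```python
# Group-relative advantage as used in GRPO-style RL: sample a group of
# responses per prompt, score each one, and normalize rewards within the
# group. The eps constant is an assumption for numerical stability.
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """advantage_i = (r_i - mean(rewards)) / (std(rewards) + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses scored above their group's average receive positive advantage and are reinforced. With a process reward such as O&R, a correct answer reached through ungrounded observations can still fall below the group average and be discouraged.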
Computer Industry Weekly: Xiaohongshu's Video-Thinker breaks tool dependency, DeepSeek launches mHC (2026-01-06)
Huaxin Securities · 2026-01-06 12:34
Investment Rating
- The report maintains a "Buy" rating for several companies in the AI and computing sectors, including Weike Technology (301196.SZ), Nengke Technology (603859.SH), Hehe Information (688615.SH), and Maixinlin (688685.SH) [9]

Core Insights
- The report highlights Xiaohongshu's Video-Thinker model, which breaks the dependency on external tools for video reasoning, achieving state-of-the-art (SOTA) performance with a 7B-parameter version [3][22]
- DeepSeek's new architecture, mHC, shows significant performance improvements with only a 6.7% increase in training time, marking a breakthrough in model efficiency [31][32]
- Kimi, a Chinese AI startup, completed a $500 million Series C funding round at a post-money valuation of $4.3 billion, focusing on the development of its K3 model and talent incentives for 2026 [4][44]

Summary by Sections
1. Computing Dynamics
- The report notes stable pricing in computing-power leasing, with specific rates for various configurations [21]
- Xiaohongshu's Video-Thinker model integrates key capabilities such as temporal grounding and visual description, setting new benchmarks in video reasoning [22][23]
- The model's training paradigm is a two-stage process that enhances its reasoning capabilities while reducing reliance on external tools [26][27]
2. AI Application Dynamics
- Character.AI experienced an 8.32% increase in weekly traffic, indicating growing interest in AI applications [30]
- DeepSeek's mHC architecture addresses traditional bottlenecks in model efficiency, providing a robust framework for enhancing model capabilities [31][32]
3. AI Financing Trends
- Kimi's recent funding round will support the development of its K3 model and the expansion of its talent pool, following significant technological advances in 2025 [4][44]
- Meta's acquisition of Manus for $4-5 billion underscores the strategic importance of AI applications and the integration of advanced AI capabilities into its ecosystem [5][6]
4. Market Performance
- The report provides comparative performance metrics for various AI models, showcasing the advances Video-Thinker makes over existing solutions [28][29]
- Overall market sentiment remains positive, with a focus on the long-term growth potential of AI applications and computing technologies [7]
Letting the model find key frames and visual cues on its own: Xiaohongshu's Video-Thinker cracks the video reasoning dilemma
机器之心 · 2026-01-02 03:12
Core Insights
- The article discusses advances in video reasoning brought by the "Thinking with Videos" paradigm, specifically the Video-Thinker model, which enhances a model's ability to autonomously navigate and understand temporal sequences in videos [2][6][10]

Group 1: Model Development and Methodology
- Video-Thinker integrates "temporal grounding" and "visual captioning" into the model's cognitive chain, eliminating reliance on external tools and enabling the model to autonomously identify key frames and extract visual cues [2][10]
- The research team constructed the Video-Thinker-10K dataset of 10,000 high-quality samples and employed a two-phase "supervised fine-tuning + reinforcement learning" training strategy to strengthen the model's self-exploration and self-correction capabilities [3][10]
- The 7-billion-parameter model achieved state-of-the-art (SOTA) performance on several challenging video reasoning benchmarks, significantly surpassing existing baselines [3][22]

Group 2: Data Quality and Training Process
- Because high-quality training data is crucial for developing complex reasoning capabilities, six major datasets were integrated into Video-Thinker-10K, combining precise temporal annotations with detailed visual descriptions [12][13]
- The training process uses a structured thinking paradigm in which the model learns to output specific labels such as <time> and <caption>, enforcing a rigorous "locate - perceive - reason" sequence [16][18]
- The reinforcement learning phase, using Group Relative Policy Optimization (GRPO), lets the model explore and optimize its reasoning strategies, leading to emergent cognitive behaviors akin to human metacognition [19][22]

Group 3: Performance Evaluation
- Video-Thinker-7B demonstrated significant advantages across various video reasoning benchmarks, establishing a new SOTA among 7-billion-parameter models [25][29]
- Performance was evaluated through both in-domain and out-of-domain assessments, showcasing the model's ability to generalize to unseen scenarios [24][29]
- The model achieved 43.22% accuracy on the Video-Holmes benchmark and 80.69% on VRBench, outperforming previous models by notable margins [29][30]

Group 4: Key Findings and Implications
- The model's success is attributed to its internal grounding and captioning capabilities, which were quantitatively assessed and found superior to those of baseline models [32][36]
- The findings indicate that relying on external tools can hinder performance: experiments showed that simple plug-and-play tools degraded, rather than enhanced, the model's reasoning [34][35]
- The article concludes that Video-Thinker's integration of core internal capabilities, rather than dependence on large parameter counts and datasets, represents a new paradigm in video reasoning with potential applications across industries [39]
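The structured labels described in Group 2 can be checked mechanically. The sketch below assumes the tags are serialized literally as `<time>...</time>` and `<caption>...</caption>` in the model's output text; the actual format Video-Thinker emits may differ.

```python
import re

# Extract Video-Thinker-style structured reasoning steps. The literal
# <time>/<caption> serialization is an assumption based on the article's
# description of the "locate - perceive - reason" output paradigm.
TAG_RE = re.compile(r"<(time|caption)>(.*?)</\1>", re.DOTALL)

def extract_steps(reasoning: str) -> list[tuple[str, str]]:
    """Return (tag, content) pairs in order of appearance, so a verifier
    can confirm the model located a segment before describing it."""
    return [(m.group(1), m.group(2).strip())
            for m in TAG_RE.finditer(reasoning)]
```

For example, a trace like `<time>00:12-00:15</time><caption>a man opens the door</caption> so the answer is B` yields one time step followed by one caption step, making the locate-then-perceive ordering easy to audit or reward.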
A well-known tech fund manager's latest moves!
券商中国 · 2025-10-28 23:33
Core Viewpoint
- The article discusses the strong performance of overseas computing-power sectors, represented by optical modules and PCBs, which have delivered substantial returns for heavily invested funds but have also seen increased divergence after their large price gains [1]

Summary by Sections
Fund Performance
- On October 28, the third-quarter report of well-known Caitong Fund manager Jin Zicai was released, showing that the net-value growth rate of the Caitong Growth Preferred A share class reached 90.4% in Q3, outperforming the benchmark by over 80 percentage points [2][3]
Portfolio Adjustments
- Jin Zicai made significant adjustments to his holdings, drastically reducing positions in leading optical-module companies such as NewEase and Tianfu Communication while increasing investments in core PCB industry players such as Shenzhen South Circuit, Shengyi Technology, and Huitian Technology [2][3]
- After the adjustments, the fund's top five holdings were Industrial Fulian, Shenzhen South Circuit, Shengyi Technology, Huitian Technology, and Zhongji Xuchuang [3]
Market Insights
- Jin Zicai noted that the market's understanding of the optical communication sector has improved, leading to a reduction in the fund's holdings in this area; he believes the PCB industry may see unexpected price increases due to structural supply-demand imbalances by 2026 [3]
- Despite reducing exposure to optical modules, Jin Zicai continues to heavily overweight the overseas computing-power sector, arguing that the growth certainty of overseas AI has increased and that demand for computing power will grow rapidly in 2026 and 2027 [4][5]
Investment Strategy
- The fund's assets under management rose from 4.618 billion to 6.525 billion yuan, with a focus on maintaining research coverage of other sectors and on proactive, replicable investments in high-quality companies aligned with industry trends [5]
Sweeping 6 major benchmarks! TW-GRPO raises the ceiling for video reasoning, with CLEVRER accuracy breaking 50.4%!
机器人大讲堂 · 2025-07-06 05:23
Core Viewpoint
- The rapid development of multi-modal large language models (MLLMs) is significantly enhancing video reasoning capabilities, with reinforcement learning (RL) serving as a key engine of this technological shift [1]

Group 1: TW-GRPO Framework Introduction
- The TW-GRPO framework is proposed to address challenges in reasoning quality and reward granularity in video reasoning tasks, building on the traditional GRPO framework [2]
- TW-GRPO integrates focused thinking with multi-level soft reward mechanisms for multi-choice QA tasks [3]

Group 2: Key Improvements in TW-GRPO
- The framework improves information weighting and reward design, adapting a soft reward mechanism from video localization to video reasoning tasks [4]
- A dynamic weighting mechanism prioritizes tokens with high information density, improving reasoning accuracy and efficiency by focusing on key content [4]
- The multi-level reward mechanism redefines rewards to allow partial correctness in answers, improving training stability and efficiency [5]

Group 3: Data Augmentation and Training Efficiency
- TW-GRPO introduces a question-answer inversion (QAI) data augmentation technique that converts single-choice tasks into multi-choice formats, effectively expanding the training data pool [6]
- This approach departs from the traditional equal treatment of tokens, enhancing training efficiency and reasoning performance through differentiated information processing [6]

Group 4: Experimental Validation
- Extensive experiments demonstrate TW-GRPO's effectiveness in video reasoning and general understanding tasks, outperforming Video-R1 by 18.8%, 1.8%, and 1.6% on various benchmarks [12][15]
- The framework shows faster convergence and a more stable learning process than traditional GRPO, with shorter output sequences indicating more efficient reasoning [11][17]

Group 5: Qualitative Analysis of Reasoning Paths
- A qualitative comparison of the reasoning paths of T-GRPO and TW-GRPO illustrates significant improvements in accuracy and efficiency on dynamic visual-cue reasoning tasks [22]
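The multi-level soft reward from Group 2 can be illustrated with a set-overlap score over answer options. The Jaccard-style formula below is an assumed stand-in for TW-GRPO's actual reward, chosen only to show how partial correctness earns partial credit instead of the usual all-or-nothing 0/1.

```python
def soft_choice_reward(pred: set[str], gold: set[str]) -> float:
    """Partial credit for multi-choice QA: |pred & gold| / |pred | gold|.
    A Jaccard-style stand-in for TW-GRPO's multi-level reward; missing a
    correct option or adding a wrong one both reduce the score, but a
    partially right answer set no longer collapses to zero reward."""
    if not pred or not gold:
        return 0.0
    return len(pred & gold) / len(pred | gold)
```

Under an all-or-nothing reward, predicting {A} against gold {A, B} scores 0; here it scores 0.5, giving the policy a denser learning signal, which is the training-stability benefit the article describes.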
The "Sherlock Holmes test" for video reasoning: all large models fail across the board | Paper and code open-sourced
量子位 · 2025-05-29 07:19
Core Viewpoint
- The Video-Holmes benchmark, introduced by Tencent ARC Lab and City University of Hong Kong, reveals that current large models perform inadequately on complex video reasoning tasks, highlighting significant gaps in their reasoning capabilities [1][7]

Group 1: Benchmark Overview
- Video-Holmes is a new benchmark for evaluating complex video reasoning, designed to address the shortcomings of existing benchmarks, whose video sources and questions are often overly simplistic [1][8]
- The benchmark comprises 270 short films, each 1-5 minutes long, and poses seven types of high-reasoning-demand questions that compel models to extract and connect multiple key pieces of information scattered throughout the videos [9]

Group 2: Model Performance
- All tested large models performed poorly, with none passing the benchmark, indicating a widespread deficiency in reasoning capability [5][6]
- Average scores across reasoning categories (e.g., Social Reasoning, Intent and Motive Chain, Time Causal Inference) were notably low, with the highest average score being 51.3, achieved by Gemini-2.5-Pro [6]

Group 3: Reasoning Process Analysis
- Analysis of reasoning traces showed that while models could correctly perceive visual information, they struggled significantly to link clues and often overlooked critical visual details [18]
- Specific reasoning errors were noted in which models misinterpreted interactions or failed to accurately assess relationships based on video content [15][16]

Group 4: Accessibility and Tools
- The Video-Holmes benchmark and its associated resources, including evaluation code and model-integration tools, are open-source and available on platforms such as GitHub and HuggingFace [19][20]
- Users interested in testing their models against the benchmark can access a comprehensive guide and the necessary setup and evaluation commands [19][20]