Chain-of-Thought Reasoning

ICCV 2025 | UV-CoT: A New Breakthrough in Unsupervised Visual Reasoning, with Preference Optimization Reshaping Image-Level Chain-of-Thought
机器之心· 2025-07-28 04:24
Core Viewpoint
- The article introduces UV-CoT, a novel unsupervised visual reasoning framework that improves model reasoning ability and interpretability on visual understanding tasks by leveraging a chain-of-thought (CoT) approach [2][3][25].

Group 1: Background and Challenges
- Existing models rely on supervised fine-tuning (SFT) strategies that require extensive labeled data, leading to high annotation costs and limited scalability [6][7].
- SFT methods face challenges such as the high labor cost of annotating key image regions and reasoning paths, and limited generalization due to reliance on a single type of training signal [7].

Group 2: UV-CoT Framework
- UV-CoT is designed to mimic human visual understanding by following a "key regions → reasoning process" pattern, employing an unsupervised data generation and preference optimization mechanism [4][3].
- The framework uses an automated preference data generation and evaluation pipeline, guided by an improved preference optimization algorithm called Score-DPO (sDPO), to achieve unsupervised image-level chain-of-thought learning (a minimal sketch of such an objective follows this summary) [8][11].

Group 3: Methodology
- UV-CoT generates diverse intermediate reasoning responses for image-question pairs using a target model and an evaluation model; the evaluator scores the selected regions and their impact on subsequent answers [13].
- The preference dataset is constructed by randomly sampling preference pairs from the generated responses, retaining the highest-scoring response chains for further reasoning [14].

Group 4: Performance and Results
- UV-CoT delivers significant gains over existing supervised chain-of-thought models, outperforming models such as Visual-CoT-7B and LLaVA-1.5-7B across six benchmarks [20][22].
- UV-CoT's self-evaluation capability yields high-quality bounding boxes, surpassing LLaVA-1.5-7B by 4.8% and approaching the performance of the 12B-parameter OmniLMM-12B [23].

Group 5: Conclusion
- UV-CoT offers an innovative approach to unsupervised visual reasoning, eliminating dependence on manual annotations and enabling automatic identification of key image regions and optimization of reasoning over them, laying a solid foundation for future research in unsupervised visual understanding [25].
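The sDPO idea described above can be illustrated with a short sketch: a standard DPO preference loss in which each preference pair is additionally weighted by the evaluator's score gap. The weighting scheme below is an assumption for illustration; the article does not give UV-CoT's exact formulation.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a Score-DPO (sDPO)-style loss: standard DPO, with each
# preference pair weighted by the evaluator score gap. The weighting is an
# assumed design choice, not UV-CoT's published formula.
def sdpo_loss(logp_chosen, logp_rejected,
              ref_logp_chosen, ref_logp_rejected,
              score_chosen, score_rejected, beta=0.1):
    # Implicit reward margin between chosen and rejected responses (as in DPO).
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    per_pair = -F.logsigmoid(margin)  # standard DPO per-pair loss
    # Larger evaluator score gap -> larger weight on that pair (assumption).
    weight = torch.sigmoid(score_chosen - score_rejected)
    return (weight * per_pair).mean()

# Usage with a dummy batch of 4 preference pairs:
b = 4
loss = sdpo_loss(torch.randn(b), torch.randn(b),
                 torch.randn(b), torch.randn(b),
                 torch.tensor([0.9, 0.8, 0.7, 0.95]),
                 torch.tensor([0.3, 0.5, 0.2, 0.60]))
```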
3D Chip Stacking: New Approaches
半导体行业观察· 2025-07-01 01:03
Core Viewpoint
- The next major leap in semiconductor packaging will require a series of new technologies, processes, and materials that together deliver an order-of-magnitude performance improvement, which is crucial for the AI era [1].

Group 1: Advances in Cooling Technologies
- Chip-level liquid cooling is emerging as forced-air cooling reaches its limits, with up to 40% of power consumed by current delivery and heat dissipation [4].
- TSMC's silicon-integrated micro-cooler (IMEC-Si) is undergoing reliability testing, designed to handle over 3,000 watts of uniform power dissipation under specific conditions [6].
- Demand for direct liquid cooling is rising, with innovative concepts such as using chips as coolants being proposed [7].

Group 2: Hybrid Bonding and Interconnects
- Hybrid bonding with fine-pitch multilayer redistribution layers (RDL) is gaining attention as a cost-effective solution for high-speed interconnects [14].
- Intel's hybrid bonding can achieve pitches as small as 1 µm, which is critical for advanced applications [5][17].
- A transition from traditional dielectric materials to polymer/copper hybrid bonding is being explored to enhance performance [16].

Group 3: Backside Power Delivery
- Backside power delivery significantly reduces the voltage drop associated with transistor power supply, but it also exacerbates heat problems (a back-of-envelope IR-drop sketch follows this summary) [19].
- IBM has developed an anisotropic model for precise heat-transfer calculations in backend stacks, underscoring the importance of thermal considerations in design [21].
- Implementing backside power delivery is expected to reduce thermal losses by 10% to 30% [23].

Group 4: Co-Packaged Optical Devices
- Demand for faster data networks is driving the integration of optical engines with GPUs and HBM in a single package, significantly increasing data transmission speeds [26].
- Co-packaged optics (CPO) is expected to achieve a 32-fold increase in bandwidth by bringing optical engines closer to processors [26].
- Challenges remain in thermal management and warpage sensitivity for CPO implementations [28].
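Why shorter backside power paths cut resistive losses can be seen with a quick calculation: delivery-network loss scales linearly with path resistance (P = I²R), so a 30% resistance reduction yields a 30% loss reduction, consistent with the 10% to 30% range cited above. All numbers below are hypothetical placeholders, not figures from TSMC, Intel, or IBM.

```python
# Hypothetical back-of-envelope comparison of frontside vs. backside power
# delivery losses. All values are illustrative assumptions, not vendor data.

def resistive_loss_w(current_a: float, resistance_ohm: float) -> float:
    # Power dissipated in the delivery network: P = I^2 * R.
    return current_a ** 2 * resistance_ohm

chip_current = 500.0   # amps, assumed high-power AI accelerator
frontside_r = 50e-6    # ohms, longer frontside delivery path (assumed)
backside_r = 35e-6     # ohms, shorter backside path (assumed, 30% lower)

p_front = resistive_loss_w(chip_current, frontside_r)  # 12.50 W
p_back = resistive_loss_w(chip_current, backside_r)    # 8.75 W
print(f"Delivery loss: {p_front:.2f} W -> {p_back:.2f} W "
      f"({1 - p_back / p_front:.0%} reduction)")
```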
Sweeping Wins Across 8 Datasets! Chain-of-Thought Reasoning Raises the Performance Ceiling of Graph Learning
量子位· 2025-06-08 03:40
Contributed by the GCoT team. 量子位 | Official account QbitAI

Can graph neural networks get even smarter? Chain-of-thought prompt learning has arrived!

Because graph data has a complex nonlinear structure and lacks textual information, the chain-of-thought (CoT) prompting methods used with language models are hard to apply to graphs directly.

Motivated by this, researchers from Singapore Management University and the University of Science and Technology of China propose GCoT, the first CoT-style prompt learning framework for text-free graph data.

Experiments show that GCoT comprehensively outperforms existing SOTA methods on few-shot node classification and graph classification across eight graph datasets, with the most pronounced gains in extreme few-shot settings of 1 to 5 shots.

How GCoT works

The core idea of GCoT is to split the downstream inference process into multiple inference steps.

Overall framework: the researchers divide chain-of-thought prompt learning into three parts:

2. Thought construction: to make effective use of multi-layer structural information, the researchers take a weighted sum of each layer's embeddings to obtain a fused "thought" (a minimal sketch follows this article).

3. Thought-conditioned prompt learning: the designed "thought" captures the structural knowledge of each node in the graph and is used to guide the next inference step. Since each node may have different characteristics ...

The researchers conducted comprehensive experiments on eight public datasets to evaluate and analyze GCoT.
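The thought-construction step described above can be sketched in a few lines: fuse per-layer GNN embeddings with learned weights, then map the fused "thought" to a prompt that conditions the next inference step. Layer counts, dimensions, and the conditioning scheme here are illustrative assumptions, not the paper's exact design, and the `gnn(..., return_all_layers=True)` call in the usage comment is a hypothetical API.

```python
import torch
import torch.nn as nn

class ThoughtConstruction(nn.Module):
    """Minimal sketch of GCoT-style thought construction (assumed design)."""

    def __init__(self, num_layers: int, dim: int):
        super().__init__()
        # One learnable mixing weight per GNN layer.
        self.layer_weights = nn.Parameter(torch.ones(num_layers))
        # Maps the fused thought to a per-node prompt vector.
        self.prompt_head = nn.Linear(dim, dim)

    def forward(self, layer_embeddings: list[torch.Tensor]) -> torch.Tensor:
        # layer_embeddings: list of [num_nodes, dim] tensors, one per GNN layer.
        w = torch.softmax(self.layer_weights, dim=0)
        thought = sum(w[i] * h for i, h in enumerate(layer_embeddings))
        # Thought-conditioned prompt: guides the next inference step.
        return self.prompt_head(thought)

# Usage over a frozen pre-trained GNN `gnn` (hypothetical API):
# h_layers = gnn(x, edge_index, return_all_layers=True)
# prompt = ThoughtConstruction(len(h_layers), h_layers[0].size(-1))(h_layers)
# h_layers_2 = gnn(x + prompt, edge_index, return_all_layers=True)
```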
Haitai Ruisheng 20250605
2025-06-06 02:37
Summary of Haitai Ruisheng Conference Call

Company Overview
- **Company**: Haitai Ruisheng
- **Industry**: AI and Data Processing

Key Financial Performance
- In 2024, Haitai Ruisheng achieved a net profit of 11.34 million yuan, returning to profitability, with operating cash flow of 28.73 million yuan, driven by increased multimodal data orders and improved gross margins on high-margin products and customized services [2][3][4]
- Total revenue for 2024 reached 237 million yuan, a year-on-year increase of 39.45%, with a gross margin of 66.46%, up 10.45 percentage points [3][4]
- Net profit improved by 41.72 million yuan compared with the previous year [3]

Strategic Initiatives
- The company is actively expanding overseas, particularly in the smart driving sector, in line with automotive companies' international expansion [2][5]
- Haitai Ruisheng is concentrating R&D investment on smart driving data processing platforms and intelligent data operation platforms, with significant advances in algorithm reserves and inference frameworks [2][6]

Technological Innovations
- The company follows a technology-led strategy, emphasizing R&D to overcome technical bottlenecks and scale the production of training data [2][7]
- Innovations in smart driving annotation include multi-frame point cloud overlay and object tracking algorithms, which improve annotation efficiency and support the transition to 4D annotation (a minimal sketch of multi-frame overlay follows this summary) [2][8]
- The company has developed an in-house SLAM algorithm to optimize 4D point cloud annotation of parking scenes, addressing complex 3D environments [8][9]

Voice Recognition and Natural Language Processing
- In collaboration with Tsinghua University, Haitai Ruisheng launched the Dolphin training project to improve ASR accuracy for Eastern languages, processing 212,000 hours of high-quality data covering 40 Eastern languages and 22 Chinese dialects [3][10]
- The company has introduced over 150 new training data products, bringing its proprietary catalog to 1,716 products, and added 11 new languages to its smart voice offerings [10]

Future Plans
- For 2025, the company aims to keep driving growth through technology and product innovation, focusing on building an intelligent data management platform and developing automated data processing algorithms [12]
- It plans to expand its multimodal data product matrix and explore new areas such as embodied intelligence and vertical industry applications [12]

Market Positioning
- Haitai Ruisheng supports national digital economy strategies by collaborating with local governments and educational institutions on data governance and talent development [13]
- The company is also expanding its resource network in the finance, healthcare, and manufacturing sectors to improve data service capabilities [12][13]

Q1 2025 Financial Performance
- In Q1 2025, the company reported revenue of 69.81 million yuan, a 72% year-on-year increase, with a gross margin of 47.41% and a net profit of 370,000 yuan, an improvement of 1.01 million yuan over the prior-year period [14]
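The multi-frame point cloud overlay mentioned above can be illustrated with a short sketch: given per-frame ego poses (e.g., from a SLAM pipeline), transform each frame's points into a common reference frame and stack them into one dense cloud for annotation. Function and variable names are illustrative, not Haitai Ruisheng's actual pipeline.

```python
import numpy as np

def overlay_frames(clouds: list[np.ndarray], poses: list[np.ndarray]) -> np.ndarray:
    """Merge multiple LiDAR frames into the first frame's coordinate system.

    clouds: list of [N_i, 3] point arrays in each frame's local coordinates.
    poses:  list of [4, 4] world-from-frame homogeneous transforms, one per frame
            (assumed available from SLAM or odometry).
    """
    ref_from_world = np.linalg.inv(poses[0])
    merged = []
    for pts, world_from_frame in zip(clouds, poses):
        # Lift points to homogeneous coordinates: [N, 4].
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
        # frame -> world -> reference frame.
        ref_pts = (ref_from_world @ world_from_frame @ homo.T).T[:, :3]
        merged.append(ref_pts)
    # Dense overlaid cloud, ready for annotation tooling.
    return np.vstack(merged)
```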
An Early Look at Sebastian Raschka's New Book "Reasoning From Scratch": Demystifying the Foundations of Reasoning Models
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article discusses advances in the reasoning capabilities of large language models (LLMs) and introduces Sebastian Raschka's book "Reasoning From Scratch," which aims to provide practical insight into building reasoning models from the ground up [2][5][59].

Group 1: Definition and Importance of Reasoning in LLMs
- In the context of LLMs, reasoning refers to the model's ability to generate intermediate steps before arriving at a final answer, often described as chain-of-thought (CoT) reasoning [8][10].
- The distinction between reasoning and pattern matching is crucial: traditional LLMs rely primarily on statistical correlations rather than logical reasoning [23][25].
- Understanding reasoning methods is essential for enhancing LLMs' ability to tackle complex tasks, such as solving logic puzzles or multi-step arithmetic problems [5][39].

Group 2: Training Process of LLMs
- The typical LLM training process consists of two main phases: pre-training and fine-tuning [16][19].
- During pre-training, LLMs are trained on vast amounts of unlabeled text (up to several terabytes) to learn language patterns, which can cost millions of dollars and take months [17][21].
- Fine-tuning involves supervised fine-tuning (SFT) and preference fine-tuning to improve the model's ability to respond to user queries [20][21].

Group 3: Pattern Matching vs. Logical Reasoning
- LLMs learn to predict the next token from statistical patterns in the training data, which lets them generate coherent text but does not confer true understanding [23][24].
- In contrast, logical reasoning requires deriving conclusions step by step, identifying contradictions and causal relationships [25][26].
- The article notes that most LLMs do not actively identify contradictions but instead rely on patterns learned from training data [30][34].

Group 4: Enhancing Reasoning Capabilities
- LLM reasoning capabilities gained significant attention with the release of OpenAI's o1 model, which emphasizes a more human-like thought process [41][43].
- LLM reasoning can be enhanced through inference-time compute scaling, reinforcement learning, and knowledge distillation [44][46][48].
- Inference-time compute scaling, in particular, improves reasoning without updating the underlying model weights (a minimal sketch follows this summary) [46][48].

Group 5: Importance of Building Reasoning Models from Scratch
- Building reasoning models from scratch provides valuable insight into the capabilities, limitations, and computational trade-offs of LLMs [50][57].
- The shift toward reasoning models reflects a broader trend in the AI industry, emphasizing the need for models that can handle complex tasks effectively [52][55].
- Understanding the underlying mechanisms of LLMs and reasoning models is crucial for optimizing their performance across applications [57].
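One common form of inference-time compute scaling is self-consistency: sample several chain-of-thought completions and majority-vote on the final answer, spending more compute at inference instead of retraining. The sketch below assumes hypothetical `generate` and `extract_answer` callables standing in for a real model API and an answer-parsing step; it is not the book's specific implementation.

```python
from collections import Counter

def self_consistency(generate, extract_answer, prompt: str, n_samples: int = 8) -> str:
    """Sample n_samples reasoning chains and return the majority final answer.

    generate:       callable (prompt, temperature) -> completion text (assumed API).
    extract_answer: callable (completion) -> final answer string (assumed helper).
    """
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sampled reasoning chain can differ.
        completion = generate(prompt + "\nLet's think step by step.", temperature=0.8)
        answers.append(extract_answer(completion))
    # The most common final answer wins.
    return Counter(answers).most_common(1)[0][0]
```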