Chain of Thought (CoT)
Surpassing O4-mini, a Multimodal Large Model Finally Learns to Look Back and "See": Institute of Automation, Chinese Academy of Sciences Proposes the GThinker Model
机器之心· 2025-07-19 03:13
Core Viewpoint
- The article discusses the limitations of existing multimodal large models in flexible visual interpretation and introduces GThinker, a new model designed to enhance multimodal reasoning through a novel "Cue-Guided Rethinking" approach [1][3][10].

Group 1: Limitations of Existing Models
- Current multimodal models, despite recent advances, struggle with general scenarios that require flexible visual interpretation, often relying on knowledge-based reasoning without deeply verifying visual cues [1][8].
- Existing methods, including structured CoT and reinforcement learning, show significant limitations, particularly in correcting misinterpretations of visual cues mid-reasoning [8][10].

Group 2: Introduction of GThinker
- GThinker is developed by researchers from the Institute of Automation, Chinese Academy of Sciences, with the goal of universal multimodal reasoning [2].
- Its core innovation is the "Cue-Guided Rethinking" mode, which lets the model actively verify and correct its visual understanding during reasoning [3][10].

Group 3: Training Methodology
- GThinker uses a two-stage training process to instill the ability to rethink: a supervised fine-tuning phase built on 7,000 high-quality samples cold-starts the rethinking behavior [20][21].
- A mixed reward mechanism in the subsequent reinforcement learning stage then encourages active exploration across diverse tasks (a hedged sketch of such a reward follows this summary) [23][24].

Group 4: Performance Results
- GThinker achieves superior performance on the challenging M³CoT comprehensive reasoning benchmark, surpassing the latest O4-mini model, and reaches state-of-the-art results on various mathematical and knowledge reasoning tasks [4][26].
- Across multiple test scenarios, GThinker outperformed or matched existing advanced models, indicating that it learned to rethink effectively without over-specializing [28][30].
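The summary only says GThinker pairs an SFT cold start with a "mixed reward mechanism" in RL; it does not spell the reward out. A minimal sketch of what such a mixed reward could look like, combining a correctness term with a small bonus for an explicit rethinking segment. The `<answer>`/`<rethink>` tags and the 0.2 weight are illustrative assumptions, not GThinker's actual design:

```python
import re

def mixed_reward(response: str, gold_answer: str) -> float:
    """Hypothetical mixed reward for rethinking-style RL.

    The summary only states that GThinker uses a mixed reward; the tag
    format and weights below are assumptions for illustration.
    """
    # Accuracy term: does the final answer match the reference?
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    accuracy = 1.0 if match and match.group(1).strip() == gold_answer.strip() else 0.0

    # Format term: did the model explicitly revisit its visual cues?
    rethought = bool(re.search(r"<rethink>.*?</rethink>", response, re.DOTALL))
    return accuracy + (0.2 if rethought else 0.0)
```

A shaped reward of this form would let the RL stage reinforce both getting the answer right and actually exercising the rethinking behavior instilled during SFT.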
Breaking | Jason Wei, Founding Author of Chain of Thought, Reportedly Joins Meta; 机器之心 Exclusively Confirms: His Slack Account Is Gone
机器之心· 2025-07-16 02:22
Core Viewpoint
- Meta continues to recruit top talent from OpenAI, with notable researchers Jason Wei and Hyung Won Chung reportedly leaving OpenAI to join Meta [1][2][4].

Group 1: Talent Acquisition
- Jason Wei and Hyung Won Chung, both prominent researchers at OpenAI, are confirmed to be leaving for Meta, with their Slack accounts already deactivated [2][4].
- Jason Wei is recognized as a key author of the Chain of Thought (CoT) concept, which has significantly influenced the AI large model field [4][6].
- Hyung Won Chung has been a core contributor to OpenAI's projects, including the o1 model, and has a strong background in large language models [4][29].

Group 2: Contributions and Impact
- Jason Wei's work includes leading early efforts in instruction tuning and contributing to research on the emergent capabilities of large models, with over 77,000 citations on Google Scholar [21][16].
- Hyung Won Chung played a critical role in the development of major projects like PaLM and BLOOM during his time at Google, and later contributed to the o1 series models at OpenAI [26][40].
- Both researchers have been influential in advancing the capabilities of AI systems, particularly in reasoning and information retrieval [38][40].

Group 3: Community Reaction
- Following the news of their potential move to Meta, the online community has expressed excitement and congratulations towards Jason Wei, indicating strong interest in the career transition [10][9].
ACL 2025 | Why Does the Prompt You Designed Succeed? A New Theory Reveals the Mystery and Efficacy of Prompt Design for Large Models
机器之心· 2025-06-16 04:04
Core Insights
- The article discusses the importance of prompt design in enhancing the performance of large language models (LLMs) on complex reasoning tasks, emphasizing that effective prompts can significantly improve model accuracy and efficiency [2][7][36].
- A theoretical framework is proposed to quantify the complexity of the prompt search space, moving prompt engineering from an empirical practice toward a more scientific approach [5][35].

Group 1: Prompt Design and Its Impact
- The effectiveness of prompt engineering has historically been viewed as somewhat mystical, with certain combinations yielding significant performance boosts while others fall short [7].
- Prompts serve as critical "selectors" in the chain-of-thought (CoT) reasoning process, guiding the model in extracting relevant information from its internal hidden states [12][36].
- The study finds that the choice of prompt template directly influences the reasoning performance of LLMs, with optimal prompt designs yielding performance improvements exceeding 50% [29][36].

Group 2: Theoretical Framework and Experimental Evidence
- The research introduces a systematic approach to finding optimal prompts by decomposing the CoT reasoning process into two interconnected search spaces: the prompt space and the answer space (a sketch of this two-stage search follows this summary) [22][35].
- Experimental results demonstrate that the introduction of CoT mechanisms allows LLMs to perform recursive computations, which are essential for tackling multi-step reasoning tasks [26][30].
- The study highlights that well-designed prompts can effectively dictate the output of each reasoning step, ensuring that only the most relevant information is carried into subsequent computations [28][36].

Group 3: Limitations and Future Directions
- Relying solely on generic prompts can severely limit model performance on complex tasks, indicating the need for tailored prompt designs [36].
- Variants of CoT, such as Tree-of-Thought (ToT) and Graph-of-Thought (GoT), can improve performance but remain constrained by the underlying prompt templates they use [32][33].
- The findings underscore the need for a deeper understanding of task requirements in order to design prompts that effectively guide LLMs in extracting and using core information [23][35].
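The two-space framing suggests a simple operational recipe: search the prompt space first by scoring candidate templates on a small labeled dev set, then search the answer space under the winning template. A minimal sketch of that first stage, assuming a generic `query_llm` completion callable and three illustrative templates (none of this is from the paper's code):

```python
from typing import Callable

# Illustrative prompt space: three candidate templates (assumptions,
# not templates taken from the paper).
PROMPT_SPACE = [
    "Answer directly: {question}",
    "Let's think step by step. {question}",
    "First list the relevant facts, then reason to the answer. {question}",
]

def best_template(dev_set: list[tuple[str, str]],
                  query_llm: Callable[[str], str]) -> str:
    """Search the prompt space: keep the template whose completions most
    often contain the gold answer on a labeled dev set."""
    def accuracy(template: str) -> float:
        hits = sum(
            gold.lower() in query_llm(template.format(question=q)).lower()
            for q, gold in dev_set
        )
        return hits / len(dev_set)
    return max(PROMPT_SPACE, key=accuracy)
```

Fixing the template this way before decoding answers mirrors the paper's claim that the template choice bounds what the answer-space search can recover.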
New Reflections on Models from Lilian Weng, Peking University Alumna and Former VP of Safety at OpenAI: Why We Think
Founder Park· 2025-05-18 07:06
Core Insights
- The article reviews recent advances in using "thinking time" at test time and the mechanisms behind it, aiming to improve model performance on complex cognitive tasks such as logical reasoning, long-text comprehension, mathematical problem solving, and code generation and debugging [4][5].

Group 1: Motivating Models to Think
- The core idea mirrors human thinking: complex problems require time for reflection and analysis [9].
- Daniel Kahneman's dual-process theory divides human thinking into two systems: fast thinking, which is quick and intuitive, and slow thinking, which is deliberate and logical [9][13].
- In deep learning, a neural network is characterized by the computation and storage it uses in each forward pass, suggesting that spending more of these resources at inference can improve performance [10].

Group 2: Thinking in Tokens
- Generating intermediate reasoning steps before producing a final answer has evolved into a standard technique, particularly for mathematical problem solving [12].
- The "scratchpad" concept lets models treat generated intermediate tokens as temporary working content for reasoning, which gave rise to the term "chain of thought" (CoT) [12].

Group 3: Enhancing Reasoning Capabilities
- CoT prompting significantly improves success rates on mathematical problems, with larger models benefiting more from increased "thinking time" [16].
- Two main strategies improve generation quality, parallel sampling and sequential revision, each with its own advantages and challenges (a sketch of both follows this summary) [18][19].

Group 4: Self-Correction and Reinforcement Learning
- Recent research has successfully used reinforcement learning (RL) to strengthen language models' reasoning, particularly on STEM tasks [31].
- The DeepSeek-R1 model, designed for high-complexity tasks, uses a two-stage training process that combines supervised fine-tuning and reinforcement learning [32].

Group 5: External Tools and Enhanced Reasoning
- External tools, such as code interpreters, can efficiently solve intermediate steps in a reasoning process, extending what language models can do [45].
- The ReAct method interleaves external operations with reasoning trajectories, allowing models to bring external knowledge into their reasoning paths [48][50].

Group 6: Monitoring and Trustworthiness of Reasoning
- Monitoring CoT can effectively detect inappropriate behaviors in reasoning models, such as reward hacking, and improves robustness to adversarial inputs [51][53].
- The article highlights the importance of models faithfully expressing their reasoning, since biases can creep in from training data or human-written examples [55][64].
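Group 3 contrasts the two test-time strategies. A minimal sketch of both, assuming `sample` and `critique` callables that wrap any chat-completion API; the callables and the stopping rule are assumptions for illustration, not code from the blog post:

```python
from collections import Counter
from typing import Callable

def parallel_sampling(question: str, sample: Callable[[str], str],
                      n: int = 16) -> str:
    """Self-consistency style parallel sampling: draw n independent CoT
    samples at temperature > 0 and majority-vote on the final answers."""
    answers = [sample(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def sequential_revision(question: str, sample: Callable[[str], str],
                        critique: Callable[[str, str], str],
                        rounds: int = 3) -> str:
    """Sequential revision: critique the current draft and fold the
    feedback back in, stopping when the critic finds nothing to fix."""
    answer = sample(question)
    for _ in range(rounds):
        feedback = critique(question, answer)
        if "no issues" in feedback.lower():  # illustrative stopping rule
            break
        answer = sample(f"{question}\nPrevious answer: {answer}\n"
                        f"Feedback: {feedback}\nRevise the answer.")
    return answer
```

The trade-off matches the summary: parallel sampling is trivially parallelizable but discards failed samples, while sequential revision reuses earlier work at the cost of serialized, slower calls.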
Just In! Peking University Alumna Lilian Weng's Latest Blog Post: Why We Think
机器之心· 2025-05-18 04:25
Core Insights
- The article discusses advances in using "thinking time" during inference to improve the reasoning capabilities of AI models such as GPT, Claude, and Gemini [2][3][16].

Group 1: Thinking Mechanisms
- "Thinking time" is analogous to human cognition, where complex problems require reflection and analysis before a solution is reached [6].
- Daniel Kahneman's dual-process theory divides human thinking into fast (System 1) and slow (System 2) modes, emphasizing the role of slower, deliberate thought in accurate decision-making [12].

Group 2: Computational Resources
- In deep learning, a neural network is characterized by the computation and storage it uses in each forward pass, which bounds its per-step capability [8].
- Model efficiency can improve when more computation is spent at inference, particularly through strategies like chain-of-thought (CoT) prompting [8][18].

Group 3: Chain of Thought (CoT) and Learning Strategies
- CoT prompting markedly raises success rates on mathematical problems, with larger models benefiting more from extended "thinking time" [16].
- Early research focused on supervised learning from human-written reasoning paths and later evolved into reinforcement learning strategies that strengthen CoT reasoning [14][41].

Group 4: Test-Time Computation Strategies
- The two main strategies for improving generation quality are parallel sampling and sequential revision, each with distinct advantages and challenges [19][20].
- Parallel sampling is straightforward but relies on the model producing a correct answer in one pass, while sequential revision allows targeted corrections at the cost of speed [20][21].

Group 5: Reinforcement Learning Applications
- Recent studies have successfully applied reinforcement learning to strengthen reasoning in language models, particularly on STEM tasks [41][46].
- Training often begins with a cold-start phase followed by reasoning-oriented reinforcement learning that optimizes performance through structured feedback [42][43].

Group 6: External Tools and Integration
- Offloading certain computational steps to external tools, such as code interpreters or APIs, can enhance the reasoning process [52][56].
- The ReAct method combines external operations with reasoning trajectories, allowing models to incorporate external knowledge into their inference paths (a sketch of this loop follows this summary) [56][57].

Group 7: Model Interpretability and Trustworthiness
- CoT also matters for interpretability, since it exposes intermediate reasoning that can be monitored and understood [59].
- However, there are concerns about the faithfulness of CoT outputs, as biases and errors can undermine the reliability of the stated reasoning [62][64].

Group 8: Adaptive Computation and Token Utilization
- Adaptive computation time lets models dynamically adjust the number of computation steps during inference, enhancing their reasoning capabilities [81].
- Introducing special tokens, such as thinking tokens, can buy additional processing time and improve performance on complex tasks [85][89].
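Group 6's ReAct pattern interleaves reasoning text with tool calls. A minimal sketch of the loop, assuming the model emits `Thought:` / `Action: tool[input]` / `Final Answer:` lines; the tag format, tool registry, and step budget are assumptions about the general pattern, not the blog's exact interface:

```python
import re
from typing import Callable

def react_loop(question: str, llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    """ReAct-style loop: alternate model reasoning with tool observations
    until the model produces a final answer or the step budget runs out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # expected to emit Thought/Action or Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action and action.group(1) in tools:
            # Run the named tool and feed its output back as an observation.
            observation = tools[action.group(1)](action.group(2))
            transcript += f"Observation: {observation}\n"
    return transcript  # budget exhausted; return the trace for inspection
```

Because each observation is appended to the transcript before the next model call, external knowledge enters the inference path exactly as the summary describes.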