Logical Reasoning
Careless food preparation goes unpunished, and is even rewarded?
Xin Lang Cai Jing· 2026-01-01 23:53
Legend has it that in the Spring and Autumn period, a chef of the State of Wei named Shuo served the ruler roast meat on which a three-cun-long hair was found, and a maid named Mei served food to the ruler's wife in which there was a half-cun-long "cai" (the character on the bamboo slips, meaning a weed). The ruler and his wife were furious and wanted both punished. Shi You, the official who tried the case, said: "Shuo is without guilt, and Mei should be rewarded with clothing." The ruler, baffled, demanded to know whether Shi You had judged wrongly. Shi You said: I have carefully examined the knife Shuo used to cut the meat. It is newly forged and extremely sharp, and the cutting board is solid. With such a sharp knife on such a board, cutting fat beef severs sinew and muscle completely and leaves pieces less than a cun across; yet a three-cun hair was not cut through. That is unreasonable. I also examined the roasting tools: the mulberry charcoal is of excellent quality and the iron stove is sturdy. The blazing charcoal roasted the meat until it was quite charred, yet the three-cun hair was not scorched. That too is unreasonable. I further inspected the wife's kitchen: its walls are carefully plastered, its curtains are complete, there is no weed inside, and there is no passage through which wind could blow a weed in. I also looked at the rush mat Mei sleeps on: the mat is worn, its binding cords have snapped, and loose rush fragments remain; Mei's sleeves are ragged with the cotton wadding showing through, and rush stalks have stuck to the wadding, six of them half a cun long. Consider: the ruler's wife has a maid in ragged clothes who sleeps on such a mat, with rush stalks clinging to her clothes; if she prepares the wife's food, it is impossible that no weed would fall into the rice. I have brought the rush stalks stuck to the wadding of Mei's clothes and ask that they be compared with the weed in the rice. The ruler took out the weed from the rice and compared them: they were the same. Shi You said: I infer that the hair on the roast meat came from Your Lordship ordering someone to fan the fire this morning, so that the hair flew onto the meat ...
A new benchmark for multimodal reasoning! The strongest model, Gemini 2.5 Pro, scores only 60 — from Fudan, CUHK, Shanghai AI Lab, and others
量子位· 2025-06-06 13:45
Submitted by the MME team | 量子位 QbitAI

Logical reasoning is a core capability of human intelligence and a key capability for multimodal large language models (MLLMs). With the emergence of LLMs with strong reasoning abilities such as DeepSeek-R1, researchers have begun exploring how to bring reasoning into multimodal models.

However, most existing benchmarks lack a clear taxonomy of logical reasoning types, and their notion of logical reasoning is often fuzzy, frequently conflating perceptual ability or breadth of knowledge with reasoning ability.

Against this background, Fudan University and MMLab at the Chinese University of Hong Kong, together with the Shanghai AI Laboratory and several other institutions, propose MME-Reasoning, a benchmark for comprehensively evaluating the reasoning abilities of multimodal models. The results show that the best model scores only around 60%.

MME-Reasoning: comprehensively evaluating multimodal reasoning

Following Charles Sanders Peirce's taxonomy, reasoning falls into three types: deductive, inductive, and abductive. MME-Reasoning uses this taxonomy as the standard for its comprehensive evaluation of multimodal models. Deductive reasoning derives conclusions from rules and premises. Inductive reas ...
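To make the taxonomy concrete, here is a minimal sketch of how per-category accuracy might be aggregated for a benchmark organized along Peirce's three types. The item format, field names, and exact-match scoring rule are assumptions for illustration, not the actual MME-Reasoning evaluation code.

```python
from collections import defaultdict

def score_by_category(items):
    """Aggregate accuracy per reasoning category.

    items: list of dicts with 'category' (deductive / inductive /
    abductive), 'prediction', and 'answer'. Scoring here is simple
    exact match -- an assumption for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["category"]] += 1
        if item["prediction"] == item["answer"]:
            correct[item["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Invented toy results for four benchmark questions:
items = [
    {"category": "deductive", "prediction": "B", "answer": "B"},
    {"category": "inductive", "prediction": "A", "answer": "C"},
    {"category": "abductive", "prediction": "D", "answer": "D"},
    {"category": "deductive", "prediction": "A", "answer": "B"},
]
print(score_by_category(items))
# → {'deductive': 0.5, 'inductive': 0.0, 'abductive': 1.0}
```

Reporting accuracy per category, rather than one overall number, is what lets a benchmark like this expose which reasoning type a model is weakest at.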
An early look at Sebastian Raschka's new book "Reasoning From Scratch", on the foundations of reasoning models
机器之心· 2025-05-02 04:39
Core Viewpoint - The article discusses the advancements in reasoning capabilities of large language models (LLMs) and introduces the book "Reasoning From Scratch" by Sebastian Raschka, which aims to provide practical insights into building reasoning models from the ground up [2][5][59].

Group 1: Definition and Importance of Reasoning in LLMs
- Reasoning in the context of LLMs refers to the model's ability to generate intermediate steps before arriving at a final answer, often described as chain-of-thought (CoT) reasoning [8][10].
- The distinction between reasoning and pattern matching is crucial, as traditional LLMs primarily rely on statistical correlations rather than logical reasoning [23][25].
- Understanding reasoning methods is essential for enhancing LLMs' capabilities to tackle complex tasks, such as solving logical puzzles or multi-step arithmetic problems [5][39].

Group 2: Training Process of LLMs
- The typical training process for LLMs consists of two main phases: pre-training and fine-tuning [16][19].
- During pre-training, LLMs are trained on vast amounts of unlabelled text (up to several terabytes) to learn language patterns, which can cost millions of dollars and take months [17][21].
- Fine-tuning involves supervised fine-tuning (SFT) and preference fine-tuning to improve the model's ability to respond to user queries [20][21].

Group 3: Pattern Matching vs. Logical Reasoning
- LLMs learn to predict the next token based on statistical patterns in the training data, which allows them to generate coherent text but lacks true understanding [23][24].
- In contrast, logical reasoning requires the ability to derive conclusions step-by-step, identifying contradictions and causal relationships [25][26].
- The article highlights that most LLMs do not actively identify contradictions but instead rely on learned patterns from training data [30][34].
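The "pattern matching" point in Group 3 can be illustrated with a toy bigram model: it predicts the next token purely from co-occurrence counts in its training text, with no step-by-step reasoning at all. The corpus and tokens below are invented for illustration; real LLMs use neural networks over vastly larger data, but the next-token objective is the same in spirit.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token follows which in the training text."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    # Greedy decoding: pick the statistically most frequent continuation.
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # → cat  ("cat" follows "the" most often)
```

The model produces fluent-looking continuations from frequency alone; it never checks the prediction for consistency or contradiction, which is exactly the gap the book's reasoning methods aim to close.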
Group 4: Enhancing Reasoning Capabilities
- The reasoning capabilities of LLMs gained significant attention with the release of OpenAI's o1 model, which emphasizes a more human-like thought process [41][43].
- Enhancements to LLM reasoning can be achieved through inference-time compute scaling, reinforcement learning, and knowledge distillation [44][46][48].
- These methods aim to improve the model's reasoning ability without retraining the underlying model weights [46][48].

Group 5: Importance of Building Reasoning Models from Scratch
- Building reasoning models from scratch provides valuable insights into the capabilities, limitations, and computational trade-offs of LLMs [50][57].
- The shift towards reasoning models reflects a broader trend in the AI industry, emphasizing the need for models that can handle complex tasks effectively [52][55].
- Understanding the underlying mechanisms of LLMs and reasoning models is crucial for optimizing their performance in various applications [57].
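One common form of the inference-time compute scaling mentioned in Group 4 is self-consistency: sample several answers to the same question and keep the majority answer, spending more compute at inference without touching the model weights. The sketch below assumes the sampled answers are already collected; the values are invented for illustration.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among N sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# Pretend we sampled the same arithmetic question five times
# (in practice each string would come from a model call):
sampled = ["42", "42", "41", "42", "40"]
print(majority_vote(sampled))  # → 42
```

The trade-off is linear: N samples cost roughly N times the inference compute, but aggregating them tends to filter out one-off reasoning slips, which is why this family of methods scales accuracy with test-time budget rather than with training.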