Reasoning
Zhipu founder Tang Jie on DeepSeek: "Very striking; it opened a new paradigm of AI getting things done"
Xin Lang Cai Jing· 2026-01-10 13:54
Tang Jie said that at the time he too began to reflect: under the DeepSeek paradigm, even if Zhipu executed as well as possible, on the Chat problem it might ultimately only reach rough parity with DeepSeek, perhaps a bit more personalized, an emotionally aware Chat; but on the whole that paradigm was probably "close to its end," and what remained were mostly engineering and technical problems. He was therefore also confronting the question of which direction AI should take next. "Our thinking at the time was that the new paradigm might be letting every person use AI to actually get something done. That could be the next paradigm: it used to be Chat, and now it is really doing things, so a new paradigm has opened," Tang Jie said. After this reflection, Zhipu ultimately chose to integrate AI Coding, Agentic, and Reasoning capabilities together, keeping the three relatively balanced, rather than splitting them off for separate research. According to Tang Jie, Zhipu's GLM-4.5, released on July 28, 2025, integrated the three capabilities of coding, reasoning, and agents and led domestically on 12 benchmarks; the recently released GLM-4.7 again delivered large improvements in Agent and Coding over the earlier GLM-4.6 and GLM-4.5. (Reporter: Wen Meng)
Hard Won Lessons from Building Effective AI Coding Agents – Nik Pash, Cline
AI Engineer· 2025-12-05 22:02
Most of what’s written about AI agents sounds great in theory — until you try to make them work in production. The seductive ideas (multi-agent orchestration, RAG, prompt stacking) often collapse under real-world constraints. Why? Because they optimize for the wrong thing. In this talk, Nik Pash shares hard-won lessons from building large-scale coding agents at Cline — what failed, what survived, and why the next leap forward won’t come from clever scaffolds, but from evals and environments that truly measu ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-03 21:07
Grok Rankings Update — December 3

xAI continues to lead across multiple benchmarks and usage categories. Here is the full breakdown:

## Grok 4.1 Fast — The Agentic Model
Specialized for tool-calling, long-context, and high-speed performance.
#1 on τ²-Bench Telecom (Agentic Tool Use Benchmark)
#1 on Berkeley Function Calling Benchmark
#1 in Programming Category (Overall Token Share)
#1 in Multilingual Usage (Overall Token Share)
#2 on OpenRouter Overall Leaderboard (By Token Usage)

## Grok Code Fast 1 — The Market Do ...
Gemini 3: Reasoning with voxel art
Google DeepMind· 2025-11-18 16:01
Gemini 3 is great at creating beautiful voxel art, like this eagle that it built from a single prompt. It's also great at reasoning. So if we break this model down into individual voxels, we can then ask Gemini 3 to build something completely new with them. That might be a cat, or a rabbit, or even two eagles. ...
X @Anthropic
Anthropic· 2025-11-04 00:32
Reasoning Performance of LLMs
- LLMs' reasoning performance drops significantly when reasoning is obfuscated using simple ciphers [1]
- LLMs struggle to reason accurately in ciphered text, except for simple ciphers [1]

Ciphered Language Processing
- LLMs can easily decode ciphered text to English [1]
- The research tested 10 LLMs across 28 ciphers [1]

(A minimal sketch of constructing such a ciphered eval item follows below.)
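For concreteness, the sketch below shows one way such a ciphered eval item could be constructed, assuming a Caesar shift as the "simple cipher". The prompts and the cipher choice are illustrative assumptions; the study's actual 28 ciphers and prompt templates are not reproduced here.

```python
# Minimal sketch of building a ciphered reasoning prompt, assuming a
# Caesar cipher (one of the "simple ciphers" the study refers to).
# The exact ciphers, prompts, and models used in the paper may differ.

def caesar_encode(text: str, shift: int = 3) -> str:
    """Shift alphabetic characters by `shift`, leaving everything else intact."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
ciphered = caesar_encode(question)

# Two eval conditions: (a) decoding only, which LLMs handle easily,
# and (b) reasoning *within* the ciphered text, where performance drops.
decode_prompt = f"Decode this Caesar-shifted text to English: {ciphered}"
reason_prompt = f"Answer in the same cipher, without decoding first: {ciphered}"
print(decode_prompt)
print(reason_prompt)
```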
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The content encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- The content introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
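The thread's code isn't captured in this snippet. As a rough anchor, the sketch below shows the step that gives GRPO (Group Relative Policy Optimization) its name: sampling several completions per prompt and normalizing each completion's reward against the group's mean and standard deviation to obtain advantages, with no learned value function. The helper names and the binary reward are illustrative assumptions, not taken from the thread.

```python
import numpy as np

# Minimal sketch of GRPO's group-relative advantage, the step that
# replaces a learned value function: each completion's reward is
# normalized against the other completions sampled for the *same* prompt.

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (group_size,), scores for one prompt's completions."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt, scored by a verifiable reward
# (e.g. 1.0 if the final answer is correct, else 0.0).
rewards = np.array([1.0, 0.0, 0.0, 1.0])
adv = grpo_advantages(rewards)
print(adv)  # correct completions get positive advantage, wrong ones negative

# The policy-gradient loss then weights each token's log-probability by its
# completion's advantage, with a clipped probability ratio as in PPO.
```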
X @Avi Chawla
Avi Chawla· 2025-10-21 19:56
Core Problem & Solution - Current LLM techniques struggle to maintain focus on crucial rules and context in long conversations, leading to hallucinations and inconsistent behavior [1][2][5] - Attentive Reasoning Queries (ARQs) solve this by guiding LLMs with explicit, domain-specific questions encoded as targeted queries inside a JSON schema [3][4] - ARQs reinstate critical instructions and facilitate auditable, verifiable intermediate reasoning steps [4][6] - ARQs outperform Chain-of-Thought (CoT) reasoning and direct response generation, achieving a 90.2% success rate across 87 test scenarios [6][8] Implementation & Application - ARQs are implemented in Parlant, an open-source framework [6] - ARQs are integrated into modules like guideline proposer, tool caller, and message generator [8] - Making reasoning explicit, measurable, and domain-aware helps LLMs reason with intention, especially in high-stakes or multi-turn scenarios [7]
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: https://t.co/eJnOXv0abS ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: We have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long, multi-turn conversation. For instance, when you give Agents a 2,000-word system prompt filled with policies, tone r ...