Reasoning
Zhipu founder Tang Jie on DeepSeek: "Very striking; it opened a new paradigm of AI getting things done"
Xin Lang Cai Jing· 2026-01-10 13:54
Tang Jie said that at the time he too began to reflect: under the DeepSeek paradigm, even if Zhipu executed as well as possible, on the Chat problem it might ultimately only reach rough parity with DeepSeek, perhaps a bit more personalized, an emotionally aware Chat; but on the whole that paradigm was probably "close to its end," and what remained were mostly engineering and technical problems. He was therefore also confronting the question of which direction AI should take next. "Our thinking at the time was that the new paradigm might be letting every person use AI to actually get something done. That could be the next paradigm: it used to be Chat, and now it is really doing things, so a new paradigm has opened," Tang Jie said. After this reflection, Zhipu ultimately chose to integrate AI Coding, Agentic, and Reasoning capabilities together, keeping the three relatively balanced, rather than splitting them off for separate research. According to Tang Jie, Zhipu's GLM-4.5, released on July 28, 2025, integrated the three capabilities of coding, reasoning, and agents and led domestically on 12 benchmarks; the recently released GLM-4.7 again delivered large improvements in Agent and Coding over the earlier GLM-4.6 and GLM-4.5. (Reporter: Wen Meng)
Hard Won Lessons from Building Effective AI Coding Agents – Nik Pash, Cline
AI Engineer· 2025-12-05 22:02
Most of what’s written about AI agents sounds great in theory — until you try to make them work in production. The seductive ideas (multi-agent orchestration, RAG, prompt stacking) often collapse under real-world constraints. Why? Because they optimize for the wrong thing. In this talk, Nik Pash shares hard-won lessons from building large-scale coding agents at Cline — what failed, what survived, and why the next leap forward won’t come from clever scaffolds, but from evals and environments that truly measu ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-03 21:07
Grok Rankings Update — December 3

xAI continues to lead across multiple benchmarks and usage categories. Here is the full breakdown:

## Grok 4.1 Fast — The Agentic Model
Specialized for tool-calling, long-context, and high-speed performance.
#1 on τ²-Bench Telecom (Agentic Tool Use Benchmark)
#1 on Berkeley Function Calling Benchmark
#1 in Programming Category (Overall Token Share)
#1 in Multilingual Usage (Overall Token Share)
#2 on OpenRouter Overall Leaderboard (By Token Usage)

## Grok Code Fast 1 — The Market Do ...
Gemini 3: Reasoning with voxel art
Google DeepMind· 2025-11-18 16:01
Gemini 3 is great at creating beautiful voxel art, like this eagle that it built from a single prompt. It's also great at reasoning. So if we break this model down into individual voxels, we can then ask Gemini 3 to build something completely new with them. That might be a cat, or a rabbit, or even two eagles. ...
X @Anthropic
Anthropic· 2025-11-04 00:32
Reasoning Performance of LLMs
- LLMs' reasoning performance drops significantly when reasoning is obfuscated using simple ciphers [1]
- LLMs struggle to reason accurately in ciphered text, except for simple ciphers [1]

Ciphered Language Processing
- LLMs can easily decode ciphered text to English [1]
- The research tested 10 LLMs across 28 ciphers [1]

(A minimal sketch of constructing such a ciphered eval item follows below.)
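For concreteness, the sketch below shows one way such a ciphered eval item could be constructed, assuming a Caesar shift as the "simple cipher". The prompts and the cipher choice are illustrative assumptions; the study's actual 28 ciphers and prompt templates are not reproduced here.

```python
# Minimal sketch of building a ciphered reasoning prompt, assuming a
# Caesar cipher (one of the "simple ciphers" the study refers to).
# The exact ciphers, prompts, and models used in the paper may differ.

def caesar_encode(text: str, shift: int = 3) -> str:
    """Shift alphabetic characters by `shift`, leaving everything else intact."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
ciphered = caesar_encode(question)

# Two eval conditions: (a) decoding only, which LLMs handle easily,
# and (b) reasoning *within* the ciphered text, where performance drops.
decode_prompt = f"Decode this Caesar-shifted text to English: {ciphered}"
reason_prompt = f"Answer in the same cipher, without decoding first: {ciphered}"
print(decode_prompt)
print(reason_prompt)
```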
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The content encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- The content introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
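The thread's code isn't captured in this snippet. As a rough anchor, the sketch below shows the step that gives GRPO (Group Relative Policy Optimization) its name: sampling several completions per prompt and normalizing each completion's reward against the group's mean and standard deviation to obtain advantages, with no learned value function. The helper names and the binary reward are illustrative assumptions, not taken from the thread.

```python
import numpy as np

# Minimal sketch of GRPO's group-relative advantage, the step that
# replaces a learned value function: each completion's reward is
# normalized against the other completions sampled for the *same* prompt.

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (group_size,), scores for one prompt's completions."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt, scored by a verifiable reward
# (e.g. 1.0 if the final answer is correct, else 0.0).
rewards = np.array([1.0, 0.0, 0.0, 1.0])
adv = grpo_advantages(rewards)
print(adv)  # correct completions get positive advantage, wrong ones negative

# The policy-gradient loss then weights each token's log-probability by its
# completion's advantage, with a clipped probability ratio as in PPO.
```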
X @Avi Chawla
Avi Chawla· 2025-10-21 19:56
Core Problem & Solution - Current LLM techniques struggle to maintain focus on crucial rules and context in long conversations, leading to hallucinations and inconsistent behavior [1][2][5] - Attentive Reasoning Queries (ARQs) solve this by guiding LLMs with explicit, domain-specific questions encoded as targeted queries inside a JSON schema [3][4] - ARQs reinstate critical instructions and facilitate auditable, verifiable intermediate reasoning steps [4][6] - ARQs outperform Chain-of-Thought (CoT) reasoning and direct response generation, achieving a 90.2% success rate across 87 test scenarios [6][8] Implementation & Application - ARQs are implemented in Parlant, an open-source framework [6] - ARQs are integrated into modules like guideline proposer, tool caller, and message generator [8] - Making reasoning explicit, measurable, and domain-aware helps LLMs reason with intention, especially in high-stakes or multi-turn scenarios [7]
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: https://t.co/eJnOXv0abS ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: We have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long, multi-turn conversation. For instance, when you give Agents a 2,000-word system prompt filled with policies, tone r ...