Reasoning
Zhipu founder Tang Jie on DeepSeek: Truly stunning, it opened a new "AI gets things done" paradigm
Xin Lang Cai Jing· 2026-01-10 13:54
Core Viewpoint
- The emergence of DeepSeek in early 2025 was a significant and surprising development in the AI field, prompting a reevaluation of the direction of AI advancements [2][5]

Group 1: AI Development Paradigms
- The current paradigm of AI, focused on chat capabilities, may be nearing its limits, with future advancements likely to be more about engineering and technical challenges [2][5]
- A new paradigm is proposed in which AI enables individuals to accomplish specific tasks, moving beyond mere conversational capability to practical applications [2][5]

Group 2: Company Innovations
- The company, under the leadership of founder Tang Jie, has chosen to integrate AI capabilities across Coding, Agentic, and Reasoning, aiming for balanced development rather than isolating these abilities [2][5]
- Following the release of GLM-4.5 on July 28, 2025, the company achieved leadership on 12 domestic benchmarks, and the recent GLM-4.7 shows significant improvements in Agent and Coding capabilities over its predecessors GLM-4.6 and GLM-4.5 [3][6]
Hard Won Lessons from Building Effective AI Coding Agents – Nik Pash, Cline
AI Engineer· 2025-12-05 22:02
Most of what’s written about AI agents sounds great in theory — until you try to make them work in production. The seductive ideas (multi-agent orchestration, RAG, prompt stacking) often collapse under real-world constraints. Why? Because they optimize for the wrong thing. In this talk, Nik Pash shares hard-won lessons from building large-scale coding agents at Cline — what failed, what survived, and why the next leap forward won’t come from clever scaffolds, but from evals and environments that truly measu ...
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-03 21:07
Grok Rankings Update — December 3

xAI continues to lead across multiple benchmarks and usage categories. Here is the full breakdown:

## Grok 4.1 Fast — The Agentic Model

Specialized for tool-calling, long-context, and high-speed performance.

- #1 on τ²-Bench Telecom (Agentic Tool Use Benchmark)
- #1 on Berkeley Function Calling Benchmark
- #1 in Programming Category (Overall Token Share)
- #1 in Multilingual Usage (Overall Token Share)
- #2 on OpenRouter Overall Leaderboard (By Token Usage)

## Grok Code Fast 1 — The Market Do ...
Gemini 3: Reasoning with voxel art
Google DeepMind· 2025-11-18 16:01
Gemini 3 is great at creating beautiful voxel art, like this eagle that it built from a single prompt. It's also great at reasoning. So if we break this model down to individual voxels, we can then ask Gemini 3 to build something completely new with it. So that might be a cat or a rabbit or even two eagles. ...
X @Anthropic
Anthropic· 2025-11-04 00:32
Reasoning Performance of LLMs
- LLMs' reasoning performance drops significantly when reasoning is obfuscated using simple ciphers [1]
- LLMs struggle to reason accurately in ciphered text, except with simple ciphers [1]

Ciphered Language Processing
- LLMs can easily decode ciphered text into English [1]
- The research tested 10 LLMs across 28 ciphers [1]
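The excerpt does not name the 28 ciphers tested. As a minimal sketch of the setup, the snippet below obfuscates a reasoning problem with ROT13, assuming it counts among the "simple ciphers"; the sample question is our own illustration, not one from the study.

```python
import codecs

# Obfuscate a reasoning problem with ROT13, one of the simplest
# substitution ciphers. The question text is an illustrative example,
# not taken from the study.
question = "If Alice has 3 apples and buys 2 more, how many does she have?"
ciphered = codecs.encode(question, "rot13")

print(ciphered)
# Asking a model to decode this back to English is reportedly easy;
# asking it to reason and answer *in the ciphered text itself* is
# where accuracy drops.
```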
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The content encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- The content introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
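The rest of the thread is truncated here, but the core mechanic of GRPO (Group Relative Policy Optimization) is compact enough to sketch: sample a group of completions for the same prompt, score each with a reward function, and normalize the rewards within the group to obtain advantages, so no learned value model (critic) is needed. A minimal NumPy sketch; the function name and the binary correctness reward are illustrative assumptions, not code from the thread.

```python
import numpy as np

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """Compute GRPO advantages for one group of sampled completions.

    Each completion's advantage is its reward relative to the group:
    (reward - group mean) / group std. Because the baseline comes from
    the group itself, no separate value model is required.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 completions for one prompt, scored 1.0 if the final
# answer was correct and 0.0 otherwise (a verifiable reward).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # ~[ 1. -1. -1.  1.]
```

These per-completion advantages then weight the policy-gradient update for every token of the corresponding completion.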
X @Avi Chawla
Avi Chawla· 2025-10-21 19:56
Core Problem & Solution
- Current LLM techniques struggle to maintain focus on crucial rules and context in long conversations, leading to hallucinations and inconsistent behavior [1][2][5]
- Attentive Reasoning Queries (ARQs) solve this by guiding LLMs with explicit, domain-specific questions encoded as targeted queries inside a JSON schema (see the sketch after this list) [3][4]
- ARQs reinstate critical instructions and facilitate auditable, verifiable intermediate reasoning steps [4][6]
- ARQs outperform Chain-of-Thought (CoT) reasoning and direct response generation, achieving a 90.2% success rate across 87 test scenarios [6][8]

Implementation & Application
- ARQs are implemented in Parlant, an open-source framework [6]
- ARQs are integrated into modules such as the guideline proposer, tool caller, and message generator [8]
- Making reasoning explicit, measurable, and domain-aware helps LLMs reason with intention, especially in high-stakes or multi-turn scenarios [7]
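Parlant's actual schema is not shown in this excerpt; the sketch below only illustrates the idea, with hypothetical field names. Each schema key is a targeted, domain-specific question the model must answer before composing its final reply, which is what makes the intermediate reasoning auditable.

```python
import json

# Illustrative ARQ block; field names are hypothetical, not Parlant's
# real schema. Each property is a pointed question the model answers
# in writing before it is allowed to produce the final response.
ARQ_SCHEMA = {
    "type": "object",
    "properties": {
        "active_guideline": {
            "type": "string",
            "description": "Which policy rule applies to the latest user turn?",
        },
        "user_identity_verified": {
            "type": "boolean",
            "description": "Has the user completed identity verification?",
        },
        "missing_information": {
            "type": "string",
            "description": "What required detail, if any, is still missing?",
        },
        "final_response": {
            "type": "string",
            "description": "The reply to send, consistent with the answers above.",
        },
    },
    "required": ["active_guideline", "user_identity_verified", "final_response"],
}

def build_arq_prompt(conversation: str) -> str:
    """Embed the ARQ schema in the prompt so each reasoning step is
    explicit and can be checked against the conversation afterwards."""
    return (
        "Fill in the following JSON schema, answering every question "
        "before writing final_response:\n"
        f"{json.dumps(ARQ_SCHEMA, indent=2)}\n\n"
        f"Conversation so far:\n{conversation}"
    )
```

Because the answers come back as structured JSON, a runtime can verify them (for example, refuse to send final_response while user_identity_verified is false) instead of trusting free-form chain-of-thought.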
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
If you found it insightful, reshare it with your network.

Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: https://t.co/eJnOXv0abS ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%.

Here's the core problem with current techniques that this new approach solves: we have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long, multi-turn conversation. For instance, when you give Agents a 2,000-word system prompt filled with policies, tone r ...