Reasoning
Gemini 3: Reasoning with voxel art
Google DeepMind· 2025-11-18 16:01
Gemini 3 is great at creating beautiful voxel art, like this eagle that it built from a single prompt. It's also great at reasoning. So if we break this model down into individual voxels, we can then ask Gemini 3 to build something completely new with it. That might be a cat, or a rabbit, or even two eagles. ...
X @Anthropic
Anthropic· 2025-11-04 00:32
Reasoning Performance of LLMs
- LLMs' reasoning performance is significantly reduced when reasoning is obfuscated using ciphers [1]
- LLMs struggle to reason accurately in ciphered text, except for simple ciphers [1]

Ciphered Language Processing
- LLMs can easily decode ciphered text to English [1]
- The research involved testing 10 LLMs across 28 ciphers [1]
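As a concrete illustration of what "obfuscated reasoning" means here, the sketch below uses ROT13 as one example of a simple substitution cipher; the study's actual set of 28 ciphers is not reproduced here, so treat this as an illustrative stand-in.

```python
def rot13(text: str) -> str:
    """Apply the ROT13 substitution cipher (shift letters by 13 places).

    ROT13 is a classic "simple cipher": a model asked to reason in
    ROT13-encoded text must first recover the plaintext to reason well.
    """
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)  # digits, spaces, punctuation pass through
    return "".join(out)

ciphered = rot13("Let's think step by step.")
# ROT13 is its own inverse, so applying it twice restores the plaintext.
assert rot13(ciphered) == "Let's think step by step."
```

Because the alphabet shift is exactly half of 26, encoding and decoding are the same operation, which is part of what makes ROT13 one of the "simple" ciphers LLMs can still handle.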
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The content encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- The content introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
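The core idea behind GRPO (Group Relative Policy Optimization) is to sample a group of completions per prompt and score each one against the group's own statistics, replacing a learned value function. A minimal sketch of that group-relative advantage step (function names are illustrative, not from any library):

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages for one prompt's sampled completions.

    Each completion's reward is normalized by the group's mean and
    standard deviation; completions above the group average get a
    positive advantage, those below get a negative one.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Four sampled completions for the same prompt, scored by a reward model:
rewards = [1.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```

In a full training loop these advantages would weight the policy-gradient update for each completion's tokens; the sketch covers only the advantage computation that distinguishes GRPO from PPO-style baselines.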
X @Avi Chawla
Avi Chawla· 2025-10-21 19:56
Core Problem & Solution
- Current LLM techniques struggle to maintain focus on crucial rules and context in long conversations, leading to hallucinations and inconsistent behavior [1][2][5]
- Attentive Reasoning Queries (ARQs) solve this by guiding LLMs with explicit, domain-specific questions encoded as targeted queries inside a JSON schema [3][4]
- ARQs reinstate critical instructions and facilitate auditable, verifiable intermediate reasoning steps [4][6]
- ARQs outperform Chain-of-Thought (CoT) reasoning and direct response generation, achieving a 90.2% success rate across 87 test scenarios [6][8]

Implementation & Application
- ARQs are implemented in Parlant, an open-source framework [6]
- ARQs are integrated into modules like the guideline proposer, tool caller, and message generator [8]
- Making reasoning explicit, measurable, and domain-aware helps LLMs reason with intention, especially in high-stakes or multi-turn scenarios [7]
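To make the "targeted queries inside a JSON schema" idea concrete, here is a hypothetical ARQ-style prompt. The field names are illustrative assumptions, not Parlant's actual schema; the point is that the model must answer explicit, domain-specific questions in a fixed JSON structure before producing its final response.

```python
import json

# Hypothetical ARQ-style schema: each key is a targeted query the model
# must answer before responding, which reinstates critical instructions
# and makes intermediate reasoning auditable.
arq_schema = {
    "which_guidelines_apply": "List the guideline IDs relevant to this turn.",
    "what_did_the_user_just_ask": "Restate the user's latest request.",
    "is_a_tool_call_needed": "Answer yes/no; if yes, name the tool.",
    "final_response": "Write the reply, only after answering the above.",
}

prompt = (
    "Answer by filling in this JSON object, key by key, in order:\n"
    + json.dumps(arq_schema, indent=2)
)
print(prompt)
```

Because the model's output is itself JSON keyed by these queries, each intermediate step can be parsed and checked programmatically, which is what makes the reasoning "measurable" rather than free-form chain-of-thought.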
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: https://t.co/eJnOXv0abS ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: We have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long, multi-turn conversation. For instance, when you give Agents a 2,000-word system prompt filled with policies, tone r ...
From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
a16z· 2025-09-25 13:00
The big thing that we are targeting is producing an automated researcher, so automating the discovery of new ideas. The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant. And I was talking to some high schoolers, and they were saying, "Oh, you know, actually the default way to code is vibe coding." I do think, you know, the future hopefully will be vibe researching. Thanks for coming, Jakub and Mark. Jakub, you're the chief scientis ...
X @Avi Chawla
Avi Chawla· 2025-08-29 06:30
AI Agent Evolution
- The industry has progressed from simple LLMs to sophisticated agentic systems with reasoning, memory, and tool use [1]
- Early transformer-based chatbots were limited by small context windows, exemplified by ChatGPT's initial 4k-token limit [1]
- Context windows have since grown to handle many thousands of tokens, enabling parsing of larger documents and longer conversations [1]
- Retrieval-Augmented Generation (RAG) provided access to fresh and external data, enhancing LLM outputs [1]
- Multimodal LLMs can process multiple data types (text, images, audio), with memory introducing persistence across interactions [1]

Key Components of Advanced AI Agents
- Advanced AI agents are equipped with short-term, long-term, and episodic memory [1]
- Tool calling (search, APIs, actions) is a crucial feature of modern AI agents [1]
- Reasoning and ReAct-based decision-making are integral to the current AI agent era [1]
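The components above come together in a ReAct-style loop: the model alternates between choosing an action (a tool call) and observing the result, until it emits a final answer. A minimal sketch, where `call_llm`, the `{"action": ..., "input": ...}` step format, and the tool registry are all hypothetical stand-ins rather than any real agent framework's API:

```python
def run_agent(question: str, tools: dict, call_llm, max_steps: int = 5) -> str:
    """ReAct-style loop: reason -> act (tool call) -> observe -> repeat."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        # The model sees the accumulated history and picks the next step
        # (assumed to return a dict like {"action": ..., "input": ...}).
        step = call_llm("\n".join(history))
        if step["action"] == "final_answer":
            return step["input"]
        # Otherwise, invoke the named tool and feed the observation back.
        observation = tools[step["action"]](step["input"])
        history.append(f"Action: {step['action']}[{step['input']}]")
        history.append(f"Observation: {observation}")
    return "No answer within step budget."
```

Short-term memory here is just the growing `history` list; long-term and episodic memory would typically be additional tools (e.g. a retrieval store) the loop can call.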
AGI progress, surprising breakthroughs, and the road ahead — the OpenAI Podcast Ep. 5
OpenAI· 2025-08-15 16:01
AI Progress & AGI Definition
- OpenAI's research leadership sets the roadmap for the company, deciding on technical paths and long-term research directions [1]
- AI can now converse naturally and solve math problems, and the focus is shifting towards its real-world impact [1]
- The potential for automating the discovery and production of new technology is a key consideration for AI's impact [1][2]
- OpenAI seeks to create general intelligence, prioritizing the automated-researcher concept for significant technological advancements [2]
- The field is seeing incredible results in medicine, combining reasoning with domain knowledge and intuition [2]

Benchmarks & Evaluation
- Current benchmarks are facing saturation as models reach human-level performance on standardized intelligence measures [3]
- The field has developed data-efficient ways to train for specific abilities, making benchmarks less representative of overall intelligence [3]
- Evaluation should consider the real-world utility of models and their ability to discover new insights, rather than just test-taking ability [3]
- Reasoning models and longer chains of thought are significant advancements, but continued hard work is needed to make them work [4][5]

Future Directions
- Scaling remains important, and new directions include extending the horizon over which models can plan and reason [5]
- Expect progress on interfaces, with AI becoming more persistent and capable of expressing itself in different forms [6]
- Learning to code remains a valuable skill, fostering structured intellect and the ability to break down complicated problems [6]