Reinforcement Learning
X @Avi Chawla
Avi Chawla· 2026-03-23 09:03
TinyLoRA: LoRA scaled down to 1 parameter. Researchers from Meta, Cornell, and CMU just dropped a banger. They turned an 8B parameter model into a math and reasoning powerhouse by tweaking just 13 of those parameters. That's 26 bytes and takes up less storage than this sentence. The model hit 91% accuracy on GSM8K, up from 76% before the tweak. The method is called TinyLoRA, and it pushes low-rank adaptation to its absolute extreme. Some quick background on LoRA first: When you finetune a large model, you're updat ...
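The 26-byte figure follows directly if each of the 13 trained parameters is stored in 16 bits. The post does not state the precision, so that is an assumption here. The sketch below checks the arithmetic and, for scale, counts the parameters a conventional rank-1 LoRA adapter would add to a single weight matrix. The 4096 x 4096 layer shape and both helper names are hypothetical, chosen only for illustration.

```python
# Storage arithmetic for a very small adapter.
# Assumption: parameters are stored in 16 bits (2 bytes), e.g. bf16 or fp16.
BYTES_PER_PARAM = 2


def adapter_bytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Return the raw storage needed for `num_params` trained values."""
    return num_params * bytes_per_param


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Standard LoRA adds two factors per adapted matrix:
    B with shape (d_out, rank) and A with shape (rank, d_in)."""
    return rank * (d_out + d_in)


# 13 parameters at 2 bytes each.
tiny_bytes = adapter_bytes(13)

# Hypothetical comparison point: rank-1 LoRA on one 4096 x 4096 projection.
rank1_params = lora_param_count(4096, 4096, rank=1)
rank1_bytes = adapter_bytes(rank1_params)

print(tiny_bytes, rank1_params, rank1_bytes)
```

Even at rank 1, a conventional adapter on one such matrix would carry thousands of parameters, which is why a 13-parameter update is notable.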
X @Avi Chawla
Avi Chawla· 2026-03-17 20:45
RT Avi Chawla (@_avichawla) There's a new learning paradigm for AI agents. It learns the way humans do. Think about how you learned to drive. Nobody memorizes every route turn by turn. You develop instincts like maintaining a safe distance, anticipating what other drivers will do, and braking early in the rain. Those instincts become skills you carry to every road you ever drive on. AI agents today do the opposite. Most memory systems store raw trajectories, which are full logs of every action the agent took dur ...
X @Avi Chawla
Avi Chawla· 2026-03-17 06:49
There's a new learning paradigm for AI agents. It learns the way humans do. Think about how you learned to drive. Nobody memorizes every route turn by turn. You develop instincts like maintaining a safe distance, anticipating what other drivers will do, and braking early in the rain. Those instincts become skills you carry to every road you ever drive on. AI agents today do the opposite. Most memory systems store raw trajectories, which are full logs of every action the agent took during a task. These logs are l ...
NVIDIA Launches Vera CPU, Purpose-Built for Agentic AI
Globenewswire· 2026-03-16 19:30
Core Insights
- NVIDIA has launched the Vera CPU, the first processor specifically designed for agentic AI and reinforcement learning, achieving twice the efficiency and 50% faster performance compared to traditional rack-scale CPUs [1][18][19]

Performance and Efficiency
- The Vera CPU offers the highest single-thread performance and bandwidth per core, enhancing AI throughput, responsiveness, and efficiency for large-scale AI applications [3][10]
- It features 88 custom NVIDIA-designed Olympus cores, capable of running two tasks simultaneously, which is ideal for multi-tenant AI factories [11]

Infrastructure and Ecosystem
- The Vera CPU is integrated into a new rack system that can support over 22,500 concurrent CPU environments, allowing rapid deployment and scaling of AI tools [5][6]
- Leading hyperscalers and global system makers, including Alibaba, Meta, and Oracle Cloud Infrastructure, are collaborating with NVIDIA to deploy the Vera CPU, establishing it as a new standard for AI workloads [4][19]

Technological Advancements
- The Vera CPU utilizes NVIDIA NVLink-C2C interconnect technology, providing 1.8 TB/s of coherent bandwidth, which is seven times the bandwidth of PCIe Gen 6, facilitating high-speed data sharing between CPUs and GPUs [7]
- The second-generation low-power memory subsystem built on LPDDR5X memory delivers up to 1.2 TB/s of bandwidth, achieving twice the bandwidth at half the power compared to general-purpose CPUs [12]

Adoption and Use Cases
- Companies like Cursor and Redpanda are adopting the Vera CPU to enhance performance for AI coding agents and streaming data applications, respectively, with Redpanda reporting up to 5.5 times lower latency in benchmarks [13][14]
- National laboratories and leading cloud service providers are planning to deploy Vera CPUs, indicating strong interest across various sectors [15]
X @Avi Chawla
Avi Chawla· 2026-03-10 19:32
RT Avi Chawla (@_avichawla) OpenClaw meets RL! OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL. The architecture is fully async. This means serving, reward scoring, and training all run in parallel. Once done, weights get hot-swapped after every batch while the agent keeps responding ...
10 years of AlphaGo: The turning point for AI | Thore Graepel & Pushmeet Kohli
Google DeepMind· 2026-03-10 17:28
Welcome back to Google DeepMind: The Podcast. I'm Professor Hannah Fry. Picture this scene. It's March 2016. Inside a hotel suite in Seoul, South Korea, two players are playing the ancient game of Go. A game of unimaginable complexity, long thought impossible for a machine to master. On one side is Lee Sedol, a legendary 18-time Go world champion. On the other, AlphaGo, a neural-network-based AI system built on a powerful technique called reinforcement learning. >> Welcome to the DeepMind Challenge live in S ...
Tencent Research Institute AI Express 20260311
Tencent Research Institute· 2026-03-10 16:01
Group 1
- Anthropic has introduced a multi-agent code review system for Claude Code; after deployment, the proportion of PRs receiving substantial review feedback rose from 16% to 54% [1]
- Among large PRs exceeding 1,000 lines, 84% receive review comments, with an average of 7.5 issues found, and fewer than 1% of review findings are marked incorrect [1]
- The review system is billed by token usage, costing $15 to $25 per review, and lets team and enterprise users customize review rules [1]

Group 2
- AMI Labs, founded by Turing Award winner Yann LeCun, has completed a $1.03 billion seed round at a valuation of $3.5 billion, with former FAIR engineering director Alex LeBrun as CEO [2]
- The company aims to build a world model based on the JEPA architecture, focusing on high-reliability scenarios in industrial control, robotics, wearables, and healthcare [2]
- Saining Xie, who co-proposed the DiT architecture, has joined as Chief Scientist; the first practical application is expected to take at least a year of research [2]

Group 3
- Microsoft has launched Copilot Cowork, which integrates fully with Excel, Word, PowerPoint, and Outlook and uses the Anthropic Claude model for reasoning [3]
- Key functions include automatically organizing a weekly schedule, preparing an entire client meeting agenda from a single command, and executing a complete plan from competitive analysis to product launch [3]
- Pricing is an additional $30/month on top of the M365 enterprise version, with a new E7 package available for $99/month; the product is currently in a limited customer research preview [3]

Group 4
- Tencent's Hunyuan 3D team has open-sourced WorldCompass, the first reinforcement learning post-training framework for world models, which addresses cases where pre-trained world models fail to follow instructions [4]
- The framework features three core innovations: slice-level sampling to reduce computational complexity, interaction-following scoring based on a 3D base model, and efficient RL optimization algorithms [4]
- Interaction accuracy in composite action scenarios has improved from 20% to 55%, with better scores on the Stanford WorldScore benchmark [4]

Group 5
- Zhipu has launched AutoClaw, a one-click local installer for macOS and Windows that provides full OpenClaw capabilities and automatic integration with instant messaging tools [6]
- The tool includes the Pony-Alpha-2 model, optimized for OpenClaw scenarios, which improves task execution and integrates AutoGLM Browser-Use capabilities [6]
- It ships with over 50 mainstream skills and APIs covering content creation, office tasks, coding, marketing, and finance, and supports various model APIs [6]

Group 6
- Reports indicate that the U.S. military used Palantir's Maven system, with the Claude model embedded, during the U.S.-Iran conflict, analyzing over 150 information streams on the first day [7]
- The Maven system integrates data from satellite images, drone footage, and intercepted communications, allowing Claude to generate target suggestions and precise coordinates in real time [7]
- The military has reportedly struck over 3,000 targets, and a Georgetown University study shows that a workload previously requiring 2,000 personnel can now be handled by just 20 [7]

Group 7
- Figure has released an update on its robot, which autonomously tidies a living room using the Helix 02 system, performing tasks such as disinfecting surfaces and organizing items [8]
- The Helix 02 system features a three-layer architecture for semantic reasoning, perception conversion, and control, based on extensive human motion data [8]
- The team did not develop new algorithms or customize scenarios; the system learns new tasks simply from additional data [8]

Group 8
- The AI system OALL has launched O-DataMap, which maps experimental data from papers worldwide into a navigable two-dimensional coordinate system [9]
- The map lets users assess how active and mature a research field is, trace the knowledge lineage of individual studies, and evaluate research gaps based on input ideas [9]
- The map grows in real time as the AI pipeline continuously analyzes new papers, providing insights into the influence of researchers across fields [9]

Group 9
- The latest a16z global AI product Top 100 report shows ChatGPT leading with 900 million weekly users, while Claude's paid subscriptions have increased by over 200% [10]
- ChatGPT is expanding into over 85 categories, including travel and shopping, while Claude focuses on professional users with integrated financial terminals and developer infrastructure [10]
- OpenClaw has become the highest-starred project on GitHub, surpassing React and Linux, indicating a shift in the competitive landscape of AI products [10]

Group 10
- A discussion between Fields Medalist Terence Tao and OpenAI's Mark Chen highlighted that AI is transforming mathematics into a more industrialized field, with significant reductions in error rates [11]
- Tao noted that AI has become a daily research tool to which complex calculations are outsourced, and that it has already solved several long-standing mathematical problems with minimal human oversight [11]
- Chen emphasized that formal verification systems in mathematics serve as natural judges for reinforcement learning, enabling a mechanism for "infinite cheap trial and error" [11]
X @Avi Chawla
Avi Chawla· 2026-03-10 07:02
OpenClaw meets RL! OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL. The architecture is fully async. This means serving, reward scoring, and training all run in parallel. Once done, weights get hot-swapped after every batch while the agent keeps responding. Currently, it has two trai ...
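The async pattern described above (serving, reward scoring, and training running concurrently, with weights swapped after each batch) can be sketched with plain `asyncio` queues. This is a toy illustration only, not OpenClaw-RL's actual code or API: `ToyPolicy`, the placeholder reward, and the "update" that merely bumps a version counter are all hypothetical stand-ins.

```python
import asyncio


class ToyPolicy:
    """Hypothetical stand-in for a served model; `version` marks a weight swap."""

    def __init__(self) -> None:
        self.version = 0

    def respond(self, prompt: str) -> str:
        return f"v{self.version} reply to {prompt!r}"


async def serve(policy: ToyPolicy, prompts: list[str], traj_q: asyncio.Queue) -> None:
    """Answer each prompt with the current policy and log the exchange."""
    for prompt in prompts:
        await traj_q.put((prompt, policy.respond(prompt)))
        await asyncio.sleep(0)  # yield so scoring and training can interleave
    await traj_q.put(None)  # sentinel: no more conversations


async def score(traj_q: asyncio.Queue, scored_q: asyncio.Queue) -> None:
    """Attach a reward to each logged exchange (placeholder reward, not a real judge)."""
    while (item := await traj_q.get()) is not None:
        prompt, reply = item
        await scored_q.put((prompt, reply, float(len(reply) % 2)))
    await scored_q.put(None)


async def train(policy: ToyPolicy, scored_q: asyncio.Queue, batch_size: int) -> None:
    """Collect scored samples into batches; 'hot-swap' by updating the live policy."""
    batch = []
    while (item := await scored_q.get()) is not None:
        batch.append(item)
        if len(batch) == batch_size:
            policy.version += 1  # stand-in for an RL update plus weight swap
            batch.clear()


async def main(n_prompts: int = 8, batch_size: int = 4) -> int:
    policy = ToyPolicy()
    traj_q: asyncio.Queue = asyncio.Queue()
    scored_q: asyncio.Queue = asyncio.Queue()
    prompts = [f"msg {i}" for i in range(n_prompts)]
    # All three stages run concurrently on one event loop.
    await asyncio.gather(
        serve(policy, prompts, traj_q),
        score(traj_q, scored_q),
        train(policy, scored_q, batch_size),
    )
    return policy.version
```

Calling `asyncio.run(main())` returns the number of completed batches. The design point is that the serving coroutine never waits on training: it reads whatever policy version is current, so later replies may come from a newer version than earlier ones.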
AI Luminary Raises $1 Billion for a Startup That Takes an Unconventional Path
Sou Hu Cai Jing· 2026-02-21 07:38
Core Insights
- David Silver's startup, Ineffable Intelligence, has raised $1 billion in funding, potentially marking the largest seed round financing for a startup in Europe [1][3]
- The company is currently valued at approximately $4 billion, with ongoing negotiations that may alter the terms of the investment [3]
- Silver's departure from Google DeepMind has sparked significant interest from venture capital firms, including Sequoia Capital, which is leading the funding round [3]

Company Overview
- Ineffable Intelligence aims to develop AI through reinforcement learning, bypassing traditional large language models to create "superintelligence" [3]
- David Silver is renowned for his role in developing AlphaGo and AlphaStar, which have significantly impacted perceptions of AI capabilities [3]
- Following Google's acquisition of DeepMind in 2014, Silver has been instrumental in the development of models like Gemini [3]

Investment Landscape
- The funding round led by Sequoia Capital reflects investor enthusiasm for top industry talent venturing into entrepreneurship [3]
- Major tech companies such as Nvidia, Google, and Microsoft are also interested in participating in the investment [3]