Reinforcement Learning
AI, the Brain, and Our Future | Dr. Beren Millidge | TEDxMiami
TEDx Talks· 2025-08-19 16:03
Ladies and gentlemen, please welcome Dr. Beren Millidge. Within the next few decades, it seems likely that humanity will succeed at building artificial minds and that these minds will rapidly come to outstrip our own intelligence across almost every task. This prospect is both inspiring and terrifying. Artificial intelligence will be one of the greatest inventions that humanity has ever made and also potentially its last. If we succeed, we will have solved deep questions that go right to the heart of the hu ...
喝点VC | Sequoia in Conversation with the OpenAI Agent Team: Merging Deep Research and Operator into the Most Powerful Agent That Proactively Does Things for You
Z Potentials· 2025-08-14 03:33
Core Insights
- The article discusses the integration of OpenAI's Deep Research and Operator projects to create a powerful AI Agent capable of executing complex tasks for up to one hour [2][5][6]
- The AI Agent utilizes a virtual computer with various tools, including a text browser, GUI browser, terminal access, and API calling capabilities, allowing it to perform tasks that typically require human effort [6][7][24]
- The model is designed to facilitate user interaction, enabling users to interrupt, correct, and clarify tasks during execution, which enhances its flexibility and effectiveness [7][22]

Integration of Deep Research and Operator
- The combination of Deep Research and Operator leverages the strengths of both projects, with Operator excelling in visual interactions and Deep Research in text-based information processing [9][10]
- The integration allows the AI Agent to access paid content and perform tasks that require both browsing and interaction with web elements [10][11]
- The collaboration has resulted in a more versatile toolset, enabling the AI Agent to perform a wider range of tasks, including generating reports, making purchases, and creating presentations [11][14]

Real-World Applications
- The AI Agent is designed for both consumer and professional use, targeting "prosumer" users who are willing to wait for detailed reports [15]
- Examples of its application include data extraction from spreadsheets, online shopping, and generating financial models based on web-sourced information [16][18]
- The model's ability to handle complex tasks autonomously is highlighted, with a recent task taking 28 minutes to complete, showcasing its potential for longer, more intricate assignments [19][20]

Training and Development
- The AI Agent is trained using reinforcement learning, where it learns to use various tools effectively by completing tasks that require their use [24][25]
- The training process involves a significant increase in computational resources and data, allowing for more sophisticated model capabilities [45]
- The development team emphasizes the importance of collaboration between research and application teams to ensure the model meets user needs from the outset [30][35]

Future Directions
- OpenAI aims to enhance the AI Agent's capabilities further, focusing on improving accuracy and performance across diverse tasks [37][49]
- The potential for new interaction paradigms between users and the AI Agent is anticipated, with the goal of making the Agent more proactive in assisting users [49][42]
- The team is excited about the ongoing exploration of the Agent's capabilities and the discovery of new use cases as it evolves [40][49]
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
OpenAI has delivered on their promise to release a state-of-the-art open-source model. This is gpt-oss. Now, I think the mystery model Horizon Alpha that was on OpenRouter is actually this open-source model from OpenAI, although they have not confirmed that to me, but we do have an incredible model. Let me tell you about all the details. So, first it comes in two sizes, a 120 billion parameter version and a 20 billion parameter version. These are state-of-the-art open-weight language models. Open weight. So ...
OpenAI’s GPT-5 Shines in Coding Tasks — The Information
2025-08-05 03:19
Summary of Key Points from the Conference Call

Industry: Artificial Intelligence (AI)

Core Insights and Arguments
- **Introduction of GPT-5**: OpenAI's upcoming model, GPT-5, is generating positive early feedback, particularly in coding tasks, which is a critical area for the company [3][4][5]
- **Performance Improvements**: GPT-5 shows enhanced capabilities in various domains, especially in software engineering, outperforming previous models and rival Anthropic's Claude Sonnet 4 in specific tests [7][10]
- **Integration of Models**: The model aims to combine traditional large language models (LLMs) with reasoning models, allowing users to control the reasoning capabilities based on task complexity [5][6]
- **Practical Applications**: GPT-5 is better equipped to handle real-world programming challenges, such as modifying complex legacy code, which has been a historical weakness for OpenAI's models [8][9]
- **Market Implications**: The success of GPT-5 could significantly impact OpenAI's business and its competitors, as coding assistants powered by Anthropic's models are projected to generate substantial revenue for Anthropic [10][12]

Additional Important Content
- **Caveats on Model Understanding**: There is uncertainty regarding the exact nature of GPT-5, with speculation that it may function as a router directing queries rather than a single, unified model [13]
- **Future Improvements**: Experts suggest that future advancements may stem more from post-training reinforcement learning than from scaling up pretraining [15][17]
- **Investor Sentiment**: OpenAI executives are optimistic about the potential for future models, claiming they can reach "GPT-8" using current model structures [17]

Implications for Stakeholders
- **Impact on Suppliers and Investors**: Strong performance of GPT-5 is seen as beneficial for OpenAI's chip supplier Nvidia and data center firms, as well as for equity and debt investors concerned about AI development trajectories [12]
Inside OpenAI’s Rocky Path to GPT-5 — The Information
2025-08-05 03:19
Art by Mike Sullivan. The troubles OpenAI has faced in developing GPT-5 point to slowing AI progress across the industry. Researchers believe advances in reinforcement learning will help to overcome that. By Stephanie Palazzolo, Erin Woo and Amir Efrati. OpenAI made waves across the industry in December when it published the results from its tests of artificial intelligence that performs better on tasks when it gets more time and computi ...
Supercharging Startups with AI Agents | Mohit Ambani | TEDxSGGSCC Studio
TEDx Talks· 2025-08-01 15:16
AI Fundamentals
- Generative AI works by probabilistically filling in the blanks based on pre-trained data, essentially acting as an advanced autocomplete [5][6]
- Pre-training involves feeding massive amounts of unstructured data into large language models (LLMs), requiring significant energy and resources for processing and refinement [7][8][9]
- Reinforcement learning and reasoning enhance AI accuracy by implementing strategic action and assigning scores to generated results, reducing hallucinations [11][12]

AI Applications in Business
- AI agents can automate tasks across various tools and interfaces, acting as digital employees capable of understanding unstructured data and executing actions [13][14]
- AI tools can significantly scale business operations, as demonstrated by a cosmetics brand using an AI agent to streamline influencer marketing, reducing the required team size and time [21][22]
- AI agents are being used in sales to personalize outreach and automate follow-ups, leading to increased order rates and reduced campaign costs [24]
- AI is being applied in operations to automate pricing and quotation processes, monitor safety incidents, and improve response times [25][26]
- AI is aiding in financial analysis by enabling rapid screening of stocks based on specific criteria, leveraging open-source tools to retrieve data from millions of PDF files [28]

AI's Impact and Future
- AI is evolving beyond replacing existing processes to facilitating new inventions, such as a novel use case for magnetic ink in supply chain management [30][31][32][33]
- The industry is rapidly advancing towards artificial general intelligence (AGI) and artificial superintelligence (ASI), with continuous improvements in AI models and capabilities [34]
- The talk raises the fundamental question of what role humans will play in a world where many jobs can be automated, emphasizing the importance of curiosity and relentless questioning [34][35]
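The "advanced autocomplete" description in the AI Fundamentals points can be made concrete with a toy example. The sketch below is only an illustration under heavy simplification: it counts word pairs in a made-up corpus and turns the counts into next-word probabilities. Real LLMs use neural networks over tokens rather than bigram counts, and the function names and corpus here are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def next_word_probabilities(model, word):
    """Turn the follower counts for `word` into a probability table."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {candidate: n / total for candidate, n in counts.items()}


# Hypothetical miniature "training corpus".
corpus = "the cat sat on the mat the cat ate the fish"
model = build_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

In this corpus "the" is followed by "cat" twice and by "mat" and "fish" once each, so "cat" is the most probable completion. Scoring or sampling from such a table is the "filling in the blanks" step in miniature.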
China Went HARD...
Matthew Berman· 2025-07-24 00:30
Model Performance & Capabilities
- Qwen3-Coder rivals Anthropic's Claude family in coding performance, achieving 69.6% on SWE-bench Verified compared to Claude Sonnet 4's 70.4% [1]
- The most powerful variant, Qwen3-Coder 480B, is a mixture-of-experts model with 480 billion total parameters, of which 35 billion are active [2][3]
- The model supports a native context length of 256k tokens and up to 1 million tokens with extrapolation methods, enhancing its capabilities for tool calling and agentic uses [4]

Training Data & Methodology
- The model was pre-trained on 7.5 trillion tokens with a 70% code ratio, improving coding abilities while maintaining general and math skills [5]
- Qwen2.5-Coder was leveraged to clean and rewrite noisy data, significantly improving overall data quality [6]
- Code RL training was scaled on a broader set of real-world coding tasks, focusing on diverse coding tasks to unlock the full potential of reinforcement learning [7][8]

Tooling & Infrastructure
- Qwen launched Qwen Code, a command-line tool adapted from Gemini Code, enabling agentic and multi-turn execution with planning [2][5][9]
- A scalable system was built to run 20,000 independent environments in parallel, leveraging Alibaba Cloud's infrastructure for self-play [10]

Open Source & Accessibility
- The model is hosted on Hugging Face, making it free to use and try out [11]
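As a quick sanity check on the figures quoted in this summary, the arithmetic below works out what share of the model's parameters is active and how large the benchmark gap is. It uses only the numbers stated above; the variable names are illustrative.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS_B = 480     # total parameters, in billions
ACTIVE_PARAMS_B = 35     # active parameters, in billions
QWEN_SCORE = 69.6        # benchmark score, percent
SONNET_SCORE = 70.4      # Claude Sonnet 4 score, percent

# Share of the mixture-of-experts model that is active at once.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Benchmark gap in percentage points (rounded to avoid float noise).
score_gap = round(SONNET_SCORE - QWEN_SCORE, 1)

print(f"Active share of parameters: {active_fraction:.1%}")
print(f"Gap to Claude Sonnet 4: {score_gap} points")
```

Roughly 7% of the parameters are active, and the reported benchmark gap is under one percentage point.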
Interview with the ChatGPT Agent Team: How Does a General-Purpose Agent Built by a Foundation-Model Company Differ from Manus?
Founder Park· 2025-07-23 13:23
Core Insights
- The article discusses the introduction of ChatGPT Agent by OpenAI, which combines deep research and operator capabilities to create a versatile agent capable of performing complex tasks without losing control over extended periods [1][6][13].

Group 1: ChatGPT Agent Overview
- ChatGPT Agent is described as the first fully "embodied" agent on a computer, allowing seamless transitions between visual browsing, text analysis, and code execution [1][7].
- The agent can perform complex tasks for up to one hour without losing control, showcasing its advanced capabilities [13][19].

Group 2: Training Methodology
- The training of ChatGPT Agent involved reinforcement learning (RL) where the model was given a variety of tools and allowed to discover optimal strategies independently [2][10].
- The agent utilizes a combination of a text browser and a graphical interface, enhancing its efficiency and flexibility in task execution [6][8].

Group 3: Functionality and Use Cases
- ChatGPT Agent can handle various tasks, including deep research, online shopping, and creating presentations, making it suitable for both consumer and business applications [13][15].
- Users have reported practical applications such as data extraction from Google Docs and generating financial models, indicating its versatility [16][17].

Group 4: Future Developments
- The team envisions continuous improvements in the agent's accuracy and capabilities, aiming to expand its functionality across a wide range of tasks [23][33].
- There is an emphasis on enhancing user interaction and exploring new paradigms for collaboration between users and the agent [34][36].

Group 5: Safety and Risk Management
- The article highlights the increased risks associated with the agent's ability to interact with the real world, necessitating robust safety measures and ongoing monitoring [35][36].
- The development team is focused on creating a comprehensive safety framework to mitigate potential harmful actions by the agent [37][39].
OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet
Sequoia Capital· 2025-07-22 09:00
Agent Capabilities & Architecture
- OpenAI has created a new agent in ChatGPT that can perform tasks that would take humans a long time, by giving the agent access to a virtual computer [6]
- The agent has access to a text browser (similar to the Deep Research tool), a virtual browser (similar to the Operator tool, with full GUI access), and a terminal for running code and calling APIs [6][7][8]
- All tools have shared state, allowing for flexible and complex tasks [9]
- The agent is trained using reinforcement learning across thousands of virtual machines, allowing it to discover optimal strategies for tool usage [3]

Development & Training
- The agent is a collaboration between the Deep Research and Operator teams, combining the strengths of both [6]
- The agent is trained with reinforcement learning, rewarding efficient and correct task completion [36]
- The model figures out when to use which tool through experimentation, without explicit instructions [38]
- Reinforcement learning is data-efficient, allowing new capabilities to be taught with smaller, high-quality datasets [75][76]

Safety & Limitations
- Safety training and mitigations were a core part of the development process due to the agent's ability to take actions with external side effects [44]
- The team has implemented a monitor that watches for suspicious activity, similar to antivirus software [48]
- Date picking remains a difficult task for the AI system [4][83][84]

Future Directions
- Future development will focus on improving the accuracy and performance across a wide distribution of tasks [62][85]
- The team is exploring different ways of interacting with the agent, beyond the current chat-based interface [68][86]
- Personalization and memory for agents will be important for future development, allowing agents to do things without being explicitly asked [67][68]
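The idea that a model can learn which tool to use purely from rewards, without explicit instructions, can be illustrated with a toy epsilon-greedy bandit. This is a minimal sketch and not OpenAI's training method: the tool names are borrowed from the summary, while the success probabilities, the function name `train_tool_picker`, and all hyperparameters are hypothetical.

```python
import random


def train_tool_picker(success_prob, episodes=5000, epsilon=0.1, seed=0):
    """Estimate each tool's value from trial-and-error rewards."""
    rng = random.Random(seed)
    tools = list(success_prob)
    value = {tool: 0.0 for tool in tools}   # running mean reward per tool
    count = {tool: 0 for tool in tools}     # how often each tool was tried

    for _ in range(episodes):
        # Explore a random tool occasionally, otherwise exploit the best one.
        if rng.random() < epsilon:
            tool = rng.choice(tools)
        else:
            tool = max(tools, key=value.get)

        # Reward 1 for a successful task, 0 otherwise (simulated here).
        reward = 1.0 if rng.random() < success_prob[tool] else 0.0

        # Incremental update of the mean reward for the chosen tool.
        count[tool] += 1
        value[tool] += (reward - value[tool]) / count[tool]

    return value


# Hypothetical success rates for one kind of task; not measured data.
HYPOTHETICAL_SUCCESS = {"text_browser": 0.8, "gui_browser": 0.5, "terminal": 0.2}

estimates = train_tool_picker(HYPOTHETICAL_SUCCESS)
best_tool = max(estimates, key=estimates.get)
print(best_tool, estimates)
```

The picker is never told which tool is best; it converges on the highest-reward option because successful attempts raise that tool's estimated value. Training a full agent involves long multi-step episodes and a learned policy, which this sketch leaves out.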
Autonomous Driving Paper Express | World Models, End-to-End, VLM/VLA, Reinforcement Learning, and More
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article discusses advancements in autonomous driving technology, particularly focusing on the Orbis model developed by the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- The Orbis model addresses shortcomings in contemporary driving world models regarding long-horizon generation, particularly in complex maneuvers like turns, and introduces a trajectory distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that allows for fair comparisons between discrete and continuous prediction methods, demonstrating that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) in long-horizon predictions [2].
- The model achieves state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- The Orbis model achieved a Fréchet Video Distance (FVD) of 132.25 on the nuPlan dataset for 6-second rollouts, significantly lower than other models like Cosmos (291.80) and Vista (323.37), indicating superior performance in trajectory prediction [6][7].
- In turn scenarios, Orbis also outperformed other models, achieving an FVD of 231.88 compared to 316.99 for Cosmos and 413.61 for Vista, showcasing its effectiveness in challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- The LaViPlan framework, developed by ETRI, utilizes reinforcement learning with verifiable rewards to address the misalignment between visual, language, and action components in autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) for easy scenarios and 14.67% for hard scenarios on the ROADWork dataset [12][14].
- It emphasizes the transition from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World Model-Based Scene Generation
- The University of Macau introduced a world model-driven scene generation framework that enhances dynamic graph convolution networks, achieving an 83.2% Average Precision (AP) and a mean Time to Anticipate (mTTA) of 3.99 seconds on the DAD dataset, marking significant improvements [23][24].
- This framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- The ReAL-AD framework proposed by Shanghai University of Science and Technology and the Chinese University of Hong Kong integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- It features three core modules that enhance situational awareness and structured reasoning, leading to significant improvements in trajectory planning accuracy and safety [34].
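The Average Displacement Error (ADE) cited for LaViPlan is commonly defined as the mean Euclidean distance between predicted and ground-truth waypoints. The sketch below shows that standard definition and how a relative reduction is computed; it is not taken from the LaViPlan code, and the trajectories and baseline numbers are invented purely for illustration.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Hypothetical 2D waypoints (x, y) in meters; not data from the paper.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

ade = average_displacement_error(predicted, ground_truth)
reduction = relative_reduction(baseline=2.0, improved=1.5)
print(ade, reduction)
```

Here the per-waypoint errors are 0, 1, and 2 meters, so the ADE is 1.0 meter. A figure such as "19.91% reduction in ADE" is the same kind of relative comparison that `relative_reduction` computes between a baseline and an improved model.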