Matthew Berman
OpenAI just changed web browsing forever... (ChatGPT Atlas)
Matthew Berman· 2025-10-21 20:23
OpenAI just dropped their web browser, and it is incredibly impressive. They have completely reimagined what the web browser should be, from the search and URL bar all the way to what an actual assistant controlling your browser will be like. So in this video I'm going to break it all down for you. Let's get started. So this is ChatGPT Atlas, an AI-native browser from OpenAI. They just finished the live stream. It was Sam Altman and a bunch of the folks who were responsible for creating Atlas, and they showed o ...
Andrej Karpathy devastates AI optimists...
Matthew Berman· 2025-10-20 21:22
AGI Timelines and Agent Development
- Andrej Karpathy believes AGI (Artificial General Intelligence) is still more than 10 years away [1]
- The industry broadly sees 2025 to 2035 as the decade of agents, but a great deal of development work remains before agents are truly usable and pervasive across the economy [1]
- LLMs (Large Language Models) have made enormous progress in recent years, but substantial foundational work, integration work, physical-world sensors and actuators, societal work, safety work, and research work remain to be done [1]

Learning Approaches and Model Capabilities
- Karpathy argues that LLMs learn more like "ghosts" than like animals; animals are born with a large amount of intelligence pre-programmed by evolution [1][2]
- He is skeptical of the effectiveness of reinforcement learning (RL), arguing that it extracts a poor learning signal per unit of compute, and favors agentic interaction: building a "playground" where an agent can experiment and learn [2]
- The field is exploring System Prompt Learning, a new learning paradigm that shapes model behavior by editing the system prompt, much like a human taking notes (a minimal sketch follows at the end of this entry) [2][3]

Model Size and Memorization
- The industry trend is for model sizes to grow first and then shrink; the "Cognitive Core" concept strips away an LLM's encyclopedic knowledge so it generalizes better [3]
- Karpathy criticizes the current agent industry for over-investing in tooling while ignoring today's actual capability level, and emphasizes collaborating with LLMs, combining human strengths with LLM strengths [3]
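To make the System Prompt Learning idea concrete, here is a minimal sketch in Python. The `chat` function and its signature are hypothetical stand-ins for whatever LLM client you actually use; the point is only that the agent distills each episode into a note and folds its notes back into the system prompt, so behavior changes without any weight update.

```python
# Minimal sketch of System Prompt Learning, as described in the talk summary.
# `chat(system_prompt, user_message)` is a hypothetical LLM call, not a real API.

from typing import List

def chat(system_prompt: str, user_message: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    raise NotImplementedError

class NoteTakingAgent:
    def __init__(self, base_prompt: str):
        self.base_prompt = base_prompt
        self.notes: List[str] = []  # lessons accumulated across episodes

    def system_prompt(self) -> str:
        # The "learning" happens here: notes steer future behavior
        # without any gradient update to the model's weights.
        if not self.notes:
            return self.base_prompt
        return self.base_prompt + "\n\nLessons learned:\n" + "\n".join(
            f"- {note}" for note in self.notes
        )

    def act(self, task: str) -> str:
        return chat(self.system_prompt(), task)

    def reflect(self, task: str, outcome: str) -> None:
        # Ask the model to distill the episode into a short note to itself.
        lesson = chat(
            "Summarize, in one sentence, what to do differently next time.",
            f"Task: {task}\nOutcome: {outcome}",
        )
        self.notes.append(lesson)
```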
AI News: NVIDIA DGX-1, GPT-6 2025, Claude Skills, Waymo DDOS, Datacenters in Space, and more!
Matthew Berman· 2025-10-18 15:34
This video is brought to you by Stack AI. More on them later. GPT-6 might be coming by the end of the year. This guy on CNBC said he just got done talking to Brad Gerstner, a prominent figure in Silicon Valley, who said GPT-6 is coming by the end of this year. That's two and a half months from now. Now, that comes right on the heels of GPT-5. And honestly, I don't think it's going to happen. It would be very weird to have this massive launch of GPT-5, really a fundamental shift in the way users interact wit ...
Anthropic Founder says we should be afraid....
Matthew Berman· 2025-10-17 14:30
Make no mistake, what we are dealing with is a real and mysterious creature, not a simple and predictable machine. This is from Anthropic co-founder Jack Clark. He recently published some of his comments from a talk he gave in Berkeley, in which he conveys his fear of this steady march towards artificial general intelligence. So we're going to go over what he's so afraid of. Then we're going to give the flip side, to show who thinks this is just fear-mongering and regulatory capture. Now, before I get ...
I think I just broke ChatGPT's Brain... ❌🧠❌
Matthew Berman· 2025-10-16 23:51
AI Model Behavior
- The AI model exhibits confusion and difficulty in identifying the correct seahorse emoji [1][2]
- The AI model demonstrates iterative thinking and revision processes [1]
- The AI model's processing can be prolonged and may even halt during complex tasks [1]

Task Complexity
- Emoji recognition can be a challenging task for AI models [2]
- The AI model struggles with nuanced distinctions between similar visual representations (seahorse vs. fish) [2]
Which AI Model Makes the Best Images?
Matthew Berman· 2025-10-16 18:49
Image Generation Model Comparison
- The report compares four image generation models: Qwen Image Edit Plus, Nano Banana, GPT Image 1, and Seedream across various image editing tasks [1][2]
- The models are tested on their ability to composite images, transport objects, match lighting, and perform other complex manipulations [2][4]
- An open-source script developed by the team automatically runs prompts and uploads images to all four models for comparison (a minimal sketch of such a harness follows this entry) [11]

Model Performance Highlights
- Qwen Image Edit Plus excels in tasks requiring realistic lighting and object integration, often outperforming Nano Banana [4][5]
- GPT Image 1 demonstrates strength in maintaining style and consistency across images, particularly in portrait and complex scene generation [3][4]
- Nano Banana shows proficiency in image consistency and material transformation tasks, such as recoloring and blueprint rendering [31][33]
- Seedream performs well in specific tasks like motion dynamics and adding graffiti [10][48][67]

Task-Specific Performance
- In "bleeding edge" tasks pushing model limits, GPT Image 1 often emerges as the winner, particularly in tasks requiring precise anatomical detail and measurement [20][22]
- For object removal and reconstruction, Nano Banana consistently delivers the most realistic and seamless results [54][55]
- In style transfer tasks, Qwen Image Edit Plus and GPT Image 1 often produce the most visually appealing and accurate results [60][61]
- For adding text to images, Nano Banana and GPT Image 1 demonstrate strengths in perspective and transparency [66][68]
- For weather effects, Qwen Image Edit Plus and GPT Image 1 excel at creating realistic snowfall and rain [69][71]

Product Placement
- Dell Technologies sponsors the video, highlighting its Dell Pro Max laptops featuring Nvidia RTX Pro Blackwell chips with up to 32 GB of GPU memory, suitable for AI workloads [8][9]
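For reference, a run-every-prompt-on-every-model harness in this spirit can be very small. The sketch below is not the team's actual script; every backend is a hypothetical placeholder, since each real provider has its own SDK, auth, and parameters.

```python
# Minimal sketch of a multi-model image-edit comparison harness.
# All backends here are fake stand-ins so the sketch runs end to end;
# swap in real API clients for Qwen Image Edit Plus, Nano Banana,
# GPT Image 1, and Seedream.

from pathlib import Path
from typing import Callable, Dict

EditFn = Callable[[str, bytes], bytes]  # (prompt, input image) -> edited image

def placeholder_backend(name: str) -> EditFn:
    """Fake backend so the sketch is runnable; replace with a real client."""
    def edit(prompt: str, image: bytes) -> bytes:
        return f"[{name}] would edit {len(image)} bytes for: {prompt}".encode()
    return edit

MODELS: Dict[str, EditFn] = {
    "qwen-image-edit-plus": placeholder_backend("Qwen Image Edit Plus"),
    "nano-banana": placeholder_backend("Nano Banana"),
    "gpt-image-1": placeholder_backend("GPT Image 1"),
    "seedream": placeholder_backend("Seedream"),
}

def run_comparison(prompts: list, image: bytes, out_dir: Path) -> None:
    """Write one output per (prompt, model) pair, named for side-by-side review."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, prompt in enumerate(prompts):
        for model_name, edit in MODELS.items():
            result = edit(prompt, image)
            (out_dir / f"{i:02d}_{model_name}.bin").write_bytes(result)

run_comparison(["match the lighting", "remove the lamp"], b"\x89PNG...", Path("out"))
```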
Forward Future Live at Dreamforce! 10/15/2025
Matthew Berman· 2025-10-15 14:59
AI Resources & Community
- Offers a free "Humanity's Last Prompt Engineering Guide" [1]
- Provides a free "The Matthew Berman Vibe Coding Playbook" [1]
- Features a newsletter for regular AI updates [1]
- Showcases a directory of AI tools [1]
- Maintains a Discord community [1]

Social Media Presence
- Active on X (formerly Twitter) under two accounts [1]
- Presence on Instagram [1]
- Content available on TikTok [1]

Media & Sponsorship
- Provides a link for media/sponsorship inquiries [1]
Forward Future Live | 10/10/25
Matthew Berman· 2025-10-10 16:24
Download Humanity's Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz
Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ
Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai
Discover The Best AI Tools 👇🏼 https://tools.forwardfuture.ai
My Links 🔗
👉🏻 X: https://x.com/matthewberman
👉🏻 Forward Future X: https://x.com/forward_future_
👉🏻 Instagram: https://www.instagram.com/matthewberman_ai
👉🏻 Discord: https://discord.gg/xxysSXBxFW
👉🏻 TikTok: https://www ...
This Tiny Model is Insane... (7m Parameters)
Matthew Berman· 2025-10-10 16:05
Model Performance & Innovation
- A 7 million parameter model (TRM, the Tiny Recursive Model) is outperforming larger frontier models on reasoning benchmarks [1][2]
- TRM achieves 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, surpassing models with vastly more parameters (TRM has less than 0.01% of their parameter count) [2]
- The core innovation is recursive reasoning with a tiny network, moving away from simply predicting the next token [6][23]
- Deep supervision doubles accuracy compared to single-step supervision (from 19% to 39%), while recursive hierarchical reasoning provides incremental improvements [16]
- TRM significantly improves performance on tasks like Sudoku (55% to 87%) and Maze (75% to 85%) [18]

Technical Approach & Implications
- TRM uses a single tiny network with two layers, leveraging recursion as "virtual depth" to improve reasoning [23][27][28]
- The model keeps two memories, its current guess and a reasoning trace, and updates both on every recursion (a minimal sketch of this loop follows this entry) [25]
- The approach simplifies hierarchical reasoning, dropping the complex mathematical theorems and biological arguments used to justify earlier designs [22][23]
- Recursion may represent a new scaling law, potentially enabling powerful models to run on devices like laptops and phones [34]

Comparison with Existing Models
- Traditional LLMs struggle with hard reasoning problems due to auto-regressive generation and reliance on techniques like chain of thought and pass@k [3][5][6]
- HRM (Hierarchical Reasoning Model), a previous approach, uses two networks operating at different hierarchies, but why that helps is not well understood [9][20][21]
- TRM outperforms HRM by simplifying the approach and focusing on recursion, achieving greater improvements with less depth [30]
- While models like Grok 4 Thinking perform better on some benchmarks, they require vastly more parameters (over a trillion) compared to TRM's 7 million [32]
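Here is a minimal sketch of that two-memory recursion in PyTorch, assuming a toy setup: one small shared network refines a reasoning trace z and a current answer y over repeated steps, so depth comes from iteration rather than from stacking layers. Dimensions and step counts are illustrative, not the paper's exact configuration.

```python
# Toy TRM-style loop: recursion as "virtual depth" with two memories,
# the reasoning trace z and the current guess y. Illustrative only.

import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # One small network shared across every recursion step.
        self.net = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.answer_head = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, n_steps: int = 6) -> torch.Tensor:
        z = torch.zeros_like(x)  # reasoning trace
        y = torch.zeros_like(x)  # current guess at the answer
        for _ in range(n_steps):
            # Update the trace from the input, the trace, and the guess...
            z = self.net(torch.cat([x, z, y], dim=-1))
            # ...then revise the guess from the new trace.
            y = self.answer_head(torch.cat([z, y], dim=-1))
        return y

model = TinyRecursiveModel()
x = torch.randn(4, 128)           # a batch of 4 toy inputs
print(model(x, n_steps=6).shape)  # torch.Size([4, 128])
```

More recursion steps buy more effective depth at inference time without adding parameters, which is the sense in which recursion could act as a new scaling knob.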
Greg Brockman: AGI, Sora 2, Bottlenecks, White Collar, Proactive AI, and more!
Matthew Berman· 2025-10-08 18:48
AI Trends & Future Predictions
- Discussion on scaling Sora, indicating the industry's focus on improving AI model capabilities [1]
- Exploration of transformer models' future relevance in AI development [1]
- Consideration of proactive AI and compressing intelligence as key areas of advancement [1]
- Speculation on the potential of fully generated software and its implications [1]
- Examination of the Agentic Commerce Protocol, suggesting a move towards AI-driven commercial interactions [1]
- Predictions for 2026, including the possibility of Artificial General Intelligence (AGI) [1]

Technology & Infrastructure
- Analysis of building with AMD and other kinds of compute, highlighting the importance of hardware infrastructure [1]
- Identification of bottlenecks in AI development, suggesting areas needing improvement [1]
- Discussion on the decoupling of the internet, potentially related to data sovereignty or decentralized technologies [1]

Job Market & Industry Impact
- Addressing concerns about job security in the face of AI advancements [1]
- Exploration of building on top of OpenAI, indicating the platform's significance in the AI ecosystem [1]
- Consideration of the role of humans in the loop, emphasizing the importance of human-AI collaboration [1]