Matthew Berman
I think I just broke ChatGPT's Brain... ❌🧠❌
Matthew Berman· 2025-10-16 23:51
AI Model Behavior
- The AI model exhibits confusion and difficulty in identifying the correct seahorse emoji [1][2]
- The AI model demonstrates iterative thinking and revision processes [1]
- The AI model's processing can be prolonged and may even halt during complex tasks [1]

Task Complexity
- Emoji recognition can be a challenging task for AI models [2]
- The AI model struggles with nuanced distinctions between similar visual representations (seahorse vs. fish) [2]
Which AI Model Makes the Best Images?
Matthew Berman· 2025-10-16 18:49
Image Generation Model Comparison
- The report compares four image-editing models: Qwen Image Edit Plus, Nano Banana, GPT Image 1, and Seedream, across a range of image-editing tasks [1][2]
- The models are tested on their ability to composite images, transport objects, match lighting, and perform other complex manipulations [2][4]
- An open-source script developed by the team lets users automatically run prompts and upload images to all four models for side-by-side comparison [11]

Model Performance Highlights
- Qwen Image Edit Plus excels at tasks requiring realistic lighting and object integration, often outperforming Nano Banana [4][5]
- GPT Image 1 is strong at maintaining style and consistency across images, particularly in portrait and complex scene generation [3][4]
- Nano Banana is proficient at image-consistency and material-transformation tasks, such as recoloring and blueprint rendering [31][33]
- Seedream performs well on specific tasks such as motion dynamics and adding graffiti [10][48][67]

Task-Specific Performance
- In "bleeding edge" tasks that push model limits, GPT Image 1 often emerges as the winner, particularly in tasks requiring precise anatomical detail and measurement [20][22]
- For object removal and reconstruction, Nano Banana consistently delivers the most realistic and seamless results [54][55]
- In style-transfer tasks, Qwen Image Edit Plus and GPT Image 1 often produce the most visually appealing and accurate results [60][61]
- For adding text to images, Nano Banana and GPT Image 1 show strengths in perspective and transparency [66][68]
- For weather effects, Qwen Image Edit Plus and GPT Image 1 excel at creating realistic snowfall and rain [69][71]

Product Placement
- Dell Technologies sponsors the video, highlighting its Dell Pro Max laptops with Nvidia RTX Pro Blackwell chips and up to 32 GB of GPU memory, suited to AI workloads [8][9]
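The comparison workflow described in the video, one prompt fanned out to all four models at once, can be sketched as a small harness. This is a hypothetical illustration, not the team's actual open-source script: the entries in `MODELS` are placeholder callables standing in for real API clients for each service.

```python
# Hypothetical sketch of a multi-model comparison harness: send the same
# prompt + source image to several image-editing models and collect the
# outputs side by side. Each MODELS entry would wrap a real API client.
from typing import Callable, Dict, List

MODELS: Dict[str, Callable[[str, bytes], bytes]] = {
    "Qwen Image Edit Plus": lambda prompt, img: img,  # placeholder clients:
    "Nano Banana":          lambda prompt, img: img,  # real code would call
    "GPT Image 1":          lambda prompt, img: img,  # each provider's edit
    "Seedream":             lambda prompt, img: img,  # endpoint here
}

def run_comparison(prompts: List[str], source_image: bytes) -> Dict[str, Dict[str, bytes]]:
    """Run every prompt against every model; results[prompt][model] -> image bytes."""
    results: Dict[str, Dict[str, bytes]] = {}
    for prompt in prompts:
        results[prompt] = {
            name: edit(prompt, source_image) for name, edit in MODELS.items()
        }
    return results

out = run_comparison(["match the lighting of the background"], b"\x89PNG...")
```

With real clients plugged in, the collected results can be laid out in a grid for the kind of side-by-side judging the video performs.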
Forward Future Live at Dreamforce! 10/15/2025
Matthew Berman· 2025-10-15 14:59
AI Resources & Community
- Offers a free "Humanity's Last Prompt Engineering Guide" [1]
- Provides a free "The Matthew Berman Vibe Coding Playbook" [1]
- Features a newsletter for regular AI updates [1]
- Showcases a directory of AI tools [1]
- Maintains a Discord community [1]

Social Media Presence
- Active on X (formerly Twitter) under two accounts [1]
- Presence on Instagram [1]
- Content available on TikTok [1]

Media & Sponsorship
- Provides a link for media/sponsorship inquiries [1]
Forward Future Live | 10/10/25
Matthew Berman· 2025-10-10 16:24
AI Resources & Guides
- Offers a free "Humanity's Last Prompt Engineering Guide" [1]
- Provides a free "The Matthew Berman Vibe Coding Playbook" [1]
- Curates a list of "The Best AI Tools" [1]

Community & Updates
- Invites users to join a newsletter for regular AI updates [1]
- Promotes engagement through various social media platforms [1]

Media & Sponsorship
- Provides a contact link for media/sponsorship inquiries [1]
This Tiny Model is Insane... (7m Parameters)
Matthew Berman· 2025-10-10 16:05
Model Performance & Innovation
- A 7-million-parameter model, TRM (Tiny Recursive Model), outperforms larger frontier models on reasoning benchmarks [1][2]
- TRM achieves 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, surpassing models while using less than 0.01% of their parameters [2]
- The core innovation is recursive reasoning with a tiny network, moving away from simply predicting the next token [6][23]
- Deep supervision doubles accuracy compared to single-step supervision (from 19% to 39%), while recursive hierarchical reasoning adds incremental improvements [16]
- TRM significantly improves performance on tasks like Sudoku (55% to 87%) and Maze (75% to 85%) [18]

Technical Approach & Implications
- TRM uses a single tiny network with two layers, leveraging recursion as "virtual depth" to improve reasoning [23][27][28]
- The model keeps two memories, its current guess and its reasoning trace, and updates both on each recursion [25]
- The approach simplifies hierarchical reasoning, dropping the complex mathematical theorems and biological arguments used to justify earlier designs [22][23]
- Recursion may represent a new scaling law, potentially enabling powerful models to run on devices like laptops and phones [34]

Comparison with Existing Models
- Traditional LLMs struggle with hard reasoning problems due to auto-regressive generation and reliance on techniques like chain-of-thought and pass@K [3][5][6]
- HRM (Hierarchical Reasoning Model), a previous approach, uses two networks operating at different hierarchies, but its benefits are not well understood [9][20][21]
- TRM outperforms HRM by simplifying the approach and focusing on recursion, achieving greater improvements with less depth [30]
- While models like Grok 4 Thinking perform better on some benchmarks, they require significantly more parameters (over a trillion) than TRM's 7 million [32]
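The two-memory recursion described above can be illustrated with a toy numeric sketch. This is not the TRM code: the shared tiny network is replaced by a fixed arithmetic update purely to show the shape of the loop, where inner recursions refine the reasoning trace `z` and the outer deep-supervision steps revise the answer `y` from it.

```python
# Toy sketch of TRM-style recursion: one tiny "network" (here just
# arithmetic) alternately updates a reasoning trace z and an answer y.
def refine(x: float, n_steps: int = 8, n_recursions: int = 4) -> float:
    """Iteratively refine answer y toward target x via latent trace z."""
    y, z = 0.0, 0.0
    for _ in range(n_steps):           # outer "deep supervision" steps
        for _ in range(n_recursions):  # inner recursions update the trace
            z = x - y                  # trace encodes the remaining error
        y = y + 0.5 * z                # answer is revised from the trace
    return y
```

Each outer step halves the remaining error, so a shallow update applied recursively behaves like a much deeper network, which is the "virtual depth" idea the video highlights.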
Greg Brockman: AGI, Sora 2, Bottlenecks, White Collar, Proactive AI, and more!
Matthew Berman· 2025-10-08 18:48
AI Trends & Future Predictions
- Discussion of scaling Sora, indicating the industry's focus on improving AI model capabilities [1]
- Exploration of transformer models' future relevance in AI development [1]
- Consideration of proactive AI and compressing intelligence as key areas of advancement [1]
- Speculation on the potential of fully generated software and its implications [1]
- Examination of the Agentic Commerce Protocol, suggesting a move toward AI-driven commercial interactions [1]
- Predictions for 2026, including the possibility of Artificial General Intelligence (AGI) [1]

Technology & Infrastructure
- Analysis of building with AMD and other kinds of compute, highlighting the importance of hardware infrastructure [1]
- Identification of bottlenecks in AI development, suggesting areas needing improvement [1]
- Discussion of the decoupling of the internet, potentially related to data sovereignty or decentralized technologies [1]

Job Market & Industry Impact
- Addressing concerns about job security in the face of AI advancements [1]
- Exploration of building on top of OpenAI, indicating the platform's significance in the AI ecosystem [1]
- Consideration of the role of humans in the loop, emphasizing the importance of human-AI collaboration [1]
Forward Future Live | 8/3/25
Matthew Berman· 2025-10-03 16:20
Enter to win a Sora 2 code! https://gleam.io/FC9uI/win-instant-access-to-openais-sora-app
Download Humanity's Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz
Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ
Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai
Discover The Best AI Tools 👇🏼 https://tools.forwardfuture.ai
My Links 🔗
👉🏻 X: https://x.com/matthewberman
👉🏻 Forward Future X: https://x.com/forward_future_
👉🏻 Instagram: https://www.insta ...
Sora 2 is unbelievable...
Matthew Berman· 2025-10-02 19:16
Sora 2 Features and Capabilities
- Sora 2 can generate videos in a wide range of styles, including celebrity mashups, game scenes, and movie clips [1][4][5][8][9]
- Sora 2 performs well at facial scanning and reproduction, accurately capturing a person's likeness [11][12]
- Sora 2 can perform style transfer, generating videos in watercolor, Pixar-style, claymation, and other artistic styles [60][61]
- Sora 2 has some physics-simulation capability, producing strong results for liquids, smoke, fire, and similar effects [38][39][40][42][64]
- Sora 2 handles camera control well, achieving complex shots such as pans, zooms, and focus pulls [34][36][37]

Sora 2 Limitations
- Sora 2 still struggles with fine-grained motion and object manipulation, such as fingers interacting with a keyboard or shuffling cards [27][28][29]
- In scenes with multiple people, Sora 2 is prone to character deformation and clipping artifacts [18][22][58][59]
- Sora 2 is weak at text generation, often producing misspelled words and inaccurate dates [50][51]
- Generated videos may be low resolution, hurting the viewing experience [37]
- Sora 2 raises copyright controversy and may carry infringement risk [1][3][65]

Lindy Sponsorship and Use Cases
- Lindy is a low-code platform that can quickly build applications such as online education platforms and deploy them within 5 minutes [13][14][15]
- Lindy has a built-in QA process to ensure code quality and reliability [14][15]
- Lindy offers users $20 in free credits [15]
Claude is BACK! (30 Hours of Thinking!)
Matthew Berman· 2025-10-01 18:08
Model Performance & Benchmarks
- Claude Sonnet 4.5 is billed as the best coding model available, a significant advance in coding ability [1]
- On the SWE-bench Verified evaluation, Claude Sonnet 4.5 outperforms Opus 4.1 by a substantial margin and leads competing models by nearly 20 percentage points [1]
- The model achieves top scores on Terminal-Bench (50%), agentic tool use, and computer-use benchmarks, and scores 100% on high school math (AIME 2025 with Python) [1]

Long-Horizon Tasks & Efficiency
- AI's ability to complete long-horizon tasks is growing exponentially, with the task duration AI can handle doubling every 7 months [1]
- Claude Sonnet 4.5 can work independently for over 30 hours, indicating its suitability for agentic applications [1]
- The industry is shifting toward measuring AI intelligence per watt, emphasizing task and token efficiency [2]

Future Applications & Industry Impact
- Anthropic showcases a vision of the future of software with "Claude Imagine," demonstrating the ability to generate applications on the fly within a desktop environment [1][2]
- Claude is increasingly used to write its own code; Anthropic's CEO states that it writes the majority of the code for Claude [9][10]
- Box tested Claude Sonnet 4.5 for data-extraction accuracy with Box AI on 40,000 fields across 1,500+ documents, where it performed four percentage points better than Sonnet 4 [3][4]

Pricing & Availability
- Claude Sonnet 4.5 is priced at $3 per million input tokens and $15 per million output tokens, the same as Sonnet 4 [11]
- Anthropic recommends upgrading to Claude Sonnet 4.5 immediately for all use cases [11]
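The pricing quoted above implies straightforward per-request costs. A quick sketch, where the request sizes are made-up examples, not figures from the video:

```python
# Per-request cost at $3 per million input tokens and $15 per million
# output tokens (Claude Sonnet 4.5 pricing as quoted above).
def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 3.0 + output_tokens / 1e6 * 15.0

# e.g. a 10k-token prompt with a 2k-token reply
cost = request_cost(10_000, 2_000)  # -> 0.06 dollars
```

At these rates, output tokens cost five times as much as input tokens, so long agentic runs are dominated by generation rather than context.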
OpenAI just dropped Sora 2... And it's SCARY GOOD
Matthew Berman· 2025-09-30 22:44
Sora 2 Key Features and Capabilities
- Sora 2 is introduced as a powerful imagination engine with new features, marking a significant advance in realism through improved motion physics, IQ, and body mechanics [2]
- The system includes synchronized dialogue and sound effects, enhancing the realism and immersion of generated videos [4]
- The Cameo feature lets users insert themselves and their friends into generated scenes, opening new possibilities for creativity and joy [2]
- The model demonstrates advanced world-simulation capabilities, critical for training AI models to deeply understand the physical world [5][6]
- Sora 2 addresses the overoptimistic tendencies of prior video models by enforcing more realistic physics, such as a basketball rebounding off the backboard when a shot is missed [11]

Application and Accessibility
- Sora 2 is initially available through an invite-only program, with plans to expand access via the Sora app [4]
- The primary interface for Sora 2 is a mobile app, although the presenter prefers desktop use [5]
- The Sora app interface resembles TikTok, featuring AI-generated videos [16]
- Users create videos by selecting participants and providing a text prompt; videos take 5-10 minutes to generate [19][20]
- The app captures a user's likeness through a brief facial scan, enabling realistic representation in generated videos [21]

Industry Impact and Competition
- Sora 2 is positioned as a competitor to social media platforms like TikTok, Instagram, and Facebook [18]
- The technology is expected to have a significant impact on the future of movies and video creation [17]
- World models like Sora 2 are seen as crucial for training embodied AI, such as robots, by providing a safe and cost-effective simulated environment for experimentation [6][7]
- Dell Technologies promotes its Pro Max workstation with Nvidia RTX Pro Blackwell GPUs as a powerful solution for AI workloads, highlighting the increasing demand for high-performance computing in AI development [10]