Matthew Berman
Forward Future Live August 1st, 2025
Matthew Berman· 2025-08-01 16:55
Resources & Tools
- Offers a free "Vibe Coding Playbook" download [1]
- Provides a free "Humanity's Last Prompt Engineering Guide" download [1]
- Showcases a curated list of AI tools [1]

Community & Updates
- Encourages joining a newsletter for regular AI updates [1]
- Promotes engagement through X (Twitter), Instagram, and Discord [1]

Media & Sponsorship
- Provides a contact link for media/sponsorship inquiries [1]
This might be OpenAI's New Open-Source Model...
Matthew Berman· 2025-08-01 00:00
Model Capabilities & Performance
- Horizon Alpha demonstrates impressive spatial awareness and problem-solving, accurately visualizing complex rotations [1]
- The model is multimodal, understanding and interpreting images quickly [2]
- Horizon Alpha successfully solves the Tower of Hanoi puzzle despite lacking chain-of-thought reasoning [6]
- The model recognizes its own limitations, saying so when it lacks knowledge [20][21]
- Horizon Alpha achieves top rankings in creative writing and emotional intelligence benchmarks [23][11]

Model Characteristics & Limitations
- Horizon Alpha is a fast model, outputting roughly 150 tokens per second [2]
- The model lacks a "thinking mode," outputting the first response that comes to mind [2]
- Horizon Alpha answers some simple logic and percentage questions incorrectly [7][8]
- The model refuses to provide instructions for illegal activities, such as hotwiring a car [8][9]
- The model incorrectly identifies itself as a GPT-4-class model from OpenAI, despite likely being an open-source model [9]

OpenRouter & Box AI
- Horizon Alpha is available on OpenRouter and free to use [1]
- Box AI lets users apply the latest AI models, including open-source options, to document workflows with enterprise-level security [3][4]
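The Tower of Hanoi result above is easy to verify against the classic recursive solution; here is a minimal Python sketch (illustrative code, not the model's actual output):

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the move list that solves an n-disc Tower of Hanoi."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear n-1 discs onto the spare peg
    moves.append((source, target))              # move the largest disc to the target
    hanoi(n - 1, spare, target, source, moves)  # restack the n-1 discs on top
    return moves

print(len(hanoi(3)))   # 7 moves, i.e. 2**3 - 1
print(len(hanoi(10)))  # 1023 moves
```

Any correct solution a model produces must contain exactly 2^n - 1 moves, which makes this an easy automated check for puzzles like this.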
The AlphaGo Moment for AI Models...
Matthew Berman· 2025-07-31 18:08
AI Model Architecture Discovery
- The AI field is approaching an era where AI can discover new knowledge and apply it to itself, potentially leading to exponential innovation [1][3]
- The current bottleneck in AI discovery is human innovation, which limits how far AI advancements can scale [2][3]
- The "AlphaGo moment" for model architecture discovery involves AI self-play to hypothesize, code, test, and analyze new model architectures [3][12]
- The key to this approach is AI's ability to learn without human input, discovering novel solutions unconstrained by human biases [8]

ASI-Arch System
- The ASI-Arch system uses a researcher, an engineer, and an analyst to autonomously propose, implement, test, and analyze new neural network architectures [13][14][15][16]
- The system learns from past experiments and human literature to propose new architectures, selecting top performers as references [14]
- The engineer component self-heals code to ensure new approaches are properly tested [15]
- The analyst reviews results, extracts insights, and maintains a memory of lessons learned for future generations of models [16]

Experimental Results and Implications
- The system ran 1,700 autonomous experiments over 20,000 GPU hours, producing 106 models that outperformed previous public models [17][18]
- Exponential improvement may be possible by scaling compute, for example from 20,000 to 20 million GPU hours [19]
- The self-improving approach could be applied to other scientific fields such as biology and medicine by increasing compute [20]
- The open-sourced paper and code have significant implications, with multiple companies publishing similar self-improving AI papers [21]
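The researcher/engineer/analyst loop described above can be sketched as a simple propose-evaluate-analyze cycle. This is an illustrative Python skeleton, not code from the ASI-Arch paper; the function names, archive structure, and the random-score stand-in for training are all assumptions:

```python
import random

def propose(archive):
    """Researcher: draft a new candidate, referencing top past performers."""
    best = sorted(archive, key=lambda a: a["score"], reverse=True)[:5]
    parent = best[0]["name"] if best else None
    return {"name": f"arch-{len(archive)}", "parent": parent}

def train_and_eval(candidate):
    """Engineer: implement, self-heal, and benchmark the candidate.
    Stubbed here with a random score in place of real GPU training."""
    return random.random()

def analyze(candidate, score, lessons):
    """Analyst: record what was learned for future generations."""
    lessons.append((candidate["name"], score))

archive, lessons = [], []
for _ in range(10):  # the paper scales this loop to 1,700 experiments
    cand = propose(archive)
    score = train_and_eval(cand)
    analyze(cand, score, lessons)
    archive.append({**cand, "score": score})

print(len(archive), "experiments logged")
```

The point of the structure is that each iteration conditions on the full archive of past results, so the loop compounds its own discoveries rather than restarting from scratch.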
Claude Code in SHAMBLES (Qwen3 Coder Tested)
Matthew Berman· 2025-07-31 00:00
Model Performance & Capabilities
- Qwen3 Coder, an open-source frontier coding model from Alibaba, was tested across a range of capabilities [1]
- Qwen3 Coder successfully generated code for a 2D Navier-Stokes solver and a 3D rotating dodecahedron with bouncing spheres [1]
- The model failed a cube-rotation spatial reasoning task, though its generated code ran successfully [1]
- Qwen3 Coder passed a "needle in a haystack" test, finding a password hidden in the full text of Harry Potter and the Sorcerer's Stone [1]
- The model exhibited censorship regarding Tiananmen Square [1]
- Qwen3 Coder declined to take a stance on political questions, providing balanced perspectives on Trump and Kamala Harris [1][2]
- The model gave a thoughtful, nuanced response to a prompt about quitting a job and leaving family [2][3][4][5]
- Qwen3 Coder refused to answer illegal questions, such as how to hotwire a car [6]
- The model provided a correct diagnosis and management plan for an acute anterior myocardial infarction [6][7]
- Qwen3 Coder gave a good answer to the trolley problem, weighing morality through both utilitarianism and deontology [7][8]
- The model showed reasoning traces in its output when answering gotcha questions, though with some errors [11][12][13][14]

Technology & Implementation
- Together AI sponsors the testing of Qwen3 Coder, offering high-performance serverless endpoints and pay-per-token pricing [1][2]
- Qwen Code, an open-source counterpart to Claude Code, works well with Qwen3 Coder and can be installed via npm [2]
- The model has a massive context window: 256k tokens natively, extendable up to 1 million [1]
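Since Qwen Code ships as an npm package, setup is a one-liner; the package name and Node.js version shown here are my understanding of the published release, so check the project's README if the install fails:

```shell
# Install the Qwen Code CLI globally (assumes a recent Node.js, 20+)
npm install -g @qwen-code/qwen-code

# Verify the install, then launch an interactive session
qwen --version
qwen
```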
Chinese Open-Source DOMINATES Coding (GLM-4.5)
Matthew Berman· 2025-07-30 17:15
Model Performance & Capabilities
- Z.ai's GLM-4.5 model rivals top closed-source models in reasoning, coding, and agentic capabilities [1]
- GLM-4.5 demonstrates advanced problem-solving, successfully simulating and solving Rubik's Cubes up to 10x10 [2][3][4][21]
- The model can solve the Tower of Hanoi puzzle with up to 10 discs, showcasing its reasoning abilities [5][6][7][24][25]
- GLM-4.5 exhibits strong coding skills, creating interactive simulations such as Lego building and a 3D solar system, and games like Flappy Bird [8][9][21][22]
- Benchmarks show GLM-4.5 outperforming other models on agentic tasks and achieving competitive scores in reasoning and coding [17][18][19]

Model Architecture & Variants
- GLM-4.5 comes in two versions: a larger model with 355 billion total parameters (32 billion active), and a smaller "Air" version with 106 billion total parameters (12 billion active) [15]
- Both are hybrid reasoning models, capable of both reasoning and non-reasoning tasks [16]

Open Source Landscape
- China is at the forefront of open-source AI model development with models like GLM-4.5, Kimi K2, and Qwen 3 [1][15]
- Kimi K2 is comparable in quality to GLM-4.5 but roughly 2.5 times larger [20]

Tools & Resources
- HubSpot offers a free "AI Decoded" guide covering AI models, prompts, and tools [12][13][14]
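The mixture-of-experts sizing above means only a small share of each model's parameters is active per token, which is what keeps inference cheap relative to total size. A quick check using the figures from the summary:

```python
# Active-parameter share for the two GLM-4.5 variants (parameter counts in billions,
# taken from the summary above)
variants = {"GLM-4.5": (355, 32), "GLM-4.5-Air": (106, 12)}

for name, (total_b, active_b) in variants.items():
    share = active_b / total_b
    print(f"{name}: {active_b}B of {total_b}B active ≈ {share:.0%} per token")
```

Both variants activate only about a tenth of their weights on any given token, so the 355B model's per-token compute is closer to that of a 32B dense model.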
Forward Future Live July 25, 2025
Matthew Berman· 2025-07-25 16:56
AI Resources & Tools
- Matthew Berman's Vibe Coding Playbook is available for free download [1]
- Humanity's Last Prompt Engineering Guide is available for free download [1]
- A curated list of AI tools is available [1]

Community & Updates
- Regular AI updates are provided through a newsletter [1]
- Matthew Berman can be followed on X (formerly Twitter) and Instagram [1]
- A Discord server is available for community engagement [1]

Media & Sponsorship
- Media and sponsorship inquiries are welcomed [1]
China Went HARD...
Matthew Berman· 2025-07-24 00:30
Model Performance & Capabilities
- Qwen3 Coder rivals Anthropic's Claude family in coding performance, scoring 69.6% on SWE-Bench Verified versus Claude Sonnet 4's 70.4% [1]
- The most powerful variant, Qwen3 Coder 480B, is a mixture-of-experts model with 480 billion total parameters and 35 billion active parameters [2][3]
- The model supports a native context length of 256k tokens, and up to 1 million tokens with extrapolation methods, enhancing tool calling and agentic use [4]

Training Data & Methodology
- The model was pre-trained on 7.5 trillion tokens with a 70% code ratio, improving coding ability while maintaining general and math skills [5]
- Qwen2.5 Coder was used to clean and rewrite noisy data, significantly improving overall data quality [6]
- Code RL training was scaled across a broader set of diverse, real-world coding tasks to unlock the full potential of reinforcement learning [7][8]

Tooling & Infrastructure
- Qwen launched Qwen Code, a command-line tool adapted from Gemini CLI, enabling agentic, multi-turn execution with planning [2][5][9]
- A scalable system runs 20,000 independent environments in parallel on Alibaba Cloud infrastructure for self-play [10]

Open Source & Accessibility
- The model is hosted on Hugging Face, making it free to try out [11]
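The context figures above imply roughly a 4x stretch beyond the window the model was trained on, which is the ratio length-extrapolation methods (YaRN-style RoPE scaling and similar) are designed to bridge. The arithmetic:

```python
# Native vs extrapolated context window for Qwen3 Coder (figures from the summary)
native_ctx = 256 * 1024      # "256k" = 262,144 tokens
extended_ctx = 1_000_000     # 1 million tokens via extrapolation

factor = extended_ctx / native_ctx
print(f"extrapolation factor: {factor:.2f}x")
```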
AI News: Sam Altman's Predictions, Talent Wars Continue, Project Stargate, Thinking Machines
Matthew Berman· 2025-07-23 15:37
This video is sponsored by Augment Code. More on them later. All right, first we have an update from Thinking Machines. They just raised a massive amount of capital for what, I actually don't quite know. There is very little public information about what they're actually doing. What we do know is that they're going to be training models for enterprise. They just raised $2 billion led by a16z, who basically funds every single investment on the planet at this point, with participation from Nvidia, Accel, Service N ...
OpenAI's mystery models are insane...
Matthew Berman· 2025-07-22 16:57
Resources & Links
- ChatLLM, an all-in-one AI assistant: https://chatllm.abacus.ai/ffb
- DeepAgent, an AI agent: https://deepagent.abacus.ai/ffb
- The Matthew Berman Vibe Coding Playbook (free): https://bit.ly/3I2J0YQ
- Humanity's Last Prompt Engineering Guide (free): https://bit.ly/4kFhajz
- Newsletter for regular AI updates: https://forwardfuture.ai
- Curated AI tools: https://tools.forwardfuture.ai
- My Li ...
AI News: Windsurf Drama, Meta Building ASI, Meta Closed Source? Grok 4 Drama, and more!
Matthew Berman· 2025-07-16 19:00
Acquisitions and Talent Strategy
- OpenAI's potential acquisition of Windsurf for approximately $3 billion fell through; Google then hired around 30 of Windsurf's top team members while leaving Windsurf as an independent entity [2]
- Cognition acquired Windsurf's remaining assets and team, ensuring 100% of Windsurf employees participated financially in the transaction [3][6][7]
- Meta hired Alexandr Wang, CEO of Scale AI, and a team to lead its superintelligence efforts [4]
- Meta is making offers of up to $100 million to attract top AI researchers [9]

Compute Infrastructure and Investment
- Meta is investing hundreds of billions of dollars in compute infrastructure for superintelligence [10]
- Meta is building multi-gigawatt clusters; the first, Prometheus, comes online in 2026, and Hyperion will scale up to 5 gigawatts over several years [11]

Open Source and AI Model Development
- Meta's new superintelligence lab is considering abandoning its open-source AI model strategy in favor of developing a closed one [13]
- Mistral AI released Voxtral, an open-source speech recognition model that outperforms Whisper Large V3 in speech transcription [33][34]

AI Model Issues and Solutions
- Grok 4 had issues stemming from its system prompt, including associating itself with controversial surnames and reflecting Elon Musk's views on political topics [22][23]
- xAI tweaked the prompts to mitigate these issues, sharing the details on GitHub for transparency [24]

Reinforcement Learning Advancements
- OpenPipe may have discovered a universal reward function that lets reinforcement learning be applied to any agent without labeled data or handcrafted reward functions [27][28]
- Small models trained with RULER plus GRPO proved more reliable than o3 on four of four tasks, despite costing roughly 1/20th as much [29]

Government Collaboration
- xAI is offering Grok for Government, a suite of products available to US government customers, purchasable via the General Services Administration schedule [32]