Workflow
Matthew Berman
icon
Search documents
Grok 4 Fully Tested (INSANE)
Matthew Berman· 2025-07-11 18:18
Gro 4 has been out for less than 24 hours and I have put it through its paces. I'm going to show you all the tests. Let's get right into it.So, we have two versions that we're going to be using today. We have Gro 4 and Gro 4 heavy. I tried to use the appropriate model when appropriate.I use Gro 4 heavy for the more logic and reasoning intensive task and the regular Gro 4 for others. Turns out some tests are more appropriate for one than the other. Let me show you the first one.Write Python code that impleme ...
Grok 4 is really smart... Like REALLY SMART
Matthew Berman· 2025-07-10 22:31
Model Performance & Benchmarks - Grok 4 demonstrates a significant leap in performance compared to previous models due to reinforcement learning with verifiable rewards [1][2][3][4] - On the "Humanity's Last Exam" benchmark, Grok 4 achieved 26.9% without tools, 41% with tool usage, and 50.7% with scaled test-time compute, surpassing other frontier models [9][10][11] - Grok 4 Heavy achieved a perfect 100% score on the AMY 2025 benchmark, which consists of some of the hardest math questions [29] - Grok 4 significantly outperformed other models on the ARC AGI benchmark, achieving 66.6% on V1 and 15.9% on V2, indicating "nonzero levels of fluid intelligence" [33][34][35] - In a real-world vending machine management test ("Vending Bench"), Grok 4 achieved a net worth of $4,700, significantly higher than other models and humans [36] Model Architecture & Features - Grok 4 utilizes multiple agents that work together, share knowledge, and select the best solution, particularly in the "Heavy" version [12][13][20] - Grok 4 incorporates tool usage, including web browsing, sophisticated memory, and code execution environments [10] - Grok 4 has a 256k context window, multimodal reasoning capabilities, real-time data search, and enterprise-grade security [43] Real-World Applications & Demonstrations - Grok 4 was used to predict the winner of the World Series by browsing odds sites and calculating its own odds, giving the Dodgers a 21.6% chance of winning [22][23] - Grok 4 generated a visualization of two black holes colliding, demonstrating its ability to create content with some simplifications [24][25][26][27] - Grok 4 was used to create a timeline of announcements and score releases for the "Humanity's Last Exam" [27] - Grok 4 was used to create a first-person shooting game in four hours, highlighting its ability to automate asset sourcing and accelerate game development [38][39][40] Future Developments & Availability - A coding-specific model is expected in August, a multimodal agent in September, and a video generation model in October [46] - Super Grok is priced at $30 per month, while Super Grok Heavy is priced at $300 per month or $3,000 per year [44]
Grok 4 is HERE! and it's the best? (Livestream Reaction)
Matthew Berman· 2025-07-10 08:51
The xAI team went live on x showing off Grok 4's new capabilities and the results are mind-blowing to say the least! Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ Download Humanities Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai Discover The Best AI Tools👇🏼 https://tools.forwardfuture.ai My Links 🔗 👉🏻 X: https://x.com/matthewberman 👉🏻 Instagram: https://www.instagram.com/matthewberman_a ...
AI News: Grok 4, Grok 3 Off the Rails, OpenAI Poaching, New Open Source Models, and more!
Matthew Berman· 2025-07-10 01:05
AI Model Releases and Updates - XAI 团队和 Elon Musk 预计将发布 Grok 4,但发布时间尚未确定 [1] - Grok 因发布反犹太主义推文和赞扬希特勒的言论而被下线 [3] - Ernie 4.5 模型家族发布,包含 3000 亿参数版本,在数学和推理方面表现优于 GPT 4.1% [19][20] - HuggingFace 发布 Small LM3,一个 30 亿参数的小型推理模型,具有 128k 上下文窗口 [23] - Chai Discovery 发布 Chai 2,一种分子设计模型,在抗体发现方面超越了之前的技术水平 100 倍以上 [25] AI Applications and Development - AI 在视频游戏领域的应用前景广阔,Runway 正在开发游戏世界生成功能 [7][8][10] - Cursor 现在支持在网页和手机上运行,方便用户随时随地进行编码 [12] - AI 研究人员在论文中注入提示,引导 AI 给出正面评价 [27] Talent Acquisition and Investment - Meta 向 EssilorLuxottica 投资 35 亿美元,该公司拥有 Ray-Ban 等眼镜品牌 [29] - Meta 从苹果挖走了一位关键的 AI 领导者,Ruong Pang 加入 Meta 的超智能实验室 [34] - OpenAI 从 Tesla、XAI 和 Meta 挖回了四位高级工程师 [36] Tools and Resources - Recall AI 提供了一个平台,可以保存、组织和总结用户在网上找到的 AI 相关信息,并提供 30% 的折扣码 MB30 [1][15][17]
Perplexity's AI-Native Browser Comet is HERE
Matthew Berman· 2025-07-09 16:21
Product Overview - Perplexity's Comet is an AI-first browser forked from Google Chrome, aiming to redefine web browsing by tasking an agent to browse on behalf of the user [1][3] - Comet integrates Perplexity AI directly into the browsing experience, offering an assistant accessible via a button that can interact with and provide information from any open tab [13][14] - The browser allows users to leverage AI agents to perform tasks such as creating grocery carts, finding information, and managing LinkedIn connections [6][21] Key Features and Functionality - Comet offers a faster browsing experience compared to Chrome, with instant setup and compatibility with existing Chrome settings, bookmarks, and extensions [3][4] - The browser allows local execution of AI tasks, providing access to already authenticated websites and contextual information, eliminating the friction of cloud-based browser agents [12] - Comet defaults to Perplexity search in the URL bar and new tabs, emphasizing its AI-first approach [13] - The browser supports automation of tasks like finding top-rated comments on YouTube videos and checking online stores for product availability, though some website restrictions may apply [25][27] Strategic Implications - Perplexity's development of its own browser mitigates platform risk associated with building on top of existing browsers like Google Chrome or Safari [9][10] - By building a local browser agent, Perplexity addresses the authentication challenges and lack of context associated with cloud-based browser agents [11][12] - The AI-driven browsing experience aims to improve efficiency and productivity by allowing users to delegate tasks to AI agents, potentially mitigating the issue of AI-generated content overload [39][41]
xAI SHIPPING Power Plant, Elon Musk confirms
Matthew Berman· 2025-07-04 00:42
Elon Musk actually confirmed a clip from my interview with Dylan Patel where he says they've got like 200,000 GPUs already up and they purchased a new factory in Memphis and they're building out a new data center. There's the craziness they did with like mobile generators. Well, now they just bought a a power plant from overseas and are shipping it to the US because they couldn't get a power plant uh you know new one in time.So like this is going to power his next generation of supercomput powering X AI. ...
Why GPT-4.5 Failed
Matthew Berman· 2025-07-03 16:04
Model Performance - GPT 4.5% is considered much smarter than previous versions, specifically 40 and 4.1% [1] - Despite its intelligence, GPT 4.5% is deemed not very useful due to being too slow and expensive [1] - Overparameterization caused GPT 4.5% to memorize data excessively during initial training, hindering generalization [2] Development Challenges - OpenAI encountered a bug within PyTorch during GPT 4.5%'s development, which they identified and fixed [2] - The bug fix on GitHub received positive reactions from approximately 20 OpenAI employees [3]
$100 Million for an Ai Engineer
Matthew Berman· 2025-07-02 16:08
Talent Acquisition & Compensation - Meta is offering \$100 million bonuses to attract top talent, viewing super intelligence as a critical goal [1] - The pursuit of super intelligence justifies significant investment in acquiring talent, even at costs of hundreds of millions of dollars per researcher [2] - The discussion mentions a potential \$1 billion compensation for an individual at OpenAI, highlighting the extreme value placed on AI expertise [4] - High compensation, even up to \$1 billion, is considered a small investment relative to Meta's market capitalization and the potential of the AI market [4] Strategic Implications - Acquiring top AI teams is compared to acquiring companies like SSI, but at a potentially higher cost per employee [2] - The strategy of acquiring talent is seen as similar to acquiring entire companies focused on super intelligence [3][4] - Mark Zuckerberg believes Meta can build super intelligence and is willing to invest heavily to achieve this goal [1]
AI Engineers moving to Meta
Matthew Berman· 2025-07-01 17:00
When you like look at a lot of people who are very successful, it's not the money, it's more the power. And if you ask anyone going to Meta, a lot of them will obviously be going for money, but a lot of them are going because now they have control over the AI path for a trillion dollar plus company. They're right there talking to Zuck, and they can convince one person who has full voting rights over the entire company.There's a lot of power there. Push whatever AI product you want. They've got like all this ...
Zuck's Super Intelligence Master Plan Revealed
Matthew Berman· 2025-07-01 00:35
Talent Acquisition & Competition - Meta aggressively poached top AI researchers from OpenAI and other firms with offers including $100 million signing bonuses [1] - Meta formed Meta Super Intelligence Labs (MSL), led by Alexander Wang, to focus on developing next-generation AI models [8][9][10] - OpenAI acknowledged Meta's poaching efforts and is recalibrating compensation to retain top talent [2] - Meta is pressuring OpenAI staffers to make decisions quickly, capitalizing on OpenAI's week off [4][5] Strategic Moves & Investments - Meta acquired a 49% minority stake in Scale AI for $14 billion to gain access to data and the team [1] - OpenAI and Google canceled their contracts with Scale AI after Meta's investment [1] AI Focus & Objectives - Meta's primary goal is achieving super intelligence [4][6] - OpenAI has shifted focus from incremental releases to achieving super intelligence [6] - Meta's new super intelligence team includes researchers who co-created key AI models like ChatGPT and GPT-4 [11][12][13][14]