Matthew Berman
Grok 4 is really smart... Like REALLY SMART
Matthew Berman· 2025-07-10 22:31
Model Performance & Benchmarks
- Grok 4 demonstrates a significant leap in performance over previous models, attributed to reinforcement learning with verifiable rewards [1][2][3][4]
- On the "Humanity's Last Exam" benchmark, Grok 4 scored 26.9% without tools, 41% with tool use, and 50.7% with scaled test-time compute, surpassing other frontier models [9][10][11]
- Grok 4 Heavy achieved a perfect 100% score on the AIME 2025 benchmark, which consists of some of the hardest math questions [29]
- Grok 4 significantly outperformed other models on the ARC-AGI benchmark, scoring 66.6% on V1 and 15.9% on V2, indicating "nonzero levels of fluid intelligence" [33][34][35]
- In a vending machine business simulation ("Vending Bench"), Grok 4 reached a net worth of $4,700, significantly higher than other models and humans [36]
Model Architecture & Features
- Grok 4, particularly the "Heavy" version, uses multiple agents that work together, share knowledge, and select the best solution [12][13][20]
- Grok 4 incorporates tool use, including web browsing, sophisticated memory, and code execution environments [10]
- Grok 4 offers a 256k context window, multimodal reasoning, real-time data search, and enterprise-grade security [43]
Real-World Applications & Demonstrations
- Grok 4 predicted the winner of the World Series by browsing betting-odds sites and calculating its own odds, giving the Dodgers a 21.6% chance of winning [22][23]
- Grok 4 generated a visualization of two black holes colliding, with some simplifications [24][25][26][27]
- Grok 4 created a timeline of announcements and score releases for "Humanity's Last Exam" [27]
- Grok 4 was used to build a first-person shooter game in four hours, highlighting its ability to automate asset sourcing and accelerate game development [38][39][40]
Future Developments & Availability
- A coding-specific model is expected in August, a multimodal agent in September, and a video generation model in October [46]
- SuperGrok is priced at $30 per month, while SuperGrok Heavy is priced at $300 per month or $3,000 per year [44]
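xAI has not published how Grok 4 Heavy's agents pick a winner. As a rough illustration of "multiple agents that select the best solution" combined with a verifiable reward, here is a minimal best-of-N sketch: several hypothetical agents answer the same problem in parallel, and a hypothetical exact-match verifier scores each answer. The names `best_of_n` and `exact_match_verifier`, the toy multiplication task, and the scoring rule are all assumptions for illustration, not Grok's actual design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical types: an "agent" maps a problem to an answer string, and a
# "verifier" maps (problem, answer) to a reward. Neither is a real xAI API.
Agent = Callable[[str], str]
Verifier = Callable[[str, str], float]


def exact_match_verifier(problem: str, answer: str) -> float:
    """Toy verifiable reward for problems of the form 'a*b':
    1.0 if the answer equals the true product, else 0.0."""
    a, b = (int(x) for x in problem.split("*"))
    return 1.0 if answer.strip() == str(a * b) else 0.0


def best_of_n(agents: Sequence[Agent], problem: str, verify: Verifier) -> str:
    """Run every agent on the same problem in parallel and return the answer
    with the highest verifier score (ties go to the earliest agent)."""
    if not agents:
        raise ValueError("need at least one agent")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves input order, so answers line up with agents.
        answers = list(pool.map(lambda agent: agent(problem), agents))
    # max() keeps the first of several equally scored answers.
    return max(answers, key=lambda answer: verify(problem, answer))


# Three stand-in "agents" that return fixed guesses.
agents = [lambda p: "398", lambda p: "408", lambda p: "418"]
print(best_of_n(agents, "17*24", exact_match_verifier))  # prints 408
```

In a fuller version the fixed guesses would presumably be model calls and the exact-match check something like unit tests or a math checker; the selection step itself would stay the same.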
Grok 4 is HERE! and it's the best? (Livestream Reaction)
Matthew Berman· 2025-07-10 08:51
The xAI team went live on X showing off Grok 4's new capabilities, and the results are mind-blowing, to say the least!
Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ
Download Humanity's Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz
Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai
Discover The Best AI Tools 👇🏼 https://tools.forwardfuture.ai
My Links 🔗
👉🏻 X: https://x.com/matthewberman
👉🏻 Instagram: https://www.instagram.com/matthewberman_a ...
AI News: Grok 4, Grok 3 Off the Rails, OpenAI Poaching, New Open Source Models, and more!
Matthew Berman· 2025-07-10 01:05
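AI Model Releases and Updates
- The xAI team and Elon Musk are expected to release Grok 4, but the release date has not been confirmed [1]
- Grok was taken offline after posting antisemitic tweets and statements praising Hitler [3]
- The Ernie 4.5 model family was released, including a 300-billion-parameter version that outperforms GPT-4.1 on math and reasoning [19][20]
- HuggingFace released SmolLM3, a small 3-billion-parameter reasoning model with a 128k context window [23]
- Chai Discovery released Chai-2, a molecular design model that outperforms the previous state of the art in antibody discovery by more than 100x [25]
AI Applications and Development
- AI has broad potential in video games; Runway is developing game-world generation features [7][8][10]
- Cursor now runs on the web and on mobile, so users can code from anywhere [12]
- AI researchers have been injecting prompts into their papers to steer AI reviewers toward positive reviews [27]
Talent Acquisition and Investment
- Meta invested $3.5 billion in EssilorLuxottica, which owns eyewear brands such as Ray-Ban [29]
- Meta poached a key AI leader from Apple: Ruoming Pang joined Meta's superintelligence lab [34]
- OpenAI poached four senior engineers from Tesla, xAI, and Meta [36]
Tools and Resources
- Recall AI offers a platform for saving, organizing, and summarizing AI-related information users find online, with a 30% discount code, MB30 [1][15][17]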
Perplexity's AI-Native Browser Comet is HERE
Matthew Berman· 2025-07-09 16:21
Product Overview
- Perplexity's Comet is an AI-first browser forked from Chromium (the open-source base of Google Chrome), aiming to redefine web browsing by letting an agent browse on the user's behalf [1][3]
- Comet integrates Perplexity AI directly into the browsing experience, offering an assistant, opened via a button, that can interact with and pull information from any open tab [13][14]
- Users can delegate tasks to AI agents, such as building grocery carts, finding information, and managing LinkedIn connections [6][21]
Key Features and Functionality
- Comet offers faster browsing than Chrome, with instant setup and compatibility with existing Chrome settings, bookmarks, and extensions [3][4]
- AI tasks run locally, giving the agent access to already-authenticated websites and contextual information and removing the friction of cloud-based browser agents [12]
- Comet defaults to Perplexity search in the URL bar and new tabs, reflecting its AI-first approach [13]
- The browser can automate tasks like finding top-rated comments on YouTube videos and checking online stores for product availability, though some websites may restrict this [25][27]
Strategic Implications
- Building its own browser reduces Perplexity's platform risk compared with building on top of existing browsers like Google Chrome or Safari [9][10]
- By building a local browser agent, Perplexity addresses the authentication challenges and lack of context of cloud-based browser agents [11][12]
- The AI-driven browsing experience aims to improve efficiency and productivity by letting users delegate tasks to AI agents, potentially easing the overload of AI-generated content [39][41]
xAI SHIPPING Power Plant, Elon Musk confirms
Matthew Berman· 2025-07-04 00:42
Elon Musk actually confirmed a clip from my interview with Dylan Patel, where he says they've got like 200,000 GPUs already up, they purchased a new factory in Memphis, and they're building out a new data center. There's the craziness they did with, like, mobile generators. Well, now they just bought a power plant from overseas and are shipping it to the US because they couldn't get a new power plant in time. So this is going to power his next generation of supercomputers powering xAI. ...
Why GPT-4.5 Failed
Matthew Berman· 2025-07-03 16:04
Model Performance
- GPT-4.5 is considered much smarter than previous versions, specifically GPT-4o and GPT-4.1 [1]
- Despite its intelligence, GPT-4.5 is deemed not very useful because it is too slow and expensive [1]
- Overparameterization caused GPT-4.5 to memorize data excessively during initial training, hindering generalization [2]
Development Challenges
- OpenAI encountered a bug in PyTorch during GPT-4.5's development, which it identified and fixed [2]
- The bug fix on GitHub received positive reactions from approximately 20 OpenAI employees [3]
$100 Million for an Ai Engineer
Matthew Berman· 2025-07-02 16:08
Talent Acquisition & Compensation
- Meta is offering $100 million bonuses to attract top talent, viewing superintelligence as a critical goal [1]
- The pursuit of superintelligence justifies significant investment in acquiring talent, even at costs of hundreds of millions of dollars per researcher [2]
- The discussion mentions a potential $1 billion compensation offer to an individual at OpenAI, highlighting the extreme value placed on AI expertise [4]
- Even compensation of up to $1 billion is considered a small investment relative to Meta's market capitalization and the potential of the AI market [4]
Strategic Implications
- Acquiring top AI teams is compared to acquiring companies like SSI (Safe Superintelligence), but at a potentially higher cost per employee [2]
- The strategy of acquiring talent is seen as similar to acquiring entire companies focused on superintelligence [3][4]
- Mark Zuckerberg believes Meta can build superintelligence and is willing to invest heavily to achieve this goal [1]
AI Engineers moving to Meta
Matthew Berman· 2025-07-01 17:00
When you like look at a lot of people who are very successful, it's not the money, it's more the power. And if you ask anyone going to Meta, a lot of them will obviously be going for money, but a lot of them are going because now they have control over the AI path for a trillion dollar plus company. They're right there talking to Zuck, and they can convince one person who has full voting rights over the entire company. There's a lot of power there. Push whatever AI product you want. They've got like all this ...
Zuck's Super Intelligence Master Plan Revealed
Matthew Berman· 2025-07-01 00:35
Talent Acquisition & Competition
- Meta aggressively poached top AI researchers from OpenAI and other firms with offers including $100 million signing bonuses [1]
- Meta formed Meta Superintelligence Labs (MSL), led by Alexandr Wang, to focus on developing next-generation AI models [8][9][10]
- OpenAI acknowledged Meta's poaching efforts and is recalibrating compensation to retain top talent [2]
- Meta is pressuring OpenAI staffers to decide quickly, capitalizing on OpenAI's week off [4][5]
Strategic Moves & Investments
- Meta acquired a 49% minority stake in Scale AI for $14 billion to gain access to its data and team [1]
- OpenAI and Google canceled their contracts with Scale AI after Meta's investment [1]
AI Focus & Objectives
- Meta's primary goal is achieving superintelligence [4][6]
- OpenAI has shifted its focus from incremental releases to achieving superintelligence [6]
- Meta's new superintelligence team includes researchers who co-created key AI models like ChatGPT and GPT-4 [11][12][13][14]
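The reported deal terms imply a rough valuation for Scale AI. Assuming the $14 billion bought exactly 49% of Scale AI's total equity (the actual terms, such as share classes or how the money was split, may differ), the arithmetic is:

```latex
V_{\text{implied}} \approx \frac{\text{\$14 billion}}{0.49} \approx \text{\$28.6 billion}
```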
Dylan Patel: GPT4.5's Flop, Grok 4, Meta's Poaching Spree, Apple's Failure, and Super Intelligence
Matthew Berman· 2025-06-30 17:27
AI Model Development & Strategy
- Meta delayed the release of its Behemoth model due to training problems and questionable architectural decisions, and may not release it at all [1]
- The industry sees superintelligence as the ultimate goal, driving companies to prioritize it over AGI [1][3]
- OpenAI's GPT-4.5 (Orion) failed due to overparameterization, insufficient data scaling, and training bugs, leading to its deprecation [7]
- Reasoning breakthroughs, like OpenAI's "Strawberry," demonstrate that generating high-quality data is crucial for model efficiency and performance [7][8]
Talent Acquisition & Competition
- Meta's Scale AI deal was primarily about talent, particularly Alexandr Wang, who will lead its superintelligence efforts, signaling a strategic shift [3]
- Meta is offering substantial bonuses, reportedly up to $100 million or even over $1 billion for some individuals, to attract top AI researchers from companies like OpenAI [3][4]
- Apple struggles to attract top AI talent due to its secretive culture, its aversion to Nvidia, and its lack of competitive compute resources [8]
Cloud & Compute Infrastructure
- OpenAI's compute exclusivity agreement with Microsoft has ended, and OpenAI is now diversifying its compute through partnerships with Oracle, CoreWeave, and others [5]
- Nvidia is prioritizing smaller cloud companies, potentially creating tension with major players like Amazon and Google, who feel marginalized in GPU allocations [10]
- AMD is using strategies such as selling GPUs to cloud providers and renting the capacity back to encourage adoption of its chips, fostering relationships and driving interest [17][18][20]
Market Dynamics & Future Trends
- The analyst believes closed-source AI will ultimately dominate, raising concerns about the concentration of power among a few companies [57]
- The analyst estimates that 20% of jobs could be automated by the end of this decade or the beginning of the next, but implementation and deployment will take years [48]
- The analyst is bearish on on-device AI, arguing that cloud-based AI offers better performance, data access, and cost-effectiveness for most valuable use cases [9]