Matthew Berman

Forward Future Live August 8th, 2025
Matthew Berman· 2025-08-08 16:33
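AI Resources & Community
- Forward Future AI offers a discovery platform for the best AI tools [1]
- Matthew Berman's X account posts AI-related updates [1]
- The Discord community offers a place to discuss AI [1]
Media & Sponsorship
- For media or sponsorship inquiries, visit the designated link [1]
Newsletter
- Forward Future AI offers a newsletter with regular AI updates [1]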
GPT-5 LIVESTREAM WATCHPARTY!
Matthew Berman· 2025-08-07 18:20
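AI Resources & Community
- Forward Future AI offers a discovery platform for the best AI tools [1]
- Matthew Berman's X account posts AI-related updates [1]
- The Discord community offers a place to discuss AI [1]
Media & Sponsorship
- For media or sponsorship inquiries, visit the designated link [1]
Newsletter
- Forward Future AI offers a newsletter with regular AI updates [1]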
GPT-5 Fully Tested (INSANE)
Matthew Berman· 2025-08-07 18:00
GPT-5's Capabilities
- GPT-5 can generate interactive Rubik's Cube simulations of up to 20x20x20, including solving algorithms [2][3][4][5][6][7][8]
- GPT-5 can create functional clones of applications like Excel and Microsoft Word, with features such as formula support, formatting, and image insertion [9][10][11]
- GPT-5 can implement complex browser-based games, such as Conway's Game of Life with 3D visualizations and Snake with enhanced visual effects [12][13][14][15][16][17][18][19][20]
- GPT-5 can generate physics simulations, including double pendulums, cloth simulations, fluid dynamics, and ray tracers [20][21][25][26][27][28][36][37][38][39][40]
- GPT-5 can create 3D environments such as a flight simulator and a Lego builder, though with some limitations [30][31][32][33][34][35]
GPT-5's Speed and Multimodal Functionality
- GPT-5 has two modes, GPT-5 and GPT-5 Thinking, with GPT-5 reaching approximately 60-80 tokens per second [22][23][24]
- GPT-5 is a multimodal model that can interpret images and generate new images based on input [7][49][50][51][52][53]
GPT-5's Front-End Development Prowess
- GPT-5 can rapidly generate front-end clones of websites like Twitter and create financial dashboards with functional elements [42][43][46][47][48]
- GPT-5 can create website front-ends with a specific aesthetic, such as a '90s-style website [44][45]
GPT-5's Ethical Considerations
- GPT-5 can respond responsibly and ethically to potentially harmful or reckless plans, offering alternative solutions and resources [54][55][56][57][58]
GPT-5 LIVESTREAM WATCHPARTY!
Matthew Berman· 2025-08-07 16:41
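AI Resources & Community
- Forward Future AI offers a discovery platform for the best AI tools [1]
- Matthew Berman's X account posts AI-related updates [1]
- The Discord community offers a place to discuss AI [1]
Media & Sponsorship
- For media or sponsorship inquiries, visit the designated link [1]
Newsletter
- Forward Future AI offers a newsletter with regular AI updates [1]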
The Industry Reacts to gpt-oss!
Matthew Berman· 2025-08-06 19:22
Model Release & Performance
- OpenAI released a new open-source model (gpt-oss) that performs comparably to smaller models like o4-mini and can run on consumer hardware such as laptops and phones [1]
- The 20-billion-parameter version of gpt-oss is reported to outperform models two to three times its size in certain tests [7]
- Industry experts highlight the model's efficient training, with the 20-billion-parameter version costing less than $500,000 to pre-train; OpenAI's model card reports 2.1 million H100-hours for the 120-billion-parameter version and almost ten times fewer for the 20-billion-parameter version [27]
Safety & Evaluation
- OpenAI conducted safety evaluations on gpt-oss, including fine-tuning to identify potential malicious uses, and shared which recommendations it did and did not adopt [2][3]
- Former OpenAI safety researchers acknowledge the rigor of OpenAI's gpt-oss safety evaluation [2][19]
- The model's inclination to "snitch" on corporate wrongdoing was tested, with the 20-billion-parameter version showing a 0% snitch rate and the 120-billion-parameter version around 20% [31]
Industry Reactions & Implications
- Industry experts suggest OpenAI's release of gpt-oss could be a strategic move to commoditize the model market, potentially forcing competitors to lower prices [22][23]
- Some believe the value in AI will increasingly accrue to the application layer rather than the model layer, as the price of AI tokens converges with the cost of infrastructure [25][26]
- The open-source model quickly became the number one trending model on Hugging Face, indicating significant community interest and adoption [17][18]
Accessibility & Use
- Together AI supports the new open-source models from OpenAI, offering fast speeds and low prices, such as 15 cents per million input tokens and 60 cents per million output tokens for the 120-billion-parameter model [12]
- The 120-billion-parameter model requires approximately 65 GB of storage, making it possible to store on a USB stick and run locally on consumer laptops [15]
- Projects like GPT-OSS Pro Mode chain together multiple instances of the new OpenAI gpt-oss model to produce better answers than a single instance [10]
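The per-token prices quoted above translate into a request cost with simple arithmetic. The sketch below is illustrative only: the function name and the example workload are assumptions, and the two prices are the ones cited in the summary for the 120-billion-parameter model, not a live price list.

```python
# Prices in USD per million tokens, as quoted in the summary above.
INPUT_PRICE_PER_MILLION = 0.15
OUTPUT_PRICE_PER_MILLION = 0.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD (hypothetical helper, not a vendor API)."""
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Made-up workload: 2 million input tokens and 500,000 output tokens.
example_cost = estimate_cost(2_000_000, 500_000)
print(f"Estimated cost: ${example_cost:.2f}")
```

Because output tokens are priced at four times the input rate here, a workload with long generations is dominated by the output term.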
Claude Just Got a Big Update (Opus 4.1)
Matthew Berman· 2025-08-05 23:02
Model Release & Performance
- Anthropic released Claude Opus 4.1, an upgrade to Claude Opus 4 with gains in agentic tasks, real-world coding, and reasoning [1]
- On SWE-bench Verified, Claude Opus 4.1 scores 74.5%, up 2 percentage points from Opus 4's 72.5% [3]
- On Terminal-Bench, terminal-use performance rises from 39.2% to 43.3%, a gain of 4.1 percentage points [4]
- On GPQA Diamond (graduate-level reasoning), the score rises from 79.6% to 80.9%, a gain of 1.3 percentage points [4]
- On TAU-bench (agentic tool use), the retail score rises from 81.4% to 82.4%, a gain of 1 percentage point, but the airline score falls from 59.6% to 56%, a drop of 3.6 percentage points [5]
- On the multilingual Q&A benchmark, the score rises from 88.8% to 89.5%, a gain of 0.7 percentage points [5]
- On AIME 2025, the score rises 2.5 percentage points to 78% [5]
Competitive Positioning & Future Outlook
- On SWE-bench and Terminal-Bench, Claude Opus 4.1 outperforms OpenAI's o3 and Gemini 2.5 Pro [5]
- On GPQA Diamond and agentic tool use, Claude Opus 4.1 trails OpenAI's o3 and Gemini 2.5 Pro [6]
- On the high-school math competition benchmark, Claude Opus 4.1 scores only 78%, below OpenAI's o3 (88.9%) and Gemini 2.5 Pro (88%) [6]
- Claude is currently widely regarded as the best coding model on the market, particularly strong at agentic coding and agent-driven development [7]
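The percentage-point changes in the summary above can be checked directly from the before and after scores. This is a small sketch using only the numbers quoted there; the benchmark labels are normalized spellings chosen for this example, and the helper name is hypothetical.

```python
# (Opus 4 score, Opus 4.1 score) in percent, as quoted in the summary above.
scores = {
    "SWE-bench Verified": (72.5, 74.5),
    "Terminal-Bench": (39.2, 43.3),
    "GPQA Diamond": (79.6, 80.9),
    "TAU-bench retail": (81.4, 82.4),
    "TAU-bench airline": (59.6, 56.0),
    "Multilingual Q&A": (88.8, 89.5),
}


def point_changes(table):
    """Return the change in percentage points for each benchmark."""
    return {name: round(new - old, 1) for name, (old, new) in table.items()}


changes = point_changes(scores)
for name, delta in changes.items():
    print(f"{name}: {delta:+.1f} points")
```

Only the airline tool-use score moves in the negative direction; every other listed benchmark improves by between 0.7 and 4.1 points.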
OpenAI Goes OPEN-SOURCE! gpt-oss is HERE!
Matthew Berman· 2025-08-05 22:09
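Model Release
- OpenAI released gpt-oss, a state-of-the-art open-source model, in 120-billion-parameter and 20-billion-parameter versions [1]
- These are open-weight language models, meaning the model weights are released as well [1]
Performance Benchmarks
- The 120-billion-parameter version of gpt-oss scores 2622 on Codeforces with tools, very close to a frontier model (score 2706) [2]
- The 20-billion-parameter version of gpt-oss scores 2516 with tools, which is equally impressive given its size [2]
- These models score higher at programming than most people on Earth [2]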
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
Model Release & Capabilities
- OpenAI released gpt-oss, state-of-the-art open-weight language models in 120-billion-parameter and 20-billion-parameter versions [1]
- The models outperform similarly sized open-source models on reasoning tasks and demonstrate strong tool-use capabilities [3]
- The models are optimized for efficient deployment on consumer hardware, with the 120-billion-parameter version running efficiently on a single 80 GB GPU and the 20-billion-parameter version on edge devices with 16 GB of memory [4][5]
- The models excel in tool use, few-shot learning, function calling, chain-of-thought reasoning, and health issue diagnosis [8]
- The models support context lengths of up to 128,000 tokens [12]
Training & Architecture
- The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models [3]
- The models use a transformer architecture with a mixture of experts, reducing the number of active parameters needed to process input [10][11]
- The 120-billion-parameter version activates only about 5 billion parameters per token, while the 20-billion-parameter version activates 3.6 billion parameters [11][12]
- The models employ alternating dense and locally banded sparse attention patterns, grouped multi-query attention, and RoPE for positional encoding [12]
Safety & Security
- OpenAI did not put any direct supervision on the chain of thought for either gpt-oss model [21]
- The pre-training data was filtered to remove harmful content related to chemical, biological, radiological, and nuclear (CBRN) threats [22]
- Even with robust fine-tuning, maliciously fine-tuned models were unable to reach high capability levels according to OpenAI's Preparedness Framework [23]
- OpenAI is hosting a challenge for red teamers with $500,000 in awards to identify safety issues with the models [24]
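The mixture-of-experts point above, that only a fraction of the parameters is active per token, can be illustrated with a toy top-k router. This is a minimal sketch of generic top-k routing, not OpenAI's implementation: the expert count, the value of k, the scalar "experts", and all function names are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts; the others are never evaluated."""
    output = 0.0
    for index, weight in route_top_k(router_logits, k):
        output += weight * experts[index](x)
    return output


# Toy setup (hypothetical): four scalar experts, two active per token.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]
result = moe_forward(1.0, experts, logits, k=2)
print(result)
```

With two of four experts selected, half of the expert parameters sit idle for this token, which is the mechanism behind a large total parameter count paired with a much smaller active count.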
Google Genie 3 - The Most Advanced World Simulator Ever...
Matthew Berman· 2025-08-05 14:02
Model Overview
- Google announced Genie 3, a general-purpose world model for generating diverse interactive environments [1][8]
- Genie 3 allows real-time interaction with improved consistency and realism compared to Genie 2 [12]
- The model generates 720p high-quality environments [3]
Technical Aspects
- Genie 3 considers the entire previously generated trajectory, not just the previous frame, for autoregressive generation [15]
- Consistency in Genie 3 is an emergent capability resulting from training scale, not pre-programming [19]
- Genie 3 generates dynamic and rich worlds frame by frame based on world description and user actions, unlike methods relying on explicit 3D representation [20]
Potential Applications
- World models like Genie 3 can be used for training robots and agents [9]
- The technology has potential applications in creating video games, movies, and television shows [9]
- Google positions world models as a key step towards AGI by providing AI agents with unlimited simulation environments for training [9][10]
Comparison with Previous Models
- Genie 3 demonstrates significant improvements in consistency, detail, and generation length compared to Genie 2 [22][23]
- Genie 3 allows for deeper world exploration than Genie 2 [23]
Interactive Features
- Users can prompt events in real-time, adding elements to the scene [21]
- The model demonstrates realistic interactions, such as light moving out of the way of a jet ski and reflections in mirrors [6]
- The model can simulate actions like painting, with paint only being applied when the brush touches the wall [29][30]
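The autoregressive behavior described above, where each new frame depends on the whole trajectory so far rather than only the previous frame, can be sketched as a simple rollout loop. Everything here is hypothetical: frames are plain strings, `next_frame` stands in for a learned model, and `toy_model` is a made-up function that only shows why access to the full history keeps a painted wall painted after the camera looks away.

```python
from typing import Callable, List, Tuple

Frame = str
Action = str
History = List[Tuple[Action, Frame]]


def rollout(world_prompt: str,
            actions: List[Action],
            next_frame: Callable[[str, History, Action], Frame]) -> List[Frame]:
    """Generate frames one at a time, passing the full history at every step."""
    history: History = []
    for action in actions:
        frame = next_frame(world_prompt, history, action)
        history.append((action, frame))
    return [frame for _, frame in history]


def toy_model(world_prompt: str, history: History, action: Action) -> Frame:
    """Toy stand-in: count every paint stroke seen anywhere in the trajectory."""
    strokes = sum(1 for past_action, _ in history if past_action == "paint")
    if action == "paint":
        strokes += 1
    return f"{world_prompt} | strokes={strokes}"


frames = rollout("wall", ["paint", "look_away", "look_back"], toy_model)
print(frames)
```

A model that saw only the previous frame would have no record of the stroke once the wall left the view; conditioning on the trajectory is what lets the last frame still report it.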
Forward Future Live August 1st, 2025
Matthew Berman· 2025-08-01 16:55
Resources & Tools
- Offers a free "Vibe Coding Playbook" download [1]
- Provides a free "Humanity's Last Prompt Engineering Guide" download [1]
- Showcases a curated list of AI tools [1]
Community & Updates
- Encourages joining a newsletter for regular AI updates [1]
- Promotes engagement through X (Twitter), Instagram, and Discord [1]
Media & Sponsorship
- Provides a contact link for media/sponsorship inquiries [1]