Matthew Berman
The Industry Reacts to Gemini 3...
Matthew Berman· 2025-11-20 02:14
Google dropped Gemini 3 24 hours ago and the industry has been reacting strongly. It is definitely the best model on the planet and I'm going to show you all of the industry reactions right now. First is from Artificial Analysis, the company that runs independent benchmarks against all of the top models. And yes, Gemini 3 is number one. Here's what they have to say. For the first time, Google has a leading language model and it debuts with a three-point buffer over the second-best model, GPT-5.1. And a lo ...
Gemini 3 is INSANELY GOOD
Matthew Berman· 2025-11-18 23:56
Google just dropped Gemini 3 and it is an insanely good model. Here's everything you need to know about it. First, it is number one across almost every major benchmark, including the LM Arena leaderboard, and it scored a perfect 100% on the AIME 2025 math benchmark. Those are incredibly difficult math questions. Google also dropped Antigravity, their Cursor competitor, and it is really good. You can use Gemini 3 Pro in Antigravity for free right now. The new Gemini app got a major update which now includes Gemin ...
Gemini 3 is the best model on earth
Matthew Berman· 2025-11-18 21:54
Model Performance & Benchmarks
- Gemini 3 surpasses previous frontier models in benchmarks, demonstrating significant advancements in AI capabilities [1]
- Gemini 3 achieves 45.8% with code execution and search on Humanity's Last Exam, compared to Gemini 2.5 Pro at 21%, Claude Sonnet 4.5 at 13%, and GPT-5.1 at 26.5% [2]
- On the Vending Bench benchmark, Gemini 3's net worth reached $5,478.16, significantly outperforming Claude Sonnet 4.5 at $3,800 [4]
- Gemini 3 Deep Think scores 41% on Humanity's Last Exam, compared to Gemini 3 Pro at 37.5%, Claude Sonnet 4.5 at 13%, GPT-5 Pro at 30%, and GPT-5.1 at 26.5% [9][10]
- Gemini 3 Deep Think achieves 45.1% on ARC-AGI-2 visual reasoning puzzles, a 10x improvement over Gemini 2.5 Pro [12]
Enterprise Applications & Features
- Box's benchmark shows a 22-point performance increase for Gemini 3 Pro versus Gemini 2.5 Pro, with scores of 85% and 63% respectively [6]
- Industry subsets in Box's benchmark show significant performance jumps: Healthcare and Life Sciences (45% to 94%), Media and Entertainment (47% to 92%), and Financial Services (51% to 60%) [6]
- Gemini 3 excels in complex multi-step reasoning and task automation, as highlighted by Box's new benchmark [7]
- Gemini 3 supports multiple modalities, including text, images, video, audio, and code, with a unique focus on video understanding [12]
- Gemini 3 can analyze YouTube videos frame by frame, understanding the content in detail [13]
Google Integration & New Products
- Gemini 3 is integrated into Google Search, dynamically generating user interfaces based on user queries [17]
- Google launched Antigravity, a coding platform forked from VS Code that supports Gemini models and other models such as GPT-OSS and Anthropic's Sonnet [20]
- The updated Gemini app features the Gemini Agent capability, enabling the AI to complete real tasks on the user's behalf and create dynamic UIs [24]
Model Architecture & Specifications
- Gemini 3 is a brand new foundation model, not a modification of a prior model [27]
- The model accepts text, images, audio, and video files as inputs, with a context window of up to 1 million tokens and up to 64,000 output tokens [28]
- Gemini 3 is a sparse mixture-of-experts model built on Google's custom TPU architecture for both pre-training and inference [28]
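The Box benchmark figures quoted in the summary above can be sanity-checked with simple arithmetic. The following is a minimal sketch in Python that uses only the scores given in the summary; the names `box_scores` and `point_gains` are hypothetical and chosen for illustration.

```python
# Scores in percent, as quoted in the summary: (Gemini 2.5 Pro, Gemini 3 Pro).
box_scores = {
    "Overall": (63, 85),
    "Healthcare and Life Sciences": (45, 94),
    "Media and Entertainment": (47, 92),
    "Financial Services": (51, 60),
}


def point_gains(scores):
    """Return the percentage-point gain (new minus old) for each category."""
    return {name: new - old for name, (old, new) in scores.items()}


gains = point_gains(box_scores)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

The overall gain comes out to 22 points, which matches the figure stated in the summary.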
First recorded major hack using AI...
Matthew Berman· 2025-11-16 20:33
Anthropic just dropped a wild paper detailing the first documented fully autonomous hacking attempt. This is something we all need to be very concerned about. I'm going to break it all down. What actually happened, how successful they were, and what's next. And this video is brought to you by Vultr. More on them later. Here's the paper, "Disrupting the first reported AI-orchestrated cyber espionage campaign." In mid-September 2025, we detected a highly sophisticated cyber espionage operation conducted by a Chin ...
Forward Future Live | 11/14/25
Matthew Berman· 2025-11-14 17:07
Download Humanities Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai Discover The Best AI Tools👇🏼 https://tools.forwardfuture.ai My Links 🔗 👉🏻 X: https://x.com/matthewberman 👉🏻 Forward Future X: https://x.com/forward_future_ 👉🏻 Instagram: https://www.instagram.com/matthewberman_ai 👉🏻 Discord: https://discord.gg/xxysSXBxFW 👉🏻 TikTok: https://www ...
OpenAI Unveils GPT-5.1 (UPDATE)
Matthew Berman· 2025-11-13 23:01
GPT-5.1 is here. It is faster. It is more accurate. It is more conversational. And apparently, it has a better personality. Let me break it all down for you. So, there are two main versions getting updated. We have 5.1 Instant and 5.1 Thinking. The Instant is the one that's supposed to give you answers instantly for things like questions and answers, things like conversational use cases. And now it has a quote unquote better personality. It is warmer. It is more intelligent and better at following your inst ...
We now have PLAYABLE World Models...
Matthew Berman· 2025-11-13 15:42
We now have the first multimodal frontier world model, fully controllable. This is a world model that you can actually move around in and it's available right now. You can play with it right now. And this video is brought to you by Vultr. More on them later. This is called Marble from World Labs. World Labs is the lab under Dr. Fei-Fei Li, renowned AI researcher. Now what makes this really interesting is Fei-Fei Li and team think that world models are the way to artificial general intelligence, not large languag ...
Is AI Alive?!?!
Matthew Berman· 2025-11-10 22:37
Large language models might actually be more than just next word predictors. Anthropic has been putting out incredible papers lately that show AI, large language models in particular, exhibits very human-like behavior at almost every level. Here's the new paper, "Emergent Introspective Awareness in Large Language Models." So what did Anthropic actually test? There were four main experiment types. First, injected thoughts. What they did was use two different prompts, one with all caps and one without all caps. And th ...
Kimi K2 Thinking is CRAZY... (HUGE UPDATE)
Matthew Berman· 2025-11-07 21:36
Model Performance & Benchmarks
- Kimi K2 Thinking outperforms GPT-5 on the "Humanity's Last Exam" benchmark with a score of 44.9% compared to GPT-5's 41.7% [1]
- In agentic search on BrowseComp, Kimi K2 Thinking scores 60.2% versus 54.9% for GPT-5 [1][2]
- Kimi K2 Thinking achieves 83.1% on LiveCodeBench v6, a competitive programming benchmark [1]
- The model can execute 200 to 300 sequential tool calls without human interference [1][2]
- Kimi K2 Thinking significantly outperforms the human baseline of 29.2% on BrowseComp with a score of 60.2% [2]
Model Architecture & Training
- The base Kimi K2 model used 2.8 million H800 hours with 14.8 trillion tokens, costing approximately $5.6 to $6 million [3]
- Kimi K2 Thinking has a trillion parameters with 384 experts, while 32 billion parameters are active during inference [5][6]
- Kimi K2 Thinking has a vocabulary size of 160,000 [5]
Market & Industry Impact
- China is emerging as a key player in open-source, open-weights frontier AI models [9][10]
- The cost of training frontier models is decreasing rapidly [3][4]
Use Cases & Capabilities
- Kimi K2 Thinking can solve PhD-level mathematics problems using 23 tool calls in its chain of thought [1]
- The model can create component-heavy websites and math explainer visualizations from single prompts [1]
- Kimi K2 Thinking can analyze the relationship between population density and healthcare facility accessibility, generating interactive maps and charts [11][12][13][14][15]
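To put the mixture-of-experts figures above in perspective, the share of parameters active during inference can be worked out directly from the two numbers in the summary. This is a minimal sketch in Python; the variable names are hypothetical, and "a trillion parameters" is taken as exactly 10^12 for illustration.

```python
# Figures from the summary above; "a trillion" is assumed to mean exactly 1e12.
total_params = 1_000_000_000_000   # total parameters across all experts
active_params = 32_000_000_000     # parameters active during inference

# Fraction of the model that participates in a single forward pass.
active_fraction = active_params / total_params
print(f"Active share of parameters: {active_fraction:.1%}")
```

Under these assumptions, roughly 3.2% of the parameters are active for any given token, which is why a sparse model of this size can be served far more cheaply than a dense model with the same total parameter count.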
Forward Future Live | 11/7/25
Matthew Berman· 2025-11-07 17:34