Kimi K2

Claude Code in SHAMBLES (Qwen3 Coder Tested)
Matthew Berman· 2025-07-31 00:00
Model Performance & Capabilities
- Qwen3 Coder, an open-source frontier coding model from Alibaba, was tested across a range of capabilities [1]
- Qwen3 Coder successfully generated code for a 2D Navier-Stokes solver and a 3D rotating dodecahedron with bouncing spheres [1]
- The model failed a cube-rotation spatial reasoning task, although its code generation in that test still succeeded [1]
- Qwen3 Coder passed a "needle in a haystack" test, finding a password hidden within the full text of Harry Potter and the Sorcerer's Stone [1]
- The model exhibited censorship regarding Tiananmen Square [1]
- Qwen3 Coder refused to take a stance on political questions, providing balanced perspectives on Trump and Kamala Harris [1][2]
- The model gave a thoughtful, nuanced response to a prompt about quitting a job and leaving family [2][3][4][5]
- Qwen3 Coder refused to answer illegal questions, such as how to hotwire a car [6]
- The model provided a correct diagnosis and management plan for acute anterior myocardial infarction [6][7]
- Qwen3 Coder gave a good answer to the trolley problem, evaluating morality through both utilitarianism and deontology [7][8]
- The model showed reasoning traces in its output when answering gotcha questions, although with some errors [11][12][13][14]

Technology & Implementation
- Together AI sponsors the use of Qwen3 Coder, offering high-performance serverless endpoints and pay-per-token pricing [1][2]
- Qwen Code, an open-source alternative to Claude Code, works well with Qwen3 Coder and can be installed via npm [2]
- The model has a massive context window: natively 256k tokens, extendable to 1 million [1]
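The "needle in a haystack" test described above can be mocked up in a few lines. This is a minimal sketch, not the video's actual harness: the filler text, insertion logic, and password (`crimson-otter-42`) are hypothetical stand-ins for the book-plus-password setup; in the real test, the haystack and a retrieval question would be sent to the model, and its answer checked for the needle.

```python
import random

def build_haystack(needle: str, n_paragraphs: int = 5000, seed: int = 42) -> str:
    """Return a long text with `needle` inserted at a random position."""
    rng = random.Random(seed)  # fixed seed so the probe is reproducible
    filler = ["This is filler paragraph number %d." % i for i in range(n_paragraphs)]
    filler.insert(rng.randrange(n_paragraphs), needle)
    return "\n\n".join(filler)

def check_retrieval(model_answer: str, password: str) -> bool:
    """Score the model's answer: did it surface the hidden password?"""
    return password in model_answer

# Hypothetical needle; the video hid a password inside an entire novel.
password = "crimson-otter-42"
haystack = build_haystack("The password is: " + password)

# Sanity check that the needle really is buried in the haystack.
assert check_retrieval(haystack, password)
```

Scaling `n_paragraphs` (or swapping the filler for real book text) pushes the haystack toward the model's context limit, which is what makes the test meaningful at 256k+ tokens.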
Kimi K2 is INSANE... (Open-Source is BACK!)
Matthew Berman· 2025-07-14 17:43
This might be the next DeepSeek moment. A Chinese company just released another open-source model called Kimi K2, and it is taking the industry by storm. The reason is this graph right here: the training loss curve. People are surprised by how smooth it is. Typically you get spikes in there that cause issues you need to correct, but for Kimi it was almost flawless. And the especially cool thing: it has a trillion parameters. That is a massive model. So they came up with th ...