Mathematical Reasoning
World's First IMO Gold-Medal AI Is Born! Google Gemini Shatters the Math Olympiad Myth, Stunning Judges with a Score of 35
猿大侠· 2025-07-22 03:33
Core Viewpoint
- Google DeepMind has officially announced that its model, Gemini Deep Think, won a gold medal at the International Mathematical Olympiad (IMO), solving five problems in 4.5 hours for a score of 35 out of 42, a significant milestone for AI in mathematics [3][4][22].

Group 1: Achievement and Recognition
- Gemini Deep Think is the first AI system to receive official gold-medal recognition from the IMO committee [6][7].
- The IMO, held annually since 1959, is a prestigious competition that tests the mathematical abilities of students worldwide [11][12].
- Participants must solve six complex mathematical problems within a limited time, and only roughly the top 8% receive gold medals [13][16].

Group 2: Technical Aspects of Gemini Deep Think
- Unlike previous systems, Gemini Deep Think operates entirely in natural language, generating rigorous mathematical proofs directly from the problem statements [29][32].
- The model employs advanced reasoning techniques, including parallel thinking, which lets it explore multiple solution paths simultaneously (a conceptual sketch follows this article) [33][38].
- Training combined reinforcement learning with access to a curated database of high-quality mathematical solutions [37][126].

Group 3: Problem-Solving Process
- The model's approach was methodical, breaking complex proofs into clear, understandable steps [24][41].
- On the first problem, for example, it reduced the problem to a specific case and established a lemma to prove the core condition [44][50].
- Gemini's solutions were noted for their clarity and precision, earning praise from the IMO judges [24][87].

Group 4: Future Implications
- Google plans to make the advanced version of Gemini Deep Think available to select mathematicians and Google AI Ultra subscribers [39].
- The result highlights AI's potential to contribute meaningfully to mathematics by combining natural-language fluency with rigorous reasoning [102][105].
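DeepMind has not published implementation details for "parallel thinking," so the following is only a conceptual sketch of the general best-of-N idea it gestures at: sample several independent solution paths concurrently and keep the one a selector prefers. Every name here (`generate_candidate_proof`, `score_candidate`, `parallel_think`) is a hypothetical stub, not part of any Gemini API.

```python
import concurrent.futures
import random

# Hypothetical stubs: neither function is part of any published Gemini
# interface; they exist only to make the control flow concrete.
def generate_candidate_proof(problem: str, seed: int) -> str:
    """Sample one complete proof attempt (stubbed with a placeholder string)."""
    return f"proof attempt #{seed} for: {problem}"

def score_candidate(proof: str) -> float:
    """Estimate proof quality; a real system might use a learned verifier."""
    return random.random()  # placeholder score

def parallel_think(problem: str, n_paths: int = 8) -> str:
    """Explore several solution paths concurrently, keep the best-scoring one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(
            lambda seed: generate_candidate_proof(problem, seed),
            range(n_paths)))
    return max(candidates, key=score_candidate)

if __name__ == "__main__":
    print(parallel_think("IMO 2025 Problem 1"))
```

The hard part in practice is the selection step; simple self-consistency voting or a learned verifier are common choices, but the article does not say which mechanism Gemini Deep Think uses.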
DeepSeek Open-Sources a New Model with a Major Boost in Mathematical Reasoning
Hu Xiu· 2025-05-01 00:48
Core Insights
- DeepSeek has officially released DeepSeek-Prover-V2 on Hugging Face, continuing its open-source momentum with two versions launched [1][4].
- The training core of DeepSeek-Prover-V2 combines recursion with reinforcement learning, enabling the model to decompose complex theorems into sub-goals and reasoning paths [3][8].

Model Specifications
- DeepSeek-Prover-V2-7B is based on the previous V1.5 model and supports a maximum context length of 32K [4].
- DeepSeek-Prover-V2-671B is built on DeepSeek-V3-Base and shows the strongest reasoning performance [4].

Training Process
- Training proceeds in two phases. The first uses an "expert iteration" method in rapid mode: proof attempts that pass verification are fed back to refine the model [5].
- The second phase trains more complex logical reasoning, incorporating mathematical knowledge from DeepSeek-V3 together with formal data [6].

Reinforcement Learning
- The GRPO (Group Relative Policy Optimization) reinforcement learning algorithm is introduced to strengthen reasoning, letting the model learn autonomously to select the best solution from multiple candidates [8].
- For each theorem the system generates 32 candidate proofs and retains only those the Lean verification system confirms as correct (a minimal sketch of this generate-and-verify loop follows this article) [9].

Model Distillation
- After building the powerful 671B model, the team distilled its capabilities into the smaller 7B model, giving users near-equivalent mathematical reasoning on resource-limited devices [10][11].

Reasoning Modes
- The rapid (non-CoT) mode optimizes for speed, emitting concise Lean code answers without showing the thought process, suitable for processing large numbers of problems (see the toy Lean example below) [12].
- The logical (CoT) mode spells out each step of the reasoning process, ensuring clarity and transparency [12].

Performance Evaluation
- In the final assessment, DeepSeek-Prover-V2-671B achieved an 88.9% pass rate on the MiniF2F test and solved 49 problems from the PutnamBench dataset [17].

New Dataset
- DeepSeek also introduced ProverBench, a new formal mathematics dataset of 325 problems spanning domains such as number theory, algebra, and calculus [18][19].

Comparison and Trends
- The comparison reveals a clear trend: the performance gap between large language models' "informal" and "formal" mathematical reasoning is narrowing [21].
- Advances in model architecture and training strategy now let models produce rigorous, machine-verifiable proofs [22].

Future Directions
- DeepSeek-Prover-V2 signals a shift in focus from merely generating content to generating structured logic, which may bear on the foundational structure of general artificial intelligence [33][34].
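The article describes a generate-and-verify data loop but gives no code, so the sketch below only shows its general shape under stated assumptions: `sample_proof` is a hypothetical stand-in for the prover model, and invoking a `lean` executable on a temporary file is one plausible way to run the verifier, not DeepSeek's published pipeline.

```python
import subprocess
import tempfile
from pathlib import Path

N_CANDIDATES = 32  # the article says 32 proof attempts are sampled per theorem

def sample_proof(theorem_statement: str, index: int) -> str:
    """Hypothetical stand-in for sampling one Lean proof from the prover model."""
    return f"-- candidate {index} for: {theorem_statement}\ntheorem t : True := trivial"

def lean_verifies(proof_source: str) -> bool:
    """Check a candidate by compiling it with Lean.

    Assumes a `lean` executable on PATH; the real verification setup is not
    described in the article beyond "verified by the Lean system".
    """
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_source)
        path = Path(f.name)
    try:
        result = subprocess.run(["lean", str(path)],
                                capture_output=True, timeout=60)
        return result.returncode == 0  # type-checks iff exit code is zero
    finally:
        path.unlink()

def verified_proofs(theorem_statement: str) -> list[str]:
    """Keep only candidates that type-check; these become training data."""
    candidates = [sample_proof(theorem_statement, i) for i in range(N_CANDIDATES)]
    return [p for p in candidates if lean_verifies(p)]
```

The key property of this loop is that the reward signal is binary and exact: a proof either compiles or it does not, which is what makes rejection-sampled formal proofs safe to feed back into training.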
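For readers unfamiliar with Lean, a formal proof is a machine-checkable term rather than prose. A toy Lean 4 example (written for illustration, not taken from ProverBench) of the kind of concise output the rapid mode emits:

```lean
-- A statement plus a proof term; the Lean checker either accepts or rejects it.
-- Nat.add_comm is a lemma from the Lean 4 core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```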