AI Cracks the 500-Year-Old 'Mystery Script' of the Nuremberg Chronicle in Just One Hour, Unveiling a Stunning Hidden Truth
36Kr · 2026-01-05 08:40
These marginal notes are badly damaged and riddled with medieval Latin abbreviations; for centuries, scholars could not explain what they meant. Yet Gemini 3.0 Pro delivered a clear reading in just one hour! It determined that the annotation is no random mark or decorative doodle, but concerns comparisons and calculations across different biblical chronology systems. A blockbuster opening for 2026: in a single hour, Gemini 3.0 Pro brute-forced a 'mystery script' in the Nuremberg Chronicle that had gone unread for 533 years. From $0.02 in compute cost to a precise reconstruction of a 16th-century scholar's calendar reconciliation ledger, AI is hitting traditional archaeology with a dimensionality-reducing strike from an all-seeing vantage point! Just now, the 500-year-old 'mystery script' of the Nuremberg Chronicle was cracked by AI. A single handwritten annotation in it had stumped human historians for a full 500 years. In other words, the AI precisely captured the logic of an author from centuries ago and completed the entire chain of reasoning. The researchers wrote excitedly on their blog: it is hard to believe that LMM visual understanding has advanced to the point where Gemini 3 Pro can read a 500-year-old handwritten, abbreviated shorthand marginal note, go back and read the full printed page, use the page's content to infer and clarify the shorthand's meaning, and then integrate all of it into a final interpretation that fits every piece of the puzzle, all without any form of human assistance! Ancient books from our forebears, deciphered by AI! The Nuremberg Chronicle is a work published ...
My 2025 Annual AI Roundup: The Road Ahead Is Clear.
数字生命卡兹克 · 2025-12-31 01:21
2025 is finally almost over. So how could I skip the annual tradition of ranking my AI of the year? Just as the games industry has TGA and every big creator's personal game-of-the-year picks, and film circles have bloggers' purely subjective 'Golden Chrysanthemum' and 'Golden Broom' awards, I want to run down my own AI of the year, covering both large models and AI applications. Let me put on some armor first: this roundup is purely personal and entirely subjective, drawn only from my own way of reading the market. Different opinions, and anything I missed, are welcome for discussion, just don't flame me. There are no paid placements in here, so feel free to dig in. Without further ado, let's begin. I. Large Model of the Year / 1. Writing Model of the Year. Winner: GPT-5.2 Thinking. Didn't see that coming, did you? I'm actually giving writing model of the year to GPT-5.2 Thinking. In my mind this version has surpassed Gemini 2.5 Pro and GPT-4.5, and at year's end it became my model of the year. Writing is often overlooked by benchmarks and media coverage; lots of people ask how to make AI output less 'AI-flavored' and better written. In my own testing, GPT-5.2 Thinking achieves excellent instruction following, style transfer, and world knowledge, with fewer hallucinations than Gemini 2.5 Pro ...
Reading the Application Narrative from Google's AI Stack
2025-12-29 01:04
Reading the Application Narrative from Google's AI Stack, 2025-12-28. Abstract: Gemini 3.0 Pro surpasses GPT-5.1 and Claude 4.5 in multimodal data processing, supporting text, images, audio, video, and PDF, with a context window of 1 million tokens; it improves complex-problem reasoning and dynamic resource adjustment for a more human-like slow-thinking effect. Google's video generation models, the Veo series, and in particular Veo 3.0 and Veo 3.1, deliver synchronized audio-video generation in a single pass and precise control over the video, priced close to Sora R, at resolutions of 720p to 1080p; their technical architecture keeps them ahead in video generation and meets users' needs for fine-grained control. As of October 2025, Gemini's per-user session length surpassed ChatGPT's, reaching 7.2 minutes, thanks to its embedding in the Google app ecosystem; downloads grew rapidly, from 15 million at mid-year to 66 million, effectively expanding the user base and improving stickiness. NanoBanana is an image generation model built on Gemini: it taps real-world knowledge by calling Google Search and runs a thinking step to understand the prompt's context, offering high resolution, accurate text rendering, fine-grained image control, and real-time access to knowledge. Q&A: Google's newly released core flagship model, Gemini 3.0 Pr ...
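For readers who want to try the multimodal pipeline the summary describes (text plus PDF into one Gemini call), here is a minimal sketch using the google-genai Python SDK. The model id `gemini-3.0-pro` follows the article's naming and is an assumption, as are the API key and file path; check the live model list before relying on it.

```python
# Minimal sketch: send a PDF plus a text question to a Gemini model
# via the google-genai Python SDK. Model id is assumed from the article.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")   # placeholder key

with open("report.pdf", "rb") as f:             # placeholder document
    pdf_part = types.Part.from_bytes(data=f.read(),
                                     mime_type="application/pdf")

response = client.models.generate_content(
    model="gemini-3.0-pro",                     # assumed id, per the article
    contents=[pdf_part, "Summarize this document's key claims."],
)
print(response.text)
```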
Luo Fuli's Debut at the Helm of Xiaomi's LLM Effort: Setting the Direction for the Next Generation, with the Newly Open-Sourced MiMo-V2 Sweeping into the Top Tier of Agent Models
AI前线 · 2025-12-17 08:00
Author | Muzi. MiMo-V2-Flash is the new-generation MiMo model Xiaomi released in the small hours of today, and it has been open-sourced. This morning, at the 2025 Xiaomi Human x Car x Home Full-Ecosystem Partner Conference, Luo Fuli made her first public appearance, with the title of Head of the Xiaomi MiMo large model. She also gave a talk unpacking the new MiMo-V2-Flash model and the story of the team behind it. A quick recap of what MiMo is: it is Xiaomi's in-house large language model (LLM) family. MiMo-V2-Flash is not only on par with DeepSeek-V3.2 on general benchmarks; it also pushes cost-effectiveness hard and is friendly to Agent scenarios. 'This is only the second step on our AGI roadmap.' MiMo-V2-Flash adopts the currently popular but engineering-intensive MoE (mixture-of-experts) architecture, with a total parameter count of 309 billion, yet on each inference pass only about 15 billion parameters are actually 'lit up' (a toy sketch of this routing idea follows below). It also ships with multi-token prediction (MTP), built for high-speed inference and Agent workflows. Unlike the many models that chase 'the more parameters, the better,' MiMo-V2-Flash's design goal amounts to: 'run fast, run long, and stay affordable under high-frequency calls.' However, in ...
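Xiaomi has not published MiMo-V2-Flash's routing details beyond the summary above, so here is a toy numpy sketch of the general top-k MoE idea it describes: all experts live in memory (the "total" parameters), but each token's forward pass multiplies through only the k experts the router selects (the "active" parameters). All names and shapes are illustrative, not Xiaomi's implementation.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy top-k mixture-of-experts layer for a single token.

    x:        (d,) hidden state of one token
    experts:  list of (d, d) matrices -- together, the "total" parameters
    gate_w:   (n_experts, d) router weights
    Only the k selected expert matrices are multiplied, which is how a
    309B-parameter MoE can activate only ~15B parameters per token.
    """
    logits = gate_w @ x                           # one routing score per expert
    topk = np.argsort(logits)[-k:]                # pick the k best experts
    w = np.exp(logits[topk] - logits[topk].max())
    w /= w.sum()                                  # softmax over selected experts
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
out = moe_forward(rng.normal(size=d),
                  [rng.normal(size=(d, d)) for _ in range(n_experts)],
                  rng.normal(size=(n_experts, d)))
```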
Wall Street's Sleepless Night: Gemini 3 Tops Finance's "Hardest Exam". Has AI Smashed the "Golden Rice Bowl"?
36Kr · 2025-12-15 11:58
The CFA exam, a bastion of human knowledge hailed as a "golden career passport," has quietly fallen. The latest reasoning models not only passed the CFA Level III exam with ease, they posted near-perfect scores. One minute of AI, ten years of human effort! Overnight, AI reasoning models have swept the Chartered Financial Analyst (CFA) exams.

| Model Producer | Ranking | Model | Level I | Level II | Level III |
| --- | --- | --- | --- | --- | --- |
| OpenAI | 9 | ChatGPT [11] | Fail/Fail | Fail/Fail | Fail |
| | 8 | GPT-4 [2] | Pass/Pass | Pass/Pass | Fail |
| | 7 | GPT-4o [7] | Pass/Pass | Pass/Pass | Pass |
| | 3 | GPT-5 [10] | Pass | Pass | Pass |
| Google | 2 | Gemini 2.5 Pro [5] | Pass | Pass | Pass |
| | 1 | Gemini 3.0 Pro | Pass | Pas ...
Hands-On with GPT-5.2: Prices Soar, Capability Inches Up. What Is Its Answer to Gemini?
36Kr · 2025-12-12 10:03
This time it finally didn't botch the chart. GPT-5.2 in fact updates three models: GPT-5.2 Instant, Thinking, and Pro. If you are used to Gemini 3.0 Pro, where every exchange goes through a thinking pass, then on GPT-5.2 Thinking/Pro you will notice ChatGPT thinking more slowly, taking longer than before. That matches what most early-access users have been sharing on social media: GPT-5.2 improves on 5.1 in every respect, and GPT-5.2 Pro is well suited to professional reasoning work and tasks that need long runs, but the wait for results grows longer. GPT-5.2, billed as a Gemini-beater, officially launched in the early hours of today and is rolling out to all users. I canceled ChatGPT Plus just last month and moved to Gemini; do I need to go back for GPT-5.2? The real user impressions below, plus APPSO's hands-on test, may give you an answer. For example, one user shared that on the prompt "draw me a chart of HLE test results," GPT-5.2 Pro ground away for a full 24 minutes before producing the chart. Image source: https://x.com/emollick/ ...
The Evolution of DeepSeek from V3 to V3.2, All in One Article
机器之心 · 2025-12-08 04:27
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have generated significant interest and discussion in the AI community [2][5][11]
- The evolution from DeepSeek V3 to V3.2 includes various architectural improvements and the introduction of new mechanisms aimed at enhancing performance and efficiency [10][131]
Release Timeline
- The initial release of DeepSeek V3 in December 2024 did not create immediate buzz, but the subsequent release of the DeepSeek R1 model changed the landscape, making DeepSeek a popular alternative to proprietary models from companies like OpenAI and Google [11][14]
- The release of DeepSeek V3.2-Exp in September 2025 was seen as a preparatory step for the V3.2 model, focusing on establishing the necessary infrastructure for deployment [17][49]
Model Types
- DeepSeek V3 was initially launched as a base model, while DeepSeek R1 was developed as a specialized reasoning model through additional training [19][20]
- The trend in the industry has seen a shift from hybrid reasoning models to specialized models, with DeepSeek seemingly reversing this trend by moving from specialized (R1) to hybrid models (V3.1 and V3.2) [25]
Evolution from V3 to V3.1
- DeepSeek V3 utilized a mixture-of-experts (MoE) model and multi-head latent attention (MLA) to optimize memory usage during inference [29][30]
- DeepSeek R1 focused on Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities, particularly in tasks requiring symbolic verification [37][38]
Sparse Attention Mechanism
- DeepSeek V3.2-Exp introduced a non-standard sparse attention mechanism, which significantly improved efficiency in training and inference, especially in long-context scenarios [49][68]
- The DeepSeek Sparse Attention (DSA) mechanism allows the model to selectively focus on relevant past tokens, reducing computational complexity from quadratic to linear [68]
Self-Verification and Self-Correction
- DeepSeekMath V2, released shortly before V3.2, introduced self-verification and self-correction techniques to improve the accuracy of mathematical reasoning tasks [71][72]
- The self-verification process involves a verifier model that assesses the quality of generated proofs, while self-correction allows the model to iteratively improve its outputs based on feedback (see the sketch after this summary) [78][92]
DeepSeek V3.2 Architecture
- DeepSeek V3.2 maintains the architecture of its predecessor, V3.2-Exp, while incorporating improvements aimed at enhancing overall model performance across various tasks, including mathematics and coding [107][110]
- The model's training process has been refined to include updates to the RLVR framework, integrating new reward mechanisms for different task types [115][116]
Performance Benchmarks
- DeepSeek V3.2 has shown competitive performance in various benchmarks, achieving notable results in mathematical tasks and outperforming several proprietary models [127]
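The generate-verify-revise loop described under Self-Verification and Self-Correction can be pictured with a short sketch. The `generate`, `verify`, and `revise` callables and the acceptance threshold are hypothetical stand-ins; the summary does not specify DeepSeekMath V2's actual interfaces.

```python
def prove_with_verification(problem, generate, verify, revise,
                            threshold=0.9, max_rounds=4):
    """Sketch of a generate -> verify -> revise self-correction loop.

    generate(problem) drafts a proof; verify(problem, proof) returns a
    (score, feedback) pair from a verifier model; revise(problem, proof,
    feedback) rewrites the proof using that feedback. All three are
    hypothetical model-call stand-ins, not DeepSeek's API.
    """
    proof = generate(problem)
    for _ in range(max_rounds):
        score, feedback = verify(problem, proof)   # verifier grades the draft
        if score >= threshold:                     # accept once it passes
            break
        proof = revise(problem, proof, feedback)   # self-correction step
    return proof
```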
The Gap Between Open- and Closed-Source Models Is Widening: The Harsh Truth Revealed by DeepSeek's Paper
36Kr · 2025-12-06 00:03
Core Insights
- DeepSeek's V3.2 technical report indicates that the performance gap between open-source models and closed-source models is not narrowing but rather widening, based on extensive empirical data [1][2].
Performance Comparison
- In benchmark tests, DeepSeek V3.2 scored 85.0 on MMLU-Pro, while GPT-5 scored 87.5 and Gemini 3.0 Pro achieved 90.1. On the GPQA Diamond test, the scores were 82.4 for DeepSeek, 85.7 for GPT-5, and 91.9 for Gemini 3.0 Pro [2][3].
- The most significant gap was observed on the HLE test, where DeepSeek V3.2 scored 25.1, compared to GPT-5's 26.3 and Gemini 3.0 Pro's 37.7, indicating a substantial performance disparity [3][4].
Structural Issues Identified
- The report identifies three structural issues limiting the capabilities of open-source models in complex tasks:
1. **Architectural Limitations**: Open-source models rely on traditional vanilla attention mechanisms, which are inefficient for long sequences, hindering scalability and effective post-training [6].
2. **Resource Investment Gap**: The post-training budget for DeepSeek V3.2 exceeds 10% of its pre-training costs, while most open-source models allocate less than 1%, leading to significant performance differences [7].
3. **AI Agent Capability Lag**: Open-source models show inferior generalization and instruction-following abilities in real-world applications, as evidenced by lower scores in key agent evaluation benchmarks [8].
DeepSeek's Strategic Innovations
- DeepSeek has implemented fundamental technical innovations across three core dimensions:
1. **Architectural Changes**: Introduction of the DSA (DeepSeek Sparse Attention) mechanism, which reduces computational complexity from O(L²) to O(L×k), significantly lowering inference costs while maintaining performance (a toy sketch of the top-k idea follows after this summary) [10].
2. **Increased Resource Allocation**: DeepSeek has made an unprecedented decision to allocate substantial resources to post-training, training expert models in six key domains on a total of 943.7 billion tokens during the post-training phase [12].
3. **Enhanced Agent Capabilities**: Development of a systematic task-synthesis pipeline, creating over 1,800 diverse environments and 85,000 complex prompts, which has improved performance on agent-related tests [13].
Conclusion
- DeepSeek V3.2 demonstrates a viable path for open-source AI to compete with closed-source models through innovative architecture and strategic resource allocation, suggesting that technological innovation may be the key to survival in the competitive AI landscape [14].
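To make the O(L²) to O(L×k) claim concrete, here is a toy numpy sketch of top-k sparse attention for one query: instead of softmaxing over all L past tokens, only the k highest-scoring keys are kept. This illustrates the general idea only, not DeepSeek's actual DSA implementation, which selects tokens with a learned indexer.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=64):
    """Toy top-k sparse attention for a single query vector.

    Dense attention scores the query against all L keys, giving O(L^2)
    work across a length-L sequence; keeping only the k best-scoring
    keys per query shrinks the attended set to O(L*k).
    """
    scores = K @ q                               # (L,) query-key similarities
    idx = np.argpartition(scores, -k)[-k:]       # k most relevant positions
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                                 # softmax over k tokens only
    return w @ V[idx]                            # mix of just k value rows

rng = np.random.default_rng(0)
L, d = 4096, 64
out = topk_sparse_attention(rng.normal(size=d),
                            rng.normal(size=(L, d)),
                            rng.normal(size=(L, d)))
```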
"The Grand Bargain": A Belated Act of Strategic Self-Rescue for American AI
Guancha.cn · 2025-12-04 00:28
Core Argument - The article discusses Ben Buchanan's "grand bargain" proposal for AI development in the U.S., suggesting a strategic agreement between the tech industry and the government to integrate AI into national defense while ensuring it aligns with democratic values. However, the feasibility of this proposal is questioned due to the contrasting realities of U.S. chip policies and the rapid advancements in AI technology from China [1][5][20]. Group 1: AI Development and Policy Discrepancies - Buchanan's proposal emphasizes the need for a strategic partnership between the tech industry and the government, where the former gains access to energy infrastructure and talent, while the latter integrates AI into national defense [1][20]. - The success of DeepSeek's V3.2 model, which rivals top closed-source models despite U.S. chip export restrictions, challenges the effectiveness of both the "dependency" and "containment" strategies towards China [5][6][20]. - The article highlights a fundamental divide in U.S. AI strategy regarding chip policies towards China, with one faction advocating for strategic dependency and the other for strict containment [2][4][5]. Group 2: Energy Infrastructure Challenges - Buchanan's vision includes a significant increase in energy demand for the AI industry, projecting an additional 50 billion watts by 2028, equivalent to Argentina's total electricity consumption [7][8]. - The U.S. faces a political deadlock in energy policy, hindering the construction of new power plants, which is critical for supporting the growing AI sector [7][8]. - The contrasting ability of China to rapidly mobilize resources for infrastructure development poses a competitive disadvantage for the U.S. [9][10]. Group 3: Talent Acquisition and Immigration Policies - The article notes that 70% of top AI researchers in the U.S. are foreign-born, yet current immigration policies are tightening, which could lead to a significant decline in international student enrollment [10][11]. - There is an inherent conflict between the desire to attract international talent and the increasing national security measures that restrict access to sensitive AI research [11][13]. - The political climate in the U.S. is increasingly hostile towards immigration, complicating efforts to maintain a robust talent pipeline for the AI industry [10][11]. Group 4: Government-Industry Relations - The proposed "grand bargain" faces deep-seated mistrust between the tech industry and the government, with tech companies wary of regulatory overreach and the government skeptical of the industry's commitment to national security [14][15]. - Historical examples of tech companies resisting military collaborations illustrate the challenges in establishing a cooperative relationship [14][15]. - The article argues that achieving consensus on key issues such as AI control and economic benefits distribution is unlikely, complicating the realization of the "grand bargain" [15][19]. Group 5: Long-term Strategic Challenges - The rapid pace of AI development contrasts sharply with the slow-moving U.S. political system, which struggles to implement necessary reforms in a timely manner [16][17]. - The instability of political cycles in the U.S. raises concerns about the sustainability of long-term strategies, as policies can be easily overturned by subsequent administrations [17][20]. 
- The article concludes that the "grand bargain" is based on overly optimistic assumptions about achieving consensus and cooperation in a fragmented political landscape [20].
After Consolidation, the Market Is Poised to Extend Its Rebound; Tech Still Has Room to Recover
Investment Focus
- The market's initial correction is largely complete, and a rebound is expected, particularly in the technology sector [1][8]
- U.S. equities rebounded, improving global risk appetite, with Hong Kong and A-shares stabilizing and moving higher, led by the tech sector [1][8]
External Liquidity
- External liquidity continues to improve, with U.S. September retail sales slowing and PPI below expectations, supporting a December rate-cut probability rising to 86% [2][9]
- The U.S. Dollar Index fell below 100 to 99.4, while the RMB strengthened to 7.07, with other assets like Bitcoin and gold also experiencing mild rebounds [2][9]
Technology Sector
- The market focused on developments related to Google, with positive feedback on products like Gemini 3.0 Pro and Nano Banana, and Meta considering significant TPU purchases from Google [3][10]
- Google shares rose 7%, while NVIDIA experienced a slight decline of about 1% amid improving liquidity [3][10]
- The tech sectors in Hong Kong and A-shares saw notable rebounds but have not fully recovered from previous losses, with the ChiNext Index recovering most of its declines [3][10]
Real Estate Sector
- Vanke faced declines in bond prices due to concerns over large-scale maturities, but some bonds rebounded, indicating no extreme liquidation [4][11]
- The equity market's sensitivity to negative news about Vanke is diminishing, with AH-listed property stocks ending the week higher, suggesting stabilization in the real estate sector [4][11]
- The CSRC announced a pilot program for commercial property REITs, aimed at enhancing liquidity in the commercial real estate sector [4][11]
Market Activity and Fund Flows
- The market experienced a low-volume rebound, with A-share turnover falling to RMB 1.6 trillion and Hong Kong turnover dropping to HKD 150 billion [5][12]
- The short-selling ratio in Hong Kong decreased to 12%, below historical averages, while A-share equity ETFs recorded net outflows of RMB 12.4 billion [5][12]
- Margin financing turned to a net inflow of RMB 10.6 billion, indicating a re-leveraging phase in the market [5][12]
Summary
- The market stabilized and rebounded, remaining in a low-volume consolidation phase, with expectations for continued rebound trends [6][13]
- The technology sector is expected to continue its rebound, with a focus on the Hang Seng Tech Index and STAR-board names linked to domestic compute infrastructure [6][13]
- The real estate sector is stabilizing, with recommendations to watch leading developers with solid fundamentals for rebound opportunities [6][13]
Investment Focus - The market's initial correction is largely complete, and a rebound is expected, particularly in the technology sector [1][8] - U.S. equities rebounded, improving global risk appetite, with Hong Kong and A-shares stabilizing and moving higher, led by the tech sector [1][8] External Liquidity - External liquidity continues to improve, with U.S. September retail sales slowing and PPI below expectations, supporting a December rate cut probability rising to 86% [2][9] - The U.S. Dollar Index fell below 100 to 99.4, while the RMB strengthened to 7.07, with other assets like Bitcoin and gold also experiencing mild rebounds [2][9] Technology Sector - The market focused on developments related to Google, with positive feedback on products like Gemini 3.0 Pro and Nano Banana, and Meta considering significant TPU purchases from Google [3][10] - Google shares rose 7%, while NVIDIA experienced a slight decline of about 1% amid improving liquidity [3][10] - The tech sector in Hong Kong and A-shares saw notable rebounds but have not fully recovered from previous losses, with the ChiNext Index recovering most of its declines [3][10] Real Estate Sector - Vanke faced declines in bond prices due to concerns over large-scale maturities, but some bonds rebounded, indicating no extreme liquidation [4][11] - The sensitivity of the equity market to negative news regarding Vanke is diminishing, with the AH-listed property stocks ending the week higher, suggesting stabilization in the real estate sector [4][11] - The CSRC announced a pilot program for commercial property REITs, aimed at enhancing liquidity in the commercial real estate sector [4][11] Market Activity and Fund Flows - The market experienced a low-volume rebound, with A-share turnover falling to RMB 1.6 trillion and Hong Kong turnover dropping to HKD 150 billion [5][12] - The short-selling ratio in Hong Kong decreased to 12%, below historical averages, while A-share equity ETFs recorded net outflows of RMB 12.4 billion [5][12] - Margin financing turned to a net inflow of RMB 10.6 billion, indicating a re-leveraging phase in the market [5][12] Summary - The market stabilized and rebounded, remaining in a low-volume consolidation phase, with expectations for continued rebound trends [6][13] - The technology sector is expected to continue its rebound, with a focus on the Hang Seng Tech Index and STAR-board names linked to domestic compute infrastructure [6][13] - The real estate sector is stabilizing, with recommendations to watch leading developers with solid fundamentals for rebound opportunities [6][13]