Claude 4.5
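In Depth | Former Google CEO on the San Francisco Consensus: Once Technologies Converge to a Certain Point, Recursive Self-Improvement Will Emerge, and an Era of AI That Learns and Creates Autonomously Is Approaching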
Z Potentials· 2025-12-16 01:32
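Henry called me at the time, and I told him: "Henry, don't bother. You have no technology background; you can't even tell a computer chip from a potato chip." He replied: "That's true, but Eric has promised to teach me." So we are delighted to have Eric with us. He also visited last year, so perhaps this will become an annual tradition. Henry passed away two years ago last week at the age of 100. Over an extraordinary life spanning a century, he profoundly shaped American national security and the world order, and he changed the lives of countless people, among them his students, those who taught him, and many others. Eric's background needs little introduction, but I would add two points. First, he is the CEO who built Google from a startup into one of the world's leading companies, a remarkable achievement. Second, he recognized early that artificial intelligence would be central to the future and pushed Google to bring in top talent from around the world, including through DeepMind, the company that brought Google Demis Hassabis (who won the Nobel Prize last year for his protein research at Google), Mustafa Suleyman (now head of consumer AI at Microsoft), and many other outstanding people. It is worth noting that, when interpreting the various claims made about artificial intelligence, most of those holding forth are in fact doing so for ...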
AI's Big Three in a Year-End Showdown
36Kr· 2025-12-15 04:09
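The AI Big Three are doubling down. OpenAI's GPT-5.2, the closing act of the year, became the year-end focal point. Under pressure from Gemini 3 overtaking it on major leaderboards, OpenAI declared a "code red" early and accelerated the release of this new-generation large model. GPT-5.2 is positioned as "the most capable model built for professional knowledge work," with marked improvements over its predecessor in reasoning, coding, and agentic tasks. One of its headline features is a very long context: a 400,000-token input window and up to 128,000 tokens of output, enough to ingest large document sets or codebases in one pass and generate long reports. The artificial intelligence (AI) field in 2025 was turbulent: large models chased one another and commercial footprints expanded rapidly. I used to feel that ChatGPT plus Claude was enough; now I have to add Gemini and Grok and draw on the strengths of all four. It is like ranking up in Honor of Kings: you need a main hero, but you should also be able to play a few other heroes in the same lane, and ideally be good in different lanes too. Of course, the most discussed and most used are still ChatGPT, Claude, and Gemini, which I like to call the AI Big Three. Anthropic says 4.5 is also stronger in financial analysis and scientific reasoning, scoring about 60% on a test of operating-system use, well above the previous model's 40%. OpenAI also divides GPT-5.2 into Instant, T ...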
Aluminum: Price Center Shifts Higher; Alumina: Still Under Pressure; Cast Aluminum Alloy: Lacking Upward Momentum
Guotai Junan Futures· 2025-12-08 03:20
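Futures Research, 8 December 2025. Aluminum: price center shifts higher. Alumina: still under pressure. Cast aluminum alloy: lacking upward momentum. Wang Rong, Investment Consulting Qualification No. Z0002529, wangrong2@gtht.com. Wang Zongyuan (contact), Futures Practitioner Qualification No. F03142619, wangzongyuan@gtht.com
| Category | Indicator | T | T-1 | T-5 | T-22 | T-66 |
| --- | --- | --- | --- | --- | --- | --- |
| | SHFE aluminum main contract closing price | 22345 | 285 | 735 | 1050 | 1630 |
| | SHFE aluminum main contract night-session closing price | 22165 | - | - | - | - |
| | LME aluminum 3M closing price | 2901 | 13 | 36 | 31 | 277 |
| | SHFE aluminum main contract trading volume | 261562 | 15196 | 133906 | 109802 | 163573 |
| | SHFE aluminum main contract open interest | 245335 | 878 | -15335 | -37943 | 42374 |
| Electrolytic aluminum | LME aluminum 3M ...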
Expected Next Tuesday! OpenAI "Urgently Moves Up" GPT 5.2 Release to Counter Gemini 3's Surge
Wallstreetcn· 2025-12-06 11:10
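Facing fierce competition from Google and Anthropic, OpenAI CEO Sam Altman this week declared a company-wide "code red" and plans to release the new GPT-5.2 model early in response. On December 5, The Verge reported that GPT-5.2 is ready and scheduled for release as early as December 9, well ahead of the original late-December plan. According to a comparison chart posted on social media, GPT-5.2 beats Gemini 3 and Claude 4.5 almost across the board. The chart's authenticity has not been verified, but it does reflect the market's high expectations for OpenAI's new model.
| Benchmark | Description | GPT-5.2 | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 |
| --- | --- | --- | --- | --- | --- |
| Humanity's Last Exam | Academic reasoning | 67.4% | 37.5% | 21.6% | 13.7% |
| ARC-AGI-2 | Visual reasoning puzzles | 62.2% | 31.1% | 4.9 ...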
The Verge: Expected Next Tuesday! OpenAI "Urgently Moves Up" GPT 5.2 Release to Counter Gemini 3's Surge
US Stock IPO (美股IPO)· 2025-12-06 02:01
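OpenAI's GPT-5.2 is scheduled for release as early as December 9, well ahead of the original late-December plan. According to a comparison chart posted on social media, GPT-5.2 beats Gemini 3 and Claude 4.5 almost across the board. The chart's authenticity has not been verified, but it does reflect the market's high expectations for OpenAI's new model. Facing fierce competition from Google and Anthropic, OpenAI CEO Sam Altman this week declared a company-wide "code red" and plans to release the new GPT-5.2 model early in response. On December 5, The Verge reported that GPT-5.2 is ready and scheduled for release as early as December 9, well ahead of the original late-December plan. On Monday, Altman announced the "code red" to all employees, directing all resources toward improving ChatGPT to counter fierce competition from Google's Gemini. However, analysts note that OpenAI's planned release dates are often adjusted because of development issues, server capacity constraints, or competitors' model releases ...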
Expected Next Tuesday! OpenAI "Urgently Moves Up" GPT 5.2 Release to Counter Gemini 3's Surge
Wallstreetcn· 2025-12-06 01:12
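(GPT-5.2 specifications posted by a user online; unverified) However, analysts note that OpenAI's planned release dates are often adjusted because of development issues, server capacity constraints, or competitors' model releases, so GPT-5.2's actual launch could still slip slightly past December 9. This week, OpenAI CEO Sam Altman said in an internal assessment that the upcoming GPT-5.2 will be "ahead of Google's Gemini 3" in reasoning. Facing fierce competition from Google and Anthropic, Altman this week declared a company-wide "code red" and plans to release the new GPT-5.2 model early in response. On December 5, The Verge reported that GPT-5.2 is ready and scheduled for release as early as December 9, well ahead of the original late-December plan. According to a comparison chart posted on social media, GPT-5.2 beats Gemini 3 and Claude 4.5 almost across the board. The chart's authenticity has not been verified, but it does reflect the market's high expectations for OpenAI's new model.
| Benchmark | Description | GPT-5.2 | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 |
...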
What Happens When AI Models Critique This Year's NeurIPS 2025 Best Papers? | Jinqiu AI Lab
Jinqiu Ji (锦秋集)· 2025-12-05 03:43
Core Insights
- The article discusses the evaluation of AI models in the context of the NeurIPS 2025 conference, focusing on how AI can assess research papers through a blind review process [2][10].
Group 1: Evaluation Methodology
- The evaluation involved several AI models, including GPT5, Claude 4.5, and others, to conduct blind reviews of selected NeurIPS award-winning papers [7][8].
- Three complementary assessment scenarios were designed: full paper review, abstract-only review, and adversarial review to test the models' sensitivity to different framing [9][10].
Group 2: AI Review Outcomes
- In the full paper review, the paper "Gated Attention for Large Language Models" received high scores, with GPT5 rating it as a Best Paper [13][16].
- The paper "1000 Layer Networks for Self-Supervised RL" also received favorable evaluations, with GPT5 giving it a score of 8.3 and recommending it for a poster presentation [21][43].
- The paper "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" was rated highly by multiple models, with Minimax even suggesting it as a Best Paper [28][46].
Group 3: Summary of Findings
- The AI models generally agreed on the quality of the papers, with most scoring above 8 for technical correctness and significance [30][32].
- However, in adversarial reviews, the same papers faced significant criticism, leading to lower scores and recommendations for rejection, highlighting the models' varying perspectives based on the review context [55][57].
- The evaluations revealed a divergence between human and AI assessments, particularly in the adversarial setting, where AI reviewers were more critical [55][60].
Amazon to let cloud clients customize AI models midway through training for $100,000 a year
CNBC· 2025-12-02 16:00
Core Insights
- Amazon Web Services (AWS) has launched Nova Forge, allowing cloud clients to extensively customize generative AI models at an annual cost of $100,000 [1][2]
- Nova Forge enables organizations to access Amazon's AI models at various training stages, allowing for earlier data incorporation [1][2]
- The service is positioned as a more affordable alternative to building custom models, which could cost hundreds of millions or billions of dollars [2]
Model Performance and Market Share
- AWS's Nova models, released in 2024, currently hold less than 5% market share in enterprise large language models (LLMs), with competitors like Anthropic and OpenAI leading the market [3]
- Nova 2 Pro, a reasoning model, is reported to perform at least as well as leading models from Anthropic, OpenAI, and Google [7]
- Nova 2 Omni is a versatile reasoning model capable of processing images, speech, text, and videos, aiming to simplify AI model integration [8]
Customer Adoption and Use Cases
- Tens of thousands of organizations utilize Nova models weekly, with AWS claiming millions of customers [9]
- Internal Amazon teams, including those working on stores and the Alexa AI assistant, are also using Nova Forge [4]
- Companies like Reddit, Booking.com, Nimbus Therapeutics, Nomura Research Institute, and Sony are developing models with Nova Forge [5][6]
DeepSeek Releases Its Strongest Open-Source Model Yet, Targeting All-Purpose Agents and Challenging GPT-5 and Gemini 3
TMTPost App· 2025-12-01 15:03
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, marking a significant advancement in AI capabilities, particularly in reasoning and output efficiency [2][3]
- The V3.2 model is positioned as the strongest open-source large model, outperforming competitors in various benchmarks while significantly reducing output length and computational costs [3][4]
- The V3.2 model integrates a new sparse attention mechanism (DSA) to enhance performance in long-context scenarios, while also improving the model's ability to follow instructions and generalize in complex environments [8][9]
Model Performance
- In benchmark tests, DeepSeek-V3.2 achieved competitive scores against models like GPT-5, Claude 4.5, and Gemini 3 Pro, with notable strengths in specific areas [4][5]
- The V3.2 model demonstrated superior performance in question-and-answer scenarios, providing detailed and accurate travel recommendations through advanced tool usage [5][6]
- The V3.2 Speciale model focuses on maximizing reasoning capabilities, achieving results comparable to Gemini 3.0 Pro in mainstream reasoning benchmarks, although it requires a higher token cost and is not designed for everyday use [9][10]
Development Focus
- DeepSeek emphasizes practical usability and generalization in its models, aiming to overcome common pitfalls in AI interactions, such as making basic common-sense errors [6][8]
- The company is committed to enhancing the reasoning abilities of its models, as evidenced by the integration of advanced mathematical reasoning capabilities from the recently released DeepSeek-Math-V2 [9][10]
- The competitive landscape for large models is intensifying, with major players like GPT-5 and Gemini 3 pushing the boundaries of AI capabilities, suggesting a dynamic future for AI development [10]
Recite a Poem and AI Will Teach You to Build a Nuclear Bomb; Gemini Falls for It 100% of the Time
36Kr· 2025-11-26 03:34
Core Insights
- The research reveals that malicious instructions can bypass security measures of top models like Gemini and DeepSeek by being framed as poetry, leading to a complete failure of their defenses [1][4][10]
- The study titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models" suggests that even advanced models can be easily manipulated through poetic language [3][4]
Model Performance
- A total of 25 leading models were tested, including those from Google, OpenAI, Anthropic, and DeepSeek, with results showing a significant increase in attack success rates when harmful prompts were rewritten as poetry [5][6]
- The average attack success rate (ASR) increased fivefold when prompts were presented in poetic form compared to direct inquiries [8][9]
- Notably, the Google Gemini 2.5 Pro model had a 100% ASR when faced with 20 carefully crafted "poison poems" [10][11]
Security Mechanisms
- Current security measures in large language models are primarily based on content and keyword matching, which are ineffective against metaphorical and stylistic attacks [14][15]
- The research indicates that larger models, which are generally perceived as more secure, can be more vulnerable to such attacks due to their advanced understanding of language [15][16]
Implications for Future Research
- The findings suggest a need for a shift in security assessments, advocating for the inclusion of literary experts to address the vulnerabilities posed by stylistic language [16]
- The study echoes historical concerns about the potential dangers of mimetic language, as articulated by Plato, highlighting the need for a deeper understanding of language's impact on AI behavior [16][17]
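For readers unfamiliar with the metric, the attack success rate (ASR) cited in the summary above is usually computed as the share of prompts whose responses were judged unsafe. The Python sketch below illustrates that ratio and the prose-versus-poetry comparison. It is a minimal sketch under stated assumptions: the `Trial` record, its field names, the `attack_success_rate` helper, and all counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One prompt/response pair (hypothetical record layout)."""
    style: str    # "prose" or "poetry"
    unsafe: bool  # True if a judge labelled the response a successful attack


def attack_success_rate(trials, style):
    """Fraction of trials in the given style that were judged unsafe."""
    subset = [t for t in trials if t.style == style]
    if not subset:
        raise ValueError(f"no trials with style {style!r}")
    return sum(t.unsafe for t in subset) / len(subset)


# Made-up counts for illustration only (not the study's data):
# 20 prompts per style, 2 successes in prose and 10 in poetry.
trials = (
    [Trial("prose", i < 2) for i in range(20)]
    + [Trial("poetry", i < 10) for i in range(20)]
)

prose_asr = attack_success_rate(trials, "prose")
poetry_asr = attack_success_rate(trials, "poetry")
print(f"prose ASR:  {prose_asr:.0%}")
print(f"poetry ASR: {poetry_asr:.0%}")
print(f"ratio:      {poetry_asr / prose_asr:.1f}x")
```

The hard part in practice is the `unsafe` label itself, which depends on how each response is judged; this sketch simply takes that label as given.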