Large Language Model
X @Tesla Owners Silicon Valley
Tesla Owners Silicon Valley· 2025-12-07 01:21
RT Tesla Owners Silicon Valley (@teslaownersSV): Grok Rankings Update, December 6
Grok 4.1 Fast (The Overall Volume Leader): this is xAI's agentic and tool-calling model, currently dominating the leaderboard by total tokens.
- #1 overall position on the OpenRouter leaderboard (leading with 1.48 trillion tokens)
- #1 on τ²-Bench Telecom (agentic tool-use benchmark)
- #1 on the Berkeley Function Calling benchmark
- #2 in tool calls (rapidly climbing, indicating strong agent adoption)
- #2 in multilingual usage (behind Grok Code Fast 1)
Gr ...
What happens when you ask AI to bluntly review this year's NeurIPS 2025 best papers? | Jinqiu AI Lab
锦秋集· 2025-12-05 03:43
Core Insights
- The article examines the evaluation of AI models in the context of the NeurIPS 2025 conference, focusing on how AI can assess research papers through a blind-review process [2][10]

Group 1: Evaluation Methodology
- The evaluation involved several AI models, including GPT5, Claude 4.5, and others, conducting blind reviews of selected NeurIPS award-winning papers [7][8]
- Three complementary assessment scenarios were designed: full-paper review, abstract-only review, and adversarial review, to test the models' sensitivity to different framings [9][10]

Group 2: AI Review Outcomes
- In the full-paper review, "Gated Attention for Large Language Models" received high scores, with GPT5 rating it a Best Paper [13][16]
- "1000 Layer Networks for Self-Supervised RL" also received favorable evaluations; GPT5 gave it a score of 8.3 and recommended it for a poster presentation [21][43]
- "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" was rated highly by multiple models, with Minimax even suggesting it as a Best Paper [28][46]

Group 3: Summary of Findings
- The AI models generally agreed on the quality of the papers, with most scoring above 8 for technical correctness and significance [30][32]
- In adversarial reviews, however, the same papers drew significant criticism, lower scores, and rejection recommendations, showing that the models' verdicts shift with the review framing [55][57]
- The evaluations revealed a divergence between human and AI assessments, particularly in the adversarial setting, where AI reviewers were more critical [55][60]
xbench榜单更新!DeepSeek V3.2追平GPT-5.1|xbench月报
红杉汇· 2025-12-05 00:06
Core Insights
- The latest xbench-ScienceQA leaderboard has been released, showcasing new models from six companies, with Gemini 3 Pro achieving state-of-the-art (SOTA) performance and DeepSeek V3.2 matching GPT-5.1 in score while offering high cost-effectiveness [1][2][6]
- xbench will introduce two new benchmarks to evaluate models' agent instruction-following capabilities and multimodal understanding [1]

Model Performance Summary
- **Gemini 3 Pro**: Scored 71.6, up from Gemini 2.5 Pro's 59.4, with a BoN of 85. Average response time is 48.62 seconds; answering 500 questions costs approximately $3 [3][6]
- **DeepSeek V3.2**: Scored 62.6, matching GPT-5.1, with a BoN of 81. The cost for 500 questions is only $2 for the Speciale version and $1.3 for the Thinking version [6]
- **Claude Opus 4.5**: Scored 55.2 with a fast average response time of 13 seconds, an improvement over its predecessor [6]
- **Kimi K2 Thinking**: Scored 51.8 with a BoN of 76, a slight improvement [6]

New Model Developments
- **DeepSeek V3.2**: Introduces a sparse-attention mechanism to improve long-context performance while reducing computational complexity, plus a scalable reinforcement learning framework to strengthen reasoning and instruction following [10][12]
- **Gemini 3**: A new multimodal model from Google DeepMind, excelling in reasoning depth and multimodal understanding and reaching a top score of 1501 Elo on LMArena [13]
- **Nano Banana Pro**: A new image-generation model that combines advanced reasoning with real-time knowledge, enabling complex image synthesis [14]
- **Claude Opus 4.5**: Anthropic's flagship model, strong in code generation and human-computer interaction, with high performance on real-world software engineering tasks [15][16]
- **GPT-5.1**: An important iteration from OpenAI that improves conversational fluency and complex-task reasoning and introduces adaptive reasoning mechanisms [17]
- **Tongyi DeepResearch**: Designed for deep research tasks, combining mid-training and post-training frameworks to enhance agent capabilities, and achieving competitive performance with a smaller model [19]
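The item above names sparse attention only in passing, as a way to cut the cost of long contexts. As a rough, generic illustration of the idea (not DeepSeek V3.2's actual design; the function name and top-k scheme here are assumptions for the sketch), each query can be restricted to its k highest-scoring keys instead of attending to all of them:

```python
import math

def sparse_attention(scores, values, k):
    """Illustrative top-k sparse attention, not DeepSeek's mechanism.
    For each query row, keep only the k largest attention scores,
    softmax over just those, and mix the corresponding values.
    scores: list of rows (one per query), values: one float per key."""
    out = []
    for row in scores:
        # Indices of the k highest-scoring keys for this query.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        # Softmax over the surviving scores only.
        exps = [math.exp(row[i]) for i in top]
        z = sum(exps)
        out.append(sum(w / z * values[i] for w, i in zip(exps, top)))
    return out

# With k=1 and one dominant score, the output is exactly that key's value.
print(sparse_attention([[10.0, 0.0, 0.0]], [1.0, 2.0, 3.0], k=1))  # → [1.0]
```

The compute saving comes from the softmax and value mix running over k keys rather than the full context length; real implementations select the keys with learned or structured patterns rather than exact sorting.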
X @外汇交易员
外汇交易员· 2025-12-03 03:50
The Information: OpenAI is developing a brand-new AI large language model, codenamed "Garlic", to counter the competitive threat from Google's Gemini 3, with a launch expected early next year. Reportedly, OpenAI Chief Research Officer Mark Chen introduced the new model to his team last week, and it is said to have outperformed Google's Gemini 3.0 and Anthropic's Opus 4.5 in tests on programming and logical-reasoning tasks. "Garlic" follows a different technical route from the previously disclosed "Shallotpeat": its core breakthrough lies in pre-training optimization. Through improved algorithmic architecture, it packs into a smaller model the knowledge density that previously required much larger models, significantly cutting training cost and time. This path directly targets Google's claimed "pre-training leap" and fixes key flaws present in GPT-4.5. "Garlic" still needs to complete safety evaluations, domain-specific fine-tuning, and other follow-up steps, and is expected to ship in early 2026, possibly as GPT-5.2 or GPT-5.5. ...
Anthropic-Style Context Editing… Now for Every LLM in LangChainJS!
LangChain· 2025-12-02 14:00
Hi there, this is Christian from LangChain. In my last video, we looked at how summarization middleware keeps your agent's memory compact by rewriting the entire conversation history. But what if the problem isn't the conversation, it's the tools? Because here's the truth: modern agents don't just talk. They call tools over and over again. And those tool results can absolutely explode your context window. Unlike user messages, tool outputs can be huge. I mean, we're talking about 20 pages of search results, a m ...
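The transcript's point, that oversized tool results, not chat turns, blow up the context window, can be sketched framework-free. This is a minimal illustration of the idea only, not the LangChainJS middleware API; the message shape, function name, and `max_chars` cutoff are all assumptions:

```python
def clip_tool_outputs(messages, max_chars=500):
    """Return a copy of the message history where oversized tool results
    are truncated, leaving user and assistant turns untouched.
    Each message is a dict with 'role' and 'content' string keys."""
    clipped = []
    for msg in messages:
        if msg["role"] == "tool" and len(msg["content"]) > max_chars:
            # Keep a prefix of the tool output and mark the cut.
            msg = {**msg, "content": msg["content"][:max_chars] + " …[truncated]"}
        clipped.append(msg)
    return clipped

history = [
    {"role": "user", "content": "Find recent LLM news."},
    {"role": "tool", "content": "search result " * 200},  # a huge tool dump
]
compact = clip_tool_outputs(history, max_chars=100)
```

Real context-editing middleware is smarter (it can summarize, drop stale results, or rewrite history), but the budget-per-tool-result idea is the same.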
Kyivstar, Ministry of Digital Transformation of Ukraine Select Google's Gemma as Base Model for Training National LLM
Globenewswire· 2025-12-01 10:00
KYIV, Ukraine, Dec. 01, 2025 (GLOBE NEWSWIRE) -- Kyivstar (Nasdaq: KYIV; KYIVW), Ukraine’s leading digital operator, and the WINWIN AI Center of Excellence under the Ministry of Digital Transformation of Ukraine have selected Google’s Gemma as the base model for training the large language model (LLM). Gemma, Google’s next-generation open AI model, has been proven effective in both international and domestic projects. Kyivstar is the Ukrainian Government’s strategic partner and operational lead for developi ...
X @外汇交易员
外汇交易员· 2025-12-01 05:21
Bloomberg: Silicon Valley-based PaleBlueDot AI is seeking a loan of roughly $300 million to help its clients purchase Nvidia's advanced chips in Japan. According to people familiar with the matter, the chips would be used in a Tokyo data center whose end user would be Xiaohongshu. A PaleBlueDot AI spokesperson responded that "Bloomberg's information does not match the facts." 外汇交易员 (@myfxtrader): FT: Chinese tech giants such as Alibaba and ByteDance are sidestepping US restrictions on Nvidia chips by training their latest large language models in Southeast Asian data centers. The Biden-era AI Diffusion Rule was repealed by Trump earlier this year, so Chinese companies' use, via leasing agreements, of overseas data centers owned and operated by non-Chinese entities complies with US export controls. https://t.co/ZzfV8JfEee ...
Alphabet is the best 'mag 7' stock to own for the next year, says Deepwater's Gene Munster
Youtube· 2025-11-24 21:09
Core Viewpoint
- Alphabet is expected to be the best stock among the "Magnificent Seven" over the next year, owing to its strong search performance and advances in generative AI [1][2]

Group 1: Performance and Growth
- Alphabet accelerated its search revenue growth, beating market expectations by 300 basis points in the September quarter, a positive sign for the company [2]
- The company is capturing renewed interest in information retrieval, which is translating into increased search revenue [2]

Group 2: Competitive Position
- Alphabet's Gemini project demonstrates its capability to compete with OpenAI in the large language model space, which has revitalized investor confidence in the company's competitive culture [3]
- Only about 20% of Google users currently use chatbots daily, leaving Alphabet significant room to grow chatbot usage [4]

Group 3: Market Valuation
- Alphabet currently trades at 28 times next-12-month earnings, in line with its "Magnificent Six" peers [5]
- The valuation multiple has returned to historical averages, suggesting potential for further earnings growth [6][8]

Group 4: Brand and User Habit
- Google has 2.5 billion daily search users versus roughly 500 million daily ChatGPT users, underscoring its entrenched market position [7]
- Despite being perceived as an older brand, Google's habitual usage among consumers presents a growth opportunity, especially as it integrates AI features into its search platform [9][10]
Amazon Operates 900 Data Centers as It Tries to Meet AI Demand
Bloomberg Television· 2025-11-24 15:44
Google and Nvidia as well. We're joined by Mandeep Singh, global head of technology research. Let's kick it off with Amazon. I think it's fair to say we're both interested in Alphabet today. So they have about a zillion data centers. I'm actually not moved by that. I could have guessed it, and I don't think it really matters, right? How many actual places they have, does it? MANDEEP: Right now, when you compare them to a pure-play data center play on the GPU side, they do about 32 to 35 data centers. It ...
Microsoft and Nvidia Just Signed a Multibillion-Dollar Deal With Anthropic. Here's What It Really Means for Investors.
Yahoo Finance· 2025-11-24 14:30
Core Insights
- Microsoft and Nvidia have partnered with Anthropic, a large language model maker: Anthropic will purchase $30 billion of compute capacity from Microsoft's Azure and commit to an additional 1 gigawatt of compute capacity, valued at around $50 billion [1]
- Microsoft will invest up to $10 billion in Anthropic, which now carries a valuation of $350 billion, up sharply from its previous $183 billion [2]
- Anthropic already has investments from Amazon and Alphabet, with Amazon providing cloud computing and training support through its $11 billion AI data center [3][4]

Microsoft and Nvidia's Investment
- The investments in Anthropic signal a strategic move to diversify AI partnerships beyond OpenAI, with Microsoft aiming to grow Azure revenue and give customers alternative AI models [5][6]
- The collaboration with Anthropic strengthens both Microsoft's and Nvidia's positions in the AI market, as Anthropic also partners with various chip and cloud computing companies [7]