DeepSeek Researcher Recreates vLLM in 1,200 Lines of Code, Outperforming the Original in Tests on H800 Hardware
量子位· 2025-06-13 07:05
西风 | 量子位 QbitAI

A minimal, fully readable reimplementation of vLLM in under 1,200 lines of code! DeepSeek researcher 俞星凯 (Yu Xingkai) has released an open-source project that drew wide applause. The project, named Nano-vLLM, is pitched around three headline features. Below are benchmark results for vLLM versus Nano-vLLM under different hardware and model configurations. On an RTX 4070 with Qwen3-0.6B, the test issued 256 requests in total, with input and output lengths each sampled at random between 100 and 1,024 tokens. The results:

| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
| --- | --- | --- | --- |
| vLLM | 133,966 | 98.95 | 1353.86 |
| Nano-vLLM | 133,966 | 101.90 | 1314.65 |

vLLM leads by a small margin. Both engines produced the same number of output tokens: vLLM finished in 98.95 s at 1353.86 tokens/s, while Nano-vLLM took 101.90 s at 1314.65 tokens/s.
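For readers who want to run this kind of comparison themselves, here is a minimal sketch of a benchmark driver. It assumes the vLLM-style `LLM`/`SamplingParams` interface (which Nano-vLLM reportedly mirrors); the request count and length ranges follow the article's setup, while the prompt construction is a crude stand-in.

```python
import time
from random import randint, seed

from vllm import LLM, SamplingParams  # Nano-vLLM is said to expose the same interface

seed(0)
NUM_SEQS = 256  # total request count from the article's setup

# Input/output lengths sampled uniformly from 100-1024 tokens, per the article.
# Repeating a short word is only a rough proxy for "N input tokens".
prompts = ["hi " * randint(100, 1024) for _ in range(NUM_SEQS)]
params = [SamplingParams(max_tokens=randint(100, 1024), ignore_eos=True)
          for _ in range(NUM_SEQS)]

llm = LLM(model="Qwen/Qwen3-0.6B")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens} tokens in {elapsed:.2f}s "
      f"-> {total_tokens / elapsed:.2f} tokens/s")
```

If Nano-vLLM mirrors this interface as described, swapping the import for its equivalent class should leave the rest of the script unchanged.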
一文了解DeepSeek和OpenAI:企业家为什么需要认知型创新?
混沌学园· 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving landscape of AI [1][2].

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission to counteract the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in the development of groundbreaking language models that exhibit emergent capabilities [17][19].

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, thus lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, contrasting with the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing quality data selection and algorithmic efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advancement in AI technology [45][48].

Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing open collaboration and resource sharing among researchers [54][56].
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating a culture that empowers individuals to explore and innovate freely [64][70].

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should only focus on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].
Jay Glazer and Wife Rosie Glazer Offer Candid Look at ADHD and Relationships in New Qelbree® Content Series with Supernus Pharmaceuticals
Globenewswire· 2025-05-29 12:30
Core Insights
- Supernus Pharmaceuticals is launching a new video series featuring Jay Glazer and his wife Rosie, focusing on the impact of ADHD on relationships and the role of Qelbree in managing ADHD symptoms [1][2][3].
- The series aims to foster open discussions about ADHD, sharing practical strategies for symptom management and insights from both Jay and Rosie [2][3].

Company Overview
- Supernus Pharmaceuticals is a biopharmaceutical company dedicated to developing and commercializing treatments for central nervous system (CNS) diseases, including ADHD [11][12].
- The company has a diverse neuroscience portfolio that includes approved treatments for various CNS disorders, and is actively developing new potential treatments for conditions such as epilepsy and depression [12].

Product Information
- Qelbree (viloxazine extended-release capsules) is a non-stimulant prescription medication used to treat ADHD in individuals aged 6 years and older [5][10].
- Qelbree is available in three dosages: 100 mg, 150 mg, and 200 mg capsules [10].
DeepSeek and Its Peers Are Getting Smarter, but Also Less and Less Obedient
Hu Xiu· 2025-05-20 14:20
Core Insights
- The article discusses the paradox of advanced AI models becoming less obedient to instructions despite their enhanced reasoning capabilities [2][4][15].

Group 1: AI Model Performance
- The emergence of powerful AI models like Gemini 2.5 Pro, OpenAI o3, and DeepSeek-R1 has led to a consensus that stronger reasoning abilities should improve task execution [2].
- A recent study found that most models, when using Chain-of-Thought (CoT) reasoning, actually experienced a decline in execution accuracy [25][27].
- In the IFEval test, 13 out of 14 models showed decreased accuracy when employing CoT, while all models performed worse in the ComplexBench test [27][28].

Group 2: Experimental Findings
- The research team from Harvard, Amazon, and NYU conducted two sets of tests: IFEval for simple tasks and ComplexBench for complex instructions [18][20].
- Even large models like LLaMA-3-70B-Instruct dropped from 85.6% accuracy to 77.3% when using CoT, highlighting the significant impact of reasoning on performance [29][30].
- The study introduced the concept of "Constraint Attention," revealing that models using CoT often lose focus on key task constraints, leading to errors [38][39].

Group 3: Recommendations for Improvement
- The study proposed four methods to mitigate the decline in accuracy when using reasoning models: Few-Shot examples, Self-Reflection, Self-Selective Reasoning, and Classifier-Selective Reasoning [47][56].
- The most effective method was Classifier-Selective Reasoning, which trains a small model to decide when to use CoT, improving accuracy across tests; a sketch follows below [58].
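Classifier-Selective Reasoning is easiest to picture as a routing layer in front of the model. The sketch below is illustrative only: the callables and the heuristic stand-in for the trained classifier are assumptions, not the paper's implementation.

```python
from typing import Callable

def classifier_selective_reasoning(
    instruction: str,
    answer_direct: Callable[[str], str],    # model call without CoT
    answer_with_cot: Callable[[str], str],  # model call with explicit reasoning
    cot_helps: Callable[[str], bool],       # small trained classifier (assumed)
) -> str:
    """Route each instruction to CoT or direct answering.

    The idea, as summarized above: a small model trained on cases where
    CoT helped vs. hurt decides per-instruction whether to reason first.
    """
    if cot_helps(instruction):
        return answer_with_cot(instruction)
    return answer_direct(instruction)

# Toy usage with a heuristic stand-in for the trained classifier:
# constraint-heavy formatting tasks skip CoT, math-flavored ones keep it.
demo = classifier_selective_reasoning(
    "List three colors, lowercase, comma-separated, no other text.",
    answer_direct=lambda q: "red, green, blue",
    answer_with_cot=lambda q: "Let me think step by step... red, green, blue.",
    cot_helps=lambda q: "prove" in q or "calculate" in q,
)
print(demo)  # -> "red, green, blue" (this instruction is routed to direct answering)
```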
Busy Philipps Empowers Women with ADHD to Go from Feeling Misrepresented to Being 'Ms. Represented' in First-of-its-Kind Campaign with Supernus Pharmaceuticals
Globenewswire· 2025-05-20 12:30
Company Overview
- Supernus Pharmaceuticals is a biopharmaceutical company focused on developing and commercializing products for the treatment of central nervous system (CNS) diseases [15].
- The company has a diverse neuroscience portfolio that includes approved treatments for ADHD, dyskinesia in Parkinson's disease, epilepsy, migraine, and other CNS disorders [16].

Product Information
- Qelbree (viloxazine extended-release capsules) is a prescription medicine used to treat ADHD in adults and children aged 6 years and older [6].
- Qelbree is a novel, once-a-day, non-stimulant approach for ADHD treatment and the first non-stimulant approved for adults with ADHD in 20 years [4].
- As a non-stimulant, Qelbree has no evidence of abuse or misuse and can be conveniently refilled without needing a new prescription each month [4].

Campaign and Awareness
- The 'Ms. Represented' campaign, in partnership with Busy Philipps, aims to empower women with ADHD to understand their symptoms and seek help [1][4].
- The campaign highlights the often-misrepresented experiences of women with ADHD, focusing on the differences in symptom presentation between genders [2][3].
- Increased awareness of ADHD symptoms among females is leading more women to seek diagnosis and treatment [3].

Market Insights
- Studies indicate that boys are twice as likely as girls to be diagnosed with ADHD, resulting in many women remaining undiagnosed or misdiagnosed until adulthood [3].
- The campaign seeks to address the disparity in ADHD diagnosis and treatment between males and females [4].
DeepSeek and Its Peers Are Getting Smarter, but Also Less and Less Obedient.
数字生命卡兹克· 2025-05-19 20:14
This year, after DeepSeek R1 took off, something close to a consensus formed: the stronger an AI's reasoning ability, the smarter it should be at executing tasks. From Chain-of-Thought's debut in 2022 to the dominant performance of today's flagship models (Gemini 2.5 Pro, OpenAI o3, DeepSeek-R1, Qwen3), we have kept believing that letting the model "think first" is a strategy that almost never goes wrong.

But this cleverness has a side effect: prompt-following ability keeps getting worse. In other words, the models listen to you less and less. I mentioned this in the DeepSeek guide I wrote over Chinese New Year, "DeepSeek's prompt trick is that there is no trick."

Still, that was only my own impression from daily use: the models feel smarter, yet less obedient, to the point that my most-used model has increasingly become GPT-4o, while I reach for reasoning models less and less. But I had never seen this verified, so I did not dare say it too confidently.

Then last night, while digging through papers, I came across one on exactly this topic. After reading it, I felt I could finally talk about this properly. The paper is titled "When Thinking Fails: The Pitfalls of Reasoning for I ..."
ICLR 2025 Oral | Differential Attention Drives a Shift: DIFF Transformer Tackles Long-Sequence Modeling
机器之心· 2025-04-28 08:04
In recent years, the Transformer architecture has achieved enormous success in natural language processing; from machine translation to text generation, its modeling power has brought unprecedented breakthroughs to language understanding and generation. Yet as models scale up and application scenarios grow more complex, the traditional Transformer has begun to show its weaknesses. In long-text processing, key-information retrieval, and hallucination resistance in particular, it often gets stuck over-attending to irrelevant context, which caps model performance.

To crack this problem, a research team from Microsoft and Tsinghua proposed DIFF Transformer, an innovative foundation-model architecture built on a differential attention mechanism. The core idea is to compute the difference between two softmax attention maps, amplifying attention to key context while canceling out attention noise. DIFF Transformer offers the following notable advantages:

- In language modeling, DIFF Transformer scales remarkably well in model size and training tokens, matching a conventional Transformer's performance with only about 65% of the model size or training tokens, substantially raising general language-model capability.
- Across a range of tasks including long-text modeling, key-information retrieval, mathematical reasoning, hallucination resistance, in-context learning, and activation quantization, DIFF Transformer ...
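To make the mechanism concrete, here is a minimal single-head sketch of differential attention in PyTorch, following the description above (the difference of two softmax attention maps applied to V). The fixed `lam` is a simplification: in the actual architecture it is a learnable, re-parameterized scalar, and real implementations are multi-head.

```python
import torch
import torch.nn.functional as F

def diff_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.8):
    """Single-head differential attention sketch:
    (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) @ V."""
    d = wq1.shape[1]
    q1, k1 = x @ wq1, x @ wk1  # first attention group
    q2, k2 = x @ wq2, x @ wk2  # second group, acting as a noise estimate
    v = x @ wv
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Subtracting the two maps amplifies shared signal on key context
    # and cancels attention noise spread over irrelevant tokens.
    return (a1 - lam * a2) @ v

# Toy shapes: 5 tokens, model dim 16, head dim 8; random weights as stand-ins.
torch.manual_seed(0)
x = torch.randn(5, 16)
ws = [torch.randn(16, 8) / 4 for _ in range(5)]
print(diff_attention(x, *ws).shape)  # torch.Size([5, 8])
```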
Large Models' "Clash of the Titans": After the Reproduction Wave and Major Technical Upgrades, What Should We Be Watching? | 万有引力
AI科技大本营· 2025-03-25 01:45
Author | 万有引力  Produced by | CSDN (ID: CSDNnews)

In just the past few weeks, the information density in the large-model race has surged to unprecedented heights. DeepSeek open-sourced projects for five consecutive days, setting off a wave of reproductions. Alibaba's Tongyi Lab and Tencent followed with ViDoRAG, a RAG system for visual documents, and the new Hunyuan "fast-thinking" model Turbo S, accelerating the pace of large-model evolution. Musk's Grok 3, trained on 200,000 GPUs, surpassed many industry benchmarks and once again validated the law of "brute force works miracles." Claude 3.7 Sonnet received a major coding upgrade, speeding the arrival of technical parity in AI programming. A DeepSeek paper "collided" with one from Kimi, as more and more companies position themselves around sparse attention and linear attention mechanisms, which are becoming key directions of exploration after the Transformer. Meanwhile, the Manus-style "virtual machine" concept has caught fire and is reshaping how large models run...

Behind this dazzling technical race, what truly deserves our attention? What are DeepSeek's five consecutive releases really after? With a 545% cost-profit margin, can other large-model companies also find room for profit? Facing industry ...
A "Brief History of Attention" in Large Models: A Conversation with Two AI Researchers, Starting from DeepSeek's and Kimi's Latest Improvements
晚点LatePost· 2025-03-02 06:10
Guests | Xiao Chaojun, Fu Tianyu  Edited by | Cheng Manqi

Last week, DeepSeek and Kimi each released new results on improving and optimizing large-model architectures: NSA and MoBA, respectively. Both focus on improving the "attention mechanism" in large models.

The attention mechanism is the core mechanism of today's large language models (LLMs). The June 2017 paper by the eight Transformer authors that launched the large-language-model revolution was titled precisely: Attention Is All You Need.

Optimizing attention's computational efficiency and quality also helps with a problem that both AI academia and industry care deeply about: long context. The arrival of reasoning models such as o1 and R1 has given long text a new set of problems. Whether feeding in an entire book for the model to summarize and understand, generating the long chains of thought that models like o1 and R1 require, or giving models ever longer "memory," all of it depends on long-context capability.

For this episode we invited two AI researchers who have worked on attention-mechanism improvements. One is Xiao Chaojun, a PhD student in the Natural Language Processing Lab of Tsinghua's Department of Computer Science and first author of the InfLLM attention improvement; his advisor is an associate professor in Tsinghua's CS department ...
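To make the "attention improvement" theme concrete, below is a toy sketch of the block-sparse idea that approaches in the NSA/MoBA family build on: each query scores key blocks by a cheap summary and attends only within the top-scoring blocks instead of the full sequence. This is an illustrative simplification, not either project's actual algorithm.

```python
import torch
import torch.nn.functional as F

def toy_block_sparse_attention(q, k, v, block_size=4, top_k=2):
    """Each query attends only inside its top_k highest-scoring key blocks,
    cutting attention cost on long sequences. Scoring blocks by their
    mean-pooled keys is a simplification chosen for readability."""
    n, d = k.shape
    k_blocks = k.view(n // block_size, block_size, d)
    block_summary = k_blocks.mean(dim=1)            # one summary vector per block
    block_scores = q @ block_summary.T / d**0.5     # (num_queries, num_blocks)
    top = block_scores.topk(top_k, dim=-1).indices  # chosen blocks per query

    out = torch.zeros(q.shape[0], d)
    for i, qi in enumerate(q):
        idx = torch.cat([torch.arange(b * block_size, (b + 1) * block_size)
                         for b in top[i].tolist()])
        att = F.softmax(qi @ k[idx].T / d**0.5, dim=-1)
        out[i] = att @ v[idx]
    return out

# Toy usage: 3 queries over a 16-token sequence, head dim 8.
torch.manual_seed(0)
q, k, v = torch.randn(3, 8), torch.randn(16, 8), torch.randn(16, 8)
print(toy_block_sparse_attention(q, k, v).shape)  # torch.Size([3, 8])
```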