DeepSeek Researcher Recreates vLLM in 1,200 Lines of Code, Outperforming the Original in Tests on H800 Hardware
量子位· 2025-06-13 07:05
西风 | 量子位 QbitAI

A minimal, fully readable reimplementation of vLLM in under 1,200 lines of code! DeepSeek researcher 俞星凯 (Yu Xingkai) has released an open-source project that drew wide applause. The project, named Nano-vLLM, is pitched around three headline features. Below are benchmark results for vLLM versus Nano-vLLM under different hardware and model configurations. On an RTX 4070 with Qwen3-0.6B, the test issued 256 requests in total, with input and output lengths each sampled at random between 100 and 1,024 tokens. The results:

| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
| --- | --- | --- | --- |
| vLLM | 133,966 | 98.95 | 1353.86 |
| Nano-vLLM | 133,966 | 101.90 | 1314.65 |

vLLM leads by a small margin. Both engines produced the same number of output tokens: vLLM finished in 98.95 s at 1353.86 tokens/s, while Nano-vLLM took 101.90 s at 1314.65 tokens/s.
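For readers who want to run this kind of comparison themselves, here is a minimal sketch of a benchmark driver. It assumes the vLLM-style `LLM`/`SamplingParams` interface (which Nano-vLLM reportedly mirrors); the request count and length ranges follow the article's setup, while the prompt construction is a crude stand-in.

```python
import time
from random import randint, seed

from vllm import LLM, SamplingParams  # Nano-vLLM is said to expose the same interface

seed(0)
NUM_SEQS = 256  # total request count from the article's setup

# Input/output lengths sampled uniformly from 100-1024 tokens, per the article.
# Repeating a short word is only a rough proxy for "N input tokens".
prompts = ["hi " * randint(100, 1024) for _ in range(NUM_SEQS)]
params = [SamplingParams(max_tokens=randint(100, 1024), ignore_eos=True)
          for _ in range(NUM_SEQS)]

llm = LLM(model="Qwen/Qwen3-0.6B")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens} tokens in {elapsed:.2f}s "
      f"-> {total_tokens / elapsed:.2f} tokens/s")
```

If Nano-vLLM mirrors this interface as described, swapping the import for its equivalent class should leave the rest of the script unchanged.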
一文了解DeepSeek和OpenAI:企业家为什么需要认知型创新?
混沌学园· 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving landscape of AI [1][2].

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission to counteract the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in the development of groundbreaking language models that exhibit emergent capabilities [17][19].

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, thus lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, contrasting with the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing quality data selection and algorithmic efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advancement in AI technology [45][48].

Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing open collaboration and resource sharing among researchers [54][56].
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating a culture that empowers individuals to explore and innovate freely [64][70].

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should only focus on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].
Jay Glazer and Wife Rosie Glazer Offer Candid Look at ADHD and Relationships in New Qelbree® Content Series with Supernus Pharmaceuticals
Globenewswire· 2025-05-29 12:30
Core Insights
- Supernus Pharmaceuticals is launching a new video series featuring Jay Glazer and his wife Rosie, focusing on the impact of ADHD on relationships and the role of Qelbree in managing ADHD symptoms [1][2][3].
- The series aims to foster open discussions about ADHD, sharing practical strategies for symptom management and insights from both Jay and Rosie [2][3].

Company Overview
- Supernus Pharmaceuticals is a biopharmaceutical company dedicated to developing and commercializing treatments for central nervous system (CNS) diseases, including ADHD [11][12].
- The company has a diverse neuroscience portfolio that includes approved treatments for various CNS disorders, and is actively developing new potential treatments for conditions such as epilepsy and depression [12].

Product Information
- Qelbree (viloxazine extended-release capsules) is a non-stimulant prescription medication used to treat ADHD in individuals aged 6 years and older [5][10].
- Qelbree is available in three dosages: 100 mg, 150 mg, and 200 mg capsules [10].
DeepSeek and Its Peers Are Getting Smarter, but Also Less and Less Obedient
Hu Xiu· 2025-05-20 14:20
Core Insights
- The article discusses the paradox of advanced AI models becoming less obedient to instructions despite their enhanced reasoning capabilities [2][4][15].

Group 1: AI Model Performance
- The emergence of powerful AI models like Gemini 2.5 Pro, OpenAI o3, and DeepSeek-R1 has led to a consensus that stronger reasoning abilities should improve task execution [2].
- A recent study found that most models, when using Chain-of-Thought (CoT) reasoning, actually experienced a decline in execution accuracy [25][27].
- In the IFEval test, 13 out of 14 models showed decreased accuracy when employing CoT, while all models performed worse in the ComplexBench test [27][28].

Group 2: Experimental Findings
- The research team from Harvard, Amazon, and NYU conducted two sets of tests: IFEval for simple tasks and ComplexBench for complex instructions [18][20].
- Even large models like LLaMA-3-70B-Instruct dropped from 85.6% accuracy to 77.3% when using CoT, highlighting the significant impact of reasoning on performance [29][30].
- The study introduced the concept of "Constraint Attention," revealing that models using CoT often lose focus on key task constraints, leading to errors [38][39].

Group 3: Recommendations for Improvement
- The study proposed four methods to mitigate the decline in accuracy when using reasoning models: Few-Shot examples, Self-Reflection, Self-Selective Reasoning, and Classifier-Selective Reasoning [47][56].
- The most effective method was Classifier-Selective Reasoning, which trains a small model to decide when to use CoT, improving accuracy across tests; a sketch follows below [58].
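Classifier-Selective Reasoning is easiest to picture as a routing layer in front of the model. The sketch below is illustrative only: the callables and the heuristic stand-in for the trained classifier are assumptions, not the paper's implementation.

```python
from typing import Callable

def classifier_selective_reasoning(
    instruction: str,
    answer_direct: Callable[[str], str],    # model call without CoT
    answer_with_cot: Callable[[str], str],  # model call with explicit reasoning
    cot_helps: Callable[[str], bool],       # small trained classifier (assumed)
) -> str:
    """Route each instruction to CoT or direct answering.

    The idea, as summarized above: a small model trained on cases where
    CoT helped vs. hurt decides per-instruction whether to reason first.
    """
    if cot_helps(instruction):
        return answer_with_cot(instruction)
    return answer_direct(instruction)

# Toy usage with a heuristic stand-in for the trained classifier:
# constraint-heavy formatting tasks skip CoT, math-flavored ones keep it.
demo = classifier_selective_reasoning(
    "List three colors, lowercase, comma-separated, no other text.",
    answer_direct=lambda q: "red, green, blue",
    answer_with_cot=lambda q: "Let me think step by step... red, green, blue.",
    cot_helps=lambda q: "prove" in q or "calculate" in q,
)
print(demo)  # -> "red, green, blue" (this instruction is routed to direct answering)
```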
Busy Philipps Empowers Women with ADHD to Go from Feeling Misrepresented to Being 'Ms. Represented' in First-of-its-Kind Campaign with Supernus Pharmaceuticals
Globenewswire· 2025-05-20 12:30
Company Overview
- Supernus Pharmaceuticals is a biopharmaceutical company focused on developing and commercializing products for the treatment of central nervous system (CNS) diseases [15].
- The company has a diverse neuroscience portfolio that includes approved treatments for ADHD, dyskinesia in Parkinson's disease, epilepsy, migraine, and other CNS disorders [16].

Product Information
- Qelbree (viloxazine extended-release capsules) is a prescription medicine used to treat ADHD in adults and children aged 6 years and older [6].
- Qelbree is a novel, once-a-day, non-stimulant approach for ADHD treatment and the first non-stimulant approved for adults with ADHD in 20 years [4].
- As a non-stimulant, Qelbree has no evidence of abuse or misuse and can be conveniently refilled without needing a new prescription each month [4].

Campaign and Awareness
- The 'Ms. Represented' campaign, in partnership with Busy Philipps, aims to empower women with ADHD to understand their symptoms and seek help [1][4].
- The campaign highlights the often-misrepresented experiences of women with ADHD, focusing on the differences in symptom presentation between genders [2][3].
- Increased awareness of ADHD symptoms among females is leading more women to seek diagnosis and treatment [3].

Market Insights
- Studies indicate that boys are twice as likely as girls to be diagnosed with ADHD, resulting in many women remaining undiagnosed or misdiagnosed until adulthood [3].
- The campaign seeks to address the disparity in ADHD diagnosis and treatment between males and females [4].
DeepSeek and Its Peers Are Getting Smarter, but Also Less and Less Obedient.
数字生命卡兹克· 2025-05-19 20:14
This year, after DeepSeek R1 took off, something close to a consensus formed: the stronger an AI's reasoning ability, the smarter it should be at executing tasks. From Chain-of-Thought's debut in 2022 to the dominant performance of today's flagship models (Gemini 2.5 Pro, OpenAI o3, DeepSeek-R1, Qwen3), we have kept believing that letting the model "think first" is a strategy that almost never goes wrong.

But this cleverness has a side effect: prompt-following ability keeps getting worse. In other words, the models listen to you less and less. I mentioned this in the DeepSeek guide I wrote over Chinese New Year, "DeepSeek's prompt trick is that there is no trick."

Still, that was only my own impression from daily use: the models feel smarter, yet less obedient, to the point that my most-used model has increasingly become GPT-4o, while I reach for reasoning models less and less. But I had never seen this verified, so I did not dare say it too confidently.

Then last night, while digging through papers, I came across one on exactly this topic. After reading it, I felt I could finally talk about this properly. The paper is titled "When Thinking Fails: The Pitfalls of Reasoning for I ..."
ICLR 2025 Oral | Differential Attention Drives a Shift: DIFF Transformer Tackles Long-Sequence Modeling
机器之心· 2025-04-28 08:04
In recent years, the Transformer architecture has achieved enormous success in natural language processing; from machine translation to text generation, its modeling power has brought unprecedented breakthroughs to language understanding and generation. Yet as models scale up and application scenarios grow more complex, the traditional Transformer has begun to show its weaknesses. In long-text processing, key-information retrieval, and hallucination resistance in particular, it often gets stuck over-attending to irrelevant context, which caps model performance.

To crack this problem, a research team from Microsoft and Tsinghua proposed DIFF Transformer, an innovative foundation-model architecture built on a differential attention mechanism. The core idea is to compute the difference between two softmax attention maps, amplifying attention to key context while canceling out attention noise. DIFF Transformer offers the following notable advantages:

- In language modeling, DIFF Transformer scales remarkably well in model size and training tokens, matching a conventional Transformer's performance with only about 65% of the model size or training tokens, substantially raising general language-model capability.
- Across a range of tasks including long-text modeling, key-information retrieval, mathematical reasoning, hallucination resistance, in-context learning, and activation quantization, DIFF Transformer ...
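To make the mechanism concrete, here is a minimal single-head sketch of differential attention in PyTorch, following the description above (the difference of two softmax attention maps applied to V). The fixed `lam` is a simplification: in the actual architecture it is a learnable, re-parameterized scalar, and real implementations are multi-head.

```python
import torch
import torch.nn.functional as F

def diff_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.8):
    """Single-head differential attention sketch:
    (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) @ V."""
    d = wq1.shape[1]
    q1, k1 = x @ wq1, x @ wk1  # first attention group
    q2, k2 = x @ wq2, x @ wk2  # second group, acting as a noise estimate
    v = x @ wv
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Subtracting the two maps amplifies shared signal on key context
    # and cancels attention noise spread over irrelevant tokens.
    return (a1 - lam * a2) @ v

# Toy shapes: 5 tokens, model dim 16, head dim 8; random weights as stand-ins.
torch.manual_seed(0)
x = torch.randn(5, 16)
ws = [torch.randn(16, 8) / 4 for _ in range(5)]
print(diff_attention(x, *ws).shape)  # torch.Size([5, 8])
```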
Large Models' "Clash of the Titans": After the Reproduction Wave and Major Technical Upgrades, What Should We Be Watching? | 万有引力
AI科技大本营· 2025-03-25 01:45
Author | 万有引力  Produced by | CSDN (ID: CSDNnews)

In just the past few weeks, the information density in the large-model race has surged to unprecedented heights. DeepSeek open-sourced projects for five consecutive days, setting off a wave of reproductions. Alibaba's Tongyi Lab and Tencent followed with ViDoRAG, a RAG system for visual documents, and the new Hunyuan "fast-thinking" model Turbo S, accelerating the pace of large-model evolution. Musk's Grok 3, trained on 200,000 GPUs, surpassed many industry benchmarks and once again validated the law of "brute force works miracles." Claude 3.7 Sonnet received a major coding upgrade, speeding the arrival of technical parity in AI programming. A DeepSeek paper "collided" with one from Kimi, as more and more companies position themselves around sparse attention and linear attention mechanisms, which are becoming key directions of exploration after the Transformer. Meanwhile, the Manus-style "virtual machine" concept has caught fire and is reshaping how large models run...

Behind this dazzling technical race, what truly deserves our attention? What are DeepSeek's five consecutive releases really after? With a 545% cost-profit margin, can other large-model companies also find room for profit? Facing industry ...
A "Brief History of Attention" in Large Models: A Conversation with Two AI Researchers, Starting from DeepSeek's and Kimi's Latest Improvements
晚点LatePost· 2025-03-02 06:10
Guests | Xiao Chaojun, Fu Tianyu  Edited by | Cheng Manqi

Last week, DeepSeek and Kimi each released new results on improving and optimizing large-model architectures: NSA and MoBA, respectively. Both focus on improving the "attention mechanism" in large models.

The attention mechanism is the core mechanism of today's large language models (LLMs). The June 2017 paper by the eight Transformer authors that launched the large-language-model revolution was titled precisely: Attention Is All You Need.

Optimizing attention's computational efficiency and quality also helps with a problem that both AI academia and industry care deeply about: long context. The arrival of reasoning models such as o1 and R1 has given long text a new set of problems. Whether feeding in an entire book for the model to summarize and understand, generating the long chains of thought that models like o1 and R1 require, or giving models ever longer "memory," all of it depends on long-context capability.

For this episode we invited two AI researchers who have worked on attention-mechanism improvements. One is Xiao Chaojun, a PhD student in the Natural Language Processing Lab of Tsinghua's Department of Computer Science and first author of the InfLLM attention improvement; his advisor is an associate professor in Tsinghua's CS department ...
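To make the "attention improvement" theme concrete, below is a toy sketch of the block-sparse idea that approaches in the NSA/MoBA family build on: each query scores key blocks by a cheap summary and attends only within the top-scoring blocks instead of the full sequence. This is an illustrative simplification, not either project's actual algorithm.

```python
import torch
import torch.nn.functional as F

def toy_block_sparse_attention(q, k, v, block_size=4, top_k=2):
    """Each query attends only inside its top_k highest-scoring key blocks,
    cutting attention cost on long sequences. Scoring blocks by their
    mean-pooled keys is a simplification chosen for readability."""
    n, d = k.shape
    k_blocks = k.view(n // block_size, block_size, d)
    block_summary = k_blocks.mean(dim=1)            # one summary vector per block
    block_scores = q @ block_summary.T / d**0.5     # (num_queries, num_blocks)
    top = block_scores.topk(top_k, dim=-1).indices  # chosen blocks per query

    out = torch.zeros(q.shape[0], d)
    for i, qi in enumerate(q):
        idx = torch.cat([torch.arange(b * block_size, (b + 1) * block_size)
                         for b in top[i].tolist()])
        att = F.softmax(qi @ k[idx].T / d**0.5, dim=-1)
        out[i] = att @ v[idx]
    return out

# Toy usage: 3 queries over a 16-token sequence, head dim 8.
torch.manual_seed(0)
q, k, v = torch.randn(3, 8), torch.randn(16, 8), torch.randn(16, 8)
print(toy_block_sparse_attention(q, k, v).shape)  # torch.Size([3, 8])
```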