MiniMax Closed-Door Technical Meeting: Long Context Is the Game Changer for Agents
Founder Park· 2025-07-18 18:24
On July 10, MiniMax held its M1 technical seminar for a global audience, inviting researchers from the Hong Kong University of Science and Technology, the University of Waterloo, Anthropic, Hugging Face, SGLang, vLLM, and the RL community, along with industry guests, for in-depth discussions of model architecture innovation, RL training, long-context applications, and related topics. The guest lineup was strong and the conversation ran deep; Founder Park republishes the highlights with permission. The article is reproduced from "MiniMax 稀宇科技". Founder Park, together with the Bund Summit (外滩大会) organizing committee and 将门创投, is soliciting AI hardware that can genuinely change everyday life, in search of new possibilities for AI hardware.

01 Can RL endow models with new capabilities?

Can RL improve a model's fundamental capabilities? Many people believe that RL merely activates abilities or techniques the model already learned during pretraining, and cannot make the model fundamentally learn new skills that were absent from pretraining; some papers argue, for example, that RL does not improve a model's pass@k.

First, the model's fundamental capability needs to be defined. A practical definition: for a given context length, what is the model's pass rate on a specific set of problems under unlimited attempts (pass@k, k→∞)? If that pass rate equals 1, the model can solve this class of problems; if it equals 0, it cannot. If the model's generation length, i.e., the model ...
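For reference on the metric invoked here: pass@k is commonly computed with the unbiased estimator from OpenAI's HumanEval paper rather than by literal repeated sampling. A minimal sketch in Python (function name and the example numbers are illustrative, not from the talk):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of
    k samples, drawn without replacement from n generations of which c are
    correct, solves the problem: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so any k-draw contains a pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# As k grows toward n, pass@k approaches the "fundamental capability" defined
# above: 1.0 if the model ever solves the problem, 0.0 if it never does.
print(pass_at_k(n=200, c=3, k=10))   # ~0.143
```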
Put the phone down | Kryštof Chmel | TEDxAmerican Academy Brno
TEDx Talks· 2025-07-17 15:38
[Music] Hello everyone. So before we start, let me thank the organizing team for organizing such a great event and giving me the opportunity to speak at such a great venue. Secondly, I would like to ask the audience to put their phones down. Have you ever opened your phone only to forget why you opened it in the first place? Or you meant to check one message or an email, but 45 minutes later you're scrolling through random content? You're not alone. We live in a world where phones can scroll endlessly, but o ...
Why is creativity accessible to everyone | Viktor Malášek | TEDxAmerican Academy Brno
TEDx Talks· 2025-07-17 15:38
[Music] [Applause] Creativity. What a beautiful word: full of the unknown, innovative, and alive. But let me ask you: if I said raise your hand if you're creative, would any of you actually do it? What does it even mean to be creative? Where does creativity come from? And more importantly, do you feel creative? Well, I didn't. No. When I was younger, I thought that creativity meant inventing something out of nothing. And I couldn't do that. I believed that creativity was something you either had or you didn't. And ob ...
CICC: A GRU Model Incorporating a Self-Attention Mechanism
中金点睛· 2025-07-14 23:39
Time-series models are statistical or machine-learning methods built specifically to analyze and forecast sequences of observations ordered in time; LSTM, GRU, and Transformer mark the core architectural evolution of deep learning in this field. LSTM effectively addresses the long-term dependency problem through its gating mechanism (forget, input, and output gates) and cell-state design, but at comparatively high computational cost.

Abstract

What makes time-series models special (CICC Research): among machine-learning models, one class is known as time-series models, such as LSTM, GRU, and Transformer, because their gating units can simultaneously memorize, integrate, and understand information over both long and short horizons. Taking GRU, Transformer, and their variants as examples, we examine how well such models match different kinds of data, and explore possible directions for optimizing them. We propose a GRU architecture combined with a lightweight self-attention mechanism, AttentionGRU(Res), which aims to pair the Transformer's sequence-learning ability with out-of-sample stability.

As an early optimized variant, GRU adopts a streamlined structure of update and reset gates, preserving long-term memory while markedly improving computational efficiency, which suits forecasting scenarios with stricter real-time requirements. The Transformer, through self-attention and positional encoding, achieved a revolution in the sequence-modeling parad ...
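The snippet names AttentionGRU(Res) but does not include its implementation. A minimal PyTorch sketch of the general idea the name suggests — GRU hidden states refined by one lightweight self-attention layer with a residual connection — where all dimensions and the single-head choice are assumptions, not the report's actual configuration:

```python
import torch
import torch.nn as nn

class AttentionGRU(nn.Module):
    """Sketch: a GRU whose hidden-state sequence is refined by a lightweight
    self-attention layer with a residual ("Res") connection, followed by a
    linear forecast head. Sizes are illustrative."""
    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.gru(x)          # (batch, seq_len, hidden_dim)
        a, _ = self.attn(h, h, h)   # self-attention over the GRU state sequence
        h = self.norm(h + a)        # residual keeps the GRU's stability
        return self.head(h[:, -1])  # forecast from the final time step

x = torch.randn(8, 30, 10)                   # 8 series, 30 steps, 10 features
print(AttentionGRU(input_dim=10)(x).shape)   # torch.Size([8, 1])
```

The residual-plus-norm wrapping is the standard way to bolt attention onto a recurrent backbone without destabilizing it, which matches the report's stated goal of keeping out-of-sample stability.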
X @Andy
Andy· 2025-07-14 19:31
Market Dynamics - Attention is more valuable than anything else in bull markets [1]
X @s4mmy
s4mmy· 2025-07-11 11:56
Market Sentiment - Attention Capital Markets is considered a prequel, suggesting a larger event is coming [1] - The industry is in the early stages, specifically the "first inning," implying significant future development [1]
Cingulate Appoints Nilay Patel as Chief Legal Officer to Support Growth ahead of New Drug Application
Globenewswire· 2025-07-09 12:00
Pharma Vet brings 20+ years of Legal, Compliance, and Commercialization Expertise as Company prepares ADHD Drug Filing
KANSAS CITY, Kan., July 09, 2025 (GLOBE NEWSWIRE) -- Cingulate Inc. (NASDAQ: CING), a biopharmaceutical company utilizing its proprietary Precision Timed Release™ (PTR™) drug delivery platform technology to build and advance a pipeline of next-generation pharmaceutical products, has named Nilay Patel, JD, as Chief Legal Officer. Patel’s appointment comes as Cingulate plans to submit a new dr ...
What Exactly Lies at the Core of Image-Goal Navigation?
具身智能之心· 2025-07-04 12:07
Research Background and Core Issues
- Image goal navigation requires two key capabilities: core navigation skills, and computing directional information by comparing visual observations against the goal image [2]
- The research asks whether this task can be solved efficiently by end-to-end training of complete agents with reinforcement learning (RL) [2]

Core Research Content and Methods
- The study explores various architectural designs and their impact on task performance, emphasizing implicit correspondence computation between images [3][4]
- Key architectures discussed include Late Fusion, ChannelCat, SpaceToDepth + ChannelCat, and Cross-attention [4] (the two fusion styles are sketched after this summary)

Main Findings
- Early patch-level fusion methods (ChannelCat, Cross-attention) are more critical than late fusion (Late Fusion) for supporting implicit correspondence computation [8]
- Performance of the different architectures varies significantly across simulator settings, particularly the "Sliding" setting [8][10]

Performance Metrics
- Success rate (SR) and success weighted by path length (SPL) are used to evaluate the models [7]
- For example, with Sliding=True, ChannelCat (ResNet9) achieved an SR of 83.6%, while Late Fusion reached only 13.8% [8]

Transferability of Abilities
- Some learned capabilities transfer to more realistic environments, especially when the perception module's weights are carried over [10]
- Training with Sliding=True and then fine-tuning with Sliding=False improved SR from 31.7% to 38.5% [10]

Relationship Between Navigation and Relative Pose Estimation
- Navigation performance correlates with relative pose estimation accuracy, underscoring the importance of directional-information extraction in image goal navigation [12]

Conclusion
- Architectural designs that support early local fusion (Cross-attention, ChannelCat) are crucial for implicit correspondence computation [15]
- The simulator's Sliding setting significantly affects performance, but transferring perception-module weights helps retain some capabilities in real-world scenarios [15]
- Navigation performance is tied to relative pose estimation ability, confirming the core role of directional-information extraction in image goal navigation [15]
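The summary does not include the paper's code; a minimal PyTorch sketch contrasting the two fusion styles it compares, with tiny conv stacks standing in for the actual backbones (e.g. ResNet9) and all names and channel counts chosen for illustration:

```python
import torch
import torch.nn as nn

class ChannelCatEncoder(nn.Module):
    """Early fusion: stack observation and goal along the channel axis before
    the first convolution, so every filter sees both images and patch-level
    correspondence can emerge from layer one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = 3 obs + 3 goal channels
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, goal], dim=1))

class LateFusionEncoder(nn.Module):
    """Late fusion: encode each image separately and only concatenate the
    pooled feature vectors, discarding spatial correspondence."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.obs_enc, self.goal_enc = branch(), branch()

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.obs_enc(obs), self.goal_enc(goal)], dim=1)

obs, goal = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(ChannelCatEncoder()(obs, goal).shape)  # torch.Size([1, 64])
print(LateFusionEncoder()(obs, goal).shape)  # torch.Size([1, 128])
```

The structural difference makes the reported SR gap plausible: in the late-fusion path, by the time the two images meet, all spatial detail has been pooled away, so the network has nothing left from which to compute correspondences.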
Silencing the Noise: The Art of Focus | Hüsnü Alper | TEDxYaşamTasarımSchools
TEDx Talks· 2025-07-03 15:57
Heat. Heat. Let me ask you something. Have you ever opened your phone to check one message, and 30 minutes later you are watching a video about how to train a screen? Yeah, me too. We live in a world where everything is fighting for your attention: messages, games, reels, every second. And you ask yourself, why can't I focus anymore, what's the real problem? Here's the truth: you are not lazy, you are not weak. The problem is that the world around you is designed to steal your time. So if you feel distracted, it's not your fault ...
X @BREAD | ∑:
BREAD | ∑:· 2025-07-03 15:45
and people are still unsure if we need real-time blockchains. Brothers, attention spans are going to zero. If you're not being confirmed so that you can ape the next thing within the same breath, you're ngmi.
IcoBeast.eth🦇🔊 (@beast_ico): Live look at "top yappers" every other day when a new Kaito LB drops https://t.co/McAomi6VBB ...