Transformer Architecture
DeepSeek update draws complaints of turning cold and dumb: "more cringeworthy than the melancholy teen literature of 20 years ago!" Industry insiders: this version is a "turbo" build that trades quality for speed
Mei Ri Jing Ji Xin Wen· 2026-02-12 16:42
Core Insights
- DeepSeek has initiated a gray testing phase for its flagship model, allowing for a context length of up to 1 million tokens, significantly expanding from the previous 128K tokens in version 3.1 released in August last year [1][6]
- User feedback indicates a shift in the model's interaction style, with complaints about a perceived loss of personality and warmth in responses, leading to a trending topic on social media regarding the model's "coldness" [1][4]
- The upcoming version 4 of DeepSeek is expected to be released in mid-February 2026, with the current version being a speed-optimized iteration that sacrifices some quality for performance testing [6]

User Experience
- Users have reported that the model now refers to them as "users" instead of personalized nicknames, which has led to dissatisfaction regarding the emotional engagement of the model [4][5]
- Some users feel that the model has become overly objective and rational, while others appreciate the increased focus on the user's psychological state rather than just the questions posed [5]

Technical Developments
- DeepSeek's V-series models are designed for optimal performance, with the V3 model marking a significant milestone due to its efficient MoE architecture [6][7]
- Recent innovations include the mHC architecture for optimizing information flow in deep Transformers and the Engram memory module, which separates static knowledge from dynamic computation, reducing costs for long-context reasoning [7]
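The summary above credits DeepSeek V3's efficiency to its MoE (Mixture-of-Experts) architecture. The core idea is that a gate scores many experts per token but only the top few actually run. A minimal, illustrative sketch (not DeepSeek's implementation; all names and the linear gate are stand-ins):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Toy Mixture-of-Experts layer: score every expert with a linear
    gate, keep only the top_k, and mix their outputs weighted by the
    renormalized gate probabilities. The sparsity is what makes MoE
    cheap: only top_k of len(experts) experts execute per token."""
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    probs = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)  # only the selected experts run
        out = [o + p * yi for o, yi in zip(out, y)]
    return out
```

Real MoE layers add load-balancing losses and capacity limits on top of this routing step; the sketch shows only the forward pass.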
Why Chinese models are leading in AI video
Hua Er Jie Jian Wen· 2026-02-11 04:25
Only when ByteDance's Seedance 2.0 broke out did many people realize for the first time that Chinese models on the AI video track are no longer merely catching up, but starting to run ahead. Seedance 2.0 did not go viral on the strength of a single stunning frame; it brought a subtler but deeper change: for the first time, AI video behaves like an industrial product that can be reliably delivered. Multimodal input, automatic camera movement, and long-duration consistency, stacked together, mean creators can skip the pain of repeated "gacha pulls" and instead build a reusable production pipeline. But rewind the timeline and you will find that Chinese companies' lead in AI video did not happen overnight. Even earlier, Chinese models had already secured a clear window of leadership in AI video. For example, Kuaishou's Kling 2.0 last April achieved a 367% win ratio against Sora in text-to-video comparisons, leading across the board in character consistency, generation stability, and reproducibility, and was the first to deliver commercially viable AI video production. Stability matters enormously in AI video: whether characters stay consistent, whether frames collapse midway, whether results can be reproduced. These are exactly the metrics that determine whether a video can enter real production. Later, a group of Chinese companies kept pushing along the same path. ByteDance has continually strengthened narrative and camera logic in the Seedance line, while smaller startup teams have embedded video generation directly into e-commerce, advertising, and game user-acquisition workflows. These …
Tsinghua and Qwen jointly reshape the normalization paradigm, returning Transformers to "deep" learning
机器之心· 2026-02-10 11:03
In the nineteenth-century Kingdom of Siam, a pair of conjoined brothers was born: each had full limbs and an independent brain, but their sixty-odd years of life were forever bound together by a band of tissue, less than ten centimeters long, joining them at the waist. Their connection brought endless constraint, until they left Siam and took to the circus stage. Over a decade, the brothers toured Europe and America in near-perfect unity and achieved great success. People later named this conjoined condition after their homeland: Siamese Twins. The name eventually crossed the boundary of biology. In 1993, Yann LeCun brought it into neural networks, creating the weight-sharing Siamese Network for measuring the similarity of inputs. In the twenty-first century, the AI field has its own pair of "twins": Pre-Norm and Post-Norm. Born to stabilize large-model training, they quickly became the key paradigms for stabilizing signal flow in the Transformer architecture. Yet the training stability that normalization brings is not free: the two paradigms appear to face a hard-to-reconcile trade-off. Although Pre-Norm has been adopted by well-known open-source base models such as GPT-3, LLaMA, DeepSeek, and Qwen, multiple studies point to a stark fact: Pr …
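Pre-Norm and Post-Norm differ only in where layer normalization sits relative to the residual connection. A minimal sketch of the two block layouts (illustrative only; `sublayer` stands in for attention or an MLP, and learnable scale/shift parameters are omitted):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]

def post_norm_block(x, sublayer):
    """Post-Norm (original Transformer): sublayer, residual add,
    then normalize -- the whole residual stream is renormalized."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def pre_norm_block(x, sublayer):
    """Pre-Norm (GPT-3/LLaMA style): normalize only the sublayer's
    input; the residual path stays an untouched identity, which is
    generally credited with making very deep stacks easier to train."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]
```

Note the asymmetry: in the Pre-Norm block, an input component that the sublayer leaves at zero passes through unchanged, while the Post-Norm block rescales everything.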
The AI power handover at big tech: the post-90s generation collectively takes charge
虎嗅APP· 2026-02-03 13:52
Beyond the page lies the truth. This article comes from the WeChat account 版面之外; author: 画画; header image: AI-generated. In the months from late 2025 to early 2026, something intriguing happened in the tech world. There were no grand launch events and no official announcements, but inside Tencent's tower in Shenzhen, Alibaba's Xixi campus in Hangzhou, and ByteDance's offices in Beijing, the people commanding the large-model battlefield quietly changed to a set of younger faces. Start with Tencent. Though widely seen as lagging in large models over the past year or two, it has hardly been idle. First, former OpenAI researcher Yao Shunyu (姚顺雨) was rumored to have joined Tencent at an annual salary of 100 million yuan; after several denials, he formally joined at the end of last year as Chief AI Scientist, reporting directly to Tencent President Martin Lau (刘炽平). Just last week, Pang Tianyu (庞天宇), a Tsinghua computer-science PhD and former senior research scientist at Singapore's Sea AI Lab, also joined Tencent to lead multimodal reinforcement learning. In an old empire like Tencent, where fiefdoms and seniority matter, these two might as well have ridden a Falcon 9 to the top. Now look at Alibaba. Lin Junyang (林俊旸) joined Alibaba's AI research arm, DAMO Academy, straight after his master's degree as an algorithm expert in the intelligent computing lab, focusing on large-model research. Today he is Alibaba's youngest P10 and the core driver behind the open-source Qwen models. If you line up the key figures across Tencent, Alibaba, and the large-model unicorns, including Kimi's Yang Zhilin (杨植麟), who was just …
AI is here: why can't big tech retain its executives? | Barron's Picks
Tai Mei Ti APP· 2026-01-26 10:44
Core Insights
- The article discusses the transition of tech executives from large companies to startups, driven by the AI revolution and the limitations of traditional corporate structures [2][5][24]
- It highlights the emergence of two waves of entrepreneurs: the "tech believers" focused on model development and the "business translators" who prioritize commercialization [17][20]

Group 1: Reasons for Departure
- Executives are leaving large firms due to structural conflicts between established corporate cultures and the innovative demands of AI development [5][9]
- The rise of AI technologies, particularly the Transformer architecture, has prompted many to seek opportunities outside their companies, where they can pursue innovative projects without bureaucratic constraints [5][6]
- The decision-making processes in large firms often hinder rapid innovation, leading talented individuals to pursue entrepreneurial ventures where they can explore new ideas more freely [11][12]

Group 2: Characteristics of Departing Executives
- The departing executives often possess deep technical knowledge and a strong understanding of AI, making them valuable assets in the startup ecosystem [17][25]
- They have the ability to integrate resources and build teams, which is crucial for the collaborative nature of AI projects [25]
- Their insights into industry needs and market demands position them well to identify and capitalize on new business opportunities [25][26]

Group 3: Challenges Faced by Large Firms
- Large companies struggle to retain talent due to lengthy decision-making processes and a culture that prioritizes risk minimization over opportunity maximization [10][11]
- Despite offering attractive compensation packages, these firms fail to address the underlying issues related to organizational structure and innovation [10][12]
- The inability to provide a conducive environment for experimentation and risk-taking further exacerbates talent retention challenges [12][13]

Group 4: Investment Trends
- Investors are increasingly favoring executives with backgrounds in major tech firms, viewing them as reliable indicators of potential success in the uncertain AI landscape [24][25]
- The shift in investment focus reflects a broader trend where capital seeks to mitigate risks associated with new technologies by backing experienced leaders [24][26]
- The emergence of a "hunting mechanism" among investors highlights the proactive approach to identifying and supporting promising talent from large companies [27][28]
The Harvard-dropout "three musketeers" building AI chips just raised 3.5 billion yuan
创业邦· 2026-01-24 04:10
Special-purpose chips are on the rise. By 漫地, edited by 关雎. Three post-2000 Harvard dropouts just raised $500 million for their AI-chip startup Etched.ai. It is one of the largest financings in AI hardware, valuing Etched.ai at close to $5 billion and bringing its total funding to nearly $1 billion. Etched.ai founder Gavin Uberti is only 24 this year. After dropping out of Harvard together with co-founders Chris Zhu and Robert Wachen, he has led the company in building next-generation AI chips. Unlike chip giant Nvidia, they carved out a niche: ASIC chips purpose-built for the Transformer architecture that dominates today's mainstream AI models, aiming to outperform general-purpose GPUs. An ASIC is a chip custom-designed for one specific use, unlike a CPU (central processing unit) or GPU (graphics processing unit), which can run many different kinds of programs. The logic of the compute market is shifting. How can Etched.ai challenge Nvidia? The Harvard-dropout founders: the story of Etched.ai begins with a Harvard dropout, Gavin Uberti. Before founding Etched.ai, Gavin …
At OpenAI, "innovation has become difficult": a departed executive tells all
36Ke· 2026-01-23 13:12
Group 1
- OpenAI is facing an innovation dilemma due to rising costs and growth pressures, which have affected its appetite for risk and hindered cross-team collaboration [3][8]
- The rise of Google is attributed to OpenAI's failure to maintain its competitive edge, suggesting that OpenAI should have continued to lead the market [3][4]
- The AI industry is experiencing a convergence among top companies, making it difficult for researchers to pursue innovative paths outside mainstream machine learning paradigms [3][4]

Group 2
- The talent war in the AI sector has become dramatic, with frequent job changes among researchers, leading to less time spent on actual work [4][42]
- Innovation is not solely driven by star researchers; the company's ability to foster a sense of personal responsibility and an environment that allows exploration is crucial [4][5]
- The lack of focus, rather than a shortage of computing power, is identified as a key barrier to innovation within AI labs [5][19]

Group 3
- The timeline for achieving Artificial General Intelligence (AGI) is projected around 2029, with critical areas of focus being architectural innovation and continuous learning [5][30]
- Reinforcement learning is making a comeback, as historical patterns show that good ideas often resurface, but the challenge lies in determining the right timing for their importance [5][24]

Group 4
- OpenAI's organizational structure is limiting its ability to support certain research directions, leading to a realization that some desired research cannot be pursued within the current framework [9][10]
- The industry is witnessing a lack of diversity in approaches, with many companies following similar technological paths, which is seen as a regrettable trend [15][17]

Group 5
- The current competitive landscape is characterized by a few major AI companies using similar technological foundations, resulting in minimal differentiation among their products [15][17]
- The pressure to deliver results and maintain competitiveness is causing organizations to shy away from risk-taking, which is essential for genuine innovation [18][19]

Group 6
- The significant resource barriers in AI research are hindering innovative attempts, as many promising ideas lack the necessary funding for large-scale experimentation [20][21]
- The balance between exploration and exploitation is a critical issue in optimizing AI agents and should also be reflected in organizational decision-making [21][22]

Group 7
- The importance of world models in AI training is emphasized, suggesting that integrating world understanding with reinforcement learning could lead to significant advancements [27][30]
- Continuous learning and the integration of training and operational phases are identified as essential capabilities that are currently lacking in AI models [30][31]

Group 8
- The rapid evolution of AI technology necessitates a cautious approach to its deployment, as the implications of new advancements can have far-reaching effects on society [37][38]
- The ongoing discourse around AI technologies is marked by a mix of excitement and concern, highlighting the need for responsible discussions about their impact [40][41]
Academic heavyweights trade quotable barbs, Zhipu and MiniMax get name-checked for excellence, and agents can now write GPU kernels?!
AI前线· 2026-01-23 09:18
Core Viewpoint
- The debate on Artificial General Intelligence (AGI) is polarized, with one perspective arguing that AGI will not become a reality due to physical and computational limitations, while the opposing view suggests that AGI may already be achieved or is on the verge of realization [2][4][10]

Group 1: AGI Debate
- Tim Dettmers argues that AGI is constrained by physical limits such as memory transfer, bandwidth, and latency, leading to a slowdown in computational growth [10][39]
- Dan Fu counters that the potential of current hardware has not been fully realized, suggesting that significant improvements in computational efficiency are still possible [12][45]
- Both researchers converge on the definition of AGI, emphasizing its impact on changing work processes rather than merely its cognitive capabilities [14][15]

Group 2: Computational Potential
- Dan Fu estimates that the theoretical available computational power could increase by nearly 90 times through hardware advancements, system optimizations, and larger clusters [13][46]
- Current models are often based on outdated hardware, and the industry has yet to fully leverage the capabilities of new hardware [49][50]
- The discussion highlights the importance of optimizing hardware utilization, with current effective utilization rates being significantly lower than potential [45][46]

Group 3: Role of Agents
- The emergence of code agents is seen as a transformative development, significantly enhancing productivity in programming tasks [20][62]
- Both researchers agree that agents can handle a majority of coding tasks, allowing human experts to focus on oversight and quality control [21][66]
- The ability to effectively use agents is becoming a critical skill in the industry, with those who adapt likely to thrive [68][70]

Group 4: Future Directions in AI
- The future of AI is expected to see a diversification of hardware and a shift towards specialized models, with new architectures emerging beyond the dominant Transformer model [23][25]
- Chinese AI teams are recognized for their innovative approaches and practical focus on real-world applications, contrasting with the more centralized technological routes in the U.S. [26][56]
- The potential for AI to revolutionize various sectors, including healthcare and automation, is acknowledged, with significant advancements anticipated in the coming years [57][58]
Musk in a rare show of humility: open-sources the 𝕏 recommendation algorithm, admits it's "dumb," and promises monthly updates
量子位· 2026-01-21 04:09
By 一水, from 凹非寺. 量子位 | WeChat account QbitAI. Right now, the recommendation algorithm system Musk open-sourced is fully visible on GitHub. The open-source files state explicitly that this is an algorithm system driven almost entirely by AI models: "We removed all hand-engineered features and the vast majority of heuristic rules." The moment the news broke, the community erupted, with the top-voted replies heaping on praise: "Incredible! No other platform has achieved this level of transparency." Musk himself quickly reposted the engineering team's announcement, but the usually bombastic Musk struck a modest note this time: "We know this algorithm is dumb and needs major improvement, but at least you can watch us work to improve it in real time, transparently. No other social media company does this." As early as before his 2022 acquisition of 𝕏 (formerly Twitter), Musk had repeatedly criticized the platform as too closed. Since the acquisition he has kept his promise and opened up Twitter's core recommendation algorithm several times, so this round stays true to that original intent. So how does a purely AI-driven recommendation system actually work? Let's dig into how the whole system operates. In one sentence: built on the same Transformer architecture as Grok-1, it learns from your past interactions (what you liked, replied to, and reposted) to decide what content to recommend to you. Starting when a user opens "For You," the client sends the server a …
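The article describes a ranking loop in which a Transformer predicts per-post engagement probabilities (like, reply, repost) and the feed orders candidates by those predictions. A toy sketch of that final ranking step (illustrative only; the function and weight names are mine, and the real system's model and score combination are far more involved):

```python
def rank_for_you(posts, predict_engagement, action_weights):
    """Order candidate posts for a 'For You' feed by a weighted sum
    of model-predicted action probabilities. predict_engagement maps
    a post to a dict like {'like': p, 'reply': p, 'repost': p};
    action_weights says how much each predicted action is worth."""
    def score(post):
        probs = predict_engagement(post)
        return sum(action_weights[a] * probs.get(a, 0.0) for a in action_weights)
    # Highest predicted engagement first
    return sorted(posts, key=score, reverse=True)
```

The point of this design, consistent with the article's claim, is that all ranking signal lives in the learned probabilities: changing what the feed promotes means retraining the model or reweighting actions, not editing hand-written rules.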
Musk really just open-sourced the 𝕏 platform's recommendation algorithm, and its core is also a Transformer
机器之心· 2026-01-20 11:24
Just now, the 𝕏 platform (formerly Twitter) announced new open-source news: the brand-new recommendation algorithm has been open-sourced, driven by the same Transformer architecture as xAI's Grok model. The model predicts user actions (likes, replies, reposts, and so on) to rank the posts that appear in the For You tab. Editor | 冷猫. As everyone knows, the recommendation algorithm is the lifeline of a social media platform, essentially the core of how a platform retains users and grows marketing revenue. A little over a week ago, when Musk posted on 𝕏 declaring that he would "open-source the platform's recommendation algorithm in 7 days," it was almost unbelievable. But Musk delivered: slightly later than the claimed 7 days, the recommendation algorithm is indeed fully open-sourced. Hopefully the promise of repeating updates every 4 weeks will hold over the long term. After the release, Musk said: "We know this algorithm is dumb and needs a lot of improvement, but at least you can watch us work to make it better, in real time and transparently. No other social media company does this." Still, Musk may have had other reasons to open-source the platform's recommendation algorithm. According to Reuters, in July 2025 Paris prosecutors investigated the social media platform on suspicion of algorithmic bias and fraudulent data extraction, which Musk called a "politically motivated criminal investigation" that threatened his users' freedom of speech. In December, the EU fined the platform 120 million euros …