Transformer
AI Is Unstoppable: What Expectation Gaps Does the 2026 Model Upgrade Hold?
2026-02-10 03:24
Zong Jianshu, analyst at Changjiang Securities: Good evening, everyone. I am Zong Jianshu, chief analyst at Changjiang Securities, and tonight I will present our report, "AI Is Unstoppable: What Expectation Gaps Does the 2026 Model Upgrade Hold?" This is a new series we launched recently, because the AI sector has just been through a fairly sharp correction. In our view there are two core reasons for it: first, on the demand side we have not yet seen a clear acceleration in real-world adoption; second, overseas macro volatility has further amplified volatility in AI. Even so, we believe the certainty of the broader AI industry trend keeps rising, so we remain firmly positive on its continued development. Today's report focuses on model upgrades, because from an AI perspective the model is the single biggest driver. We expect 2026 to be a year in which the existing model-upgrade paradigm keeps climbing its curve while models increasingly fuse with real-world scenarios. So, first, model progress will continue upward this year; second, once models combine with scenarios, deployment will accelerate across the board. That is a very important reason why we are so positive on AI's overall performance this year. I ...
Post-90s Heavyweights Take the Helm, En Masse
投资界· 2026-02-09 07:19
This article originally appeared on 版面之外 (ID: Out_take); author: 画画. Tagline: beyond the layout is where the truth lies. A transfer of power. By 画画, via 版面之外 (ID: Out_take). In the months spanning late 2025 and early 2026, something in the tech world was quietly intriguing. There were no grand launch events and no official announcements, but inside Tencent's tower in Shenzhen, Alibaba's Xixi campus in Hangzhou, and ByteDance's offices in Beijing, the people commanding the large-model battlefield were quietly replaced by a set of much younger faces. Start with Tencent: although it was widely seen as lagging in large models over the past year or two, it has hardly been idle. First, former OpenAI researcher Yao Shunyu was rumored to be joining Tencent at an annual salary of 100 million yuan; after several rounds of denials, he officially joined late last year as Chief AI Scientist, reporting directly to Tencent president Martin Lau. Just last week, Pang Tianyu, a Tsinghua computer science PhD and former senior research scientist at Singapore's Sea AI Lab, also joined Tencent to lead multimodal reinforcement learning. In an old empire like Tencent, which prizes fiefdoms and seniority, these two might as well have ridden a Falcon 9 to the top. Now look at Alibaba. Lin Junyang, fresh out of his master's program, joined Alibaba's AI research arm, DAMO ...
Big Tech's AI Power Handover: The Post-90s Generation Takes the Helm, En Masse
36Ke· 2026-02-02 00:22
In the months spanning late 2025 and early 2026, something in the tech world was quietly intriguing. There were no grand launch events and no official announcements, but inside Tencent's tower in Shenzhen, Alibaba's Xixi campus in Hangzhou, and ByteDance's offices in Beijing, the people commanding the large-model battlefield were quietly replaced by a set of much younger faces. Start with Tencent: although it was widely seen as lagging in large models over the past year or two, it has hardly been idle. First, former OpenAI researcher Yao Shunyu was rumored to be joining Tencent at an annual salary of 100 million yuan; after several rounds of denials, he officially joined late last year as Chief AI Scientist, reporting directly to Tencent president Martin Lau. Just last week, Pang Tianyu, a Tsinghua computer science PhD and former senior research scientist at Singapore's Sea AI Lab, also joined Tencent to lead multimodal reinforcement learning. In an old empire like Tencent, which prizes fiefdoms and seniority, these two might as well have ridden a Falcon 9 to the top. Now look at Alibaba. Lin Junyang joined Alibaba's AI research institute, DAMO Academy, straight out of his master's program, becoming an algorithm expert in its intelligent computing lab focused on large-model research. Today he is Alibaba's youngest P10 and the core force behind the open-source Qwen (通义千问) models. Line up the key figures across Tencent, Alibaba, and the large-model unicorns, including Kimi's Yang Zhilin and Manus founder Xiao Hong, whose company Meta just acquired for billions of dollars, and a striking pattern emerges: the people steering AI are all post-90s. This cohort has precisely ...
Did Silicon Valley's "Too Much Money" Ruin AI?! Former OpenAI o1 Lead Fires Back: Stop Hyping Google; Q-Star Was Spun into a Soap Opera, and Seven Years of High Pressure Nearly "Drove Me Mad"!
Xin Lang Cai Jing· 2026-01-25 01:24
Core Insights
- Jerry Tworek's departure from OpenAI highlights a growing divide between AI research and commercialization, as he seeks to pursue riskier foundational research that is increasingly difficult within a company focused on user growth and commercial strategies [2][3][4]
- Tworek criticizes the AI industry for a lack of innovation, noting that major companies are developing similar technologies, which pressures researchers to prioritize short-term gains over experimental breakthroughs [3][4][24]
- He emphasizes that OpenAI's slow response to competition from Google was a significant factor in its current position, suggesting that the company made critical missteps despite its initial advantages [3][4]

Company Dynamics
- Tworek points out that employee turnover can indicate deeper issues within a company, suggesting that if many key personnel leave due to misalignment in direction or decision-making, it reflects underlying problems [4][24]
- He contrasts OpenAI's organizational rigidity with the agility of competitors like Anthropic, which he praises for its focused and effective execution in AI research [4][5]
- The current state of the AI industry resembles a dramatic narrative, where personal movements and internal conflicts are sensationalized, creating a high-pressure environment for researchers [6][7][44]

Research and Innovation
- Tworek believes that the AI field is overly focused on scaling existing models, particularly those based on the Transformer architecture, and argues for the need to explore new methodologies and architectures [19][36]
- He identifies two underappreciated research directions: architectural innovation beyond Transformers and the integration of continual learning, which he sees as essential for advancing AI capabilities [36][37]
- The industry is at a crossroads where researchers must balance the pursuit of groundbreaking ideas with the pressures of existing corporate structures and funding constraints [28][30]

Future Outlook
- Tworek expresses cautious optimism about the potential for breakthroughs in AI, suggesting that while significant progress has been made, there are still many unexplored avenues that could lead to substantial advancements [38][40]
- He acknowledges the challenges of achieving AGI, emphasizing the importance of integrating continuous learning and multimodal perception into AI systems [39][40]
- The conversation around AI's impact on society is evolving, with a recognition that new technologies will have profound effects on various aspects of life, including interpersonal relationships and economic productivity [42][43]
A New Non-Transformer Breakthrough: A Liquid Neural Network Reasoning Model That Needs Only 900 MB of Memory
机器之心· 2026-01-21 09:35
Core Insights
- The article discusses the dominance of the Transformer architecture in large models and introduces Liquid AI's new model, LFM2.5-1.2B-Thinking, which operates efficiently on edge devices [1][2].

Group 1: Model Overview
- Liquid AI has released LFM2.5-1.2B-Thinking, a reasoning model that can run entirely on edge devices with only 900 MB of memory [2][3].
- The model generates internal reasoning trajectories before arriving at final answers, demonstrating superior performance in tool usage, mathematical reasoning, and instruction following [3][14].

Group 2: Performance Metrics
- Compared to its predecessor LFM2.5-1.2B-Instruct, LFM2.5-1.2B-Thinking shows significant improvements in three key areas: mathematical reasoning (from 63 to 88 on MATH-500), instruction following (from 61 to 69 on Multi-IF), and tool usage (from 49 to 57 on BFCLv3) [7][9].
- Across various reasoning benchmarks, LFM2.5-1.2B-Thinking matches or exceeds Qwen3-1.7B despite having approximately 40% fewer parameters [7][10].

Group 3: Training and Development
- The model's training involved multi-step reasoning to enhance capabilities while keeping answers concise for low-latency deployment [16].
- Liquid AI implemented strategies to reduce "doom looping" in the model's responses, cutting its occurrence from 15.74% to 0.36% in the final training phase [17][18].

Group 4: Ecosystem and Compatibility
- Liquid AI is expanding the ecosystem around the LFM series, ensuring compatibility with popular reasoning frameworks and supporting various hardware accelerations [24].
- The model has been tested across different devices, demonstrating efficient performance in long-context reasoning [26].

Group 5: Future Implications
- LFM2.5-1.2B-Thinking signals a shift away from exclusive reliance on Transformer models, suggesting that small but powerful edge reasoning models may offer superior solutions [27].
- The falling barriers to running inference models on a wide range of devices are seen as a positive development for AI's potential [28].
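The "doom looping" mentioned above is the failure mode where a model's output collapses into the same tokens repeated endlessly. The article does not describe Liquid AI's mitigation in detail, so as a hypothetical illustration of the symptom only, a detector can be as simple as checking whether the tail of a token stream is one n-gram repeated back-to-back:

```python
def is_doom_looping(tokens, max_ngram=8, min_repeats=3):
    """Heuristic: flag a token stream whose tail is a single n-gram
    repeated at least `min_repeats` times consecutively."""
    for n in range(1, max_ngram + 1):
        if len(tokens) < n * min_repeats:
            continue
        tail = tokens[-n * min_repeats:]
        ngram = tail[-n:]
        # every trailing window of size n must equal the final n-gram
        if all(tail[i:i + n] == ngram for i in range(0, n * min_repeats, n)):
            return True
    return False
```

A generation loop could call this after each decoded token and trigger a repetition penalty or early stop; the threshold parameters here are illustrative, not values reported by Liquid AI.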
Google Just Upended Model Memory, and Now NVIDIA Is Overthrowing Attention
36Ke· 2026-01-20 01:12
Core Insights
- Google's Nested Learning has sparked a significant shift in the understanding of model memory, allowing models to change parameters during inference rather than remaining static after training [1][5]
- NVIDIA's research introduces a more radical approach with the paper "End-to-End Test-Time Training for Long Context," suggesting that memory is essentially learning, and that "remembering" equates to "continuing to train" [1][10]

Group 1: Nested Learning and Test-Time Training (TTT)
- Nested Learning allows models to incorporate new information into their internal memory during inference, rather than just storing it temporarily [1][5]
- TTT, which has roots dating back to 2013, enables models to adapt their parameters during inference, enhancing performance based on the current context [5][9]
- TTT-E2E proposes a method that eliminates the need for traditional attention mechanisms, allowing for constant latency regardless of context length [7][9]

Group 2: Memory Redefined
- Memory is redefined as a continuous learning process rather than a static storage structure, emphasizing how past information influences future predictions [10][34]
- The TTT-E2E method aligns the model's learning objective directly with its ultimate goal of next-token prediction, enhancing its ability to learn from context [10][16]

Group 3: Engineering Stability and Efficiency
- The implementation of TTT-E2E incorporates meta-learning to stabilize the model's learning process during inference, addressing catastrophic forgetting and parameter drift [20][22]
- Safety measures, such as mini-batch processing and sliding-window attention, ensure the model retains short-term memory while updating parameters [24][25]

Group 4: Performance Metrics
- TTT-E2E demonstrates superior loss reduction across varying context lengths, maintaining efficiency even as context grows [27][29]
- The model's ability to learn continuously from context without relying on traditional attention mechanisms yields significant improvements in prediction accuracy [31][34]

Group 5: Future Implications
- The advancements in TTT-E2E suggest a shift toward a more sustainable approach to continuous learning, potentially becoming a leading industry solution for long-context scenarios [34][35]
- This approach aligns with the growing demand for models that can learn and adapt without the high computational costs of traditional attention mechanisms [33][34]
NVIDIA DLSS 4.5 Arrives: The Transformer Evolves Again to Eliminate Ghosting, and Multi-Frame Generation Goes Up to 6x with Dynamic Adjustment
量子位· 2026-01-16 07:21
Core Viewpoint
- NVIDIA has introduced DLSS 4.5 at CES 2026, enhancing gaming experiences by addressing key player concerns regarding image quality and frame rates through a "dual-core" strategy [1][3].

Group 1: Image Quality Enhancement
- The first core focuses on image quality, utilizing upgraded super-resolution technology based on a second-generation Transformer model [4][11].
- The new model boasts five times the computational power of the first generation and is trained on a significantly expanded high-fidelity dataset [12].
- The upgraded model processes directly in the game's native linear space, improving clarity and reducing artifacts like ghosting and flickering, especially in high-contrast scenes [17][19].
- Users of all GeForce RTX graphics cards can access the super-resolution feature through an NVIDIA App update, ensuring enhanced stability and clarity [21].

Group 2: Performance Improvement
- The second core is dedicated to performance, designed specifically for the RTX 50 series and featuring dynamic multi-frame generation [6][23].
- DLSS 4.5 introduces a new six-fold multi-frame generation mode, generating up to five additional frames for each traditionally rendered frame and significantly enhancing game smoothness [25].
- For instance, "Black Myth: Wukong" can now run at 240 fps, compared to its previous frame rate of under 190 fps [27].
- Dynamic multi-frame generation adapts to GPU performance and monitor refresh rates, optimizing frame rates while maintaining quality and responsiveness [30][33].

Group 3: Display Technology Advancement
- NVIDIA has also unveiled G-SYNC Pulsar, a significant evolution of G-SYNC technology aimed at reducing motion blur in high-speed visuals [34].
- Demonstrations show that this technology can raise the visual clarity of a 360 Hz monitor to the equivalent of 1000 Hz [35].
- Initial support for G-SYNC Pulsar has been rolled out by manufacturers such as ASUS, AOC, and MSI [36].
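The arithmetic behind the dynamic multi-frame mode can be sketched with a small helper (an illustrative model only; NVIDIA's actual selection heuristics are not public): pick the smallest generation factor, up to the new 6x ceiling, that saturates the monitor, since generating more frames than the display can show adds latency for no visible benefit:

```python
def pick_framegen_factor(render_fps, refresh_hz, max_factor=6):
    """Smallest multi-frame-generation factor that reaches the monitor's
    refresh rate; 6x = five generated frames per rendered frame."""
    for factor in range(1, max_factor + 1):
        if render_fps * factor >= refresh_hz:
            return factor
    return max_factor

def effective_fps(render_fps, refresh_hz, max_factor=6):
    factor = pick_framegen_factor(render_fps, refresh_hz, max_factor)
    # displayed frame rate cannot exceed the panel's refresh rate
    return min(render_fps * factor, refresh_hz)
```

Under this sketch, a 40 fps render on a 240 Hz panel would need the full 6x mode to hit 240 fps, while a 120 fps render would settle at 2x; these example numbers are illustrative, not figures from the article.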
China just 'months' behind U.S. AI models, Google DeepMind CEO says
CNBC· 2026-01-15 23:30
Core Insights
- China's artificial intelligence (AI) models are reportedly only "a matter of months" behind U.S. and Western capabilities, according to Demis Hassabis, CEO of Google DeepMind, challenging previous assumptions of a significant gap [3][4]
- Chinese AI lab DeepSeek has demonstrated strong performance with models built on less advanced chips, indicating that Chinese companies are making notable advancements in AI technology [5]
- Despite this progress, there are concerns about China's ability to innovate beyond existing technologies, with Hassabis emphasizing the difficulty of achieving frontier breakthroughs [6][8]

AI Development in China
- Chinese tech giants like Alibaba and startups such as Moonshot AI and Zhipu have released competitive AI models, contributing to the perception of China's rapid advancement in the field [5]
- Nvidia CEO Jensen Huang acknowledged that while the U.S. leads in chip technology, China is making significant strides in AI models and infrastructure [9]

Challenges Facing Chinese AI Firms
- Access to critical technology, particularly advanced semiconductors from Nvidia, poses a significant challenge for Chinese technology firms and could widen the gap between U.S. and Chinese AI capabilities over time [10][11]
- Analysts predict that the lack of access to cutting-edge Nvidia chips may lead to a divergence in AI model capabilities, with U.S. infrastructure continuing to iterate and improve [12]

Perspectives on Innovation
- Alibaba's Qwen team technical lead, Lin Junyang, expressed skepticism about Chinese firms surpassing U.S. tech giants in AI within the next three to five years, citing a substantial difference in computing infrastructure [15]
- Hassabis attributes the lack of groundbreaking innovations in China to a "mentality" issue rather than solely technological restrictions, comparing the need for exploratory innovation to the historical achievements of Bell Labs [16][17]
Ambarella (NasdaqGS:AMBA) FY Conference Transcript
2026-01-13 21:47
Summary of Ambarella's Conference Call

Company Overview
- **Company**: Ambarella
- **Industry**: Semiconductor, specifically focusing on edge AI applications
- **Core Products**: AI semiconductors used in video security, autonomous driving, telematics, and other robotic applications
- **Revenue Source**: Approximately 80% of revenue comes from edge AI applications [2][5]

Transformation and Product Development
- Ambarella has transformed over the past decade from a video processor company for consumer applications into an AI SoC provider for intelligent edge applications [5][6]
- The company has developed three generations of AI accelerators, with the second generation (CV2 family) representing 80% of total revenue [7][10]
- The third-generation architecture incorporates transformer-based models, which are expected to open larger market opportunities than CNN-based models [8][10]

Market Opportunities and Growth
- The company anticipates significant growth in transformer-based revenue, which is expected to coexist with CNN-based revenue [11][12]
- The average selling price (ASP) for the CV2 family ranges from $15 to $75, while the third generation (CV3, CV7, N1) has an ASP of $20 to $400, indicating potential for significant revenue growth [13][14]
- New applications for transformer technology include autonomous driving and advanced robotics, with examples shown in CES demonstrations [17][19]

Business Segments and Performance
- Ambarella's enterprise security camera market remains strong, while telematics and portable video markets have shown unexpected growth [34][35]
- The company expects continued growth in enterprise security and telematics, with ASP and unit growth driving performance [36]
- The IoT business is diversifying, with security now accounting for less than 50% of IoT revenue, down from previous years [50][52]

Edge Infrastructure and AI Applications
- The N1 AI box is designed to aggregate edge endpoints, enhancing existing security cameras with Gen AI capabilities [55][59]
- The edge infrastructure business is expected to carry higher ASPs but similar gross margins to the overall corporate average of 59%-62% [59][60]

Automotive Market Insights
- The automotive market is currently facing delays in Level 2+ design wins, but Ambarella continues to focus on securing partnerships with OEMs [62][63]
- The company is leveraging its investments in autonomous driving technology for broader robotic applications, including drones [63][64]

Software and Licensing Opportunities
- Ambarella has developed two large models for end-to-end AI applications and is open to licensing these models to OEMs [65][66]
- The company is focusing on securing design wins for both hardware and software revenue, with licensing as an additional revenue stream [66]

Future Outlook
- Ambarella is optimistic about growth potential in both existing and new markets, with plans to provide official guidance for fiscal 2027 in February [36][37]
- The company is exploring custom ASIC projects with large customers, which could enhance revenue and market presence [41][42]

Key Takeaways from CES
- New product announcements, including the CV7 chip, which offers improved AI performance and lower power consumption [37][38]
- Introduction of a new go-to-market strategy to engage partners in addressing segmented markets [38][39]
- Engagement in custom chip design with large customers, focusing on leveraging Ambarella's IP [41][42]

This summary encapsulates the key points discussed during the conference call, highlighting Ambarella's strategic direction, market opportunities, and future growth potential.
Throw Away RoPE and AI Reads Long Context Better: A Transformer Author's Team Open-Sources a New Large-Model Pre-Training Method
36Ke· 2026-01-13 11:01
Core Insights
- A new technique called DroPE has been developed by a research team led by Llion Jones, one of the core authors of the Transformer architecture, to address the challenges of long-text processing in large models [1][14]
- DroPE allows seamless zero-shot context expansion without expensive long-context training, requiring less than 1% of the pre-training budget for model recalibration [1][10]

Technology Overview
- DroPE can be seen as a method that discards positional embeddings to extend context, humorously dubbed "NoRoPE" by netizens [3]
- The technique uses RoPE (Rotary Positional Encoding) as a temporary training tool during the pre-training phase to ensure stability and efficiency, then discards positional embeddings during the inference phase [8][5]

Performance Metrics
- Experiments on various models, including a 5M-parameter model, the SmolLM family (360M/1.7B), and the 7B-parameter Llama2-7B, showed that DroPE improved the base SmolLM's average score on the LongBench benchmark by more than 10 times [10]
- In the NIAH task evaluation, the DroPE model's recall rate reached 74.92%, significantly surpassing traditional RoPE scaling methods [10]

Comparative Analysis
- Performance comparisons across methods indicate that DroPE outperforms other techniques on various tasks, achieving an average score of 30.52 on the LongBench benchmark [11]
- Even with only 0.5% of the pre-training budget used for recalibration, DroPE demonstrated exceptional performance in long-context question answering and summarization tasks [11]

Company Background
- The team behind DroPE is from Sakana AI, co-founded by Llion Jones and former Google senior scientist David Ha, and has gained attention for creating the first AI scientist capable of producing complete academic papers [14][16]
- Sakana AI has also collaborated with MIT researchers on the Digital Red Queen algorithm, showcasing the potential of large language models in adversarial program evolution [18]
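The "RoPE as a temporary training tool" idea can be sketched as single-head attention with a positional-encoding switch (a minimal NumPy illustration of the concept, not Sakana AI's actual implementation): rotary embeddings rotate query/key feature pairs by position-dependent angles during pre-training, and the same attention can then run position-free at inference, which is what permits zero-shot context extension:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x (seq, dim) by position-dependent
    angles: standard Rotary Positional Encoding."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    ang = positions[:, None] * inv_freq[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def attention(q, k, v, use_rope=True):
    """Causal single-head attention: use_rope=True mimics pre-training,
    use_rope=False mimics DroPE-style position-free inference."""
    if use_rope:
        pos = np.arange(q.shape[0], dtype=float)
        q, k = rope(q, pos), rope(k, pos)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -1e9  # causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Because RoPE is a pure rotation, it preserves each vector's norm, so dropping it at inference changes only where attention looks, not the scale of the scores; per the article, a brief recalibration (under 1% of the pre-training budget) is what lets the model operate without the positional signal.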