TurboS
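"Tokens Are Bullshit": Mamba Author Puts Forward a Disruptive View, Exposing Deep Flaws in the Transformer
机器之心 (Jiqizhixin) · 2025-07-09 09:52
Compiled by 机器之心. Original author: Albert Gu. Editors: Chen Chen, Du Wei. "Tokenization is a shackle that Transformer models are forced to wear to make up for their own deficiencies." Albert Gu, author of Mamba, assistant professor at CMU, and chief scientist at Cartesia AI, recently wrote a new blog post that explores the tradeoffs between state space models (SSMs) and Transformers and puts forward this view. The post is adapted from a talk Albert Gu has given several times over the past year. Although the talk is accessible and aimed at a fairly broad audience, some of its insights, viewpoints, and explanations of underlying principles should also be useful to specialist researchers. On the social media platform X, Albert Gu declared that "tokens are bullshit" and previewed a major architectural advance to be released next. Image source: https://x.com/_albertgu/status/1942615020111876248 Many commenters agreed with Albert Gu, arguing that removing tokenization would benefit computational efficiency. State space models: the post first defines what a state space model (SSM) is. 1. ...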
Tencent Makes a Major Open-Source Release
Zheng Quan Shi Bao (Securities Times) · 2025-06-27 15:32
Core Insights
- Tencent has launched Hunyuan-A13B, described as the industry's first 13B-level open-source MoE (Mixture of Experts) hybrid reasoning model. It has 80 billion parameters in total but activates only 13 billion, delivering high performance with lower resource requirements [1][2]

Model Performance
- Hunyuan-A13B is one of Tencent's most heavily used large language models, with over 400 business applications and an average daily request volume exceeding 130 million [2]
- On several authoritative industry benchmarks, Hunyuan-A13B performed competitively against models such as OpenAI's o1-1217, DeepSeek's R1-0120, and Qwen3-A22B [2][3]

Benchmark Results
- In the Mathematics category, Hunyuan-A13B scored 87.3 on AIME2024, outperforming OpenAI's o1-1217 and DeepSeek's R1-0120 [3]
- On reasoning tasks, Hunyuan-A13B scored 89.1 on BBH, indicating strong reasoning capabilities [3]
- The model also performed well in agent tool invocation and long-text tasks, drawing on a multi-agent data synthesis framework [3]

Model Features
- Hunyuan-A13B lets users choose between fast and slow reasoning modes, trading off resource use against task accuracy [4]
- The model is part of Tencent's ongoing effort to strengthen its AI capabilities and follows the release of the TurboS model, which focuses on rapid reasoning [4]

Strategic Developments
- Tencent is restructuring its large-model R&D system around three core areas: computing power, algorithms, and data management [5]
- The company has established new departments dedicated to large language models and multimodal models, aiming to explore cutting-edge technologies and improve model capabilities [5]

Financial Investments
- Tencent's R&D expenditure reached 70.69 billion yuan in 2024, and capital expenditure rose 221% year on year, reflecting the company's commitment to AI investment [6]
- The increase in capital spending is attributed to the acquisition of more GPUs to meet growing inference demand, with plans for further investment in 2025 [6]
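To make the "80 billion total, 13 billion active" figure concrete, here is a minimal Python sketch of top-k expert routing, the general mechanism by which an MoE layer runs only a few experts per token. It is an illustration under stated assumptions, not Tencent's implementation: the number of experts, the value of `k`, the softmax gating, and the toy scalar "experts" are all hypothetical, and the summary above does not describe Hunyuan-A13B's actual routing configuration.

```python
import math


def active_fraction(total_params_b, active_params_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1.

    Hypothetical gating scheme, chosen only for illustration.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Figures from the summary above: 13B active out of 80B total.
    print(f"{active_fraction(80, 13):.2%}")  # 16.25%
```

With four toy experts and `k=2`, only two expert functions are evaluated for a given input, which is the sense in which an MoE model "activates" a fraction of its parameters.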
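Tencent Makes a Major Open-Source Release!
证券时报 (Securities Times) · 2025-06-27 15:09
The industry's first 13B-class open-source MoE (Mixture of Experts) hybrid reasoning model delivers big intelligence from a small parameter count. On June 27, Tencent Hunyuan announced that it was open-sourcing its first hybrid reasoning MoE model, Hunyuan-A13B. The model has 80B total parameters but only 13B activated parameters. With this small active footprint it matches leading open-source models of the same architecture, while offering faster inference and better cost-effectiveness. The model is now available on open-source communities such as GitHub and Hugging Face, and the model API has officially launched on the Tencent Cloud website for quick integration and deployment. Open-sourcing the industry's first 13B-class MoE hybrid reasoning model: according to Tencent, Hunyuan-A13B is one of the most widely used and most frequently called large language models inside the company, with more than 400 business services fine-tuning it or calling it directly and more than 130 million requests per day on average. It is also the industry's first 13B-class open-source MoE hybrid reasoning model, giving developers access to stronger model capabilities at a lower barrier to entry. On multiple authoritative industry benchmark datasets, Hunyuan-A13B performed on par with models such as OpenAI's o1-1217, DeepSeek's R1-0120, and Qwen3-A22B.
| | | OpenAI-o1-1217 | Deepseek-R1-0120 | Qwen3-A22B | Hunyuan-A13B ...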
Garrett Inaugurates Wuhan Innovation Center to Advance Zero-Emission Mobility
GlobeNewswire · 2025-06-26 06:00
Core Insights
- Garrett Motion Inc. inaugurated its new Wuhan Innovation Center, enhancing its zero-emission R&D capabilities in China and globally [2][3]
- The center is part of Garrett's strategy to strengthen its position as a leader in differentiated automotive technologies, particularly in electrification and decarbonization [3][7]

Company Developments
- The Wuhan Innovation Center is Garrett's second innovation hub in China, complementing the Shanghai R&D Center to create a "dual innovation engine" [2][5]
- The center focuses on high-speed E-Powertrain systems for zero-emission applications, supporting both automotive and industrial decarbonization efforts [5][9]

Technological Advancements
- Garrett has a history of innovation in mobility technologies, including variable geometry turbines and hydrogen fuel cell compressors, with a strong emphasis on zero-emission solutions [4][6]
- The E-Powertrain system integrates an electric motor, inverter, and gearbox, reducing system size and weight by up to 40% and cutting the use of critical materials by approximately 30% [8]

Strategic Importance
- The Wuhan Innovation Center embodies Garrett's "East for East" strategy, integrating R&D and manufacturing capabilities to accelerate the commercialization of zero-emission technologies [10]
- The center aims to attract cross-disciplinary talent and foster partnerships with academic institutions and industry players to drive innovation [10]
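CITIC Securities: System-Level Compute Expected to Be the Next Stop for AI Development; Recommends Watching Related Companies in the Domestic Supply Chain
智通财经网 (Zhitong Finance) · 2025-06-26 00:29
The generality of the underlying infrastructure exists precisely to anticipate future model development. The AI industry is developing rapidly, and scaling laws are advancing quickly in stages such as post-training and online inference. On the training side, model architectures continue to innovate and iterate, which is expected to further extend the training-side scaling law. For example, Parallel Scaling, proposed by Alibaba's Qwen team together with a Zhejiang University team, and TurboS, trained by the Tencent Hunyuan team on a hybrid Transformer and Mamba architecture, have both achieved excellent performance. On the inference side, now that the MoE expert-network architecture has become mainstream, the focus has shifted to how hardware deployment can deliver higher throughput and lower latency. Deployments resembling inference clusters are expected to become mainstream, with compute nodes meeting inference demand by raising compute density. System-level compute is expected to become the next generation of AI compute infrastructure. System-level compute requires system-level capabilities. At the chip level, a compute cluster involves AI accelerator chips, CPUs, switch interconnect chips, DPU data-processing chips, and more. Constrained by process technology, domestic AI accelerator chips still trail overseas flagship products in peak compute, and their software ecosystem also lags because the industry has had less time to develop, so competing on single-chip capability offers no direct advantage. At the interconnect level, traditional PCIe lags far behind solutions such as NVIDIA's NVLink: NVLink 5.0 provides 1.8 TB/s of bidirectional bandwidth, more than ten times that of traditional PCIe solutions. Domestic ...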
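Internet Cloud Providers Push into AI Agents Together; Volcano Engine Sets Off Another "Price Revolution"
By China Business Journal reporter Li Jing, reporting from Beijing. AI agents are undoubtedly one of the hottest areas of AI in 2025. International giants such as Microsoft, Google, and OpenAI, as well as domestic companies such as Baidu, Alibaba, and Tencent, have all launched major AI agent products in 2025. In addition, Manus, the first general-purpose AI agent product from the startup Monica, drew worldwide attention while still in closed beta. Li Mingshun, founder of 顺福资本管理 (Shunfu Capital Management) and chairman of 行行AI, told a China Business Journal (《中国经营报》) reporter that in the previous wave of AI startups, figures such as Wang Xiaochuan and Kai-Fu Lee focused on the foundational large-model track, where funding rounds often ran to hundreds of millions of yuan. Now the AI application layer is seeing a new startup boom, and AI agents are the hottest direction within it. "The defining feature of this startup wave is that it is lightweight: a small team of a few to a dozen or so people can launch an AI application project by relying on large-model capabilities," Li said. Amid this wave, cloud providers such as Baidu AI Cloud, Tencent Cloud, and Volcano Engine have also spotted the opportunity and have each moved into AI agents. On June 11, Volcano Engine released its latest Doubao large model 1.6 along with full-stack AI agent development tools, and cut model prices further, to one third of DeepSeek's. It evidently hopes to win more market share with low prices, which may also intensify competition across the industry. Volcano Engine sets off another ...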
BorgWarner (BWA) 2025 Conference Transcript
2025-06-11 17:40
Summary of BorgWarner (BWA) 2025 Conference Call

Company Overview
- **Company**: BorgWarner (BWA)
- **Date**: June 11, 2025
- **Key Focus**: Strong performance in Q1, managing tariff impacts, and growth in electric and foundational product portfolios

Key Points

Financial Performance
- **Q1 Results**: Strong outgrowth of nearly 4% above industry production, operating margin at approximately 10%, and free cash flow of about $270 million above the prior year [2][3][4]
- **Tariff Impact**: Initially projected a 1.6% impact on sales due to tariffs, but this has decreased due to effective mitigation strategies and changes in executive orders [4][5]
- **Concerns**: Industry production in the U.S. remains a concern, particularly for Q3 and Q4, but current visibility shows strong performance in Q2 [6][9]

Market Dynamics
- **Regional Insights**:
  - **North America**: Initially expected a decline of 1% to 3% in market production, revised to 2% to 4% down, but Q2 remains strong [8][9]
  - **Europe**: Strong demand for e-products, with clarity on emission regulations potentially increasing demand for foundational products [15][16]
  - **China**: Significant exposure with 20% of overall revenue, strong performance in both e-products and foundational products [17][18]

Product Strategy
- **Electrification Trends**: Adoption rates for electrified vehicles vary by region, with China leading in BEV adoption. North America is expected to grow slower in electrification [20][22]
- **Hybrid Vehicles**: Increasing opportunities for hybrids in North America, with a focus on both foundational and e-products [25][26]
- **Turbos and Efficient Engines**: Continued demand for efficient propulsion systems, with a focus on turbos and all-wheel drive products [33][34]

Operational Efficiency
- **Cost Management**: Sustained performance with a focus on competitive cost structures, supply chain savings, and operational excellence [42][44]
- **Capital Allocation**: A balanced approach to capital allocation, focusing on organic and inorganic investments, stock repurchases, and dividends [50][54]

Future Outlook
- **M&A Strategy**: Disciplined approach to M&A, focusing on industrial logic and accretive assets, with ongoing evaluation of potential targets [50][53]
- **Free Cash Flow Generation**: Expected midpoint of $700 million in free cash flow for the year, with plans to utilize this for shareholder value creation [78][79]

Additional Insights
- **Operational Model**: The company's unique operating model promotes accountability and resilience, contributing to strong free cash flow and margin performance [75][76]
- **Market Share Dynamics**: Anticipation of market consolidation as smaller players may struggle, providing opportunities for BorgWarner to increase market share [36][38]

Conclusion
BorgWarner is positioned well with strong financial performance, effective management of tariff impacts, and a strategic focus on electrification and hybrid vehicles. The company aims to leverage its operational strengths and free cash flow generation to drive shareholder value while navigating the evolving automotive landscape.
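Huawei: A Major Breakthrough!
证券时报 (Securities Times) · 2025-05-30 13:21
A shot in the arm for China's AI industry. On May 30, a reporter from 证券时报·券商中国 learned from Huawei that the company has taken another step forward in MoE model training, launching a new model with 718 billion parameters, Pangu Ultra MoE. It is a near-trillion-parameter MoE model trained end to end on the Ascend AI computing platform. Huawei also released a technical report on the Pangu Ultra MoE architecture and training methods, disclosing many technical details and demonstrating Ascend's leap in ultra-large-scale MoE training performance. Industry insiders say the release of the Pangu Ultra MoE and Pangu Pro MoE model series shows that Huawei has not only completed a fully self-controlled, end-to-end training practice combining domestic compute and domestic models, but has also achieved industry-leading performance in its cluster training system. This further validates the independent innovation capability of domestic AI infrastructure and offers "reassurance" for the development of China's AI industry. A steady stream of news on domestic large models: on May 28, DeepSeek (深度求索) announced that the DeepSeek-R1 model had completed a minor trial version upgrade, which can be tested on the official website, app, and mini program (with Deep Thinking turned on); the API and usage remain unchanged. The Hangzhou-based startup released its DeepSeek-R1 AI model in January this year, shocking the global tech community. R1 outperformed Western competitors on several standardized metrics, while its cost was reportedly only a few million US dollars. ...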
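DeepSeek R1 Hallucination Rate Drops; Users Call Out: We Want R2
第一财经 (Yicai) · 2025-05-29 15:13
2025.05.29. Word count: 1,440; reading time: about 2 minutes. Summary: a report shows that the R1 model's hallucination rate was previously around 21%. By 第一财经 reporter Liu Xiaojie. After publishing the R1 model update on the open-source platform HuggingFace, DeepSeek finally issued an official announcement on the evening of May 29 detailing the capability improvements in this version, including stronger deep-thinking ability, reduced hallucination, and better creative writing.
Post from DeepSeek: DeepSeek-R1-0528 is here! Try it now: chat.deepseek.com No change to API usage - docs here: api-docs.deepseek.com/guides/reasoni ... Open-source weights: huggingface.co/deepseek-ai/De ...
| Rank | Model | Organization | Accuracy (%) | Hallucination rate (%) |
| --- | --- | --- | --- | --- |
| | doubao-1.5-pro-32k | ByteDance | 95. ...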
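Tencent Appears at the First International Conference on Artificial General Intelligence
Huan Qiu Wang Zi Xun (环球网资讯) · 2025-05-26 12:08
Source: 光明网 (Guangming Online). On May 24-25, the first International Conference on Artificial General Intelligence (TongAI), hosted by 北京通用人工智能学会 (the Beijing society for artificial general intelligence), was held in Beijing. As China's first international academic conference focused on artificial general intelligence (AGI), it brought together experts and scholars from leading universities at home and abroad, including Harvard University, Singapore Management University, Peking University, and Tsinghua University, along with technology leaders from leading companies such as Tencent. The conference aims to build original technical roadmaps through in-depth exchange of ideas and to encourage the international academic community to push past cognitive boundaries and shape technical paradigms together. Zhang Zhengyou, Tencent's chief scientist and director of Robotics X Lab, delivered a keynote at the main forum. Hu Han, a Tencent distinguished scientist, gave a detailed introduction to Tencent's Hunyuan multimodal large models in the "Multimodal Interactive Learning" session. Hu Han noted that Tencent's technical iteration on large models keeps accelerating. The Tencent Hunyuan model lineup has been fully upgraded, with new iterations of both the flagship fast-thinking model Hunyuan TurboS and the deep-thinking model Hunyuan T1. Built on the TurboS base, Tencent has newly launched the visual deep-reasoning model T1-Vision and the end-to-end voice call model Hunyuan Voice. A series of multimodal models, including Hunyuan Image 2.0, Hunyuan 3D v2.5, and Hunyuan game visual generation, were released at the same time. Language model ranks in the global top eight, with technical capability continuing to improve: in the fiercely competitive global large-model race, Tencent Hunyuan is moving in small, quick steps and iterating rapidly, and its technical capability keeps improving. On the globally recognized authoritative large language model ...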