Large Language Models
TrendForce: Humanoid Robot Chip Market Expected to Exceed US$48 Million by 2028
Zhi Tong Cai Jing· 2025-08-26 07:49
According to Zhitong Finance APP, TrendForce's latest research finds that NVIDIA's (NVDA.US) newly launched Jetson Thor provides robots with stronger compute support and is expected to drive expansion of the chip market. With vendors such as Agility Robotics, Boston Dynamics, and Amazon successively adopting the platform and building out its ecosystem, the humanoid robot chip market is projected to exceed US$48 million by 2028.

As humanoid robot hardware and software continue to evolve, the application scenarios that will determine volume growth remain the industry's focus. According to the latest paper published by the International Federation of Robotics (IFR) in the second half of 2025, humanoid robot development varies across countries by technology and purpose, but in the short term it will mostly consist of pilot deployments filling gaps, in the mid term it will move into scaled manufacturing and services, and in the long term it may spread into everyday household scenarios, a stage at which high-end SoCs become even more critical. TrendForce previously estimated that global humanoid robots will not stably enter households until around 2032, when shipments can truly break through 100,000 units, consistent with the staged development the IFR proposes.

TrendForce notes that although the NVIDIA Jetson Thor series delivers strong performance, its developer kit is priced at US$3,499, a significant increase over the previous-generation Jetson Orin's US$1,499. However, due to ...
A New Path to Stable Reinforcement Learning for Large Language Models: Geometric-Mean Policy Optimization (GMPO)
机器之心· 2025-08-13 00:52
Lead authors: Zhao Yuzhong, PhD student at the University of Chinese Academy of Sciences (UCAS) and intern at Microsoft Research Asia (MSRA), whose research focuses on multimodal learning and language model post-training; Liu Yue, student at the University of Chinese Academy of Sciences. Advisors: Wan Fang, associate professor and doctoral supervisor at the School of Computer Science, UCAS; Ye Qixiang, professor and doctoral supervisor at the School of Electronics, UCAS; Cui Lei, Principal Research Manager of the General AI (GenAI) group at Microsoft Research Asia; Wei Furu, Distinguished Scientist of the General AI (GenAI) group at Microsoft Research Asia.

In recent years, reinforcement learning (RL) has achieved notable success in fine-tuning large language models (LLMs), especially in improving reasoning ability. Traditional RL methods such as Proximal Policy Optimization (PPO) and its variants, including Group Relative Policy Optimization (GRPO), have shown strong potential on complex reasoning tasks. However, despite performing well in many scenarios, they still suffer from instability during training, especially when handling rewards with extreme importance weighting. Geometric-Mean Policy Optimization (GMPO), a stabilized variant of GRPO, addresses this problem. This article takes a deep dive into GM ...
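The stabilizing effect of the geometric mean can be seen in a toy numerical comparison. This is an illustrative sketch only, not the paper's exact surrogate objective: clipping and the full GRPO loss are omitted, and the function names are made up for this example.

```python
import numpy as np

def arithmetic_mean_objective(ratios: np.ndarray, advantage: float) -> float:
    # GRPO-style aggregation: arithmetic mean of token-level importance
    # ratios. A single extreme ratio can dominate the update.
    return advantage * float(np.mean(ratios))

def geometric_mean_objective(ratios: np.ndarray, advantage: float) -> float:
    # GMPO-style aggregation: geometric mean of token-level ratios,
    # computed in log space for numerical stability. Outliers are damped.
    return advantage * float(np.exp(np.mean(np.log(ratios))))

# Token-level importance ratios for one sampled sequence; one extreme outlier.
ratios = np.array([1.0, 1.1, 0.9, 50.0])
print(arithmetic_mean_objective(ratios, 1.0))           # 13.25, dominated by 50.0
print(round(geometric_mean_objective(ratios, 1.0), 2))  # 2.65, outlier damped
```

The same outlier that quadruples the arithmetic-mean objective barely moves the geometric mean, which is the intuition behind GMPO's training stability.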
GPT-5 Is Here, and Microsoft Has Rushed to Integrate It: One-Click Web Page Generation, PhD-Level Intelligence, Free for All Users; Musk Is Not Convinced
Sou Hu Cai Jing· 2025-08-08 04:45
Core Viewpoint
- OpenAI has launched its latest large language model, GPT-5, which is claimed to be the best model in the world, available for free to users after a delay of two and a half years since GPT-4's release [1][3][6].

Group 1: Model Features and Performance
- GPT-5 utilizes an integrated model architecture that automatically selects reasoning depth based on the task, eliminating the need for users to switch modes [3][5].
- The model shows significant improvements in various fields, including coding, mathematics, writing, health, and visual perception [5][10].
- In coding tasks, GPT-5 achieved a first-attempt accuracy of 74.9% on the SWE-bench Verified benchmark, outperforming previous models [10][13].
- GPT-5's error rate for factual inaccuracies is significantly lower than that of its predecessors, at 4.8% compared to 20.6% for GPT-4o [18].

Group 2: User Access and Pricing
- GPT-5 is accessible to all users, including free users with usage limits, while Plus and Pro users have higher usage quotas [5][10].
- API pricing for developers is set at $1.25 per million tokens for input and $10 per million tokens for output, making it cheaper than GPT-4o and other competitors [5][10].

Group 3: Safety and Customization
- OpenAI has introduced a new safety training method called "safe completions," which helps the model provide useful answers while avoiding unnecessary refusals [20][21].
- GPT-5 allows users to choose from four preset personalities for interaction, enhancing the customization of the user experience [21].

Group 4: Market Impact and Investment
- The release of GPT-5 is expected to strengthen OpenAI's leading position in large-model technology, boosting investor confidence and aiding the company's valuation growth [31].
- OpenAI recently secured $8.3 billion in new capital, raising its valuation to $300 billion, which may be linked to the timing of GPT-5's launch [30][31].
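At the quoted API rates, per-request cost arithmetic is straightforward. The following sketch assumes only the two prices cited above; the function name is illustrative, not part of any official SDK.

```python
def estimated_gpt5_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at the quoted rates: $1.25 per million
    input tokens and $10 per million output tokens."""
    return input_tokens * 1.25 / 1_000_000 + output_tokens * 10.00 / 1_000_000

# Example: a request with 8,000 input tokens and 1,000 output tokens.
print(round(estimated_gpt5_cost(8_000, 1_000), 4))  # → 0.02
```

At these rates, input and output contribute one cent each in this example, which illustrates why output tokens dominate cost for long generations.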
SanDisk Teams Up with SK Hynix to Push a New Type of HBM
半导体行业观察· 2025-08-08 01:47
Core Viewpoint
- Sandisk and SK Hynix are collaborating to standardize High Bandwidth Flash (HBF) technology, which aims to enhance GPU access to large NAND capacities, thereby accelerating AI training and inference workloads [1][3][6].

Group 1: Collaboration and Standardization
- The memorandum of understanding (MoU) between Sandisk and SK Hynix focuses on defining technical requirements and creating an HBF technology ecosystem [3][4].
- Sandisk's CTO emphasized that this collaboration addresses the urgent need for scalable memory in the AI industry, aiming to provide innovative solutions to meet exponential data demands [3][4].
- SK Hynix's expertise in HBM technology positions it well to contribute to the development of HBF, which is seen as crucial for unlocking the full potential of AI and next-generation data workloads [3][6].

Group 2: Technical Specifications and Advantages
- HBF aims to provide bandwidth comparable to HBM while offering 8-16 times the capacity at similar cost, potentially reaching up to 768 GB [4][6].
- HBF combines NAND flash with HBM-like bandwidth capabilities, allowing significant capacity increases while sacrificing some latency [6][8].
- Unlike DRAM, NAND flash is non-volatile, enabling lower energy consumption for persistent storage, which is critical as AI inference expands into energy-constrained environments [6][8].

Group 3: Market Implications and Future Developments
- The collaboration signifies the importance of a multi-supplier HBF market, ensuring customers are not reliant on a single vendor and fostering competition to accelerate HBF development [4][10].
- Sandisk's HBF technology received recognition at the FMS 2025 event; the first samples are expected in the second half of 2026, with AI inference devices anticipated in early 2027 [5][9].
- The integration of HBF could pave the way for heterogeneous memory stacks, allowing DRAM, flash, and new persistent memory types to coexist in AI accelerators, addressing rising HBM costs [10].
GPT-5 Is Here, Free and Open to All Users
第一财经· 2025-08-08 00:19
2025.08.08 | This article is 1,335 characters, about a 2-minute read. Author: Hu Yijie, Yicai (第一财经).

The biggest commercial signal of this release is that OpenAI is making GPT-5 free for most users, across the Free, Plus, Pro, and Team tiers, with enterprise and education users gaining access next week. This strategy is seen as aiming to rapidly expand usage and spur secondary innovation in AI applications.

Altman revealed that GPT-5 is comprehensively improved in speed, intuition, and reasoning, and can use "vibe coding" to let users generate runnable software applications in natural language. "One of the coolest things it can do is write high-quality software for you on demand; this instant-development capability may become the defining feature of the GPT-5 era," he said.

"Vibe coding"

Compared with the previous generation, GPT-5's biggest structural change is its unified model architecture: the system judges a question's complexity on its own and, when necessary, allocates more compute for "deep thinking." Altman called this the first time ordinary users encounter OpenAI's "test-time compute" technique, in which the model deliberately extends computation time to improve accuracy on mathematical derivations or complex reasoning.

In a live demo, GPT-5 generated complete runnable software from a simple text prompt, with the AI independently handling everything from interface design to logic. Altman also announced that the next phase will significantly improve voice mode's naturalness and intel ...
ACL Announces Its Inaugural Doctoral Dissertation Award; Chinese Scholar Manling Li Receives an Honorable Mention
机器之心· 2025-07-29 09:58
Core Insights
- The article discusses the announcement of the ACL's new award for outstanding doctoral dissertations in computational linguistics, highlighting the significance of the award and its impact on the field of natural language processing [1][2][4].

Group 1: Award Details
- The inaugural recipient of the ACL Doctoral Dissertation Award is Sewon Min from the University of Washington, recognized for her thesis "Rethinking Data Use in Large Language Models" [2][4].
- The award committee emphasized that Min's research provides critical insights into the behavior and capabilities of large language models, particularly in the area of in-context learning [4][14].

Group 2: Research Contributions
- Min's dissertation focuses on understanding and advancing large language models through their use of extensive training datasets [14].
- She demonstrates that the in-context learning ability of these models is largely determined by the content learned from training data [15].
- Min introduces a new class of language models, nonparametric language models, which use training data as a storage mechanism from which to retrieve information, enhancing accuracy and updatability [16][18].

Group 3: Other Nominated Works
- Three additional nominees received recognition: Manling Li from the University of Illinois Urbana-Champaign, Ashish Sharma from the University of Washington, and Thomas Rishi Sherborne from the University of Edinburgh [8][20].
- Manling Li's work focuses on event-centric multimodal knowledge acquisition, proposing methods to transition from entity-centric to event-centric knowledge extraction [26][30].
- Ashish Sharma explores human-AI collaboration to improve mental health support, demonstrating how AI can enhance empathy in conversations and assist users in self-help interventions [45][51].
- Thomas Rishi Sherborne's research addresses cross-lingual transfer for semantic parsing, proposing strategies for effective adaptation of semantic parsers to new languages [62][64].
BOC Morning Meeting Focus - 20250724
Key Insights
- The report highlights the humanoid robot industry, which has seen a significant increase in market attention; the National Securities Robot Industry Index rose 7.6% from July 7 to July 18, 2025 [6][8].
- Major factors driving this resurgence include substantial orders from leading companies, capital acquisitions, influential statements from industry leaders, and supportive government policies aimed at fostering innovation in humanoid robotics [7][8].
- The median position of active equity funds reached 90.63% in Q2 2025, a historical high, with a shift toward increased allocations in TMT, Hong Kong stocks, and machinery sectors [9][10].

Humanoid Robot Industry
- The humanoid robot market is experiencing a revival, with key players like China Mobile placing significant orders, which serve as a validation of product functionality and market readiness [6][7].
- There is a trend of increased capital activity, with companies pursuing mergers and acquisitions to enhance their market positions [7].
- Government initiatives are also playing a crucial role, with policies aimed at promoting the development of humanoid robots and related technologies [8].

Active Equity Fund Analysis
- The highest allocation sectors for active equity funds in Q2 2025 were TMT (23.37%), Hong Kong stocks (20.41%), and machinery (19.68%), reflecting a strategic shift in investment focus [9][10].
- Current allocation levels are above historical averages for several sectors, indicating bullish sentiment among fund managers [9][10].

AI Computing Industry
- The AI computing supply chain is entering a phase of maturity, driven by advancements in generative AI and large language models, leading to a closure of the demand-supply loop [11][12].
- The infrastructure for AI computing is expected to see continued investment, with significant growth in demand for high-end AI servers [12][13].
- Competition in the PCB industry is intensifying due to rising demand for AI servers, with a projected 150% increase in demand for high-density interconnect (HDI) boards [13].
Reshaping the Attention Mechanism: GTA Arrives with a 70% Smaller KV Cache and 62.5% Less Compute
机器之心· 2025-07-22 08:59
Core Viewpoint
- The article introduces Grouped-head latent Attention (GTA), a new framework developed in collaboration between the Chinese Academy of Sciences, University College London, and the Hong Kong University of Science and Technology (Guangzhou), which significantly enhances model performance and computational efficiency in large language models [1][3].

Efficiency Challenges in Large Language Models
- GTA is designed to address the efficiency challenges of the traditional Multi-Head Attention (MHA) mechanism, which suffers from computational redundancy, memory bottlenecks, and inference latency [2][4][6].
- MHA computes each attention head independently, leading to a quadratic increase in floating-point operations (FLOPs) when processing long sequences [3][4].
- Memory requirements for storing key-value (KV) pairs grow rapidly with sequence length and the number of attention heads, making deployment on edge devices challenging [3][12].
- High computational and memory demands contribute to significant inference delays, hindering real-time applications [4][6].

Core Innovations of GTA
- GTA introduces a grouped sharing mechanism for attention matrices: multiple attention heads share a single attention matrix, cutting FLOPs significantly [8][10].
- The framework employs a "compression + decoding" strategy to minimize memory usage, compressing all attention heads' value vectors into a low-dimensional latent representation that is dynamically decoded as needed [12][14].

Experimental Validation
- Comprehensive experiments demonstrate that GTA improves computational efficiency and memory utilization while matching or surpassing the performance of existing mainstream attention mechanisms [16][19].
- In tests with a 160-million-parameter model, GTA achieved lower evaluation loss and better downstream performance than traditional MHA and other models, with its KV cache reduced to 12.5% of MHA's [18][19].

Scalability and Performance
- When scaling to 500 million parameters, GTA continued to outperform other models in evaluation loss and accuracy while keeping the KV cache at only 12.5% of MHA's [19].
- In a 1-billion-parameter model, GTA demonstrated performance comparable to GQA-1B while using significantly less memory [20][22].

Theoretical Efficiency Analysis
- Theoretical analysis indicates that GTA achieves substantial reductions in computational complexity and memory usage, translating to faster inference speeds [24].
- Empirical benchmarks confirm GTA's superior prefill and decode times across various hardware platforms, showcasing its robustness and efficiency [25][29].

Future Directions
- GTA still faces challenges such as potential approximation errors from the nonlinear decoder and the need for broader validation across tasks beyond natural language processing [33].
- Future research aims to refine the decoder architecture and explore GTA's applicability in larger models and diverse application domains [33].
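The two ideas described above, one attention matrix shared by every head in a group, and value vectors kept in a low-dimensional latent that is decoded per head on demand, can be sketched as follows. This is a toy illustration under assumed shapes, not the authors' implementation: the paper's nonlinear decoder, causal masking, and trained weights are all replaced with plain matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_shared_attention(q, k, v_latent, decode_w, heads_per_group):
    """Toy GTA-style attention.
    q, k:       (groups, seq, d)  -- one query/key projection per group
    v_latent:   (seq, d_latent)   -- shared low-dim latent value vectors
    decode_w:   (groups, heads_per_group, d_latent, d) -- per-head decoders
    Returns stacked per-head outputs of shape (groups*heads_per_group, seq, d)."""
    groups, seq, d = q.shape
    out = []
    for g in range(groups):
        # One attention matrix per group, reused by every head in the group.
        attn = softmax(q[g] @ k[g].T / np.sqrt(d))
        mixed_latent = attn @ v_latent          # (seq, d_latent)
        for h in range(heads_per_group):
            # Decode the shared latent into each head's value space on the fly.
            out.append(mixed_latent @ decode_w[g, h])
    return np.stack(out)

rng = np.random.default_rng(0)
groups, hpg, seq, d, d_lat = 2, 4, 6, 8, 2
out = grouped_shared_attention(
    rng.normal(size=(groups, seq, d)),
    rng.normal(size=(groups, seq, d)),
    rng.normal(size=(seq, d_lat)),
    rng.normal(size=(groups, hpg, d_lat, d)),
    hpg,
)
print(out.shape)  # (8, 6, 8): 8 heads' outputs from 2 shared attention matrices
```

In this toy configuration the per-token cache holds only the grouped keys plus the tiny latent values (2x8 + 2 = 18 floats) instead of full per-head keys and values (8x2x8 = 128 floats), which is the mechanism behind the KV-cache reductions reported above.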
JPMorgan (JPM.N) CEO Dimon: We have no reason to own large language models.
news flash· 2025-07-15 12:54
Group 1
- The core viewpoint expressed by JPMorgan CEO Jamie Dimon is that there is no compelling reason for the company to possess large language models [1].

Group 2
- The statement reflects a broader skepticism within the financial industry regarding the necessity and utility of large language models in banking operations [1].
- This perspective may influence how financial institutions approach investments in AI technologies moving forward [1].
- The comments could signal a cautious stance toward the integration of advanced AI tools into traditional banking practices [1].
Jensen Huang Keeps Selling! His Net Worth Now Tops Buffett's
Sou Hu Cai Jing· 2025-07-12 04:13
Core Viewpoint
- Nvidia's market capitalization has reached a historic high of $4.02 trillion, making it the first company to pass the $4 trillion milestone, ahead of Microsoft and Apple [2][3].

Company Summary
- Nvidia CEO Jensen Huang has a net worth of $144 billion, ranking ninth globally and surpassing Warren Buffett [1][2].
- Huang has been systematically selling Nvidia shares, offloading approximately 600,000 shares worth about $96 million in July alone [2][3].
- Despite the sell-off, Huang still holds over 858 million Nvidia shares through various partnerships and trusts [3].
- The share sales are part of a pre-established trading plan under SEC Rule 10b5-1, which allows executives to sell shares under predetermined conditions [3].

Industry Summary
- Nvidia is a leading manufacturer of GPUs, widely used in AI training, inference, and the deployment of large language models, making it a preferred infrastructure provider for major tech companies such as OpenAI, Google, and Meta [3].
- The company's strong stock performance has driven its record market valuation, reflecting growing demand for AI-related technologies [3].