Model Training

Zhou Hongyi: 360's recent purchases have all been Huawei chips, and domestic chips offer high cost-performance
Nan Fang Du Shi Bao· 2025-07-23 14:03
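Improvements in open-source model performance laid the groundwork for this year's boom in AI agents. Put simply, AI chatbots can only hold a conversation, whereas agents can reason about, plan, and execute tasks, and the industry sees them as one of the key paths for putting AI into real use. "The human role will become that of a commander of agents, managing many agents," Zhou said. "Domestic chips certainly lag behind Nvidia, but we have to use them. If nobody uses them, the gap will always be there. Only by gritting our teeth and sticking with them can the products improve." On July 23, Zhou Hongyi, founder of 360 Group, told reporters during the 2025 China Internet Conference that 360's recent chip purchases have all been Huawei products. According to a recent CCTV News report, Nvidia founder Jensen Huang said the U.S. has approved sales of the H20 chip to China. Zhou said the H20 is better suited to model inference. Inference places lower technical demands on chips than model training, which actually gives domestic AI chips more market opportunities. For example, Huawei's products have not caught up with Nvidia's latest GB200 chip, but in inference scenarios they offer better cost-performance than the H20. DeepSeek played a major role in popularizing reasoning models. However, data from the "AI Product Rankings" show that DeepSeek's monthly active users declined for the first time in June. Zhou speculated that DeepSeek founder Liang Wenfeng has not put much effort into building a consumer (To C) app and does not focus on the app's daily active users or on monetization. "When traffic was growing fastest, ...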
1.4 billion yuan cashed out in a string of sales: Is Jensen Huang in a hurry to "get off the train"?
36Kr· 2025-07-23 12:01
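From "AI godfather" to "cash-out king": in the end, Jensen Huang is just a businessman. "I'm already rich enough. That's enough, it's enough," Huang humblebragged in a media interview during his China trip on July 16. Yet his words and his actions diverge sharply. Two days later, on July 18, Huang again sold 75,000 Nvidia shares as soon as he had left, cashing out $12.94 million (about 92.67 million yuan). That day Nvidia's stock set an all-time intraday high of $174.25 per share and closed at $172.41. About 20 sales in two months, cashing out 1.435 billion yuan. As early as March this year, Huang disclosed a plan under Rule 10b5-1 to sell 6 million Nvidia shares. Rule 10b5-1 allows insiders of listed companies to pre-arrange the sale of a set number of shares at specified times, which avoids suspicion of insider trading and keeps the trades fair and transparent. Huang's sales are compliant transactions under this rule. In early U.S. trading on July 9, Nvidia's stock rose as much as nearly 2.8% to $164.42, briefly lifting its market value above $4 trillion (about 28.7 trillion yuan) and making it the first company in the world to reach that milestone. While Nvidia's stock kept climbing, Huang did the opposite, selling into the rally and retreating as it advanced. The steady selling has left many investors uneasy. Some investors said they once "believed in" Huang as the AI godfather, but now people call ...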
7B model's "EQ" rivals GPT-4o as Tencent cracks open-domain RL, with scores rising fivefold
Liang Zi Wei· 2025-07-18 06:16
Core Insights
- The article discusses the challenges of using Reinforcement Learning (RL) to optimize large models for emotional intelligence in multi-turn dialogues, and proposed solutions to them [2][4][5]
- The proposed RLVER framework integrates a user simulator that acts as both the interaction environment and the reward source, addressing the three main challenges of RL in this setting [2][5][11]
Group 1: Challenges in RL for Emotional Intelligence
- The three main challenges identified are:
  1. Environment challenge: creating a realistic and diverse interaction environment for the model [2][4]
  2. Reward challenge: converting subjective user satisfaction into stable, long-term rewards [2][11]
  3. Training challenge: achieving stable and efficient multi-turn online RL training on large language models (LLMs) [2][4]
Group 2: RLVER Framework
- The RLVER framework uses a user simulator that embodies diverse user profiles and interaction scenarios, giving the model a rich and dynamic learning environment [7][8]
- The simulator updates its emotional state based on the model's responses and provides personalized feedback that improves the model's learning [9][10]
Group 3: Performance Outcomes
- The Qwen2.5-7B model trained with RLVER scored 79.2 on the Sentient-Benchmark, up from 13.3, placing it alongside top commercial models such as GPT-4o and Gemini 2.5 Pro [16][17]
- The model retained its general capabilities in areas such as mathematics and coding, avoiding "catastrophic forgetting" [17]
Group 4: Insights from Training
- Introducing explicit "think-then-say" prompts improved the model's ability to understand users and respond empathetically, and revealed two distinct paths toward empathy: "thinking models" and "reactive models" [20][21]
- Comparing optimization algorithms (PPO vs. GRPO) showed that focusing on specific dimensions of emotional intelligence can yield better overall performance [23][27]
Group 5: User Simulator Insights
- The RLVER team built two types of user simulator. The results indicate that a more forgiving environment (the Vanilla simulator) benefits early-stage model growth more than a more challenging one [29][30]
- Models with explicit thinking structures were more robust in challenging environments, suggesting that reasoning ability can mitigate training instability [33]
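The summary describes a user simulator whose emotional state serves as the reward, and it compares PPO with GRPO. The sketch below shows how those two pieces could connect, under stated assumptions. The simulator (`simulate_user`), its 0 to 100 emotion scale, the per-turn deltas, and the final-emotion reward (`emotion_reward`) are all hypothetical stand-ins and are not RLVER's actual design. Only `group_relative_advantages` follows the standard GRPO idea: each sampled rollout is scored against the mean and standard deviation of its own group, so no learned value critic is needed, unlike PPO.

```python
import statistics


def simulate_user(deltas, start=50.0):
    """Hypothetical toy user simulator: an emotion score in [0, 100] that each
    model response nudges up or down by a per-turn delta."""
    emotion = start
    trace = [emotion]
    for delta in deltas:
        emotion = min(100.0, max(0.0, emotion + delta))  # keep score in range
        trace.append(emotion)
    return trace


def emotion_reward(trace):
    """Hypothetical reward: the simulated user's final emotion, scaled to [0, 1]."""
    return trace[-1] / 100.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: score each rollout against the mean and standard
    deviation of its own sampled group, with no learned value critic."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts sampled for the same dialogue scenario; the per-turn deltas are
# invented purely for illustration.
rollouts = [[10, 5, 10], [-5, 0, 5], [20, 10, 10], [0, -10, -10]]
rewards = [emotion_reward(simulate_user(d)) for d in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Rollouts that leave the simulated user happier than the group average receive positive advantages and are reinforced. The ones that worsen the user's mood are pushed down. A harsher simulator would shift these deltas downward, and that is one way to picture the "forgiving vs. challenging environment" comparison in the summary.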
Big Data ETF (159739) rises over 1% as H20 chip sales to China resume, a boost for large-model training
Xin Lang Cai Jing· 2025-07-16 02:31
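The Big Data ETF closely tracks the CSI Cloud Computing and Big Data Theme Index. The index selects the securities of 50 listed companies whose businesses involve cloud computing services, big data services, or related hardware as its constituents, reflecting the overall performance of listed companies in the cloud computing and big data theme. Data show that as of June 30, 2025, the top ten weighted stocks in the CSI Cloud Computing and Big Data Theme Index (930851) were iFlytek (002230), Zhongji Innolight (300308), Eoptolink (300502), Sugon (603019), Kingsoft Office (688111), Inspur Electronic Information (000977), Hundsun Technologies (600570), Unisplendour (000938), HopeRun Software (300339), and Runze Technology (300442), which together accounted for 51.84% of the index. Big Data ETF (159739); off-exchange feeder funds: Class A 021090, Class C 021091, Class I 022882. As of 10:08 on July 16, 2025, the CSI Cloud Computing and Big Data Theme Index (930851) rose 1.68%. Among its constituents, Eoptolink (300502) gained 12.90%, Intellifusion (688343) 5.35%, and Servyou (603171) 4.34%, while China Greatwall (000066), Zhongji Innolight (300308), and others also rose. Big Data ET ...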
More effective than Adam: POET builds on the principle of spectral invariance to make LLM training both stable and fast
Ji Qi Zhi Xin· 2025-07-15 00:59
Core Viewpoint
- The article discusses POET (Reparameterized Training via Orthogonal Equivalence Transformation), a new training paradigm for large language models (LLMs) that starts from first principles to improve training efficiency and stability [2][3].
Group 1: POET Methodology
- POET structurally reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix, which preserves the singular value distribution of the weights during training [3][11].
- The method combines singular value invariance with minimum hyperspherical energy. The resulting paradigm gives large-model training both physical interpretability and generalization capability [3][11].
- POET's training process is designed to stabilize optimization and significantly improve model generalization [3][11].
Group 2: Advantages of POET
- POET preserves the spectral properties of the weight matrix throughout training, so the singular values stay identical to those of the randomly initialized matrix [17].
- The method allows efficient parameter control and avoids the excessively large singular values that can arise in standard LLM training [17].
- Two new initialization strategies, normalized Gaussian initialization and uniform spectrum initialization, are proposed to keep the singular values of the generated weight matrices bounded [17].
Group 3: Training Dynamics and Performance
- Experimental results show POET outperforming traditional methods such as AdamW when training large language models, with better perplexity and training efficiency [20][24].
- POET's training divides into three phases, conical shell searching, stable learning on the conical shell, and final adjusting, which reflect how the orthogonal matrices evolve during training [40][41].
- A fully stochastic sampling approach lets POET cut memory costs significantly compared with traditional methods, improving scalability [26][27].
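The spectrum-preservation claim can be checked numerically with a small sketch. Assume the effective weight is W = R · W0 · P, where W0 is the fixed random matrix and R and P are orthogonal. Then W has exactly the same singular values as W0, whatever values R and P take. For illustration, the sketch builds the orthogonal matrices from unconstrained parameters with the plain Cayley transform. That choice is an assumption: the POET paper may use a different or approximate parameterization, and the names `cayley` and `poet_weight` are hypothetical.

```python
import numpy as np


def cayley(a):
    """Map any square matrix to an orthogonal one with the Cayley transform.

    S = A - A^T is skew-symmetric, and Q = (I - S)^{-1} (I + S) is then orthogonal.
    """
    s = a - a.T
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - s, eye + s)


def poet_weight(w0, a_r, a_p):
    """Effective weight W = R @ W0 @ P: W0 (m x n) stays fixed, while R (m x m)
    and P (n x n) are orthogonal matrices built from trainable parameters."""
    return cayley(a_r) @ w0 @ cayley(a_p)


rng = np.random.default_rng(0)
m, n = 6, 4
w0 = rng.standard_normal((m, n)) / np.sqrt(n)  # fixed random init (scale assumed)
a_r = rng.standard_normal((m, m))  # stand-ins for trained parameters
a_p = rng.standard_normal((n, n))

w = poet_weight(w0, a_r, a_p)
sv_before = np.linalg.svd(w0, compute_uv=False)
sv_after = np.linalg.svd(w, compute_uv=False)
print(np.allclose(sv_before, sv_after))
```

In training, gradients would update only `a_r` and `a_p` while `w0` stays frozen. The spectrum of W is therefore fixed when W0 is initialized, which is why the summary stresses initialization schemes that keep those singular values bounded.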
Breaking down the STAR Market's largest new IPO fundraising of the year: Moore Threads' early steps toward commercialization
Hua Er Jie Jian Wen· 2025-07-03 13:09
Core Viewpoint
- The race to become the "first domestic GPU stock" has begun. Major players such as Moore Threads and MetaX Integrated Circuits are both advancing toward IPOs, a significant step toward bringing the domestic GPU market to the capital markets [1][8].
Group 1: Company Overview
- Moore Threads is described as the most prominent of the "four little dragons" of domestic GPUs, with a core team drawn mainly from Nvidia [2].
- The company's MTT S80 graphics card delivers single-precision floating-point performance close to Nvidia's RTX 3060, and its self-built GPU computing cluster outperforms comparable foreign clusters [2][12].
Group 2: Financial Performance
- In 2024, Moore Threads' revenue reached 438 million yuan, up more than 200% year on year [3].
- Despite the revenue growth, the company posted a net loss of 1.492 billion yuan, driven by R&D expenses of 1.359 billion yuan, although the loss narrowed by about 10% year on year [4].
Group 3: Fundraising and Investment Plans
- Moore Threads plans to raise 8 billion yuan to develop AI training and inference chips, graphics chips, and AI SoC chips, the largest fundraising among new IPO projects on the STAR Market this year [5][6].
Group 4: Product Strategy and Market Position
- Moore Threads' product lineup covers AI computing, professional graphics acceleration, desktop graphics acceleration, and intelligent SoCs, serving government, enterprise, and individual consumer needs [9].
- AI computing products generated 336 million yuan in revenue in 2024, more than 70% of the total, benefiting from rapidly growing demand for large-model training and inference deployment [11][12].
Group 5: Competitive Landscape
- Moore Threads' 2024 revenue was only about 60% of MetaX's, a sign of the competitive pressure it faces [18].
- The company is shifting its strategy toward professional graphics acceleration and AI computing products, because its consumer-grade products have struggled in a crowded market [20][21].
Group 6: Future Outlook
- Management expects Moore Threads could become profitable as early as 2027, with 440 million yuan in sales contracts already in progress [23][24].
Jiangsu issues policy measures to innovate and upgrade digital trade
Xin Hua Ri Bao· 2025-07-02 21:40
Group 1
- The core viewpoint of the article is that Jiangsu Province aims to use digital trade to drive high-quality development of trade in services. By 2030 it targets a services trade volume of 600 billion yuan and digitally delivered services trade of 300 billion yuan, the latter accounting for approximately 50% of services trade [1]
- Jiangsu will focus on institutional opening-up in digital trade, build a digital trade ecosystem, and align with high-standard economic and trade rules, including pilot digital trade cooperation with Singapore [1]
- The province plans to establish national demonstration zones for innovative development of services trade and for digital trade, and to improve infrastructure and public services in key areas such as Nanjing Software Valley to support collaboration along domestic and international industrial chains [1]
Group 2
- A major highlight of the policy is industry empowerment. Jiangsu will develop trade in digital cultural products, strengthen cultural trade bases in cities such as Nanjing, Wuxi, and Suzhou, and promote exports in sectors such as animation and film [2]
- The province aims to expand digital technology trade in its areas of strength, advance high-end software development, and implement an "Artificial Intelligence+" action plan to upgrade service outsourcing and promote enterprise transformation [2]
- Jiangsu will strengthen international transportation services, optimize its international route network, and accelerate the development of smart ports and waterways. It will also make its tourism services more competitive internationally and support international education services [2]
Supreme People's Court judge: Build a fair use regime at the input end of large-model training data
Nan Fang Du Shi Bao· 2025-07-01 09:23
Core Viewpoint
- The article discusses the legal implications of using copyrighted works as training data for AI models and advocates a "wide entry, strict exit" approach to balance AI development with copyright protection [1][2][3].
Group 1: Legal Framework for AI Training Data
- The author proposes a fair use regime for AI training data at the "input end", combined with stricter regulation at the "output end" to protect the interests of copyright holders [1][2].
- The risks of AI model applications are still unclear. Imposing strict rules at the input stage could stifle innovation, because it would raise licensing costs and legal risks for AI developers [2][3].
- The author argues that traditional copyright licensing models, with their high costs and complex negotiations, may suppress innovation and push AI companies into legal gray areas [2][3].
Group 2: Legislative Recommendations
- The author recommends legislation that classifies the use of AI training data as a specific case of fair use under copyright law, emphasizing its public-interest character and its value to the AI industry [3].
- AI models' use of training data is likened to "molecular gastronomy": the data is not merely copied but transformed to extract underlying patterns [3].
- The proposal also gives copyright holders remedies concerning whether data was lawfully acquired and concerning infringement risks, keeping fair use and copyright protection in dynamic balance [3].
Group 3: Judicial Precedents
- Recent U.S. court rulings on AI training data carry significant implications for China. They highlight the need to examine carefully whether a given use of copyrighted works harms their market value [4].
- The rulings indicate that some uses may be deemed fair, but the legality of training AI models on copyrighted works remains complex and requires case-by-case analysis [4].
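ZEEKR's intelligent driving team wins CVPR international competition, solving a world-class challenge in end-to-end AI model training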
news flash· 2025-06-30 08:08
Core Viewpoint
- The ZEEKR autonomous driving team won the Argoverse2 2025 scene mining challenge at the CVPR 2025 conference, showing that its AI technology can address recognized challenges in global autonomous driving [1]
Group 1: Competition Results
- The ZEEKR autonomous driving team took first place in the Argoverse2 2025 scene mining challenge [1]
- The competition was held at the prestigious CVPR 2025 conference, underscoring the significance of the achievement [1]
Group 2: Technological Advancements
- The team used AI technology to solve key technical challenges in the autonomous driving field [1]
- Their AI model learned more effectively from large, redundant datasets, improving the system's ability to identify and handle critical driving scenarios in real-world use [1]
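Microsoft releases Mu model: supports Windows agents, with a small parameter count delivering 10x performance; study says 30% of U.S. code is now AI-generated, creating about $10 billion in value a year | Global Tech Morning Brief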
Mei Ri Jing Ji Xin Wen· 2025-06-23 23:50
Group 1
- Microsoft has released a new small-parameter model called Mu. It has 330 million parameters, outperforms its predecessor Phi-3.5-mini, and generates more than 100 tokens per second on the NPU of an offline laptop, marking a significant advance for small-parameter models [2]
- A recent study indicates that about 30.1% of the Python code submitted by American developers in 2024 was generated by AI, adding an estimated $9.6 billion to $14.4 billion a year to the U.S. economy. The finding highlights AI's potential to raise efficiency and create economic value [3]
- Google is reportedly training its next-generation AI tools on a pool of 20 billion YouTube videos, while saying it complies with creator agreements and is building safeguards for creators' rights in the AI era [4]
Group 2
- Microsoft's chief scientist Eric Horvitz warns that the Trump administration's proposal to bar state-level AI regulation could hinder technological development and undermine the goals of scientific progress [5]
- Perplexity is set to launch a Windows version of its Comet browser. Its AI assistant can check shopping discounts, remind users of unanswered emails, and offer virtual try-ons, accelerating the adoption of AI in browsers [6][7]