雷峰网
Huawei's "digital wind tunnel" rehearses ten-thousand-card cluster plans within hours; Ascend helps large models run "fast and stable"
雷峰网· 2025-06-11 11:00
Core Viewpoint - The article discusses the launch of the Ascend modeling and simulation platform, which aims to optimize the interaction between load, optimization strategies, and system architecture to enhance infrastructure performance [1]. Group 1: Challenges in AI Model Training - Over 60% of computing power is wasted due to hardware resource mismatches and system coupling, highlighting the inefficiencies in traditional optimization methods [2]. - The training process for large models is likened to "slamming the gas pedal," where the MoE model requires precise balancing of computation and memory to avoid efficiency drops [4]. - Dynamic real-time inference systems face challenges in meeting both high throughput and low latency requirements across varying task types [4]. Group 2: Solutions and Innovations - The "digital wind tunnel" allows for pre-simulation of complex AI models in a virtual environment, enabling the identification of bottlenecks and optimization strategies before real-world implementation [6]. - The Sim2Train framework enhances the efficiency of large-scale training clusters through automatic optimization of deployment space and dynamic performance awareness, achieving a 41% improvement in resource utilization [7]. - The Sim2Infer framework focuses on real-time optimization of inference systems, resulting in over 30% performance improvement through adaptive mixed-precision inference and global load balancing [8]. Group 3: High Availability and Reliability - The Sim2Availability framework ensures high availability of the Ascend computing system, achieving a 98% uptime and rapid recovery from failures through advanced optimization techniques [11]. - The system employs a comprehensive monitoring approach to track hardware states and optimize software fault management, enhancing overall system reliability [13]. 
Group 4: Future Outlook - As new applications evolve, the demand for innovative system architectures will increase, necessitating continuous advancements in modeling and simulation methods to support the development of computing infrastructure [16].
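One common way to relate an uptime figure such as the 98% quoted above to failure and repair times is the steady-state formula A = MTBF / (MTBF + MTTR). The sketch below assumes that simple two-state (up/down) model. Nothing in the summary says Sim2Availability uses it, and the input numbers are hypothetical, chosen only to show the arithmetic.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a system is up in a two-state up/down model."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_for_target(mtbf_hours: float, target: float) -> float:
    """Largest mean repair time that still meets a target availability.

    Obtained by solving A = MTBF / (MTBF + MTTR) for MTTR.
    """
    if not 0 < target <= 1:
        raise ValueError("target availability must be in (0, 1]")
    return mtbf_hours * (1 - target) / target


if __name__ == "__main__":
    # Hypothetical inputs: one failure per 24 hours on average, 98% target.
    mttr = max_mttr_for_target(24.0, 0.98)
    print(f"repair budget: {mttr * 60:.1f} minutes per failure")
    print(f"check: {steady_state_availability(24.0, mttr):.2%}")
```

Under this model, the more often a cluster fails, the shorter each recovery has to be to hold the same availability, which is why fast recovery matters as much as hardware reliability.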
Valuation gaps in embodied intelligence are widening faster: what will carry robotics newcomers through the storm?
雷峰网· 2025-06-11 11:00
Core Viewpoint - The disparity in valuations within the embodied intelligence sector reflects either a significant gap in capabilities or the presence of a valuation bubble [2][3][24] Group 1: Market Dynamics and Valuations - The first tier of Chinese embodied intelligence startups is estimated to be valued between 2.5 billion and 3 billion RMB [2] - Companies like Yushu (Unitree) and Zhiyuan have valuations exceeding 15 billion RMB and 10 billion RMB respectively, while many others are valued between 2 billion and 3.5 billion RMB [2] - The valuation gap between different tiers of companies can exceed 100% [2] - Hardware and software are increasingly viewed as separate investment tracks, which complicates valuation standards [4][17] Group 2: Investment Trends - Investment is concentrating on top-tier companies, with a preference for established names like Yushu, which has become a benchmark for hardware projects [16][17] - Many investors are cautious, preferring to invest in companies with proven business models rather than speculative startups [18][19] - The influx of capital into the sector has led to inflated valuations, with early-stage companies often starting at valuations in the millions [18][19] Group 3: Challenges and Future Outlook - The embodied intelligence sector is still heavily reliant on financing, with many companies focusing on securing funding rather than achieving profitability [12][20] - There is a consensus that the market is at a critical juncture, with many companies expected to deliver products and generate revenue in the near future, potentially reshaping the competitive landscape [24] - The industry is still in its early stages, with significant technological advancements yet to be realized, indicating potential for future growth despite current valuation concerns [23][24]
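BYD, Changan and other automakers pledge payment terms of no more than 60 days, while NIO, XPeng and Li Auto have yet to follow; YU7 design accused of copying, experts say it does not infringe; Ximalaya sells itself to Tencent Music for US$1.26 billion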
雷峰网· 2025-06-11 00:53
Group 1 - BYD and Changan have unified their supplier payment terms at no more than 60 days, while new players NIO, XPeng, and Li Auto have not yet followed [4][5] - Xiaomi's YU7 model faces plagiarism accusations, but the company says its design is original, and experts state that it does not infringe on patents [7][8] - BYD's salary levels have surpassed Huawei's, with significant investments in AI and a commitment to improving brand perception amid shareholder criticism [10][11] Group 2 - Ren Zhengfei of Huawei stated that the U.S. has exaggerated Huawei's achievements, emphasizing the need for continuous improvement in chip technology [13] - TSMC is accelerating its U.S. factory construction while slowing down projects in Japan and Europe due to market demand fluctuations [14] - BYD and other Chinese manufacturers are gaining ground in the autonomous driving sector, posing a threat to Tesla's market position [15] Group 3 - The Zhiyuan Research Institute showcased a four-legged robot designed to assist visually impaired individuals, successfully guiding them in complex environments [17] - Tencent Music announced a $1.26 billion acquisition of Ximalaya, marking a significant move into the online audio sector [19] - XPeng Motors is set to unveil its G7 model featuring the Turing AI chip, which boasts advanced processing capabilities [26] Group 4 - Huawei is preparing to launch its Pura 80 series smartphones, featuring advanced imaging technology and expected to start at around 5,000 yuan [32] - Li Auto has established two new robotics divisions, focusing on space and wearable robots, indicating a strategic shift towards AI integration [34] - Gree Electric's president mentioned that several business segments are ready for potential spin-offs, reflecting a strategy to enhance market competitiveness [35]
A 10,000-character summary: how do you build a reliable "dexterous hand" fit for humanoid robots?
雷峰网· 2025-06-10 10:30
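VLA is expected to be upgraded in the future to VTLA, which adds touch, in order to break through the technical bottleneck of information fusion.
Author | Wu Huaxiu (吴华秀)  Editor | Chen Caixian (陈彩娴)
On May 25, 2025, 雷峰网, AI 科技评论, and GAIR Live held an online roundtable salon on the theme "Exploration and Application of Dexterous Hands in Embodied Intelligence." The roundtable was moderated by 乐金鑫, partner at 元禾原点, and brought together Shao Lin (邵林), assistant professor at the National University of Singapore and founder of RoboScience; Ma Daolin (马道林), associate professor at Shanghai Jiao Tong University and founder of 千觉机器人; and Ye Qi (叶琦), Hundred Talents Program researcher and doctoral supervisor at Zhejiang University's College of Control Science and Engineering, for an in-depth exchange.
As embodied intelligence rises rapidly, the dexterous hand, as the key carrier linking digital intelligence with the physical world, is moving from a traditional end effector to a core breakthrough point for putting artificial intelligence into practice.
At the event, the guests each shared their own history with dexterous hands and offered distinctive views on hardware and software challenges, data and models, and deployment and applications. The three guests also each gave their thoughts on how to tackle the data problem for dexterous hands.
Ma Daolin pointed out that the data currently collected for dexterous hands and grippers, and the models trained on it, are still at an early stage relative to the embodied intelligence field as a whole, and that the data modalities are mostly vision and action and do not yet cover touch. Going forward, one task is to collect more multimodal data; the other is to solve how the different modalities are processed and fused after collection.
Shao Lin ...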
How stable is an Ascend AI computing cluster? 98% availability at ten-thousand-card scale, with fault recovery in seconds
雷峰网· 2025-06-10 10:30
Core Viewpoint - The article discusses how Huawei enhances the efficiency and stability of AI computing clusters, emphasizing the importance of high availability to support continuous operation and minimize downtime in AI applications [2][16]. Group 1: High Availability Core Infrastructure - AI computing clusters face complex fault diagnosis challenges due to large system scale and intricate technology stacks, with fault localization taking from hours to days [4]. - Huawei has developed a full-stack observability capability to improve fault detection and management, which includes a fault mode library and cross-domain fault diagnosis [4]. - The CloudMatrix super node achieves a mean time between failures (MTBF) of over 24 hours, significantly enhancing hardware reliability [4]. Group 2: Fault Tolerance and Reliability - Huawei's super node architecture leverages optical link software fault tolerance solutions, achieving a fault tolerance rate of over 99% for optical module failures [5][6]. - The recovery time for high-bandwidth memory (HBM) multi-bit ECC faults has been reduced to 1 minute, resulting in a 5% decrease in computing power loss due to faults [6]. Group 3: Training and Inference Efficiency - The linearity metric measures the improvement in training task speed relative to the number of computing cards, with Huawei achieving a linearity of 96% for the Pangu Ultra 135B model using a 4K card setup [10]. - Huawei's training recovery system can restore training tasks in under 10 minutes, with process-level recovery reducing this to as low as 30 seconds [12]. - For large EP inference architectures, Huawei has proposed a three-tier fault tolerance solution to minimize user impact during hardware failures [12][14]. Group 4: Future Directions - Huawei aims to explore new applications driven by diverse and complex scenarios, breakthroughs in heterogeneous integration, and innovative engineering paradigms focused on observability and intelligent autonomy [16].
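The linearity figure in Group 3 can be read as scaling efficiency: measured cluster throughput divided by the throughput perfect linear scaling would give. The sketch below assumes that reading, since the exact definition Huawei uses is not given here. The function name and the throughput numbers are hypothetical.

```python
def linearity(base_throughput: float, base_cards: int,
              scaled_throughput: float, scaled_cards: int) -> float:
    """Scaling efficiency relative to perfect linear speed-up.

    A value of 1.0 means throughput grew exactly in proportion to card count.
    """
    if base_throughput <= 0 or base_cards <= 0 or scaled_cards <= 0:
        raise ValueError("throughput and card counts must be positive")
    ideal_throughput = base_throughput * (scaled_cards / base_cards)
    return scaled_throughput / ideal_throughput


if __name__ == "__main__":
    # Hypothetical measurements: 100 samples/s on 256 cards,
    # 1536 samples/s on 4096 cards (16x the cards, 15.36x the throughput).
    print(f"linearity: {linearity(100.0, 256, 1536.0, 4096):.0%}")
```

The shortfall from 100% is the share of added hardware lost to communication, synchronization, and load imbalance as the job grows.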
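Losses in the billions? Midea responds to the North American air-conditioner incident: no defect, a voluntary recall; a core DeepSeek executive leaves to start a company; Huawei Pura X rumored to get a new screen-opening design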
雷峰网· 2025-06-10 00:28
Group 1 - Xiaomi's China region has undergone personnel adjustments, with Vice President Wang Xiaoyan also taking on the role of General Manager of Xiaomi Home, while the former GM Wang Hui will transition to the Sales Management Department [4] - As of March 31, Xiaomi's offline retail store count in China reached 16,000, with a target of 20,000 by the end of the year [4] - Xiaomi is expanding its new retail model globally, planning to open 10,000 stores overseas in the next five years [5] Group 2 - DeepSeek's core executive has left to start a new venture focused on the Agent sector, with plans to launch a product by Christmas 2025 [7] - DJI's imaging system founder and team leader has reportedly left the company, marking a significant personnel change [9] Group 3 - Midea Group responded to a recall of its North American air conditioning units, stating it was a voluntary recall and not due to defects, despite potential losses amounting to billions [10] - The recalled U-shaped air conditioner has sold 1.7 million units in the U.S. and 45,900 in Canada since its launch in 2020 [10] Group 4 - BYD has entered the top ten of imported car brands in Japan for the first time, with 416 units registered in May and plans to open 100 stores by the end of 2025 [21] - BYD's sales in Japan for 2024 are projected at 2,221 units, a 10% year-on-year increase, despite a 6% decline in overall imported car sales [21] Group 5 - JD.com has released a clean cooperation guideline prohibiting suppliers from engaging with dismissed employees, and established a 10 million yuan anti-corruption reward fund [19] - GAC Aion has seen a leadership change, with He Xianqing taking over as chairman from Feng Xingya [19] Group 6 - Xiaohongshu has established its first overseas office in Hong Kong, marking a significant step in its global strategy [20] - The platform aims to enhance creative collaboration between local content creators and brands, promoting cultural exchange [20] Group 7 - The "Guzi economy" is rapidly growing, with Pinduoduo testing a new group buying service specifically for this market, projected to reach a market size of 168.9 billion yuan in 2024 [13] - SiliconCloud, a generative AI development platform, has surpassed 6 million users and thousands of enterprise clients, with significant daily token generation [14] Group 8 - Neuralink and Grok are collaborating to enable ALS patients to communicate again through a brain-machine interface, showcasing advancements in assistive technology [32] - Toyota is partnering with a Finnish company to launch the world's first hydrogen sauna, aligning with its environmental goals [33] Group 9 - Qualcomm has announced the acquisition of UK semiconductor company Alphawave Semi for approximately $2.4 billion, enhancing its semiconductor IP portfolio [34] - SHEIN has denied reports of plans to increase its Indian supplier base from 150 to 1,000, clarifying its partnership with Reliance is limited to brand licensing [34]
Exclusive | Wang Dingxiao, former marketing head of Douyin's life services business, joins JD Health, reporting to CEO Jin Enlin
雷峰网· 2025-06-09 13:37
Core Viewpoint - Wang Dingxiao has recently joined JD Health as the head of the marketing department, indicating a strategic shift in the company's marketing leadership and a response to the evolving landscape of the digital marketing industry [2][5]. Group 1: Leadership Changes - Wang Dingxiao, previously the marketing head for Douyin's life services, has taken on the role of general manager of the marketing department at JD Health, reporting directly to CEO Jin Enlin [2][4]. - Prior to Wang's appointment, the marketing department was managed by CEO Jin Enlin himself, highlighting the frequent changes in leadership within JD Health [5]. Group 2: Career Background of Wang Dingxiao - Wang Dingxiao graduated from Tianjin Normal University in 2010 and has held various strategic roles in advertising firms such as Dentsu Digital and Ogilvy before transitioning to marketing at ByteDance in 2017 [2][3]. - During his seven years at ByteDance, Wang played a significant role in the development of the short video industry, managing marketing strategies for key clients across multiple regions [3]. Group 3: Industry Context - The marketing landscape is increasingly characterized by the need to build personal brands for entrepreneurs, leading to frequent adjustments in the roles of brand and marketing departments [5]. - JD's management has seen continuous changes, with recent reports indicating ongoing adjustments in the retail and marketing teams, reflecting the dynamic nature of the industry [5].
Inside Huawei's Ascend ten-thousand-card cluster: how do you tame the AI computing "behemoth"?
雷峰网· 2025-06-09 13:37
Core Viewpoint - The article discusses the advancements in AI computing clusters, particularly focusing on Huawei's innovations in ensuring high availability, linear scalability, rapid recovery, and fault tolerance in large-scale AI model training and inference systems [3][25]. Group 1: High Availability of Super Nodes - AI training and inference require continuous operation, similar to an emergency room, where each computer in the cluster has a backup to take over in case of failure, ensuring uninterrupted tasks [5][6]. - Huawei's CloudMatrix 384 super node employs a fault tolerance strategy that includes system-level, business-level, and operational-level fault management to convert faults into manageable issues [5][6]. Group 2: Linear Scalability - The ideal scenario for computing power is linear scalability, where 100 computers should provide 100 times the power of one. Huawei's task distribution algorithms ensure efficient collaboration among computers, enhancing performance as the number of machines increases [8]. - Key technologies such as TACO, NSF, NB, and AICT have been developed to improve the linearity of training large models, achieving linearity rates of 96% and above in various configurations [8]. Group 3: Rapid Recovery of Training - The system can quickly recover from failures during training by automatically saving progress, allowing it to resume from the last checkpoint rather than starting over [10][12]. - Innovations like process-level rescheduling and online recovery techniques have reduced recovery times to under 3 minutes and even as low as 30 seconds in some cases [12]. Group 4: Fault Tolerance in MoE Model Inference - The article outlines a three-tier fault tolerance strategy for large-scale MoE model inference, which minimizes user impact during hardware failures [14][15]. - Techniques such as instance-level rapid restart and token-level retries have significantly reduced recovery times from 20 minutes to as low as 5 minutes [15]. 
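The checkpoint-and-resume idea described in Group 3 can be sketched in a few lines. This is a generic illustration, not Huawei's implementation: the JSON state format, the checkpoint interval, the simulated fault, and every function name here are hypothetical, and the "training step" is a stand-in computation.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, every: int, fail_at=None) -> dict:
    """Run from the last checkpoint; optionally simulate a fault at one step."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated hardware fault")
        # Stand-in for a real training step.
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state


def demo() -> tuple:
    """Fail partway through, then resume from the last checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "ckpt.json")
        try:
            train(ckpt, total_steps=100, every=10, fail_at=57)
        except RuntimeError:
            pass
        resumed_from = load_checkpoint(ckpt)["step"]
        final_step = train(ckpt, total_steps=100, every=10)["step"]
        return resumed_from, final_step


if __name__ == "__main__":
    print(demo())
```

In this toy run the work done between the last checkpoint and the fault is lost and recomputed, which is the trade-off behind checkpoint frequency: saving more often costs more I/O but shrinks the amount of training that has to be repeated.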
Group 5: Fault Management and Diagnostic Capabilities - A real-time monitoring system continuously checks the health of each computer in the cluster, allowing for quick identification and resolution of issues [16]. - Huawei's comprehensive fault management solution includes capabilities for error detection, isolation, and recovery, enhancing the reliability of the computing cluster [16]. Group 6: Simulation and Modeling - Before training complex AI models, the computing cluster can simulate various scenarios in a virtual environment to identify potential bottlenecks and optimize performance [19][20]. - The introduction of a Markov modeling simulation platform allows for efficient resource allocation and performance tuning, improving throughput and reducing communication delays [20][21]. Group 7: Framework Migration - Huawei's MindSpore framework has rapidly evolved since its open-source launch, providing tools for seamless migration from other frameworks and enhancing execution efficiency [23]. - The framework supports one-click deployment for large models, significantly improving inference performance [23]. Group 8: Future Outlook - The article concludes that the evolution of computing infrastructure will follow a collaborative path between algorithms, computing power, and engineering capabilities, potentially creating a closed loop of innovation driven by application demands [25].
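Zhou Hongyi plans to scrap 360's entire marketing department and run a launch event single-handedly, saving tens of millions of yuan a year; Honor executive says a rival's internal notice called for crushing Honor; Roborock pushes for a secondary listing in Hong Kong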
雷峰网· 2025-06-09 00:33
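Key news (NEWS REMIND)
1. ByteDance content quality head Li Tong leaves; changes to the CQC business reflect AI's impact on manual review
2. "Crazy Stone" sets off again! Roborock pushes for a secondary listing in Hong Kong
3. BYD's Li Yunfei responds to the "Evergrande of the auto industry" and "atmospheric-pressure fuel tank" incidents: the fuel tanks are compliant and the finances are sound
4. "Tens of millions saved a year"! Zhou Hongyi: preparing to scrap 360's entire marketing department
5. Geely's Li Shufu: some companies compete in ways too shameful to mention
6. Xiaohongshu valued at US$35 billion; Zhu Xiaohu: no shareholder is willing to sell
7. Head of Tesla's Optimus humanoid robot project announces departure: hopes to spend more time with family
8. Zuckerberg bets big on AI: Meta plans to invest more than US$10 billion in Scale AI
Today's headline (HEADLINE NEWS)
"Tens of millions saved a year"! Zhou Hongyi: preparing to scrap 360's entire marketing department
On the evening of June 6, Zhou Hongyi, founder and chairman of 360 Group, posted on his personal social media account: "I'm preparing to scrap 360's entire marketing department, which would save the company tens of millions a year." He said that starting that day he would take on a challenge: completing an entire new-product launch event by himself. "It sounds like a fantasy, but this time I mean it."
Zhou said that a product launch used to require dozens of marketing staff working for the better part of a month, costing time, effort, and money, and still left him dissatisfied; this time he would handle all of it alone. Zhou also said this was not a spur-of-the-moment ...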
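For high-priced hardware going overseas, don't blindly compete on technology; seize the small opportunities inside big markets | 鲸犀百人谈 Vol.38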
雷峰网· 2025-06-06 11:34
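The market is currently keen to add AI to hardware in order to raise prices, but that is more a means than an end.
Author | Wu You (吴优)  Editor | Liu Wei (刘伟)
Products that hit users' emotional high points are easier to sell at a high price
雷峰网·鲸犀: How should the price ranges of today's hardware products be classified? What kind of product can be defined as high-priced?
The strong results of Chinese hardware companies such as DJI and 拓竹 in recent years remind us of one thing: the high-end hardware front seems to be shifting from Silicon Valley in the United States to Shenzhen in China. "Made in China" is no longer a byword for extreme value for money; Chinese companies can also make high-quality, high-priced hardware that is widely popular in overseas markets.
However, companies like DJI and 拓竹 remain a minority among Chinese hardware firms, and how high-priced hardware should be defined and sold is still a question Chinese companies need to keep exploring.
For this installment of the 雷峰网·鲸犀 going-global column 百人谈, we invited Yang Fei (杨飞), founding partner of 海石出海, to share how Chinese hardware companies going overseas can build high-priced products the right way.
Yang Fei has more than 15 years of entrepreneurial and investment experience and founded 海石出海 in 2024. Its clients now range from startups that raised US$2 million through crowdfunding to large industrial groups with 10 billion in domestic revenue, and it builds overseas operations, marketing, and brand systems for them from scratch. Yang Fei previously headed the going-global track at 百联挚高资本 and was a member of the founding team of the well-known US-dollar fund 云九资本, helping it grow from zero into a mainstream institution managing more than US$2 billion.
Recently, Yang Fei, at a coffee ...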