CPU
X @Avi Chawla
Avi Chawla· 2025-09-21 06:33
PyTorch dataloader has 2 terrible default settings.
Fixing them gave me ~5x speedup.
When you train a PyTorch model on a GPU:
- .to(device) transfers the data to the GPU.
- Everything after this executes on the GPU.
This means when the GPU is working, the CPU is idle, and when the CPU is working, the GPU is idle.
Memory pinning optimizes this as follows:
- When the model is trained on the 1st mini-batch, the CPU can transfer the 2nd mini-batch to the GPU.
- This ensures that the GPU does not have to wait for the ne ...
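The memory-pinning idea in the post can be sketched roughly as follows. The excerpt is truncated, so this covers only pinning: `pin_memory` is a real `DataLoader` argument that defaults to `False`. The toy dataset, the model, and the use of `non_blocking=True` on the copy are illustrative assumptions, not taken from the post, and the sketch falls back to the CPU when no GPU is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data standing in for a real dataset.
features = torch.randn(256, 16)
labels = torch.randint(0, 2, (256,))
dataset = TensorDataset(features, labels)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# pin_memory defaults to False. Pinned (page-locked) host memory is what
# allows the host-to-GPU copy to run asynchronously.
loader = DataLoader(dataset, batch_size=32, shuffle=True, pin_memory=use_cuda)

model = torch.nn.Linear(16, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

batches_seen = 0
for x, y in loader:
    # Assumption: non_blocking=True requests an asynchronous copy, which can
    # overlap with GPU work when the source tensor is pinned. On a CPU-only
    # machine the call is effectively a plain no-op move.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    batches_seen += 1
```

Any speedup depends on the hardware and on how expensive data loading is relative to the training step, so the ~5x figure in the post should not be expected from this sketch.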
Nvidia's getting into AMD's business with $5B stake in Intel, says Constellation's Ray Wang
CNBC Television· 2025-09-19 11:41
Intel, uh, coming off its, uh, best day in a long, long time. The stock surged 22% following Nvidia's announcement that it will invest $5 billion in Intel. Joining us now is Ray Wang, Constellation Research founder and chairman. What I'm looking over in Becky's chair. It's a lot of room there. Normally you're in here with us. Uh, Ray, I put on some extra makeup because you usually take a picture and tweet it out. What happened? Why aren't you here? Hey, I'd love to be in New York. Happy Friday. I I ended up i ...
X @郭明錤 (Ming-Chi Kuo)
Key Industry Takeaways from Nvidia’s $5 Billion Investment in Intel
1. Partnership Could Define and Accelerate the AI PC Landscape
For Nvidia, developing its own Windows-on-ARM processors carries high uncertainty; for Intel, establishing a competitive edge in GPUs is difficult. Teaming up (CPU + GPU) could create powerful synergies and advantages across the PC ecosystem.
2. Significant Synergistic Potential in x86 / Mid & Low-Range / Inference AI Servers
A key trend ahead is enterprises building x86-based / mid ...
X @郭明錤 (Ming-Chi Kuo)
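Key Industry-Trend Analysis of Nvidia's US$5 Billion Investment in Intel
1. Nvidia-Intel cooperation is likely to define the AI PC and accelerate its development
For Nvidia, developing its own Windows on ARM processors carries high uncertainty; for Intel, quickly becoming more competitive in GPUs is difficult. Cooperation between the two (CPU + GPU) could create strong synergies and advantages in the PC ecosystem.
2. High synergy as well in x86 / mid- and low-end / inference AI servers
Enterprises building their own x86 / mid- and low-end / inference AI servers is a key future trend. Intel has x86 server enterprise customers and channel resources; Nvidia has technology advantages (AI chips, NVLink, CUDA, etc.). If the two sides tightly integrate their technology and sales strengths, they stand to benefit significantly from the large potential demand.
3. TSMC: advanced-process and AI chip orders are unaffected for the next several years; the main things to watch are changes in market share and orders among PC, GPU, x86 server, and networking customers, but overall the risk is manageable
(1) TSMC's advanced-process lead is expected to last until at least 2030, and Nvidia-Intel cooperation is unlikely to change this trend.
(2) AI chips require the most advanced processes, so TSMC's AI chip orders are unaffected.
(3) This investment may in the future change the market share of Nvidia's and Intel's competitors (such as AMD's PC, GPU, and x86 server chips, Broadcom's ...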
Observations and Feedback from the Optoelectronics Expo (光博会)
2025-09-15 01:49
Summary of Key Points from the Conference Call

Industry Overview
- The optical module industry is growing rapidly, and demand is expected to stay full from the second half of 2025 through 2026, driven by the introduction of 1.6T solutions. This mainly reflects volume adoption of NVIDIA's CX8 network card; the CX9 network card may go on to trigger 3.2T demand [2][5][20]
- The iteration cycle for optical modules has shortened to approximately two years, favoring leading manufacturers [2][5]

Core Insights and Arguments
- Domestic second-tier optical module manufacturers such as Solstice, Cambridge, and Lantech are using the high demand for AI optical modules to enter the North American market, although established suppliers leave limited room [2][6]
- Domestic optical chip manufacturers are accelerating their technology roadmaps, with significant progress reported by Yuanjie in CW lasers and by Changguang Huaxin in 100G EML, which improves their competitiveness [2][7][8]
- The CPC (co-packaged copper) plus pluggable optical module solution proposed by Xuchuang is gaining traction and has been adopted by overseas companies such as Broadcom and Marvell, making it a significant competing approach in the short term [2][13]

Emerging Technologies
- Liquid cooling products were prominently showcased at the 2025 data center exhibition, indicating that suppliers are preparing for NVIDIA-related opportunities; demand was reported to be high [3]
- OCS (Optical Circuit Switching) technology is gaining attention, with Google pushing its development and domestic manufacturers such as Guangku and Lingyun Light showcasing related products [12]
- NPO (near-packaged optics) is emerging as a promising alternative to CPO (co-packaged optics), with expectations of earlier market adoption and significant demand from switches [11]

Market Dynamics
- Optical module prices are expected to decline less in 2026 because of strong demand and tight supply, with shortages of core components such as EML and CW light sources contributing to price stability [4][20]
- North American demand for 800G and 1.6T is creating opportunities for domestic manufacturers despite the competitive landscape [6]

Notable Developments
- Changfei Fiber showcased an AI intelligent hub solution and hollow-core fiber products, including a 100-kilometer hollow-core fiber link with a loss of 0.089 dB per kilometer, close to the limit of silica fiber [4][18][19]
- The rapid development of supernodes in China is being driven by major players such as Huawei and ZTE, indicating a robust growth trend in the industry [14]

Conclusion
- The optical module industry is poised for significant change driven by technological advances and market dynamics, with new solutions such as CPC and hollow-core fiber potentially reshaping the competitive landscape and driving growth [21]
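As a rough illustration of the hollow fiber figures quoted above, the total attenuation over the 100-kilometer link can be worked out directly. The sketch assumes the 0.089 dB per kilometer loss applies uniformly along the link and ignores connector and splice losses; neither assumption comes from the summary.

```python
# Values quoted in the summary above.
loss_db_per_km = 0.089
link_length_km = 100

# Assumption: attenuation is uniform along the whole link.
total_loss_db = loss_db_per_km * link_length_km

# Decibels to a linear power ratio: P_out / P_in = 10 ** (-dB / 10).
power_fraction = 10 ** (-total_loss_db / 10)

print(f"total loss: {total_loss_db:.1f} dB")      # total loss: 8.9 dB
print(f"power remaining: {power_fraction:.1%}")   # power remaining: 12.9%
```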
X @BREAD | ∑:
BREAD | ∑:· 2025-08-18 21:06
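SALT Architecture & Performance
- SALT persists changes to disk asynchronously while keeping the complete top-level tree in memory [1]
- The performance of SALT's authenticated data structure is unaffected by the number of key-value pairs or the number of SSDs, because it has a fixed size and resides entirely in memory [1]
- SALT is free to choose the best key-value "database", because the underlying key-value store is an orthogonal concern [1]
- In experiments SALT is CPU-bound, and the CPU cost does not grow with the number of key-value pairs stored [3]
- Each account/storage-slot update causes only one update to the underlying key-value engine, which is considered optimal [4]

Comparison with Other Technologies
- The document acknowledges that the comparison with NOMT and QMDB may be problematic, because they were not compared at the same number of key-value pairs [2]
- The SALT team does not consider optimizing the key-value engine to be its main task [5]
- The SALT team will carry out a proper evaluation when the paper is published [5]

Key-Value Engine & Scalability
- SALT can use any key-value engine, which is regarded as a major advantage because it can follow the latest technical developments [4]
- Even if SALT stores 1 billion or more key-value pairs, the key-value engine is unlikely to replace the CPU as the main bottleneck [4]
- Replaying more than 1 billion EVM storage/account updates on RocksDB achieved several hundred thousand writes per second, far above the CPU-imposed bottleneck of 87,000 updates per second [4]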
🚨 All-In Summit Speaker Announcement: Rene Haas
All-In Podcast· 2025-08-05 16:28
Company Valuation & Market Position
- ARM's IPO in September valued the company above $54 billion [1]
- The IPO was the largest public offering in over 2 years [1]
- The company's valuation has tripled [1]
- ARM's circuits are present in nearly every smartphone [1]
- ARM is considered the winner on the CPU side [2]

Industry Trends & Opportunities
- Software is advancing faster than hardware [2]
- Increased investment in new hardware benefits ARM [2]
Chip World, the Zhejiang University School: A CPU Chronicle of the "Yan School" Disciples
半导体芯闻· 2025-07-24 10:21
Group 1
- The article discusses the rapid evolution of the RISC-V architecture and its significance for the semiconductor industry, arguing that the next five years will bring faster change than the past decade [3][34].
- Key figures in the RISC-V ecosystem, such as Dr. Jim Keller and Professor Yan Xiaolang, are mentioned for their contributions to and vision for RISC-V technology [5][18].
- The article emphasizes the importance of collaboration between academia and industry, particularly through initiatives such as "产教融合" (industry-education integration) [19].

Group 2
- The rise of companies such as Zhongtianwei (中天微) and PingTouGe (平头哥) is highlighted, including their innovative approaches and strategic partnerships, particularly with Alibaba [32][34].
- Zhongtianwei's C-Core series of embedded CPUs is noted for its wide use across sectors including IoT and automotive electronics [20].
- PingTouGe's acquisition by Alibaba is presented as a pivotal moment for China's chip development capabilities, with a focus on achieving "自主可控" (independent and controllable) chip technology [34].

Group 3
- The article introduces newer players in the RISC-V landscape, such as Zhihe Computing (知合计算) and Xinlai Technology (芯来科技), emphasizing their contributions in AI and automotive applications [40][49].
- Zhihe Computing aims to develop high-performance AI computing CPUs based on the RISC-V architecture, targeting efficient integration of AI into business operations [43].
- Xinlai Technology is recognized for its focus on RISC-V CPU IP and has reached significant milestones, including being the first to obtain ASIL-D certification for automotive applications [50][53].

Group 4
- The entrepreneurial journey of Chen Zhijian and his company Jindie Shikong (进迭时空) is highlighted, including the rapid production of RISC-V AI CPU chips and strategic investments to strengthen its market position [55][57].
- The article concludes by reflecting on the ongoing evolution of the RISC-V ecosystem and the continuous emergence of new talent and companies, reinforcing the idea that the journey in the semiconductor industry is far from over [58].
A Conversation with Ji Yu (季宇): Large Models Don't Have to Run on GPUs; CPU Memory Bandwidth Is Already Sufficient
虎嗅APP· 2025-05-18 13:51
Core Viewpoint
- The article discusses how 行云集成电路, led by its founder Ji Yu (季宇), is developing a cost-effective AI computing solution by integrating CPU and memory technologies, challenging the traditional reliance on GPUs for large model deployments [5][10][19].

Group 1: Company Overview
- 行云集成电路 was founded by Ji Yu, a former Huawei expert, and focuses on self-developed GPU technology [5].
- The company aims to build a DeepSeek integrated machine, a high-performance computing device designed for local deployment of AI models [8][19].

Group 2: Technology and Innovation
- The DeepSeek integrated machine, referred to as an "assembled machine" (组装机), combines various hardware components, including Intel or domestic CPUs and NVIDIA GPUs, while aiming to reduce costs significantly [9][19].
- Ji Yu argues that modern large models can run efficiently on CPUs by relying on their high memory bandwidth, which can exceed that of high-end GPUs such as the RTX 4090 [10][13].
- The company plans to design a custom chip that optimizes CPU performance for AI applications, moving away from traditional GPU reliance [13][24].

Group 3: Market Strategy
- The goal is to make AI technology available at consumer electronics price points, moving the market from supercomputing to widespread use [18][25].
- By lowering the cost of AI computing solutions to around 100,000 yuan, the company aims to enable more startups to enter the AI space [19][25].
- The strategy includes using common components to promote wide adoption and to avoid creating high barriers to entry for other players in the industry [22][23].

Group 4: Competitive Landscape
- Ji Yu believes that simply following NVIDIA's path will not lead to success and emphasizes the need for innovative approaches to challenge established players [17].
- The company seeks to demonstrate the feasibility of its approach through proof-of-concept products, aiming to gain acceptance from industry players [14][18].