TRT LLM
NVIDIA Blackwell Sets New Standard in AI Inference with 15X ROI and $75 Million Revenue
NVIDIA· 2025-10-09 23:43
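Performance Benchmarks
- On the InferenceMAX benchmark, Blackwell achieved breakthrough performance on leading open models such as DeepSeek R1, gpt-oss, and Llama [1]
- The new benchmark is designed to measure cost and efficiency as well as raw performance, which shows what large-scale inference deployment actually requires [2]
- A single GB200 NVL72 system can generate enough tokens to produce $75 million in revenue, a 15x return on investment (based on gpt-oss) [2]
- With the latest TensorRT-LLM software improvements, each GPU can generate 60,000 tokens per second [3]
- For dense open models such as Llama, each GPU can generate 10,000 tokens per second, 4x the previous-generation Hopper platform [3]
Efficiency Improvements
- In power-constrained data centers, Blackwell delivers 10x the performance per megawatt of the previous-generation Hopper platform [3]
- More tokens translate into more revenue [4]
Future Expectations
- New results are expected for Blackwell Ultra, along with further software improvements that raise the performance and efficiency of AI factories [4]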
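The headline numbers can be cross-checked with simple arithmetic. This is a back-of-envelope sketch, not NVIDIA's methodology. It assumes three things: "15x ROI" means revenue divided by investment, every one of the 72 GPUs in a GB200 NVL72 rack sustains the quoted 60,000 tokens/s at the same time, and "4x Hopper" is a plain per-GPU throughput ratio.

```python
# Figures quoted in the summary; everything derived below is back-of-envelope.
REVENUE_USD = 75_000_000          # claimed revenue for one GB200 NVL72 system (gpt-oss)
ROI_MULTIPLE = 15                 # claimed return on investment
TOKENS_PER_SEC_PER_GPU = 60_000   # claimed per-GPU throughput with latest TensorRT-LLM
GPUS_PER_NVL72 = 72               # an NVL72 rack contains 72 Blackwell GPUs
LLAMA_TPS_BLACKWELL = 10_000      # claimed per-GPU throughput on dense Llama
HOPPER_SPEEDUP = 4                # claimed Blackwell-vs-Hopper ratio on that workload
SECONDS_PER_DAY = 86_400

# Assumption: ROI = revenue / investment, so investment = revenue / ROI.
implied_investment = REVENUE_USD / ROI_MULTIPLE

# Assumption: the per-GPU rate holds for all 72 GPUs simultaneously.
rack_tokens_per_sec = TOKENS_PER_SEC_PER_GPU * GPUS_PER_NVL72
rack_tokens_per_day = rack_tokens_per_sec * SECONDS_PER_DAY

# Assumption: "4x Hopper" is a simple throughput ratio per GPU.
implied_hopper_tps = LLAMA_TPS_BLACKWELL / HOPPER_SPEEDUP

if __name__ == "__main__":
    print(f"implied investment:      ${implied_investment:,.0f}")
    print(f"rack throughput:         {rack_tokens_per_sec:,} tokens/s")
    print(f"rack tokens per day:     {rack_tokens_per_day:,}")
    print(f"implied Hopper per-GPU:  {implied_hopper_tps:,.0f} tokens/s")
```

Under these assumptions the claims imply roughly a $5 million system cost and about 4.3 million tokens/s per rack. The real figures depend on batch size, per-user latency targets, and token pricing, none of which the summary specifies.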
Fresh interview write-up: just finished interviewing with NVIDIA TRT LLM!
自动驾驶之心 (literally "Heart of Autonomous Driving") · 2025-06-23 11:34
Core Insights
- The article describes a recent interview with NVIDIA for a position in LLM inference acceleration, covering the interview process and the technical topics discussed [1].
Group 1: Interview Process
- The interview consisted of four one-hour rounds, four hours in total [1].
- The first interviewer focused on the candidate's research, particularly speculative decoding, and set a coding problem that the candidate struggled with for lack of practice [1].
- The second interviewer was already familiar with the candidate's research, went deeper into speculative decoding, and set a string-related coding problem [1].
Group 2: Technical Discussions
- The third interviewer, a female group leader, discussed where speculative decoding is heading in high-batch scenarios and asked about the Transformer architecture, specifically the dimensions of Q and K [1].
- The fourth interviewer, the only one with the camera on, approached the discussion from a systems perspective, offered useful insights, and checked the candidate's understanding throughout the presentation [1].
Group 3: Internship Details
- The internship can be based in Shanghai or Beijing, or done remotely, and focuses on inference optimization rather than pure research [1].
- The expected internship salary is 8,000 to 10,000 yuan, which the article presents as a sign of how competitive such tech positions are [1].
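Speculative decoding, the candidate's research topic, pairs a cheap draft model with the expensive target model. The draft proposes several tokens, the target verifies them, and the longest agreeing prefix is kept. Below is a minimal sketch of the *greedy* variant only; the original sampling-based method uses a probabilistic acceptance rule instead. The toy "models" (`toy_draft`, `toy_target`) and all function names are hypothetical and illustrative only.

```python
from typing import Callable, List

# Hypothetical toy "model": maps the token sequence so far to its greedy next token.
Model = Callable[[List[int]], int]

def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """One greedy speculative-decoding step; returns the tokens to append."""
    # 1. The draft model proposes k tokens autoregressively (cheap).
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # 2. The target model checks each proposed position. A real system scores
    #    all positions in one batched forward pass; this toy calls it per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # 3. All k drafts accepted: the verification pass also yields one bonus token.
    accepted.append(target(ctx))
    return accepted

def generate(prefix: List[int], draft: Model, target: Model, k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]

def greedy_generate(prefix: List[int], target: Model, max_new: int) -> List[int]:
    """Plain target-only greedy decoding, for comparison."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out

# Toy target counts modulo 10; the toy draft agrees except after token 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, 3, 12))
    # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

In the greedy setting the output is identical to target-only greedy decoding. The speedup comes from accepting several tokens per target pass when the draft agrees, and the sequence falls back to one token per step when it does not.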
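The summary does not record the exact question about the dimensions of Q and K. As a generic illustration, the shape rules for the attention score tensor Q·K^T can be written down directly. Q and K must share the per-head dimension d_k; for example, d_model = 4096 split over 32 heads gives d_k = 128. The query length may differ from the key length, which is what happens during decoding with a KV cache. The helper below is hypothetical and uses the layout (batch, heads, seq_len, head_dim) as an assumption.

```python
from typing import Tuple

# Assumed layout: (batch, num_heads, seq_len, head_dim). Hypothetical helper.
Shape = Tuple[int, int, int, int]

def attention_score_shape(q: Shape, k: Shape) -> Shape:
    """Return the shape of the score tensor Q @ K^T (per head, before softmax)."""
    b_q, h_q, s_q, d_q = q
    b_k, h_k, s_k, d_k = k
    if b_q != b_k:
        raise ValueError("batch sizes must match")
    if d_q != d_k:
        # The dot product q . k is only defined when Q and K share head_dim d_k.
        raise ValueError(f"head_dim mismatch: {d_q} vs {d_k}")
    if h_q % h_k != 0:
        # Multi-head attention uses h_k == h_q; grouped-query attention shares each
        # K head across h_q // h_k query heads, so h_q must be a multiple of h_k.
        raise ValueError(f"{h_q} query heads cannot be grouped over {h_k} KV heads")
    # Each query position gets one score per key position.
    return (b_q, h_q, s_q, s_k)

if __name__ == "__main__":
    # Prefill: 512 prompt tokens attend to 512 keys.
    print(attention_score_shape((1, 32, 512, 128), (1, 32, 512, 128)))
    # Decode with a KV cache: 1 new query against 513 cached keys.
    print(attention_score_shape((1, 32, 1, 128), (1, 32, 513, 128)))
```

During ordinary decoding the query length is 1 while the key length grows with the context. When a target model verifies k draft tokens in one pass, the query length grows to roughly k + 1 instead.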