LLM Inference Acceleration
Fresh off the press: an interview writeup, just after interviewing for NVIDIA TRT LLM~
自动驾驶之心· 2025-06-23 11:34
Core Insights
- The article recounts a recent interview experience with NVIDIA for a position related to LLM inference acceleration, highlighting the rigorous interview process and the technical discussions involved [1].

Group 1: Interview Process
- The interview consisted of four rounds of one hour each, four hours in total, indicating a thorough evaluation process by NVIDIA [1].
- The first interviewer focused on the candidate's research work, particularly speculative decoding, and included a coding challenge that the candidate struggled with due to lack of practice [1].
- The second interviewer was familiar with the candidate's research, engaged in a deeper discussion of speculative decoding, and presented a string-related coding problem [1].

Group 2: Technical Discussions
- The third interviewer, a female group leader, discussed development directions for speculative decoding in high-batch-size scenarios and asked questions about transformer structures, specifically the dimensions of Q and K [1].
- The fourth interviewer, the only one to turn on the camera, approached the discussion from a systems perspective, providing valuable insights and confirming understanding during the candidate's presentation [1].

Group 3: Internship Details
- The internship location options are Shanghai, Beijing, or remote, with a focus on inference optimization rather than purely research-oriented tasks [1].
- The expected internship salary ranges from 8,000 to 10,000 yuan, reflecting the competitive nature of positions in the tech industry [1].
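Speculative decoding, the topic that ran through the first three rounds, can be sketched in miniature. This is a simplified greedy variant, not NVIDIA's or the candidate's actual method: a cheap draft model proposes k tokens, the target model verifies them, and the accepted prefix (plus one target token) advances the sequence. The `target_next`/`draft_next` callables and the length-based toy models are hypothetical stand-ins; a real system would verify all k positions in a single batched target forward pass.

```python
# Greedy speculative decoding sketch (no rejection sampling).
# `target_next` and `draft_next` are hypothetical stand-ins for a large
# target model and a small draft model, each mapping a token context to
# the next token.

def greedy_speculative_step(target_next, draft_next, context, k=4):
    """Draft k tokens, verify with the target, return the accepted tokens."""
    # 1) Draft model proposes k tokens autoregressively (cheap).
    ctx = list(context)
    proposal = []
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) Target model verifies each proposed position
    #    (a single batched forward pass in a real system).
    accepted = []
    ctx = list(context)
    for t in proposal:
        t_target = target_next(ctx)
        if t_target == t:            # draft agreed with target: keep the token
            accepted.append(t)
            ctx.append(t)
        else:                        # first mismatch: take the target's token, stop
            accepted.append(t_target)
            break
    else:
        # All k proposals accepted: the target's pass yields one bonus token.
        accepted.append(target_next(ctx))
    return accepted

# Toy deterministic "models": the next token depends only on context length;
# the draft diverges from the target whenever len(ctx) % 4 == 3.
target_next = lambda ctx: len(ctx) % 10
draft_next = lambda ctx: 0 if len(ctx) % 4 == 3 else len(ctx) % 10

print(greedy_speculative_step(target_next, draft_next, [5], k=4))  # → [1, 2, 3]
```

The high-batch-size question from round three hinges on exactly this structure: verification turns k sequential target steps into one wider pass, which pays off when the GPU has spare compute per token but hurts when large batches already saturate it.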
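The Q/K dimension question from round three has a compact answer: Q and K must share the per-head dimension d_k, because the attention scores come from their dot products, while V's width d_v is independent and sets the output width; during decoding, Q is a single row while the cached K and V grow with the sequence. A minimal pure-Python sketch (all names are illustrative, not from the article):

```python
import math

def attention(Q, K, V):
    """Single-head scaled dot-product attention on plain lists.
    Shapes: Q is (n_q, d_k), K is (n_kv, d_k), V is (n_kv, d_v).
    Q and K must share d_k (their dot products form the scores);
    V's width d_v is independent and determines the output width."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # scores[i] = q . K[i] / sqrt(d_k)
        scores = [sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d_k) for k in K]
        # numerically stable softmax over the n_kv scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted sum of value rows -> one output row of width d_v
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Decode-step shapes: one query row against three cached key/value rows.
Q = [[1.0, 0.0]]                           # (1, d_k=2)
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # (3, d_k=2)
V = [[1.0, 2.0, 3.0, 4.0]] * 3             # (3, d_v=4)
print(attention(Q, K, V))                  # one output row of width d_v=4
```

With identical value rows, the softmax weights (which sum to 1) reproduce that row exactly, making the shape contract easy to check by eye.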