DeepEP Communication Framework

AI Developments Roundup: NVIDIA's Llama-Nemotron Model Performs Strongly, Xiaomi Debuts Mi-BRAG Intelligent Engine
China Post Securities· 2025-05-14 13:08
Quantitative Models and Construction Methods

1. Model Name: Llama-Nemotron
- **Model Construction Idea**: The Llama-Nemotron model aims to speed up inference and reduce memory usage without sacrificing performance[12][13]
- **Model Construction Process**:
  - **Stage 1: Neural Architecture Search (NAS)**: Starts from the Llama 3 model and accelerates inference by using block-level local distillation and a mixed-integer programming (MIP) solver to select the most efficient configuration[14]
  - **Stage 2: Vertical Compression and FFN Fusion**: Introduces FFN fusion, which identifies and replaces consecutive FFN blocks to reduce sequential depth and improve computational efficiency[14]
  - **Stage 3: Knowledge Distillation and Continued Pre-training**: Applies knowledge distillation and continued pre-training to improve model quality and recover any quality lost through block replacement[15]
  - **Stage 4: Supervised Fine-Tuning (SFT)**: Fine-tunes on mixed instruction data and reasoning trajectories from strong teacher models[15]
  - **Stage 5: Large-Scale Reinforcement Learning**: Trains the model with large-scale reinforcement learning, particularly on complex mathematical and STEM datasets[15]
- **Model Evaluation**: The model is designed to improve inference efficiency and reduce memory usage while maintaining high performance[13][16]

Model Backtesting Results

- **Llama-Nemotron Model**:
  - **HumanEval 0-shot**: 92.1%[53]
  - **LiveCodeBench (v6) 0-shot**: 30.3%[53]
  - **MultiPL-E average 0-shot**: 81.4%[53]
  - **ArenaHard 0-shot**: 97.1%[53]
  - **IfEval 0-shot**: 89.4%[53]
  - **Math500 Instruct 0-shot**: 91.0%[53]
  - **GPQA Diamond 5-shot CoT**: 57.1%[53]
  - **MMLU Pro 5-shot CoT**: 77.2%[53]
  - **RULER 32K**: 96.0%[53]
  - **RULER 128K**: 90.2%[53]
  - **MMMU 0-shot**: 66.1%[53]
  - **DocVQA 0-shot**: 95.3%[53]
  - **AI2D 0-shot**: 93.7%[53]
  - **ChartQA 0-shot**: 82.6%[53]

Quantitative Factors and Construction Methods

1. Factor Name: Mi-BRAG
- **Factor Construction Idea**: The Mi-BRAG system addresses three weaknesses of traditional large models: high knowledge update costs, lack of insight into proprietary knowledge bases, and data leakage risks[25]
- **Factor Construction Process**:
  - **Full-Format Compatibility**: Integrates an intelligent parsing engine that handles document formats such as PDF, Word, and Excel[27]
  - **Full-Modal Parsing**: Accurately analyzes complex images, tables, and mixed information[27]
  - **Multilingual Q&A**: Supports document parsing and interactive Q&A in major languages[27]
  - **Fine-Grained Traceability**: Uses dynamic traceability technology to mark the original document and citation location for each generated result[27]
- **Factor Evaluation**: The system serves as an intelligent knowledge center for various application scenarios, improving product intelligence and user experience[28]

Factor Backtesting Results

- **Mi-BRAG Factor**:
  - **SuperCLUE-RAG Generation Capability Ranking**: Ranked first in April 2025[31]

2. Factor Name: VPP (Video Prediction Policy)
- **Factor Construction Idea**: VPP generates robot actions from text instructions, leveraging AIGC video diffusion models for predictive visual representation and action learning[36][39]
- **Factor Construction Process**:
  - **Stage 1**: Uses video diffusion models to learn predictive visual representations[36]
  - **Stage 2**: Employs Video Former and a DiT diffusion policy for action learning[36]
- **Factor Evaluation**: VPP significantly improves the generalization ability of humanoid robots by learning from human actions and reducing dependency on high-quality robot data[36][40]

Factor Backtesting Results

- **VPP Factor**:
  - **Calvin ABC-D Task Average Length**: 4.33[42]
  - **Real-World Dexterous Hand Task Success Rate**: 67%[42]
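The Stage 1 search in the Llama-Nemotron summary above can be pictured as a constrained selection problem: pick one block variant per layer so that quality loss is minimized within a latency budget. The sketch below is only an illustration of that idea. It uses brute-force enumeration as a stand-in for a MIP solver, and the block names (`full`, `slim_ffn`, `skip_attn`), the latency and quality-loss numbers, and the function `select_blocks` are all hypothetical rather than taken from the source.

```python
from itertools import product

# Hypothetical candidate blocks per layer: (name, latency_ms, quality_loss).
# Every name and number here is invented purely for illustration.
CANDIDATES = [
    [("full", 4.0, 0.00), ("slim_ffn", 2.5, 0.02), ("skip_attn", 1.5, 0.05)],
    [("full", 4.0, 0.00), ("slim_ffn", 2.5, 0.01), ("skip_attn", 1.5, 0.08)],
    [("full", 4.0, 0.00), ("slim_ffn", 2.5, 0.03), ("skip_attn", 1.5, 0.04)],
]


def select_blocks(candidates, latency_budget):
    """Pick one block per layer, minimizing total quality loss within the budget.

    Brute force over all combinations; a real system would hand the same
    objective and constraint to a MIP solver instead.
    """
    best = None
    for combo in product(*candidates):
        latency = sum(block[1] for block in combo)
        loss = sum(block[2] for block in combo)
        if latency > latency_budget:
            continue  # violates the latency constraint
        if best is None or loss < best[1]:
            best = ([block[0] for block in combo], loss, latency)
    return best


choice, loss, latency = select_blocks(CANDIDATES, latency_budget=9.0)
print(choice, round(loss, 3), latency)
```

Enumeration is exponential in the number of layers, which is why a solver-based formulation is the practical choice at the scale of a real model. The toy version only makes the objective and constraint explicit.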
Boosting Large-Model Communication Performance by 30%: DeepSeek Thanks Tencent for Its Network Acceleration Contribution
Shen Zhen Shang Bao· 2025-05-11 22:32
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, improving performance by 100% on RoCE networks and by 30% on InfiniBand (IB) networks, which enables more efficient training of large AI models [2][3]
- The optimization targets two bottlenecks in the original DeepEP framework, bandwidth utilization and CPU control latency, both of which had limited its broader adoption [2][3]

Group 1
- Topology-aware multi-QP (queue pair) chaining allocates bandwidth intelligently, so that both ports of a dual-port network card are fully utilized and no bandwidth is wasted [3]
- Tencent removed the CPU control bottleneck in GPU communication by moving control-plane operations off the CPU path, which reduces latency and energy consumption [3]
- A new "QP internal sequencing lock" mechanism ensures that data is transmitted accurately and in order among multiple GPUs, even with more than 1,000 simultaneous data transfer tasks [3]

Group 2
- The optimized DeepEP framework has been fully open-sourced and applied in training and inference for Tencent's Hunyuan large model, where it has shown strong versatility in high-performance environments built on Tencent's Xingmai network and H20 servers [3]
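The summary above does not describe how the multi-QP chaining or the sequencing lock work internally. As a rough mental model only, the toy simulation below shows two generic ideas that the description suggests: spreading queue pairs evenly across the two ports of a NIC, and releasing data on each QP strictly by sequence number even when it arrives out of order. The names `assign_port` and `InOrderReceiver`, the round-robin policy, and the reorder buffer are assumptions made for illustration; they are not Tencent's or DeepEP's implementation, which operates at the NIC and GPU level.

```python
from collections import defaultdict

NUM_PORTS = 2  # dual-port NIC, as mentioned in the summary
NUM_QPS = 8    # hypothetical number of queue pairs


def assign_port(qp_id, num_ports=NUM_PORTS):
    """Spread queue pairs evenly across ports (round-robin stand-in)."""
    return qp_id % num_ports


class InOrderReceiver:
    """Toy reorder buffer: releases messages on each QP by sequence number."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # next expected sequence per QP
        self.pending = defaultdict(dict)    # out-of-order arrivals per QP
        self.delivered = defaultdict(list)  # payloads released in order

    def on_arrival(self, qp_id, seq, payload):
        self.pending[qp_id][seq] = payload
        # Drain every message that is now contiguous with what was delivered.
        while self.next_seq[qp_id] in self.pending[qp_id]:
            s = self.next_seq[qp_id]
            self.delivered[qp_id].append(self.pending[qp_id].pop(s))
            self.next_seq[qp_id] += 1


# 1,024 transfers spread over the QPs, arriving in fully reversed order.
transfers = [(i % NUM_QPS, i // NUM_QPS, f"chunk-{i}") for i in range(1024)]
receiver = InOrderReceiver()
for qp_id, seq, payload in reversed(transfers):
    receiver.on_arrival(qp_id, seq, payload)

# Number of QPs mapped to each port.
port_load = [
    sum(1 for q in range(NUM_QPS) if assign_port(q) == p)
    for p in range(NUM_PORTS)
]
print(port_load, receiver.delivered[0][:3])
```

Reversed arrival is the worst case for this buffer: nothing is released on a QP until its first message shows up, after which everything drains at once in the original order.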
AI Weekly | xAI Valuation Could Exceed $120 Billion After New Funding Round; OpenAI Restructuring Plan Changes
Di Yi Cai Jing Zi Xun· 2025-05-11 01:39
Group 1: xAI Financing
- xAI, the AI startup founded by Elon Musk, is negotiating a new round of financing at a potential valuation exceeding $120 billion (approximately 868.8 billion RMB) [1]
- Investors are considering injecting $20 billion into xAI, although the specific amount may change as negotiations progress [1]
- If successful, this would be the second-largest startup funding round in history, behind OpenAI's $40 billion round earlier this year, which valued OpenAI at $300 billion (approximately 2.17 trillion RMB) [1]

Group 2: OpenAI Restructuring
- OpenAI announced that it will remain under the control of a non-profit organization, retracting a previous restructuring plan that would have shifted control to a for-profit entity [2]
- The for-profit LLC will become a Public Benefit Corporation (PBC), allowing it to pursue profit while also serving a social mission [2]
- Under the new structure, investors and employees can hold common stock with no cap on appreciation, which should ease future fundraising [2]

Group 3: AI Programming Unicorn
- Anysphere, the developer of the AI programming tool Cursor, completed a $900 million funding round, bringing its valuation to approximately $9 billion [5][6]
- The round was led by Thrive Capital, with participation from notable investors such as a16z and Accel [5]
- Cursor is recognized as one of the most popular AI tools in the programming sector, reflecting growing interest in AI programming applications [6]

Group 4: Google Market Value Drop
- Google's parent company Alphabet lost nearly $150 billion in market value after Apple announced plans to introduce AI features in its Safari browser [4]
- Alphabet's stock price fell more than 7% following the announcement, highlighting the competitive threat that AI poses to traditional search engines [4]
- Integrating AI into search is becoming a significant trend, with major players such as Apple and OpenAI actively pursuing it [4]

Group 5: Tencent's Video Generation Tool
- Tencent's Hunyuan team released and open-sourced HunyuanCustom, a new multimodal video generation tool that significantly outperforms existing solutions [8]
- The tool accepts multiple input modalities, including text, images, audio, and video, to generate videos [8]
- The release is part of a broader trend of open-source video generation models competing with proprietary tools [8]

Group 6: Humanoid Robot Developments
- Several humanoid robot manufacturers have updated their products, showcasing advances in mobility and control [9]
- The CL-3 humanoid robot by Zhijidongli has 31 degrees of freedom, enabling it to perform human-like movements [9]
- Upcoming events such as the World Humanoid Robot Sports Competition reflect the ongoing evolution of humanoid robots [9]