AI News Roundup: Meta Open-Sources Llama 4, OpenAI Launches the Pioneers Program
- The report introduces the Llama 4 model series, which includes Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth, highlighting their advanced multimodal capabilities and the efficiency gained from the MoE (Mixture of Experts) architecture[10][11][12]
- Llama 4 Scout features 16 experts with 17 billion activated parameters, supports a 10M-token context window, and is optimized for single-H100-GPU deployment, reporting state-of-the-art (SOTA) results across a range of benchmarks[11][12]
- Llama 4 Maverick employs 128 routed experts plus a shared expert, activating only a subset of its total parameters per token at inference time, which lowers serving cost and latency (a minimal MoE sketch follows this list). Its post-training pipeline combines lightweight SFT, online RL, and DPO to balance reasoning ability with conversational quality[12][14]
- The CoDA method mitigates hallucination in large language models (LLMs) by using mutual-information calculations to identify overshadowed knowledge and suppress dominant knowledge biases; it markedly improves factual accuracy on datasets such as MemoTrap, NQ-Swap, and Overshadow[23][25][29] (a PMI-style contrast is sketched after this list)
- The KG-SFT framework enhances knowledge manipulation in LLMs by integrating external knowledge graphs. It comprises an Extractor (NER and BM25 for entity and triple extraction), a Generator (HITS algorithm for producing explanatory text), and a Detector (NLI models for catching knowledge conflicts). KG-SFT performs especially well in low-data scenarios, with a 14% accuracy improvement on English datasets[45][47][52] (a toy Extractor retrieval step is shown below)
- DeepCoder-14B-Preview, an open-source code reasoning model, reaches competitive performance with only 14 billion parameters. It relies on GRPO+ for stable training, iterative context-length extension, and the verl-pipeline for efficient reinforcement learning (the group-relative advantage at the core of GRPO is sketched below). The model achieves 60.6% Pass@1 on LiveCodeBench and a Codeforces rating of 1936, placing it at the 95.3 percentile[53][61][64]
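A minimal sketch of the MoE pattern the Llama 4 bullets describe: a pool of routed experts plus one always-on shared expert, with a router picking a single expert per token so only a fraction of the total parameters is active at inference time. The layer sizes, the top-1 routing rule, and all class and variable names below are illustrative assumptions, not Meta's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    """Toy mixture-of-experts feed-forward block: shared expert + top-1 routed experts."""

    def __init__(self, d_model=512, d_ff=2048, n_routed=8):
        super().__init__()
        ffn = lambda: nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.shared = ffn()                                    # always applied to every token
        self.experts = nn.ModuleList(ffn() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)             # routing logits per token

    def forward(self, x):                                      # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        top_p, top_i = gate.max(dim=-1)                        # top-1 expert per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                                  # tokens assigned to expert e
            if mask.any():
                routed[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        # Only one routed expert runs per token, so activated parameters stay a small
        # fraction of the total -- the efficiency property the summary attributes to MoE.
        return self.shared(x) + routed

y = MoEBlock()(torch.randn(2, 16, 512))   # output shape: (2, 16, 512)
```

Top-1 routing is the simplest case; production MoE layers usually add load-balancing losses and expert capacity limits, which are omitted here.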
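The CoDA bullet mentions mutual-information calculations used to surface knowledge that the model's dominant parametric knowledge overshadows. The snippet below illustrates only the general contrast-and-reweight idea at decoding time, using a pointwise mutual information (PMI) term between context-conditioned and context-free next-token distributions; the `alpha` weight and the scoring rule are assumptions for illustration and do not reproduce the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_next_token_scores(logits_with_ctx, logits_no_ctx, alpha=1.0):
    """Reweight next-token scores with a PMI-style contrast (illustrative, not CoDA's exact rule).

    logits_with_ctx: (vocab,) logits conditioned on the question plus the evidence context
    logits_no_ctx:   (vocab,) logits conditioned on the question alone
    """
    logp_ctx = F.log_softmax(logits_with_ctx, dim=-1)
    logp_prior = F.log_softmax(logits_no_ctx, dim=-1)
    # PMI(token; context) = log p(token | ctx) - log p(token): positive when the context
    # supports the token, negative when only the prior (dominant knowledge) drives it.
    pmi = logp_ctx - logp_prior
    return logp_ctx + alpha * pmi   # adjusted scores; take argmax or sample from their softmax
```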
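KG-SFT's Extractor is described as combining NER with BM25 to pull relevant entities and triples from a knowledge graph. Below is a toy retrieval step under that description: the hand-written triple store, the entities passed in directly instead of coming from an NER model, and the use of the rank_bm25 package are all assumptions for illustration, not the framework's actual components.

```python
from rank_bm25 import BM25Okapi

# Toy knowledge graph: (head, relation, tail) triples verbalized as plain text.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "inhibits", "cyclooxygenase"),
    ("ibuprofen", "treats", "inflammation"),
]
docs = [" ".join(t) for t in triples]
bm25 = BM25Okapi([d.split() for d in docs])

def extract_support(question, entities, top_n=2):
    """Retrieve triples relevant to the recognized entities in the question."""
    # In KG-SFT the entities would come from an NER model; here they are supplied directly.
    query = (question + " " + " ".join(entities)).lower().split()
    return bm25.get_top_n(query, docs, n=top_n)

print(extract_support("What does aspirin treat?", ["aspirin"]))
```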
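GRPO+ (used to train DeepCoder-14B-Preview) builds on GRPO, whose defining step is a group-relative advantage: several completions are sampled per prompt and each completion's reward is normalized against its own group rather than a learned value function. The snippet shows only that baseline normalization step; GRPO+'s specific stability modifications are not reproduced here.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: (groups, samples_per_group) array of scalar rewards (e.g. 1.0 if all unit tests pass)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # Advantage used to weight the policy-gradient loss; eps avoids division by zero
    # when every sample in a group receives the same reward.
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled programs each, binary test-pass rewards.
print(group_relative_advantages([[1, 0, 0, 1], [0, 0, 0, 1]]))
```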