Meta's New Models Are on the Way, but Who Takes the Fall for Llama 4? A Joint Report From Over 1,300 Authors Arrives

Core Insights
- Meta's newly established AI team delivered its first key models internally this month; CTO Andrew Bosworth described the models as "very good" [1]
- The company is developing a text AI model codenamed Avocado, expected to be released in Q1, and an image-and-video AI model codenamed Mango [1]
- A technical report titled "Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes" has been uploaded to arXiv, reviewing the data and technical achievements claimed for the Meta Llama 4 series [1][5]

Summary by Sections

Technical Report Overview
- The report credits more than 1,300 authors, reflecting a collective effort by the Llama 4 team, even though some contributors have since left Meta [4]
- It stresses that the document is an independent investigation of publicly available materials, with benchmark values attributed to model cards [4]

Model Performance and Limitations
- The report highlights a gap between the models' architectural capabilities and their actual deployment performance, particularly with respect to context length [4][7]
- While the architecture supports a context length of 10 million tokens, practical deployments often cap it far lower because of hardware constraints [7]; the sketch at the end of this summary shows why

Controversies and Criticisms
- The report addresses criticism of the Llama 4 series, in particular the discrepancies between leaderboard performance and real-world behavior [8][11]
- It notes that the experimental variant submitted to the LMArena leaderboard differed from the publicly released version, prompting accusations of "gaming AI benchmarks" [11]
- It distinguishes marketing claims made in announcements from rigorous model-card benchmark results, categorizing some statements as "marketing-facing claims" [11]

Model Variants and Features
- The report summarizes the released model variants, Llama 4 Scout and Llama 4 Maverick, detailing their architectures, active parameter counts, modalities, and supported languages [9][10]; a brief active-versus-total parameter sketch follows at the end
- It also covers the training disclosures and deployment limitations observed in major serving environments [12]
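To make the context-length hardware constraint concrete, here is a back-of-envelope KV-cache estimate. The layer count, KV-head count, and head dimension below are hypothetical placeholders rather than the published Llama 4 configuration; only the scaling argument matters.

```python
# Rough KV-cache sizing: why a 10M-token context, though supported by the
# architecture, is hard to serve in practice. Model dimensions here are
# hypothetical placeholders, NOT the published Llama 4 configuration.

def kv_cache_bytes(seq_len: int,
                   num_layers: int = 48,      # hypothetical layer count
                   num_kv_heads: int = 8,     # hypothetical GQA KV heads
                   head_dim: int = 128,       # hypothetical head dimension
                   bytes_per_elem: int = 2) -> int:  # bf16/fp16 storage
    """Bytes needed for the K and V tensors of one sequence."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

for tokens in (128_000, 1_000_000, 10_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>12,} tokens -> {gib:8,.0f} GiB of KV cache")
```

Under these toy dimensions, a single 10M-token sequence needs roughly 1.8 TiB of KV cache, several times the combined HBM of a typical 8x80GB GPU node, which is consistent with the report's observation that served context windows fall far short of the architectural maximum.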
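On the variants themselves, the headline figures reported at release were 17B active parameters for both models, with Scout using 16 experts (roughly 109B total parameters) and Maverick 128 experts (roughly 400B total). A minimal sketch of the active-versus-total distinction that mixture-of-experts designs rely on, using those reported figures:

```python
# Active vs. total parameters in a mixture-of-experts model: per token,
# only the shared weights plus the routed expert(s) execute, so compute
# tracks the active count while memory tracks the total. Figures are the
# publicly reported Llama 4 headline numbers.

variants = {
    # name: (active_params, total_params, num_experts)
    "Llama 4 Scout":    (17e9, 109e9, 16),
    "Llama 4 Maverick": (17e9, 400e9, 128),
}

for name, (active, total, experts) in variants.items():
    print(f"{name}: {active / 1e9:.0f}B active of {total / 1e9:.0f}B total "
          f"({experts} experts, {active / total:.1%} of weights used per token)")
```

Both models therefore cost roughly the same compute per token despite very different memory footprints, one reason deployment behavior can diverge across serving environments.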