He Xiaopeng on Open Source: Moving Forward Is What Matters Most
Xin Lang Ke Ji· 2025-11-05 10:17
Core Insights
- XPeng Motors CEO He Xiaopeng emphasized the importance of open-source technology, comparing it to initiatives by Meta and Alibaba, and expressed a commitment to collaboration within the industry [1]

Group 1: Open Source Strategy
- XPeng Motors has decided to open-source its SDK, aiming to enhance collaboration and innovation within the automotive industry [1]
- The company believes that successful operations require strong capabilities in core technology, computing power, data management, engineering, and customer-satisfaction metrics such as NPS and ENPS [1]

Group 2: Financial Commitment
- XPeng Motors invests nearly 10 billion yuan (RMB) in R&D annually, reflecting a long-term commitment to technological advancement over its 11 years of operation [1]
- The CEO expressed a desire for more partnerships, including with major players like Volkswagen, to drive the industry into a new phase of development [1]
Confirmed: More GPUs Mean Higher Paper Acceptance Rates and More Citations
机器之心· 2025-10-17 08:12
Core Insights
- The article discusses the significant advancements in the AI field over the past three years, primarily driven by the development of foundational models, which require substantial data, computational power, and human resources [2][4]

Resource Allocation and Research Impact
- The relationship between hardware resources and the publication of top-tier AI/ML conference papers has been analyzed, focusing on GPU availability and TFLOPs [4][5]
- A total of 5,889 foundational-model-related papers were identified, revealing that stronger GPU acquisition capability correlates with higher acceptance rates and citation counts across eight leading conferences [5][9]

Research Methodology
- The study collected structured information from 34,828 accepted papers between 2022 and 2024, identifying 5,889 related to foundational models through keyword searches [8][11]
- A survey of 229 authors covering 312 papers indicated a lack of transparency in GPU usage reporting, highlighting the need for standardized resource disclosure [9][11]

Growth of Foundational Model Research
- From 2022 to 2024, foundational model research saw explosive growth, with the proportion of related papers at top AI conferences rising significantly [18][19]
- In NLP conferences, foundational model papers have outpaced those in general machine learning conferences [22]

Research Contributions by Academia and Industry
- Academic institutions contributed more papers overall, while top industrial labs excelled in single-institution output, with Google and Microsoft leading in paper production [29][32]
- Research efficiency is comparable between academia and industry, with industry researchers publishing an average of 8.72 papers and academics 7.93 [31]

Open Source Models and GPU Usage
- Open-source models, particularly the LLaMA series, have become the predominant choice in research, favored for their flexibility and accessibility [35][37]
- The NVIDIA A100 is the most widely used GPU in foundational model research, with GPU resources notably concentrated among a few institutions [38][39]

Funding Sources and Research Focus
- Government funding is the primary source for foundational model research, with 85.5% of papers receiving government support [41][42]
- The focus of research has shifted toward algorithm development and inference, with a significant share of papers dedicated to these areas [42]

Computational Resources and Research Output
- Total computational power, measured in TFLOPs, correlates more strongly with research output and citation impact than the sheer number of GPUs used [44][45] (a minimal analysis sketch follows below)
- While more resources can improve acceptance rates, the quality and novelty of the research remain critical factors in the review process [47]
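To make the kind of resource-vs-impact analysis described above concrete, here is a minimal sketch of how such a correlation study could be run. The file `papers.csv` and its column names are hypothetical stand-ins for the study's per-paper records, not the authors' actual dataset or code.

```python
# A minimal sketch of the resource-vs-impact analysis described above.
# "papers.csv" and its columns (num_gpus, total_tflops, accepted,
# citations) are hypothetical stand-ins, not the study's real data.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("papers.csv")

# Rank correlation of compute with citation impact: the paper reports that
# total TFLOPs tracks impact more closely than raw GPU count.
rho_tflops, p_tflops = spearmanr(df["total_tflops"], df["citations"])
rho_gpus, p_gpus = spearmanr(df["num_gpus"], df["citations"])
print(f"TFLOPs  vs citations: rho={rho_tflops:.2f} (p={p_tflops:.3g})")
print(f"GPU cnt vs citations: rho={rho_gpus:.2f} (p={p_gpus:.3g})")

# Acceptance rate by compute quartile: do better-resourced papers
# get accepted more often?
df["quartile"] = pd.qcut(df["total_tflops"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
print(df.groupby("quartile", observed=True)["accepted"].mean())
```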
X @Avi Chawla
Avi Chawla· 2025-10-12 19:29
Core Problem of Traditional RAG
- Most retrieved chunks in traditional RAG setups do not effectively aid the LLM, leading to increased computational costs, latency, and context processing [1][5]
- Classic RAG involves fetching similar chunks from a vector database and directly inputting the retrieved context into the LLM [5]

REFRAG Solution by Meta AI
- Meta AI's REFRAG introduces a novel approach by compressing and filtering context at the vector level, focusing on relevance [1][2]
- REFRAG employs chunk compression, an RL-trained relevance policy, and selective expansion to process only essential information [2]
- The process involves encoding documents, finding relevant chunks, using the relevance policy to select chunks for expansion, and concatenating token-level representations [3][4] (an illustrative sketch follows below)

Performance Metrics of REFRAG
- Outperforms LLaMA on 16 RAG benchmarks, demonstrating enhanced performance [5][7]
- Achieves 30.85x faster time-to-first-token, significantly improving processing speed [5][7]
- Handles 16x larger context windows, allowing for more extensive information processing [5][7]
- Utilizes 2-4x fewer tokens, reducing computational resource consumption [5][7]
- Incurs no accuracy loss across RAG, summarization, and multi-turn conversation tasks [7]
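The compress-then-selectively-expand pipeline can be illustrated with a toy sketch. This is not Meta's implementation: the random chunk encoder, the cosine-similarity stand-in for the RL-trained relevance policy, and the `expand_top_k` cutoff are all placeholder assumptions for exposition.

```python
# Illustrative sketch of REFRAG's compress-and-selectively-expand idea.
# NOT Meta's implementation: encode_chunk, the cosine-similarity policy,
# and expand_top_k are placeholder assumptions for exposition.
import numpy as np

def encode_chunk(text: str) -> np.ndarray:
    # Placeholder: a real system uses a trained encoder that maps a chunk
    # to a single dense vector (one "token" worth of LLM input).
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

def relevance(query_vec: np.ndarray, chunk_vec: np.ndarray) -> float:
    # Stand-in for the RL-trained relevance policy: cosine similarity.
    denom = np.linalg.norm(query_vec) * np.linalg.norm(chunk_vec) + 1e-9
    return float(query_vec @ chunk_vec / denom)

def build_context(query: str, chunks: list[str], expand_top_k: int = 2) -> list:
    """Expand the most relevant chunks to tokens; keep the rest compressed."""
    q = encode_chunk(query)
    ranked = sorted(chunks, key=lambda c: relevance(q, encode_chunk(c)), reverse=True)
    context = []
    for rank, chunk in enumerate(ranked):
        if rank < expand_top_k:
            context.append(("tokens", chunk))                   # full token sequence
        else:
            context.append(("embedding", encode_chunk(chunk)))  # one vector only
    return context

ctx = build_context("What did REFRAG change?", ["chunk A ...", "chunk B ...", "chunk C ..."])
for kind, payload in ctx:
    print(kind, payload if kind == "tokens" else payload.shape)
```

Because low-relevance chunks enter the LLM as single vectors rather than full token sequences, the model processes far fewer tokens per query, which is the source of the time-to-first-token and context-window gains claimed above.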
X @Avi Chawla
Avi Chawla· 2025-10-12 06:31
Researchers from Meta built a new RAG approach that:
- outperforms LLaMA on 16 RAG benchmarks
- has 30.85x faster time-to-first-token
- handles 16x larger context windows
- and utilizes 2-4x fewer tokens

Here's the core problem with a typical RAG setup that Meta solves: most of what we retrieve in RAG setups never actually helps the LLM.

In classic RAG, when a query arrives:
- You encode it into a vector.
- Fetch similar chunks from the vector DB.
- Dump the retrieved context into the LLM.

It typically works, but a ...
From a $1,600 Single GPU to a $4.5 Million Annual Fee: How Much Does Deploying a Large Model Actually Cost?
锦秋集· 2025-10-05 11:54
Core Insights
- The article discusses the significant cost disparities between local deployment of AI models and subscription-based commercial APIs, highlighting the need for a clear cost-analysis framework for businesses considering generative AI integration [1][2][5]

Cost Analysis Framework
- A systematic cost-analysis framework has been developed to compare the total cost of ownership (TCO) of local deployment (hardware, electricity) against commercial APIs (subscription fees) [2][5]
- The framework includes an online cost-estimation tool tailored to different business sizes, allowing companies to analyze their specific workloads [2][3]

Local Deployment Costs
- Local deployment costs vary by model size: small models (e.g., EXAONE 4.0 32B) can run on a single RTX 5090 GPU (approximately $2,000) with monthly electricity costs of $13.20; medium models (e.g., Llama-3.3-70B) require one A100 GPU ($15,000) with monthly electricity costs of $7.92; large models (e.g., Qwen3-235B) need four A100 GPUs ($60,000) with monthly electricity costs of $31.68 [2][3][21]
- Hardware accounts for over 90% of the initial investment in local deployment [2]

Commercial API Costs
- Commercial APIs charge by token usage, with large price differences: a high-end service like Claude-4 Opus charges $15 per 1 million input tokens and $75 per 1 million output tokens, while a cost-effective option like GPT-5 charges $1.25 for input and $10 for output [2][20]
- At 50 million tokens processed per month, the annual cost of a high-end service can exceed $4.5 million, while a cost-effective option may cost only $375,000 [2]

Break-even Analysis
- Break-even periods vary widely: small models can break even in as little as 0.3 months against high-end commercial APIs, medium models take 2.3 to 34 months, and large models 3.5 to 108 months [2][3] (a worked example follows below)
- A processing volume of 50 million tokens per month is the critical threshold for the economic viability of large-model local deployment [2]

Market Context
- The rapid development of LLMs has increased interest in local deployment, driven by concerns over data privacy, vendor lock-in, and the long-term operating costs of commercial APIs [5][7]
- The article emphasizes the growing feasibility of local deployment for small and medium enterprises, driven by advances in open-source models and hardware [12][50]

Strategic Decision Framework
- The research categorizes deployment scenarios into three types: quick return on investment (0-6 months), long-term investment (6-24 months), and economically unfeasible (over 24 months), aiding organizations in making informed decisions [49][50]
- The findings suggest that local deployment is less straightforward than often assumed, with many factors influencing the economic viability of a given deployment strategy [48][52]
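The break-even logic reduces to simple arithmetic: local deployment pays off once cumulative API spending exceeds the hardware investment plus ongoing electricity. A hedged sketch follows, using the article's unit prices; the helper names and the 25M-input/25M-output monthly workload split are illustrative assumptions, not figures from the study.

```python
# Hedged sketch of the article's TCO comparison. The helper names and the
# 25M-in/25M-out monthly workload split are illustrative assumptions.
def monthly_api_cost(m_in: float, m_out: float,
                     usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Commercial API bill for one month, priced per million tokens."""
    return m_in * usd_per_m_in + m_out * usd_per_m_out

def breakeven_months(hardware_usd: float, electricity_usd_mo: float,
                     api_usd_mo: float) -> float:
    """Months until cumulative API spend exceeds local hardware + power."""
    saving = api_usd_mo - electricity_usd_mo
    return hardware_usd / saving if saving > 0 else float("inf")

# Claude-4 Opus prices from the article ($15 / $75 per 1M input / output
# tokens), against a hypothetical 50M-token month split evenly.
api = monthly_api_cost(25, 25, 15, 75)          # $2,250 per month
print(breakeven_months(2_000, 13.20, api))      # 1x RTX 5090 small-model rig
print(breakeven_months(60_000, 31.68, api))     # 4x A100 large-model rig
```

With these illustrative numbers the single-GPU setup recoups its cost in under a month, while the 4x A100 setup needs roughly 27 months, consistent with the wide break-even ranges reported above.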
Reviewing the AI Industry's 14th Five-Year Plan and Looking Ahead to the 15th: The AI Factorization Leap amid "Two Great Changes"
Sou Hu Cai Jing· 2025-09-26 17:47
Core Insights
- The report reviews the development and trends of the AI industry during China's 14th Five-Year Plan (2021-2025) and the outlook for the 15th Five-Year Plan (2026-2030), highlighting significant changes and advancements in technology, industry ecology, policy support, and application expansion [2][8]

Group 1: 14th Five-Year Plan Review
- The AI industry underwent five major qualitative changes, establishing the foundation for "factorization" [9]
- Technological transformation is marked by the dominance of the Transformer architecture, which unified AIGC (AI-generated content) and completed the "engine convergence" [12][19]
- The computing power landscape shifted, with domestic AI chips closing the efficiency gap with international counterparts and general IDCs (Internet Data Centers) evolving into AIDCs (AI Data Centers) [25][26]
- Data transitioned from governmental sharing to recognition as a fiscal element, with mechanisms for asset inclusion and revenue sharing being established [33][34]
- Market dynamics changed: the end of the visual dividend pushed both the supply and payment curves downward, allowing a revaluation of AI [10][12]

Group 2: 15th Five-Year Plan Outlook
- The AI factorization leap will be characterized by "price discovery, scale trading, and cross-border output," with Agents as the core vehicle [9]
- Products will shift from passive execution to autonomous collaboration, with revenue models evolving from token-based billing to profit sharing [9][10]
- On the supply side, a complete domestic ecosystem will enable the definition of "Agent instruction sets" and the achievement of pricing power [9][10]
- Demand will expand into Global South markets, with significant population potential and a projected 9.2% compound annual growth rate for the digital economy [9][10]
- Five key application scenarios are expected to expand iteratively, transitioning from project-based to subscription-based consumption [9][10]

Group 3: Investment Recommendations
- Investment opportunities are identified in four main areas: computing power infrastructure, AI Agents and MaaS (Model-as-a-Service) providers, intelligent terminals and embodied intelligent robots, and AI applications in green and low-carbon initiatives [9][10]
Latest Survey: A Comprehensive Review of Diffusion Language Models
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint
- The article discusses the competition between two major paradigms in generative AI, diffusion models and autoregressive (AR) models, highlighting the emergence of Diffusion Language Models (DLMs) as a potential breakthrough in the field of large language models [2][3]

Group 1: DLM Advantages Over AR Models
- DLMs offer parallel generation capabilities, improving inference speed by up to tenfold over AR models, which are limited by token-level serial processing [11][12]
- DLMs utilize bidirectional context, enhancing language understanding and generation control and allowing finer adjustment of output characteristics such as sentiment and structure [12][14]
- The iterative denoising mechanism of DLMs allows corrections during the generation process, reducing the accumulation of early errors, a known limitation of AR models [13]
- DLMs are naturally suited to multimodal applications, enabling the integration of text and visual data without separate modules and enhancing the quality of joint generation tasks [14]

Group 2: Technical Landscape of DLMs
- DLMs fall into three paradigms: continuous-space DLMs, discrete-space DLMs, and hybrid AR-DLMs, each with distinct advantages and applications [15][20]
- Continuous-space DLMs leverage established diffusion techniques from image models but may suffer semantic loss during the embedding process [20]
- Discrete-space DLMs operate directly at the token level, maintaining semantic integrity and simplifying inference; they are the mainstream approach in large-parameter models [21]
- Hybrid AR-DLMs combine the strengths of AR models and DLMs, balancing efficiency and quality for tasks requiring high coherence [22]

Group 3: Training and Inference Optimization
- DLMs use transfer learning to reduce training costs, for example initializing from AR models or image diffusion models, significantly lowering data requirements [30][31]
- Inference optimization proceeds along three main directions: parallel decoding, masking strategies, and efficiency techniques, all aimed at enhancing speed and quality [35][38]
- Techniques such as confidence-aware decoding and dynamic masking are highlighted as key innovations for improving output quality while maintaining high inference speed [38][39] (a toy decoding sketch follows below)

Group 4: Multimodal Applications and Industry Impact
- DLMs are increasingly applied in multimodal contexts, unifying text and visual processing and strengthening capabilities in tasks like visual reasoning and joint content creation [44]
- Case studies demonstrate DLMs' effectiveness in high-value vertical applications such as code generation and computational biology, showcasing their potential in real-world scenarios [46]
- Applications range from real-time code generation to complex molecular design, indicating broad industrial utility [46][47]

Group 5: Challenges and Future Directions
- Key challenges include the trade-off between parallelism and performance, infrastructure limitations, and scalability gaps relative to AR models [49][53]
- Proposed future directions focus on improved training objectives, dedicated toolchains, and enhanced long-sequence processing [54][56]
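To give a feel for confidence-aware parallel decoding in a discrete (masked) DLM, here is a toy sketch. The model is a random stub; a real DLM would predict a vocabulary distribution for every masked slot, and the linear unmasking schedule here is an assumption chosen for illustration.

```python
# Toy sketch of confidence-aware parallel decoding in a discrete (masked)
# diffusion language model. The "model" is a random stub; a real DLM
# predicts a vocabulary distribution for every masked position.
import numpy as np

VOCAB, MASK, SEQ_LEN, STEPS = 100, -1, 12, 4
rng = np.random.default_rng(0)

def model_predict(tokens: np.ndarray):
    """Stub: per-position (token, confidence) predictions."""
    preds = rng.integers(0, VOCAB, size=tokens.shape)
    conf = rng.random(size=tokens.shape)
    return preds, conf

tokens = np.full(SEQ_LEN, MASK)
for step in range(STEPS):
    preds, conf = model_predict(tokens)
    masked = tokens == MASK
    # Unmask the most confident fraction of remaining positions in parallel,
    # instead of committing one token at a time as an AR model would.
    n_unmask = int(np.ceil(masked.sum() / (STEPS - step)))
    candidates = np.where(masked, conf, -np.inf)
    chosen = np.argsort(candidates)[-n_unmask:]
    tokens[chosen] = preds[chosen]
    print(f"step {step}: {tokens}")
```

Each denoising step commits the most confident subset of the remaining masked positions in parallel, which is where the claimed speedups over token-by-token AR decoding come from.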
2,396 Adult Films Pirated at $150,000 in Damages Each: Zuckerberg Is in Big Trouble! Meta Pirated Massive Numbers of Videos to Train AI
程序员的那些事· 2025-08-19 03:45
Core Viewpoint
- The lawsuit filed by adult-film giant Strike 3 Holdings against Meta highlights the issue of copyright infringement in the context of AI training, specifically the unauthorized use of adult film content to develop AI models [2][3]

Group 1: Lawsuit Details
- Strike 3 Holdings and Counterlife Media accuse Meta of systematically pirating 2,396 adult films since 2018 to train its AI models, a claim potentially worth $359 million (approximately 2.6 billion RMB) in damages [2][3][16]
- The case is significant as the first to address the use of adult film content in training video-generation AI, distinguishing it from earlier copyright disputes over text and images [2][3]

Group 2: Impact on the Industry
- The plaintiffs fear that Meta's AI could replicate their distinctive production style at a fraction of the cost, threatening the viability of traditional adult film studios that invest in high-quality production [5][16]
- The suit alleges that Meta exploited the "tit-for-tat" mechanism of the BT (BitTorrent) network to not only download but also redistribute pirated content, which can significantly boost download speeds [6][7][8]

Group 3: Evidence and Allegations
- The suit cites data from the plaintiffs' VXN Scan tracking system, indicating that 47 Facebook-registered IPs were involved in illegal distribution, with over 100,000 verified instances of infringement [10][12]
- Meta is accused of building a piracy network using "shadow data centers" and exhibiting non-human usage patterns, suggesting a deliberate strategy for collecting AI training data [11][12][14][15]

Group 4: Legal Proceedings and Reactions
- The plaintiffs are seeking a jury trial, asserting that Meta's actions constitute both direct and indirect copyright infringement [16]
- Meta has publicly denied the allegations, but the evidence presented by the plaintiffs is considered substantial, prompting speculation about a possible out-of-court settlement [18]
Baidu Changes Who Tells Its Story
Jing Ji Guan Cha Bao· 2025-08-12 02:51
Core Insights
- Baidu leads the domestic AI search industry with 322 million monthly active users, according to QuestMobile [2]
- The company announced a significant upgrade to its search intelligence framework, integrating multimodal tools such as AI writing and problem-solving capabilities [2][3]
- The shift in product presentation, featuring younger product managers, reflects a broader organizational restructuring aimed at enhancing transparency and user engagement [2][4][7]

Product Development and Communication
- Baidu's recent product launch emphasized a new communication approach in which product managers explain the logic behind AI-generated content directly to users [4][5]
- The team focused on three narrative pillars: the logic of generative search, fact-checking mechanisms, and AI tool integration capabilities [5]
- A collaborative environment was fostered, allowing user feedback to be incorporated into product iterations and moving away from a closed development model [6]

Industry Trends
- Other tech companies such as ByteDance and Alibaba are also turning to younger representatives for AI product presentations, indicating an industry-wide trend [3][8]
- Communication styles differ across companies, with some emphasizing technical detail and others strategic narrative [8][10]
- The choice of spokesperson reflects deeper organizational values around transparency, power distribution, and user relationships in the AI product landscape [11]
Musk: Tesla Is Training a New FSD Model; xAI Will Open-Source Grok 2 Next Week
Sou Hu Cai Jing· 2025-08-06 10:05
Core Insights
- Musk announced that his AI company xAI will open-source the code of its flagship chatbot Grok 2 next week, continuing its strategy of promoting transparency in the AI field [1][3]
- Grok 2 is built on Musk's proprietary Grok-1 language model and is positioned as a less filtered, more "truth-seeking" alternative to ChatGPT or Claude, with the ability to pull real-time data from the X platform [1][3]
- The chatbot offers multimodal capabilities, generating text, images, and video content, and is currently available to X Premium+ subscribers [3]

Group 1
- Grok 2's core competitive advantage lies in its deep integration with the X platform, allowing it to respond uniquely to breaking news and trending topics [3]
- Open-sourcing Grok 2 will enable developers and researchers to access its underlying code and architecture, facilitating review, modification, and further development based on this technology [3]
- This strategic move may strengthen Musk's business network and create integration possibilities among his companies, including Tesla, SpaceX, Neuralink, and X [3]

Group 2
- The decision to open-source Grok 2 aligns with the industry trend toward open-source AI models, positioning xAI as a counterweight to major AI companies such as OpenAI, Google, and Anthropic [4]
- However, Grok's relatively lenient content-restriction policies have previously sparked controversy, raising concerns that open-sourcing could amplify the associated risks [4]
- There are industry worries about misuse of the technology in sensitive areas such as medical diagnostics or autonomous driving systems, where failures could have severe consequences [4]