Large Language Models (LLM)

Believing That Large-Model Costs Will Fall Is the Industry's Biggest Hallucination
Founder Park· 2025-08-19 08:01
Core Viewpoint
- The belief among many AI entrepreneurs that model costs will decrease significantly is challenged by the reality that only older models see such reductions, while the best models maintain stable costs, impacting business models in the AI sector [6][20].

Group 1: Cost Dynamics
- The cost of models like GPT-3.5 has decreased to one-tenth of its previous price, yet profit margins have worsened, indicating a disconnect between cost reduction and market demand for the best models [14][20].
- Market demand consistently shifts to the latest state-of-the-art models, leading to a scenario where older, cheaper models are largely ignored [15][16].
- The expectation that costs will drop significantly while maintaining high-quality service is flawed, as the best models' costs remain relatively unchanged [20][21].

Group 2: Token Consumption
- The token consumption for tasks has increased dramatically, with AI models now requiring significantly more tokens for operations than before, leading to higher operational costs [24][26].
- Predictions suggest that as AI capabilities improve, the cost of running complex tasks will escalate, potentially reaching $72 per session by 2027, which is unsustainable under current subscription models [26][34].
- The increase in token consumption is likened to a situation where improved efficiency leads to higher overall resource usage, creating a liquidity squeeze for companies relying on fixed-rate subscriptions [27][34] (see the cost sketch after this summary).

Group 3: Business Model Challenges
- Companies are aware that usage-based pricing could alleviate financial pressures but hesitate to implement it due to competitive dynamics where fixed-rate models dominate [35][36].
- The industry faces a dilemma: adopting usage-based pricing could lead to stagnation in growth, as consumers prefer flat-rate subscriptions despite the potential for unexpected costs [39].
- Successful companies in the AI space are exploring alternative business models, such as vertical integration and using AI as a lead-in for other services, to capture value beyond just model usage [40][42].

Group 4: Future Outlook
- The article emphasizes the need for AI startups to rethink their strategies in light of the evolving landscape, suggesting that merely relying on the expectation of future cost reductions is insufficient for sustainable growth [44][45].
- The concept of becoming a "new cloud vendor" is proposed as a potential path forward, focusing on integrating AI capabilities with broader service offerings [45].
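The liquidity-squeeze argument in Group 2 comes down to simple arithmetic: the frontier model's per-token price can hold roughly steady while the number of tokens an agentic session burns grows by orders of magnitude. A minimal sketch with assumed token counts and per-token prices (illustrative figures, not the article's):

```python
# Back-of-the-envelope sketch of why agentic workloads squeeze flat-rate plans.
# Token counts and per-million-token prices are illustrative assumptions.
def session_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    return input_tokens / 1e6 * usd_per_m_input + output_tokens / 1e6 * usd_per_m_output

# A single chat turn vs. a long agent session that loops over tools and re-reads context.
chat = session_cost(2_000, 500, usd_per_m_input=3.0, usd_per_m_output=15.0)
agent = session_cost(4_000_000, 600_000, usd_per_m_input=3.0, usd_per_m_output=15.0)

print(f"single turn:   ${chat:.4f}")
print(f"agent session: ${agent:.2f}")  # ~$21 here; one session can exceed a $20/month flat plan
```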
Discrete Tokenization: A Key Cornerstone of Multimodal Large Models; First Systematic Survey Released
机器之心· 2025-08-05 18:56
Core Insights
- The article discusses the advancements in Discrete Tokenization for Multimodal Large Language Models (LLMs), emphasizing its role in transforming various modalities into discrete representations that LLMs can process effectively [2][39].
- A comprehensive survey has been released, detailing the technical landscape, challenges, and future research directions in the field of Discrete Tokenization for Multimodal LLMs [2][39].

Multimodal LLMs and Discrete Tokenization
- Recent breakthroughs in Large Language Models (LLMs) have led to their application in various text tasks, prompting interest in extending their capabilities to non-text modalities such as images, audio, and video [2].
- Discrete Tokenization has emerged as a key solution, utilizing techniques like Vector Quantization (VQ) to compress high-dimensional continuous inputs into compact discrete tokens, enhancing cross-modal understanding and generation [2][39] (see the VQ sketch after this summary).

Systematic Review and Methodologies
- The article presents the first systematic review of Discrete Tokenization for Multimodal LLMs, organizing content based on input data modalities and combinations, from early single-modal to multi-modal tokenization methods [2][39].
- Eight core categories of Vector Quantization methods are identified, including VQ, RVQ, PQ, AQ, FSQ, LFQ, BSQ, and Graph Anchor-Relation Tokenization, each with unique characteristics suitable for different modalities and tasks [8][9][14].

Challenges and Future Directions
- Key challenges in Discrete Tokenization include codebook collapse, information loss during quantization, difficulties in gradient propagation, and issues with granularity and semantic alignment [12][36].
- Future research directions may focus on adaptive quantization, unified frameworks, biologically inspired codebooks, cross-modal generalization, and enhancing interpretability [37][36].

Applications in Single and Multimodal Tasks
- Discrete Tokenization has been widely applied in single-modal tasks such as image retrieval, audio encoding, and video representation, allowing LLMs to process non-text modalities effectively [20][22].
- In multimodal tasks, it serves as a semantic bridge, enabling models to handle complex inputs across different modalities, facilitating tasks like cross-modal retrieval and generation [27][30].
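The survey's starting point, plain Vector Quantization, maps each continuous feature vector to the index of its nearest codebook entry; that index is the discrete token an LLM can consume. A minimal NumPy sketch under assumed codebook size and feature dimensions (not taken from the survey):

```python
# Minimal sketch of plain Vector Quantization (VQ), the first of the eight
# families listed above. Codebook size and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # 512 learnable code vectors of dimension 64

def quantize(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each continuous feature vector to its nearest codebook entry.

    Returns the discrete token ids (what an LLM would consume) and the
    quantized vectors (what a decoder would reconstruct from).
    """
    # Squared Euclidean distance between every feature and every code vector.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d.argmin(axis=1)              # one discrete token per input vector
    return ids, codebook[ids]

# e.g. a 16x16 grid of image-patch embeddings flattened to 256 vectors
patches = rng.normal(size=(256, 64))
token_ids, quantized = quantize(patches)
print(token_ids[:8])                    # the "visual tokens" fed to the LLM
```

The other families the survey catalogs refine this same lookup, for example by quantizing residuals in successive stages (RVQ) or splitting dimensions into independently quantized subspaces (PQ).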
NVIDIA's Latest Research: Small Models Are the Future of AI Agents
36Kr· 2025-08-05 09:45
Core Viewpoint
- Small Language Models (SLMs) are considered the future of AI agents, as they are more efficient and cost-effective compared to large language models (LLMs) [1][3].

Group 1: Advantages of SLMs
- SLMs are powerful enough to handle most repetitive and specialized tasks within AI agents [3] (see the routing sketch after this summary).
- They are inherently better suited for the architecture of agent systems, being flexible and easy to integrate [3].
- Economically, SLMs significantly reduce operational costs, making them a more efficient choice for AI applications [3].

Group 2: Market Potential
- The AI agent market is projected to grow from $5.2 billion in 2024 to $200 billion by 2034, with over half of enterprises already utilizing AI agents [5].
- Current AI agent tasks are often repetitive, such as "checking emails" and "generating reports," making the use of LLMs inefficient [5].

Group 3: SLM Characteristics
- SLMs can be deployed on standard consumer devices, such as smartphones and laptops, and have fast inference speeds [9].
- Models with fewer than 1 billion parameters are classified as SLMs, while larger models typically require cloud support [9].
- SLMs are likened to a "portable brain," balancing efficiency and ease of iteration, unlike LLMs, which are compared to "universe-level supercomputers" with high latency and costs [9].

Group 4: Performance Comparison
- Cutting-edge small models like Phi-3 and Hymba can perform tasks comparable to 30B to 70B large models while reducing computational load by 10-30 times [11].
- Real-world tests showed that 60% of tasks in MetaGPT, 40% in Open Operator, and 70% in Cradle could be replaced by SLMs [11].

Group 5: Barriers to Adoption
- The primary reason for the limited use of SLMs is path dependency, with significant investments (up to $57 billion) in centralized large-model infrastructure [12].
- There is a strong industry bias toward the belief that "bigger is better," which has hindered the exploration of small models [12].
- SLMs lack the marketing hype that large models like GPT-4 have received, leading to fewer attempts to explore more cost-effective options [13].
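The architectural claim in Group 1, that agents should reserve LLM calls for genuinely open-ended steps, reduces to a routing decision inside the agent loop. A hedged sketch with a hypothetical keyword heuristic and stub model backends (not NVIDIA's implementation):

```python
# Hedged sketch of a heterogeneous agent setup: route routine tool calls to a
# local SLM and escalate open-ended requests to a hosted LLM. Model names,
# the `complete` stubs, and the keyword heuristic are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    complete: Callable[[str], str]   # prompt -> completion

ROUTINE_HINTS = ("check email", "generate report", "extract fields", "fill template")

def route(task: str, slm: Model, llm: Model) -> str:
    """Send repetitive, well-scoped tasks to the cheap SLM; everything else to the LLM."""
    if any(hint in task.lower() for hint in ROUTINE_HINTS):
        return slm.complete(task)
    return llm.complete(task)

# Stand-in backends; in practice these would wrap a small on-device model and a cloud API.
slm = Model("local-slm", lambda p: f"[slm] {p}")
llm = Model("cloud-llm", lambda p: f"[llm] {p}")

print(route("Check email and extract fields into the CRM", slm, llm))
print(route("Draft a negotiation strategy for the vendor contract", slm, llm))
```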
EvaLearn: A New Evaluation Paradigm for the Second Half of AI!
机器之心· 2025-07-28 10:45
Core Viewpoint
- The article discusses the shift in AI research from "can it be done" to "is it effective," emphasizing the need for new evaluation methods that assess the long-term adaptability and learning capabilities of models, particularly in the context of achieving general artificial intelligence [1][4].

Group 1: New Evaluation Paradigm
- A new evaluation paradigm called EvaLearn has been proposed to assess the learning ability and efficiency of large language models (LLMs), providing a fresh perspective on understanding their human-like learning potential [5][6].
- EvaLearn focuses on "sequential problem-solving," redefining the evaluation logic for large language models, and has gained significant attention since its open-source release [6][8].

Group 2: Limitations of Traditional Benchmarks
- Traditional benchmarks treat problems as isolated samples, failing to evaluate models' learning efficiency and adaptability, which are crucial for understanding their performance [8][9].
- EvaLearn constructs 648 challenging problems organized into 182 sequences, requiring models to solve them in order, thus allowing for a systematic assessment of their learning capabilities [9][11].

Group 3: Key Findings from EvaLearn
- The research team found that models exhibit diverse learning abilities across different task types, with most models better leveraging prior experience for mathematical and logical reasoning tasks, while tasks like summarization rely more on pre-trained knowledge [14].
- Models based on chain-of-thought reasoning generally outperform those that are not, demonstrating better stability and the ability to solve multiple related problems consecutively [15].
- Feedback learning, which incorporates evaluations from a verifier, significantly enhances models' learning abilities and efficiency compared to example-based learning [16].
- Learning ability and efficiency metrics provide a comprehensive assessment of models' learning potential, revealing that high static performance does not guarantee superior learning capabilities [17].

Group 4: Evaluation Metrics
- EvaLearn employs a comprehensive set of evaluation metrics to characterize models' dynamic learning abilities, spanning summary accuracy, classification skills, information extraction, logical reasoning, mathematical reasoning, and sequence reasoning [20].
- Overall accuracy, learning speed, first correct position, consecutive correct answers, and post-warm-up accuracy are key indicators used to assess models' performance [21] (a sketch of these sequence-level metrics follows this summary).

Group 5: Learning Efficiency and Methods
- The study indicates significant differences in learning efficiency among models and task types, with non-thinking models often showing faster progress in experience accumulation, while thinking models demonstrate more stable gains [44].
- Different problem-solving methods, such as example learning and feedback learning, significantly impact model performance, with feedback learning generally yielding higher accuracy and learning efficiency [46][48].
- The average position of the first correct answer varies across models and tasks, highlighting the models' learning potential and the importance of feedback in enhancing learning outcomes [51][53].

Group 6: Conclusion
- EvaLearn represents a novel benchmark framework for sequentially evaluating models' learning abilities and efficiencies across various tasks, revealing significant performance differences among leading models [55][56].
- The findings underscore the importance of understanding models' learning capabilities and efficiencies as a new perspective for evaluating their performance and bridging the gap between current models and human capabilities [57].
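Because EvaLearn scores a model over an ordered sequence rather than on isolated problems, its headline indicators can be computed from a per-problem correctness vector. A minimal sketch under assumed metric definitions (the benchmark's exact formulas may differ):

```python
# Hedged sketch of the sequence-level indicators listed in Group 4, computed
# from one EvaLearn-style sequence. The warm-up length and the learning-speed
# definition are illustrative assumptions, not the benchmark's exact formulas.
def sequence_metrics(correct: list[bool], warmup: int = 3) -> dict[str, float]:
    """correct[i] is True if the model solved the i-th problem in the sequence."""
    n = len(correct)
    overall_acc = sum(correct) / n

    # Position of the first correct answer (1-indexed; n + 1 if never correct).
    first_correct = next((i + 1 for i, c in enumerate(correct) if c), n + 1)

    # Longest run of consecutive correct answers.
    best = run = 0
    for c in correct:
        run = run + 1 if c else 0
        best = max(best, run)

    # Accuracy after a warm-up prefix, compared against accuracy on that prefix.
    post_warmup_acc = sum(correct[warmup:]) / max(n - warmup, 1)
    learning_speed = post_warmup_acc - sum(correct[:warmup]) / warmup

    return {
        "overall_accuracy": overall_acc,
        "first_correct_position": first_correct,
        "max_consecutive_correct": best,
        "post_warmup_accuracy": post_warmup_acc,
        "learning_speed": learning_speed,
    }

print(sequence_metrics([False, False, True, True, True, False, True]))
```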
Li Yan: The Open Strategy and Hidden Subtleties of the U.S. "AI Action Plan"
Huan Qiu Wang Zi Xun· 2025-07-24 23:17
Group 1
- The core viewpoint of the article is that the U.S. government's "AI Action Plan" is a significant policy directive aimed at winning the AI competition of the 21st century through over 90 specific administrative measures focusing on technological innovation, application development, and international rule-making [1][2].
- The report reflects a strategic approach by the U.S. government, indicating a shift from internal regulatory preferences to a more open and collaborative stance in AI policy, aimed at fostering domestic innovation while maintaining competitiveness against other nations [2][3].
- The U.S. had already begun implementing actions in the AI sector prior to the formal announcement of the plan, including adjustments to chip export controls and international collaborations to expand its AI market presence [3][4].

Group 2
- The report includes intriguing elements such as the policy of "value-neutral" models, which aims to ensure that federal procurement of large language models (LLMs) remains objective and free from ideological bias, while also scrutinizing the influence of government oversight on Chinese AI models [4][5].
- The encouragement of open-source and open-weight models appears to support domestic startups and academia rather than fostering a global open-source AI ecosystem, indicating a pragmatic approach of "open at home, closed abroad" to maintain technological leadership [5].
- The effectiveness of the "AI Action Plan" in achieving its goals will depend on various internal and external factors, including the ability to unify regulatory approaches domestically and the willingness of other nations to accept a U.S.-led AI ecosystem [5].
ByteDance's 2026 Campus Recruitment Is Here! Plenty of Openings in Large-Model Algorithms, Multimodality, and CV
自动驾驶之心· 2025-07-22 01:47
Core Viewpoint
- ByteDance has opened its campus recruitment programs, including the Jindouyun Talent Program and the Top Seed Program, targeting different groups of doctoral students with varying focuses and application difficulties [1].

Group 1: Jindouyun Talent Program
- The Jindouyun Talent Program is aimed at doctoral students graduating between September 2022 and August 2026 for full-time positions, and those graduating in September 2025 and later for internship positions [2].
- The program has relaxed the recruitment restrictions for past graduates, allowing those who graduated in 2022 to apply [2].
- It covers eight major fields, including large model applications, search/recommendation/advertising, computer architecture, AI safety, hardware, AI coding, video architecture, and AIGC, balancing academic research with industrial application and supporting paper publication [2].

Group 2: Top Seed Program
- The Top Seed Program primarily targets doctoral students graduating in 2026 and also opens recruitment for research interns [3].
- It focuses on core technologies of large models, such as large language models (LLM), multimodal generation and understanding, machine learning algorithms, and speech [3].
- The goal of this program is to cultivate more top-tier talent, offering high compensation and computational support [3].

Group 3: Community and Resources
- The AutoRobo Knowledge Community is designed for job seekers in autonomous driving, embodied intelligence, and large models, currently with nearly 1,000 members from various companies [6][8].
- The community provides resources such as interview questions, industry reports, salary negotiation tips, and internal referrals [8][9].
- It also compiles a hundred interview questions related to autonomous driving and embodied intelligence, covering various technical aspects [12][13][17].

Group 4: Industry Reports and Insights
- The community offers in-depth industry reports to help members understand the current state, development trends, and market opportunities in various fields, including robotics and embodied intelligence [18].
- Reports include topics like the world robotics report, investment reports in embodied intelligence, and the development of humanoid robots [18].

Group 5: Interview Experiences and Tips
- The community shares successful and unsuccessful interview experiences across various companies and positions, providing insights into the interview process [20].
- It also compiles common interview questions and skills required for algorithm positions in the autonomous driving sector [25].
A Senior Labmate Published an Autonomous-Driving Large-Model Paper on His Own and Got into a TOP2 PhD Program...
自动驾驶之心· 2025-07-09 12:56
Core Viewpoint
- The article discusses the advancements in large models (LLMs) for autonomous driving, highlighting the need for optimization in efficiency, knowledge expansion, and reasoning capabilities as the technology matures [2][3].

Group 1: Development of Large Models
- Companies like Li Auto and Huawei are implementing their own VLA and VLM solutions, indicating a trend towards the practical application of large models in autonomous driving [2].
- The focus for the next generation of large models includes lightweight design, hardware adaptation, knowledge distillation, quantization acceleration, and efficient fine-tuning [2][3].

Group 2: Course Introduction
- A course is being offered to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [3].
- The course aims to address core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms like Chain-of-Thought (CoT) and reinforcement learning [3][4] (see the pruning sketch after this summary).

Group 3: Enrollment and Requirements
- The course will accept a maximum of 8 students per session, targeting individuals with a background in deep learning or machine learning who are familiar with Python and PyTorch [5][10].
- Participants will gain a systematic understanding of large model optimization, practical coding skills, and insights into academic writing and publication processes [8][10].

Group 4: Course Outcomes
- Students will learn to combine theoretical knowledge with practical coding, develop their own research ideas, and produce a draft of a research paper [8][9].
- The course includes a structured timeline with specific topics each week, covering model pruning, quantization, efficient fine-tuning, and advanced reasoning techniques [20].
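Of the optimization topics listed in Group 2, pruning is the most self-contained to illustrate. A hedged PyTorch sketch of global magnitude pruning on a toy model (illustrative only, not the course's code; the 50% sparsity target is an assumption):

```python
# Hedged sketch of global unstructured magnitude pruning with
# torch.nn.utils.prune. The toy model and sparsity level are illustrative.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect the weight tensors to prune across all Linear layers.
params_to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Zero out the 50% of weights with the smallest absolute value, model-wide.
prune.global_unstructured(params_to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

# Make the pruning permanent by folding the masks into the weights.
for module, name in params_to_prune:
    prune.remove(module, name)

zeros = sum((m.weight == 0).sum().item() for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"global weight sparsity: {zeros / total:.2%}")
```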
Exclusive Insight | How RAG Improves AI Accuracy
慧甚FactSet· 2025-06-10 05:12
Core Viewpoint
- The accuracy of data is crucial for financial services companies utilizing Generative AI (GenAI) and Large Language Models (LLMs), as inaccurate or low-quality data can adversely affect company strategy, operations, risk management, and compliance [1][3].

Group 1: Causes of Data Inaccuracy
- Data inaccuracy in the financial services sector often arises from multiple factors, including the increasing volume and variety of data sourced from multiple vendors, patents, and third-party sources [4].
- "Hallucination" is a significant challenge in the financial sector regarding Generative AI, where models generate coherent but factually incorrect or misleading information due to their reliance on learned patterns from training data without factual verification [4].

Group 2: Importance of Retrieval-Augmented Generation (RAG)
- RAG is a critical technology for improving the accuracy of Generative AI and significantly reducing hallucinations by grounding generated responses in real data [6].
- RAG combines the generative capabilities of LLMs with effective data retrieval systems, allowing for more accurate and contextually relevant answers, especially in financial risk assessments [6] (see the retrieval-then-generate sketch after this summary).
- RAG enhances the utilization of various data formats, enabling the processing of both structured and unstructured data efficiently, and connects existing legacy systems without the need for costly migrations or retraining of LLMs [7].

Group 3: Benefits of RAG
- RAG helps address the main causes of data inaccuracy discussed earlier, providing more accurate answers based on proprietary data and reducing hallucinations [8].
- It allows for the integration of the latest knowledge and user permission management, ensuring that responses are based on up-to-date information [8].
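The retrieval-then-generate loop described in Group 2 is straightforward to sketch: fetch the most relevant proprietary documents, then constrain the model to answer from them. A minimal sketch with a toy bag-of-words retriever and a stubbed model call (illustrative assumptions, not FactSet's implementation):

```python
# Minimal retrieval-augmented generation sketch. The bag-of-words retriever,
# sample documents, and `call_llm` stub are illustrative assumptions; a real
# system would use vector embeddings, permission filters, and an LLM endpoint.
from collections import Counter
import math

DOCUMENTS = [
    "Q2 credit risk report: exposure to commercial real estate rose 12%.",
    "Compliance memo: new disclosure rules take effect in October.",
    "Earnings call notes: management expects flat net interest margin.",
]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = Counter(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:           # stand-in for a hosted model call
    return f"[LLM answer grounded in a prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("What changed in credit risk exposure this quarter?"))
```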
Microsoft VP "Teaches a Course" on X, with Ongoing Posts Covering Everything About RL; a Must-Read for LLM Practitioners
机器之心· 2025-05-26 01:28
Core Viewpoint
- The article discusses the educational series on artificial intelligence initiated by Nando de Freitas, focusing on reinforcement learning (RL) and its applications in large language models (LLMs) [1][2].

Summary by Sections

Introduction to AI Education
- Nando de Freitas aims to educate readers on AI through a series of posts on X, starting with reinforcement learning and gradually covering diffusion and flow matching technologies [1][2].

Learning Types
- The article highlights that there is no ultimate conclusion on unsupervised learning, supervised learning, and reinforcement learning [8][19].
- Supervised learning is described as basic imitation, requiring high-quality expert data for effective learning [9].
- Reinforcement learning focuses on selective imitation, allowing agents to learn from suboptimal experiences and improve their performance [10][11].

Distributed Reinforcement Learning Systems
- Modern distributed RL systems consist of two main components, Actors and Learners: Actors interact with the environment and collect data, while Learners update the policy network based on this data [23][24].
- The importance of measuring operational durations and communication bandwidth in such systems is emphasized [24][27].

Offline Reinforcement Learning
- Offline RL has unique value in scenarios like post-training LLMs, where it can leverage historical data for learning [28][29].

Single-step and Multi-step RL
- The article differentiates between single-step and multi-step RL problems, with single-step focusing on immediate actions and multi-step involving planning over a series of interactions [35][39].
- The complexity of multi-step RL is noted, particularly in credit assignment issues where multiple decisions affect outcomes [40][41].

Policy Gradient and Techniques
- Policy gradient methods are discussed, including the use of baseline subtraction to reduce variance in reward signals [49][56].
- The article also covers the significance of KL divergence in maintaining proximity to supervised fine-tuning strategies during post-training [69].

Importance Sampling and PPO
- Importance sampling is introduced as a method to correct off-policy sample bias, with Proximal Policy Optimization (PPO) being a key technique to manage policy updates [73][78] (see the clipped-objective sketch after this summary).
- The integration of various techniques in training models like DeepSeek-R1 is highlighted, showcasing the complexity of modern RL systems [81].

Future Directions
- Freitas plans to expand the discussion from single-step to multi-step RL, indicating ongoing developments in the field [82].
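The ingredients highlighted above, an importance ratio with PPO's clipping and a KL term keeping the policy near the supervised fine-tuning model, fit in a few lines. A hedged PyTorch sketch with assumed shapes and coefficients (not the exact losses de Freitas derives in the series):

```python
# Hedged sketch of a clipped PPO surrogate plus a KL penalty toward a frozen
# reference (SFT) policy. Coefficients, shapes, and the simple KL estimate
# are illustrative assumptions.
import torch

def ppo_loss(logp_new, logp_old, logp_ref, advantages,
             clip_eps: float = 0.2, kl_coef: float = 0.05) -> torch.Tensor:
    """All inputs are per-sample tensors of equal shape.

    logp_new: log-probs under the policy being optimized
    logp_old: log-probs under the behavior policy that generated the samples
    logp_ref: log-probs under the frozen SFT/reference policy
    advantages: reward signal after baseline subtraction
    """
    # Importance ratio corrects for off-policy samples.
    ratio = torch.exp(logp_new - logp_old)

    # Clipped surrogate: take the pessimistic (minimum) of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Crude KL estimate keeping the policy close to the reference model.
    kl = (logp_new - logp_ref).mean()
    return policy_loss + kl_coef * kl

# Toy usage with random tensors standing in for a batch of sampled actions.
b = 8
logp_new = torch.randn(b, requires_grad=True)
loss = ppo_loss(logp_new, torch.randn(b), torch.randn(b), torch.randn(b))
loss.backward()
print(float(loss))
```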
LatePost Exclusive | ByteDance's AI R&D Reorganization Continues: Wu Yonghui's Direct Remit Expands, Three AI Lab Directions Merge into Seed
晚点LatePost· 2025-04-22 15:58
These successive adjustments all point to ByteDance further consolidating its AI R&D forces.

By Wang Yutong | Edited by Cheng Manqi

Since Wu Yonghui took over as head of Seed, ByteDance's AI R&D department, the Seed organization has been undergoing a series of adjustments.

Wu Yonghui's scope of direct management has expanded. According to sources, Qiao Mu, formerly head of Seed's large language model (LLM) group, has not appeared in the office for several days, and his work Feishu account has been deactivated. The three teams under LLM (Pre-train, Post-train, and Horizon) now report directly to Wu Yonghui.

Another change, about half a month earlier, was that three directions of ByteDance's AI Lab were formally merged into Seed: Seed Robotics, which explores robotics and embodied intelligence; AI for Science, which applies AI to scientific research in materials, biology, and other fields; and Responsible AI, which works to make AI fair, transparent, explainable, and ethically compliant.

ByteDance is further consolidating its AI R&D forces. Since the second half of 2023, AI Lab's NLP (natural language processing) and Pixel Dance (video generation) teams have successively been merged into Seed. The latest adjustment means that AI Lab has now been fully absorbed into Seed organizationally. Li Hang remains the head of AI Lab; this ...