Large Language Models
 The most comprehensive robot manipulation survey to date, covering up to 1,200 papers! Jointly released by eight institutions
 自动驾驶之心· 2025-10-14 23:33
This article originally appeared on 具身智能之心, by Shuanghao Bai et al. The authors are affiliated with: Xi'an Jiaotong University, The Hong Kong University of Science and Technology (Guangzhou), the Institute of Automation of the Chinese Academy of Sciences, Westlake University, Zhejiang University, the University of Sydney, the Beijing Academy of Artificial Intelligence, and Peking University. With the breakthroughs in large language models (LLMs) and multimodal large language models (MLLMs), artificial intelligence is moving from "being able to talk" to "being able to act" at unprecedented speed. Embodied intelligence has become the key frontier connecting cognition and action: only when agents can perceive, reason, and act in real environments can we move toward truly general intelligence (AGI). In this process, robot manipulation plays a central role: it lets robots not only "understand the world" but also "change the world". From early rule-based control and motion planning to today's fusion of reinforcement learning, imitation learning, and large ...
 How will AI large language models bring about a memory supercycle?
 傅里叶的猫· 2025-10-14 15:51
Core Viewpoint - The article discusses the impact of AI large language models, particularly GPT-5, on the demand for memory components such as HBM, DRAM, and NAND, suggesting a potential memory supercycle driven by AI inference workloads [4][8].

Memory Demand Analysis
- The demand for HBM and DRAM is primarily driven by the inference phase of AI models, with GPT-5 estimated to require approximately 26.8 PB of HBM and 9.1 EB of DRAM if a 50% cache hit rate is assumed [8][10].
- NAND demand is significantly influenced by retrieval-augmented generation (RAG) processes, with an estimated requirement of 200 EB by 2025, considering data center capacity adjustments [8][11].

Supply and Demand Dynamics
- The global supply forecast indicates that by 2025, DRAM and NAND supply will reach 36.5 EB and 925 EB respectively, with GPT-5's demand accounting for 25% and 22% of the total supply [9].
- The article highlights a shift from oversupply to shortage in the NAND market due to increased orders from cloud service providers, with price increases expected in late 2025 and early 2026 [11][12].

Beneficiary Companies
- Companies such as KIOXIA and SanDisk are identified as key beneficiaries of the NAND price increases: KIOXIA has the highest price elasticity but faces debt risks, while SanDisk is expanding its enterprise segment [12].
- Major manufacturers like Samsung and SK Hynix are positioned to benefit from both the HBM and NAND markets, although their valuations may already reflect some of the positive outlook [12].

Market Outlook
- Analysts predict that the current cycle is in its early stages, with profitability expected to begin in Q4 2025 and a potential explosion in demand in 2026, particularly for companies like SanDisk [13].
- The article notes several risk factors that could impact the sustainability of this cycle, including potential overestimation of cloud orders and the possibility of increased NAND production leading to oversupply by 2027 [13].
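The supply-share figures above follow directly from the quoted capacity estimates; a minimal sketch, using only numbers stated in the article (variable names are my own):

```python
# Back-of-envelope check of the article's 2025 supply/demand ratios.
# All capacity figures come from the article; EB = exabytes.
dram_supply_eb = 36.5        # projected 2025 global DRAM supply
nand_supply_eb = 925.0       # projected 2025 global NAND supply

gpt5_dram_demand_eb = 9.1    # inference DRAM demand at a 50% cache hit rate
gpt5_nand_demand_eb = 200.0  # RAG-driven NAND demand estimate

dram_share = gpt5_dram_demand_eb / dram_supply_eb
nand_share = gpt5_nand_demand_eb / nand_supply_eb

print(f"DRAM share: {dram_share:.0%}")  # ~25%, matching the article
print(f"NAND share: {nand_share:.0%}")  # ~22%, matching the article
```

Both ratios round to the percentages the article reports, so the headline shares are internally consistent with the quoted capacities.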
 Ant Group releases and open-sources Ring-1T, a trillion-parameter thinking model
 Xin Jing Bao· 2025-10-14 04:20
Core Viewpoint - Ant Group has officially launched the trillion-parameter thinking model Ring-1T, which is fully open-sourced, including model weights and training recipes, enhancing its natural language reasoning capabilities and overall performance across various tasks [1]

Group 1: Model Development
- Ring-1T builds upon the previously released preview version Ring-1T-preview, expanding large-scale reinforcement learning with verifiable rewards (RLVR) training [1]
- The model aims to improve general capabilities through Reinforcement Learning from Human Feedback (RLHF) training, resulting in more balanced performance on various task leaderboards [1]

Group 2: Model Availability
- Users can download the Ring-1T model through platforms like HuggingFace and the ModelScope community, and experience it online via Ant Group's Baibao Box [1]
- Ant Group's Bailing team has released a total of 18 models, creating a product matrix of large language models ranging from 16 billion to 1 trillion parameters [1]

Group 3: Product Evolution
- The release of Ring-1T and the general-purpose trillion-parameter model Ling-1T marks the transition of the Bailing large-model line into its 2.0 phase [1]
 The most comprehensive robot manipulation survey to date, covering up to 1,200 papers! Jointly released by eight institutions including XJTU, HKUST, and PKU
 具身智能之心· 2025-10-14 03:50
Core Insights - The article discusses the rapid advancements in artificial intelligence, particularly in embodied intelligence, which connects cognition and action, emphasizing the importance of robot manipulation in achieving artificial general intelligence (AGI) [3][4].

Summary by Sections

Overview of Embodied Intelligence
- Embodied intelligence is highlighted as a crucial frontier that enables agents to perceive, reason, and act in real environments, moving from mere language understanding to actionable intelligence [3].

Paradigm Shift in Robot Manipulation
- Research in robot manipulation is undergoing a paradigm shift, integrating reinforcement learning, imitation learning, and large models into intelligent control systems [4][6].

Comprehensive Survey of Robot Manipulation
- A comprehensive survey titled "Towards a Unified Understanding of Robot Manipulation" systematically organizes over 1,000 references, covering hardware, control foundations, task and data systems, and cross-modal generalization research [4][6][7].

Unified Framework for Understanding Robot Manipulation
- The article proposes a unified framework that extends traditional high-level planning and low-level control classifications, incorporating language, code, motion, affordance, and 3D representations [9][20].

Key Bottlenecks in Robot Manipulation
- Two major bottlenecks are identified: data collection and utilization, and system generalization, with a detailed analysis of existing solutions [27][28].

Future Directions
- Four key future directions are proposed: building a true "robot brain" for general cognition and control, breaking data bottlenecks for scalable data generation and utilization, enhancing multi-modal perception for complex interactions, and ensuring the safety of human-robot coexistence [34].
 Karpathy hand-builds ChatGPT in 8,000 lines of code for just $100; after 12 hours of training its CORE score surpasses GPT-2, with a step-by-step tutorial
 36Kr· 2025-10-14 03:40
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, a former AI director at Tesla and co-founder of OpenAI, aimed at educational purposes [1][57].
- The project allows users to build a basic conversational AI model for approximately $100, with a training time of about 4 hours on a cloud GPU server [1][10].

Project Overview
- "nanochat" consists of around 8,000 lines of code, written primarily in Python with a tokenizer implemented in Rust, and includes a pre-trained Transformer model and various training datasets [2][3].
- The model can perform basic conversational tasks, generate stories and poems, and answer simple questions [2][4].

Performance Metrics
- After approximately 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [4][52].
- Evaluation covers CORE scores, ARC-Easy, GSM8K, and HumanEval, with notable improvements observed across the different training phases [3][52].

Training Phases
- The training process includes pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) stages, each contributing to the model's capabilities [41][46].
- Mid-training focuses on adapting the model for multi-turn conversations and teaching it to handle multiple-choice questions [35][36].

Community Engagement
- The project has gained significant attention on GitHub, with over 4.8k stars shortly after its release, indicating strong community interest and potential for further optimization [8][7].
- The codebase is designed to be user-friendly, allowing modifications and enhancements by the community [54][55].

Educational Impact
- Karpathy aims to integrate this technology into a broader educational framework, potentially transforming how AI can assist in learning [62].
- The project is part of a larger initiative to create a symbiotic relationship between teachers and AI, enhancing the learning experience [62].
 Karpathy hand-builds ChatGPT in 8,000 lines of code for just $100; after 12 hours of training its CORE score surpasses GPT-2, with a step-by-step tutorial
 量子位· 2025-10-14 02:19
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, which can be built with minimal cost and code [1][2][4].

Project Overview
- "nanochat" is a full-stack training and inference pipeline that allows users to create a basic ChatGPT-like model with approximately 8,000 lines of code [2][4].
- The entire project can be executed on a cloud GPU server for about $100, taking as little as 4 hours to set up and run [3][4][16].

Technical Specifications
- The model is built primarily in Python with a tokenizer written in Rust, and includes a pre-trained Transformer architecture and various training datasets [5].
- It supports efficient inference with features like KV caching and a lightweight Python interpreter for tool usage [5][43].

Performance Metrics
- After about 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [8].
- A specific example shows that a model trained for 24 hours can achieve scores of over 40 on the MMLU dataset and over 70 on the ARC-Easy dataset [10].

Development Goals
- Karpathy aims to create a unified, simple, and modifiable codebase that can serve as a strong baseline for future development [11][13].
- The project is intended to be a capstone for the upcoming LLM101n course, which focuses on building large language models [12].

Community Engagement
- The project has gained significant attention, with GitHub stars reaching 4.8k shortly after its release, indicating strong community interest [14].
- Users are encouraged to optimize and modify the codebase, allowing for a collaborative improvement process [59].

Training Process
- The training process involves several stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][48][51].
- The total time for the training process, excluding RL, is approximately 3 hours and 51 minutes, at a total cost of about $92.4 [57].

Final Remarks
- The article emphasizes the potential of "nanochat" as a research tool and a framework for benchmarking, similar to previous projects like nanoGPT [13].
- The project is still in its early stages, with many opportunities for further optimization and enhancement [13][50].
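The quoted cost and wall-clock figures are mutually consistent; a quick check, where the $24/hour rate is my inference from dividing the stated cost by the stated time (the article itself does not quote an hourly rate):

```python
# Reported: the full pipeline minus RL runs for 3 h 51 min and costs ~$92.4.
hours = 3 + 51 / 60          # 3.85 hours of wall-clock training time
hourly_rate_usd = 24.0       # assumed rate, implied by $92.4 / 3.85 h
cost = hours * hourly_rate_usd
print(f"${cost:.1f}")        # → $92.4, matching the article
```

At that implied rate, the headline "$100" budget leaves a small margin over the measured $92.4 run.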
 No more "entropy collapse" or "entropy explosion"! This research teaches large models "precise exploration", and reasoning scores soar
 量子位· 2025-10-13 08:47
Core Insights
- The article discusses advancements in large language models (LLMs) using RLVR (Reinforcement Learning with Verifiable Rewards), which has driven significant breakthroughs in mathematical, coding, and scientific reasoning tasks since 2024 [1][2].

Group 1: Challenges in RLVR Training
- RLVR faces a critical bottleneck known as "exploration imbalance": exploration can be too limited, leading to entropy collapse, or too uncontrolled, resulting in entropy explosion [2][9].
- Traditional entropy regularization encourages exploration but can lead either to rapid convergence on a deterministic strategy or to chaotic outputs caused by excessive uncertainty [6][10].

Group 2: Proposed Solution - SIREN
- The research team introduced a selective entropy regularization method (SIREN) that employs three mechanisms: defining the exploration range, focusing on key decision points, and stabilizing the training process [14][18].
- SIREN limits entropy calculations to a core set of high-probability tokens, ensuring that exploration occurs only among semantically reasonable candidates [14][15].
- It identifies key decision points in the generation sequence where entropy is significantly higher than average, concentrating exploration incentives on these critical positions [16].
- The method adjusts the entropy target to keep it within a reasonable range, preventing training instability [17].

Group 3: Experimental Validation
- Experimental results demonstrate that SIREN significantly improves performance across various models and datasets, achieving an average majority-vote accuracy (maj@k) of 54.6% on Qwen2.5-Math-7B, surpassing the strongest baseline by 4.8% [22][24].
- The effective exploration enabled by SIREN produces a qualitative change in performance compared with traditional entropy regularization methods [25][32].
- The research indicates that SIREN maintains answer diversity and avoids entropy collapse, contributing to a smoother and more controllable training process [28][30].

Group 4: Future Implications
- The study emphasizes the importance of stable, controllable, and efficient exploration in unlocking the potential of large models and overcoming performance bottlenecks [35].
- The proposed selective exploration control mechanism offers a feasible path for refining exploration strategies in future reasoning-model training paradigms [35].
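The three mechanisms above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `k`, `target_entropy`, and the mean-entropy threshold for "decision points" are made-up stand-ins for whatever hyperparameters the authors actually use.

```python
import numpy as np

def selective_entropy_bonus(probs, k=20, target_entropy=1.5):
    """probs: (seq_len, vocab) next-token distributions per position.

    Returns a per-position exploration bonus embodying the three mechanisms:
    top-k entropy, decision-point masking, and a clipped entropy target.
    """
    seq_len = probs.shape[0]
    ent = np.empty(seq_len)
    for t in range(seq_len):
        topk = np.sort(probs[t])[-k:]      # mechanism 1: only high-probability
        topk = topk / topk.sum()           # (semantically plausible) tokens
        ent[t] = -(topk * np.log(topk + 1e-12)).sum()
    # Mechanism 2: only positions whose entropy exceeds the sequence mean
    # count as decision points and receive the exploration bonus.
    mask = ent > ent.mean()
    # Mechanism 3: steer entropy toward a target rather than maximizing it,
    # so the bonus can neither reward collapse to a deterministic policy
    # (entropy collapse) nor push entropy arbitrarily high (explosion).
    return -np.abs(ent - target_entropy) * mask

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 100))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(selective_entropy_bonus(probs).shape)  # (8,)
```

The key contrast with vanilla entropy regularization is that the bonus here is zero at low-entropy positions and peaks at the target rather than growing without bound.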
 Musk's AI company is developing a "world model", poaching experts from NVIDIA and planning to launch games
 Feng Huang Wang· 2025-10-13 03:21
Core Insights
- xAI, led by Elon Musk, is intensifying efforts to develop a "world model" to compete with Meta and Google on the next generation of AI systems capable of autonomous navigation and design in physical environments [1][2]
- A world model is a generative AI model that understands dynamic features of the real world, including physical and spatial properties, using various types of input data [1]
- xAI has hired experts from NVIDIA to advance the development of these models, which are expected to extend AI capabilities beyond current large language models [1][2]

Company Developments
- xAI has recruited two AI researchers, Zeeshan Patel and Ethan He, with experience in world-model development [2]
- The company plans to launch an AI-generated game by the end of next year, reaffirming its commitment to this goal [2]
- xAI recently released an upgraded image and video generation model, now available to users for free [2]

Industry Context
- Other leading AI labs, including Google and Meta, are also working on world models, indicating a competitive landscape [3]
- The potential market size for world models is suggested to be close to the current total global economy, highlighting significant commercial interest [2]
- Challenges remain in finding sufficient data to simulate the real world and train these models, which is both difficult and costly [3]
 Interview with the AirPods team: how does a tiny earbud learn to track 50 types of exercise?
 36Kr· 2025-10-13 02:31
Core Insights
- The article discusses advancements in heart rate monitoring with the introduction of AirPods Pro 3, which achieves accuracy comparable to traditional chest straps [1][4][10]
- It highlights the innovative use of the ear canal as a more effective physiological signal collection point than the wrist, leveraging infrared-light PPG technology [5][7][8]
- The integration of multiple sensors and algorithms allows AirPods Pro 3 to accurately track various physical activities and heart rate in real time [15][16][17]

Group 1: Technology and Innovation
- AirPods Pro 3 can monitor heart rate with precision matching chest straps, especially during steady-state and interval running [1][3]
- The device uses infrared-light PPG for heart rate monitoring, which is more effective than the green LED light sources used in most wearables [7][10]
- Combining heart rate data with motion data from accelerometers and gyroscopes improves the accuracy of heart rate readings during physical activity [8][15]

Group 2: Market Position and Competitive Edge
- Apple aims to position AirPods Pro 3 as a comprehensive fitness device, comparable to the Apple Watch, by understanding user activities and calorie expenditure [14][16]
- A Motion Foundation Model, trained on extensive real-world data, enables the device to recognize over 50 different types of exercise [16][17]
- AirPods and Apple Watch are designed to complement each other, providing a more complete digital representation of the user's body [9][10]

Group 3: User Experience and Design
- The design of AirPods Pro 3 focuses on achieving a snug fit, which is crucial for accurate physiological monitoring [10][11]
- The device's ability to filter out external noise while monitoring internal signals reflects Apple's philosophy of technology enhancing human perception [17]
- The advancements in sound quality and physiological monitoring indicate a shift in how audio devices are perceived, from mere sound output to sensors for self-awareness [17]
 The robotics market through the lens of global AI data
 2025-10-13 01:00
Summary of Conference Call on the AI and Robotics Industry

Industry Overview
- The AI industry is still in its early stages, with major companies investing hundreds of billions to trillions of dollars, indicating substantial growth potential [1][3]
- AI-related computing power currently represents a small fraction of the overall economy, suggesting significant room for expansion [1][4]

Key Insights and Arguments
- The ratio of training to inference computing power is currently 1:1, indicating that the industry is still in the early investment phase [1][4]
- Robotics, as an application of AI, is accelerating in development, with companies like Figure starting mass production of advanced robots [1][5]
- The U.S. market shows strong consumer willingness to spend on technology products, benefiting both the robotics and electric vehicle sectors [1][8]

Market Dynamics
- Companies like Taotao and Ecovacs are noteworthy in the U.S. for their strong channel transformation capabilities, while Chinese companies like Yushu (Unitree) are making inroads into the North American market [1][6]
- The average annual capital expenditure of U.S. tech giants ranges from $27 billion to $68 billion, with a return on investment (ROI) of approximately 40% to 50%, significantly higher than that of Chinese companies [1][6]

Economic Implications
- The rapid growth of the AI industry in the U.S. has led to rising wages for AI-related personnel, contributing to inflation and creating a positive ROI cycle [1][7]
- Rising labor costs make AI technology more attractive to companies, further driving investment in AI and robotics [1][7]

Future Projections
- The electric vehicle market is expected to grow significantly, with projections of over 10 million units sold by 2025 [1][12]
- The robotics sector is also anticipated to expand, with potential for high demand as the technology advances [1][12]

Investment Considerations
- When selecting stocks in the North American market, focus on companies with strong channel capabilities and those actively expanding into North America [1][9]
- Ongoing AI investment, projected to reach $60 billion annually by U.S. companies, will likely lead to a wave of white-collar job replacement, eventually extending to blue-collar jobs [1][11]

Conclusion
- The AI and robotics sectors are poised for significant growth, driven by technological advancements, strong consumer demand, and substantial investment from major companies [1][12]