Large Language Models (LLMs)

A Pure VLA Survey Is Here! From VLMs to Diffusion, and on to Reinforcement Learning
具身智能之心· 2025-09-30 04:00
Author: Dapeng Zhang et al.

1. Introduction

Robotics has long been an important area of scientific research. Early robots relied mainly on pre-programmed instructions and hand-designed control policies for task decomposition and execution. Such methods were typically applied to simple, repetitive tasks such as factory assembly lines and logistics sorting. In recent years, the rapid progress of artificial intelligence has enabled researchers to exploit deep learning's feature extraction and trajectory prediction capabilities across multimodal data such as images, text, and point clouds. By combining perception, detection, tracking, and localization techniques, researchers decompose robot tasks into multiple stages to meet execution requirements, advancing both embodied intelligence and autonomous driving. However, most robots still exist as isolated agents: they are usually designed for specific tasks and lack effective interaction with humans and the external environment. To overcome these limitations, researchers have begun to introduce large language models (LLMs) and vision-language models (VLMs) into robot manipulation to achieve more precise and flexible control. Modern robot manipulation methods typically rely on vision-language generation paradigms (e.g., autoregressive or diffusion models), combined with large-scale datasets and advanced fine-tuning strategies. We refer to these methods as VLA foundation models; they ...
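To make the autoregressive branch of that paradigm concrete, here is a minimal sketch, assuming a common binning scheme rather than anything specified in the survey, of how continuous robot actions can be discretized into tokens that a language-model backbone predicts one at a time. The 7-DoF layout, value ranges, and bin count are illustrative assumptions.

```python
import numpy as np

def discretize_action(action, low, high, n_bins=256):
    """Map each continuous action dimension to an integer token in [0, n_bins)."""
    action = np.clip(action, low, high)
    norm = (action - low) / (high - low)                  # normalize to [0, 1]
    return np.minimum((norm * n_bins).astype(int), n_bins - 1)

def undiscretize_action(tokens, low, high, n_bins=256):
    """Invert the mapping, returning each bin's center value."""
    return low + (tokens + 0.5) / n_bins * (high - low)

# Example: a 7-DoF command (6 joint deltas + gripper); ranges are assumed.
low = np.array([-1.0] * 7)
high = np.array([1.0] * 7)
a = np.array([0.12, -0.30, 0.05, 0.0, 0.44, -0.9, 1.0])
tokens = discretize_action(a, low, high)     # treated as extra VLM vocabulary
recovered = undiscretize_action(tokens, low, high)
```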
Latest from UCLA! A Comprehensive Survey of Large-Model Time-Series Reasoning and Agentic Systems
自动驾驶之心· 2025-09-27 23:33
When morning rush-hour traffic data streams into urban traffic control systems in real time, when a hospital's ECG monitor continuously records a patient's cardiac electrical activity, when a stock exchange's quote board refreshes price movements dozens of times per second: this "time series data," generated continuously as time flows, has long been the "digital pulse" of modern society. From financial risk control and medical diagnosis to energy dispatch and traffic management, decisions in nearly every critical domain depend on deep interpretation of such time-series data.

Over the past decades, time series analysis has produced a wealth of techniques, from classical statistical models (e.g., ARIMA, ETS) to deep learning methods (e.g., LSTM, Transformer), which have made notable progress on basic tasks such as "forecasting the future" and "detecting anomalies." For example, early work used LSTMs to forecast a city's electricity consumption over the next 24 hours and CNNs to detect arrhythmia segments in ECGs; such traditional techniques have long been deployed in real-world settings.

But as application demands keep escalating, the capability boundaries of traditional methods are becoming apparent. In personalized medicine, a doctor needs the model not only to judge whether a patient has an arrhythmia, but also to explain which physiological indicators and which time period's activity the anomaly relates to; in adaptive risk management, a fund manager needs not only stock-price forecasts but also the causal logic of how prices might change if policy shifts; in autonomous transportation systems, a controller must not only detect congestion but also adjust signal strategies in real time and verify their effect ...
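As a runnable illustration of the classical deep-learning baseline mentioned above (a sketch, not the article's code), the following PyTorch model maps one week of hourly load readings to a 24-hour forecast; the hidden size, window length, and random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoadForecaster(nn.Module):
    """LSTM that maps a window of past hourly readings to a 24-hour forecast."""
    def __init__(self, hidden=64, horizon=24):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                  # x: (batch, past_hours, 1)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden)
        return self.head(h_n[-1])          # (batch, 24) next-day forecast

model = LoadForecaster()
past_week = torch.randn(8, 168, 1)         # 8 series, 168 hourly readings each
forecast = model(past_week)                # shape: (8, 24)
```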
Latest from XJTLU & HKUST! A Survey of Foundation Models for Trajectory Prediction
自动驾驶之心· 2025-09-24 23:33
Core Insights
- The article discusses the application of large language models (LLMs) and multimodal large language models (MLLMs) in the paradigm shift for autonomous driving trajectory prediction, enhancing the understanding of complex traffic scenarios to improve safety and efficiency [1][20].

Summary by Sections

Introduction and Overview
- The integration of LLMs into autonomous driving systems allows for a deeper understanding of traffic scenarios, transitioning from traditional methods to LFM (large foundation model)-based approaches [1].
- Trajectory prediction is identified as a core technology in autonomous driving, utilizing historical data and contextual information to infer future movements of traffic participants [5].

Traditional Methods and Challenges
- Traditional vehicle trajectory prediction methods include physics-based approaches (e.g., Kalman filters) and machine learning methods (e.g., Gaussian processes), which struggle with complex interactions [8].
- Deep learning methods improve long-term prediction accuracy but face challenges such as high computational demands and poor interpretability [9].
- Reinforcement learning methods excel in interactive scene modeling but are complex and unstable [9].

LLM-Based Vehicle Trajectory Prediction
- LFMs introduce a paradigm shift by discretizing continuous motion states into symbolic sequences, leveraging LLMs' semantic modeling capabilities [11].
- Key applications of LLMs include trajectory-language mapping, multimodal fusion, and constraint-based reasoning, enhancing interpretability and robustness in long-tail scenarios [11][13].

Evaluation Metrics and Datasets
- The article categorizes datasets for pedestrian and vehicle trajectory prediction, highlighting the importance of datasets like Waymo and ETH/UCY for evaluating model performance [16].
- Evaluation metrics for vehicles include L2 distance and collision rates, while pedestrian metrics focus on minADE and minFDE (computed as sketched after this summary) [17].

Performance Comparison
- A performance comparison of various models on the nuScenes dataset shows that LLM-based methods significantly reduce collision rates and improve long-term prediction accuracy [18].

Discussion and Future Directions
- The widespread application of LFMs indicates a shift from local pattern matching to global semantic understanding, enhancing safety and compliance in trajectory generation [20].
- Future research should focus on developing low-latency inference techniques, constructing motion-oriented foundational models, and advancing world perception and causal reasoning models [21].
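To make the pedestrian metrics named above concrete, here is a minimal sketch of minADE and minFDE over K sampled trajectories; it illustrates the standard definitions rather than the survey's code, and the array shapes are assumptions.

```python
import numpy as np

def min_ade_fde(preds, gt):
    """preds: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.
    minADE: smallest mean pointwise displacement among the K candidates.
    minFDE: smallest final-point displacement among the K candidates."""
    dists = np.linalg.norm(preds - gt[None], axis=-1)   # (K, T) per-step errors
    min_ade = dists.mean(axis=1).min()
    min_fde = dists[:, -1].min()
    return min_ade, min_fde

preds = np.random.randn(6, 12, 2)    # K=6 samples, 12 future steps, (x, y)
gt = np.random.randn(12, 2)
ade, fde = min_ade_fde(preds, gt)
```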
A 10,000-Word Read! The First Survey on Agent Self-Evolution: The Road to Artificial Superintelligence
自动驾驶之心· 2025-09-11 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents capable of continuous learning and adaptation in dynamic environments, paving the way towards artificial superintelligence (ASI) [3][4][46]
- It emphasizes the need for a structured framework to understand and design self-evolving agents, focusing on three fundamental questions: what to evolve, when to evolve, and how to evolve [6][46]

Group 1: What to Evolve
- Self-evolving agents can improve various components such as models, memory, tools, and architecture over time to enhance performance and adaptability [19][20]
- The evolution of these components is crucial for the agent's ability to handle complex tasks and environments effectively [19][20]

Group 2: When to Evolve
- The article categorizes self-evolution into two time modes: intra-test-time self-evolution, which occurs during task execution, and inter-test-time self-evolution, which happens between tasks [22][23]
- Intra-test-time self-evolution allows agents to adapt in real time to specific challenges (see the loop sketched after this summary), while inter-test-time self-evolution leverages accumulated experiences for future performance improvements [22][23]

Group 3: How to Evolve
- Self-evolution emphasizes a continuous learning process where agents learn from real-world interactions, seek feedback, and adjust strategies dynamically [26][27]
- Various methodologies for self-evolution include reward-based evolution, imitation learning, and population-based approaches, each with distinct feedback types and data sources [29][30]

Group 4: Applications and Evaluation
- Self-evolving agents have significant potential in various fields, including programming, education, and healthcare, where continuous adaptation is essential [6][34]
- Evaluating self-evolving agents presents unique challenges, requiring metrics that capture adaptability, knowledge retention, and long-term generalization capabilities [34][36]

Group 5: Future Directions
- The article highlights the importance of addressing challenges such as catastrophic forgetting, knowledge transfer, and ensuring safety and controllability in self-evolving agents [40][43]
- Future research should focus on developing scalable architectures, dynamic evaluation methods, and personalized agents that can adapt to individual user preferences [38][44]
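A minimal sketch of the intra-test-time loop described above, under assumed agent and environment interfaces: plan, execute, and reflect are hypothetical method names for illustration, not the survey's API.

```python
def solve_with_self_evolution(agent, task, env, max_rounds=3):
    """Intra-test-time self-evolution: attempt, collect feedback, revise, retry."""
    memory = []                                   # accumulated reflections
    for _ in range(max_rounds):
        plan = agent.plan(task, memory)           # condition on past feedback
        result = env.execute(plan)
        if result.success:
            return result
        # fold environment feedback back into working memory before retrying
        memory.append(agent.reflect(plan, result.feedback))
    return result
```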
Agile Guru: Large AI Models Are Completely Rewriting the Rules of Programming, and the Change Upends Everyone's Assumptions
程序员的那些事· 2025-09-05 01:08
Core Viewpoint
- The emergence of large language models (LLMs) represents a transformative change in software development, comparable to the shift from assembly language to the first generation of high-level programming languages [5][10].

Group 1: Impact of LLMs on Programming
- LLMs not only enhance the level of abstraction in programming but also compel a reevaluation of what it means to program with non-deterministic tools [7][10].
- The transition from deterministic to non-deterministic programming paradigms expands the dimensions of programming practices [8][10].

Group 2: Historical Context of Programming Languages
- High-level programming languages (HLLs) introduced a new level of abstraction, allowing programmers to think in terms of sequences, conditions, and iterations rather than specific machine instructions [8][9].
- Despite advancements in programming languages, the fundamental nature of programming had not changed significantly until the advent of LLMs [6][9].

Group 3: Embracing Non-Determinism
- The introduction of non-deterministic abstractions means that results from LLMs cannot be reliably reproduced, contrasting with the consistent outcomes of traditional programming [10][13].
- The industry is experiencing a radical transformation as developers learn to navigate this non-deterministic environment, which is unprecedented in the history of software development [13].
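To ground the determinism contrast in something runnable, here is a toy sketch (not from the article) of temperature sampling over a fixed next-token distribution: at temperature 0 the argmax is reproduced every time, while any positive temperature lets repeated calls diverge.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Greedy (deterministic) decoding at T=0; stochastic sampling otherwise."""
    if temperature == 0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.5, 0.3])        # toy next-token scores
rng = np.random.default_rng()
greedy = [sample_token(logits, 0, rng) for _ in range(5)]     # always identical
sampled = [sample_token(logits, 0.8, rng) for _ in range(5)]  # varies per call
```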
The Fiercest Recruiter Isn't OpenAI: This HR Startup, Embroiled in an Espionage Case, Is Hiring Engineers at Full Tilt
36Kr· 2025-09-04 08:22
Group 1
- The U.S. tech job market has undergone significant changes since the launch of ChatGPT in November 2022, with some positions experiencing drastic declines while others remain in high demand [1]
- The largest wave of layoffs in U.S. history began in 2023, impacting the IT job market, but hiring activities are gradually recovering, albeit with limited new positions [2]
- The average tenure of software engineers at major tech companies has increased significantly, indicating a slowdown in hiring and a reluctance among employees to change jobs [6][80]

Group 2
- The demand for AI engineers has surged since mid-2023, making it the hottest position in the tech industry, with a notable increase in job openings [29]
- Major tech companies like Apple, IBM, and Amazon are leading in job openings, with Apple having the highest number at 2,177 positions [13]
- Over half of the open positions are at senior levels, and there is a notable decrease in vacancies for senior engineers, prompting them to apply for lower-level positions [21][24]

Group 3
- The San Francisco Bay Area remains the dominant hub for tech jobs, accounting for nearly 20% of global tech job openings, with a total of 9,072 positions [72][74]
- The average tenure at major tech companies has increased by about two years over the past three years, reflecting a more stable workforce amid hiring slowdowns [80]
- The trend of internal mobility among major tech firms is prevalent, with companies primarily hiring from each other, leading to longer tenures [85]

Group 4
- Remote job opportunities have decreased, with the proportion of remote positions falling from 25% to 20% over the past year, although AI engineering roles still see a slight increase in remote opportunities [98][100]
- The salary for remote positions has generally declined by 10-15%, as supply exceeds demand, making high-paying remote jobs a rare privilege [102]
Kitchen-R: A Mobile Manipulation Benchmark for Joint Evaluation of High-Level Task Planning and Low-Level Control
具身智能之心· 2025-08-25 00:04
Core Viewpoint
- The article introduces the Kitchen-R benchmark, a unified evaluation framework for task planning and low-level control in embodied AI, addressing the existing fragmentation in current benchmarks [4][6][8].

Group 1: Importance of Benchmarks
- Benchmarks are crucial in various fields such as natural language processing and computer vision for assessing model progress [7].
- In robotics, simulator-based benchmarks like Behavior-1K are common, providing model evaluation and training capabilities [7].

Group 2: Issues with Existing Benchmarks
- Current benchmarks for high-level language instruction and low-level robot control are fragmented, leading to incomplete assessments of integrated systems [8][9].
- High-level benchmarks often assume perfect execution of atomic tasks, while low-level benchmarks rely on simple single-step instructions [9].

Group 3: Kitchen-R Benchmark Features
- Kitchen-R fills a critical gap in embodied AI research by providing a comprehensive testing platform that closely simulates real-world scenarios [6][8].
- It includes a digital twin kitchen environment and over 500 language instructions, supporting mobile ALOHA robots [9][10].
- The benchmark supports three evaluation modes: independent evaluation of planning modules, independent evaluation of control strategies, and critical full system integration evaluation [9][10].

Group 4: Evaluation Metrics
- Kitchen-R is designed with offline independent evaluation and online joint evaluation metrics to ensure comprehensive system performance measurement [16][20].
- Key metrics include Exact Match (EM) for task planning accuracy and Mean Squared Error (MSE) for trajectory prediction accuracy; a minimal sketch of both follows this summary [20][21].

Group 5: Baseline Methods
- Kitchen-R provides two baseline methods: a VLM-driven task planning baseline and a Diffusion Policy low-level control baseline [43][49].
- The VLM planning baseline enhances planning accuracy through contextual examples and constrained generation [47][48].
- The Diffusion Policy baseline integrates visual features and robot states to predict future actions [49][52].

Group 6: Future Directions
- Kitchen-R can expand to include more complex scenarios, such as multi-robot collaboration and dynamic environments, promoting the application of language-guided mobile manipulation robots in real-world settings [54].
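A hedged sketch of the two offline metrics named above, under assumed data shapes and plan encodings; this illustrates the definitions, not the benchmark's reference implementation.

```python
import numpy as np

def exact_match(pred_plan, gt_plan):
    """EM for task planning: 1.0 only if the predicted step sequence matches exactly."""
    return float(pred_plan == gt_plan)

def trajectory_mse(pred_traj, gt_traj):
    """MSE for low-level control: mean squared error over trajectory waypoints."""
    return float(np.mean((np.asarray(pred_traj) - np.asarray(gt_traj)) ** 2))

# Step names and waypoints below are hypothetical examples.
em = exact_match(["go_to(table)", "pick(cup)"], ["go_to(table)", "pick(cup)"])  # 1.0
mse = trajectory_mse([[0.0, 0.0], [0.1, 0.2]], [[0.0, 0.1], [0.1, 0.2]])
```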
Flash | $5M Seed Round: Paradigm Builds Spreadsheets Equipped with Over 5,000 AI Agents
Z Potentials· 2025-08-19 15:03
Core Insights
- Paradigm has launched a product that integrates AI agents into spreadsheets, aiming to enhance the management of CRM data traditionally stored in spreadsheets [3][4]
- The company has raised $5 million in seed funding led by General Catalyst, bringing total funding to $7 million [3]
- Paradigm's platform features over 5,000 AI agents that can autonomously gather and populate information in spreadsheets [3][4]

Funding and Product Development
- Paradigm completed a $5 million seed round, with total funding reaching $7 million [3]
- The product is currently in a closed beta testing phase, with plans for continuous iteration based on user feedback [3]

Target Market and User Base
- Early adopters include consulting firms like Ernst & Young, AI chip startups, and AI programming companies [4]
- The platform attracts a diverse user base, including consultants, sales professionals, and finance personnel, utilizing a tiered subscription model based on usage [3][4]

Competitive Landscape
- Paradigm does not view itself as a competitor in the AI-driven spreadsheet market but rather as a new type of AI-driven workflow [5]
- Other companies, such as Quadratic, are also working on integrating AI into spreadsheets, with Quadratic having raised over $6 million [4]
Open-Source Diffusion LLMs Outrun Autoregressive for the First Time! SJTU and UCSD Unveil D2F, with 2.5x the Throughput of LLaMA3
机器之心· 2025-08-18 03:22
Core Insights
- The article discusses the introduction of Discrete Diffusion Forcing (D2F), a new approach that significantly enhances the inference speed of open-source diffusion large language models (dLLMs) compared to autoregressive (AR) models, achieving up to 2.5 times higher throughput on benchmarks like GSM8K [2][6][22].

Group 1: Challenges and Solutions
- Existing dLLMs face challenges such as the lack of a complete KV cache mechanism and insufficient parallel potential, resulting in slower inference speeds compared to AR models [2][8].
- D2F addresses these challenges by integrating a mixed paradigm of autoregressive and diffusion approaches, optimizing model architecture, training methods, and inference strategies [11][12].

Group 2: D2F Design Features
- D2F incorporates block-level causal attention to ensure compatibility with KV caching, allowing for the reuse of KV states and reducing computational redundancy; a mask sketch follows this summary [12][15].
- The model employs asymmetric distillation and structured noise scheduling to efficiently transfer knowledge from a pre-trained teacher model to the D2F student model, enhancing its parallel capabilities [18].

Group 3: Inference Mechanism
- D2F introduces a pipelined parallel decoding algorithm that maintains a dynamic decoding window, allowing for semi-activated and fully-activated states to optimize throughput and quality [20][21].
- The model achieves a maximum speedup of up to 50 times over original dLLMs while maintaining average performance levels [22].

Group 4: Performance Metrics
- D2F demonstrates superior performance-efficiency trade-offs, with the ability to adapt to various scenarios by adjusting decoding parameters, achieving over four times the throughput of AR models in specific tasks [25].
- Comparative tests show D2F-LLaDA achieving a throughput of 52.5 tokens per second, representing a 7.3 times increase over baseline methods [23].

Group 5: Future Directions
- The success of D2F indicates a promising path for further research in parallel decoding technologies, with potential future developments including real-time serving capabilities and hybrid parallel processing [28].
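To illustrate the block-level causal attention described above, here is a minimal sketch of the attention mask it implies: full attention within a block, causal attention across blocks, so earlier blocks' KV states stay fixed and cacheable. The block size and sequence length are illustrative assumptions; this is not D2F's released code.

```python
import torch

def block_causal_mask(seq_len, block_size):
    """True where attention is allowed: full within a block, causal across blocks.
    Earlier blocks' KV states never change, so they can be cached and reused."""
    idx = torch.arange(seq_len)
    block_id = idx // block_size
    # position i may attend to position j iff j's block is not after i's block
    return block_id[:, None] >= block_id[None, :]

mask = block_causal_mask(seq_len=8, block_size=4)
# tokens 0-3 attend among themselves; tokens 4-7 also attend to cached block 0-3
```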
A 10,000-Word Read! The First Survey on Agent Self-Evolution: The Road to Artificial Superintelligence~
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents that can adapt and learn continuously from interactions with their environment, aiming for artificial superintelligence (ASI) [3][5][52]
- It emphasizes three fundamental questions regarding self-evolving agents: what to evolve, when to evolve, and how to evolve, providing a structured framework for understanding and designing these systems [6][52]

Group 1: What to Evolve
- Self-evolving agents can improve various components such as models, memory, tools, and workflows to enhance performance and adaptability [14][22]
- The evolution of agents is categorized into four pillars: cognitive core (model), context (instructions and memory), external capabilities (tool creation), and system architecture [22][24]

Group 2: When to Evolve
- Self-evolution occurs in two main time modes: intra-test-time self-evolution, which happens during task execution, and inter-test-time self-evolution, which occurs between tasks (a minimal sketch of the latter follows this summary) [26][27]
- The article outlines three basic learning paradigms relevant to self-evolution: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) [27][28]

Group 3: How to Evolve
- The article discusses various methods for self-evolution, including reward-based evolution, imitation and demonstration learning, and population-based approaches [32][36]
- It highlights the importance of continuous learning from real-world interactions, seeking feedback, and adjusting strategies based on dynamic environments [30][32]

Group 4: Evaluation of Self-evolving Agents
- Evaluating self-evolving agents presents unique challenges, requiring assessments that capture adaptability, knowledge retention, and long-term generalization capabilities [40]
- The article calls for dynamic evaluation methods that reflect the ongoing evolution and diverse contributions of agents in multi-agent systems [51][40]

Group 5: Future Directions
- The deployment of personalized self-evolving agents is identified as a critical goal, focusing on accurately capturing user behavior and preferences over time [43]
- Challenges include ensuring that self-evolving agents do not reinforce existing biases and developing adaptive evaluation metrics that reflect their dynamic nature [44][45]
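Complementing the intra-test-time loop sketched under the earlier post, here is a minimal sketch of the inter-test-time mode: experiences accumulated on completed tasks are retrieved as in-context examples for the next task. The ExperienceStore class and its word-overlap retrieval are illustrative assumptions, not the survey's design.

```python
class ExperienceStore:
    """Inter-test-time self-evolution: keep (task, outcome) pairs between tasks."""
    def __init__(self):
        self.episodes = []

    def add(self, task, trace, success):
        self.episodes.append({"task": task, "trace": trace, "success": success})

    def retrieve(self, task, k=3):
        # naive similarity: count of words shared between task descriptions
        scored = sorted(
            (e for e in self.episodes if e["success"]),
            key=lambda e: -len(set(e["task"].split()) & set(task.split())),
        )
        return scored[:k]

store = ExperienceStore()
store.add("sort the red blocks", trace=["scan", "pick", "place"], success=True)
examples = store.retrieve("sort the blue blocks")  # reused as ICL demonstrations
```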