Trajectory Planning
Diffusion models finally learn to size up the task: compute is allocated dynamically by prompt difficulty, saving time on simple prompts while preserving quality on complex ones
量子位· 2026-03-09 10:05
Core Viewpoint
- The article discusses a new framework called "CoTj" from China Unicom's Data Science and AI Research Institute, which enables diffusion models to allocate computational resources dynamically based on prompt complexity, significantly improving image-generation quality [4][35].

Group 1: Framework and Mechanism
- The CoTj framework gives diffusion models "System 2" planning capabilities, letting them allocate compute dynamically according to the complexity of the prompt [4][14].
- CoTj employs a "Predict-Plan-Execute" reasoning paradigm, featuring a lightweight predictor that estimates the current Diffusion DNA from condition embeddings, achieving rapid predictions [14][15].
- The framework recasts the complex sampling process as a directed acyclic graph (DAG) optimization problem, enabling efficient trajectory planning [11][13].

Group 2: Performance and Results
- In experiments, CoTj delivered superior image quality even with a basic first-order solver, outperforming traditional methods that used high-order solvers under the same conditions [22][24].
- The framework achieved significant gains in accuracy and speed across various models, including a 60% reduction in mean squared error (MSE) and an increase of over 6 dB in peak signal-to-noise ratio (PSNR) [25][28].
- CoTj's trajectory planning preserves high fidelity even with drastically reduced sampling steps, retaining essential details that traditional methods often lose [27][29].

Group 3: Future Directions
- The research team plans to extend CoTj's theoretical foundation to more complex video dynamics and to explore unsupervised Diffusion DNA discovery across modalities [36][37].
- The framework represents a significant advance in computational efficiency and resource-aware planning in generative AI, marking a new era for diffusion models [35][36].
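The summary describes recasting the sampler's step schedule as an optimization problem over a DAG. The idea can be sketched as a shortest-path search in which nodes are candidate timesteps, forward edges are solver jumps, and edge weights stand in for a predicted per-jump error. Everything below is a hypothetical illustration of that general pattern, not code from the CoTj paper; the quadratic cost function is a toy stand-in for a learned error predictor.

```python
# Hypothetical sketch: plan a sampling trajectory as a min-cost path
# through a DAG of candidate timesteps. An edge i -> j (i < j) means
# "jump from timesteps[i] to timesteps[j] in one solver step".

def plan_trajectory(timesteps, max_steps, jump_cost):
    """Min-cost path from timesteps[0] to timesteps[-1] using at most
    max_steps edges, via dynamic programming over the DAG."""
    n = len(timesteps)
    INF = float("inf")
    # best[k][j] = min cost to reach node j using exactly k edges
    best = [[INF] * n for _ in range(max_steps + 1)]
    parent = [[None] * n for _ in range(max_steps + 1)]
    best[0][0] = 0.0
    for k in range(1, max_steps + 1):
        for j in range(1, n):
            for i in range(j):  # edges only go forward: acyclic by construction
                if best[k - 1][i] == INF:
                    continue
                c = best[k - 1][i] + jump_cost(timesteps[i], timesteps[j])
                if c < best[k][j]:
                    best[k][j] = c
                    parent[k][j] = i
    # choose the edge count that reaches the final node most cheaply
    k = min(range(max_steps + 1), key=lambda m: best[m][n - 1])
    path, j = [n - 1], n - 1
    while parent[k][j] is not None:
        j = parent[k][j]
        k -= 1
        path.append(j)
    return [timesteps[i] for i in reversed(path)]

# Toy cost: larger jumps cost quadratically more.
cost = lambda a, b: (b - a) ** 2
print(plan_trajectory([0, 1, 2, 4, 8], max_steps=3, jump_cost=cost))
```

With the toy cost above, the planner spends its 3-step budget on the path [0, 2, 4, 8] rather than one large jump, mirroring the trade-off the framework is said to optimize.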
Trajectory Planning Based on Deep Reinforcement Learning
自动驾驶之心· 2025-08-28 23:32
Core Viewpoint
- The article discusses the advances and potential of reinforcement learning (RL) in autonomous driving, tracing its evolution and comparing it with other learning paradigms such as supervised learning and imitation learning [4][7][8].

Summary by Sections

Background
- The article notes the industry's recent focus on new technological paradigms such as VLA and reinforcement learning, with interest in RL growing after milestones like AlphaZero and ChatGPT [4].

Supervised Learning
- In autonomous driving, perception tasks such as object detection are framed as supervised learning problems: a model is trained on labeled data to map inputs to outputs [5].

Imitation Learning
- Imitation learning trains models to replicate actions from observed behavior, much as a child learns from adults; it is the primary learning objective in end-to-end autonomous driving [6].

Reinforcement Learning
- Reinforcement learning differs from imitation learning by learning through interaction with the environment, using feedback from task outcomes to optimize the model. It is particularly suited to sequential decision-making tasks in autonomous driving [7].

Inverse Reinforcement Learning
- Inverse reinforcement learning addresses the difficulty of hand-defining reward functions for complex tasks by learning a reward model from user feedback, which then guides the main model's training [8].

Basic Concepts of Reinforcement Learning
- Key concepts include policies, rewards, and value functions, which are essential for understanding how RL operates in autonomous driving contexts [14][15][16].

Markov Decision Process
- The Markov decision process is presented as the framework for modeling sequential tasks, applicable to a wide range of autonomous driving scenarios [10].
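The policy/reward/value concepts above can be made concrete with tabular value iteration on a toy Markov decision process. The two-state environment below is invented purely for illustration and is not from the article: action 1 in state 0 usually reaches an absorbing goal state for reward 1.

```python
# Illustrative tabular value iteration on a tiny 2-state, 2-action MDP.
# transitions[s][a] = list of (probability, next_state, reward) outcomes.
transitions = {
    0: {0: [(1.0, 0, 0.0)],                   # "stay": no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},   # "go": usually reach the goal
    1: {0: [(1.0, 1, 0.0)],
        1: [(1.0, 1, 0.0)]},                  # goal state is absorbing
}
gamma = 0.9  # discount factor

# Bellman optimality backups: V(s) = max_a sum_s' p (r + gamma V(s'))
V = {s: 0.0 for s in transitions}
for _ in range(100):  # sweep until (approximately) converged
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in transitions[s].values()
        )
        for s in transitions
    }

# Greedy policy with respect to the converged value function
policy = {
    s: max(
        transitions[s],
        key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a]),
    )
    for s in transitions
}
print(V, policy)
```

The value of state 0 converges to 0.8 / (1 - 0.18) ≈ 0.976, and the greedy policy chooses the "go" action there, which is exactly the policy-from-value-function relationship the section describes.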
Common Algorithms
- Foundational algorithms are covered, including dynamic programming, Monte Carlo methods, and temporal-difference learning [26][30].

Policy Optimization
- The article distinguishes on-policy from off-policy algorithms, weighing their respective advantages and challenges in training stability and data utilization [27][28].

Advanced Reinforcement Learning Techniques
- Techniques such as DQN, TRPO, and PPO are introduced, showing how they improve training stability and efficiency in reinforcement learning applications [41][55].

Application in Autonomous Driving
- Reward design and closed-loop training are emphasized: the vehicle's actions influence the environment, which calls for sophisticated modeling techniques [60][61].

Conclusion
- The rapid development of RL algorithms and their application to autonomous driving is underscored, with readers encouraged to engage with the technology hands-on [62].
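The training stability that PPO is credited with above comes from its clipped surrogate objective, which limits how far each update can move the policy from the one that collected the data. A minimal sketch of that loss (the array shapes and sample values are toy inputs, not from the article):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    Clipping the probability ratio to [1 - eps, 1 + eps] and taking the
    pessimistic (minimum) objective keeps each update close to the
    behavior policy, which is the source of PPO's stability.
    """
    ratio = np.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Sanity check: when the new and old policies agree, the ratio is 1
# everywhere and the loss reduces to -mean(advantages).
adv = np.array([1.0, -0.5, 2.0])
logp = np.log(np.array([0.2, 0.5, 0.3]))
print(ppo_clip_loss(logp, logp, adv))
```

The same clipping idea is what distinguishes PPO from TRPO, which enforces the proximity constraint with a more expensive trust-region computation instead.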