Core Viewpoint
- The article discusses the introduction of a new framework called "CoTj" by China Unicom's Data Science and AI Research Institute, which lets diffusion models dynamically allocate computational resources based on prompt complexity, significantly improving image generation quality [4][35].

Group 1: Framework and Mechanism
- The CoTj framework gives diffusion models "System 2" planning capabilities, enabling them to allocate computational resources dynamically according to the complexity of the prompt [4][14].
- CoTj employs a "Predict-Plan-Execute" reasoning paradigm, featuring a lightweight predictor that estimates the current Diffusion DNA from condition embeddings for rapid prediction [14][15].
- The framework recasts the complex sampling process as a directed acyclic graph (DAG) optimization problem, enabling efficient trajectory planning [11][13].

Group 2: Performance and Results
- In experiments, CoTj delivered superior image quality even with a basic first-order solver, outperforming traditional methods that used high-order solvers under the same conditions [22][24].
- The framework achieved significant improvements in accuracy and speed across various models, with notable metrics such as a 60% reduction in mean squared error (MSE) and an increase of over 6 dB in peak signal-to-noise ratio (PSNR) [25][28].
- CoTj's trajectory planning maintains high fidelity in image generation even with drastically reduced sampling steps, preserving essential details that traditional methods often lose [27][29].

Group 3: Future Directions
- The research team indicates that CoTj's theoretical foundation will be extended to more complex video dynamics, and that unsupervised Diffusion DNA discovery across modalities will be explored [36][37].
- The framework represents a significant leap in computational efficiency and resource-aware planning in generative AI, marking a new era for diffusion models [35][36].
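The MSE and PSNR figures above are related by a standard identity: since PSNR = 10·log10(MAX²/MSE), the MAX² term cancels when comparing two runs on the same data, and the PSNR gain depends only on the ratio of the two MSE values. A minimal sketch of that conversion (the function name is illustrative, not from the article):

```python
import math

def psnr_gain_db(mse_before, mse_after):
    """PSNR gain implied by an MSE reduction.

    PSNR = 10 * log10(MAX^2 / MSE), so for a fixed signal range MAX the
    gain between two measurements is 10 * log10(mse_before / mse_after).
    """
    return 10.0 * math.log10(mse_before / mse_after)

# Cutting MSE to a quarter of its original value yields ~6.02 dB of PSNR.
print(round(psnr_gain_db(1.0, 0.25), 2))  # → 6.02
```

Note that the reported 60% MSE reduction and the 6 dB PSNR gain are separate headline metrics; a single 60% MSE reduction alone corresponds to roughly a 4 dB gain by this identity.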
Diffusion models finally learn to "tailor the effort to the task": compute is allocated dynamically by prompt difficulty, saving time on simple prompts while preserving image quality on complex ones.
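The DAG formulation mentioned in Group 1 can be illustrated with a toy planner: treat candidate sampling timesteps as nodes, treat each solver jump between two timesteps as an edge whose cost approximates the error it introduces, and pick the cheapest path within a step budget by dynamic programming. This is a generic sketch under those assumptions; the function names and the quadratic cost model are hypothetical and not CoTj's actual algorithm.

```python
def plan_trajectory(timesteps, edge_cost, max_steps):
    """Pick at most `max_steps` jumps from timesteps[0] to timesteps[-1]
    minimizing the summed edge cost, via dynamic programming on the DAG
    whose edges go from earlier to later timesteps."""
    n = len(timesteps)
    INF = float("inf")
    # best[k][j] = min cost to reach node j using exactly k jumps
    best = [[INF] * n for _ in range(max_steps + 1)]
    parent = [[None] * n for _ in range(max_steps + 1)]
    best[0][0] = 0.0
    for k in range(1, max_steps + 1):
        for j in range(1, n):
            for i in range(j):  # edges only go forward: i -> j with i < j
                if best[k - 1][i] == INF:
                    continue
                c = best[k - 1][i] + edge_cost(timesteps[i], timesteps[j])
                if c < best[k][j]:
                    best[k][j] = c
                    parent[k][j] = i
    # choose the cheapest step count that reaches the final node
    k = min(range(1, max_steps + 1), key=lambda s: best[s][n - 1])
    path, j = [n - 1], n - 1
    while k > 0:
        j = parent[k][j]
        path.append(j)
        k -= 1
    return [timesteps[idx] for idx in reversed(path)]

if __name__ == "__main__":
    ts = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    cost = lambda a, b: (b - a) ** 2  # toy cost: bigger jumps cost more
    print(plan_trajectory(ts, cost, max_steps=3))
```

With a convex per-jump cost the planner spends its full budget and spreads the jumps out, which mirrors the intuition in the article: an easy prompt (cheap edges) can tolerate large jumps and few steps, while a hard prompt concentrates steps where the error cost is high.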