A self-checking VLA! ReflectDrive: a safer, more efficiently scaling end-to-end framework (Li Auto & Tsinghua)
自动驾驶之心· 2025-09-27 23:33
The self-checking ReflectDrive: my trajectory, my call, with safety dialed all the way up! End-to-end autonomous driving has become an important and fast-moving research area, and learning human-like driving policies from large-scale datasets holds considerable promise. Yet for multimodal behavior and long-tail scenarios, no framework has offered a sustainable solution. Relying on reinforcement learning alone makes reward hacking a thorny problem: it is very hard to write a comprehensive reward that covers continuous trajectories in complex three-dimensional space. Recent breakthroughs in the generalization ability of large language models have therefore raised hopes that model scaling and data scaling can unlock generalization in driving, hence the rise of VLA models. Everyone wants to tap the generalization of VLMs to handle few-shot/zero-shot scenarios with less data. Below is an analysis of the pain points of current VLA approaches to autonomous driving: based on this, what is urgently needed is a fusion of the L (language) and A (action) modalities, a unified architecture that scales more easily while still generating efficiently. To address these challenges, a team from Li Auto and Tsinghua proposes ReflectDrive, a new learning framework that achieves safe trajectory generation through a reflection mechanism built on discrete diffusion. The two-dimensional driving space is first discretized to build an action codebook, so that a pretrained diffusion language model can be adapted to the planning task via fine-tuning. At the core of the framework is a ...
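As a rough illustration of the action-codebook idea, here is a minimal Python sketch that quantizes 2D waypoints onto a uniform grid of tokens. The grid bounds, resolution, and index layout are all assumptions made for illustration; the excerpt does not specify ReflectDrive's actual discretization.

```python
import numpy as np

# Hypothetical grid; the article excerpt does not give ReflectDrive's actual
# bounds, resolution, or index layout -- these values are assumptions.
X_RANGE = (-50.0, 50.0)   # lateral extent in meters (assumed)
Y_RANGE = (0.0, 100.0)    # longitudinal extent in meters (assumed)
BINS = 128                # bins per axis, i.e. a 128 x 128 codebook (assumed)

def waypoint_to_token(x: float, y: float) -> int:
    """Quantize a 2D waypoint into a single discrete codebook index."""
    xi = int(np.clip((x - X_RANGE[0]) / (X_RANGE[1] - X_RANGE[0]) * BINS, 0, BINS - 1))
    yi = int(np.clip((y - Y_RANGE[0]) / (Y_RANGE[1] - Y_RANGE[0]) * BINS, 0, BINS - 1))
    return yi * BINS + xi

def token_to_waypoint(token: int) -> tuple[float, float]:
    """Map a codebook index back to the center of its grid cell."""
    yi, xi = divmod(token, BINS)
    x = X_RANGE[0] + (xi + 0.5) * (X_RANGE[1] - X_RANGE[0]) / BINS
    y = Y_RANGE[0] + (yi + 0.5) * (Y_RANGE[1] - Y_RANGE[0]) / BINS
    return x, y

# A planned trajectory becomes a short token sequence, which is what lets a
# pretrained discrete diffusion language model treat planning like text.
trajectory = [(0.0, 2.0), (0.5, 6.0), (1.2, 12.0)]
tokens = [waypoint_to_token(x, y) for x, y in trajectory]
print(tokens, [token_to_waypoint(t) for t in tokens])
```

Once trajectories live in a finite vocabulary like this, denoising a masked token sequence is the same operation a discrete diffusion language model already performs on text, which is what makes fine-tuning for planning possible.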
AI Roundup: Zhipu releases GLM-4.5, Ant Digital Technologies releases the financial reasoning model Agentar-Fin-R1
China Post Securities· 2025-08-06 02:33
- The GLM-4.5 model, developed by Zhipu, integrates reasoning, coding, and intelligent-agent capabilities into a single architecture. It employs a mixture-of-experts framework with 355 billion total parameters, activating only 32 billion parameters per inference to improve computational efficiency (a generic top-k routing sketch of this sparse activation follows this list). Training proceeds in three stages: pretraining on 15 trillion general text tokens, fine-tuning on 8 trillion tokens of specialized data, and reinforcement learning for multi-task alignment. The model achieves a 37% performance improvement on complex reasoning tasks through innovations such as deep-layer prioritization and grouped-query attention [12][14][15]
- GLM-4.5 ranks third globally in AGI core-capability evaluations, with a composite score of 63.2. It outperforms competitors in tasks such as web interaction (26.4% accuracy on BrowseComp) and code repair (64.2 on SWE-bench Verified). The model posts an 80.8% win rate against Qwen3-Coder across 52 real-world programming tasks despite having half the parameters of DeepSeek-R1, showcasing a superior performance-to-parameter ratio [15][16][19]
- The Agentar-Fin-R1 model, launched by Ant Financial, is a financial reasoning model built on the Qwen3 architecture. It features a dual-engine design: the Master Builder engine translates business logic into executable code, while the Agent Group engine uses consensus algorithms for multi-agent decision-making. The model is trained on a domain-specific corpus covering six major financial sectors and reaches 92.3% financial-knowledge accuracy via weighted training algorithms [20][21][23]
- Agentar-Fin-R1 excels in financial evaluations, scoring 87.70 on FinEval1.0 and 86.79 on FinanceIQ. It leads in tasks such as risk pricing and compliance review, scoring 69.93 on the Finova evaluation and surpassing larger general-purpose models. Its compliance system improves review efficiency by 90%, and its credit-approval module cuts loan processing time from 3 days to 15 minutes while lowering bad-debt rates by 18% [23][24][25]
- The Goedel-Prover-V2 theorem-proving system, developed by Princeton, Tsinghua, and NVIDIA, uses 8B/32B-parameter models to achieve state-of-the-art results. It employs scaffolded data synthesis, validator-guided self-correction (a minimal version of this loop is sketched after this list), and model averaging to boost performance. The system reaches 88.1% Pass@32 accuracy on the MiniF2F benchmark, with the 8B model attaining 83.3% of the performance of the 671B DeepSeek-Prover-V2 while using only 1/100th of the parameters [58][60][61]
- Goedel-Prover-V2 is also highly efficient: its 32B model solves 64 PutnamBench problems at Pass@64, outperforming the 671B DeepSeek-Prover-V2, which needed Pass@1024 to solve 47. The iterative self-correction mode improves proof quality with minimal extra token consumption, and training is fast, requiring only 12 hours per iteration on 4 H100 GPUs [60][61][63]
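To make the "355B total / 32B active" figure concrete, here is a generic top-k mixture-of-experts routing sketch. Everything in it (expert count, dimensions, the linear router) is a standard illustrative MoE layer, not GLM-4.5's actual architecture, which the summary does not detail.

```python
import torch
import torch.nn.functional as F

class TopKMoE(torch.nn.Module):
    """Generic top-k MoE layer: each token only runs k of n_experts,
    so per-token compute scales with k / n_experts of total parameters."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = torch.nn.Linear(d_model, n_experts)  # illustrative router
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Score all experts, but evaluate only the k highest-scoring ones;
        # the remaining experts' weights are never touched for this token.
        scores = F.softmax(self.gate(x), dim=-1)       # (batch, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # (batch, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for slot in range(self.k):
                expert = self.experts[int(idx[b, slot])]
                out[b] += weights[b, slot] * expert(x[b])
        return out

y = TopKMoE()(torch.randn(4, 64))  # 4 tokens, each routed to 2 of 8 experts
```

With k=2 of 8 experts, only a quarter of the expert parameters participate in any one forward pass, which is the same principle behind activating 32B of 355B parameters per inference.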
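Similarly, the validator-guided self-correction described for Goedel-Prover-V2 can be pictured as a generate-verify-repair loop. The prompt format and verifier interface below are assumptions for illustration; in the real system the verifier role would be played by the Lean compiler, whose feedback is folded into the next attempt.

```python
from typing import Callable, Optional

def self_correct(
    generate: Callable[[str], str],          # model: prompt -> candidate proof
    verify: Callable[[str], tuple[bool, str]],  # verifier: proof -> (ok, error)
    statement: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Minimal validator-guided self-correction loop (a sketch; the actual
    prompting scheme and compiler interface are assumptions).

    Each round, the model drafts a proof; the verifier either accepts it or
    returns an error message that is appended to the prompt so the next
    draft can repair the specific failure.
    """
    prompt = statement
    for _ in range(max_rounds):
        proof = generate(prompt)
        ok, error = verify(proof)
        if ok:
            return proof
        # Fold verifier feedback into the next attempt.
        prompt = (
            f"{statement}\n# previous attempt:\n{proof}\n"
            f"# verifier error:\n{error}"
        )
    return None  # no verified proof within the round budget
```

The reported behavior, better proofs with only a small increase in token consumption, follows from this structure: extra tokens are spent only on the failing cases, and each retry is conditioned on a concrete error rather than sampled blind.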