Reinforcement Learning
Results are out! A roundup of NeurIPS 2025 papers (autonomous driving / large models / embodied AI / RL, etc.)
自动驾驶之心· 2025-09-22 23:34
Core Insights
- The article covers the newly released list of accepted NeurIPS 2025 papers, focusing on advances in autonomous driving, visual perception and reasoning, large model training, embodied intelligence, reinforcement learning, video understanding, and code generation [1].
Autonomous Driving
- The article highlights various research papers related to autonomous driving, including "FutureSightDrive" and "AutoVLA," which explore visual reasoning and end-to-end driving models [2][4].
- A collection of papers and code from institutions such as Alibaba, UCLA, and Tsinghua University is provided, showcasing the latest developments in the field [6][7][13].
Visual Perception and Reasoning
- The article mentions "SURDS," which benchmarks spatial understanding and reasoning in driving scenarios using vision-language models [11].
- It also references "OmniSegmentor," a flexible multi-modal learning framework for semantic segmentation [16].
Large Model Training
- The article discusses advances in large model training, including papers on scaling offline reinforcement learning and on fine-tuning techniques [40][42].
- It emphasizes the importance of adaptive methods for improving model performance across applications [44].
Embodied Intelligence
- Research on embodied intelligence is highlighted, including "Self-Improving Embodied Foundation Models" and "ForceVLA," the latter enhancing models for contact-rich manipulation [46][48].
Video Understanding
- The article covers advances in video understanding, particularly the "PixFoundation 2.0" project, which investigates the use of motion in visual grounding [28][29].
Code Generation
- The article mentions developments in code generation, including "Fast and Fluent Diffusion Language Models" and "Step-By-Step Coding for Improving Mathematical Olympiad Performance" [60].
Li Auto's restructuring of its intelligent-driving unit from 3 to 11 second-level departments is a secondary contradiction
理想TOP2· 2025-09-22 16:56
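Disclaimer: This is a piece of reasoning whose premise is that the following two propositions hold:
Proposition 1: Li Xiang's role in Li Auto's driver assistance is highly analogous to Musk's role in Tesla's driver assistance. (Three core roles: 1. securing large-scale resources; 2. ensuring those resources keep being invested; 3. having the ability to understand the underlying principles of AI and to take part directly in the company's AI technical discussions, and on that basis making and executing the key "think different" judgments about the company's long-term direction and technical roadmap.)
Proposition 2: The principal contradiction in the development of Li Auto's intelligent driving is the development stage of the global AI industry / how well Li Auto's various factors of production fit together / Li Xiang (in other words, the right time, the right place, and the right people).
From first principles, these two propositions are clearly not necessarily true, so readers are strongly encouraged to examine them critically and to fully allow for the possibility that they do not hold.
If these two propositions hold, three points follow:
Point 1: Restructuring Li Auto's intelligent-driving unit from 3 to 11 second-level departments is a secondary contradiction within the "fit of Li Auto's factors of production" sub-category.
Point 2: However the second-level departments of Li Auto's intelligent-driving unit change in detail, the direction of iteration is so clear that multiple high-quality, rapid iterations over the next 1 to 12 months are highly likely.
Every boss plays the first two roles, and most technical leads can understand the underlying principles of AI and take part directly in the company's AI technical discussions; very few technical leads can make the key "think different" calls on the AI technical roadmap, and very few bosses have this ability either ...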
Buick Zhijing L7 to launch on September 28, with a starting price that could drop to around 200,000 yuan
Yang Zi Wan Bao Wang· 2025-09-22 12:38
Group 1
- The flagship product of "Zhijing," Buick's high-end new-energy sub-brand, is the Zhijing L7, which features the advanced "Zhenlong" range-extension system and the "Xiaoyao Zhixing" driver-assistance system, placing it in the industry's top tier for autonomous driving capabilities [2]
- The Zhijing L7 is the first vehicle to launch with Momenta's R6 flywheel model, which is based on end-to-end "reinforcement learning," enhancing its autonomous driving technology [2]
- The vehicle is equipped with Qualcomm's latest SA8775P chip, luxurious four-seat floating chairs, and a 27-speaker sound system with headrest speakers, providing an upgraded luxury and comfort experience [2]
Group 2
- Since blind bookings opened on September 15, the Zhijing L7 has drawn significant attention and recognition from new-energy vehicle users [4]
- The Zhijing L7 is expected to be priced between 200,000 and 250,000 yuan, with the starting price potentially as low as 200,000 yuan, making it a new option in the B-class car segment [4]
- Customers who order through official channels before the September 28 launch can receive "early bird benefits," encouraging potential buyers to act quickly [4]
Meituan's Wang Xing open-sources another large model
36Kr· 2025-09-22 10:53
Core Insights
- Meituan has accelerated its efforts in the AI open-source arena by releasing its first self-developed reasoning model, LongCat-Flash-Thinking, just 24 days after its initial large language model launch [1][3]
- LongCat-Flash-Thinking boasts a training speed improvement of over 200%, achieving more than three times the efficiency of its predecessor, LongCat-Flash [1][9]
- The model excels in various benchmark tests, particularly in formal reasoning and agent reasoning tasks, outperforming several leading models in specific categories [1][12]
Group 1: Model Performance and Features
- LongCat-Flash-Thinking has shown strong performance in multi-domain benchmark tests, achieving competitive results in general question answering, mathematical reasoning, and general reasoning tasks [1][12]
- In mathematical reasoning, the model scored 99.2% in the MATH-500 benchmark, nearly reaching full marks, and demonstrated strong capabilities in challenging tasks like AIME and HMMT [12][14]
- The model's performance in logical reasoning reached 50.3% on the ARC-AGI benchmark, surpassing OpenAI-o3 and Gemini 2.5-Pro [12]
Group 2: Training Methodology
- The model was developed using a two-phase training system, which includes mid-training for reasoning enhancement and supervised fine-tuning (SFT) focused on reasoning tasks [5][8]
- During the SFT phase, the model's instruction-following and specialized reasoning capabilities were further improved through a curriculum learning approach [7][8]
- A high-difficulty reasoning training set was created to enhance logical reasoning while maintaining general capabilities [5][7]
Group 3: Reinforcement Learning Optimization
- LongCat-Flash-Thinking employs a "three-pronged" approach to optimize reinforcement learning efficiency and stability, focusing on system design, algorithm improvements, and reward mechanisms [9][10]
- The DORA framework, a distributed reinforcement learning system, supports asynchronous training and flexible accelerator scheduling, achieving training speeds over three times faster than traditional methods [9][10]
- The model incorporates a novel reward mechanism that includes both discriminative and generative models to evaluate performance in various tasks [10][12]
Group 4: Practical Applications and Future Directions
- The open-sourcing of LongCat-Flash-Thinking aims to advance research in efficient reinforcement learning and native agent reasoning [19]
- Meituan plans to leverage this model to enhance its consumer-facing agent products and AI search capabilities, potentially improving user experience [19]
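The summary says the SFT phase used a curriculum learning approach but gives no details. As a rough illustration only, one common way to build a curriculum is to sort examples by a difficulty score and widen the training pool stage by stage. The `Example` type, its `difficulty` field, and the cumulative staging rule are all hypothetical; the article does not describe how Meituan scored or scheduled its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    prompt: str
    difficulty: float  # hypothetical score; higher means harder


def curriculum_stages(examples: List[Example], num_stages: int) -> List[List[Example]]:
    """Split examples into cumulative training pools, easiest first.

    Stage k covers the easiest (k + 1) / num_stages fraction of the data, so
    later stages keep the earlier material while adding harder examples.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    ordered = sorted(examples, key=lambda ex: ex.difficulty)
    stages = []
    for k in range(num_stages):
        cutoff = round(len(ordered) * (k + 1) / num_stages)
        stages.append(ordered[:cutoff])
    return stages


if __name__ == "__main__":
    data = [Example("p1", 0.9), Example("p2", 0.1), Example("p3", 0.5), Example("p4", 0.3)]
    for i, pool in enumerate(curriculum_stages(data, 2)):
        print(i, [ex.prompt for ex in pool])
```

Keeping earlier (easier) material in later stages is a design choice meant to limit forgetting; a curriculum could just as well use disjoint stages.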
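Breaking through the post-training bottleneck? Another major work from Meta Superintelligence Labs: CaT tackles the supervision problem in RL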
机器之心· 2025-09-22 02:05
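Synced (机器之心) report | Synced editorial team
In AI, post-training is the usual way to give models specialized skills. However, post-training generally relies on supervised fine-tuning with annotated references, or on rewards supplied by verifiable programmatic checkers.
This creates a problem: many valuable tasks may lack both of these resources. In non-verifiable settings (clinical work, free-form dialogue, and creative writing), for example, there may be multiple valid answers, making deterministic rule-based checks hard to implement.
In such cases, practitioners often have to rely on either (i) laborious annotation pipelines or (ii) coarse rewards assigned to free-form outputs by another LLM.
So when post-training lacks ground-truth annotations, where does the learning signal come from?
To answer this question, researchers from the University of Oxford, Meta Superintelligence Labs, and other institutions asked: can inference compute substitute for missing supervision?
The paper argues that the answer is yes. The authors propose a method called CaT (Compute as Teacher), whose core idea is to treat extra inference-time compute as a teacher signal, so that large models can receive supervision even when human annotations or verifiable answers are unavailable.
The results show that applying CaT directly at inference time significantly improved the performance of Gemma 3 4B, Qwen 3 4B, and Llama 3.1 8B, even in non-verifiable domains (up to 27% improvement on MATH-500; HealthBench improved ...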
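The core idea of using extra inference-time compute as a teacher can be illustrated with a deliberately simplified stand-in: sample several answers to the same prompt, aggregate them into a pseudo-reference, and reward each answer by its agreement with that reference. Majority voting is a substitute chosen for brevity, not the CaT procedure from the paper, and `sample_answers` and `pseudo_reference_rewards` are hypothetical names standing in for a real model sampler and training hook.

```python
from collections import Counter
from typing import Callable, List, Tuple


def pseudo_reference_rewards(
    sample_answers: Callable[[str, int], List[str]],
    prompt: str,
    num_samples: int = 8,
) -> Tuple[str, List[float]]:
    """Turn extra samples into a label-free training signal.

    1. Spend extra inference compute: draw several answers for one prompt.
    2. Aggregate them into a pseudo-reference (majority vote as a stand-in).
    3. Reward each answer 1.0 if it matches the pseudo-reference, else 0.0.
    """
    answers = sample_answers(prompt, num_samples)
    if not answers:
        raise ValueError("sampler returned no answers")
    reference, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == reference else 0.0 for a in answers]
    return reference, rewards


if __name__ == "__main__":
    # Hypothetical fixed sampler standing in for an LLM.
    def fake_sampler(prompt: str, n: int) -> List[str]:
        return ["42", "42", "41", "42", "7"][:n]

    ref, rewards = pseudo_reference_rewards(fake_sampler, "What is 6 * 7?", num_samples=5)
    print(ref, rewards)
```

A plain vote only works for short answers that can be compared exactly; the article says CaT also targets non-verifiable domains such as clinical questions, where such a vote would not apply.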
VLA work so far may still offer more emotional value than substance......
自动驾驶之心· 2025-09-20 16:03
Core Insights
- The article discusses the current state of end-to-end (E2E) technology in both academia and industry, highlighting the differences in approach and data availability between the two sectors [1][4][5]
- It emphasizes the importance of data iteration speed in AI model development, arguing that slow data iteration can hinder technological progress [2][4]
- The article also explores the role of reinforcement learning in enhancing Vision-Language-Action (VLA) models, particularly in scenarios where there are no definitive correct answers [6][7][9][10]
Summary by Sections
End-to-End Technology
- The academic field is seeing a proliferation of end-to-end methods, with many approaches emerging [1]
- In contrast, industry is more pragmatic: computational limits rule out some popular models, but companies benefit from vast amounts of data [4]
- The success of models like ChatGPT is attributed to the internet's ability to provide extensive data; the same holds for the automotive industry, where companies can readily gather massive amounts of driving data [4]
Data and Technology Iteration
- The article stresses that as technology evolves rapidly, datasets must be iterated at the same pace; otherwise they will impede technological progress [2]
- Research teams increasingly publish datasets alongside their papers to maintain high-impact output [3]
Reinforcement Learning and VLA
- Reinforcement learning suits problems that have no single correct answer, only characteristics that distinguish good answers from bad ones [7]
- During RL training, the model finds optimal solutions guided by a reward signal, reducing the need for extensive demonstration data [9]
- The article notes that while the short-term results of VLA applications are uncertain, their long-term potential is widely recognized [10][11]
Future of VLA
- The article suggests that the importance of algorithms in VLA models extends beyond performance metrics alone; factors such as data availability and training strategies are also crucial [12]
- The community is encouraged to discuss the development and challenges of autonomous driving technologies [5][13][16]
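The claim that RL needs only the characteristics of good and bad answers, not demonstrations, can be shown with a toy example. In the sketch below, a softmax policy over a few candidate maneuvers is trained with REINFORCE, using a reward that scores traits (collision, jerk, progress) rather than comparing against one correct answer. The maneuvers, trait values, and reward weights are invented for illustration and do not come from any production VLA system.

```python
import math
import random

# Hypothetical candidate maneuvers with hand-made trait values.
MANEUVERS = {
    "keep_lane":   {"collision": 0.0, "jerk": 0.1, "progress": 0.6},
    "change_left": {"collision": 0.0, "jerk": 0.2, "progress": 0.9},
    "hard_brake":  {"collision": 0.0, "jerk": 0.9, "progress": 0.0},
    "cut_in":      {"collision": 1.0, "jerk": 0.5, "progress": 1.0},
}
ACTIONS = list(MANEUVERS)


def reward(action: str) -> float:
    """Score traits instead of comparing with a single correct answer."""
    t = MANEUVERS[action]
    return -5.0 * t["collision"] - 1.0 * t["jerk"] + 1.0 * t["progress"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 2000, lr: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    baseline = 0.0  # running mean of reward, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = reward(ACTIONS[i])
        advantage = r - baseline
        baseline += 0.01 * (r - baseline)
        # REINFORCE: d log softmax(i) / d logit_j = 1[j == i] - p_j
        for j in range(len(ACTIONS)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    final = train()
    for name, p in zip(ACTIONS, final):
        print(f"{name:12s} {p:.3f}")
```

No example of the "right" maneuver is ever shown to the policy; it shifts probability toward the maneuver with the best trait score purely from sampled rewards.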
Fresh turmoil at Tesla Optimus: AI team lead Ashish Kumar leaves for Meta
Huan Qiu Wang Zi Xun· 2025-09-20 04:20
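Reportedly, during his time at Tesla, Ashish Kumar led core technology R&D for the Optimus AI team, which focused on using AI to overcome the bottlenecks keeping humanoid robots from practical use. In his social media post, he specifically noted that the team was "going all in on scalable methods: replacing the traditional tech stack with reinforcement learning and improving the robot's dexterity by learning from video."
Source: Huanqiu.com (环球网)
Reinforcement learning, a frontier technique in AI, lets robots optimize their behavior policies autonomously through trial and error rather than relying on preset programs. Optimus prototypes previously shown by Kumar's team could already perform basic tasks such as sorting batteries and carrying objects, and their smooth motion control is regarded in the industry as a benchmark for putting reinforcement learning into practice. In addition, the team's video-learning techniques allow the robot to extract motion patterns from videos of humans performing tasks, significantly shortening skill-training cycles.
[Huanqiu.com Tech Roundup] September 20: According to multiple foreign media reports, Ashish Kumar, head of the AI team for Tesla's Optimus humanoid robot project, has formally resigned from Tesla and will soon join Meta (formerly Facebook) as a research scientist. On September 19 local time, Kumar published a long post on his personal social media account looking back on his career at Tesla and revealing key ...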
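Learning a behavior policy by trial and error, rather than following a preset program, is the core of reinforcement learning. A minimal tabular Q-learning sketch on an invented five-cell corridor shows the idea. It is a textbook illustration, not anything from Tesla's Optimus stack, which the article does not describe at the code level; the environment, reward, and hyperparameters are all assumptions.

```python
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment: reward 1.0 only when the goal cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # Trial-and-error update toward reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal cell."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The agent is never told to move right; it discovers that policy from the delayed reward alone, which is the "optimize through trial and error" property the article describes.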
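Big news! DeepSeek's paper with Liang Wenfeng as corresponding author makes the cover of Nature, directly addressing doubts about distillation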
程序员的那些事· 2025-09-20 01:10
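On September 18, the research paper on the DeepSeek-R1 reasoning model, completed by the DeepSeek team with Liang Wenfeng as corresponding author, appeared on the cover of the leading international journal Nature.
Compared with the initial DeepSeek-R1 paper released in January this year, this paper discloses more details about model training and directly addresses the distillation doubts raised when the model was first released.
DeepSeek-R1 is the world's first mainstream large language model to undergo peer review. Almost none of the mainstream large models have yet been independently peer-reviewed, a gap that DeepSeek has "finally closed."
Nature's cover introduction reads:
"If large models can be trained to plan the steps needed to solve a problem, they often solve it better. This 'reasoning' resembles the way humans tackle more complex problems, but it is a great challenge for artificial intelligence and requires human intervention to add labels and annotations. In this week's issue, DeepSeek researchers reveal how they were able to train a model to reason with minimal human input.
The DeepSeek-R1 model is trained with reinforcement learning. In this kind of learning, the model receives a high-scoring reward when it answers a math problem correctly and is penalized when it answers incorrectly. As a result, it learned to reason, solving problems step by step and revealing those steps, which makes it more likely to reach the correct answer. This enables DeepSeek ...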
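The reward-and-penalty scheme described above can be sketched as a rule-based reward on final answers. DeepSeek-R1 is trained with Group Relative Policy Optimization (GRPO), which normalizes each sampled answer's reward against the other answers drawn for the same prompt; the sketch below shows only that normalization step, not the full policy update. The `\boxed{}` answer format, the +1/-1 reward values, and the helper names are assumptions for illustration. The paper's actual reward design also includes terms beyond answer accuracy, such as a format reward.

```python
import re
import statistics
from typing import List, Optional

BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a completion, if any."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def accuracy_reward(completion: str, gold: str) -> float:
    """+1 for a correct final answer, -1 otherwise (values are assumptions)."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == gold.strip() else -1.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "2 + 2 = 4, so the answer is \\boxed{4}",
        "I think it is \\boxed{5}",
        "Adding gives \\boxed{4}",
        "No final answer given",
    ]
    rewards = [accuracy_reward(c, "4") for c in group]
    print(rewards)
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Because advantages are relative within a group, a prompt where every sample is right (or every sample is wrong) yields zero advantage and contributes no learning signal.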
Tackling the training-inference mismatch in large models: Ant Group open-sources Ring-flash-2.0, its next-generation reasoning model
机器之心· 2025-09-19 10:43
Core Viewpoint
- The article discusses the release of Ring-flash-2.0 by Ant Group's Bailing team, highlighting its potential to reshape the competitive landscape of large models by achieving high performance with fewer activated parameters and improved training stability [1][4][26].
Performance Overview
- Ring-flash-2.0 has 100 billion total parameters, of which 6.1 billion are activated, and achieves a score of 86.98 on the AIME math benchmark and an Elo score of 90.23 on CodeForces, with a throughput of over 200 tokens per second [1][21].
- The model's performance is comparable to state-of-the-art (SOTA) dense models of around 40 billion parameters, demonstrating significant advances in reasoning tasks [1][21].
Technical Innovations
- The icepop algorithm enables stable long-horizon reinforcement learning (RL) training by freezing tokens with large training-inference precision discrepancies, so that no gradients are backpropagated through them [6][10][13].
- The two-stage RL approach combines supervised fine-tuning (SFT) with reinforcement learning from verifiable rewards (RLVR) and from human feedback (RLHF), optimizing the training process [14][16].
Cost Efficiency
- Ring-flash-2.0 matches the performance of a 40-billion-parameter dense model while activating only 6.1 billion parameters, marking a turning point for cost efficiency in large model competition [17][21].
- Its high sparsity and low activation significantly reduce inference costs in high-concurrency scenarios [21].
Market Implications
- Competition among large models is shifting from parameter count to cost-effectiveness, with Ring-flash-2.0 positioned as a leading solution in this new era [18][25].
- The article suggests that Ring-flash-2.0 may mark the beginning of a "high cost-performance era" for large models, following the advances initiated by GPT-4 [26].
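As summarized here, icepop drops tokens whose probabilities under the training engine and the inference engine disagree too much, so they contribute no gradient. A framework-free sketch of that masking step follows. The ratio bounds `low` and `high`, the function names, and the REINFORCE-style surrogate loss are assumptions made for illustration, not Ant Group's released implementation.

```python
import math
from typing import List, Tuple


def icepop_style_mask(
    train_logprobs: List[float],
    infer_logprobs: List[float],
    low: float = 0.5,
    high: float = 2.0,
) -> List[float]:
    """Keep a token (1.0) only if p_train / p_infer lies in [low, high].

    The bounds are illustrative assumptions, not published hyperparameters.
    """
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    mask = []
    for lp_train, lp_infer in zip(train_logprobs, infer_logprobs):
        ratio = math.exp(lp_train - lp_infer)
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return mask


def masked_pg_loss(
    train_logprobs: List[float],
    infer_logprobs: List[float],
    advantages: List[float],
    low: float = 0.5,
    high: float = 2.0,
) -> Tuple[float, List[float]]:
    """REINFORCE-style surrogate averaged over kept tokens only.

    In an autodiff framework, tokens with mask 0.0 would contribute no
    gradient, which matches the "freeze" behaviour described in the summary.
    """
    mask = icepop_style_mask(train_logprobs, infer_logprobs, low, high)
    kept = sum(mask)
    if kept == 0:
        return 0.0, mask
    loss = -sum(m * a * lp for m, a, lp in zip(mask, advantages, train_logprobs)) / kept
    return loss, mask


if __name__ == "__main__":
    train = [math.log(0.5), math.log(0.4), math.log(0.9)]
    infer = [math.log(0.5), math.log(0.1), math.log(0.85)]
    loss, mask = masked_pg_loss(train, infer, advantages=[1.0, 1.0, 1.0])
    print(mask, round(loss, 4))
```

In the usage example the middle token is four times as likely under the training engine as under the inference engine, so it falls outside the assumed bounds and is excluded from the loss.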
These directions in embodied AI make up the so-called "cerebrum and cerebellum" algorithms
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the evolution and current trends in embodied intelligence technology, emphasizing the integration of various models and techniques to enhance robotic capabilities in real-world environments [3][10].
Group 1: Technology Development Stages
- Embodied intelligence has progressed through several stages, from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping, with limited decision-making capability [7].
- The second stage introduced behavior cloning, which let robots learn from expert demonstrations but struggled with generalization and error accumulation [7].
- The third stage, marked by diffusion policy methods, improved stability and generalization by modeling action sequences [8].
- The fourth stage, beginning in 2025, explores integrating VLA models with reinforcement learning and world models to enhance predictive capability and multi-modal perception [9][10].
Group 2: Key Technologies and Techniques
- Key technologies in embodied intelligence include VLA, diffusion policy, and reinforcement learning, which together improve robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- Integrating tactile sensing with VLA models expands robots' sensory capabilities, allowing more precise operations in unstructured environments [10].
Group 3: Industry Implications and Opportunities
- Advances in embodied intelligence are increasing demand for engineering and system capabilities as the field moves from theoretical research to practical deployment [10][14].
- There is growing interest in training and deploying models such as diffusion policy and VLA on platforms like MuJoCo and Isaac Gym [14].
- The industry is seeing a surge in job opportunities and research interest, prompting many professionals to shift their focus toward embodied intelligence [10].
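Behavior cloning, the second stage described above, reduces imitation to supervised learning: fit a policy that maps observed states to the expert's actions. A one-dimensional least-squares sketch makes the idea concrete. The expert data, the linear policy form, and the function name are invented for illustration; real systems fit deep networks on images and proprioceptive signals instead.

```python
from typing import List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Behavior cloning as least squares: find w, b minimizing sum (w*s + b - a)^2."""
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two demonstrations")
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    if var_s == 0:
        raise ValueError("states must not all be identical")
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov / var_s
    b = mean_a - w * mean_s
    return w, b


if __name__ == "__main__":
    # Hypothetical expert whose action is half the observed state value.
    expert = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
    w, b = fit_linear_policy(expert)
    print(w, b)          # learned policy parameters
    print(w * 4.0 + b)   # action predicted for a state the expert never visited
```

The cloned policy is only trained on states the expert visited. At run time, small prediction errors can push the robot into unfamiliar states where its outputs are less reliable, which is one source of the error accumulation the article mentions.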