ReconVLA
AAAI 2026 Outstanding Paper Award | ReconVLA: A First for the Field of Embodied Intelligence
具身智能之心· 2026-01-27 03:00
Author: 机器之心 | Editor: 具身智能之心
In the long-standing AI research landscape, embodied intelligence, despite being critical to robot manipulation, automated systems, and real-world applications, has often been regarded as a "systems-engineering-driven" research direction and has rarely been seen as capable of decisively shaping AI's core modeling paradigms. ReconVLA winning an AAAI Outstanding Paper Award sends a clear and important signal: enabling agents to "see, think, and act" in the real world has become one of the core problems of AI research. At 19:30 on Friday, January 30, we are honored to host Song Wenxuan, first author of the AAAI 2026 best paper ReconVLA, in the "具身智能之心" livestream.
Paper title: ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Paper: https://arxiv.org/abs/2508.10333
Code: https://github.com/Chowzy069/Reconvla
The livestream will focus on one core question: ...
AAAI 2026 Outstanding Paper Award | ReconVLA: Embodied Intelligence Research Wins a Best Paper Award at a Top AI Conference for the First Time
机器之心· 2026-01-26 03:08
In the long-standing AI research landscape, embodied intelligence, despite being critical to robot manipulation, automated systems, and real-world applications, has often been regarded as a "systems-engineering-driven" research direction and has rarely been seen as capable of decisively shaping AI's core modeling paradigms. In recent years, Vision-Language-Action (VLA) models have made notable progress in multi-task learning and long-horizon manipulation. However, extensive experiments reveal a basic but long-overlooked problem that severely limits their performance ceiling: visual attention struggles to focus stably and precisely on task-relevant targets. Take the instruction "place the blue block on the pink block" as an example: the model must keep locking onto the "blue block" and the "pink block" against a cluttered background. In practice, however, the visual attention of many VLA models is close to uniformly distributed. Unlike humans, who concentrate on the target objects, VLA models are easily distracted by irrelevant objects or background, which leads to failed grasps or placements. ReconVLA winning an AAAI Outstanding Paper Award sends a clear and important signal: enabling agents to "see, think, and act" in the real world has become one of the core problems of AI research. This is the first time in the history of embodied intelligence (Embodied Intelligence / Vision-Language-Action) research that a top AI ...
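As an illustration of the "near-uniform attention" observation (a sketch for intuition only, not code from the paper; tensor names and shapes are hypothetical), one common way to quantify how dispersed a model's attention is over image patch tokens is the normalized entropy of the attention weights: values near 1 indicate dispersed, near-uniform attention, while values near 0 indicate attention concentrated on a few patches.

```python
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Normalized entropy of attention over visual tokens.

    attn_weights: (batch, num_visual_tokens) non-negative weights that a
    text/action token assigns to each image patch token.
    Returns values in [0, 1]: ~1 means near-uniform (dispersed) attention,
    ~0 means attention concentrated on a few patches.
    """
    p = attn_weights / attn_weights.sum(dim=-1, keepdim=True).clamp_min(eps)
    entropy = -(p * (p + eps).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(attn_weights.shape[-1])))
    return entropy / max_entropy

# Hypothetical usage: weights from one cross-attention head over 256 image patches.
weights = torch.rand(4, 256)        # batch of 4, 256 visual tokens
print(attention_entropy(weights))   # close to 1.0 for random/near-uniform attention
```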
AAAI Outstanding Papers Announced: HKUST, Tongji, Zhejiang Normal University and Other Chinese Institutions Among the Winners
机器之心· 2026-01-22 08:13
Editors: Zhang Qian, Chen Chen. The AAAI 2026 website has just announced this year's "Outstanding Paper" awards (equivalent to best paper). Five papers won, three of them led by Chinese teams, with authors from the Hong Kong University of Science and Technology (Guangzhou), Westlake University, Zhejiang University, Tongji University, Zhejiang Normal University, City University of Hong Kong, and other institutions. AAAI, organized by the Association for the Advancement of Artificial Intelligence, is one of the longest-running and broadest top-tier international conferences in artificial intelligence, a CCF-recommended Class A international conference, and is held annually. AAAI 2026 took place in Singapore from January 20 to 27, with 23,680 submissions, 4,167 accepted papers, and an acceptance rate of 17.6%. The winning papers are described below. In recent years, advances in Vision-Language-Action (VLA) models have enabled robotic agents to combine multimodal understanding with action execution. However, empirical analysis shows that existing VLA models still have clear difficulty allocating visual attention to target regions; their attention tends to be dispersed. To guide effective grounding of visual attention on the correct targets, the authors propose ReconVLA, a reconstructive VLA model that adopts an implicit alignment paradigm. Paper 1: ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver ...
AAAI 2026 Results Announced, with Review Scores as High as 88887! Only 17.6% of 23,000+ Submissions Accepted
具身智能之心· 2025-11-11 00:02
Core Insights
- The AAAI 2026 conference received a record-high 23,680 submissions, with an acceptance rate of only 17.6%, indicating significantly stiffer competition than in previous years [3][4][45].

Submission Statistics
- AAAI 2026 had 23,680 submissions, a substantial rise from 12,957 in 2025 [3][45].
- A total of 4,167 papers were accepted, up from 3,032 in 2025, yet the acceptance rate still fell because submissions nearly doubled [4][45].

Research Highlights
- Researchers from various institutions showcased their accepted submissions, with notable works including:
  - "CogniTrust," which combines verifiable supervision with a three-tier memory model to enhance AI model reliability [12][14].
  - Papers focusing on privacy protection in large models, multi-modal safety, and robust communication in autonomous driving [18][20].
  - "ReconVLA," which received reviewer scores of 88887 (8/8/8/8/7) and proposes a reconstructive approach to visual grounding for VLA models [24][25].

Competitive Landscape
- Competition at AAAI 2026 was described as exceptionally fierce, with some reviewers noting that only highly innovative papers were accepted [43][46].
- The overall trend indicates that papers scoring around 5 or higher had a chance of acceptance, but many authors faced rejection despite high scores [51][52].

Reviewer Experiences
- Some reviewers reported unusual experiences during the review process, including significant post-rebuttal score adjustments and perceived biases in evaluations [48][56][62].
AAAI 2026 Results Announced, with Review Scores as High as 88887; Only 17.6% of 23,000+ Submissions Accepted
36Kr· 2025-11-10 09:55
Core Insights
- The AAAI 2026 conference saw a record 23,680 submissions, with an acceptance rate of only 17.6%, indicating a far more competitive environment than in previous years [1][37][40].
- The conference will take place from January 20 to January 27, 2026, at the Singapore Expo, marking its 40th annual meeting [3].

Submission Statistics
- AAAI 2026 received 23,680 submissions, a significant increase from 12,957 in 2025 [1][37].
- A total of 4,167 papers were accepted, compared to 3,032 the previous year, with the acceptance rate falling from 23.4% to 17.6% [1][37].

Research Highlights
- Researchers from various institutions shared their accepted submissions, with notable works including "CogniTrust," which combines verifiable supervision with a three-tier memory model [5][7].
- Other accepted papers address critical areas such as privacy protection in large models, multi-agent safety communication, and robust methods for autonomous driving [11][12][16].

Notable Achievements
- A student from Peking University received reviewer scores of 88887 (8/8/8/8/7) for their paper on "CogniTrust" [5][18].
- Teams from Nanyang Technological University and the Hong Kong University of Science and Technology also reported multiple accepted papers, showcasing significant contributions to the field [10][18][27].

Community Reactions
- The competitiveness of AAAI 2026 has sparked online discussion, with some expressing concerns about the fairness of the review process and the influence of personal relationships on paper evaluations [35][40][46].
- There are reports of scoring discrepancies, with some reviewers allegedly adjusting scores post-rebuttal, raising questions about the integrity of the review process [42][48][51].
ReconVLA: A Robot Perception Approach Based on a Reconstructive VLA Model
具身智能之心· 2025-08-29 16:03
Core Viewpoint
- The article discusses the rapid development of Vision-Language-Action (VLA) models and introduces ReconVLA, which aims to improve the precision of robotic actions by sharpening visual attention on target objects [2][3][27].

Summary by Sections

Introduction
- Existing VLA models struggle to focus visual attention in complex scenes, leading to errors in object manipulation. Traditional methods for improving visual localization have not substantially improved attention distribution [6].

Model Overview
- ReconVLA introduces a reconstructive approach to visual grounding: the model first reconstructs the gaze region and then predicts actions. This implicit supervision forces the model to attend to the correct object, improving action precision [8][11][14].

Methodology
- The framework consists of two branches: visual reconstruction and action prediction. The model uses a frozen visual tokenizer to encode the gaze region and a diffusion transformer for denoising and reconstruction [13][16].
- A large-scale dataset with over 100,000 trajectories and 2 million samples was built to pre-train the model, strengthening its visual generalization and implicit grounding capabilities [19].

Performance Results
- In simulation, ReconVLA achieved a success rate of nearly 95% on long-horizon tasks, outperforming existing methods. The model also transferred well to unseen objects, maintaining success rates above 40% on novel items [9][26].
- In real-world tasks such as stacking bowls and placing fruits, the model showed significant improvements over prior models, reaching up to 90% success on specific tasks [25].

Contributions
- ReconVLA is the first model to adopt a gaze-region reconstruction paradigm, significantly enhancing visual attention and action-prediction accuracy. Extensive pre-training on diverse data provides a solid foundation for its performance across tasks [14][27].

Conclusion
- The study highlights the limitations of current VLA models in visual focus and presents ReconVLA as a solution that effectively directs attention to key objects, paving the way for more reliable multi-modal robotic control [27].
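To make the two-branch design described above concrete, here is a minimal training-step sketch. It is not the authors' implementation: the module names, tensor shapes, and loss weighting are hypothetical, and the diffusion-based reconstruction branch is abstracted into a simple token-regression loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical modules standing in for the components summarized above:
# - vlm_backbone: vision-language model producing a sequence of hidden states
# - frozen_tokenizer: frozen visual tokenizer encoding the cropped gaze region
# - recon_head: head reconstructing gaze-region tokens from the shared features
# - action_head: head predicting the robot action from the same features
def training_step(batch, vlm_backbone, frozen_tokenizer, recon_head, action_head,
                  recon_weight: float = 1.0) -> torch.Tensor:
    images, instruction_ids, gaze_crop, actions = (
        batch["images"], batch["instruction_ids"], batch["gaze_crop"], batch["actions"]
    )

    # Shared multimodal encoding of the full observation and the instruction.
    hidden = vlm_backbone(images=images, input_ids=instruction_ids)  # (B, T, D)

    # Branch 1: reconstruct the gaze region from the shared representation.
    with torch.no_grad():                               # the tokenizer stays frozen
        target_tokens = frozen_tokenizer(gaze_crop)     # (B, N, D_tok)
    pred_tokens = recon_head(hidden)                    # (B, N, D_tok)
    recon_loss = F.mse_loss(pred_tokens, target_tokens)

    # Branch 2: predict actions from the same representation.
    pred_actions = action_head(hidden)                  # (B, A)
    action_loss = F.mse_loss(pred_actions, actions)

    # Joint objective: implicit grounding via reconstruction + action imitation.
    return action_loss + recon_weight * recon_loss
```

The point of the sketch is the supervision pattern: the reconstruction loss only covers the gaze region, so minimizing it pushes the shared representation to encode the task-relevant object rather than the whole scene, which is the implicit grounding effect the summary describes.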