具身智能之心
AAAI 2026 results are out, with a standout 88887 score! Only 17.6% of 23,000+ submissions accepted
具身智能之心· 2025-11-11 00:02
Core Insights
- The AAAI 2026 conference received a record 23,680 submissions, with an acceptance rate of only 17.6%, indicating significantly fiercer competition than in previous years [3][4][45].

Submission Statistics
- AAAI 2026 drew 23,680 submissions, a substantial rise from 12,957 in 2025 [3][45].
- A total of 4,167 papers were accepted, up from 3,032 accepted papers in 2025 in absolute terms, but the acceptance rate fell [4][45].

Research Highlights
- Researchers from various institutions showcased their accepted submissions, with notable works including:
  - "CogniTrust," which combines verifiable supervision with a three-tier memory model to enhance AI model reliability [12][14].
  - Papers on privacy protection in large models, multi-modal safety, and robust communication for autonomous driving [18][20].
  - "ReconVLA," which earned review scores of 8/8/8/8/7 (the "88887" of the headline) and proposes a new approach to visual representation learning [24][25].

Competitive Landscape
- Competition for AAAI 2026 was described as exceptionally fierce, with some reviewers noting that only highly innovative papers were accepted [43][46].
- The overall trend indicates that papers scoring around 5 or higher had a chance of acceptance, yet many authors faced rejection despite high scores [51][52].

Reviewer Experiences
- Some reviewers reported unusual experiences during the review process, including large score adjustments and perceived biases in evaluations [48][56][62].
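The quoted acceptance rates follow directly from the submission and acceptance counts above; a quick arithmetic check (figures taken from the summary):

```python
# Acceptance-rate check using the counts quoted in the summary above.
submissions_2026, accepted_2026 = 23_680, 4_167
submissions_2025, accepted_2025 = 12_957, 3_032

rate_2026 = accepted_2026 / submissions_2026
rate_2025 = accepted_2025 / submissions_2025

print(f"AAAI 2026: {rate_2026:.1%}")  # → AAAI 2026: 17.6%
print(f"AAAI 2025: {rate_2025:.1%}")  # → AAAI 2025: 23.4%
```

So although more papers were accepted in absolute terms, the acceptance rate dropped by nearly six percentage points year over year.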
Embodied-intelligence company 无界动力 closes a 300 million RMB first funding round, led by Sequoia China (红杉中国) and Linear Capital (线性资本), with Hillhouse Ventures (高瓴创投), Horizon Robotics (地平线) and others following on
具身智能之心· 2025-11-11 00:02
无界动力, founded in Beijing in 2025, focuses on building a general-purpose robot "brain" and "manipulation intelligence," attacking the key bottlenecks in hand-eye-brain coordination. The company aims to turn embodied intelligence into infrastructure that can be widely deployed and continuously evolved, driven along two tracks: R&D on general foundation models and the applied deployment of general expert models. Its goal is to provide customers worldwide with integrated, highly reliable hardware-software embodied-intelligence solutions.

Founder and CEO Zhang Yufeng (张玉峰) has a complete track record spanning core-technology R&D through large-scale commercialization, a cross-disciplinary entrepreneur combining technical depth with engineering management. He previously held R&D management roles at the global headquarters of top technology companies including Sony and ARM, giving him deep low-level technical expertise. In 2017 he joined Horizon Robotics (地平线), serving as Vice President, President of the Intelligent Vehicle business unit, board director, and member of the management committee. There he led a thousand-person team to R&D breakthroughs in intelligent-driving software and algorithms, rapid productization, and large-scale delivery, achieving the No. 1 market share in driver assistance for Chinese own-brand passenger cars, and he drove multiple strategic partnerships with leading domestic and international companies including BYD, Changan Automobile, Volkswagen Group, and Continental, accumulating exceptional experience in international expansion and industrial deployment.

无界动力 has already assembled a core team combining frontier technical innovation with engineering deployment capability. Co-founder and CTO Xu Wenda (许闻达) holds a PhD in robotics from Carnegie Mellon University, with long-term ...
Just $300! An advanced VLA model meets low-cost hardware
具身智能之心· 2025-11-11 00:02
Author: Samarth Chopra et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

A low-cost vision-language-action (VLA) system: a University of Pittsburgh research team pairs a $300-class 6-DOF robotic arm with an adaptive field-of-view integrator, tackling the twin pain points of traditional VLA setups, expensive hardware and poor generalization. The system outperforms existing methods in real-world scenarios, helping make robot foundation models broadly accessible.

Background and core challenges
- VLA models map directly from images and natural-language instructions to robot actions, skipping hand-designed perception and planning modules, but they tend to fail under unfamiliar lighting, novel objects, and visual distractors; generalization remains weak.
- On the hardware side, state-of-the-art arms cost thousands to tens of thousands of dollars; even "low-cost" products often exceed $1,000 and depend on proprietary software stacks, putting them out of reach for ordinary users and researchers.

Key innovation: low-cost 6-DOF arm design
- Core specs: roughly $311.98 total cost, 6 degrees of freedom, 0.2 kg payload, 382 mm working radius, 0.7 m/s maximum speed, repeatability ≤ 10 mm (Table 1, Figure 3).

Training and ...
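The direct image-plus-instruction-to-action mapping described above can be sketched as a minimal control-loop interface. Everything here is illustrative, assuming a generic camera frame and a 6-DOF joint-delta action space; the paper's actual model and API differ.

```python
# Minimal sketch of a VLA control loop: camera images and a language
# instruction map directly to joint-space actions, with no hand-designed
# perception or planning stage in between. All names are hypothetical.
import numpy as np

def vla_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for a trained VLA model: returns one action delta per
    joint of a 6-DOF arm like the $300-class design described above."""
    assert image.ndim == 3       # expects an H x W x C camera frame
    # A real model would run a vision-language backbone here; we return
    # zeros so the sketch stays runnable without model weights.
    return np.zeros(6)

frame = np.zeros((224, 224, 3), dtype=np.uint8)   # dummy camera frame
action = vla_policy(frame, "pick up the red block")
print(action.shape)  # → (6,)
```

In a real deployment this function would be called at a fixed control rate, with each returned action sent to the arm's joint controllers.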
VLA+RL keeps raising the ceiling of embodied manipulation!
具身智能之心· 2025-11-11 00:02
Core Insights
- The article examines integrating reinforcement learning (RL) with vision-language-action (VLA) models, highlighting how RL bridges the gap between pre-training and real-world tasks [1][4].

Group 1: Technical Developments
- RL training directly optimizes the "complete the task" objective, allowing models to handle unexpected situations absent from the training data and improving robustness [1].
- The reward mechanism enables VLA policies to learn smoother trajectories that align more closely with the physical world [1].
- A recommended open-source repository of VLA+RL methods is provided, easing entry-level research [2].

Group 2: Evaluation Results
- Evaluations across the LIBERO task groups report strong metrics for several models, with the π0.5 model achieving 96.9% average accuracy across tasks [5].
- The Flow-SDE π0 model demonstrates a 38.5% improvement in average accuracy when combined with RL [5].

Group 3: Community and Resources
- The community offers continuous live sharing sessions, including roundtable forums and discussions on various topics across the embodied-intelligence industry [7].
- A comprehensive technical roadmap is available for beginners, outlining essential technologies and learning paths [9].
- The community has established job-referral arrangements with several companies in the embodied-intelligence sector, providing valuable networking opportunities [13].

Group 4: Educational Materials
- The community has compiled over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms and various technical learning routes [15].
- Specific learning routes for different aspects of embodied intelligence, such as reinforcement learning and multi-modal large models, are laid out for learners at various levels [16][42].

Group 5: Industry Insights
- The community includes members from renowned universities and leading companies in the field, fostering a rich environment for academic and industrial exchange [14].
- Regular updates on academic progress and industrial applications keep members informed about the latest developments in embodied intelligence [21][23].
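The core idea in Group 1, optimizing the "complete the task" objective directly rather than imitating demonstrations, can be illustrated with a tiny REINFORCE-style loop. This is a toy two-action problem with a sparse success reward, purely illustrative; real VLA+RL training uses far richer policies and rewards.

```python
# Toy REINFORCE sketch: the policy is fine-tuned purely from a sparse
# "task completed" reward, the idea behind RL-based VLA fine-tuning.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)      # toy "policy": preference over 2 actions
lr = 0.5                  # learning rate

def success(action: int) -> float:
    """Sparse reward: only action 1 completes the (toy) task."""
    return 1.0 if action == 1 else 0.0

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax policy
    a = rng.choice(2, p=probs)                       # sample an action
    r = success(a)                                   # task-completion reward
    grad = -probs                                    # d log pi(a)/d logits
    grad[a] += 1.0                                   #   = one_hot(a) - probs
    logits += lr * r * grad                          # REINFORCE update

probs = np.exp(logits) / np.exp(logits).sum()
print(round(probs[1], 2))   # policy now strongly prefers the rewarded action
```

The same principle, scaled up with parallel simulators and learned reward signals, is what lets RL-fine-tuned VLA models recover from situations their demonstration data never covered.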
VLA direction: looking to coach a few more students toward a top embodied-AI conference......
具身智能之心· 2025-11-10 10:00
With less than two months left in 2025, some students have just wrapped up CVPR and are already scrambling to prepare for other conferences. 具身智能之心 has coached several students this year, and their papers have been submitted one after another; we hope for good results.

We are now recruiting 3 students working on VLA for paper coaching; to guarantee quality, slots are limited. Main directions: VLA models, lightweighting, VLA + tactile sensing, VLA + world models, VLA + RL, and more.

Interested students are welcome to contact the assistant on WeChat: AIDriver005, with the note "embodied paper coaching inquiry". ...
How does online RL fine-tune π0 and π0.5, and why can performance improve by more than 50%?
具身智能之心· 2025-11-10 03:30
Core Viewpoint
- The article introduces the πRL framework, which fine-tunes flow-based vision-language-action (VLA) models with online reinforcement learning (RL), significantly improving their performance and generalization [5][7].

Group 1: Introduction to VLA Models
- VLA models enable robots to understand and execute complex tasks from multimodal inputs, but large-scale RL faces challenges because action log-likelihoods are difficult to handle during the iterative denoising process [5].

Group 2: πRL Framework
- Developed by teams from Tsinghua University and Peking University, πRL addresses large-scale RL for flow-based VLA models by training them in parallel simulation [6].

Group 3: RL Algorithms in πRL
- πRL implements two RL algorithms:
  1. FlowNoise models the denoising process as a discrete-time Markov decision process (MDP), using a learnable noise network for precise log-likelihood calculation [7].
  2. Flow-SDE couples the denoising process with agent-environment interaction, constructing a dual-layer MDP and switching from an ODE to an SDE for efficient RL exploration [7].

Group 4: Performance Evaluation
- On the LIBERO benchmark, πRL lifts the few-shot SFT models π0 and π0.5 from 57.6% to 97.6% and from 77.1% to 98.3%, respectively [7].
- On the ManiSkill benchmark, πRL demonstrates scalable multi-task RL across 4,352 grasping and placing tasks using 320 parallel environments [7].

Group 5: Conclusion
- Overall, πRL delivers substantial performance gains and stronger generalization than SFT alone, validating the effectiveness of online RL for flow-based VLA models [7].
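The ODE-to-SDE switch described in Group 3 can be sketched in a few lines: a deterministic Euler denoising step has no likelihood, while the stochastic Euler-Maruyama version assigns each step a Gaussian log-probability that RL can optimize. The velocity field and noise scale below are toy stand-ins, not πRL's actual networks.

```python
# Sketch of the Flow-SDE idea: replacing a deterministic ODE denoising
# step with a stochastic SDE step makes each step's log-likelihood
# tractable (it is just a Gaussian), which is what RL needs.
import numpy as np

rng = np.random.default_rng(0)

def velocity(x: np.ndarray, t: float) -> np.ndarray:
    return -x   # toy velocity field pulling samples toward zero

def ode_step(x, t, dt):
    return x + velocity(x, t) * dt            # deterministic: no log-prob

def sde_step(x, t, dt, sigma=0.1):
    mean = x + velocity(x, t) * dt            # same drift as the ODE
    noise = rng.normal(size=x.shape)
    x_next = mean + sigma * np.sqrt(dt) * noise
    var = sigma ** 2 * dt                     # per-step Gaussian variance
    # per-step log-likelihood, usable as log pi(action) in an RL update
    logp = -0.5 * np.sum((x_next - mean) ** 2 / var + np.log(2 * np.pi * var))
    return x_next, logp

x = rng.normal(size=4)                        # toy "noisy action" chunk
x_next, logp = sde_step(x, t=1.0, dt=0.1)
print(np.isfinite(logp))  # → True
```

Chaining such steps gives the two-layer MDP structure the summary mentions: an inner denoising chain per action, inside the outer agent-environment loop.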
Robot training: Beijing college guys have found a new way to play
具身智能之心· 2025-11-10 00:02
Author: 量子位 (QbitAI)

College students really do know how to have fun (doge)!

While surfing the web, we stumbled on this: a college guy has apparently recruited a robot teammate, and quite a clingy one at that (bushi~).

During his daytime supermarket shift it tags along, and the moment the goods are packed it happily grabs the cart, bustling up and down the stairs.

At his lunchtime canteen job it volunteers to push the meal cart, going exactly where it's told (a pat on the head tells it to stop).

Even after the day's work is done, it joins him at the gym. Since it's there anyway, he figures: might as well train together!

Honestly, you could shoot a vlog from the robot's point of view, titled "A Day in the Life of a High-Energy Robot."

Joking aside, did you notice that all communication between the student and his robot partner happens through head pats and body tugs, with no remote control and no voice commands?

That's genuinely interesting. Most robots today are driven by external sensors (cameras, LiDAR, etc.) and teleoperation, whereas these students propose a brand-new approach: interacting with the outside world through proprioception alone.

So it turns out that this ...
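One plausible way to detect a tap from proprioception alone, in the spirit of the work above, is to flag contact whenever measured joint torque deviates from the torque predicted by the robot's dynamics model. This is a hedged guess at the general principle, not the team's actual algorithm; all names and numbers below are illustrative.

```python
# Hypothetical proprioception-only contact detection: a "tap" shows up
# as an unexplained spike in the residual between measured torque and
# the torque the dynamics model expects during free motion.
import numpy as np

def detect_tap(measured_torque: np.ndarray,
               expected_torque: np.ndarray,
               threshold: float = 2.0) -> bool:
    """Return True if any joint sees an unexplained torque spike."""
    residual = np.abs(measured_torque - expected_torque)
    return bool(np.any(residual > threshold))

expected = np.array([0.1, 0.2, 0.0])   # torques predicted by the dynamics model
quiet    = np.array([0.2, 0.1, 0.1])   # free motion: residuals stay small
tapped   = np.array([0.2, 3.5, 0.1])   # a tap spikes one joint's torque

print(detect_tap(quiet, expected))   # → False
print(detect_tap(tapped, expected))  # → True
```

A real system would additionally filter sensor noise and classify the residual pattern (tap vs. pull vs. sustained push) before mapping it to a command like "stop."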
银河通用's new model unifies robot navigation tasks; its 7B-parameter model supports real-time deployment
具身智能之心· 2025-11-10 00:02
Core Insights
- The article describes NavFoM, a foundation model for embodied navigation that unifies navigation tasks across different robots and scenarios, marking a significant leap from specialized to general-purpose navigation [1][29].

Group 1: Unified Navigation Paradigm
- NavFoM rests on one unifying idea: streaming video input from the robot is combined with natural-language navigation instructions to predict action trajectories [3].
- The model supports multiple tasks, including visual-language navigation, object search, target following, and autonomous driving, across indoor and outdoor environments, and applies to robot types as varied as quadrupeds, wheeled robots, humanoids, drones, and cars [3][29].

Group 2: Model Structure and Efficiency
- TVI Tokens provide a scalable way to interpret images across different tasks and camera configurations, enhancing the model's adaptability [5].
- To enable real-time deployment of the 7B-parameter model, the team introduced the Budget-Aware Token Sampling strategy (BATS), which adaptively samples key frames under a compute budget, preserving performance while running efficiently on real robots [6][11].

Group 3: Training Data and Performance
- NavFoM was trained on 8 million navigation samples spanning multiple tasks and robot types, plus 4 million open-world question-answering samples, roughly doubling the training volume of previous work [12][15].
- Without task-specific fine-tuning, NavFoM achieves state-of-the-art (SOTA) or SOTA-comparable results across multiple public benchmarks, demonstrating its versatility [16][29].

Group 4: Future Implications
- NavFoM marks a move toward generalization in embodied-navigation models, enabling cross-industry applications and fostering further research in intelligent navigation [29].
- The team hopes to inspire new technologies, datasets, and benchmarks in embodied navigation, accelerating innovation in intelligent services and production [29].
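The budget-aware sampling idea behind BATS can be sketched simply: under a fixed token budget, keep the most recent frames densely and spread the remaining budget over the older history. This is an illustrative guess at the general scheme; NavFoM's actual sampling policy may differ.

```python
# Hedged sketch of budget-aware frame sampling (BATS-style idea): keep
# the latest frames, subsample older history to fit a token budget.
import numpy as np

def sample_frames(n_frames: int, budget: int, recent: int = 4) -> list[int]:
    """Pick at most `budget` frame indices from a stream of n_frames."""
    if n_frames <= budget:
        return list(range(n_frames))          # everything fits: keep all
    recent_ids = list(range(n_frames - recent, n_frames))   # always keep latest
    n_old = budget - recent
    # spread the remaining budget uniformly over the older history
    old_ids = np.linspace(0, n_frames - recent - 1, n_old).astype(int).tolist()
    return sorted(set(old_ids + recent_ids))

print(sample_frames(100, budget=8))   # 8 indices: 4 recent + 4 spread out
```

The key property is that cost stays bounded no matter how long the video stream grows, which is what makes real-time deployment of a 7B model on a robot feasible.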
All the embodied "cerebrum and cerebellum" routes are here......
具身智能之心· 2025-11-10 00:02
In the quest for artificial general intelligence (AGI), embodied intelligence has emerged as one of the key directions. Unlike traditional preset action sequences, embodied intelligence emphasizes an agent's interaction with and adaptation to the physical environment, focusing on how to give agents the ability to perceive their surroundings, understand tasks, execute actions, and learn from feedback in the physical world.

The two most important parts of embodied intelligence, the "cerebrum" and the "cerebellum," form the core modules of an embodied robot. By analogy with humans, the cerebrum handles thinking and perception (semantic understanding and task planning), while the cerebellum handles execution (high-precision motor control).

Industry analysis at home and abroad

Over the past two years, many star embodied-AI teams have spun out to found highly valuable companies. Teams such as 星海图 (Galaxea), 银河通用, and 逐际动力 (LimX Dynamics) have moved from the lab into commercial and industrial deployment, steadily advancing embodied hardware and cerebrum/cerebellum technology.

Among established Chinese tech giants, Huawei launched its "Global Embodied Intelligence Industry Innovation Center" at the end of 2024, partnering with companies such as 乐聚机器人 (Leju Robotics) and 大族机器人 to jointly build key technologies for the embodied cerebrum and cerebellum. Since May 2025, JD.com has made successive investments in 智元机器人 (AgiBot), 千寻智能, 逐际动力, and other companies to strengthen efficiency and service capability in logistics and home-service scenarios. Tencent, Ant Group, Xiaomi, and other tech giants are likewise building out the embodied-intelligence ecosystem through strategic investments and partnerships.

Abroad, Tesla and Figure AI continue to push forward in industrial and logistics robot applications ...
Iterating models and accumulating data is the right answer! 灵巧智能 (DexRobot) shows its full hardware and software platform lineup at the 2025 World Internet Conference
具身智能之心· 2025-11-10 00:02
The 2025 World Internet Conference Wuzhen Summit officially opened on November 7, drawing more than 1,600 guests from nearly 130 countries and regions. The embodied-intelligence and robotics track was one of the summit's biggest draws, and it delivered valuable insights for the industry.

灵巧智能 (DexRobot), an industry leader in embodied dexterous manipulation, hosted an "Embodied Intelligence Dexterous Manipulation Ecosystem Matchmaking Session" during the summit. Founder and CEO Zhou Chen (周晨) gave a talk covering dexterous-manipulation data, mass production, and scenario deployment, stressing that "data is one of the biggest bottlenecks constraining the large-scale deployment of embodied intelligence."

What is DexRobot working on?

Zhou began by introducing the R&D team: "DexRobot is an embodied-robotics company centered on dexterous manipulation, assembled by an academician-led team together with many hardcore robotics scientists." The company aims to advance the humanoid and industrial robot industries, raise the technical level of robot end-effectors, and develop, produce, and sell general-purpose dexterous manipulation systems with multimodal visuo-tactile sensing. "DexRobot is striving to become a globally leading provider of dexterous-manipulation robots and dexterous-manipulation solutions."

In under two years, DexRobot has released three dexterous hands.

Agenda excerpt: | 14:35-14:55 | "AI + Robotics: Flexible Evolution and Industrial Deployment" | ...