World Models
AI is starting to "use the computer freely": Jilin University proposes the "ScreenExplorer" agent
机器之心· 2025-06-27 04:02
Core Viewpoint
- The article discusses the development of a vision-language model (VLM) agent named ScreenExplorer, which is designed to autonomously explore and interact within open graphical user interface (GUI) environments, marking a significant step towards achieving general artificial intelligence (AGI) [2][3][35].

Group 1: Breakthroughs and Innovations
- The research introduces three core breakthroughs in the training of VLM agents for GUI exploration [6].
- A real-time interactive online reinforcement learning framework is established, allowing the VLM agent to interact with a live GUI environment [8][11].
- The introduction of a "curiosity mechanism" addresses the sparse feedback issue in open GUI environments, motivating the agent to explore diverse interface states [10][12].

Group 2: Training Methodology
- The training involves a heuristic and world model-driven reward system that encourages exploration by providing immediate rewards for diverse actions [12][24].
- The GRPO algorithm is utilized for reinforcement learning training, calculating the advantage of actions based on rewards obtained [14][15].
- The training process allows multiple parallel environments to synchronize reasoning, execution, and recording, enabling "learning by doing" [15].

Group 3: Experimental Results
- Initial experiments show that without training, the Qwen2.5-VL-3B model fails to interact effectively with the GUI [17].
- After training, the model demonstrates improved capabilities, successfully opening applications and navigating deeper into pages [18][20].
- The ScreenExplorer models outperform general models in exploration diversity and interaction effectiveness, indicating a significant advancement in autonomous GUI interaction [22][23].

Group 4: Skill Emergence and Conclusion
- The training process leads to the emergence of new skills, such as cross-modal translation and complex reasoning abilities [29][34].
- The research concludes that ScreenExplorer effectively enhances GUI interaction capabilities through a combination of exploration rewards, world models, and GRPO reinforcement learning, paving the way for more autonomous agents and progress towards AGI [35].
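The summary names two concrete training ingredients: GRPO-style group-relative advantages and a world-model "curiosity" reward for novel screen states. The sketch below is only a minimal illustration of how those two pieces might be combined; the reward values, the 0.5 weighting coefficient, and the embedding-distance novelty measure are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def grpo_advantages(rewards):
    # GRPO-style group-relative advantage: normalize each sampled action's
    # reward against its group's mean and std, so no value critic is needed.
    rewards = np.asarray(rewards, dtype=np.float32)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def curiosity_bonus(predicted_next_emb, actual_next_emb):
    # World-model prediction error as an intrinsic reward: the further the
    # actual next screen state lands from the world model's prediction,
    # the more novel (and more rewarded) the action is treated as being.
    diff = np.asarray(predicted_next_emb) - np.asarray(actual_next_emb)
    return float(np.linalg.norm(diff))

# Toy example: one group of 4 sampled rollouts, each with a heuristic
# exploration reward plus a curiosity bonus (all numbers are made up).
extrinsic = [0.0, 1.0, 0.0, 0.5]
intrinsic = [0.8, 0.1, 0.6, 0.3]
total = [e + 0.5 * i for e, i in zip(extrinsic, intrinsic)]  # 0.5 is an assumed weight
print(grpo_advantages(total))
```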
A new breakthrough in embodied world models: Horizon Robotics and GigaAI (极佳) propose a geometry-consistent video world model to enhance robot policy learning
机器之心· 2025-06-26 04:35
In recent years, as artificial intelligence has evolved from perceptual intelligence toward decision-making intelligence, world models have gradually become an important research direction in robotics. A world model aims to let an agent model its environment and predict future states, enabling more efficient planning and decision-making.

At the same time, embodied data has drawn explosive attention, because current embodied algorithms rely heavily on large-scale real-robot demonstration data whose collection is costly and labor-intensive, severely limiting scalability and generalization. Simulation platforms offer a relatively low-cost way to generate data, but the significant visual and dynamics differences between simulation and the real world (the sim-to-real gap) make policies trained in simulation hard to transfer directly to real robots, limiting their practical value. How to efficiently acquire, generate, and exploit high-quality embodied data has therefore become one of the core challenges in robot learning.

Project page: https://horizonrobotics.github.io/robot_lab/robotransfer/

Imitation learning has become one of the key approaches to robotic manipulation: by having a robot "imitate" expert demonstrations, effective policy models can be built quickly for complex tasks. However, such methods typically rely on large amounts of high-quality real robot ...
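As context for the "model the environment and predict future states" framing above, here is a minimal sketch of a one-step latent world model in PyTorch. It only illustrates the general idea of rolling out imagined futures from (state, action) pairs; the architecture, dimensions, and training setup are assumptions for illustration, not the actual design of RoboTransfer, which is a geometry-consistent video model.

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """One-step latent world model: predict the next latent state from the
    current latent state and an action. Trained on (s_t, a_t, s_{t+1})
    triplets, it can roll out imagined trajectories for planning."""
    def __init__(self, state_dim: int = 64, action_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def imagine(model: LatentDynamics, state: torch.Tensor, actions):
    # Roll the model forward by feeding its own predictions back in; a planner
    # can score these imagined trajectories instead of querying a real robot.
    states = [state]
    for a in actions:
        states.append(model(states[-1], a))
    return torch.stack(states)

model = LatentDynamics()
s0 = torch.randn(1, 64)
traj = imagine(model, s0, [torch.randn(1, 8) for _ in range(5)])
print(traj.shape)  # (6, 1, 64): the start state plus 5 imagined steps
```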
Vanessa Evers of the University of Twente: building a "world model" for robots is the key to achieving social intelligence
Qi Lu Wan Bao· 2025-06-25 06:38
Group 1
- The event "Dancing with Social Robots" was held at the National Exhibition and Convention Center in Tianjin, focusing on the cultural phenomenon of robots entering domains such as classrooms and public spaces [1]
- Experts discussed coexistence with socially intelligent robots and the underlying reasons for their integration into society [1]

Group 2
- Professor Vanessa Evers of the University of Twente emphasized the need to build a "world model" to achieve social intelligence in robots, using the example of fishing to illustrate the complexity of the sensory inputs required for decision-making [3]
- Current limitations include the need to digitalize the entire world; existing trials are confined to limited environments such as classrooms and hospitals, making implementation challenging despite the availability of various sensors [3]
- Evers highlighted that robots can learn human expressions and etiquette by analyzing YouTube videos, but their operation does not need to mimic humans exactly; optimized mechanical arms can be used instead of human-like ones [3]
- The ultimate goal of developing social robots raises the question of whether they should integrate into human life or provide a space for self-expression; concerns about misuse prompt a call for public and governmental discussion of the boundaries of the technology's development and application [3]
- Evers pointed out that energy issues pose significant challenges in the laboratory, particularly for soft robots, which need energy transmitted as efficiently as blood in the human body, while battery technology is progressing slowly [3]
[Private Fund Research Notes] Shenzhen Lingfeng Asset conducts research on Siwei Tuxin (NavInfo)
Zheng Quan Zhi Xing· 2025-06-25 00:10
Group 1: Company Insights
- Shenzhen Lingfeng Asset recently conducted research on the listed company Siwei Tuxin, highlighting the trend of intelligent driving equality becoming a key industry focus [1]
- The company noted that mid-to-high-level assisted driving functions are gradually being integrated into lower-end models, establishing intelligent driving as a leading business segment [1]
- Siwei Tuxin's data compliance business shows a clear growth trend, with AI-enhanced data loops aiding automakers in rapid algorithm iteration and optimization [1]

Group 2: Product Development and Market Trends
- The world model is being utilized for behavior prediction and trajectory generation, with productization aimed at OEMs and Tier 1 suppliers [1]
- The company emphasized the need for intelligent driving orders to reach certain sales volumes to realize economies of scale, alongside internal cost control and operational efficiency improvements positively impacting profitability [1]
- The implementation of new national standards for two-wheeled vehicles is expected to create new market demand for Jiefa Technology's SoC cockpit products, aligning with leading automakers' overseas expansion needs [1]

Group 3: Financial Projections and Growth
- Jiefa Technology anticipates revenue growth of over 12% in 2024, with an additional 3 million sets of basic driving point products and 600,000 sets of cockpit products expected to be secured by Q1 2025 [1]
- The company is confident in achieving significant loss reduction by 2025, supported by the successful launch of its fifth-generation SoC product, the AC8025AE [1]
- Jiefa Technology's automotive-grade MCU chip AC7870 has been successfully launched, meeting ISO 26262 ASIL-D functional safety standards and applicable across various scenarios [1]
Huawei Car BU is hiring (end-to-end / perception models / model optimization, and more): plenty of openings!
自动驾驶之心· 2025-06-24 07:21
Core Viewpoint
- The article emphasizes the rapid evolution and commercialization of autonomous driving technologies, highlighting the importance of community engagement and knowledge sharing in this field [9][14][19].

Group 1: Job Opportunities and Community Engagement
- Huawei is actively recruiting for various positions in its autonomous driving division, including roles focused on end-to-end model algorithms, perception models, and efficiency optimization [1][2].
- The "Autonomous Driving Heart Knowledge Planet" serves as a platform for technical exchange, targeting students and professionals in the autonomous driving and AI sectors, and has established connections with numerous industry companies for job referrals [7][14][15].

Group 2: Technological Trends and Future Directions
- The article outlines that by 2025, the focus will be on advanced technologies such as vision-language models (VLM), end-to-end trajectory prediction, and 3D generative simulation, indicating a shift towards more integrated and intelligent systems in autonomous driving [9][22].
- The community has developed over 30 learning pathways covering various subfields of autonomous driving, including perception, mapping, and AI model deployment, which are crucial for industry professionals [19][21].

Group 3: Educational Resources and Content
- The knowledge platform offers exclusive member benefits, including access to academic advances, professional Q&A sessions, and discounts on courses, fostering a comprehensive learning environment [17][19].
- Regular webinars featuring experts from top conferences and companies are organized to discuss practical applications and research in autonomous driving, enhancing the learning experience for participants [21][22].
IPO News | Stand Robot files with the Hong Kong Stock Exchange as the world's fifth-largest provider of industrial intelligent mobile robot solutions
智通财经网· 2025-06-23 22:52
Core Viewpoint
- Stand Robot (Wuxi) Co., Ltd. has submitted an application for listing on the Hong Kong Stock Exchange, with CITIC Securities and Guotai Junan International as joint sponsors [1]

Company Overview
- Stand Robot is a global leader in industrial intelligent mobile robot solutions, focusing on empowering smart factories across various industrial scenarios [4]
- The company is recognized as the fifth largest provider of industrial intelligent mobile robot solutions and the fourth largest provider of industrial embodied intelligent robot solutions globally, according to Zhaoshang Consulting [4]
- Stand Robot has a diverse customer base, with over 400 clients, many of whom are leaders in their respective fields, particularly in high-tech industries such as 3C, automotive, and semiconductors [4][6]

Technological Advancements
- The company is one of the few in the industry to achieve independent research and development of full-stack technology and has pioneered proprietary operating systems for industrial intelligent robots in China [5]
- Stand Robot has made significant breakthroughs in positioning, navigation, control, and perception technologies, enabling robots to operate with intelligence, efficiency, stability, and safety [5]
- The company is capable of dispatching over 2,000 robots in a single simulated scenario, a feat that is uncommon in real industrial settings [5]

Financial Performance
- Stand Robot's revenue for the years 2022, 2023, and 2024 was approximately RMB 96.3 million, RMB 162.2 million, and RMB 251.5 million, respectively [7]
- The company reported losses of approximately RMB 128 million, RMB 100.3 million, and RMB 45.1 million for the same years [7]
- The gross profit for the years 2022, 2023, and 2024 was RMB 12.4 million, RMB 51.2 million, and RMB 97.2 million, respectively [8]
The head of world models at SenseTime's SenseAuto (商汤绝影) has departed...
自动驾驶之心· 2025-06-21 13:15
Core Viewpoint
- The article discusses the challenges and opportunities faced by SenseTime's autonomous driving division, particularly focusing on the competitive landscape and the importance of technological advancements in the industry.

Group 1: Company Developments
- The head of world model development for SenseTime's autonomous driving division has left the company, which raises concerns about the future of their cloud technology system and the R-UniAD generative driving solution [2][3].
- SenseTime's autonomous driving division has successfully delivered a mid-tier solution based on the J6M to GAC Trumpchi, but the mid-tier market is expected to undergo significant upgrades this year [4].

Group 2: Market Dynamics
- The mid-tier market will see a shift from highway-based NOA (Navigation on Autopilot) to full urban NOA, which represents a major change in the competitive landscape [4].
- Leading companies are introducing lightweight urban NOA solutions based on high-tier algorithms, targeting chips with around 100 TOPS of computing power, which are already being demonstrated to OEM clients [4].

Group 3: High-Tier Strategy
- The key focus for SenseTime this year is the one-stage end-to-end solution, which has shown impressive performance and is a requirement for high-tier project tenders from OEMs [5].
- Collaborations with Dongfeng Motor aim for mass production and delivery of the UniAD one-stage end-to-end solution by Q4 2025, marking a critical opportunity for SenseTime to establish a foothold in the high-tier market [5][6].

Group 4: Competitive Landscape
- SenseTime's ability to deliver a benchmark project in the high-tier segment is crucial for gaining credibility with OEMs and securing additional projects [6][7].
- The current window of opportunity for SenseTime in the high-tier market is limited, as many models capable of supporting high-tier software and hardware costs are being released this year [6][8].
Humanoid robots steal the show at trade fairs: easy to mass-produce, hard to put to use
36Kr· 2025-06-20 12:15
As large AI models spread like wildfire across thousands of industries, embodied intelligence, one of their key deployment vehicles, has struck the pose of a "real-world Iron Man" and become the star attraction at tech trade shows.

Coming from communication technology, heading into the communications world

Humanoid robots have always been the most eye-catching presence at tech exhibitions.

Early in the morning, the booth of Zhiyuan Robotics (智元机器人) was already packed with visitors. The Yuanzheng A2 held a calligraphy brush and wrote the character "福" (good fortune) stroke by stroke, while the Lingxi X2 not only interacted with the audience in its "inner monologue" mode but also performed a short tai chi routine. Behind these capabilities lie both Zhiyuan's innovations in model architecture and the support of communication technology.

Zhiyuan has built a "body-cerebellum-brain" software and hardware architecture that gives its humanoid robots locomotion intelligence, interaction intelligence, and task intelligence. "We put basic capabilities, such as limb movement, into the body and the cerebellum, so the robot can still perform basic operations even when the network is down," Qiu Heng, chief operating officer of Zhiyuan Robotics, told IT Times. The "brain", the key to a humanoid robot's intelligence, is built from a cloud platform plus embodied algorithms, with communication technology woven throughout. "With communication technology, it is as if the humanoid robot carries a phone that can fetch information in real time. Once online it gains more intelligence, complex problems are handed off to the cloud, and interaction becomes smarter."

With these capabilities, humanoid robots will move into communication scenarios. Zhiyuan's Yuanzheng A2, Jingling G1, Lingxi X2, and other robots will enter exhibition halls, service halls, equipment rooms ...
Peking University's Lu Zongqing: at this stage, neither world models nor VLA touch the essence | Ten Conversations with Embodied AI Pioneers
雷峰网· 2025-06-20 11:54
" 互联网视频数据是唯一可以 scale up 的道路 。 " 作者丨 郭海惟 编辑丨 陈彩娴 作为一名具身大脑的创业者,卢宗青有着金光闪闪的履历: 他是紧随 DeepMind之后,中国新生代的强化学习研究者。北京大学计算机学院长聘副教授,担任过智源 研究院多模态交互研究中心负责人,负责过首个国家自然科学基金委原创探索计划通用智能体项目,还同 时在NeurIPS、ICLR、ICML等机器学习的国际顶级会议担任领域主席。 早在 2023年,他旗下团队便有利用多模态模型研究通用 Agent 的研究尝试,让 Agent 玩《荒野大镖客 2》和办公,使其成为第一个从零开始在AAA级游戏中完成具体任务的 LLM 智能体。相关论文几经波折, 今年终于被 ICML 2025 录用。不过他自述对那份研究其实不够满意,因为"泛化性不足"。 当完成那些研究以后,卢宗青意识到 "当前的多模态模型缺乏与世界交互的能力"。因为模型缺少学习物 理交互的数据,所以 我们看到的那些泛化的能力本质都是 "抽象"的,它终究无法理解动作和世界的关 系,自然也无法预测世界 。 这如今成为他想在具身智能创业的起点:开发一个通用的具身人工智能模型。 卢 ...
Midjourney releases a video model: it does not compete on resolution, but netizens call the visuals stunning
虎嗅APP· 2025-06-20 09:47
This article comes from the WeChat public account APPSO (ID: appsolution), author: appso. Original title: "This AI image-generation tool releases its first video model: not competing on resolution, but netizens say the visuals exceed expectations (prompts included)". Header image: AI-generated.

Facing copyright lawsuits from Disney and Universal Pictures, the veteran text-to-image "unicorn" Midjourney has not slowed down. Instead, early this morning it pushed out its first video model, V1, under pressure.

Precise color grading, considered composition, rich emotion: the signature style is still there.

Midjourney is not competing on resolution or on long takes; what it competes on is a distinctive sense of atmosphere and aesthetic identity. Midjourney is ambitious, with its sights set on a "world model", but whether its currently somewhat rough feature design can carry it that far remains an open question.

You chase your resolution; I will stick to my surrealism.

Midjourney has always excelled at fantastical, surreal visual styles, and judging from users' hands-on results so far, its video model continues this aesthetic direction with a stable, highly recognizable style.

The short version: upload or generate an image and click "Animate"; a single job outputs four 5-second clips by default ...