Multimodal
RoboSense 2025 Machine Perception Challenge Officially Launched
具身智能之心 (Heart of Embodied Intelligence) · 2025-06-25 13:52
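A real-world robot perception benchmark: five tracks, challenges spanning the full perception pipeline, and a worldwide call for solutions!
Why RoboSense? As robotic systems move into the real world, the stability, robustness, and generalization of their perception systems are becoming key constraints on deployment. Under complex conditions such as dynamic crowds, adverse weather, sensor failures, and cross-platform deployment, conventional perception algorithms often suffer sharp drops in performance.
RoboSense Challenge 2025 was launched in response. The challenge aims to systematically evaluate how well robots perceive and understand real-world scenes, to advance research on robust multimodal perception models, and to encourage innovation in cross-modal fusion and task generalization.
| Milestone | Date |
| --- | --- |
| Registration | From June 2025 |
| Competition Server Online | June 15th, 2025 |
| Phase One Deadline | August 15th, 2025 |
| Phase Two Deadline | September 15th, 2025 |
| Award Decision @ IROS 2025 | October 19th, 2025 |
The challenge is organized by the National University of Singapore, Nan ...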
Latest from Tongji University! A Comprehensive Survey of Embodied Navigation with Multimodal Perception
具身智能之心 (Heart of Embodied Intelligence) · 2025-06-25 13:52
Author | 视觉语言导航 (Vision-Language Navigation) Editor | 视觉语言导航
Main Contributions
Research Background
| Task | PointNav | ImageNav | ObjectNav | Audio-GoalNav |
| --- | --- | --- | --- | --- |
| Description | Navigate to a specific 3D point in space. | Navigate to a location matching a visual image. | Navigate to a specific object. | Navigate to sound sources. |
| Sensory Inputs | Visual (RGB, Depth, ... | Visual ... | Visual (Object ... | Visual (RGB-D) ... |
...
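Notice on Opening Academic Exchange Sessions for Outstanding Master's and PhD Students at the 2025 Intelligent Robot Key Technology Conference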
机器人圈 (Robot Circle) · 2025-06-25 09:48
Core Points
- The "2025 Intelligent Robot Key Technology Conference" will be held July 22-24, 2025, in Qiqihar, focusing on the integration of, and breakthroughs in, embodied intelligence and multimodal interaction technology [1][4].
- The conference aims to foster exchange among young researchers in intelligent robotics, including master's and doctoral students [2].
- The event is supported by several prominent organizations and institutions, including the Chinese Association of Automation and the Chinese Academy of Sciences [3].

Conference Details
- The program includes keynote speeches from leading scholars and industry leaders, specialized forums on a range of topics, a technology exhibition, and a platform for academia-industry collaboration [4].
- Key topics include humanoid robots, medical robotics, ethical considerations, and innovation in core components [4].
- The conference also issues a call for papers; accepted submissions are to be published in several high-impact journals by the end of 2025 [7].

Registration Information
- Registration is open until July 21, 2025, with separate fees for students, general attendees, and corporate representatives [8][9].
- The registration fee covers meals and conference materials; attendees pay their own accommodation and travel costs [9][10].
- Payment must be made in advance, and refunds are subject to specific conditions [10].

Organizational Structure
- The conference is organized by a committee of experts and scholars from leading universities and research institutions [5][6].
- Invited speakers include academicians and professors from prestigious institutions, strengthening the conference's academic credibility [6].
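Making Multimodal Large Models "Think It Through Before Drawing"! HKU and Collaborators Open-Source GoT-R1: Reinforcement Learning Unlocks a New Paradigm for Reasoning in Visual Generation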
机器之心 (Synced) · 2025-06-25 06:50
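Multimodal large models have made notable progress in generating high-fidelity, semantically consistent images from complex text prompts, but they still struggle with instructions that involve precise spatial relationships, attributes of multiple objects, and complex compositions.
To address this, a research team from HKU MMLab, CUHK MMLab, and SenseTime has followed up its earlier Generation Chain-of-Thought (GoT) framework with a significant advance: GoT-R1. The new framework uses reinforcement learning to substantially strengthen the semantic-spatial reasoning of multimodal large models in visual generation tasks, letting them go beyond predefined templates and autonomously explore and learn better reasoning strategies. Both GoT and GoT-R1 are fully open source.
The GoT framework introduced an explicit language-reasoning step that plans semantic content and spatial layout before generating the image, improving the accuracy and controllability of the output. However, GoT's reasoning ability comes mainly from supervised fine-tuning data built on human-defined templates. This limits the model's potential to discover better reasoning strategies on its own and can sometimes yield reasoning chains that are not fully faithful to the user's complex text prompt.
GoT-R1 is designed to overcome these limitations. It applies reinforcement learning (RL) in a novel way to the semantic-spatial reasoning process of visual generation, giving the model the ability to learn and optimize its reasoning paths autonomously. Before RL training ...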
Zhangyue Technology and Youku Launch a Writing Competition to Expand Their Pool of Quality IP
Group 1
- Zhangyue Technology (掌阅科技) and Youku are jointly launching a year-long writing competition to discover high-quality original stories and promising creators [1][2].
- The competition focuses on four themes: "urban emotions," "realism," "suspense and adventure," and an "innovation track" [1].
- Youku will draw on its film and TV resources to give winning works a path into its IP evaluation system, with priority for film and TV adaptation [1][2].

Group 2
- Zhangyue Technology aims to strengthen its content ecosystem by pooling resources with Youku, giving creators more creative room and a wider showcase [2].
- Zhangyue CEO Sun Kai said the short-drama business has become the company's largest segment, stressing its role in broadening forms of content expression and improving user engagement [2].
- The company is upgrading its strategy, moving from a "digital reading platform" to a "multimodal content production and operation platform for the AI era" [2].
Technical Deep Dive: A Detailed Explanation of VLA (Vision-Language-Action) Models (with an Overview of the Main Players)
Robot猎场备忘录 (Robot Hunting Ground Memo) · 2025-06-25 04:21
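Main text: An earlier article of ours, "[Technical Deep Dive] The Most Complete Analysis of 'Embodied Intelligence' Technology", surveyed the field; this article focuses on the currently popular vision-language-action (VLA) model, a multimodal model that integrates vision, language, and action.
Work such as Google's SayCan (2022) and Instruct2Act (2023) made it possible for Transformer-based systems to look at images, read instructions, and generate action trajectories. In 2023, Google DeepMind released RT-2, with which a robot could generate specific actions end-to-end, directly from a given language instruction and visual input, and the embodied AI field gained a new term: VLA (Vision-Language-Action Model).
If, over the past decade, the focus of robotics has moved from visual perception ("being able to see") to language understanding ("being able to understand"), ...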
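The RT-2 paper reports treating robot actions like text: each continuous action dimension is discretized into 256 bins so the model can emit actions as ordinary tokens. The sketch below illustrates only that tokenization step, under assumptions not taken from this article: uniform bins and actions already normalized to [-1, 1]. The function names are hypothetical, and this is not RT-2's actual code.

```python
import math

# Assumptions for illustration (not from the article): 256 uniform bins per
# action dimension, and each dimension already normalized to [-1, 1].
NUM_BINS = 256
LOW, HIGH = -1.0, 1.0
BIN_WIDTH = (HIGH - LOW) / NUM_BINS


def action_to_tokens(action):
    """Map each continuous action value to an integer bin index in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)          # out-of-range values saturate
        index = math.floor((clipped - LOW) / BIN_WIDTH)
        tokens.append(min(index, NUM_BINS - 1))       # HIGH itself falls into the last bin
    return tokens


def tokens_to_action(tokens):
    """Decode bin indices back to the centre value of each bin."""
    return [LOW + (t + 0.5) * BIN_WIDTH for t in tokens]


if __name__ == "__main__":
    # Hypothetical 4-dimensional action, e.g. small end-effector deltas.
    action = [0.0, -1.0, 1.0, 0.37]
    tokens = action_to_tokens(action)
    print(tokens)                     # [128, 0, 255, 175]
    print(tokens_to_action(tokens))
```

Because decoding returns bin centres, the round-trip error per dimension is at most half a bin width (1/256 under these assumptions); a finer grid lowers quantization error at the cost of a larger action vocabulary.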
A Roundup of Frequently Asked BEV Interview Questions! (Vision-Only & Multimodal Fusion Algorithms)
自动驾驶之心 (Heart of Autonomous Driving) · 2025-06-25 02:30
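Preface
Since BEVFormer kicked off the BEV perception boom, from BEVDet to PETR, BEVDepth, and BEVFusion, and on to the recently popular InternBEV and BEV-Lane series, BEV (Bird's Eye View) perception has become a fiercely contested area of visual perception. As multimodal fusion, temporal modeling, and real-time optimization keep advancing, real-world deployment of BEV perception is also accelerating. Since 2024, Horizon Robotics, WeRide, XPeng, BYD, HAOMO, and others have invested in mass-production R&D, and many teams have made BEV a core vision module in their in-house autonomous-driving stacks.
With 2025 now half over, which new BEV techniques deserve close attention? Which papers have proposed genuinely transformative ideas? Which methods are already deployed in production programs? Every frontline practitioner and technology enthusiast has to confront these questions. To help, 自动驾驶之心 has systematically compiled questions and answers on BEV perception; for more, see the materials listed at the end of the article.
How Well Do You Know BEV?
In BEV space, detecting object A obviously requires the features that correspond to A; what use are features taken from other objects ...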
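[Announcement Roundup] Digital Currency + Blockchain + Domestic Chips + Cross-Border Payments + Multimodal AI! A Company Had Enabled Digital RMB Services for Nearly 15,000 Single Merchants as of Last Year-End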
财联社 (Cailian Press) · 2025-06-24 14:06
Group 1
- The column highlights important announcements each week from Sunday to Thursday, covering trading suspensions, shareholding increases and decreases, investments and winning bids, acquisitions, earnings reports, share unlocks, and high bonus-share transfers; key items are marked in red for easy identification [1].
- One company, whose themes include digital currency, blockchain, domestic chips, cross-border payments, multimodal AI, cloud computing, and Huawei's HarmonyOS, had provided digital RMB services to nearly 15,000 single merchants as of the end of last year [1].
- Another company, active in solid-state batteries, lithium batteries, and drones, has existing orders for its solid-state battery and key materials businesses [1].
- A robotics subsidiary of another company works on humanoid robots, autonomous driving, and chips, with products applicable to service robots and humanoid robots [1].
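Multimodal AI Dark Horse Follows Its Leaderboard Win with a New Tool: One Product for Image, Video, and Podcast Generation, with Hundreds of Built-in Effects, from Top Researcher Tao Mei's Team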
量子位 (QbitAI) · 2025-06-24 13:36
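Xi Feng and Meng Chen, reporting from Aofei Temple | QbitAI (WeChat official account: QbitAI)
With AI heavyweight Tao Mei at the helm, a brand-new multimodal AI has arrived!
In terms of usage, it is truly all-in-one. It supports image and video generation, handling fantasy scenes and diverse viewpoints, and it has now added lip sync, so even the most camera-shy introverts can make podcasts.
Key point: the developers also provide hundreds of ready-made fun effect templates, so users can create with almost no effort. There are "transformation" templates for people, animals, and buildings, and a striking transformation like the one below takes nothing more than a single uploaded image.
The Image Agent in the image-generation section is another headline feature: editing or generating an image needs only a plain-language description, and users who can't write prompts needn't worry, because it refines them automatically.
No more suspense: the new creative tool is vivago2.0 (智小象AI). The team behind it, HiDream.ai (智象未来), is an AI company founded by the well-known researcher Tao Mei, a foreign fellow of the Canadian Academy of Engineering, and its R&D team is packed with core members from USTC.
Not long ago, the team's open-source model HiDream-I1 made a splash in the text-to-image model arena, reaching the top of the leaderboard within 24 hours of its open-source release and becoming one of the first domestic open-source large models to enter the top tier.
| CREATOR | NAME | ARENA ...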
Latest from Yao Mu's Team! RoboTwin 2.0: A Scalable Data Benchmark for Robust Bimanual Manipulation
自动驾驶之心 (Heart of Autonomous Driving) · 2025-06-24 12:41
The following article is reposted from 具身智能之心. Author: Tianxing Chen et al. Editor: 具身智能之心.
RoboTwin 2.0 is the follow-up work from Tianxing Chen and Yao Mu's team; let's look at what innovations and surprises it brings.
Webpage: https://robotwin-platform.github.io/
arXiv: https://arxiv.org/abs/2506.18088
Code: https://github.com/RoboTwin-Platform/RoboTwin
Document: https://robotwin-platform.github.io/doc/
Title: RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Rando ...