World Models
The world model lead at SenseTime's Jueying (SenseAuto) unit has resigned...
自动驾驶之心· 2025-06-21 13:15
Core Viewpoint
- The article discusses the challenges and opportunities faced by SenseTime's autonomous driving division, particularly focusing on the competitive landscape and the importance of technological advancements in the industry.

Group 1: Company Developments
- The head of the world model development for SenseTime's autonomous driving division has left the company, which raises concerns about the future of their cloud technology system and the R-UniAD generative driving solution [2][3].
- SenseTime's autonomous driving division has successfully delivered a mid-tier solution based on the J6M model to GAC Trumpchi, but the mid-tier market is expected to undergo significant upgrades this year [4].

Group 2: Market Dynamics
- The mid-tier market will see a shift from highway-based NOA (Navigation on Autopilot) to full urban NOA, which represents a major change in the competitive landscape [4].
- Leading companies are introducing lightweight urban NOA solutions based on high-tier algorithms, targeting chips with around 100 TOPS computing power, which are already being demonstrated to OEM clients [4].

Group 3: High-Tier Strategy
- The key focus for SenseTime this year is the one-stage end-to-end solution, which has shown impressive performance and is a requirement for high-tier project tenders from OEMs [5].
- Collaborations with Dongfeng Motor aim for mass production and delivery of the UniAD one-stage end-to-end solution by Q4 2025, marking a critical opportunity for SenseTime to establish a foothold in the high-tier market [5][6].

Group 4: Competitive Landscape
- SenseTime's ability to deliver a benchmark project in the high-tier segment is crucial for gaining credibility with OEMs and securing additional projects [6][7].
- The current window of opportunity for SenseTime in the high-tier market is limited, as many models capable of supporting high-tier software and hardware costs are being released this year [6][8].
Humanoid robots steal the show at expos: easy to mass-produce, hard to deploy
36Kr· 2025-06-20 12:15
As large AI models spread like wildfire across every industry, embodied intelligence — one of their most important deployment vehicles — has become the star of tech expos, posing as a "real-life Iron Man."

From communication technology, and back into the communications world. Humanoid robots have always been the biggest draw at tech expos. Early in the morning, the AgiBot (智元机器人) booth was already packed with visitors. Yuanzheng A2 held a brush and wrote the character "福" stroke by stroke; Lingxi X2 not only interacted with the audience in its "inner monologue" mode but also performed a round of tai chi. Behind these capabilities lie both AgiBot's innovations in model architecture and the support of communication technology.

AgiBot has built a "body–cerebellum–brain" hardware and software architecture that gives its humanoid robots motion intelligence, interaction intelligence, and task intelligence. "We put basic abilities, such as limb movement, into the body and cerebellum, so the robot can still perform basic operations when offline," AgiBot COO Qiu Heng told IT Times. The "brain," the key to the robot's intelligence, is built from a cloud platform plus embodied algorithms, with communication technology woven in. "With communications on board, it is as if the humanoid robot carries a phone that fetches information in real time. Once connected, it gains more intelligence, complex problems are handed off to the cloud, and interaction becomes 'smarter.'"

With these capabilities in place, humanoid robots will move into communications scenarios. AgiBot's Yuanzheng A2, Jingling G1, Lingxi X2, and other robots will enter exhibition halls, service halls, and equipment rooms ...
PKU's Lu Zongqing: at this stage, neither world models nor VLA touch the essence | Embodied Pioneers: Ten Conversations
雷峰网· 2025-06-20 11:54
" 互联网视频数据是唯一可以 scale up 的道路 。 " 作者丨 郭海惟 编辑丨 陈彩娴 作为一名具身大脑的创业者,卢宗青有着金光闪闪的履历: 他是紧随 DeepMind之后,中国新生代的强化学习研究者。北京大学计算机学院长聘副教授,担任过智源 研究院多模态交互研究中心负责人,负责过首个国家自然科学基金委原创探索计划通用智能体项目,还同 时在NeurIPS、ICLR、ICML等机器学习的国际顶级会议担任领域主席。 早在 2023年,他旗下团队便有利用多模态模型研究通用 Agent 的研究尝试,让 Agent 玩《荒野大镖客 2》和办公,使其成为第一个从零开始在AAA级游戏中完成具体任务的 LLM 智能体。相关论文几经波折, 今年终于被 ICML 2025 录用。不过他自述对那份研究其实不够满意,因为"泛化性不足"。 当完成那些研究以后,卢宗青意识到 "当前的多模态模型缺乏与世界交互的能力"。因为模型缺少学习物 理交互的数据,所以 我们看到的那些泛化的能力本质都是 "抽象"的,它终究无法理解动作和世界的关 系,自然也无法预测世界 。 这如今成为他想在具身智能创业的起点:开发一个通用的具身人工智能模型。 卢 ...
Midjourney releases its video model: not competing on resolution, but netizens call the visuals stunning
虎嗅APP· 2025-06-20 09:47
This article is from the WeChat account APPSO (ID: appsolution), "the No. 1 AI new-media outlet, an inspiration guide for 'super individuals.'" Original title: "This AI image-generation tool releases its first video model: not competing on resolution, but netizens say the visuals exceed expectations | prompts included." Header image: AI-generated.

Facing copyright lawsuits from Disney and Universal, veteran text-to-image "unicorn" Midjourney has not slowed down; instead, in the early hours of today, it pushed out its first video model, V1, under pressure.

Precise color grading, careful composition, rich emotion — the signature style is intact. Midjourney is not competing on resolution or long takes; what it competes on is a distinctive sense of atmosphere and aesthetic identity. Midjourney is ambitious, aiming squarely at a "world model," but whether its currently rough feature design can carry it that far remains an open question.

You chase resolution; I go surreal. Midjourney has always excelled at fantastical, surreal visuals, and user tests so far show that its video model continues this aesthetic: a stable, highly recognizable style.

TL;DR: after uploading or generating an image, click "Animate"; by default, a single job outputs four 5-second clips ...
This week's highlights: Meta releases a world model — when will the next ChatGPT moment arrive?
老徐抓AI趋势· 2025-06-19 16:47
The key points in this article come from Monday's June 16 livestream (the video highlights are strongly recommended; the reasoning there is more complete).

An autonomous driving system needs to understand complex traffic scenes the way a veteran driver does — not merely recognizing road conditions, but anticipating latent risks. For example, when a pedestrian crossing the street is occluded beside the car ahead, the system should predict where that pedestrian might emerge, keeping the drive safe and smooth. Without a deep understanding of the physical world and its events, autonomous driving cannot be truly safe or intelligent.

More broadly, robots equipped with mature world models will greatly raise productivity and drive rapid economic growth, transforming transport, logistics, and public and private mobility. In my view, companies that hold this technological edge will be the market's biggest beneficiaries, so positioning early for these opportunities matters.

Quantum computing is also accelerating. Jensen Huang said in a recent speech in Europe that quantum computing's inflection point is near, which will further boost scientific research and AI progress and quicken the pace of the technological revolution. I believe this revolution will keep speeding up; in the next few years we may see multiple breakthroughs on the scale of the steam engine or electricity, profoundly reshaping the global economy and social structure.

The above is for illustration only and does not constitute investment advice; investing involves risk, trade with caution. Note: fund advisory services are provided by Yingmi's Xiaobang advisory team! Investing ...
Studying end-to-end large models, and still not quite clear on the difference between VLM and VLA...
自动驾驶之心· 2025-06-19 11:54
Core Insights
- The article emphasizes the growing importance of large models (VLM) in the field of intelligent driving, highlighting their potential for practical applications and production [2][4].

Group 1: VLM and VLA
- VLM (Vision-Language Model) focuses on foundational capabilities such as detection, question answering, spatial understanding, and reasoning [4].
- VLA (Vision-Language Action) is more action-oriented, aimed at trajectory prediction in autonomous driving, requiring a deep understanding of human-like reasoning and perception [4].
- It is recommended to learn VLM first before expanding to VLA, as VLM can predict trajectories through diffusion models, enhancing action capabilities in uncertain environments [4].

Group 2: Community and Resources
- The article invites readers to join a knowledge-sharing community that offers comprehensive resources, including video courses, hardware, and coding materials related to autonomous driving [4].
- The community aims to build a network of professionals in intelligent driving and embodied intelligence, with a target of gathering 10,000 members in three years [4].

Group 3: Technical Directions
- The article outlines four cutting-edge technical directions in the industry: Visual Language Models, World Models, Diffusion Models, and End-to-End Autonomous Driving [5].
- It provides links to various resources and papers that cover advancements in these areas, indicating a robust framework for ongoing research and development [6][31].

Group 4: Datasets and Applications
- A variety of datasets are mentioned that are crucial for training and evaluating models in autonomous driving, including pedestrian detection, object tracking, and scene understanding [19][20].
- The article discusses the application of language-enhanced systems in autonomous driving, showcasing how natural language processing can improve vehicle navigation and interaction [20][21].

Group 5: Future Trends
- The article highlights the potential for large models to significantly impact the future of autonomous driving, particularly in enhancing decision-making and control systems [24][25].
- It suggests that the integration of language models with driving systems could lead to more intuitive and human-like vehicle behavior [24][25].
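The idea mentioned above — predicting trajectories with a diffusion model — can be made concrete with a toy denoising sampler. The sketch below is hypothetical and deliberately minimal (plain NumPy, no learned network): the "data distribution" is a single known straight-line trajectory, so the optimal noise predictor has a closed form. A real driving stack would replace `predict_noise` with a network conditioned on perception and language features.

```python
import numpy as np

# Toy DDPM-style sampler over 2D waypoint trajectories.
T = 50                                   # diffusion steps
H = 8                                    # trajectory horizon (waypoints)
betas = np.linspace(1e-4, 0.2, T)        # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# "Dataset" = one straight-line trajectory (hypothetical stand-in).
target = np.stack([np.linspace(0, 7, H), np.zeros(H)], axis=1)

def predict_noise(x_t, t):
    # For a point-mass data distribution, the optimal noise prediction
    # is analytic: x_t = sqrt(abar_t)*target + sqrt(1-abar_t)*eps.
    return (x_t - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1.0 - alpha_bar[t])

def sample(rng):
    x = rng.standard_normal((H, 2))      # start from pure noise
    for t in range(T - 1, -1, -1):
        eps = predict_noise(x, t)
        # DDPM posterior mean update
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                        # inject noise except at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal((H, 2))
    return x

traj = sample(np.random.default_rng(0))
print(np.abs(traj - target).max())       # deviation from target: near zero
```

With a perfect noise predictor the sampler recovers the target trajectory; with a learned, conditioned predictor, the same loop instead samples diverse plausible trajectories, which is what makes diffusion attractive for action generation under uncertainty.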
Midjourney releases its video model: not competing on resolution, but netizens call the visuals stunning
Hu Xiu· 2025-06-19 06:56
Facing copyright lawsuits from Disney and Universal, veteran text-to-image "unicorn" Midjourney has not slowed down; instead, in the early hours of today, it pushed out its first video model, V1, under pressure.

Precise color grading, careful composition, rich emotion — the signature style is intact. Midjourney is not competing on resolution or long takes; what it competes on is a distinctive sense of atmosphere and aesthetic identity. Midjourney is ambitious, aiming squarely at a "world model," but whether its currently rough feature design can carry it that far remains an open question.

TL;DR: after uploading or generating an image, click "Animate"; by default, a single job outputs four 5-second clips, extendable to 21 seconds at most. Both manual and automatic modes are supported, and users can shape the generated footage through prompts; low-motion and high-motion options suit static, atmospheric shots and strongly dynamic scenes, respectively.

[Midjourney official promo demo]

Competing on atmosphere: Midjourney's video model is officially live. You chase resolution; I go surreal. Midjourney has always excelled at fantastical, surreal visuals, and user tests so far show that its video model continues this aesthetic: a stable, highly recognizable style.

Prompt: The train passing through the station. | @PJaccetturo

Well-known X blogger @ ...
Midjourney launches its first image-to-video model, V1: continuing its aesthetic style, with the goal of building a "world model"
Founder Park· 2025-06-19 05:52
Reposted from "AI寒武纪" (AI Cambrian).

In the early hours of today, Midjourney launched its video generation model V1, billed as a cost-effective, easy-to-use video generator and the first step toward its vision of "simulating the world in real time." Users can now create short videos by animating Midjourney images or their own pictures; the product is positioned as fun, easy to use, good-looking, and affordable. As always, Midjourney has put real work into the aesthetic details; see the official promo video.

01 Image-to-video, with both manual and automatic modes

Core workflow: an image-to-video approach. The user first generates a satisfying image, then clicks the new "Animate" button to bring it to life.

External images supported: users can upload their own pictures and enter a motion prompt to generate the video.

Two animation modes:
Automatic: the AI writes a "motion prompt" for you — quick and simple.
Manual: users can write their own ...
4Paradigm (06682): Q1 2025 results beat expectations; a surging Agent business puts the company on a fast-growth track
Investment Rating
- The report maintains an "Outperform" rating for the company [4][8].

Core Insights
- The company has entered a high-growth trajectory supported by its Agent business, with a forecasted revenue growth of 30.85% in 2025, 28.75% in 2026, and 27.22% in 2027 [4][8].
- The first quarter of 2025 saw revenue of 1.08 billion RMB, a year-on-year increase of 30.1%, with a gross profit of 444 million RMB, also up 30.1% [4][8].
- The average revenue per key user reached 11.67 million RMB, reflecting a 31.3% year-on-year increase, indicating strong performance despite macroeconomic pressures [4][8].

Financial Summary
- Revenue projections for 2025-2027 are 6.88 billion RMB, 8.86 billion RMB, and 11.28 billion RMB respectively, with EPS expected to be 0.11 RMB, 0.56 RMB, and 1.19 RMB [3][4][8].
- The company's gross profit margin (GPM) for Q1 2025 was 41.2%, maintaining stability compared to the previous year [4][8].
- The Prophet AI platform generated 805 million RMB in revenue for Q1 2025, marking a 60.5% increase year-on-year [4][8].

Business Development
- The company has upgraded to a dual 2B+2C business model, enhancing its capabilities in both enterprise and consumer sectors [4][8].
- The launch of the AI Agent development platform has enabled the company to cover the full lifecycle of AI Agent development, with applications across over 14 industries [4][8].
- The establishment of the Phancy consumer electronics sector aims to provide AI Agent solutions for devices, further diversifying the company's offerings [4][8].
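As a quick arithmetic check, the revenue projections above compound consistently with the stated growth rates (figures in billions of RMB are taken from the report; the small residual differences are rounding in the published numbers):

```python
# Each year's projected revenue should equal the prior year's revenue
# times (1 + that year's growth rate). Inputs are from the report.
rev_2025 = 6.88
growth = {2026: 0.2875, 2027: 0.2722}

rev_2026 = rev_2025 * (1 + growth[2026])
rev_2027 = rev_2026 * (1 + growth[2027])

print(round(rev_2026, 2), round(rev_2027, 2))  # close to the reported 8.86 and 11.28
```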
The first of the EV "new forces" to reposition as an AI company shows off a next-generation autonomous driving model at a top global AI conference
机器之心· 2025-06-17 04:50
Core Viewpoint
- The article emphasizes the significance of high computing power, large models, and extensive data in achieving Level 3 (L3) autonomous driving, highlighting the advancements made by XPeng with its G7 model and its proprietary AI chips [3][18][19].

Group 1: Technological Advancements
- XPeng's G7 is the world's first L3 level AI car, featuring three self-developed Turing AI chips with over 2200 TOPS of effective computing power [3][18].
- The G7 introduces the VLA-OL model, which incorporates a "motion brain" for decision-making in intelligent assisted driving [4].
- The VLM (Vision Large Model) serves as the AI brain for vehicle perception, enabling new interaction capabilities and future functionalities like local chat and multi-language support [5][19].

Group 2: Industry Positioning
- XPeng was the only invited Chinese car company to present at the global computer vision conference CVPR 2025, showcasing its advancements in autonomous driving models [6][13].
- The company has established a comprehensive system from computing power to algorithms and data, positioning itself as a leader in the autonomous driving sector [8][18].

Group 3: Model Development and Training
- The next-generation autonomous driving base model developed by XPeng has a parameter scale of 72 billion and has been trained on over 20 million video clips [20].
- The model utilizes a large language model backbone and extensive multimodal driving data, enhancing its capabilities in visual understanding and reasoning [20][21].
- XPeng employs a distillation approach to adapt large models for vehicle-side deployment, ensuring core capabilities are retained while optimizing performance [27][28].

Group 4: Future Directions
- The development of a world model is underway, which will simulate real-world conditions and enhance the feedback loop for continuous learning [36][41].
- XPeng aims to leverage its AI advancements not only for autonomous driving but also for AI robots and flying cars in the future [43][64].
- The transition to an AI company involves building a robust AI infrastructure, with a focus on optimizing the entire production process from cloud to vehicle [50][62].
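The distillation approach mentioned in Group 3 — compressing a large cloud model into a vehicle-side one — can be illustrated with a toy logit-distillation loop. Everything below is a hypothetical sketch, not XPeng's method (the article does not describe the recipe): teacher and student are both single linear layers so the student can match the teacher exactly, whereas a real student would be a smaller network trained on softened teacher outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_out, N, TEMP = 16, 4, 256, 2.0   # toy sizes; TEMP = distillation temperature

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    # mean KL(p || q) over the batch
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)))

X = rng.standard_normal((N, D_in))             # unlabeled inputs
W_teacher = rng.standard_normal((D_in, D_out)) # frozen "teacher" weights
p_teacher = softmax(X @ W_teacher / TEMP)      # softened teacher targets

W_student = np.zeros((D_in, D_out))            # "student" to be trained
kl_init = kl(p_teacher, softmax(X @ W_student / TEMP))

for _ in range(3000):
    q = softmax(X @ W_student / TEMP)
    grad = X.T @ (q - p_teacher) / N           # gradient of KL wrt logits
    W_student -= 0.5 * grad                    # (1/TEMP factor folded into the lr)

kl_final = kl(p_teacher, softmax(X @ W_student / TEMP))
print(kl_init, kl_final)                       # KL to the teacher drops sharply
```

The temperature softens the teacher's distribution so the student learns relative class preferences, not just the argmax; that is the standard motivation for distilling with soft targets.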