World Models
Midjourney Releases a Video Model: Not Competing on Resolution, Yet Users Call the Visuals Stunning
Hu Xiu· 2025-06-19 06:56
Facing copyright lawsuits from Disney and Universal Pictures, the veteran text-to-image "unicorn" Midjourney has not slowed down; instead, in the early hours of today it pushed out its first video model, V1, under pressure. Precise color grading, considered composition, rich emotion: the signature style is intact. Midjourney is not competing on resolution or long takes; what it competes on is a distinctive sense of atmosphere and aesthetic identity. Midjourney is ambitious, with its sights set on a "world model," but whether its currently somewhat rough feature design can carry it that far remains an open question. The short version: after uploading or generating an image, just click "Animate"; a single job outputs four 5-second clips by default, extendable to a maximum of 21 seconds. Both manual and automatic modes are supported, and users can steer the output with prompts; low-motion and high-motion options suit static ambience and highly dynamic scenes respectively. [Video: Midjourney official promo demo, 2:24] Competing on atmosphere, Midjourney's video model officially launches. You compete on your resolution; I walk my surreal path. Midjourney has long been known for its fantastical, surreal visual style, and judging from early user tests, its video model continues this aesthetic direction: stable in style and highly recognizable. Prompt: The train passing through the station. | @PJaccetturo Well-known X blogger @ ...
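The clip-length figures above are consistent with a simple extension scheme: a 5-second base clip plus up to four roughly 4-second extensions. The per-extension length and extension count below are assumptions; the article only states the 5-second default and the 21-second cap.

```python
BASE_SECONDS = 5        # default clip length, per the article
EXTENSION_SECONDS = 4   # assumed seconds added per extension
MAX_EXTENSIONS = 4      # assumed maximum number of extensions

max_length = BASE_SECONDS + MAX_EXTENSIONS * EXTENSION_SECONDS
print(max_length)  # 21, matching the article's stated maximum
```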
Midjourney Launches Its First Image-to-Video Model V1: Continuing Its Aesthetic Style, with the Goal of Building a "World Model"
Founder Park· 2025-06-19 05:52
Reposted from "AI Cambrian". In the early hours of today, Midjourney launched the video generation model V1, pitching cost-effective, easy-to-use video generation as the first step toward its vision of "simulating the world in real time." Users can now create short videos by animating Midjourney images or their own images; the product is positioned as fun, easy to use, beautiful, and affordable. As always, Midjourney has put real effort into the aesthetic details of the video model. Official promo video: 01 Image-to-video, with both manual and automatic modes. Core workflow: an "Image-to-Video" approach, in which users first generate an image they are happy with, then click the new "Animate" button to bring it to life. External images supported: users can upload their own images and generate video by entering motion prompts. Two animation modes: Automatic, where the AI writes a "motion prompt" for you, quick and simple; and Manual, where users can write their own ...
4Paradigm (06682): 2025 Q1 Results Beat Expectations; Surging Agent Business Puts the Company on a High-Growth Track
Investment Rating
- The report maintains an "Outperform" rating for the company [4][8].

Core Insights
- The company has entered a high-growth trajectory supported by its Agent business, with forecasted revenue growth of 30.85% in 2025, 28.75% in 2026, and 27.22% in 2027 [4][8].
- The first quarter of 2025 saw revenue of 1.08 billion RMB, a year-on-year increase of 30.1%, with gross profit of 444 million RMB, also up 30.1% [4][8].
- Average revenue per key user reached 11.67 million RMB, a 31.3% year-on-year increase, indicating strong performance despite macroeconomic pressures [4][8].

Financial Summary
- Revenue projections for 2025-2027 are 6.88 billion RMB, 8.86 billion RMB, and 11.28 billion RMB respectively, with EPS expected to be 0.11 RMB, 0.56 RMB, and 1.19 RMB [3][4][8].
- The company's gross profit margin (GPM) for Q1 2025 was 41.2%, stable compared to the previous year [4][8].
- The Prophet AI platform generated 805 million RMB in revenue in Q1 2025, up 60.5% year-on-year [4][8].

Business Development
- The company has upgraded to a dual 2B+2C business model, enhancing its capabilities in both enterprise and consumer sectors [4][8].
- The launch of its AI Agent development platform covers the full lifecycle of AI Agent development, with applications across more than 14 industries [4][8].
- The establishment of the Phancy consumer electronics line aims to provide AI Agent solutions for devices, further diversifying the company's offerings [4][8].
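The revenue projections and growth rates in the summary can be cross-checked by compounding the stated growth rates from the 2025 forecast; the small drift in the final year is rounding in the report's figures.

```python
# Compound the report's stated growth rates (28.75% for 2026, 27.22% for 2027)
# from the 2025 revenue forecast of 6.88 billion RMB.
growth = {2026: 0.2875, 2027: 0.2722}
revenue = {2025: 6.88}  # billions of RMB, per the report

for year, rate in growth.items():
    revenue[year] = revenue[year - 1] * (1 + rate)

for year in sorted(revenue):
    print(f"{year}: {revenue[year]:.2f}B RMB")
```

This reproduces 8.86B RMB for 2026 and about 11.27B for 2027, essentially matching the report's 11.28B after rounding.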
The First EV Startup to Transform into an AI Company Shows Off Its Next-Generation Autonomous Driving Model at a Top Global AI Conference
机器之心· 2025-06-17 04:50
Core Viewpoint
- The article emphasizes the significance of high computing power, large models, and extensive data in achieving Level 3 (L3) autonomous driving, highlighting the advancements made by XPeng with its G7 model and its proprietary AI chips [3][18][19].

Group 1: Technological Advancements
- XPeng's G7 is the world's first L3-level AI car, featuring three self-developed Turing AI chips with over 2,200 TOPS of effective computing power [3][18].
- The G7 introduces the VLA-OL model, which incorporates a "motion brain" for decision-making in intelligent assisted driving [4].
- The VLM (Vision Large Model) serves as the AI brain for vehicle perception, enabling new interaction capabilities and future functionality such as local chat and multi-language support [5][19].

Group 2: Industry Positioning
- XPeng was the only Chinese car company invited to present at the global computer vision conference CVPR 2025, showcasing its advancements in autonomous driving models [6][13].
- The company has established a comprehensive system spanning computing power, algorithms, and data, positioning itself as a leader in the autonomous driving sector [8][18].

Group 3: Model Development and Training
- XPeng's next-generation autonomous driving base model has a parameter scale of 72 billion and has been trained on over 20 million video clips [20].
- The model uses a large language model backbone and extensive multimodal driving data, enhancing its capabilities in visual understanding and reasoning [20][21].
- XPeng employs distillation to adapt large models for vehicle-side deployment, retaining core capabilities while optimizing performance [27][28].

Group 4: Future Directions
- A world model is under development, which will simulate real-world conditions and strengthen the feedback loop for continuous learning [36][41].
- XPeng aims to leverage its AI advancements not only for autonomous driving but also for AI robots and flying cars [43][64].
- The transition to an AI company involves building robust AI infrastructure, with a focus on optimizing the entire production pipeline from cloud to vehicle [50][62].
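The distillation step described in Group 3 can be sketched generically. The following is a standard knowledge-distillation loss (temperature-softened soft targets plus a hard-label term), not XPeng's actual training code; the temperature, mixing weight, and logit values are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.exp((logits - logits.max()) / temperature)
    return z / z.sum()

def distillation_loss(teacher_logits, student_logits, label, T=2.0, alpha=0.5):
    """Blend of KL(teacher || student) on temperature-softened distributions
    and cross-entropy against the hard label (scaled by T^2 as is customary)."""
    p_teacher = softmax(teacher_logits, T)   # soft targets from the large model
    p_student = softmax(student_logits, T)   # student's softened prediction
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    ce = -np.log(softmax(student_logits)[label])
    return alpha * (T ** 2) * kl + (1 - alpha) * ce

teacher = np.array([4.0, 1.0, 0.5])   # stand-in for the large cloud model's logits
student = np.array([2.5, 1.2, 0.8])   # stand-in for the smaller on-vehicle model
loss = distillation_loss(teacher, student, label=0)
print(round(float(loss), 4))
```

Minimizing this loss pulls the student's output distribution toward the teacher's while still fitting the ground-truth labels, which is the usual way core capabilities are retained in a much smaller deployed model.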
This Week's Highlights: Google AI's Path Forward: From Technical Accumulation to Discovering New Knowledge
老徐抓AI趋势· 2025-06-15 03:41
The key points in this article come from this Monday's livestream on June 9. Google's long-term goal is artificial general intelligence (AGI): machines with general intelligence on par with the human brain. The DeepMind team has a clear definition of AGI, holding that general intelligence means a machine can handle all kinds of tasks the way a human brain does. Although today's AI still falls short on some simple tasks, it keeps patching these "cognitive gaps" and is gradually approaching true general intelligence. Google and Tesla are considered the two companies closest to realizing a "world model": Google relies on YouTube's massive video data, Tesla on real-world data collected by its vehicles' cameras. Such multidimensional real-world data is critical for training general intelligence, far exceeding the depth of text-only data. Overall, Google's AI technology is not only solid but also has the potential to innovate and leap ahead. In the coming years, Google AI is likely to achieve breakthroughs in intelligent discovery, model refinement, and general intelligence, maintaining its leading position in the field. For anyone following AI, Google is worth tracking closely. As a major player in AI, Google's history and technical accumulation deserve deep analysis. The structure of Google's parent company Alphabet is ingeniously designed, operating multiple innovative subsidiaries independently, such as Google, DeepMind, I ...
"Multimodal Approaches Cannot Achieve AGI"
AI前线· 2025-06-14 04:06
Core Viewpoint
- The article argues that true Artificial General Intelligence (AGI) requires a physical understanding of the world, as many problems cannot be reduced to symbolic operations [2][4][21].

Group 1: Limitations of Current AI Models
- Current large language models (LLMs) may give the illusion of understanding the world, but they primarily learn collections of heuristics for predicting tokens rather than developing a genuine world model [4][5][7].
- The understanding of LLMs is superficial, leading to misconceptions about their intelligence, since they do not run physical simulations when processing language [8][12][20].

Group 2: The Need for Embodied Cognition
- The pursuit of AGI should prioritize embodied intelligence and interaction with the environment rather than merely combining multiple modalities into a patchwork solution [1][15][23].
- A unified approach to processing different modalities, inspired by human cognition, is essential for developing AGI that can generalize across tasks [19][23].

Group 3: Critique of Multimodal Approaches
- Current multimodal models often artificially sever the connections between modalities, complicating the integration of concepts and hindering the development of a coherent understanding [17][18].
- Relying on large-scale models to stitch together narrow-domain capabilities is unlikely to yield a fully cognitive AGI, as it does not address the fundamental nature of intelligence [21][22].

Group 4: Future Directions for AGI Development
- Future AGI development should focus on interactive and embodied processes, leveraging insights from human cognition and classical disciplines [23][24].
- The challenge lies in identifying the functions AGI requires and arranging them into a coherent whole, which is more a conceptual issue than a mathematical one [23].
After a Year of Burning Cash, Has Fei-Fei Li's "Spatial Intelligence" Vision Changed?
机器之心· 2025-06-13 12:02
Group 1
- The core vision of World Labs, founded by Fei-Fei Li, emphasizes the importance of spatial intelligence and world models in AI development, aiming to create AI systems that can understand and generate 3D physical worlds [5][6][7].
- World Labs achieved significant milestones in its first year, raising $230 million in funding and reaching a valuation of over $1 billion, positioning itself as a notable player in the AI sector [5][6].
- The company has released technologies such as its "world generation" model and the Forge renderer, which enable the creation of interactive 3D environments from single images [6][7].

Group 2
- Fei-Fei Li argues that current language models (LLMs) are limited in describing and understanding the 3D physical world, making spatial intelligence a crucial component of AI [5][6].
- The success of LLMs has provided methodologies for spatial intelligence, but true breakthroughs require interdisciplinary integration, particularly between AI and computer graphics [7][8].
- Advances in computational power, data availability, and engineering capability have made the pursuit of "world models" a realistic goal [7].
With RCE and AI as Twin Levers, GAC Toyota Enters Its China Self-Development 2.0 Era
Core Viewpoint
- GAC Toyota aims to leverage AI technology and local resources so that smart electric vehicles account for 80% of production and sales by 2030 [1].

Group 1: R&D and Development Strategy
- GAC Toyota's new development model is led by local engineers under the Regional Chief Engineer (RCE) system, focusing on integrating local suppliers and AI technology [2][5].
- Decision-making power for the development of smart electric products has been transferred from Japan to China, allowing RCEs to lead all model developments, including new and updated versions of key models such as the Sienna, Highlander, and Camry [2][5].
- GAC Toyota is entering a "self-research 2.0 era," emphasizing the role of local engineers in defining product specifications and driving innovation [5].

Group 2: Product Platforms and Innovations
- GAC Toyota is developing two dedicated new-energy platforms: a compact-car platform for A- and B-class vehicles and a high-compatibility platform for C- and D-class vehicles [9].
- The first model on the compact platform, the Platinum 3X, has seen high demand, while the first model on the high-compatibility platform, the Platinum 7, is set to launch in Q1 next year [9].

Group 3: AI Integration and Manufacturing
- AI is being used to enhance product capabilities, making vehicles more intelligent and responsive to user needs [10][12].
- GAC Toyota has applied AI to its manufacturing processes, achieving a defect rate of 0.008 defects per vehicle and improving supply-chain quality to a record low of 0.26 PPM [16].
- The company has integrated over 40 patents in AI logistics, achieving zero inventory through advanced automation technologies [16].
The Real Road to AGI? Google Shows Agents Build Their Own World Models: World Models Are All You Need
机器之心· 2025-06-13 02:32
Core Insights
- The article discusses the necessity of world models for general agents in achieving flexible, goal-directed behavior, emphasizing that any AI capable of generalizing to multi-step tasks must learn a predictive model of its environment [4][9][20].

Group 1: Importance of World Models
- World models are essential for agents to generalize across complex, long-term tasks, as they allow for the prediction of future states based on current actions [4][5][9].
- Google DeepMind's research indicates that learning world models is not just beneficial but necessary for achieving human-level artificial intelligence [9][20].

Group 2: Theoretical Framework
- The authors developed a mathematical framework consisting of four components, environment, goals, agents, and world models, to formalize the relationships among these elements [24][30].
- The framework posits that any agent capable of handling simple goal-directed tasks must learn a predictive model of its environment, which can be extracted from the agent's policy [20][30].

Group 3: Algorithm for World Model Recovery
- The article outlines an algorithm that recovers world models from bounded agents by querying them with carefully designed composite goals [37][39].
- Experiments demonstrated that even when agents deviated from theoretical assumptions, the algorithm successfully recovered accurate world models, confirming the link between agent capability and world-model quality [40][46].

Group 4: Implications for AI Development
- The findings suggest that the race for superintelligent AI may actually be a competition to build more complex world models, transitioning from a "human data era" to an "experience era" [49][52].
- The development of foundational world models like Genie 2, which can generate diverse 3D environments from a single image, represents a significant advance in training and evaluating embodied agents [51][52].
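The recovery idea in Group 3 can be illustrated with a toy version. In the DeepMind setup, an agent is queried with composite goals and its choices reveal its implicit beliefs about the environment; the sketch below compresses that to a single transition probability, recovered by offering the agent a choice between its action and a lottery with known success probability theta, then binary-searching theta. The environment, agent, and query interface here are illustrative assumptions, not the paper's actual construction.

```python
# Toy illustration: recover an agent's implicit belief about one transition
# probability purely from its goal-conditioned choices.
HIDDEN_P = 0.73  # P(success | action): known to the agent, hidden from us

def agent_choice(theta):
    """An (assumed optimal) agent picks whichever option it believes
    succeeds more often: its own action, or a lottery succeeding w.p. theta."""
    return "action" if HIDDEN_P > theta else "lottery"

def recover_probability(n_queries=30):
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):        # binary search over lottery odds
        theta = (lo + hi) / 2
        if agent_choice(theta) == "action":
            lo = theta                # agent's action beats this lottery
        else:
            hi = theta
    return (lo + hi) / 2

print(round(recover_probability(), 3))  # 0.73
```

Each query narrows the interval containing the agent's implicit probability, mirroring how the paper's algorithm bounds transition probabilities from finitely many composite-goal queries.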
LeCun Makes It Official: Meta's World Model V-JEPA 2 Arrives, Achieving Zero-Shot Control with Only 62 Hours of Robot Data
AI科技大本营· 2025-06-12 10:48
Core Viewpoint
- Meta has launched V-JEPA 2, an advanced AI system designed to enhance machines' understanding of, prediction about, and interaction with the physical world, marking a significant step toward more general AI agents [3][27].

Group 1: V-JEPA 2 Overview
- V-JEPA 2 is trained on video and aims to provide deeper physical-world understanding and predictive capability [3].
- The model has reached the top of the Hugging Face physical reasoning leaderboard, surpassing GPT-4o [6].
- Training consists of two phases: unsupervised pre-training on over 1 million hours of video and 1 million images, followed by action-conditioned training [9][10].

Group 2: Model Performance
- V-JEPA 2 has demonstrated excellent understanding and prediction, achieving state-of-the-art results on various action recognition and prediction tasks [12][14].
- The model can perform zero-shot task planning, completing object-manipulation tasks in entirely new environments with a success rate of 65% to 80% [17].

Group 3: World Model Concept
- The article introduces the concept of a world model, which allows AI to predict the consequences of actions via an internal simulation of the physical world [21].
- Meta emphasizes understanding, prediction, and planning as the key capabilities of an AI world model [25].

Group 4: New Benchmark Tests
- Meta has released three new benchmarks, IntPhys 2, MVPBench, and CausalVQA, to evaluate AI models' grasp of physical laws, causal relationships, and counterfactual reasoning [23].
- These benchmarks highlight the gap between human performance (85%-95% accuracy) and current AI models, including V-JEPA 2 [24].

Group 5: Future Directions
- Future efforts will focus on developing hierarchical world models and enhancing multimodal modeling to improve AI's understanding and predictive abilities [30].
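Zero-shot planning with a world model, as described above, can be sketched in miniature. V-JEPA 2 plans in a learned latent space; the toy below substitutes a known linear dynamics model and plain random-shooting search: sample action sequences, roll each out through the predictor, and keep the one ending closest to the goal embedding. The dynamics, dimensions, and search procedure are illustrative assumptions, not Meta's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy latent dynamics z' = A @ z + B @ a, standing in for the learned predictor.
A = np.eye(4) * 0.9
B = rng.normal(size=(4, 2)) * 0.5
z0 = rng.normal(size=4)       # current observation's embedding
z_goal = rng.normal(size=4)   # goal image's embedding

def rollout(z, actions):
    """Predict the final latent state after applying a sequence of actions."""
    for a in actions:
        z = A @ z + B @ a
    return z

def plan(horizon=5, n_samples=256):
    """Random shooting: start from the do-nothing plan, keep any sampled
    action sequence that ends closer to the goal embedding."""
    best_seq = np.zeros((horizon, 2))
    best_cost = np.linalg.norm(rollout(z0, best_seq) - z_goal)
    for _ in range(n_samples):
        seq = rng.normal(size=(horizon, 2))
        cost = np.linalg.norm(rollout(z0, seq) - z_goal)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost

seq, cost = plan()
baseline = np.linalg.norm(rollout(z0, np.zeros((5, 2))) - z_goal)
print(f"planned cost {cost:.3f} vs do-nothing baseline {baseline:.3f}")
```

In practice only the first action of the winning sequence is executed, the model replans from the new observation, and stronger samplers (e.g. the cross-entropy method) replace pure random shooting.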