World Models

A Conversation with BAAI President Wang Zhongyuan: AI Is Accelerating from the Digital World into the Physical World
Mei Ri Jing Ji Xin Wen· 2025-06-06 05:15
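Reporter: Ke Yang | Editor: Dong Xingsheng. On June 6, the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) released its "Wujie" (悟界) series of large models at the 2025 BAAI Conference, announcing its move from the "Wudao" (悟道) era into a phase of exploring embodied intelligence. In an interview with media outlets including Mei Ri Jing Ji Xin Wen, BAAI President Wang Zhongyuan said that "AI (artificial intelligence) is accelerating its move from the digital world into the physical world," and that this is the fundamental logic driving the institute's strategic upgrade. [Photo: Wang Zhongyuan. Image source: provided by the organizer.] Behind this judgment is a redrawing of the boundaries of AI technology and its applications. Today, most mainstream large models focus on "digital intelligence" scenarios such as consumer-facing text generation and conversational dialogue, whereas BAAI is trying to push AI into the more challenging and more open-ended "real world," including the building of robots, operating systems, and world models. In Wang's view: "The world does not need that many 'PhDs'; it needs AI that can carry out tasks and be deployed in practice." Embodied intelligence is becoming the starting line of the next AI race. Wang judges that the "group stage" of embodied intelligence is not yet over and the "knockout rounds" are still far off. But whoever is first to prove out a technical path and break through the data bottleneck on this new track may well define the next decade of AI. From the early "Wudao" series to today's "Wujie" series, BAAI's strategic shift did not come out of nowhere; it followed naturally once conditions were ripe. Wang said candidly: "We believe AI must ultimately benefit human society and help people free themselves from tedious, repetitive, simple labor, so that everyone can more fully enjoy ...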
NIO-SW (09866.HK): Multiple New Models Set for Delivery; Reform Results Expected to Materialize Gradually
Ge Long Hui· 2025-06-06 02:06
Core Viewpoint - The company's 1Q25 results were in line with market expectations, with revenue of 12 billion yuan and a Non-GAAP net loss of 6.28 billion yuan, driven by seasonal factors and product iteration [1] Group 1: Financial Performance - 1Q25 revenue reached 12 billion yuan, with a Non-GAAP net loss of 6.28 billion yuan, in line with market expectations [1] - Vehicle deliveries in 1Q25 totaled 42,094 units, a sequential decline [1] - The automotive gross margin fell to 10.2%, but the company aims for some models to exceed a 20% gross margin through stable pricing and cost reductions [1] Group 2: Future Outlook - The company plans to deliver 72,000 to 75,000 vehicles in 2Q25, with several new models expected to contribute to growth in 2025 [1] - The company has begun internal CBU reforms to improve operational efficiency, with expense ratios expected to improve starting in 2Q [2] - The 2Q target for R&D is a 15% efficiency improvement, with quarterly R&D expenses to be held between 2 billion and 2.5 billion yuan [2] Group 3: Product Development - The first version of the NWM (NIO World Model) was launched on May 30, focusing on safety and on improving the user experience across driving scenarios [3] - The company has ensured that its smart driving technology can be updated and iterated, giving existing vehicle owners access to the latest advancements [3] Group 4: Valuation and Market Position - The US- and Hong Kong-listed shares currently trade at about 0.6x 2025 P/S, with the outperform rating maintained for 2025-26 Non-GAAP net profit [3] - Target prices for the Hong Kong and US shares have been cut by 15% to 41 HKD and 5.3 USD, respectively, implying upside of 47% and 41% from current prices [3]
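The valuation figures above can be cross-checked with simple arithmetic. The sketch below backs out the share prices and prior target prices implied by the quoted targets (41 HKD, 5.3 USD), upside (47%, 41%), and 15% cut. The function names are illustrative only, and the results are implied values derived from the summary, not prices reported in the original research note.

```python
def implied_current_price(target_price: float, upside: float) -> float:
    """Price implied by a target price and its stated upside (0.47 means 47%)."""
    return target_price / (1.0 + upside)


def target_before_cut(new_target: float, cut: float) -> float:
    """Previous target price implied by a stated fractional cut (0.15 means 15%)."""
    return new_target / (1.0 - cut)


if __name__ == "__main__":
    # Figures quoted in the summary above; everything printed is derived from them.
    quoted = [("Hong Kong (HKD)", 41.0, 0.47), ("US (USD)", 5.3, 0.41)]
    for label, target, upside in quoted:
        current = implied_current_price(target, upside)
        previous = target_before_cut(target, 0.15)
        print(f"{label}: implied current price {current:.2f}, "
              f"implied previous target {previous:.2f}")
```

Because the summary rounds its percentages, the implied prices are approximate.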
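Musk and Trump Trade Insults in Public, Tesla's Market Value Falls by More Than 1 Trillion Yuan Overnight; "AI Godmother" Fei-Fei Li Explains "World Models" | Global Tech Morning Brief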
Mei Ri Jing Ji Xin Wen· 2025-06-06 00:30
Group 1 - OpenAI's model behavior head emphasizes the importance of focusing on AI's impact on human emotional well-being rather than debating its essence, suggesting that humans are developing feelings towards AI and will soon enter an "AI consciousness" phase [2] - The public dispute between Elon Musk and Donald Trump has led to a significant drop in Tesla's stock price, with a loss of over $152.5 billion in market value, highlighting the complex relationship between politics and business [3] - Microsoft's CEO acknowledges that the partnership with OpenAI is evolving but remains strong, indicating an understanding of the necessary changes as both companies adapt to new challenges [4] Group 2 - AI expert Fei-Fei Li discusses the concept of "world models," which aims to enable AI systems to understand and reason about the physical world, particularly in three dimensions, potentially advancing AI capabilities beyond text comprehension [5] - Circle, known as the "first stablecoin stock," successfully listed on the NYSE with an opening price increase of 122.58%, reflecting the growing significance of stablecoins in the cryptocurrency market [6]
Tencent Research Institute AI Express 20250606
腾讯研究院· 2025-06-05 15:26
Group 1: ChatGPT Updates - ChatGPT has introduced a new connector feature for deep research, allowing access to enterprise and personal data sources such as Outlook, Teams, and Google Drive [1] - A new recording mode has been launched, supporting automatic transcription, key point extraction, and timestamped queries, initially available for macOS Team users [1] - OpenAI has adjusted its pricing strategy, adding credit points for Enterprise and Team workspaces, enabling existing users to fully access the latest model features [1] Group 2: Cursor 1.0 Release - Cursor 1.0 has officially launched, introducing the BugBot automatic code review tool that can identify potential bugs and provide repair suggestions [2] - The background agent feature is now available to all users, supporting deep integration with Jupyter Notebook, significantly enhancing efficiency in research and data science tasks [2] - A new memory function remembers key information from conversations, allows one-click installation of the MCP server, and optimizes chat experience with direct rendering of Mermaid charts and Markdown tables [2] Group 3: Luma AI's Modify Video Feature - Luma AI has launched the "Modify Video" feature, which can completely change scenes, characters, and environments while preserving the original video's actions and camera movements [3] - This feature supports video motion capture, style transfer, and single-element editing, allowing precise control over the elements to be edited without altering the original actions [3] - Official evaluations show that Luma surpasses competitors like Runway V2V in viewer enjoyment, structural similarity, and motion trajectory tracking across multiple dimensions [3] Group 4: Bland TTS Voice Cloning Technology - Bland TTS has introduced groundbreaking voice cloning technology that can perfectly replicate a speaking style with just 3-6 voice samples and automatically adjust emotional expression based on text content [4][5] - This technology disrupts traditional TTS pipeline models by using large language models to directly predict "audio tokens," achieving four core functions: voice style control, sound effect generation, voice mixing, and emotional understanding [5] - Bland TTS is widely applied in creator voiceovers, developer API integration, and enterprise customer service, with future potential for hyper-personalized voice assistants and a revolution in language learning [5] Group 5: Firecrawl Search API Launch - Firecrawl has released version 1.10.0, introducing the Search MCP, which enables one-click web search and content scraping capabilities [6] - The new version supports various output formats and customizable search parameters, with comprehensive support for these new features in Python/Node.js SDK [6] - Enhanced functionalities include automatic proxy scraping, Redis separation, concurrent logging interfaces, improved metadata extraction, and fixes for subdomain handling to enhance stability [6] Group 6: Visual Embodied Brain Framework - Shanghai AI Lab has proposed the VeBrain framework, integrating visual perception, spatial reasoning, and robotic control capabilities [7] - This framework innovatively transforms robotic control into conventional 2D spatial text tasks and achieves precise mapping from text decisions to real actions through a "robot adapter" [7] - VeBrain outperforms GPT-4o and Qwen2.5-VL in 13 multimodal benchmark tests, improving success rates in robotic control tasks by 50%, and has constructed a high-quality dataset of 600,000 instructions [7] Group 7: DeepMind's Insights on Agents and World Models - DeepMind scientist Jon Richens' ICML 2025 paper reveals that any agent capable of generalizing to multi-step goal tasks must have learned an environmental prediction model, asserting that "agents are world models" [8] - The research demonstrates that agent strategies contain all information necessary to accurately simulate the environment, and algorithms can extract world models from these strategies, aligning with Ilya's 2023 predictions [8] - The study indicates that there is no shortcut to achieving AGI without a model, emphasizing that enhancing performance and generality requires learning more precise world models, while "short-sighted agents" focus only on immediate rewards without learning world models [8] Group 8: Karpathy's Views on Software Complexity - Karpathy argues that software products with complex UIs, lack of script support, and opaque binary formats face the risk of obsolescence, as LLMs struggle to understand and operate their underlying data [9] - He categorizes software by risk levels: Adobe products and DAWs are in the high-risk zone, Blender and Unity are in the mid-high risk zone, Excel is in the mid-low risk zone, while text-based tools like VS Code and Figma are in the low-risk zone [9] - Even with advancements in AI's understanding of UI/UX, products that do not proactively adapt to current technological standards will remain at a disadvantage [9] Group 9: Fei-Fei Li's Perspective on LLMs and World Models - Fei-Fei Li believes that LLMs represent a "lossy compression" of cognition, asserting that world models are the true important direction for AI development, with spatial intelligence being more ancient and fundamental [10] - She founded World Labs to develop AI systems with "spatial intelligence," claiming that technological breakthroughs like NeRF have made world model construction feasible [10] - The applications of world models extend beyond robotics, enabling AI to not only "understand" the three-dimensional world but also to "generate" and "manipulate" virtual spaces, opening new dimensions for design, creation, and simulation experiments [10]
[NIO (NIO.N)] 1Q25 Fundamentals Under Pressure, Seeking Marginal Improvement on Multiple Fronts: Review of Q1 2025 Results (Ni Yujing)
光大证券研究· 2025-06-05 13:36
Core Viewpoint - The report indicates that NIO's financial performance in Q1 2025 was under pressure: revenue fell sharply from the previous quarter, although it rose year-on-year [3][4]. Financial Performance Summary - NIO's total revenue in Q1 2025 was 12.04 billion yuan, up 21.5% year-on-year but down 38.9% quarter-on-quarter [3]. - The gross margin for Q1 2025 was 7.6%, up 2.7 percentage points year-on-year but down 4.1 percentage points quarter-on-quarter [3]. - The Non-GAAP net loss attributable to the parent company widened by 28.2% year-on-year and narrowed by 4.2% quarter-on-quarter to 6.28 billion yuan [3]. Operational Insights - In Q1 2025, NIO delivered 42,000 vehicles, up 40.1% year-on-year but down 42.1% quarter-on-quarter [4]. - Automotive business revenue was 9.94 billion yuan, up 18.6% year-on-year but down 43.1% quarter-on-quarter [4]. - The average selling price (ASP) fell 15.3% year-on-year and 1.8% quarter-on-quarter to 236,000 yuan [4]. - The Non-GAAP loss per vehicle widened to 149,000 yuan, and free cash flow remained under pressure, with total cash on hand at 26 billion yuan at the end of Q1 2025 [4]. Future Outlook - Management guidance for Q2 2025 puts deliveries at approximately 72,000 to 75,000 vehicles [4]. - The company expects the gross margin to remain under pressure while older models are cleared through June, after which new models are expected to drive a margin recovery [4]. - NIO is pursuing multiple strategies to improve its fundamentals, including cost reduction through self-developed chips and strengthening the sales network for its new brand, Onvo (乐道) [5]. - The launch of the "World Model" on May 30 is expected to strengthen NIO's leadership in intelligent driving technology [5].
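The growth rates in this summary can be turned back into base-period figures, which is a quick way to check that the numbers are mutually consistent. The sketch below is illustrative only: the helper name is an assumption, the base-period values it prints are implied by the quoted percentages and were not reported in the summary, and the 42,094 delivery count is taken from the NIO-SW summary earlier on this page, not from this report.

```python
def base_from_change(current: float, pct_change: float) -> float:
    """Base-period value implied by a current value and a percentage change.

    pct_change is signed: +21.5 for a 21.5% rise, -38.9 for a 38.9% fall.
    """
    return current / (1.0 + pct_change / 100.0)


if __name__ == "__main__":
    revenue_q1 = 12.04  # billion yuan, as quoted above

    # Implied comparison-period revenue (derived, not quoted in the summary).
    revenue_year_ago = base_from_change(revenue_q1, 21.5)
    revenue_prev_quarter = base_from_change(revenue_q1, -38.9)
    print(f"Implied revenue a year earlier: {revenue_year_ago:.2f} billion yuan")
    print(f"Implied revenue in the prior quarter: {revenue_prev_quarter:.2f} billion yuan")

    # Cross-check the quoted ASP: vehicle revenue divided by units delivered.
    vehicle_revenue = 9.94e9  # yuan, as quoted above
    deliveries = 42_094       # units, from the NIO-SW summary earlier on this page
    asp = vehicle_revenue / deliveries
    print(f"Implied ASP: {asp:,.0f} yuan")
```

The implied ASP lands close to the quoted 236,000 yuan, so the revenue, delivery, and ASP figures agree with each other to within rounding.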
CVPR 2025 Tutorial: From Video Generation to World Models | Jointly Presented by the MMLab@NTU Team, Kuaishou Kling, and Others
量子位· 2025-06-05 08:32
Core Insights - Video generation technology has evolved from simple animations to high-quality dynamic content capable of storytelling and long-term reasoning [1] - Advances in models such as Kling (可灵), Sora, Genie, Cosmos, and Movie Gen are expanding the boundaries of video generation, prompting researchers to ask deeper questions about its potential as a bridge to world models and its role in embodied intelligence [2][6] Group 1: Video Generation and Its Implications - Video generation is being recognized as a powerful visual prior that can improve how AI perceives the world, understands interactions, and reasons about physics, pointing towards more general and embodied intelligent world models [3] - The tutorial at CVPR 2025 will feature leading researchers from academia and industry discussing how generative capabilities can be turned into a foundation for perception, prediction, and decision-making [4] Group 2: Tutorial Details - The CVPR 2025 tutorial is scheduled for June 11, 2025, at the Music City Center in Nashville, TN, focusing on the transition from video generation to understanding and modeling the real world [9] - The agenda includes invited talks from experts in the field, covering topics such as scaling world models, physics-grounded models, and advances in video generation [5] Group 3: Future Directions - The development of video generation models suggests potential for understanding interactions between objects and capturing the physical and semantic causality behind human behavior, indicating a shift from mere generation to interactive world modeling [6] - The tutorial aims to provide insights, tools, and future research directions for those interested in video generation, multimodal understanding, embodied AI, and physical reasoning [7]
2025 China Advanced Intelligent Driver Assistance, Latest Technology Insights: Compute Leap, Data Closed Loop, VLA, and World Models
EqualOcean· 2025-06-05 05:42
Investment Rating - The report does not explicitly state an investment rating for the industry. Core Insights - The report highlights the evolution of advanced driver assistance systems (ADAS) in China, focusing on the expansion of operational design domains (ODD), technological equity, safety concerns, and supportive policies [4][21][23] - It emphasizes the need for algorithm, data, and computing power upgrades to address safety shortcomings in high-level ADAS technologies [23][66] - The report discusses the transition from modular to end-to-end architectures in vehicle algorithms, aiming for human-like driving capabilities [66][68] Summary by Sections 1. Market Background - The expansion of high-level ADAS ODD is noted, with a focus on technological inclusivity and addressing accident anxiety through safety redundancies [4][21] - Policy support is highlighted as crucial for rational promotion of ADAS technologies [4][21] 2. Technology Insights - The report decodes the underlying logic of data, algorithms, and computing power in high-level ADAS [4][28] - It discusses the computing power landscape, noting the shift towards higher TOPS (trillions of operations per second) capabilities in vehicle and cloud computing [42][44] - Data challenges, including collection and positioning technologies, are identified as critical areas for development [4][28] 3. Competitive Analysis - The competitive landscape is analyzed, detailing the tiered structure of companies and their development strategies [29][30] - The report outlines various collaboration models among automotive manufacturers and technology providers, emphasizing the balance between self-research and external sourcing [83] 4. Trend Insights - The report notes the commercialization progress of passenger vehicle L3 systems, indicating a growing market for advanced ADAS [31][32] - It highlights the importance of continuous upgrades and iterations in ADAS functionalities to meet evolving consumer expectations and safety standards [82][83]
Turing Award Winner Yann LeCun: The Chinese Don't Need Us; They Can Come Up with Very Good Ideas on Their Own
AI科技大本营· 2025-06-02 07:24
Core Viewpoint - The current large language models (LLMs) are limited in their ability to generate original scientific discoveries and truly understand the complexities of the physical world, primarily functioning as advanced pattern-matching systems rather than exhibiting genuine intelligence [1][3][4]. Group 1: Limitations of Current AI Models - Relying solely on memorizing vast amounts of text is insufficient for fostering true intelligence, as current AI architectures struggle with abstract thinking, reasoning, and planning, which are essential for scientific discovery [3][5]. - LLMs excel at information retrieval but are not adept at solving new problems or generating innovative solutions, highlighting their inability to ask the right questions [6][19]. - The expectation that merely scaling up language models will lead to human-level AI is fundamentally flawed, with no significant advancements anticipated in the near future [19][11]. Group 2: The Need for New Paradigms - There is a pressing need for new AI architectures that prioritize search capabilities and the ability to plan actions to achieve specific goals, rather than relying on existing data [14][29]. - The current investment landscape is heavily focused on LLMs, but the diminishing returns from these models suggest a potential misalignment with future AI advancements [18][19]. - The development of systems that can learn from natural sensors, such as video, rather than just text, is crucial for achieving a deeper understanding of the physical world [29][37]. Group 3: Future Directions in AI Research - The exploration of non-generative architectures, such as Joint Embedding Predictive Architecture (JEPA), is seen as a promising avenue for enabling machines to abstractly represent and understand real-world phenomena [44][46]. - The ability to learn from visual and tactile experiences, akin to human learning, is essential for creating AI systems that can reason and plan effectively [37][38]. 
- Collaborative efforts across the global research community will be necessary to develop these advanced AI systems, as no single entity is likely to discover a "magic bullet" solution [30][39].
Embodied Evolution, Boundless Future: This Forum Leads a New Wave of the Embodied Intelligence Model Revolution
机器之心· 2025-05-30 09:33
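Reported by 机器之心, 机器之心 editorial team. As embodied intelligence continues to evolve, the combination of "embodied AI models + humanoid robots" opens up more possibilities for AGI to enter the physical world. The rise of multimodal large models has given embodied AI strong momentum, and the emergence of world models offers a new paradigm for training and testing embodied intelligence. How to make machine intelligence not only "make sense of" the physical world but also understand, plan, and act as humans do is a challenge and an opportunity that academia and industry now share. On May 29, the 2025 Zhangjiang Embodied Intelligence Developer Conference and International Humanoid Robot Skills Competition was held at the Zhangjiang Science Hall in Pudong, Shanghai. As a key part of the conference, the forum "Embodied · Boundless: Paradigm Innovation and Architectural Revolution in Intelligent Models" (hereafter "the forum") was held under the guidance of the Shanghai Municipal Commission of Economy and Informatization and the People's Government of Pudong New Area, hosted by Shanghai Zhangjiang (Group) Co., Ltd., organized by 上海张江数智经济发展有限公司 and 机器之心, and co-organized by the Zhangjiang AI Chamber of Commerce of the Pudong New Area Federation of Industry and Commerce. The forum brought together more than 10 prominent guests, including leading technical experts, scholars from well-known universities, and representatives of high-profile embodied-intelligence companies. Industry leaders and technologists shared the stage for in-depth discussion of current topics such as embodied AI and world models, hierarchical decision-making versus end-to-end approaches, and scaling laws for embodied intelligence, across five keynote talks and one roundtable. The forum, with 机器之心 deputy editor-in-chief Xie Wenfei ...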
How Can LLM Agents Break Through the Bottleneck of Large-Scale Application? The Key Lies in Agentic ROI
机器之心· 2025-05-30 04:16
Core Viewpoint - The main barrier to the usability of large language model agents (LLM Agents) is not the capability of the models but rather the "Agentic ROI" which has not reached a practical threshold for widespread application [1][3][4]. Group 1: Agentic ROI Concept - Agentic ROI (Agentic Return on Investment) is a key metric that measures the ratio of "information yield" to "usage cost" for LLM Agents in real-world scenarios [4]. - Usability is achieved only when the quality of information exceeds a certain threshold and the ratio of time and cost saved by the agent is sufficiently high [4][5]. Group 2: Current Application Landscape - Most LLM Agents are currently applied in high human task time cost scenarios, such as research and programming, where human labor is intensive, thus allowing for significant efficiency improvements [7]. - In everyday applications with high user demand, such as e-commerce and personal assistants, the tasks are simpler, leading to lower marginal value from LLM Agents, which may introduce additional interaction costs and delays, resulting in low Agentic ROI [7]. Group 3: Development Trajectory - The development path of LLM Agents is characterized by a "zigzag" model of first scaling up to enhance information quality, followed by scaling down to reduce time and cost while maintaining quality [9]. - The evolution of foundational models, such as the OpenAI series, illustrates this zigzag trend, with significant performance improvements in larger models and the introduction of smaller models that maintain performance while reducing inference costs and delays [9]. Group 4: Scaling Up Information Quality - Pre-training scaling involves expanding model size, data volume, and computational resources to enhance foundational capabilities in language understanding and reasoning [11]. 
- Post-training scaling, including supervised fine-tuning and reinforcement learning, aligns the agent's performance with human needs and values, relying on extensive interaction data for continuous learning [12]. - Test-time scaling focuses on building a world model that supports multimodal interactions and can handle complex tasks while reflecting real-world uncertainties [13]. Group 5: Ensuring Robustness and Security - Ensuring the robustness and security of LLM Agents is crucial for enhancing information quality, preventing exploitation of reward mechanisms, and safeguarding against data contamination and feedback manipulation [16]. Group 6: Scaling Down to Reduce Time and Cost - Introducing memory mechanisms allows agents to skip redundant calculations, leveraging past knowledge to enhance processing speed [18]. - Model compression techniques can significantly reduce computational resources and inference delays without compromising performance [18]. - Optimizing reasoning strategies and infrastructure can further enhance the efficiency and responsiveness of LLM Agents [18]. Group 7: Cost Management - Reducing interaction time by enabling agents to proactively understand user intent can lower cognitive burdens and improve user experience [19]. - Managing operational costs effectively is essential, especially in large-scale deployments, by optimizing context management and controlling inference complexity [19]. - Agentic ROI serves as a framework for evaluating the real usability of LLM Agents, shifting focus from mere model performance to practical benefits and comprehensive efficiency [19].
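The Agentic ROI idea summarized above (information yield relative to usage cost, usable only once quality clears a threshold) can be made concrete with a small sketch. The functional form, field names, and example numbers below are assumptions chosen for illustration; the summary only describes the metric as a ratio, so this should not be read as the paper's exact definition.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of one task handled by an agent."""
    info_quality: float       # quality of the agent's output on a 0..1 scale
    quality_threshold: float  # minimum quality for the output to be usable
    human_time: float         # minutes a person needs to do the task unaided
    agent_time: float         # minutes the agent takes to deliver a result
    interaction_time: float   # minutes the user spends prompting and checking
    expense: float            # monetary cost of the agent run


def agentic_roi(task: TaskProfile) -> float:
    """Illustrative Agentic ROI: quality above threshold times time saved,
    divided by usage cost (interaction time times expense)."""
    if task.info_quality < task.quality_threshold:
        return 0.0  # below the usability threshold nothing is gained
    usage_cost = task.interaction_time * task.expense
    if usage_cost <= 0:
        raise ValueError("interaction_time and expense must be positive")
    time_saved = task.human_time - task.agent_time
    return (task.info_quality - task.quality_threshold) * time_saved / usage_cost


if __name__ == "__main__":
    # Made-up numbers: a long research task versus a quick shopping query.
    research = TaskProfile(0.9, 0.6, 240.0, 20.0, 10.0, 2.0)
    shopping = TaskProfile(0.9, 0.6, 5.0, 2.0, 3.0, 0.5)
    print(f"research ROI: {agentic_roi(research):.2f}")
    print(f"shopping ROI: {agentic_roi(shopping):.2f}")
```

With these made-up inputs the research task scores well above the shopping query, which mirrors the point in Group 2: agents pay off first where human time cost is high, and simple everyday tasks leave little margin once interaction cost is counted.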