Marble
未来智造局 | When AI Enters the Physical World: What a Skills Competition Shows About What Embodied Intelligence "Can" and "Cannot" Do
Xin Hua Cai Jing· 2025-12-17 16:53
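Xinhua Finance, Shanghai, December 17 (reporters Du Kang and Gong Wen): At the recently held 2025 Global Developer Pioneer Conference, robots showed what they could do in real-world scenarios such as flower arranging, carrying, and disaster relief. Cold technical specifications turned into a lively contest of skills. The competition also exposed the "clumsy" side of embodied intelligence: behind fine manipulation tasks such as folding clothes and tightening screws, many robots were still tethered to "teleoperation" controllers.

It is precisely in this gap between "can" and "cannot" that the public gets a glimpse of the technical boundaries and future direction of this red-hot field.

Technical progress seen in what robots "can" do

Looking back over the past year, China's embodied intelligence field has "moved at a brisk pace": the Zhiyuan Yuanzheng A2 (智元远征A2) humanoid robot completed an uninterrupted cross-provincial walk of 100 kilometers, demonstrating that robots can "walk steadily"; large commercial orders appeared frequently, with robots actually entering factories to handle sorting and loading and unloading; and the evolution of VLA (vision-language-action) models made the robot "brain" smarter and able to understand what people ask for.

At the 2025 Global Developer Pioneer Conference, the audience once again saw firsthand what robots "can" do.

Even thornier is environmental interference. "Changes in lighting, the placement of objects around the table, and reflections of nearby objects on the table under strong light can all make a robot's 'IQ go offline' and its operation inaccurate. This difficulty in separating the target from 'background noise' reflects a current shortcoming of embodied intelligence in physical scene understanding: insufficient generalization," a contestant told reporters.

Fine work such as tightening screws ...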
An In-Depth Look at World Models: The Contest of Routes in a New Paradigm, Real-Time Interaction and Physical Simulation
海外独角兽· 2025-12-17 07:53
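We believe 2026 will be a big year for multimodal technology: video generation will advance quickly and bring applications to market at scale, while world models will see scientific breakthroughs in research and may even begin moving from research to production.

For a long time, the concept of the World Model remained fairly murky. Only in the past half year, as technical paths gradually converged and, in particular, as the first deployments appeared in embodied intelligence and real interaction scenarios, has the outline of the world model started to become clear.

Authors: Cage, Haozhen

Compared with language models: a language model handles compression and reasoning at the semantic level and predicts the next token; a world model tackles a more fundamental next-step question, namely whether an AI agent can truly understand time and space and predict the next frame and the next action. Compared with video generation models: a world model needs to go further on four points, which are interactivity, real-time performance, long-term memory, and physical plausibility.

Players in the industry have therefore started placing their own bets on these directions, and the World Model field has gradually split into two routes: one centered on real-time video generation, serving "for human" consumer scenarios such as entertainment and gaming; the other centered on explicit 3D structure, serving "for AI" fields such as robotics and autonomous driving.

This article follows that split and breaks down the technical trends and real-world adoption of the two routes ...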
The World Is Too Small for All the World Models
3 6 Ke· 2025-12-04 09:29
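World models have already become as chaotic as the world itself.

OpenAI points to the videos generated by Sora and says this is a "world simulator"; Yann LeCun points at Sora and says it is pixel hallucination, and that a real world model should be an "abstract brain that predicts the future"; Google DeepMind says Genie 3 is an "interactive, general-purpose world model"; and Fei-Fei Li says "spatial intelligence" is the right answer.

The real world is unique and objective, yet in AI circles everyone seems to be building a "world model" of their own.

Although the definitions are poles apart, these loudly disagreeing heavyweights share one basic judgment: large language models will hit their ceiling sooner or later, and world models are the necessary path to AGI.

Large language models went through parameter inflation after GPT-3.5, whereas world models went through conceptual inflation before their technical routes had even converged.

"World model" is a basket that anything gets thrown into

The confusion around "world model" stems from the fact that the term names a goal, giving AI the ability to understand the regularities of the external world and predict how it changes, rather than a specific technical path.

The first thing to get muddled was the concept itself.

The idea behind world models can be traced back to 1943, when cognitive scientist Kenneth Craik proposed the "Mental Model": the brain makes predictions by building a small-scale model of the external world. In other words, we carry a mental model in our heads that can not only process what we currently see ...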
From LLM to World Model: Why Do We Need Spatial Intelligence That Can Understand and Operate on the World?
海外独角兽· 2025-12-03 12:05
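Compiled by: Haozhen, Gemini

The language understanding and generation abilities of today's LLMs have proven remarkably broad in their applicability, but as LLMs develop, one fact stands out more and more: language alone is still not enough to support true intelligence.

At a more fundamental level, the way humans deal with the world has never depended on words alone. Vision, spatial perception, physical intuition, and the capacity for action together form a complete cognitive system. Language is only a "lossy compression" of the three-dimensional world: it records conclusions but leaves out the process; it expresses structure but hides dynamics. True intelligence comes from the ability to keep interacting with the world and to keep reasoning and acting in space.

For this reason, building Spatial Intelligence and World Models that can "understand and operate on the world" has become the key direction after LLMs.

In 2024, Fei-Fei Li, Justin Johnson, and other scholars founded World Labs, and in November this year they launched Marble, a 3D world generation model. The team is trying to break through the limitation that models "only understand text" and give models the ability to localize, reason, simulate, generate, and even carry out tasks in three-dimensional environments. This implies not only a new technical route but also a new yardstick for the value of AI: from language to the world, from description to interaction, from static cognition to dynamic intelligence.

This article compiles Fei-Fei Li and Justin Joh ...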
潮声 | AI Is Sometimes "Dumber" Than People: What Is the Missing Piece of the AI Puzzle?
Sou Hu Cai Jing· 2025-12-03 00:35
Core Insights
- The current era of artificial intelligence, dominated by large language models and image classifiers, has reached its limits, and AI with spatial intelligence is seen as the next frontier to break through this bottleneck [2][11][24]

Group 1: AI Limitations
- AI is categorized into two types: speaking intelligence and doing intelligence, with the former being strong in text output but often failing in practical tasks [6][11]
- Examples of AI failures include generating unrealistic images and videos, highlighting the lack of common sense and physical understanding in current models [7][10]

Group 2: Spatial Intelligence
- Spatial intelligence, a concept originating from educational psychology, involves the perception, understanding, and manipulation of spatial information, which is crucial for human development and creativity [12][15]
- Current AI systems lack deep, common-sense understanding of the physical world, which directly affects the quality of their outputs [11][17]

Group 3: World Models
- The concept of world models, inspired by human cognitive abilities, is emerging as a key area of focus for AI development, aiming to enable machines to understand and interact with the physical world [19][23]
- Recent advancements in world models include new products and technologies from companies like NVIDIA and Google DeepMind, indicating a growing interest and investment in this area [22][23]

Group 4: Future Challenges
- Building AI that can operate like humans presents significant challenges, including the complexity and uncertainty of the real world, limitations in existing data, and the inherent constraints of physical laws [23][24]
2026 Internet and Media Investment Strategy: Domestic AI Develops in Depth, "Self-Pleasing" Consumption Goes Global
Group 1
- The core opportunity in the internet and media sectors for 2025 is centered around AI revaluation, particularly in cloud computing, and the globalization and youth-oriented trends in self-consumption, such as trendy toys, music, and concerts [3][4]
- AI cloud capital expenditure (capex) is expected to expand in its second year, with a focus on return on investment (ROI) from AI investments, making capex/operating cash flow a key metric for investors [3][4]
- Major companies to watch in the AI cloud space include Alibaba, Baidu, and Kingsoft Cloud, which are focusing on domestic production and infrastructure [3][12]

Group 2
- The AI application landscape is shifting from conceptual discussions to a focus on commercial viability, with significant developments in AI advertising and video monetization expected in 2026 [3][4]
- Tencent, Bilibili, Meitu, Kuaishou, and Focus Technology are highlighted as key players in the AI application ecosystem, with a particular emphasis on the monetization of chatbot applications and the evolution of AI video tools into community platforms [3][4]
- The gaming sector is seeing structural opportunities driven by Generation Z and international expansion, with a focus on companies like Giant Network, Century Huatong, and Xindong Company [3][4]

Group 3
- The self-consumption trend is expected to continue, with gaming, music, and trendy toys being key areas of growth, particularly as the market adjusts post-2025 [3][4]
- The video sector is anticipated to reach a turning point, with policy stabilization and diverse monetization strategies being crucial for growth [3][4]
- Companies such as Mango Super Media, Shanghai Film, and Reading Group are positioned to benefit from these trends [3][4]

Group 4
- The report indicates a recovery in companies like Focus Media, Vision Source, and educational publishing firms, suggesting a positive outlook for these sectors [3][4]
- The report emphasizes the importance of continuous performance and valuation adjustments in the context of evolving market conditions [3][4]

Group 5
- The domestic cloud computing market is witnessing increased capital expenditure from major internet companies, with Alibaba and Tencent leading the charge [18][19]
- The report highlights the importance of measuring the health of cloud investments through the capex/operating cash flow ratio, with Tencent's ratio being notably lower than its peers [19][29]
- AI-driven cloud services are expected to maintain higher profit margins compared to traditional cloud offerings, with a focus on internal workload efficiencies [29][30]

Group 6
- The report outlines the competitive landscape of AI applications, noting that Chinese companies are making significant strides in the global market, particularly in productivity tools and content generation [34][35]
- The emergence of ChatGPT as a multi-functional platform is reshaping the AI application ecosystem, with significant implications for user engagement and commercial applications [35][39]
- Advertising remains a critical area for AI commercialization, with companies like Meta, Tencent, and Bilibili leveraging AI to enhance ad performance and efficiency [43][49]
Turing Award Winner Yann LeCun: Large Models Are a "Dead End", So Which Path Is the Next Bet?
3 6 Ke· 2025-11-28 01:43
Core Insights
- Yann LeCun, a Turing Award winner, announced his departure from Meta to establish a new company focused on Advanced Machine Intelligence (AMI), marking a significant shift in his career and the AI landscape [1][2]
- LeCun criticizes large language models (LLMs), labeling them as a "dead end" for achieving human-like intelligence, emphasizing their lack of real-world understanding and limitations in reasoning and action [3][4]

Group 1: Critique of Large Language Models
- LeCun argues that while LLMs perform well in language tasks, they do not possess true understanding of the world, lacking common sense and causal reasoning [5][6]
- He highlights that the performance of LLMs is reaching a saturation point, where increasing model size does not equate to enhanced intelligence [6][7]
- The training data and computational costs are approaching their limits, leading to diminishing returns in understanding [7][8]
- LLMs are described as being unable to plan or take action effectively, with LeCun providing examples of how human-like intelligence involves more than just language skills [12][13]

Group 2: The Concept of World Models
- LeCun proposes that the next generation of AI should focus on building "world models" that allow AI to understand and interact with the physical world [14][15]
- He introduces the Joint Embedding Predictive Architecture (JEPA) as a new learning paradigm that contrasts with LLMs by enabling AI to learn from multi-modal inputs and develop an internal representation of the world [16][17]
- JEPA emphasizes the importance of action and planning, moving beyond mere language processing to a more holistic understanding of the environment [18][19]

Group 3: Diverging Paths in AI Development
- Both LeCun and former OpenAI chief scientist Ilya Sutskever are questioning the current trajectory of AI, but they propose different solutions: LeCun focuses on world models, while Sutskever emphasizes safety and control in AI systems [25][26]
- The industry is witnessing a shift towards new architectures and approaches, as evidenced by significant investments and developments in embodied intelligence and robotics [34][35]
- The future of AI is seen as a marathon rather than a sprint, with both LeCun and Sutskever acknowledging that their proposed directions will take years to mature [38][40]

Group 4: Implications for Entrepreneurs and Developers
- LeCun's transition signals that larger models do not necessarily equate to better intelligence, highlighting the need for architectural innovation [41]
- There are opportunities in vertical applications, particularly in fields requiring physical interaction, such as robotics and autonomous driving [42]
- The importance of open-source development is emphasized, as LeCun's new company will continue to support this approach, allowing smaller teams to contribute to new paradigms [43]
Fei-Fei Li: Don't Let AI Make You Stupid; Humans Must Keep the Leading Role
虎嗅APP· 2025-11-25 10:19
Core Viewpoint
- AI is a civilization-level technology that has a profound impact on human life and society, requiring careful management to ensure it serves humanity rather than dominating it [4][6].

Group 1: Nature of AI and Human Role
- AI is a double-edged sword with both potential and risks, necessitating human guidance and control [5][7].
- The development of AI should be inclusive and open, allowing everyone to participate and shape its future, breaking the monopoly of a few tech giants [5][8].
- The current AI landscape is dominated by a few companies, primarily in the U.S., and there is a need for responsible use of technology [8].

Group 2: Future of AI
- "Spatial intelligence" is identified as the next key phase in AI evolution, enabling machines to understand and interact with three-dimensional spaces [5][22].
- The societal impact of AI on education, employment, and social structures requires collective responsibility from individuals, businesses, and public sectors [5][25].
- Effective governance of superintelligence is crucial, focusing on human decision-making rather than the technology itself [27][28].

Group 3: Education and Human Development
- In the AI era, education should focus on nurturing curiosity, critical thinking, and responsibility in children, preparing them to be active participants rather than passive users of technology [5][31].
- The importance of teachers in society is emphasized, as they play a critical role in guiding students in the responsible use of AI tools [34][35].

Group 4: Industry Trends and Challenges
- The influx of capital into the AI sector raises concerns about potential market bubbles, but the demand for AI applications in various fields remains strong [32][33].
- The environmental impact of AI's energy consumption is a pressing issue, highlighting the need for renewable energy innovations [33].

Group 5: Personal Insights and Experiences
- The journey from a challenging immigrant experience to becoming a leader in AI reflects resilience and the importance of curiosity in scientific exploration [15][17][20].
- The influence of mentors and the importance of interdisciplinary approaches in AI research are acknowledged [19][11].
Meta Follows Up with WorldGen: "Building" a 50 × 50 Meter City from a Single Sentence
具身智能之心· 2025-11-25 00:03
Core Insights
- Meta has introduced a groundbreaking research project called WorldGen, which allows users to generate fully navigable and interactive 3D worlds from simple text prompts [12][22][30]
- The technology leverages advanced procedural reasoning, diffusion models, and object-oriented scene decomposition to create coherent and visually rich 3D environments [13][19][29]

Group 1: Technology Overview
- WorldGen enables the creation of 3D worlds by inputting a simple prompt, such as "a medieval village in cartoon style," resulting in a consistent and themed environment [5][12]
- The generated 3D worlds are not just static images but are interactive and allow for free movement within the space, maintaining structural integrity and connectivity between different areas [9][12]
- Unlike existing methods that often degrade in quality when viewed from different angles, WorldGen maintains high-quality textures and geometry across a 50 x 50 meter area [19][29]

Group 2: Development and Future Plans
- Currently, WorldGen is in the research phase and is not yet available to developers, but it is compatible with major game engines like Unity and Unreal without additional conversion processes [22][31]
- Future iterations of WorldGen are expected to support larger-scale world generation and reduce latency in the generation process [20][22]
- The introduction of WorldGen signifies a shift in 3D content creation, making it more accessible to non-experts and potentially revolutionizing workflows in various industries [22][30]
Fei-Fei Li's Latest Long Essay: AI Is Hot, but It May Be Heading in the Wrong Direction
创业邦· 2025-11-23 11:15
Core Viewpoint
- The article discusses the limitations of current AI language models, emphasizing that while they are advanced in processing language, they lack true understanding of the physical world, which is essential for achieving genuine intelligence [5][6][7].

Group 1: Limitations of Current AI Models
- Current AI language models, like ChatGPT and Google's Gemini, excel at predicting the next word based on statistical patterns but fail to understand basic physical concepts [6][7].
- The analogy of a scholar in a dark room illustrates that while these models can generate coherent text, they lack real-world experience and understanding [7][13].
- AI's reliance on language statistics rather than physical interactions leads to nonsensical outputs, highlighting the need for a deeper understanding of the world [8][13].

Group 2: The Concept of Spatial Intelligence
- To advance AI, it is crucial to develop "spatial intelligence," which involves understanding and interacting with the physical world without relying solely on language [8][14].
- The article posits that true intelligence requires the ability to predict physical interactions and outcomes, akin to how humans learn through experience [14][15].
- Examples from child development and scientific discovery illustrate how spatial interactions lead to a deeper understanding of cause and effect [9][11].

Group 3: Future Directions for AI
- The future of AI may shift from predicting the next word to predicting the next frame of the world, integrating physical laws and spatial reasoning [14][17].
- Developing a "world model" that incorporates spatial data and physical interactions could revolutionize AI capabilities, allowing for more accurate simulations and predictions [15][17].
- The article mentions ongoing efforts to extract spatial information from 2D videos to train AI models, indicating a significant area of research [17][18].

Group 4: Practical Applications and Opportunities
- The emergence of AI with spatial intelligence could lead to practical applications in robotics, enhancing their ability to navigate and interact with real-world environments [20][21].
- Potential use cases include virtual scene generation for design, therapy, and educational purposes, showcasing the versatility of AI in various fields [21][22].
- The ability to convert imagination into tangible reality presents significant opportunities for innovation and entrepreneurship [22][23].