World Models
Fei-Fei Li Publishes Essay: Spatial Intelligence Will Be the Next Peak for AI to Climb
Ke Ji Ri Bao· 2025-11-18 05:17
Core Insights
- The development of artificial intelligence (AI) is entering a new phase, transitioning from "understanding language" to "understanding the world" [1]
- "Spatial intelligence" is identified as the next frontier for AI, one that will enable machines to perceive, reason, and act in the real world as humans do [4][9]

Current Limitations of AI
- Current AI systems, primarily large language models, excel at text and image generation but lack fundamental capabilities for representing and interacting with the physical world [4][6]
- These models struggle with basic tasks such as estimating distance, direction, and size, and often fail to maintain coherence in generated videos [4][6]

Importance of Spatial Intelligence
- Spatial intelligence is central to how humans construct cognition, driving imagination, creativity, and reasoning, and is essential for integrating perception and action [4][8]
- This capability underlies everyday tasks such as estimating parking distances and navigating through crowds, representing a leap from mere knowledge to true understanding [4][8]

Path to Achieving Spatial Intelligence
- Realizing true spatial intelligence requires a shift from existing large language models to a more fundamental "world model" [6]
- Such a model should understand semantic relationships and consistently "imagine" and "reconstruct" the world in terms of geometry, physics, and dynamic rules (see the interface sketch after this summary) [6]

Applications and Implications
- The development of world models can redefine AI's functionality, enabling proactive planning and adaptation in fields including robotics and the creative industries [8][9]
- In creative fields, spatial intelligence will allow creators to construct virtual worlds and visualize structures instantaneously, enhancing the creative process [8][9]

Future Prospects
- AI with spatial intelligence will not replace humans but will enhance professional judgment, creativity, and empathy, serving humanity more deeply [9]
- The transition from language to spatial understanding signifies a new era for AI, one capable of genuinely comprehending reality [9]
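As one way to picture what such a world model would have to expose, here is a minimal, hypothetical interface sketch: reconstruct a scene state from observations, then roll it forward under geometric and physical rules. The class and method names are illustrative assumptions, not anything specified in the essay.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SceneState:
    objects: dict    # object id -> pose, shape, velocity, etc.
    time: float

class WorldModel(Protocol):
    def reconstruct(self, observations: list) -> SceneState:
        """Build a consistent 3D scene state from images or other sensor input."""
        ...

    def imagine(self, state: SceneState, action, dt: float) -> SceneState:
        """Predict the next state under geometric, physical, and dynamic rules."""
        ...
```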
Liaowang | When Will Robots Break Free of the Remote Control?
Xin Hua She· 2025-11-18 03:06
Core Insights
- The development of embodied intelligence in China is advancing rapidly, showcasing impressive capabilities across many tasks, but there is a need to look beyond surface-level achievements to understand the actual limitations of current technology [1][5]
- Achieving full autonomy in robots requires significant advances in their cognitive abilities, particularly in understanding and interacting with the physical world [3][5]

Group 1: Technological Challenges
- The key to moving beyond remote control lies in developing a powerful cognitive framework that allows robots to perceive, decide, execute, and provide feedback autonomously (a schematic sketch of this loop follows this summary) [3][5]
- Current advancements in embodied intelligence include the VLA large model, which integrates visual, language, and action modalities to enable robots to understand their environment and execute tasks without human intervention [3][4]
- The development of world models, which simulate environmental dynamics, is crucial for enhancing robots' predictive capabilities and decision-making processes [4][5]

Group 2: Limitations in General Intelligence
- Despite breakthroughs in embodied intelligence, a significant gap remains in achieving general intelligence, as robots perform well in specific scenarios but struggle in diverse environments [5][6]
- Integrating tactile feedback into robots is a complex challenge, as it requires multi-dimensional perception capabilities that go beyond visual data [5][6]
- Current algorithms still lack the generalization ability needed for robots to perform effectively across varied tasks and environments [6]

Group 3: Standardization and Application
- To accelerate the realization of general intelligence, standardized frameworks are needed to facilitate technology alignment and product deployment in real-world scenarios [7][8]
- Industry organizations are developing classification frameworks for embodied intelligence, similar to those in autonomous driving, to promote technological advancement and application across various fields [7][8]
- The establishment of a four-dimensional, five-level evaluation system for humanoid robots will help define capability requirements and applicable scenarios, supporting deployment in sectors such as logistics, education, and healthcare [8]
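To ground the perceive-decide-execute-feedback loop described above, here is a schematic controller sketch. The Observation fields and the model/robot interfaces (capture, predict, execute) are hypothetical placeholders, not any specific VLA system's API.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image: bytes        # latest camera frame
    instruction: str    # natural-language task, e.g. "put the cup in the sink"

class VLAController:
    """Runs the autonomous perceive-decide-execute-feedback loop."""
    def __init__(self, model, robot):
        self.model = model    # maps an Observation to a low-level action
        self.robot = robot    # exposes capture() and execute()

    def run(self, instruction: str, max_steps: int = 100) -> bool:
        for _ in range(max_steps):
            obs = Observation(self.robot.capture(), instruction)  # perceive
            action = self.model.predict(obs)                      # decide
            feedback = self.robot.execute(action)                 # execute
            if feedback.get("done"):                              # feedback
                return True
        return False
```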
Fei-Fei Li Pours Cold Water on AGI
36Ke· 2025-11-18 00:17
Core Viewpoint
- The development of AI requires fundamental technological innovation beyond just scaling laws, and the concept of Artificial General Intelligence (AGI) is seen more as a marketing term than a scientific one [1][7][9]

Group 1: AI Development Insights
- The combination of neural networks, big data, and GPUs is identified as the "golden formula" for modern AI, one that remains relevant today with the success of ChatGPT [4][5]
- Current AI systems struggle with tasks that are easy for humans, indicating a significant gap in achieving true creativity, abstract thinking, and emotional intelligence [8][9]
- The concept of "world models" is proposed as a key direction for future AI development, enabling better understanding of and interaction with three-dimensional environments [10][17]

Group 2: Challenges in Robotics
- The challenges in robotics are highlighted, particularly the difficulty of data acquisition and the complexity of operating in three-dimensional space, which is more challenging than autonomous driving [15][16]
- The "bitter lesson" of using simple models with vast data does not apply straightforwardly to robotics because of the unique nature of the action data required for training [15][16]

Group 3: AI's Role in Society
- The potential of AI to enhance human capabilities rather than replace them is emphasized, with a focus on ensuring that technology development respects human dignity and agency [18][19]
- The belief is expressed that in the AI era everyone will have a place, highlighting the importance of inclusivity in the technological landscape [19]
We Put Together an End-to-End Advancement Roadmap, Geared Toward Deployment and Job Hunting...
自动驾驶之心· 2025-11-18 00:05
Core Insights
- There is significant demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry, with expert salaries reaching up to $70,000 per month for positions requiring 3-5 years of experience [1]
- The technology stack for end-to-end and VLA is complex, involving advanced algorithms such as BEV perception, Vision-Language Models (VLM), diffusion models, reinforcement learning, and world models [1]
- The company is offering specialized courses to help individuals learn end-to-end and VLA technologies quickly and efficiently, in collaboration with experts from academia and industry [1]

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" focuses on the macro aspects of end-to-end autonomous driving, covering key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [10]
- The "Autonomous Driving VLA and Large Model Practical Course" is led by academic experts and covers VLA from the perspective of VLM as an autonomous-driving interpreter, modular VLA, and current mainstream inference-enhanced VLA [1][10]
- Both courses include practical components, such as building a VLA model and dataset from scratch and implementing algorithms like the Diffusion Planner and the ORION algorithm [10][12]

Instructor Profiles
- The instructors include experienced practitioners and researchers from top institutions, such as Tsinghua University and QS-top-30 universities, with backgrounds in multimodal perception, autonomous driving VLA, and large-model frameworks [6][9][12]
- Instructors have published numerous papers at prestigious conferences and have hands-on experience developing and deploying advanced algorithms in autonomous driving [6][9][12]

Target Audience
- The courses are designed for individuals with foundational knowledge of autonomous driving who are familiar with its basic modules and with concepts related to transformer-based large models, reinforcement learning, and BEV perception [14]
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [14]
Tencent Research Institute AI Express 20251118
腾讯研究院· 2025-11-17 16:18
Group 1: Meta's AI Integration
- Meta will officially incorporate "AI-driven impact" into employee performance metrics starting in 2026, assessing how employees use AI to enhance work outcomes and team productivity [1]
- The company has launched the "Level Up" game project and AI performance-assistant tools this year to encourage employees to use the internal AI chatbot Metamate as much as possible [1]
- Meta has begun allowing some job candidates to use AI assistants during coding interviews, believing this better reflects a real development environment [1]

Group 2: Google NotebookLM Features
- Google NotebookLM introduced image data sources on November 15, with automatic OCR and semantic parsing that let users retrieve content from images using natural language [2]
- The underlying multimodal model can distinguish handwritten from printed areas, extract table structures, and automatically link images with existing text, audio, and video notes [2]
- Within 48 hours of the feature launch, educational accounts uploaded over 500,000 pages of images, a 340% increase, with plans to integrate AR glasses for real-time "see and ask" capabilities next year [2]

Group 3: Alibaba's Qianwen App Launch
- The public beta of Alibaba's Qianwen app has launched, built on the Qwen3 model and providing an all-in-one entry point for users to experience a full suite of AI capabilities for free [3]
- The application will gradually cover everyday scenarios including office work, maps, health, and shopping, aiming to make AI a daily companion [3]
- Qianwen will continue to evolve and integrate the latest Qwen models, and is currently available for search and download in major app stores in China [3]

Group 4: Zhipu GLM Coding Plan
- Zhipu has launched the "GLM Coding Plan · Special Edition" subscription package, offering a 50% discount for first-time buyers, with a minimum monthly cost of only 16 yuan [4]
- Powered by the flagship GLM-4.6 model, it ranked first globally in the LMArena evaluation alongside Claude Sonnet 4.5 and GPT-5, and supports 200K long context [4]
- The model is officially compatible with over 10 mainstream AI programming tools, and several US tech companies, including Cerebras and Vercel, have adopted GLM-4.6 [4]

Group 5: Xiaomi's Miloco Solution
- Xiaomi has launched its first "large model + smart home" solution, Miloco, which uses the Mijia camera as its visual information source and the self-developed large language model MiMo-VL-Miloco-7B as its core; the framework is open-sourced [5]
- Users can communicate with the smart home system in natural language, letting it automatically fulfill various smart needs and rules while protecting privacy in how visual data is understood [5]
- Xiaomi's AIoT platform has connected nearly 1 billion IoT devices, and Miloco achieves interoperability between the Mijia ecosystem and the Home Assistant ecosystem through the standardized MCP protocol, supporting third-party IoT platform integration [5]

Group 6: MiroMind's MiroThinker v1.0
- MiroMind has officially released the open-source agent base model MiroThinker v1.0, introducing "deep interaction scaling" as a new scaling dimension, supporting 256K context and 600 tool calls [6]
- In the BrowseComp test it achieved 47.1% accuracy, nearing OpenAI DeepResearch's 51.5%, while surpassing DeepSeek-v3.2 by 7.7 percentage points on Chinese tasks [6]
- The model is fully open source, providing all model weights, toolchains, and interaction frameworks, with the 72B version approaching or even surpassing OpenAI DeepResearch, pushing agents from passive execution toward active learning and evolution [6]

Group 7: MedGPT's Clinical Success
- MedGPT, the core model of Future Doctor AI Studio, outperformed GPT-5 and other leading international models in a multi-model practical evaluation conducted by 32 top domestic clinical experts, ranking first globally in clinical safety and effectiveness assessment [7]
- Two products have launched: a clinical decision AI assistant and a patient follow-up AI assistant, providing safe and effective decision support during diagnosis and supporting patient follow-up for chronic disease management [7]
- MedGPT has been adopted for daily use by dozens of national discipline leaders and is regarded by experts as a "best practice" for AI empowering grassroots healthcare, aligning with the National Health Commission's guidelines for promoting and regulating AI in healthcare [7]

Group 8: Fei-Fei Li on AGI
- Fei-Fei Li stated in an interview that AGI is "more of a marketing term than a scientific term," emphasizing that current AI's biggest shortcoming is the lack of spatial intelligence, the capability that lets humans navigate and manipulate in a three-dimensional world [8]
- She outlined three core capabilities of world models (generative, multimodal, and interactive), arguing that data and computing power alone will not bring robots to maturity, since robots are physical systems that need bodies and application scenarios [8]
- Marble, the first large-scale world model product released by World Labs, has been widely applied in film production, game development, scientific research, and robot training, cutting creation time by a factor of 40 [8]
Why Doesn't AI Understand the Physical World? Fei-Fei Li and Yann LeCun: It Lacks a "World Model" and Needs to Learn How the Brain's Neocortex Works
量子位· 2025-11-17 13:23
Core Insights
- The future of AI may be linked to understanding the evolutionary secrets of the human brain, as highlighted by recent developments in the AI field, including Yann LeCun's plan to found a new AI company focused on "world models" [1]
- Fei-Fei Li emphasizes the limitations of current large language models (LLMs) and advocates for developing "spatial intelligence" as a crucial step toward Artificial General Intelligence (AGI) [3][4]

Summary by Sections

World Models
- "World models" are essential for AI to understand and predict real-world scenarios, something current AI systems struggle with, such as generating realistic videos or performing household tasks [5][6]
- The concept of "world models" arises from reflections on the limitations of LLMs and from the study of animal intelligence, suggesting that the ability to learn such models is what current AI lacks [8]

Human Perception and Intelligence
- Max Bennett's research identifies three key attributes of human perception that are crucial for understanding intelligence: filling-in, sequentiality, and irrepressibility [11]
- The brain's ability to fill in gaps in perception and to settle on one interpretation at a time is fundamental to how humans process information [12][20][23]

Generative Models
- The "Helmholtz machine" illustrates how a generative model can learn to recognize and generate data without being told the correct answers, mirroring the brain's inferential processes (a minimal training sketch follows this summary) [27]
- Modern generative models, including deepfakes and AI-generated art, validate Helmholtz's theories and suggest that the brain's neocortex operates in a similar way [28]

Advanced Cognitive Abilities
- The neocortex not only supports imagination and prediction but also enables complex behaviors such as planning, episodic memory, and causal reasoning, all traits desired in future AI systems [33]
- Bennett's book, "A Brief History of Intelligence," connects neuroscience with AI, outlining the evolutionary milestones of the brain and their implications for AI development [35][37]
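As a concrete illustration of the idea, here is a minimal wake-sleep training sketch in the spirit of the Helmholtz machine: a recognition network infers latent causes from data, a generative network reconstructs data from those causes, and each is trained on the other's output. The layer sizes, the Bernoulli latent code, and the loss choices are illustrative assumptions, not details from the article.

```python
import torch
import torch.nn as nn

class HelmholtzMachine(nn.Module):
    """Minimal recognition/generation pair with binary latent units."""
    def __init__(self, data_dim=784, latent_dim=64):
        super().__init__()
        self.latent_dim = latent_dim
        # Recognition model: infer latent causes from observed data.
        self.recognize = nn.Sequential(nn.Linear(data_dim, latent_dim), nn.Sigmoid())
        # Generative model: reconstruct data from latent causes.
        self.generate = nn.Sequential(nn.Linear(latent_dim, data_dim), nn.Sigmoid())

def wake_sleep_step(model, x, opt_gen, opt_rec):
    bce = nn.BCELoss()
    # Wake phase: recognize real data, then train the generator to reproduce it.
    with torch.no_grad():
        z = torch.bernoulli(model.recognize(x))
    opt_gen.zero_grad()
    gen_loss = bce(model.generate(z), x)
    gen_loss.backward()
    opt_gen.step()
    # Sleep phase: "dream" data from random latents, then train the recognizer
    # to recover the latents that produced the dream.
    with torch.no_grad():
        z_dream = torch.bernoulli(torch.full((x.size(0), model.latent_dim), 0.5))
        x_dream = torch.bernoulli(model.generate(z_dream))
    opt_rec.zero_grad()
    rec_loss = bce(model.recognize(x_dream), z_dream)
    rec_loss.backward()
    opt_rec.step()
    return gen_loss.item(), rec_loss.item()

# Usage sketch: binarized 28x28 images as data, separate optimizers per network.
model = HelmholtzMachine()
opt_gen = torch.optim.Adam(model.generate.parameters(), lr=1e-3)
opt_rec = torch.optim.Adam(model.recognize.parameters(), lr=1e-3)
batch = torch.bernoulli(torch.rand(32, 784))
print(wake_sleep_step(model, batch, opt_gen, opt_rec))
```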
Why Is It So Hard to Hire the "Right People" Overseas?
Founder Park· 2025-11-17 10:08
Group 1
- The core challenge for companies expanding overseas is the difficulty of recruiting suitable talent through traditional channels [4]
- Many AI product teams are structured with development teams based in China and growth teams primarily located overseas [3]
- The workshop aims to address the challenges of identifying, recruiting, and managing global teams, featuring insights from Deel and Vorka.AI [4][7]

Group 2
- Key discussion topics include how to accurately identify candidates who align with team culture and core competencies in unfamiliar overseas markets [7]
- The need to adjust traditional recruitment funnels and evaluation systems is highlighted [7]
- Strategies for leveraging social media platforms such as Xiaohongshu and X to build employer branding on a limited budget are discussed [7][8]

Group 3
- The workshop will also cover compliance with cross-border payroll and hiring policies, as well as remote team collaboration challenges [7][8]
- The event targets founders and business leaders of tech companies with overseas operations or plans to build global teams [8]
Fei-Fei Li Sides with LeCun: AGI Is All Hype, Revelations from an 80-Minute Blockbuster Interview
36Ke· 2025-11-17 09:52
Core Insights
- The interview with Fei-Fei Li highlights the emergence of "world models" as the next frontier in AI over the coming decade, emphasizing the importance of spatial intelligence in AI development [1][28]

Group 1: Historical Context of AI
- Two decades ago, AI was in a "winter" phase with limited public interest and funding, and was often referred to simply as "machine learning" [10][14]
- Fei-Fei Li entered the AI field during this period, focusing on visual intelligence and the need for large datasets to train models effectively [11][20]
- The creation of ImageNet, which involved collecting 15 million images across 22,000 categories, marked a pivotal moment in AI and led to the rise of deep learning [23][24]

Group 2: The Concept of World Models
- "World models" are defined as systems that can generate an infinite 3D world based on input, allowing for reasoning and interaction [37]
- The Marble platform exemplifies this concept, significantly reducing production time in industries such as film and gaming by letting creators generate navigable worlds from simple descriptions [40][43]
- Integrating spatial intelligence into AI is seen as crucial for enhancing both robotic capabilities and human understanding [39][32]

Group 3: Challenges in Robotics
- The primary challenge in robotics is data acquisition, as robots require extensive real-world interaction data that is difficult to obtain [44][45]
- Unlike language models that operate on text, robots must navigate and interact within a 3D environment, which complicates their training [45]
- The history of autonomous vehicles illustrates the complexity involved in building effective robotic systems [46]

Group 4: Fei-Fei Li's Career and Vision
- Fei-Fei Li's career trajectory reflects a commitment to tackling significant problems in AI, moving from academia to industry and now to entrepreneurship with World Labs [47]
- Her focus on collaboration and team dynamics underscores the importance of human roles in the evolving AI landscape [47]
- Li emphasizes that every individual has a vital role in the future of AI, regardless of profession [47]
First Commercial World Model Marble Released: Spatial Intelligence Takes Another Step Forward
Guotou Securities· 2025-11-17 07:53
Investment Rating
- The report maintains an "Outperform the Market" rating for the computer industry, indicating an expected return exceeding the CSI 300 Index by 10% or more over the next six months [8]

Core Insights
- The launch of Marble, the first commercial world model product from World Labs, lets users create editable and downloadable 3D virtual scenes from various inputs, significantly reducing scene distortion and inconsistency [1][12]
- The "world model" is introduced as a new kind of AI system that enables machines to understand spatial relationships and interactions, moving beyond purely linguistic description [2][13]
- Global tech giants have achieved major breakthroughs in world model technology, including Tencent's Hunyuan 3D world model and Google DeepMind's Genie 3, which enhances the generation of interactive virtual environments [3][14]
- Spatial intelligence is expected to empower creative tools in the short term and to serve, in the medium term, as a foundational capability for machines to understand and interact with the three-dimensional world [4][15]

Summary by Sections

Investment Recommendations
- A domestic world model and physical AI industry chain is taking shape, with advances such as the ReKep system developed by Fei-Fei Li's team, which uses RGB-D cameras for 3D visual data support (a back-projection sketch follows this summary) [5][16]
- Recommended stocks include: Oboe Technology (leader in 3D visual perception), Zhiwei Intelligent (robotic brain controllers), Suochen Technology (physical AI product developer), and Alter (investing in the robotics sector) [5][16]

Market Performance Review
- The computer sector underperformed the CSI 300 Index, declining 3.72% this week, while the broader market indices showed mixed results [17][18]
- The computer industry index ranked 28th among 30 industry indices, indicating weaker performance than other sectors [20]

Industry News
- The report highlights developments in quantum applications in Anhui province, which aims for 1,000 application scenarios by 2027, and the departure of Meta's chief AI scientist, who plans to establish a world model company [24][25]
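To make the RGB-D point concrete, here is a minimal back-projection sketch showing how a depth map plus pinhole intrinsics yields the 3D points that such systems consume. The intrinsic values and image size are illustrative assumptions, not ReKep's actual configuration.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) metric depth map into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage with a flat synthetic depth map and rough VGA-scale intrinsics.
depth = np.full((480, 640), 1.5)   # every pixel 1.5 m away
points = depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```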
Solving Tesla's "Sparse Supervision" Problem: DriveVLA-W0 Uses World Models to Amplify the Data Scaling Law in Autonomous Driving
机器之心· 2025-11-17 04:23
Core Insights
- The article discusses the transition of VLA models in autonomous driving from academic research to practical application, highlighting the challenge of a "supervision deficit" [2][5][8]
- A new research paper, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving," proposes addressing this challenge by introducing world models as a source of dense self-supervised signals [6][10][12]

Group 1: Supervision Deficit
- VLA models face a "supervision deficit": high-dimensional visual input is paired with low-dimensional, sparse supervisory signals, leading to wasted representational capacity [8][9]
- The research team found that under sparse action supervision, VLA performance saturates quickly as data grows, diminishing the effect of the Data Scaling Law [9][22]

Group 2: World Models as a Solution
- Introducing a world model lets the model predict future images, providing a richer and denser learning signal than relying solely on sparse actions (a loss-construction sketch follows this summary) [11][15][16]
- This approach fundamentally alleviates the supervision deficit, enabling better learning of the complex dynamics of driving environments [16][18]

Group 3: Amplifying the Data Scaling Law
- The core contribution is the finding that world models significantly amplify the Data Scaling Law, with performance improving more steeply as data increases than for baseline models [18][21]
- In experiments with up to 70 million frames, the world-model variant reduced collision rates by 20.4%, a qualitative leap that surpasses what stacking action data alone achieves [24]

Group 4: Efficiency and Real-World Application
- The research also addresses the high inference latency of VLA models by proposing a lightweight MoE "action expert" architecture, which cuts inference latency to 63.1% of the baseline VLA without sacrificing performance [26][27]
- This design makes real-time deployment of VLA models in autonomous driving more feasible [27][29]
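As a rough illustration of this training signal, the sketch below combines a sparse action loss with a dense future-frame prediction loss from a world-model head. The toy modules, interfaces (encode, predict_actions, world_head), and the 0.5 weighting are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPolicy(nn.Module):
    """Stand-in for a VLA policy: a shared encoder plus an action head."""
    def __init__(self, feat_dim=128, action_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.action_head = nn.Linear(feat_dim, action_dim)

    def encode(self, images):
        return self.encoder(images)

    def predict_actions(self, feats):
        return self.action_head(feats)

def combined_supervision_loss(policy, world_head, images, gt_actions,
                              future_images, w_world=0.5):
    feats = policy.encode(images)                                         # shared visual features
    action_loss = F.l1_loss(policy.predict_actions(feats), gt_actions)    # sparse action supervision
    future_pred = world_head(feats)                                       # predicted future frames
    world_loss = F.mse_loss(future_pred, future_images.flatten(1))        # dense self-supervision
    return action_loss + w_world * world_loss

# Toy usage: 8 frames of 3x32x32 input, 6-dim actions, flattened future frames.
policy = ToyPolicy()
world_head = nn.Linear(128, 3 * 32 * 32)
images = torch.randn(8, 3, 32, 32)
loss = combined_supervision_loss(policy, world_head, images,
                                 gt_actions=torch.randn(8, 6),
                                 future_images=torch.randn(8, 3, 32, 32))
loss.backward()
```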