World Models
End-to-End and VLA Jobs: The Salaries Are Outrageously High...
自动驾驶之心· 2025-11-19 00:03
Core Insights
- Demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry is strong, with salaries for experts reaching up to $70,000 per month for positions requiring 3-5 years of experience [1]
- The end-to-end and VLA technology stack is complex, covering advanced algorithms and models such as BEV perception, VLM (Vision-Language Model), diffusion models, reinforcement learning, and world models [2]

Course Offerings
- The company is launching two specialized courses, "End-to-End and VLA Autonomous Driving Class" and "Practical Course on VLA and Large Models," to help individuals enter the field of end-to-end and VLA technologies quickly and efficiently [2]
- The "Practical Course on VLA and Large Models" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA, including mainstream reasoning-enhanced VLA [2]
- The course combines a detailed theoretical foundation with practical assignments, teaching participants how to build their own VLA models and datasets from scratch [2]

Instructor Team
- The instructor team consists of experts from both academia and industry, with extensive research and practical experience in multi-modal perception, autonomous driving VLA, and large model frameworks [7][10][13]
- Notable instructors include a Tsinghua University master's graduate with multiple publications at top conferences and a current algorithm expert at a leading domestic OEM [7][13]

Target Audience
- The courses are designed for individuals with a foundational knowledge of autonomous driving who are familiar with its basic modules and have a grasp of concepts related to transformer large models, reinforcement learning, and BEV perception [15]
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [15]
Stirring Things Up! AI Prodigies Gather at Huxiu's F&M Night
虎嗅APP· 2025-11-18 06:17
Core Insights
- The article covers an event organized by Huxiu at which young AI entrepreneurs presented ideas centered on personalized AI companions and emotional connection [2][4][8][10][14][17].

Group 1: Event Overview
- The event, referred to as "F&M Night," showcased the creativity of 95 post-90s AI talents, focusing on the theme of creating AI pets that cater to individual emotional needs [2][3].
- The gathering brought together 150 participants from various fields, including AI entrepreneurs, scientists, and investors, fostering direct connections and collaborations [24].

Group 2: Key Presentations
- Zhang Yuno, founder of Skyris, proposed an AI pet that understands and embraces users' unique preferences and emotions, creating a personal emotional space [4].
- Sun Donglai, founder of Dreamoo, explored using AI to capture and recreate individual life experiences and emotional memories, providing a tangible medium for remembrance [8].
- Yin Yujie, founder of Qiyin Technology, aims to push the boundaries of music by training algorithms to create melodies that exceed human vocal limits, inspired by the evolution of sound [10].
- Huang Li'ang, co-founder of Gongji Technology, delved into the philosophical aspects of AGI and free will, questioning the fundamental logic shared between human brains and artificial intelligence [14].
- Zhuang Ziyang, co-founder of Shengjing Technology, suggested that the underlying logic of the world operates much like a recommendation system, emphasizing the connection between demand and resources [17][18].

Group 3: Discussion and Engagement
- Following the presentations, notable figures led an in-depth dialogue on whether AI is reshaping worldviews, blending historical, commercial, and technological perspectives [21].
- The event gave attendees exclusive networking opportunities to engage with AI innovators and explore potential collaborations [24].

Group 4: Participation and Accessibility
- The event was invitation-only, with limited spots reserved for people working in the industry, underscoring its exclusive and targeted nature [26].
- For those unable to attend in person, a live stream was made available, allowing broader access to the discussions and insights shared during the event [27].
Fei-Fei Li Writes: Spatial Intelligence Will Be the Next Peak for AI to Climb
Science and Technology Daily· 2025-11-18 05:17
Core Insights
- The development of artificial intelligence (AI) is entering a new phase, transitioning from "understanding language" to "understanding the world" [1]
- "Spatial intelligence" is identified as the next frontier for AI, which will enable machines to perceive, reason, and act in the real world like humans [4][9]

Current Limitations of AI
- Current AI systems, primarily large language models, excel at text and image generation but lack fundamental capabilities for representing and interacting with the physical world [4][6]
- These models struggle with basic tasks such as estimating distance, direction, and size, and often fail to maintain coherence in generated videos [4][6]

Importance of Spatial Intelligence
- Spatial intelligence is crucial to human cognition: it drives imagination, creativity, and reasoning, and is essential for integrating perception and action [4][8]
- This capability underlies everyday tasks such as estimating parking distances and navigating through crowds, and represents a leap from mere knowledge to true understanding [4][8]

Path to Achieving Spatial Intelligence
- Realizing true spatial intelligence requires a shift from existing large language models to a more fundamental "world model" [6]
- This new model should understand semantic relationships and consistently "imagine" and "reconstruct" the world in terms of geometry, physics, and dynamic rules [6]

Applications and Implications
- World models can redefine AI's functionality, enabling proactive planning and adaptation in fields including robotics and the creative industries [8][9]
- In creative fields, spatial intelligence will allow creators to construct virtual worlds and visualize structures instantaneously, enhancing the creative process [8][9]

Future Prospects
- AI with spatial intelligence will not replace humans but will enhance professional judgment, creativity, and empathy, serving humanity more deeply [9]
- The transition from language to spatial understanding signifies a new era for AI, one capable of genuinely comprehending reality [9]
Liaowang | When Will Robots Shed the Remote Control?
Xinhua News Agency· 2025-11-18 03:06
Core Insights
- Embodied intelligence in China is advancing rapidly and showing impressive capabilities across various tasks, but it is necessary to look beyond surface-level achievements to understand the actual limitations of current technology [1][5]
- Achieving full autonomy in robots requires significant advances in their cognitive abilities, particularly in understanding and interacting with the physical world [3][5]

Group 1: Technological Challenges
- The key to moving beyond remote control lies in developing a powerful cognitive framework that allows robots to perceive, decide, execute, and provide feedback autonomously [3][5]
- Current advances in embodied intelligence include the VLA large model, which integrates visual, language, and action modalities so that robots can understand their environment and execute tasks without human intervention [3][4]
- World models, which simulate environmental dynamics, are crucial for enhancing robots' predictive capabilities and decision-making [4][5]

Group 2: Limitations in General Intelligence
- Despite breakthroughs in embodied intelligence, a significant gap to general intelligence remains: robots can perform well in specific scenarios but struggle in diverse environments [5][6]
- Integrating tactile feedback into robots is a complex challenge, as it requires multi-dimensional perception capabilities that go beyond visual data [5][6]
- Current algorithms still lack the generalization ability robots need to perform effectively across varied tasks and environments [6]

Group 3: Standardization and Application
- Accelerating the realization of general intelligence calls for standardized frameworks that facilitate technology alignment and product deployment in real-world scenarios [7][8]
- Industry organizations are developing classification frameworks for embodied intelligence, similar to those in autonomous driving, to promote technological advancement and application in various fields [7][8]
- A four-dimensional, five-level evaluation system for humanoid robots will help define capability requirements and applicable scenarios, supporting deployment in sectors such as logistics, education, and healthcare [8]
Fei-Fei Li Pours Cold Water on AGI
36Kr· 2025-11-18 00:17
Core Viewpoint
- The development of AI requires fundamental technological innovation beyond scaling laws alone, and the concept of Artificial General Intelligence (AGI) is seen more as a marketing term than a scientific one [1][7][9].

Group 1: AI Development Insights
- The combination of neural networks, big data, and GPUs is identified as the "golden formula" for modern AI, and it remains relevant today, as the success of ChatGPT shows [4][5].
- Current AI systems struggle with tasks that are easy for humans, indicating a significant gap in achieving true creativity, abstract thinking, and emotional intelligence [8][9].
- "World models" are proposed as a key direction for future AI development, enabling better understanding of and interaction with three-dimensional environments [10][17].

Group 2: Challenges in Robotics
- The main challenges in robotics are the difficulty of data acquisition and the complexity of operating in three-dimensional space, which is harder than autonomous driving [15][16].
- The "bitter lesson" of using simple models with vast data does not apply straightforwardly to robotics, because training requires action data of a unique nature [15][16].

Group 3: AI's Role in Society
- AI's potential lies in enhancing human capabilities rather than replacing them, with a focus on ensuring that technology development respects human dignity and agency [18][19].
- The interview expresses the belief that everyone will have a place in the AI era, highlighting the importance of inclusivity in the technological landscape [19].
We Made an Advanced End-to-End Roadmap, Geared Toward Deployment and Job Hunting...
自动驾驶之心· 2025-11-18 00:05
Core Insights
- Demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry is strong, with salaries for experts reaching up to $70,000 per month for positions requiring 3-5 years of experience [1]
- The end-to-end and VLA technology stack is complex, involving advanced algorithms such as BEV perception, Vision-Language Models (VLM), diffusion models, reinforcement learning, and world models [1]
- The company is offering specialized courses, developed with experts from both academia and industry, to help individuals learn end-to-end and VLA technologies quickly and efficiently [1]

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" focuses on the macro aspects of end-to-end autonomous driving, covering key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [10]
- The "Autonomous Driving VLA and Large Model Practical Course" is led by academic experts and covers VLA from the perspective of VLM as an autonomous driving interpreter, modular VLA, and current mainstream reasoning-enhanced VLA [1][10]
- Both courses include practical components, such as building a VLA model and dataset from scratch and implementing algorithms such as the Diffusion Planner and ORION [10][12]

Instructor Profiles
- The instructors include experienced professionals and researchers from top institutions, such as Tsinghua University and QS30 universities, with backgrounds in multimodal perception, autonomous driving VLA, and large model frameworks [6][9][12]
- Instructors have published numerous papers at prestigious conferences and have hands-on experience developing and deploying advanced algorithms in autonomous driving [6][9][12]

Target Audience
- The courses are designed for individuals with a foundational knowledge of autonomous driving who are familiar with its basic modules and with concepts related to transformer large models, reinforcement learning, and BEV perception [14]
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [14]
Tencent Research Institute AI Express 20251118
腾讯研究院· 2025-11-17 16:18
Group 1: Meta's AI Integration
- Meta will officially incorporate "AI-driven impact" into employee performance metrics starting in 2026, assessing how employees use AI to improve work outcomes and team productivity [1]
- This year the company launched the "Level Up" game project and AI performance assistant tools to encourage employees to use the internal AI chatbot Metamate as much as possible [1]
- Meta has begun allowing some job candidates to use AI assistants during coding interviews, believing this better represents a real development environment [1]

Group 2: Google NotebookLM Features
- Google NotebookLM introduced image sources on November 15, with automatic OCR and semantic parsing that let users retrieve content from images using natural language [2]
- The underlying multimodal model can distinguish handwritten from printed areas, extract table structures, and automatically link them with existing text, audio, and video notes [2]
- Within 48 hours of launch, educational accounts uploaded over 500,000 pages of images, a 340% increase, and there are plans to integrate AR glasses for real-time "see and ask" capabilities next year [2]

Group 3: Alibaba's Qianwen App Launch
- Alibaba has launched the public beta of its Qianwen app, built on the Qwen3 model, as an all-in-one entry point where users can try a full suite of AI capabilities for free [3]
- The application will gradually cover everyday scenarios including office work, maps, health, and shopping, aiming to make AI a daily companion [3]
- Qianwen will continue to evolve and integrate the latest Qwen models, and is currently available for search and download in major app stores in China [3]

Group 4: Zhipu GLM Coding Plan
- Zhipu has launched the "GLM Coding Plan·Special Edition" subscription package, offering a 50% discount for first-time buyers, with a minimum monthly cost of only 16 yuan [4]
- The plan is powered by the flagship model GLM-4.6, which ranked first globally in the LMArena evaluation alongside Claude Sonnet 4.5 and GPT-5 and supports a 200K context window [4]
- The model is officially compatible with over 10 mainstream AI programming tools, and several US tech companies such as Cerebras and Vercel have adopted GLM-4.6 [4]

Group 5: Xiaomi's Miloco Solution
- Xiaomi has launched Miloco, its first "large model + smart home" solution, which uses the Mijia camera as a visual information source and the self-developed large language model MiMo-VL-Miloco-7B at its core; the framework is open-sourced [5]
- Users can communicate with the smart home system in natural language, and the system automatically fulfills various smart-home needs and rules while protecting privacy through visual data understanding [5]
- Xiaomi's AIoT platform has connected nearly 1 billion IoT devices, and Miloco achieves interoperability between the Mijia and Home Assistant ecosystems through the standardized MCP protocol, supporting third-party IoT platform integration [5]

Group 6: MiroMind's MiroThinker v1.0
- MiroMind has officially launched MiroThinker v1.0, an open-source base model for intelligent agents that introduces "deep interaction scaling" as a new dimension and supports a 256K context and 600 tool calls [6]
- In the BrowseComp test it achieved 47.1% accuracy, close to OpenAI DeepResearch's 51.5%, while surpassing DeepSeek-v3.2 by 7.7 percentage points on Chinese tasks [6]
- The model is fully open source, with all model weights, toolchains, and interaction frameworks provided; the 72B version approaches or even surpasses OpenAI DeepResearch, moving intelligent agents from passive execution toward active learning and evolution [6]

Group 7: MedGPT's Clinical Success
- MedGPT, the core model of Future Doctor AI Studio, outperformed GPT-5 and other leading international models in a multi-model practical evaluation conducted by 32 top domestic clinical experts, ranking first globally in the assessment of clinical safety and effectiveness [7]
- It has launched two products: a clinical decision AI assistant, which provides safe and effective decision support during diagnosis, and a patient follow-up AI assistant, which supports follow-up for chronic disease management [7]
- MedGPT has been adopted by dozens of national discipline leaders for daily use and is recognized by experts as the "best practice" for AI empowering grassroots healthcare, in line with the National Health Commission's guidelines for promoting and regulating AI in healthcare [7]

Group 8: Fei-Fei Li on AGI
- Fei-Fei Li stated in an interview that AGI is "more of a marketing term than a scientific term," emphasizing that the biggest shortcoming of current AI is its lack of spatial intelligence, the ability that lets humans navigate and manipulate objects in a three-dimensional world [8]
- She outlined three core capabilities of world models (generative, multimodal, and interactive) and argued that data and computing power alone will not bring robots to maturity, since robots are physical systems that need bodies and application scenarios [8]
- Marble, the first large-scale world model product, released by World Labs, has been widely applied in film production, game development, scientific research, and robot training, cutting creation time by a factor of 40 [8]
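Why Doesn't AI Understand the Physical World? Fei-Fei Li and Yann LeCun: It Lacks a "World Model" and Must Learn How the Brain's Neocortex Works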
量子位· 2025-11-17 13:23
Core Insights
- The future of AI may be linked to understanding the evolutionary secrets of the human brain, as highlighted by recent developments in the AI field, including Yann LeCun's plans to establish a new AI company focused on "World Models" [1]
- Fei-Fei Li emphasizes the limitations of current large language models (LLMs) and advocates for the development of "Spatial Intelligence" as a crucial step towards achieving Artificial General Intelligence (AGI) [3][4]

Summary by Sections

World Models
- "World Models" are essential for AI to understand and predict real-world scenarios, something current AI systems struggle with in tasks such as generating realistic videos or performing household chores [5][6]
- The concept arises from reflections on the limitations of LLMs and from the study of animal intelligence, suggesting that the ability to learn such models is what current AI lacks [8]

Human Perception and Intelligence
- Max Bennett's research identifies three key attributes of human perception that are crucial for understanding intelligence: filling-in, sequentiality, and irrepressibility [11]
- The brain's ability to fill in gaps in perception and to focus on one interpretation at a time is fundamental to how humans process information [12][20][23]

Generative Models
- The "Helmholtz Machine" concept illustrates how generative models can learn to recognize and generate data without being explicitly told the correct answers, demonstrating the brain's inferential processes [27]
- Modern generative models, including deepfakes and AI-generated art, validate Helmholtz's theories and show that the brain's neocortex operates similarly [28]

Advanced Cognitive Abilities
- The neocortex not only supports imagination and prediction but also enables complex behaviors such as planning, episodic memory, and causal reasoning, all of which are desired traits for future AI systems [33]
- Bennett's book, "A Brief History of Intelligence," connects neuroscience with AI, outlining the evolutionary milestones of the brain and their implications for AI development [35][37]
Why Is It So Hard to Hire the "Right People" Overseas?
Founder Park· 2025-11-17 10:08
Group 1
- The core challenge for companies expanding overseas is the difficulty of recruiting suitable talent through traditional channels [4]
- Many AI product teams are structured with development teams based in China and growth teams located primarily overseas [3]
- The workshop aims to address the challenges of identifying, recruiting, and managing global teams, featuring insights from Deel and Vorka.AI [4][7]

Group 2
- Key discussion topics include how to accurately identify candidates who fit the team's culture and core competencies in unfamiliar overseas markets [7]
- The need to adjust traditional recruitment funnels and evaluation systems is highlighted [7]
- Strategies for using social media platforms such as Xiaohongshu and X to build an employer brand on a limited budget are discussed [7][8]

Group 3
- The workshop will also cover compliance in cross-border payroll, hiring policies, and the challenges of remote team collaboration [7][8]
- The event is aimed at founders and business leaders of tech companies that have overseas operations or plan to build global teams [8]
Fei-Fei Li Sides with LeCun: AGI Is All Hype; 80-Minute Bombshell Interview Released
36Kr· 2025-11-17 09:52
Core Insights
- The interview with Fei-Fei Li highlights the emergence of "world models" as AI's next frontier over the coming decade, emphasizing the importance of spatial intelligence in AI development [1][28].

Group 1: Historical Context of AI
- Two decades ago, AI was in a "winter" phase, with limited public interest and funding, and was often referred to as "machine learning" [10][14].
- Fei-Fei Li entered the AI field during this period, focusing on visual intelligence and the need for large datasets to train models effectively [11][20].
- The creation of ImageNet, which involved collecting 15 million images across 22,000 categories, marked a pivotal moment in AI, leading to the rise of deep learning [23][24].

Group 2: The Concept of World Models
- "World models" are defined as systems that can generate an infinite 3D world based on input, allowing for reasoning and interaction [37].
- The Marble platform exemplifies this concept, significantly reducing production time in industries including film and gaming by letting creators generate navigable worlds from simple descriptions [40][43].
- Integrating spatial intelligence into AI is seen as crucial for enhancing both robotic capabilities and human understanding [39][32].

Group 3: Challenges in Robotics
- The primary challenge in robotics lies in data acquisition, as robots require extensive real-world interaction data, which is difficult to obtain [44][45].
- Unlike language models, which operate on text, robots must navigate and interact within a 3D environment, complicating their training [45].
- The history of autonomous vehicles illustrates the complexities involved in developing effective robotic systems [46].

Group 4: Fei-Fei Li's Career and Vision
- Fei-Fei Li's career trajectory reflects a commitment to addressing significant problems in AI, moving from academia to industry and now to entrepreneurship with World Labs [47].
- Her focus on collaboration and team dynamics underscores the importance of human roles in the evolving landscape of AI [47].
- Li emphasizes that every individual has a vital role in the future of AI, regardless of their profession [47].