World Models
We Put Together an Advanced End-to-End Roadmap for Job Seekers Targeting Industry Deployment...
Autonomous Driving Heart (自动驾驶之心) · 2025-11-18 00:05
Core Insights
- Demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry is strong, with salaries for expert positions requiring 3-5 years of experience reaching up to 70,000 yuan per month [1]
- The end-to-end and VLA technology stack is complex, spanning advanced techniques such as BEV perception, vision-language models (VLMs), diffusion models, reinforcement learning, and world models [1]
- The company is offering specialized courses, developed with experts from both academia and industry, to help learners get up to speed on end-to-end and VLA technologies quickly and efficiently [1]
Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" gives a big-picture view of end-to-end autonomous driving, covering the key algorithms and their theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [10]
- The "Autonomous Driving VLA and Large Model Practical Course", led by academic experts, covers VLA from three angles: the VLM as an interpreter for autonomous driving, modular VLA, and the currently mainstream reasoning-enhanced VLA [1][10]
- Both courses include hands-on components, such as building a VLA model and dataset from scratch and implementing algorithms such as Diffusion Planner and ORION [10][12]
Instructor Profiles
- The instructors include experienced practitioners and researchers from top institutions such as Tsinghua University and universities ranked in the QS top 30, with backgrounds in multimodal perception, autonomous driving VLA, and large model frameworks [6][9][12]
- The instructors have published numerous papers at prestigious conferences and have hands-on experience developing and deploying advanced autonomous driving algorithms [6][9][12]
Target Audience
- The courses are designed for learners with foundational knowledge of autonomous driving who are familiar with its basic modules and with concepts related to Transformer-based large models, reinforcement learning, and BEV perception [14]
- Participants are expected to have a background in probability theory and linear algebra and to be proficient in Python and PyTorch [14]
Tencent Research Institute AI Express 20251118
Tencent Research Institute · 2025-11-17 16:18
Group 1: Meta's AI Integration
- Starting in 2026, Meta will formally include "AI-driven impact" in employee performance reviews, assessing how employees use AI to improve work outcomes and team productivity [1]
- This year the company launched the "Level Up" game project and AI performance-assistant tools to encourage employees to use its internal AI chatbot, Metamate, as much as possible [1]
- Meta has begun allowing some job candidates to use AI assistants during coding interviews, believing this better reflects a real development environment [1]
Group 2: Google NotebookLM Features
- On November 15, Google NotebookLM added images as a data source, with automatic OCR and semantic parsing that let users retrieve content from images using natural language [2]
- The underlying multimodal model can distinguish handwritten from printed regions, extract table structures, and automatically link images with existing text, audio, and video notes [2]
- Within 48 hours of launch, education accounts uploaded over 500,000 pages of images, a 340% increase, with plans to integrate AR glasses next year for real-time "see and ask" capabilities [2]
Group 3: Alibaba's Qianwen App Launch
- Alibaba has launched the public beta of its Qianwen app, built on the Qwen3 model, as a single free entry point to its full suite of AI capabilities [3]
- The app will gradually expand to everyday scenarios including office work, maps, health, and shopping, aiming to make AI a daily companion [3]
- Qianwen will keep evolving and integrating the latest Qwen models; it is currently available for search and download in major app stores in China [3]
Group 4: Zhipu GLM Coding Plan
- Zhipu has launched the "GLM Coding Plan·Special Edition" subscription package, offering first-time buyers a 50% discount, with prices starting at only 16 yuan per month [4]
- The plan is powered by the flagship model GLM-4.6, which supports a 200K-token context and ranked first globally on LMArena, tied with Claude Sonnet 4.5 and GPT-5 [4]
- The model is officially compatible with over 10 mainstream AI coding tools, and several US tech companies, including Cerebras and Vercel, have adopted GLM-4.6 [4]
Group 5: Xiaomi's Miloco Solution
- Xiaomi has launched Miloco, its first "large model + smart home" solution, which uses Mijia cameras as its source of visual information and is built around its in-house large model MiMo-VL-Miloco-7B; the framework is open source [5]
- Users can talk to the smart home system in natural language, and the system automatically fulfills a range of smart-home needs and rules by understanding visual data while protecting privacy [5]
- Xiaomi's AIoT platform connects nearly 1 billion IoT devices, and Miloco makes the Mijia and Home Assistant ecosystems interoperable through the standardized MCP protocol, supporting integration with third-party IoT platforms [5]
Group 6: MiroMind's MiroThinker v1.0
- MiroMind has officially released MiroThinker v1.0, an open-source foundation model for agents that introduces "deep interaction scaling" as a new scaling dimension, supporting a 256K context and up to 600 tool calls [6]
- On the BrowseComp benchmark it reached 47.1% accuracy, close to OpenAI DeepResearch's 51.5%, and it outperformed DeepSeek-v3.2 by 7.7 percentage points on Chinese tasks [6]
- The model is fully open source, including all model weights, toolchains, and interaction frameworks; its 72B version approaches or even surpasses OpenAI DeepResearch, pushing agents from passive execution toward active learning and evolution [6]
Group 7: MedGPT's Clinical Success
- MedGPT, the core model of Future Doctor AI Studio, outperformed GPT-5 and other leading international models in a multi-model, real-world evaluation conducted by 32 top Chinese clinical experts, ranking first globally in the assessment of clinical safety and effectiveness [7]
- It has launched two products, a clinical decision AI assistant and a patient follow-up AI assistant, which provide safe and effective decision support during diagnosis and support follow-up for chronic disease management [7]
- MedGPT is used daily by dozens of national discipline leaders, is recognized by experts as a "best practice" for AI-empowered primary healthcare, and aligns with the National Health Commission's guidelines for promoting and regulating AI in healthcare [7]
Group 8: Fei-Fei Li on AGI
- In an interview, Fei-Fei Li said that AGI is "more of a marketing term than a scientific term," arguing that the biggest shortcoming of current AI is its lack of spatial intelligence, the ability that lets humans navigate and manipulate objects in a three-dimensional world [8]
- She outlined three core capabilities of world models (generative, multimodal, and interactive) and argued that data and computing power alone will not make robots mature, because robots are physical systems that need bodies and application scenarios [8]
- Marble, the first large-scale world model product released by World Labs, is already widely used in film production, game development, scientific research, and robot training, cutting creation time by a factor of 40 [8]
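Why Doesn't AI Understand the Physical World? Fei-Fei Li and Yann LeCun: It Lacks a "World Model" and Must Learn from How the Brain's Neocortex Works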
QbitAI (量子位) · 2025-11-17 13:23
Core Insights
- Recent developments in the AI field, including Yann LeCun's plans to found a new AI company focused on "world models," suggest that the future of AI may depend on understanding the evolutionary secrets of the human brain [1]
- Fei-Fei Li emphasizes the limitations of current large language models (LLMs) and advocates developing "spatial intelligence" as a crucial step toward artificial general intelligence (AGI) [3][4]
Summary by Sections
World Models
- AI needs "world models" to understand and predict real-world scenarios, something current systems struggle with, for example when generating realistic videos or performing household tasks [5][6]
- The concept of world models grew out of reflection on the limitations of LLMs and the study of animal intelligence, which suggests that the ability to learn such models is what current AI lacks [8]
Human Perception and Intelligence
- Max Bennett's research identifies three key attributes of human perception that are crucial for understanding intelligence: filling-in, sequentiality, and irrepressibility [11]
- The brain's ability to fill in gaps in perception and to settle on one interpretation at a time is fundamental to how humans process information [12][20][23]
Generative Models
- The "Helmholtz machine" illustrates how a generative model can learn both to recognize and to generate data without being told the correct answers, mirroring the brain's inferential processes [27]
- Modern generative models, including deepfakes and AI-generated art, are cited as validating Helmholtz's theories and as evidence that the neocortex works in a similar way [28]
Advanced Cognitive Abilities
- The neocortex supports not only imagination and prediction but also complex behaviors such as planning, episodic memory, and causal reasoning, all of which are desired traits for future AI systems [33]
- Bennett's book, "A Brief History of Intelligence," connects neuroscience with AI, outlining the evolutionary milestones of the brain and their implications for AI development [35][37]
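The Helmholtz machine's "learning without answers" rests on the wake-sleep algorithm, which can be sketched in a few dozen lines. This is a minimal illustration, not material from Bennett's book or the article: it assumes binary units, a single hidden unit, a made-up two-pattern dataset, and arbitrary learning-rate and step-count choices. A bottom-up recognition model infers hidden causes from data, and a top-down generative model produces data from causes. In the wake phase the generative weights learn from causes inferred on real data; in the sleep phase the recognition weights learn from "dreams" whose causes are known because the model generated them.

```python
import math
import random

# Toy Helmholtz machine trained with the wake-sleep algorithm (Hinton et al., 1995).
# Illustrative assumptions: binary units, one hidden layer, a two-pattern dataset,
# arbitrary learning rate and number of steps.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class HelmholtzMachine:
    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        noise = lambda: self.rng.gauss(0.0, 0.1)  # small random init breaks symmetry
        # Generative (top-down) model: prior biases b over h, weights G from h to v.
        self.b = [0.0] * n_hidden
        self.c = [0.0] * n_visible
        self.G = [[noise() for _ in range(n_hidden)] for _ in range(n_visible)]
        # Recognition (bottom-up) model: biases d and weights R from v to h.
        self.d = [0.0] * n_hidden
        self.R = [[noise() for _ in range(n_visible)] for _ in range(n_hidden)]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def recognize(self, v):
        """q(h_j = 1 | v): which hidden causes probably produced v."""
        return [sigmoid(dj + sum(w * x for w, x in zip(row, v)))
                for dj, row in zip(self.d, self.R)]

    def generate(self, h):
        """p(v_i = 1 | h): what data the causes h would produce."""
        return [sigmoid(ci + sum(w * x for w, x in zip(row, h)))
                for ci, row in zip(self.c, self.G)]

    def dream(self):
        """Sample causes from the prior, then data from the generative model."""
        h = self.sample([sigmoid(bj) for bj in self.b])
        return h, self.sample(self.generate(h))

    def wake(self, v, lr):
        # Infer causes for real data, then train the generative model
        # (delta rule) to reproduce that data from those causes.
        h = self.sample(self.recognize(v))
        for j, hj in enumerate(h):
            self.b[j] += lr * (hj - sigmoid(self.b[j]))
        for i, p in enumerate(self.generate(h)):
            for j, hj in enumerate(h):
                self.G[i][j] += lr * (v[i] - p) * hj
            self.c[i] += lr * (v[i] - p)

    def sleep(self, lr):
        # Dream data whose true causes are known, then train the recognition
        # model to recover them. No external labels are ever used.
        h, v = self.dream()
        for j, q in enumerate(self.recognize(v)):
            for i, vi in enumerate(v):
                self.R[j][i] += lr * (h[j] - q) * vi
            self.d[j] += lr * (h[j] - q)

data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
hm = HelmholtzMachine(n_visible=6, n_hidden=1, seed=0)
for step in range(20000):
    hm.wake(data[step % 2], lr=0.1)
    hm.sleep(lr=0.1)

dreams = [hm.dream()[1] for _ in range(1000)]
match_rate = sum(d in data for d in dreams) / len(dreams)
print(f"dreams matching a training pattern: {match_rate:.0%}")
```

Before training, a random 6-bit dream matches one of the two patterns only about 3% of the time (2 out of 64); a much higher rate afterwards means the generative model has learned the data's hidden structure without ever being told which pattern is which.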
Why Is It So Hard to Hire the "Right People" Overseas?
Founder Park · 2025-11-17 10:08
Group 1
- The core challenge for companies expanding overseas is that suitable talent is hard to recruit through traditional channels [4]
- Many AI product teams have their development teams based in China and their growth teams located mainly overseas [3]
- The workshop aims to address the challenges of identifying, recruiting, and managing global teams, featuring insights from Deel and Vorka.AI [4][7]
Group 2
- Key discussion topics include how to accurately identify candidates who fit the team's culture and core competencies in unfamiliar overseas markets [7]
- The workshop highlights the need to adjust traditional recruitment funnels and evaluation systems [7]
- It also discusses strategies for using social media platforms such as Xiaohongshu and X to strengthen employer branding on a limited budget [7][8]
Group 3
- The workshop will also cover compliance with cross-border payroll and hiring policies, as well as the challenges of remote team collaboration [7][8]
- The event is aimed at founders and business leaders of tech companies that already operate overseas or plan to build global teams [8]
Fei-Fei Li Sides with LeCun: "AGI Is All Hype," Revelations from an 80-Minute Interview
36Kr · 2025-11-17 09:52
Core Insights
- The interview with Fei-Fei Li presents "world models" as the next frontier of AI over the coming decade and emphasizes the importance of spatial intelligence in AI development [1][28]
Group 1: Historical Context of AI
- Two decades ago, AI was in a "winter," with little public interest or funding, and the field was often referred to simply as "machine learning" [10][14]
- Fei-Fei Li entered the field during this period, focusing on visual intelligence and on the need for large datasets to train models effectively [11][20]
- The creation of ImageNet, which involved collecting 15 million images across 22,000 categories, was a pivotal moment that led to the rise of deep learning [23][24]
Group 2: The Concept of World Models
- "World models" are defined as systems that can generate an infinite 3D world from an input, enabling reasoning and interaction within it [37]
- The Marble platform exemplifies this concept: by letting creators generate navigable worlds from simple descriptions, it significantly reduces production time in industries such as film and gaming [40][43]
- Integrating spatial intelligence into AI is seen as crucial for enhancing both robotic capabilities and human understanding [39][32]
Group 3: Challenges in Robotics
- The main challenge in robotics is data acquisition: robots need extensive real-world interaction data, which is difficult to obtain [44][45]
- Unlike language models, which operate on text, robots must navigate and interact within a 3D environment, which complicates their training [45]
- The history of autonomous vehicles illustrates how complex it is to develop effective robotic systems [46]
Group 4: Fei-Fei Li's Career and Vision
- Fei-Fei Li's career, which moved from academia to industry and now to entrepreneurship with World Labs, reflects a commitment to tackling significant problems in AI [47]
- Her focus on collaboration and team dynamics underscores the importance of human roles in the evolving AI landscape [47]
- Li emphasizes that every individual, regardless of profession, has a vital role in the future of AI [47]
First Commercial World Model, Marble, Released: Spatial Intelligence Takes Another Step Forward
Guotou Securities · 2025-11-17 07:53
Investment Rating
- The report maintains an "Outperform the Market" rating for the computer industry, meaning the industry is expected to beat the CSI 300 Index by 10% or more over the next six months [8]
Core Insights
- World Labs has launched Marble, the first commercial world model product, which lets users create editable, downloadable 3D virtual scenes from a variety of inputs while significantly reducing scene distortion and inconsistency [1][12]
- The report introduces the "world model" as a new kind of AI system that lets machines understand spatial relationships and interactions rather than relying on language descriptions alone [2][13]
- Global tech giants have achieved major breakthroughs in world model technology, including Tencent's Hunyuan 3D world model and Google DeepMind's Genie 3, which improves the generation of interactive virtual environments [3][14]
- In the short term, spatial intelligence is expected to empower creative tools; in the medium term, it should become a foundational capability for machines to understand and interact with the three-dimensional world [4][15]
Summary by Sections
Investment Recommendations
- A domestic world model and physical AI industry chain is taking shape, with notable advances such as the ReKep system developed by Fei-Fei Li's team, which relies on RGB-D cameras for 3D visual data [5][16]
- Recommended stocks include [5][16]:
  - Orbbec (a leader in 3D visual perception)
  - Zhiwei Intelligent (robot "brain" controllers)
  - Suochen Technology (a developer of physical AI products)
  - Alter (invested in the robotics sector)
Market Performance Review
- The computer sector fell 3.72% this week, underperforming the CSI 300 Index, while the major market indices were mixed [17][18]
- The computer industry index ranked 28th of 30 industry indices, trailing most other sectors [20]
Industry News
- The report highlights major developments in quantum applications in Anhui Province, which aims for 1,000 application scenarios by 2027, and the departure of Meta's chief AI scientist, who plans to found a world model company [24][25]
Solving Tesla's "Sparse Supervision" Problem: DriveVLA-W0 Uses World Models to Amplify the Data Scaling Law in Autonomous Driving
Synced (机器之心) · 2025-11-17 04:23
(Published in AIxiv, Synced's column for academic and technical content.) In autonomous driving, large VLA models are moving from the academic frontier into the "deep water" of industrial deployment. Recently, in its talk at ICCV, Tesla publicly named one of its core challenges: "sparse supervision."
Tesla ICCV slide, "1. Curse of dimensionality":
- Extremely large context length is a minimum requirement for driving
- Input context length of 2 billion tokens:
  - 7 cameras x 36 FPS x 5 Mega pixels x 30s history / (5x5 pixel patch)
  - Navigation maps and route for ...
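The camera term on Tesla's slide can be multiplied out directly. The sketch below does only that, assuming (as the slide's formula implies) that each 5x5-pixel patch of each frame becomes one token; the function name is our own. Cameras alone come to about 1.5 billion tokens, so the slide's round figure of 2 billion presumably also reflects rounding and the other inputs it lists (navigation maps, route, ...); that reading is an assumption.

```python
# Back-of-envelope token count for the camera inputs on Tesla's ICCV slide.
# Figures come from the slide; one token per 5x5-pixel patch follows the slide's
# formula. Non-camera inputs (navigation maps, route, ...) are not counted here.

def camera_context_tokens(cameras: int, fps: int, pixels_per_frame: int,
                          history_s: int, patch_side: int) -> int:
    """Number of patch tokens covering every frame in the history window."""
    total_pixels = cameras * fps * pixels_per_frame * history_s
    return total_pixels // (patch_side * patch_side)

tokens = camera_context_tokens(cameras=7, fps=36, pixels_per_frame=5_000_000,
                               history_s=30, patch_side=5)
print(f"{tokens:,} tokens")  # 1,512,000,000 tokens
```

The point of the exercise is scale: even before maps and routes, a 30-second multi-camera history is roughly a thousand times longer than the context windows of typical language models.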
Turing Award Winner LeCun's Final Warning to Meta: "I've Worked on AI for 40 Years, and Large Models Are a Dead End"
36Kr · 2025-11-17 02:06
Core Insights
- Yann LeCun, Meta's Chief AI Scientist, is expected to leave the company amid significant organizational changes in Meta's AI division [1][3][9]
- The appointment of younger leaders such as Alexandr Wang and Shengjia Zhao has shifted the balance of power within Meta's AI research teams, eroding LeCun's influence [4][12]
- LeCun has expressed skepticism about the current direction of AI research, particularly large language models (LLMs), and is reportedly exploring "world models" as a new approach to AI [18][23][24]
Group 1
- LeCun's departure is linked to internal restructuring and the rise of younger executives in Meta's AI hierarchy [4][9][12]
- Meta's AI division has gone through multiple rounds of layoffs and budget cuts, diminishing the influence of the once-prominent FAIR team led by LeCun [9][12][18]
- LeCun's criticism of LLMs and his belief in the superiority of world models highlight a fundamental disagreement with Meta's current AI strategy [18][22][24]
Group 2
- LeCun's contributions to AI span more than 40 years, including foundational work in machine learning and neural networks [13][14][20]
- He has moved from a hands-on role in AI development to a more symbolic position, focusing on his own research and public speaking [16][18][20]
- LeCun's vision of "objective-driven AI" and world models emphasizes learning through interaction with the physical world, in contrast to the data-driven approach of LLMs [24][30][41]
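The "objective-driven" idea can be illustrated with a toy planning loop: a world model predicts the consequences of candidate actions, a cost function scores the predicted states against an objective, and the agent executes the action whose imagined outcome is best. This is a minimal sketch under loud assumptions, not LeCun's architecture: the "world model" is a hand-written 1D point-mass predictor standing in for a learned one, the planner is plain random-shooting model-predictive control, and every name and number is illustrative.

```python
import random

# Toy "objective-driven" agent: world model + cost + planner.
# Assumptions: hand-written 1D dynamics stand in for a learned world model,
# and random shooting stands in for a real planning/optimization method.

def world_model(state, action):
    """Predict the next (position, velocity) after applying `action` for one step."""
    pos, vel = state
    vel = 0.9 * vel + action          # damped velocity plus control input
    return (pos + vel, vel)

def cost(state, goal):
    """Objective: squared distance between predicted position and the goal."""
    return (state[0] - goal) ** 2

def plan(state, goal, rng, horizon=4, samples=500):
    """Imagine many action sequences with the world model; keep the cheapest."""
    best_seq, best_cost = None, float("inf")
    for _ in range(samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)     # imagined step: nothing is executed yet
            total += cost(s, goal)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

def run(goal=5.0, steps=30, seed=0):
    rng = random.Random(seed)
    state = (0.0, 0.0)
    for _ in range(steps):
        action = plan(state, goal, rng)[0]  # execute only the first planned action
        state = world_model(state, action)  # toy assumption: reality equals the model
    return state

final_pos, final_vel = run()
print(f"final position: {final_pos:.2f}")
```

Re-planning at every step and executing only the first action lets the agent correct course as it goes; with a learned world model, whose predictions drift from reality, that correction is what keeps the agent pointed at its objective.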
CICC: Embodied AI Is Becoming Data-Driven, and High-Value Information Is the Core of Competition
Zhitong Finance (智通财经网) · 2025-11-17 01:37
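Hierarchical control is the foundational architectural paradigm, made workable in engineering terms through a two-level structure; the VLA paradigm (built on VLMs) strengthens generalization and interaction and is currently an active research direction. World models supply physical constraints by modeling the environment and predicting the future, and remain at a research-led stage. The firm believes that in the short term hierarchical architectures will remain mainstream because they are easier to control in engineering terms, that VLA shows potential in complex tasks and human-robot interaction, and that world models, which can transfer across devices, are the long-term direction.
Embodied AI data: high-value information content becomes the core of competition
Robot data spans multiple modalities, and the industry is looking for ways to acquire data at low cost and use it efficiently. 1) Acquisition: routes include real-robot data, video (first-person or third-person), and simulation. 2) Security: data security is a baseline that cannot be ignored, and humanoid robot makers face challenges including permission isolation, data encryption systems, and cross-border data transfer policies. 3) Application: the traditional strategy is a "homogeneous closed loop," which can only reproduce a policy on the same type of hardware, whereas heterogeneous training uses a modular Transformer architecture to share algorithm models across different robot bodies.
Analysis of hot topics in embodied AI
According to the Zhitong Finance app, CICC has published a research report reaching the conclusions above on architectures and data. The embodied AI "brain" is moving from "diverging routes" toward "convergence and deployment" ...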
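The two-level structure described above can be sketched as a slow planner feeding a fast controller. This is a toy illustration under stated assumptions, not CICC's or any robot maker's design: `high_level_planner` is a hypothetical lookup table standing in for a VLM-style "brain" that turns a task into waypoints, and `low_level_controller` is a simple proportional controller with an assumed speed limit standing in for the motion-control layer. What matters is the split: the high level is consulted rarely and reasons about the task, while the low level runs every tick and only tracks its current target.

```python
import math
from dataclasses import dataclass

@dataclass
class Robot:
    x: float = 0.0
    y: float = 0.0

def high_level_planner(task: str) -> list[tuple[float, float]]:
    """Hypothetical 'brain': maps a task to waypoints (a lookup table here)."""
    plans = {"fetch cup": [(2.0, 0.0), (2.0, 3.0), (0.0, 0.0)]}
    return plans[task]

def low_level_controller(robot: Robot, target: tuple[float, float],
                         gain: float = 0.5, max_step: float = 0.3) -> float:
    """Fast loop: one proportional step toward target, clipped to max_step.
    Returns the remaining distance to the target."""
    sx, sy = gain * (target[0] - robot.x), gain * (target[1] - robot.y)
    norm = math.hypot(sx, sy)
    if norm > max_step:  # respect an assumed actuator speed limit
        sx, sy = sx * max_step / norm, sy * max_step / norm
    robot.x += sx
    robot.y += sy
    return math.hypot(target[0] - robot.x, target[1] - robot.y)

def run(task: str, tol: float = 0.05, max_ticks: int = 500):
    robot = Robot()
    waypoints = high_level_planner(task)  # slow layer: called once per task here
    reached, ticks = [], 0
    for wp in waypoints:
        while ticks < max_ticks:
            ticks += 1
            if low_level_controller(robot, wp) < tol:  # fast layer: every tick
                reached.append(wp)
                break
    return reached, ticks, robot

reached, ticks, robot = run("fetch cup")
print(reached, ticks)
```

Because the layers only share a list of waypoints, either one can be swapped (for example, a learned policy for the proportional controller) without touching the other, which is the engineering controllability the report credits hierarchical architectures with.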
Turing Award Winner Yann LeCun Reportedly Leaving Meta to Start His Own Company
Fortune China (财富FORTUNE) · 2025-11-16 13:06
Core Insights
- Dr. Yann LeCun, a prominent figure in AI, is leaving Meta to start his own company, a significant turning point for both Meta and the AI industry [2]
- LeCun is known for his groundbreaking work on convolutional neural networks, particularly the LeNet architecture, which revolutionized computer vision [2][4]
- Meta is shifting its AI strategy amid internal disagreements and challenges in keeping pace with competitors such as OpenAI and Google [5][6]
Background of Yann LeCun
- Born on July 8, 1960, in France, LeCun developed an early interest in electronics and earned an electrical engineering diploma in 1983 [3]
- He completed his PhD in computer science in 1987, focusing on early forms of neural network training with backpropagation [3][4]
- His work at AT&T Bell Labs led to the development of convolutional neural networks, which had a major impact on image processing and recognition [4]
Meta's Strategic Changes
- Meta is restructuring its AI strategy, investing $14.3 billion in Scale AI and appointing Scale AI's CEO, Alexandr Wang, to lead a new department [5]
- The restructuring reflects deeper strategic divides within Meta: LeCun has expressed skepticism about the large language models the company is prioritizing [5][6]
- LeCun's departure highlights ongoing challenges in Meta's AI division, including the recent elimination of about 600 positions [6]
Industry Implications
- LeCun's new venture will focus on "world models," which aim to understand environments through video and spatial data rather than text alone [5]
- The AI industry is highly competitive, with differing views on the path to artificial general intelligence (AGI) [6]