World Model
Turing Award Winner Yann LeCun: The Chinese Don't Need Us; They Can Come Up with Excellent Ideas on Their Own
AI科技大本营· 2025-06-02 07:24
Core Viewpoint
- The current large language models (LLMs) are limited in their ability to generate original scientific discoveries and to truly understand the complexities of the physical world, functioning primarily as advanced pattern-matching systems rather than exhibiting genuine intelligence [1][3][4].

Group 1: Limitations of Current AI Models
- Relying solely on memorizing vast amounts of text is insufficient for fostering true intelligence, as current AI architectures struggle with abstract thinking, reasoning, and planning, which are essential for scientific discovery [3][5].
- LLMs excel at information retrieval but are not adept at solving new problems or generating innovative solutions, highlighting their inability to ask the right questions [6][19].
- The expectation that merely scaling up language models will lead to human-level AI is fundamentally flawed, with no significant advancements anticipated in the near future [19][11].

Group 2: The Need for New Paradigms
- There is a pressing need for new AI architectures that prioritize search capabilities and the ability to plan actions toward specific goals, rather than relying solely on existing data [14][29].
- The current investment landscape is heavily focused on LLMs, but the diminishing returns from these models suggest a potential misalignment with future AI advancements [18][19].
- The development of systems that can learn from natural sensors, such as video, rather than just text is crucial for achieving a deeper understanding of the physical world [29][37].

Group 3: Future Directions in AI Research
- The exploration of non-generative architectures, such as the Joint Embedding Predictive Architecture (JEPA), is seen as a promising avenue for enabling machines to abstractly represent and understand real-world phenomena [44][46].
- The ability to learn from visual and tactile experiences, akin to human learning, is essential for creating AI systems that can reason and plan effectively [37][38].
- Collaborative efforts across the global research community will be necessary to develop these advanced AI systems, as no single entity is likely to discover a "magic bullet" solution [30][39].
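The JEPA idea mentioned above, predicting in embedding space rather than reconstructing raw inputs, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the linear "encoders", the dimensions, and the weight matrices stand in for deep networks, and this is not Meta's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Toy encoder: a linear map followed by tanh, standing in for a deep network."""
    return np.tanh(x @ W)

# Hypothetical dimensions: 16-dim observations, 4-dim latent embeddings.
obs_dim, emb_dim = 16, 4
W_ctx = rng.normal(size=(obs_dim, emb_dim))   # context encoder weights
W_tgt = rng.normal(size=(obs_dim, emb_dim))   # target encoder weights
W_pred = rng.normal(size=(emb_dim, emb_dim))  # predictor weights

def jepa_loss(context, target):
    """Predict the *embedding* of the target from the embedding of the context.
    The loss lives in latent space, so the model never reconstructs raw pixels."""
    z_ctx = encoder(context, W_ctx)
    z_tgt = encoder(target, W_tgt)   # in practice often an EMA copy of the context encoder
    z_hat = z_ctx @ W_pred           # predicted target embedding
    return float(np.mean((z_hat - z_tgt) ** 2))

context = rng.normal(size=obs_dim)  # e.g. the visible part of a video frame
target = rng.normal(size=obs_dim)   # e.g. a masked or future part of the same scene
print(jepa_loss(context, target))   # scalar latent-space prediction error
```

The design point this sketch captures is that the training signal compares abstract representations, not pixels, which is what lets a JEPA-style model ignore unpredictable surface detail.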
Embodied Evolution, Boundless Future: This Forum Leads a New Wave of the Embodied Intelligence Model Revolution
机器之心· 2025-05-30 09:33
机器之心 report, 机器之心 editorial department. Amid the ongoing wave of embodied intelligence evolution, "embodied AI models + humanoid robots" offer more possibilities for AGI to enter the physical world. The rise of multimodal large models has injected strong momentum into embodied AI, and the emergence of world models has provided a new paradigm for training and testing embodied intelligence. How to make machine intelligence not only "see" the physical world but also understand, plan, and act like humans is a challenge and an opportunity facing both academia and industry. On May 29, the 2025 Zhangjiang Embodied Intelligence Developer Conference and International Humanoid Robot Skills Competition was held at the Zhangjiang Science Hall in Pudong, Shanghai. As a key module of the conference, the "Embodied, Boundless: Paradigm Innovation and Architectural Revolution in Intelligent Models" forum (hereinafter "the forum") was held under the guidance of the Shanghai Municipal Commission of Economy and Informatization and the Shanghai Pudong New Area People's Government, hosted by Shanghai Zhangjiang (Group) Co., Ltd., organized by Shanghai Zhangjiang Digital Intelligence Economy Development Co., Ltd. and 机器之心, and co-organized by the Zhangjiang Artificial Intelligence Chamber of Commerce of the Pudong New Area Federation of Industry and Commerce. The forum brought together more than ten distinguished guests, including top technical experts, scholars from well-known universities, and representatives of leading embodied intelligence companies. Industry leaders shared deep insights and debated on stage, exploring hot topics such as embodied AI and world models, hierarchical decision-making versus end-to-end approaches, and scaling laws for embodied intelligence, in five keynote speeches and a high-quality roundtable. The forum was moderated by 机器之心 deputy editor-in-chief Xie Wenfei …
How Large-Model Agents Can Break Through the Bottleneck of Scaled Application: The Key Is Agentic ROI
机器之心· 2025-05-30 04:16
Core Viewpoint
- The main barrier to the usability of large language model agents (LLM Agents) is not model capability but rather "Agentic ROI", which has not yet reached the practical threshold for widespread application [1][3][4].

Group 1: The Agentic ROI Concept
- Agentic ROI (Agentic Return on Investment) is a key metric that measures the ratio of "information yield" to "usage cost" for LLM Agents in real-world scenarios [4].
- Usability is achieved only when the quality of the delivered information exceeds a certain threshold and the time and cost saved by the agent are sufficiently large [4][5].

Group 2: Current Application Landscape
- Most LLM Agents are currently applied in scenarios where human task time is costly, such as research and programming; because human labor there is intensive, agents can deliver significant efficiency gains [7].
- In everyday, high-demand applications such as e-commerce and personal assistants, tasks are simpler, so the marginal value of LLM Agents is lower; agents may even introduce additional interaction costs and delays, resulting in low Agentic ROI [7].

Group 3: Development Trajectory
- The development path of LLM Agents follows a "zigzag" pattern: first scaling up to raise information quality, then scaling down to reduce time and cost while maintaining that quality [9].
- The evolution of foundational models, such as the OpenAI series, illustrates this zigzag trend, with significant performance gains in larger models followed by smaller models that preserve performance while cutting inference cost and latency [9].

Group 4: Scaling Up Information Quality
- Pre-training scaling expands model size, data volume, and computational resources to strengthen foundational capabilities in language understanding and reasoning [11].
- Post-training scaling, including supervised fine-tuning and reinforcement learning, aligns the agent's behavior with human needs and values, relying on extensive interaction data for continuous learning [12].
- Test-time scaling focuses on building a world model that supports multimodal interaction and can handle complex tasks while reflecting real-world uncertainty [13].

Group 5: Ensuring Robustness and Security
- Ensuring the robustness and security of LLM Agents is crucial for maintaining information quality: it prevents exploitation of reward mechanisms and guards against data contamination and feedback manipulation [16].

Group 6: Scaling Down to Reduce Time and Cost
- Introducing memory mechanisms lets agents skip redundant computation, leveraging past knowledge to speed up processing [18].
- Model compression techniques can significantly reduce computational resources and inference latency without compromising performance [18].
- Optimizing reasoning strategies and infrastructure further improves the efficiency and responsiveness of LLM Agents [18].

Group 7: Cost Management
- Reducing interaction time by enabling agents to proactively understand user intent lowers cognitive burden and improves user experience [19].
- Managing operational costs effectively is essential in large-scale deployments, for example by optimizing context management and controlling inference complexity [19].
- Agentic ROI serves as a framework for evaluating the real usability of LLM Agents, shifting the focus from raw model performance to practical benefit and end-to-end efficiency [19].
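The Agentic ROI metric described above can be sketched as a simple ratio of information yield to usage cost. The variable names, the usability threshold, and the idea of expressing monetary cost in hour-equivalents are assumptions of this sketch, not the paper's exact formulation.

```python
def agentic_roi(info_quality: float,
                human_task_time: float,
                agent_interaction_time: float,
                agent_expense: float,
                quality_threshold: float = 0.8) -> float:
    """Illustrative Agentic ROI: information yield divided by usage cost.

    info_quality: quality of the agent's output, in [0, 1]
    human_task_time: hours a human would need without the agent
    agent_interaction_time: hours the user spends prompting/supervising
    agent_expense: monetary cost of running the agent, converted to
        hour-equivalents for comparability (an assumption of this sketch)
    """
    if info_quality < quality_threshold:
        return 0.0  # below the usability threshold, the yield counts as zero
    cost = agent_interaction_time + agent_expense
    return info_quality * human_task_time / cost

# A research task: 10 human-hours saved, 0.5 h of prompting, 0.1 h-equivalent compute
print(agentic_roi(0.9, 10.0, 0.5, 0.1))   # high ROI: the time saving dominates
# A simple e-commerce lookup: only 0.05 human-hours, same interaction overhead
print(agentic_roi(0.9, 0.05, 0.5, 0.1))   # low ROI: interaction cost dominates
```

The two example calls mirror the article's contrast: research and programming tasks clear the threshold easily, while everyday lookups are drowned out by fixed interaction cost.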
Tencent Research Institute AI Briefing 20250530
腾讯研究院· 2025-05-29 15:55
Group 1: DeepSeek-R1 and AI Developments
- The new version of DeepSeek-R1 has been officially open-sourced, surpassing Claude 4 Sonnet in programming capability and performing comparably to o4-mini (Medium) [1]
- DeepSeek-R1's core advantages include deep reasoning, natural text generation, and support for 30-60 minutes of sustained thinking, allowing complex code to be produced in a single run [1]
- Tencent integrated multiple products with the latest DeepSeek-R1 model within a day, offering users free and unlimited access [3]

Group 2: Keling 2.1 Launch
- Keling 2.1 has launched with a 65% price reduction and improved performance and speed, offered in standard, high-quality, and master versions [2]
- The high-quality version (35 inspiration points) matches the old master version in quality and supports 1080P video, but only for image-to-video generation [2]
- The new version significantly improves cost-effectiveness, making AI video creation more accessible to ordinary users [2]

Group 3: Opera Neon Browser
- Opera has introduced Opera Neon, the first "AI agent" browser, aiming to redefine the browser's role on the web [4]
- Opera Neon offers three main features: Neon Chat (conversation), Neon Do (executing web tasks), and Neon Make (complex creation), which understand user intent and turn it into actions [4]
- Neon Make uses cloud execution for complex tasks, such as generating reports and designing game prototypes, even while the user is offline [4]

Group 4: VAST's Tripo Studio Upgrade
- VAST has upgraded Tripo Studio with four core capabilities: intelligent component segmentation, a texture magic brush, intelligent low-poly generation, and automatic rigging for arbitrary objects [5]
- Intelligent component segmentation allows one-click disassembly, accurately identifying the different parts of a model [5]
- Automatic rigging can recognize diverse biomechanical characteristics and quickly assign skeletal weights, enabling non-professionals to complete the full 3D creation pipeline with more than a tenfold efficiency gain [5]

Group 5: Odyssey's World Model
- Odyssey, founded by autonomous-driving experts, has launched a world model capable of real-time video generation at 40 milliseconds per frame with real-time interaction [6]
- The technology differs from traditional video models by learning pixel and motion data from real-life video, using a narrow-distribution model architecture to address the challenges of autoregressive modeling [6]
- Odyssey has secured $27 million in funding; the current preview, backed by H100 GPU clusters, outputs 30 FPS for 5-minute coherent interactive videos [6]

Group 6: AI Scientist Zochi
- A paper by the AI scientist Zochi has been accepted at the top-tier conference ACL, making it the first AI system to independently pass peer review at an A* conference [7]
- Zochi's paper demonstrates a multi-round attack method with a 100% success rate on GPT-3.5 and 97% on GPT-4 [7]
- Zochi can autonomously carry out the research process from literature analysis to peer review, though its company has faced criticism over misuse of the scientific peer-review process [7]

Group 7: Wanda 2.0 Robot
- Youliqi has launched the Wanda 2.0 wheeled dual-arm robot, priced from 88,000 yuan, capable of autonomously completing complex long-sequence tasks [8]
- Wanda 2.0 is equipped with the pre-trained multimodal large model UniTouch and the long-sequence task-planning model UniCortex, learning new actions from only 5-10 demonstrations [8]
- Youliqi has cut costs by 70% through full-stack in-house development, targets the consumer and small-business markets, and has completed several hundred million yuan in financing [8]

Group 8: Boston Dynamics Atlas Robot
- Boston Dynamics has upgraded the Atlas robot with 3D spatial perception and real-time object tracking, allowing it to perform complex industrial tasks in automotive factories [9]
- The core technology includes a 2D object-detection system, keypoint-based 3D spatial positioning, and the SuperTracker object pose-tracking system, which handles object occlusion and positional changes [9]
- The system fuses kinematic data, visual data, and force feedback to estimate poses accurately, and the team is building a unified foundation model to tighten the integration of perception and action [9]

Group 9: Google CEO's Perspective on AI
- Google CEO Pichai believes AI represents a platform-level transformation larger than the internet, now entering a phase where research becomes reality [10]
- AI is moving into a second stage of building usable products, with search evolving into an agent that executes tasks on the user's behalf, potentially creating Web 2.0-level killer applications [10]
- The key transformations AI brings are new interaction methods and lower creative barriers; a third stage will integrate AI with the physical world to form general-purpose robotic systems [10]
Real-Time, Interactive Video Generation! Two Autonomous-Driving Veterans Build a World-Model Startup: 40 ms per Frame, No Game Engine Needed, Free for Everyone to Play
量子位· 2025-05-29 07:19
Core Viewpoint
- Odyssey, a company founded by autonomous-driving experts, has developed a world model that can generate and interact with video in real time, at 40 milliseconds per frame, faster than a human blink [1][5][6].

Company Highlights
- Odyssey has raised $27 million (approximately 190 million RMB) from investors including EQT Ventures, Google GV, and Air Street Capital, with Ed Catmull, Pixar co-founder and Turing Award winner, on its board [5].
- The platform is currently available for free and has attracted so much user interest that its servers have been congested [6].

Technology Differentiation
- Odyssey distinguishes world models from video models, emphasizing that world models allow real-time interaction and flexibility, while video models generate fixed content without interactivity [8][10].
- The company believes that learning from real-life video data can push world models beyond traditional gaming environments [15].

Development Challenges
- Odyssey acknowledges the difficulty of learning from open real-world videos, given their complexity and unpredictability [16].
- The primary challenge is autoregressive modeling, where the model's own output feeds into future predictions, leading to potential instability [18][19].

Innovative Solutions
- To address these challenges, Odyssey has developed a narrow-distribution model that pre-trains on broad video data and fine-tunes on specific dense video data, improving the stability and persistence of autoregressive generation [20].

Future Prospects
- The company is working on the next generation of world models to improve generalization [21].
- With the current version still a preview, user feedback has been positive, indicating the model's potential [23].
Industry Context
- Over 10 automotive and autonomous-driving companies, including Tesla and NIO, are exploring world models, indicating a competitive landscape [38].
- The autonomous-driving sector is seen as fertile ground for world-model development, suggesting significant future growth in this area [40].
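The autoregressive instability described above, where the model's output feeds back in as its next input so that small per-step errors compound, can be shown with a toy one-dimensional example. The coefficients here are hypothetical and stand in for a learned dynamics model with a slight bias.

```python
# A toy world whose true next state is 0.95 * current state (a stable decay).
TRUE_COEF = 0.95
# A learned model that is almost right but slightly overestimates the dynamics.
MODEL_COEF = 1.02  # hypothetical learned coefficient with a small bias

def rollout(coef: float, x0: float, steps: int) -> float:
    """Autoregressively feed the model's own output back in as its next input."""
    x = x0
    for _ in range(steps):
        x = coef * x
    return x

x0 = 1.0
print(rollout(TRUE_COEF, x0, 100))   # true trajectory decays toward 0 (about 0.006)
print(rollout(MODEL_COEF, x0, 100))  # model trajectory blows up (1.02**100 is about 7.2)
```

A 2% one-step error is invisible in single-frame prediction yet dominates after a hundred steps, which is why stabilizing long autoregressive rollouts (for example via the narrow-distribution fine-tuning described above) is the hard part of real-time world models.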
Intelligent Driving's Fig Leaf Has Been Pulled Back
Hu Xiu· 2025-05-26 02:47
Core Insights
- The automotive industry is transitioning toward more advanced autonomous-driving technologies, moving beyond the simplistic "end-to-end" models that have been prevalent [2][3][25]
- Companies are exploring new architectures and models, such as VLA and world models, to address the limitations of current systems and enhance the safety and reliability of autonomous driving [4][14][25]

Group 1: Industry Trends
- Major players such as Huawei, Li Auto, and Xpeng are developing distinct software architectures to improve autonomous-driving capability, indicating a shift toward more complex systems [4][5][14]
- The proliferation of new terminology and models reflects a diversification of approaches to autonomous driving, with no clear standard yet emerging [4][25]
- The industry is seeing a split in technological pathways: some companies are pushing toward L3 capability while others remain at L2, potentially widening the technology gap [25][26]

Group 2: Data Challenges
- High-quality data is critical for training large models in this new phase of autonomous driving, but companies struggle to acquire and annotate enough real-world data [15][22]
- Companies are increasingly turning to simulation and AI-generated data to overcome data scarcity; some suggest simulated data may eventually matter more than real-world data [22][23]

Group 3: Competitive Landscape
- Competition is intensifying as companies with in-house development capability advance toward more complex technologies, while others rely on suppliers, concentrating orders among a few capable suppliers [26][27]
- The shift toward L3 will require companies to focus not only on technology but also on operations, as responsibility for safety and maintenance shifts from users to manufacturers [25][26]
Backflips ≠ Useful Work! How Far Are We from General-Purpose Robots? | 万有引力
AI科技大本营· 2025-05-22 02:47
Core Viewpoint
- Embodied intelligence is a key focus of the AI field, particularly in humanoid robots, raising questions about the best path to true intelligence and the current challenges in data, computing power, and model architecture [2][5][36].

Group 1: Development Stages of Embodied Intelligence
- The industry anticipates 2025 as a potential "year of embodied intelligence," with significant competition in the multimodal and embodied-intelligence sectors [5].
- NVIDIA CEO Jensen Huang announced the arrival of the "general robot era," outlining four stages of AI development: Perception AI, Generative AI, Agentic AI, and Physical AI [5][36].
- Experts believe that while progress has been made, the journey toward true general intelligence is still ongoing, with many technical and practical challenges remaining [36][38].

Group 2: Transition from Autonomous Driving to Embodied Intelligence
- Many researchers are moving from autonomous driving to embodied intelligence because the two fields share overlapping technologies and skills [17][22].
- Autonomous driving can be viewed as a specific application of robotics, focusing on perception, planning, and control, but it lacks the interactive capabilities general robots need [17][19].
- Bringing in expertise from autonomous driving is seen as a bridge to advance embodied intelligence, accelerating technology fusion and development [18][22].

Group 3: Key Challenges in Embodied Intelligence
- Current robots often lack essential capabilities such as tactile perception, which limits their ability to maintain balance and perform complex tasks [38][39].
- The manipulation capabilities of many humanoid robots are still at the demonstration stage, falling short of performing tasks in real-world contexts [38][39].
- The complexity of high-dimensional systems poses significant challenges for algorithm robustness, especially as more sensory channels are integrated [39].

Group 4: Future Applications and Market Focus
- Developers should focus on specific application scenarios rather than pursuing general capabilities, with promising areas including home care and household services [48].
- Industrial applications are highlighted as promising because of their scalability and the potential for replicable solutions once initial systems are validated [48].
- The gap between laboratory performance and real-world application remains significant, so improving system accuracy in specific contexts must be a priority [46][47].
Backflips ≠ Useful Work: How Far Are We from General-Purpose Robots?
36Kr· 2025-05-22 02:28
Core Insights
- Embodied intelligence has gained significant attention in both industry and academia, particularly in humanoid robots, which integrate perception, movement, and decision-making [1][4][30]
- The development of embodied intelligence is seen as a pathway toward general robotics, with ongoing discussion of the challenges and milestones ahead [1][30]

Group 1: Current State and Future Prospects
- The industry anticipates that 2025 may mark the "year of embodied intelligence," with significant competition emerging in the multimodal and embodied-intelligence sectors [3][4]
- NVIDIA CEO Jensen Huang has proclaimed that the era of general robotics has begun, outlining four stages of AI development culminating in "physical AI," which focuses on understanding and interacting with the physical world [3][4]
- Experts believe that while progress has been made, the journey toward true general robotics is still in its early stages, with many technical and conceptual hurdles remaining [31][32]

Group 2: Technical Challenges and Opportunities
- The current landscape of embodied intelligence lacks comprehensive models and algorithms, and many systems have yet to converge [32][33]
- Key technical challenges include integrating sensory feedback, developing robust algorithms, and building advanced perception capabilities such as tactile sensing [33][34]
- The industry is seeing many researchers transition from autonomous driving to embodied intelligence, leveraging their expertise in perception and interaction [15][19]

Group 3: Application Scenarios
- Potential application areas for embodied intelligence include home care, household services, and industrial automation, all of which address practical and immediate needs [41]
- The focus should be on specific vertical applications rather than general-purpose robots, as the technology is still maturing and requires targeted development to meet real-world demands [36][41]
- Integrating embodied intelligence into existing industrial systems is viewed as a promising route to scalability and broader adoption [39]
Commentary on the Google I/O Conference
2025-05-21 15:14
Summary of Google I/O Conference Insights

Company Overview
- **Company**: Google
- **Event**: Google I/O Conference
- **Date**: May 21, 2025

Key Points and Arguments

Industry and Competitive Landscape
- Google is actively responding to challenges from competitors such as ChatGPT by innovating at the application level, significantly enhancing its AI search products, whose monthly active users have reached 1.5 billion [2][4]
- The company disclosed that its monthly token processing has reached 480 trillion, a 50-fold increase year over year, far exceeding Microsoft's 50 trillion tokens [3][13]

AI and Technological Advancements
- Significant progress has been made in native multimodal technology, including native language understanding and updates to Imagen 4, showcasing ongoing innovation in voice, audio, video, and image generation [2][6]
- The Google Lens app has introduced new features such as Project Astra (now part of Gemini Live), enabling real-time screen sharing and camera demonstrations, aimed at enhancing user experience and competing with ChatGPT [2][7]

Computational Power and Ecosystem Support
- To support its vast ecosystem, Google is significantly increasing its computational power, projecting the equivalent of 1.5 million H100 units by 2024 and 4.5 million by 2025 [2][8]
- The company is integrating its ecosystem, including Android devices, Gmail, and Google Calendar, to enhance AI applications through a new feature called personal context, which uses user-authorized personal information [10]

New AI Features and Applications
- Google has launched an agentic AI capability based on the Gemini app, able to proactively operate the user's phone and integrate with third-party servers via the MCP interface [2][9]
- A new Chrome extension, Gemini on Chrome, lets users view the current web page and ask questions directly; it has been fully rolled out in the U.S. [9]

Future Developments
- Google is developing a next-generation model known as the world model, which aims to learn and understand aspects of a simulated world to advance robotics [12]
- The company is also collaborating with Samsung and Qualcomm on a series of Android XR AI glasses, with capabilities including messaging, photo capture, real-time translation, and integration with Google services [11]

Financial Outlook
- Google's capital expenditure for the year is projected at $75 billion, with significant growth in its cloud business [3]

Additional Important Insights
- The enhanced AI search capabilities and new features in Google Lens and the Gemini app reflect Google's strategy to keep its competitive edge in a rapidly evolving AI landscape [4][7]
- The focus on increasing computational power signals a proactive approach to the growing demands of its ecosystem and user base [8]
见谈 | SenseTime SenseAuto's Wang Xiaogang: Having Crossed the Hills, How Do I Sprint to the High Ground of Intelligent Driving?
Core Insights
- The article traces the evolution of SenseAuto, SenseTime's intelligent-vehicle subsidiary, focusing on its advances in end-to-end autonomous-driving technology and the challenges it faces in the automotive industry [2][3][4].

Group 1: Company Background and Innovations
- Wang Xiaogang, CEO of SenseAuto, was among the first to propose the "end-to-end" approach in computer vision, aiming to reduce the errors introduced by passing information between intermediate modules [2][3].
- SenseAuto launched its first product, the SenseDrive DMS driver-monitoring system, in 2018, and secured partnerships with major Tier 1 suppliers and more than 10 OEMs [4][5].
- The company introduced the SenseAuto Pilot-P solution in 2021, achieving L2+ advanced driver-assistance functions [4][5].

Group 2: Market Position and Competition
- SenseAuto entered the automotive sector with a focus on intelligent-cockpit solutions, while the autonomous-driving sector was still in a chaotic phase with no consensus on its future direction [3][4].
- Tesla's successful deployment of end-to-end autonomous-driving models in 2022 shifted industry dynamics, prompting companies such as Xpeng and Li Auto to adopt similar strategies [5][6].

Group 3: Strategic Development and Challenges
- Wang Xiaogang emphasized the need for cost reduction and efficiency improvement to compete in mass production, which poses a significant challenge for SenseAuto [6][7].
- The company is focusing on talent acquisition and platformization to handle the challenge of adapting to varied hardware platforms and software stacks [7][8].

Group 4: Future Outlook and Business Strategy
- SenseAuto aims to expand its deliveries in the mid-to-low-end market by 2025, planning collaborations with new partners such as GAC Aion and FAW Hongqi [11][12].
- The company is also developing a multimodal large model, DriveAGI, to advance its autonomous-driving technology, which it expects to exceed human capability [11][12].
- SenseAuto positions itself as an AI platform company in the automotive sector, building AI infrastructure and data pipelines for enterprises [11][12].