Why one cross-border payments pilot was stymied
Yahoo Finance · 2025-10-31 10:16
This story was originally published on Payments Dive. LAS VEGAS – Competing national priorities and disparate regulatory regimes have stymied real-time cross-border payments for the time being, according to an executive with The Clearing House. Moving money instantaneously between countries requires regulatory certainty on both sides of the border and infrastructure in both nations capable of sending and receiving mon ...
The Post-End-to-End Era: Must We Find a New Path?
自动驾驶之心 · 2025-09-01 23:32
Core Viewpoint: The article discusses the evolution of autonomous driving technology, particularly the transition from end-to-end systems to Vision-Language-Action (VLA) models, highlighting the differing approaches and perspectives within the industry regarding these technologies [6][32][34].

Group 1: VLA and Its Implications
- VLA, or Vision-Language-Action Model, aims to integrate visual perception and natural language processing to enhance decision-making in autonomous driving systems [9][10].
- The VLA model attempts to map human driving instincts into interpretable language commands, which are then converted into machine actions, potentially offering both strong integration and improved explainability [10][19].
- Companies like Wayve are leading the exploration of VLA, with their LINGO series demonstrating the ability to combine natural language with driving actions, allowing for real-time interaction and explanations of driving decisions [12][18].

Group 2: Industry Perspectives and Divergence
- The current landscape of autonomous driving is characterized by a divergence in approaches, with some teams embracing VLA while others remain skeptical, preferring to focus on traditional Vision-Action (VA) models [5][6][19].
- Major players like Huawei and Horizon have expressed reservations about VLA, opting instead to refine existing VA models, which they believe can still achieve effective results without the complexities introduced by language processing [5][21][25].
- The skepticism surrounding VLA stems from concerns about the ambiguity and imprecision of natural language in driving contexts, which can lead to challenges in real-time decision-making [19][21][23].

Group 3: Technical Challenges and Considerations
- VLA models face significant technical challenges, including high computational demands and potential latency issues, which are critical in scenarios requiring immediate responses [21][22].
- The integration of language processing into driving systems may introduce noise and ambiguity, complicating the training and operational phases of VLA models [19][23].
- Companies are exploring various strategies to mitigate these challenges, such as enhancing computational power or refining data collection methods to ensure that language inputs align effectively with driving actions [22][34].

Group 4: Future Directions and Industry Outlook
- The article suggests that the future of autonomous driving may not rely solely on new technologies like VLA but also on improving existing systems and methodologies to ensure stability and reliability [34].
- As the industry evolves, companies will need to decide whether to pursue innovative paths with VLA or to solidify their existing frameworks, each offering unique opportunities and challenges [34].
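The VA/VLA contrast above can be sketched as a deliberately toy, rule-based example. It is an illustration under stated assumptions and does not represent Wayve's LINGO or any production model. The `Scene` fields, distance thresholds, command strings, and control values are all hypothetical. In real VLA systems, both the language stage and the action decoder are learned neural networks. In this sketch, the VA policy maps perception directly to a `(steering, throttle)` pair. The VLA policy first emits a human-readable command and then decodes it into an action. That intermediate text is where the explainability comes from, and it is also an extra stage that counts against the latency budget.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Action = Tuple[float, float]  # (steering, throttle); negative throttle means brake


@dataclass
class Scene:
    """Hypothetical perception output for a single frame."""
    obstacle_distance_m: float  # distance to the nearest obstacle ahead, in meters
    red_light: bool             # whether the traffic light ahead is red


def va_policy(scene: Scene) -> Action:
    """Vision-Action (VA): map perception directly to a control action."""
    if scene.red_light or scene.obstacle_distance_m < 10.0:
        return (0.0, -1.0)  # brake
    if scene.obstacle_distance_m < 30.0:
        return (0.0, 0.2)   # slow down
    return (0.0, 0.6)       # cruise


def language_stage(scene: Scene) -> str:
    """VLA stage 1: express the decision as a human-readable command."""
    if scene.red_light:
        return "stop: red light ahead"
    if scene.obstacle_distance_m < 10.0:
        return "stop: obstacle within 10 m"
    if scene.obstacle_distance_m < 30.0:
        return "slow down: obstacle within 30 m"
    return "cruise: road clear"


# Hypothetical lookup from command verb to control action.
COMMAND_TO_ACTION: Dict[str, Action] = {
    "stop": (0.0, -1.0),
    "slow down": (0.0, 0.2),
    "cruise": (0.0, 0.6),
}


def action_decoder(command: str) -> Action:
    """VLA stage 2: turn the command's verb (the text before ':') into an action."""
    verb = command.split(":", 1)[0]
    return COMMAND_TO_ACTION[verb]


def vla_policy(scene: Scene) -> Tuple[str, Action]:
    """Vision-Language-Action (VLA): perception -> language -> action.

    The intermediate command is returned too, serving as the explanation.
    """
    command = language_stage(scene)
    return command, action_decoder(command)
```

For the same scene, both policies return the same action, but only the VLA path also yields an explanation. For example, `vla_policy(Scene(5.0, False))` returns `("stop: obstacle within 10 m", (0.0, -1.0))`.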
How Far Are We from a True Embodied-Intelligence Large Model?
2025-08-13 14:56
Summary of Conference Call Notes

Industry Overview
- The discussion centers on the humanoid robot industry, emphasizing the importance of the model side in humanoid robot development, even though the market currently focuses on hardware [1][2][4].

Key Points and Arguments
1. **Importance of Large Models**: Multi-modal large models are seen as essential for giving humanoid robots intelligent capabilities; this is the underlying logic of current humanoid robot development [2][4].
2. **Data Collection Challenges**: The stagnation in model development is attributed to insufficient data collection; because few robots are yet operating in factories, the initial data has not been monetized [3][16].
3. **Role of Tesla**: Tesla is highlighted as a crucial player in the industry, because standardized hardware is necessary for effective data collection and model improvement [3][4][16].
4. **Data Flywheel Concept**: A data flywheel is critical for the rapid growth of large models, and it requires a solid hardware foundation [4][16].
5. **Model Development Trends**: Model development is driven by three main lines: multi-modality, higher action frequency, and stronger reasoning capabilities [5][11][12].
6. **Model Evolution**: The evolution of models from SayCan to RT-1, RT-2, and Helix shows a progression in capabilities, including the integration of more input modalities and higher action-execution frequencies [6][10][11].
7. **Training Methodology**: Model training is compared to human learning: pre-training on low-quality data, followed by fine-tuning on high-quality real-world data [13][14].
8. **Data Quality and Collection**: Real-world data is deemed the highest quality but is hard to collect efficiently, while simulation data is more accessible but may lack realism [15][17].
9. **Motion Capture Technology**: Motion capture is important for data collection; the discussion covers various methods and their respective advantages and disadvantages [18][19].
10. **Future Directions**: Large models are expected to integrate more modalities and to develop toward world models, which the industry regards as a consensus direction [21][22].

Additional Important Content
- **Industry Players**: Companies such as Galaxy General and Xinjing are named as key players in model development, with Galaxy General focusing on fully simulated data [22][23].
- **Market Recommendations**: Investment recommendations focus on motion capture equipment, cameras, and humanoid robot control systems, with specific companies highlighted as potential investments [26].

This summary covers the key insights from the conference call on the humanoid robot industry's current state and future directions.
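Two points from the call can be illustrated together with a toy one-variable regression in pure Python. Point 7 describes pre-training on low-quality data and then fine-tuning on high-quality real data. Point 8 describes the trade-off between real and simulated data. Everything in the sketch is a hypothetical stand-in:

- The "real world" is the line y = 2x + 1.
- The plentiful "simulation" data is noisy and carries a systematic offset, a crude stand-in for the sim-to-real gap.
- The scarce "real" data is five clean samples.

A robot foundation model is vastly larger, but the two-stage structure is the same. The model first trains on the big dataset, then continues training from those weights on the small accurate one.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]      # (w, b) for the model y = w * x + b
Data = List[Tuple[float, float]]  # list of (x, y) samples


def mse(params: Params, data: Data) -> float:
    """Mean squared error of the linear model on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(params: Params, data: Data, lr: float, steps: int) -> Params:
    """Full-batch gradient descent on MSE, starting from the given parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n  # dMSE/dw
        gb = sum(2 * (w * x + b - y) for x, y in data) / n      # dMSE/db
        w, b = w - lr * gw, b - lr * gb
    return (w, b)


def real_world(x: float) -> float:
    """Hypothetical ground truth that the robot ultimately has to match."""
    return 2 * x + 1


rng = random.Random(0)

# Plentiful, lower-quality "simulation" data: noisy and offset by -0.5 (toy sim-to-real gap).
sim_data = [(x, 2 * x + 0.5 + rng.gauss(0, 0.3)) for x in (rng.random() for _ in range(200))]

# Scarce, high-quality "real" data: five clean samples.
real_data = [(x, real_world(x)) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]

# Held-out real points used only for evaluation.
test_data = [(i / 20, real_world(i / 20)) for i in range(21)]

# Stage 1: pre-train from scratch on the large simulated dataset.
pretrained = gradient_descent((0.0, 0.0), sim_data, lr=0.1, steps=1000)

# Stage 2: fine-tune on the small real dataset, starting from the pre-trained weights.
finetuned = gradient_descent(pretrained, real_data, lr=0.1, steps=500)

print(f"pre-trained test MSE: {mse(pretrained, test_data):.4f}")
print(f"fine-tuned  test MSE: {mse(finetuned, test_data):.4f}")
```

The pre-trained model inherits the simulation's 0.5 offset, so its held-out error stays large, roughly 0.25. A few hundred gradient steps on only five real samples remove almost all of it. This mirrors the claim that a little high-quality real data goes a long way once a model has been pre-trained on cheaper data.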
Video Models Don't "Learn" Slowly; LLMs Take Shortcuts | Sergey Levine, a Researcher with 180,000 Citations
量子位 (QbitAI) · 2025-06-10 07:35
Core Viewpoint: The article discusses the limitations of AI, particularly language models (LLMs) and video models, using the metaphor of "Plato's Cave" to illustrate the difference between human cognition and AI's understanding of the world [6][30][32].

Group 1: Language Models vs. Video Models
- Language models have achieved significant breakthroughs with a simple algorithm, next-word prediction, combined with reinforcement learning [10][19].
- Although video data is richer than text data, video models have not developed reasoning capabilities as complex as those of language models [14][19].
- Language models can draw on the human knowledge and reasoning paths recorded in text, which lets them answer complex questions that video models cannot [21][22][25].

Group 2: The "Cave" Metaphor
- The "Plato's Cave" metaphor describes AI's current state: it learns from human knowledge but does not truly understand the world [29][32].
- AI's capabilities are seen as a reverse engineering of human cognition rather than independent exploration [33].
- The article suggests that AI should move beyond this "shadow dependency" and interact directly with the physical world to achieve true understanding [34][35].

Group 3: Future Directions for AI
- The long-term goal for AI is to break free of its reliance on human intermediaries and interact directly with the physical world [35].
- The article also suggests that bridging modalities (vision, language, action) could enable this exploration without needing to escape the "cave" [35].
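The "simple algorithm of next-word prediction" credited above can be shown at toy scale with a count-based bigram model. This is a swapped-in, drastically simplified technique. Real LLMs use neural networks over subword tokens trained on vast corpora, plus reinforcement learning, and the tiny corpus below is hypothetical. The model only counts which word follows which and predicts the most frequent follower. That already illustrates the article's point: whatever such a model reproduces comes from patterns that humans left in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(corpus: str) -> Dict[str, Counter]:
    """For each word, count how often each other word immediately follows it."""
    words = corpus.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(model: Dict[str, Counter], start: str, length: int) -> str:
    """Greedily extend `start` one predicted word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Hypothetical toy corpus; "." is treated as an ordinary word.
corpus = "the robot picks up the cup . the robot puts down the cup . the cup is red ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cup": it follows "the" 3 times, "robot" only 2
print(generate(model, "the", 3))   # prints "the cup . the"
```

Greedy generation quickly loops ("the cup . the ..."). The next-word objective is simple, and most of a real LLM's capability comes from scale and from the structure already present in human-written text.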