Making Robots Do More Than "Just Walk": Nav-R1 Ushers in a New Era of Reasoning-Driven Navigation
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the introduction of Nav-R1, a new embodied foundation model designed to enhance the reasoning and navigation capabilities of robots in 3D environments, integrating perception, reasoning, and action effectively [5][30].

Group 1: Key Innovations
- Nav-R1 utilizes a large-scale dataset called Nav-CoT-110K, which contains approximately 110,000 Chain-of-Thought trajectories, providing a stable reasoning and action foundation before reinforcement learning optimization [8][6].
- The model incorporates three types of rewards: a Format Reward, an Understanding Reward, and a Navigation Reward, which ensure structured output, semantic understanding, and path fidelity respectively [10][15].
- The Fast-in-Slow reasoning paradigm is inspired by human cognition: a fast system handles immediate responses while a slow system manages long-term planning and semantic consistency [11][16].

Group 2: Experimental Results
- Nav-R1 demonstrated significant improvements across navigation tasks, achieving gains of approximately 8% or more in success rate and path efficiency compared to other advanced methods [14].
- In real-world deployments, Nav-R1 was tested on a mobile robot platform and showed robust performance when navigating complex indoor environments [19][26].

Group 3: Applications and Implications
- The model has potential applications in service robots and home assistants, enhancing user experience by enabling robots to navigate cluttered environments and understand commands [31].
- In healthcare settings, Nav-R1 can assist in navigating complex environments safely and reliably, which is crucial for elderly care and medical facilities [32].
- The technology is also applicable in augmented and virtual reality, where virtual agents need to navigate physical spaces effectively [33].
- In industrial and hazardous environments, Nav-R1's robustness and generalization capabilities make it suitable for executing tasks in unknown or dangerous settings [34].
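The three-part reward design summarized above can be made concrete with a short sketch. This is purely illustrative, not Nav-R1's actual implementation: the function names, the tag-based format check, the waypoint-overlap path metric, and the weights are all assumptions made here for clarity.

```python
# Illustrative sketch of a three-part navigation reward in the spirit of
# Nav-R1's Format / Understanding / Navigation rewards. All names, the
# tag-based format check, and the weights are hypothetical.

def format_reward(output: str) -> float:
    """Reward structured output: 1.0 only if the response carries the
    expected reasoning and answer sections (assumed tag convention)."""
    return 1.0 if "<think>" in output and "<answer>" in output else 0.0

def understanding_reward(pred_label: str, true_label: str) -> float:
    """Reward semantic understanding: exact match on the predicted target."""
    return 1.0 if pred_label == true_label else 0.0

def navigation_reward(path, reference_path) -> float:
    """Reward path fidelity: fraction of reference waypoints the agent
    actually visited (a stand-in for a real trajectory-fidelity metric)."""
    visited = set(path)
    hits = sum(1 for wp in reference_path if wp in visited)
    return hits / max(len(reference_path), 1)

def total_reward(output, pred_label, true_label, path, ref_path,
                 w_fmt=0.2, w_und=0.3, w_nav=0.5) -> float:
    """Weighted sum of the three reward terms (weights are arbitrary here)."""
    return (w_fmt * format_reward(output)
            + w_und * understanding_reward(pred_label, true_label)
            + w_nav * navigation_reward(path, ref_path))
```

In a reinforcement-learning loop, a scalar like `total_reward` would score each sampled trajectory; how the structure, semantics, and path-fidelity terms are balanced is a tunable design choice.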
Not Just Moving Its Lips, but "Thinking" Too! ByteDance Releases OmniHuman-1.5, Giving Virtual Humans a Logical Soul
机器之心· 2025-09-05 07:12
Imagine a virtual human who not only lip-syncs precisely to your speech, but also shows a flash of realization when you reach a key point, looks sympathetic when you tell a sad story, and even makes meaningful gestures that follow the logic of your words. This is no longer a scene from science fiction. In late August, ByteDance's digital human team released OmniHuman-1.5, proposing an entirely new virtual human generation framework that gives virtual humans a genuine ability to "think" and "express".

When OmniHuman-1 launched a few months ago, it sparked a wave of attention at home and abroad. Compared with its predecessor, version 1.5 brings further breakthroughs: beyond lip-syncing, it can make a virtual human perform specified movements and expressions from text instructions, and in multi-person scenes it can direct characters other than the speaker to perform concrete actions. Reportedly, the new version will also soon be available in Jimeng AI (即梦 AI).

What does a "thinking" virtual human look like? Traditional virtual humans have always felt as though they lacked a "soul": their movements are mechanical and repetitive. OmniHuman-1.5 is the first to bring Nobel laureate Daniel Kahneman's "dual-system theory" into AI, using a "thinking brain" driven by a multimodal large language model (MLLM) to teach virtual humans to deliberate. Before diving into the technical details, here is the most direct way to see how the virtual humans this framework creates are different:

Paper: https://arxiv.org/abs/2508.19209
Project page: ht ...
Diffusion Architectures or "NoThinking": How Can AI Dialogue Break Through the "1Hz Barrier"?
机器之心· 2025-08-03 01:30
Group 1
- The core concept of the article revolves around the "Intelligence Spectrum" proposed by Eric Jang, which addresses the current "1Hz barrier" faced by AI and the prerequisites for achieving "Ultra Instinct" capabilities in AI systems [5][6][9]
- Jang categorizes different types of intelligent decision-making processes along a spectrum, from "extremely slow intelligence" (e.g., plant growth) to "extremely fast intelligence" (e.g., the precise movements of a hummingbird) [6][7]
- Current leading LLMs like ChatGPT and Llama operate at a frequency of 1-2Hz, significantly slower than the natural conversational pace of humans (approximately 10Hz), leading to a mismatch in interaction speed [7][8]

Group 2
- This "1-2Hz intelligence" reflects the limitations of AI in real-time interactions, producing a turn-based dynamic in which human users must wait for AI responses, which exacerbates issues like hallucinations and poor context understanding [8][9]
- Jang emphasizes that overcoming the "1Hz barrier" is not merely about increasing speed but is essential for a qualitative transformation in AI capabilities, covering the entire intelligence spectrum from 0.1Hz to 50Hz [8][9]
- The article discusses Daniel Kahneman's dual process theory, illustrating how different decision-frequency requirements reflect the fundamental, and sometimes conflicting, speed demands of various AI applications [10]
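The frequency mismatch described above is easy to make concrete. The helper below is our back-of-the-envelope framing (not Jang's), using the figures quoted in the article: a 1-2Hz agent against a roughly 10Hz conversational cadence.

```python
# Back-of-the-envelope illustration of the "1Hz barrier": how many
# conversational "beats" pass unanswered while an agent responds at its
# own rate over a given window. The helper is illustrative only.

def turns_missed(agent_hz: float, human_hz: float, seconds: float) -> int:
    """Beats a human interlocutor produces minus beats the agent produces
    over the same window."""
    human_beats = human_hz * seconds
    agent_beats = agent_hz * seconds
    return int(human_beats - agent_beats)

# Over a 10-second exchange, a 1Hz agent falls ~90 beats behind a 10Hz
# human; a 2Hz agent still falls ~80 behind.
```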
Simulating the Brain's Functional Specialization! Fast-in-Slow VLA Unifies "Fast Action" and "Slow Reasoning"
具身智能之心· 2025-07-13 09:48
Core Viewpoint
- The article discusses the introduction of the Fast-in-Slow (FiS-VLA) model, a novel dual-system vision-language-action model that integrates high-frequency response and complex reasoning in robotic control, showcasing significant advancements in control frequency and task success rates [5][29].

Group 1: Model Overview
- FiS-VLA combines a fast execution module with a pre-trained vision-language model (VLM), achieving a control frequency of up to 117.7Hz, significantly higher than existing mainstream solutions [5][25].
- The model employs a dual-system architecture inspired by Kahneman's dual-system theory, where System 1 focuses on rapid, intuitive decision-making while System 2 handles slower, deeper reasoning [9][14].

Group 2: Architecture and Design
- The architecture of FiS-VLA includes a visual encoder, a lightweight 3D tokenizer, and a large language model (LLaMA2-7B), with the last few layers of the transformer repurposed as the execution module [13].
- The model utilizes heterogeneous input modalities: System 2 processes 2D images and language instructions, while System 1 consumes real-time sensory inputs, including 2D images and 3D point cloud data [15].

Group 3: Performance and Testing
- In simulation tests, FiS-VLA achieved an average success rate of 69% across various tasks, outperforming other models such as CogACT and π0 [18].
- Real-world testing on robotic platforms showed success rates of 68% and 74% on different task sets, demonstrating superior performance in high-precision control scenarios [20].
- The model exhibited robust generalization, with a smaller accuracy decline than baseline models when faced with unseen objects and varying environmental conditions [23].

Group 4: Training and Optimization
- FiS-VLA employs a dual-system collaborative training strategy, enhancing System 1's action generation through diffusion modeling while retaining System 2's reasoning capabilities [16].
- Ablation studies indicated that System 1 performs best when sharing two transformer layers, and that the optimal operating frequency ratio between the two systems is 1:4 [25].

Group 5: Future Prospects
- The authors suggest that future enhancements could include dynamic adjustment of the shared structure and of the collaborative frequency strategy, further improving the model's adaptability and robustness in practical applications [29].
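The dual-system collaboration described above, a slow reasoner refreshing a plan that a fast action head consumes every tick, can be sketched as a control loop. This is a minimal sketch of the general pattern, not FiS-VLA's code: the class and parameter names are ours, and only the 1:4 fast-to-slow ratio comes from the ablation result quoted in the summary.

```python
# Minimal dual-system control loop: a slow System 2 updates a latent plan
# every `ratio` ticks while a fast System 1 emits an action every tick,
# conditioned on the latest plan. Names are illustrative, not FiS-VLA's.

class DualSystemController:
    def __init__(self, slow_model, fast_head, ratio: int = 4):
        self.slow_model = slow_model   # System 2: VLM-style reasoner
        self.fast_head = fast_head     # System 1: lightweight action head
        self.ratio = ratio             # fast steps per slow update (1:4 here)
        self.latent_plan = None
        self.step = 0

    def tick(self, observation):
        # System 2 runs once every `ratio` ticks (low frequency).
        if self.step % self.ratio == 0:
            self.latent_plan = self.slow_model(observation)
        self.step += 1
        # System 1 runs every tick (high frequency), reusing the most
        # recent plan between System 2 updates.
        return self.fast_head(observation, self.latent_plan)
```

Over 8 ticks with `ratio=4`, the slow model runs twice while the fast head runs all 8 times, which is the frequency asymmetry the dual-system design relies on.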
Simulating the Brain's Functional Specialization! Peking University and CUHK Release Fast-in-Slow VLA, Unifying "Fast Action" and "Slow Reasoning"
机器之心· 2025-07-12 02:11
Core Insights
- The article discusses the development of a new dual-system vision-language-action model named Fast-in-Slow (FiS-VLA) that integrates high-frequency response and complex reasoning in robotic control [4][29].

Group 1: Research Background and Challenges
- The goal of robotic control systems is to generate precise control signals from sensor inputs and language instructions in complex environments. Large vision-language models (VLMs), however, are limited by their large parameter counts and slow inference speed, which restricts their practical use in high-frequency control tasks [7].
- The research draws inspiration from Kahneman's "dual-system theory," in which System 1 represents fast, intuitive decision-making and System 2 represents slower, deeper reasoning. Previous methods attempted to build dual-system structures but lacked efficient collaboration between the two systems [8][9].

Group 2: FiS-VLA Architecture and Design
- FiS-VLA proposes an innovative structure that directly repurposes the last few layers of the VLM as a System 1 execution module, embedding it within System 2 to form a unified model for efficient reasoning and control. System 2 processes 2D images and language instructions at a low frequency, while System 1 responds to real-time sensory inputs at a high frequency [11][13].
- The architecture includes a visual encoder, a lightweight 3D tokenizer, a large language model (LLaMA2-7B), and several MLP modules for modality fusion and diffusion modeling. This design allows System 1 to inherit pre-trained knowledge while achieving high-frequency execution [13].

Group 3: Dual-System Collaboration
- FiS-VLA consists of a slow System 2 and a fast System 1: System 2 processes task-related visual observations and language instructions, converting them into high-dimensional features, while System 1 focuses on real-time action generation, receiving current sensory inputs and outputting actions conditioned on periodic updates from System 2 [14][15].
- The model employs asynchronous sampling to control the operating frequencies of the two systems, ensuring temporal consistency in action generation [14].

Group 4: Performance Evaluation
- In simulation tests, FiS-VLA achieved an average success rate of 69% on RLBench tasks, outperforming models like CogACT (61%) and π0 (55%). The control frequency reached 21.9Hz, more than double that of CogACT [17].
- On real robot platforms (Agilex and AlphaBot), FiS-VLA demonstrated average success rates of 68% and 74% across eight tasks, significantly surpassing the π0 baseline [19].
- The model exhibited robust performance in generalization tests, showing a smaller accuracy decline than π0 when faced with unseen objects, complex backgrounds, and lighting changes [21].

Group 5: Ablation Studies and Future Directions
- Ablation studies indicated that System 1 performs best when sharing two Transformer layers, and that the best collaboration frequency ratio between Systems 1 and 2 is 1:4. The theoretical control frequency can reach up to 117.7Hz when predicting eight actions at once [23].
- The article concludes that FiS-VLA innovatively merges reasoning and control within a unified VLM, achieving high-frequency, high-precision, and strongly generalizing robotic manipulation. Future enhancements may include dynamic adjustment of shared structures and collaborative frequency strategies to improve adaptability and robustness in real-world tasks [29].
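The jump from a measured 21.9Hz to a theoretical 117.7Hz quoted above comes from action chunking: one forward pass of the fast system predicts a chunk of several actions, multiplying the effective control rate. The helper below just makes that arithmetic explicit; it is our framing of the quoted numbers, not the paper's code, and the ~14.7Hz single-pass rate is simply 117.7 divided by 8.

```python
# Action chunking arithmetic: if one inference pass emits K actions, the
# effective control rate is K times the per-pass inference rate.

def effective_control_hz(inference_hz: float, actions_per_pass: int) -> float:
    """Effective control frequency under action chunking."""
    return inference_hz * actions_per_pass

# Under the assumption that the quoted 117.7Hz corresponds to eight-action
# chunks, the implied single-pass rate is 117.7 / 8 ≈ 14.7Hz.
```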
Understanding DeepSeek and OpenAI in One Article: Why Do Entrepreneurs Need Cognitive Innovation?
混沌学园· 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving AI landscape [1][2].

Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission of counteracting the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in groundbreaking language models that exhibit emergent capabilities [17][19].

Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].

Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, in contrast to the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low cost through innovative model architecture and training methods, emphasizing quality data selection and algorithmic efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advancement in AI technology [45][48].

Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing open collaboration and resource sharing among researchers [54][56].
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating a culture that empowers individuals to explore and innovate freely [64][70].

Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should focus only on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].
Lilian Weng's Latest 10,000-Word Essay: Why We Think
量子位· 2025-05-18 05:20
Core Insights
- The article discusses the concepts of "Test-time Compute" and "Chain-of-Thought" (CoT) as methods to significantly enhance model performance in artificial intelligence [1][2][6]

Group 1: Motivation and Theoretical Background
- Allowing models to think longer before providing answers can be achieved through various methods, enhancing their intelligence and overcoming current limitations [2][8]
- The core idea is deeply related to human thinking processes: humans require time to analyze complex problems, aligning with Daniel Kahneman's dual-system theory from "Thinking, Fast and Slow" [10][11]
- By consciously slowing down and reflecting, models can engage in more rational decision-making, akin to human System 2 thinking [11][12]

Group 2: Computational Resources and Model Architecture
- Deep learning views neural networks as able to access computational and storage resources, optimizing their use through gradient descent [13]
- In Transformer models, the computational load (FLOPs) for each generated token is approximately double the number of parameters, with sparse models like Mixture-of-Experts (MoE) utilizing only a fraction of parameters during each forward pass [13]
- CoT allows models to perform more computation per token depending on the difficulty of the problem, enabling variable computational loads [13][18]

Group 3: CoT and Learning Techniques
- Early improvements in CoT involved generating intermediate steps for mathematical problems, with subsequent research showing that reinforcement learning can significantly enhance CoT reasoning capabilities [19][20]
- Supervised learning on human-written reasoning paths, together with appropriate prompts, can greatly improve the mathematical abilities of instruction-tuned models [21][23]
- The effectiveness of CoT prompts in increasing success rates on mathematical problems is more pronounced in larger models [23]

Group 4: Sampling and Revision Techniques
- The fundamental goal of test-time computation is to adaptively modify the model's output distribution during reasoning [24]
- Parallel sampling methods are straightforward but limited by the model's ability to generate a correct solution in one go, while sequential revision requires careful execution to avoid introducing errors [24][25]
- Combining both methods can yield optimal results: simpler problems benefit from sequential testing, while more complex problems perform best with a mix of both approaches [24][25]

Group 5: Advanced Techniques and Future Directions
- Various advanced algorithms, such as Best-of-N and Beam Search, are employed to optimize the search for high-scoring samples [29][30]
- The RATIONALYST system focuses on synthesizing reasoning from vast unannotated data, providing implicit and explicit guidance for generating reasoning steps [32][33]
- Future challenges include enhancing computational efficiency, integrating self-correction mechanisms, and ensuring the reliability of reasoning outputs [47][50]
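The per-token compute rule of thumb mentioned in Group 2, roughly 2 FLOPs per parameter per generated token for a dense Transformer, with MoE models activating only a fraction of their parameters, can be written out as a small helper. The function is our illustration of that rule, not code from the essay, and the example parameter counts are arbitrary.

```python
# Rule-of-thumb forward-pass compute per generated token: ~2 FLOPs per
# parameter for a dense Transformer; an MoE model only activates a
# fraction of its parameters on each pass. Illustrative helper only.

def flops_per_token(n_params: float, active_fraction: float = 1.0) -> float:
    """Approximate forward-pass FLOPs per generated token.

    n_params: total parameter count.
    active_fraction: share of parameters used per pass (< 1.0 for MoE).
    """
    return 2.0 * n_params * active_fraction

# A dense 7B-parameter model: ~14 GFLOPs per token.
# An MoE with 7B total parameters but 1/4 active: ~3.5 GFLOPs per token.
```

This asymmetry is one motivation for CoT as described above: since compute per token is roughly fixed, the only way for a model to spend more computation on a hard problem is to emit more tokens.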