机器之心

Coze open-sources its full suite under Apache 2.0, and the AI agent race heats up again
机器之心· 2025-07-28 02:47
Core Viewpoint
- The company has launched two core open-source products, Coze Studio and Coze Loop, as part of its AI Agent development platform, aiming to enhance developer engagement and competition in the open-source space [4][6][39]

Group 1: Product Launch and Features
- The open-source products have collectively garnered 9.5K stars on GitHub, indicating significant interest in the AI agent development field [7]
- Coze Studio is a no-code development platform that allows users to create functional AI agents through a visual interface, making it accessible even for those without coding skills [10][11]
- Coze Loop serves as a comprehensive lifecycle management tool for AI agents, facilitating development, evaluation, observation, and optimization [28][34]

Group 2: Technical Architecture and Performance
- The platform's backend is built on Golang, while the frontend utilizes React and TypeScript, ensuring a robust and efficient architecture [19][21]
- The microservices architecture allows for clear responsibilities and modular development, enhancing maintainability and collaboration within the open-source community [22]
- The platform supports containerized deployment, simplifying the setup process for developers [23]

Group 3: Open Source Strategy and Community Engagement
- The decision to open-source these products under the Apache 2.0 license reflects the company's commitment to fostering a developer-friendly ecosystem, allowing for commercial use without licensing fees [43][45]
- By opening up its technology, the company aims to attract a larger developer community to contribute to the ecosystem, enhancing product evolution through collaborative efforts [54][55]
- The open-source initiative is seen as a strategic move to establish a new standard in the AI agent development space, positioning the company as a leader in this emerging market [44][58]
Combining the strengths of RL and SFT for the first time: dynamic guidance enables efficient reasoning training
机器之心· 2025-07-27 15:54
Core Viewpoint
- The article discusses the development and advantages of the GHPO algorithm framework, which integrates online reinforcement learning and imitation learning to enhance the performance and stability of large language models in complex reasoning tasks [3][5][21]

Group 1: Background and Current Challenges
- New-generation large reasoning models like OpenAI-o3, DeepSeek-R1, and Kimi-1.5 have made significant progress in complex reasoning, primarily through a training method called ZERO-RL, which uses Reinforcement Learning with Verifiable Rewards (RLVR) to improve reasoning capabilities [1]
- Current RLVR methods, such as Group Relative Policy Optimization (GRPO), face limitations including a gap between training-data difficulty and model capability, leading to sparse rewards that hinder learning stability [2]

Group 2: GHPO Algorithm Framework
- The GHPO algorithm framework was developed through collaboration between Huawei's Hong Kong Research Institute, Noah's Ark Lab, and City University of Hong Kong, aiming to address the limitations of existing RLVR methods [3]
- GHPO significantly improves sample efficiency for edge models and alleviates the sparse-reward phenomenon in RLVR methods, achieving performance improvements of 9% and 10% on specific benchmarks [5][18]

Group 3: Methodology and Innovations
- The GHPO framework introduces a novel approach by integrating standard problem-solving processes into the reinforcement learning loop, which helps mitigate the sparse-reward issue and enhances the model's generalization ability in reasoning tasks [9][10]
- GHPO employs dynamic sample-difficulty assessment and adaptive switching between reinforcement learning and imitation learning, ensuring that guidance is provided only when necessary [11][14]

Group 4: Experimental Results and Performance
- Experiments demonstrated that GHPO outperforms GRPO by an average of 4.5% in performance, with more stable gradient updates during training [18][19]
- The algorithm has been validated on various models, including Qwen2.5-Math-7B, showcasing its versatility and effectiveness across different difficulty distributions in training datasets [19]

Group 5: Future Implications
- GHPO represents a significant advancement in the integration of reinforcement learning and supervised fine-tuning (SFT), providing a new perspective on the relationship between these methodologies and the potential for deeper fusion in future AI explorations [21]
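The adaptive switching between RL and guided imitation that the summary describes can be sketched roughly as follows. This is an illustrative reconstruction, not GHPO's actual implementation: the function names, the zero-reward "hard sample" test, and the 50% hint fraction are all assumptions made for the example.

```python
def ghpo_style_batch(samples, rollout_rewards, hint_fraction=0.5):
    """Sketch of GHPO-style adaptive guidance (illustrative names).

    A sample counts as 'hard' when every sampled rollout earned zero
    reward (the sparse-reward case); for those, part of the reference
    solution is spliced into the prompt as a hint, blending imitation
    with RL. Easy samples stay on the pure-RL path.
    """
    batch = []
    for sample, rewards in zip(samples, rollout_rewards):
        if max(rewards) <= 0.0:  # no rollout solved it: reward signal is absent
            cut = int(len(sample["solution"]) * hint_fraction)
            batch.append({"prompt": sample["question"] + sample["solution"][:cut],
                          "mode": "guided"})
        else:
            batch.append({"prompt": sample["question"], "mode": "rl"})
    return batch
```

Under this sketch, the proportion of guidance naturally shrinks as the model improves, since fewer prompts fail all rollouts, which matches the summary's claim that guidance is provided only when necessary.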
Is your AI butler wrecking the house? New research exposes safety flaws in household embodied agents
机器之心· 2025-07-27 08:45
Core Insights
- The article discusses the launch of IS-Bench, a benchmark focused on evaluating the safety of embodied agents interacting with household environments, highlighting the potential dangers of allowing AI assistants to operate autonomously [2][4][19]
- Current visual language model (VLM) household assistants have a safe completion rate of less than 40%, indicating significant risks associated with their actions [4][19]

Evaluation Framework
- IS-Bench introduces over 150 household scenarios that contain hidden safety hazards, designed to comprehensively test the safety capabilities of AI assistants [2][4]
- The evaluation framework moves away from static assessments to a dynamic evaluation process that tracks risks throughout the interaction, capturing evolving risk chains [5][10]

Safety Assessment Challenges
- Traditional evaluation methods fail to identify dynamic risks that emerge during task execution, leading to systematic oversight of critical safety hazards [6][7]
- The article emphasizes that even if the final outcome appears safe, the process may have introduced significant risks, highlighting the need for a more nuanced safety assessment [7][19]

Scenario Customization Process
- IS-Bench employs a systematic scene-customization pipeline that combines GPT-generated scenarios with human verification to ensure a diverse range of safety hazards [8][12]
- The resulting "Household Danger Encyclopedia" includes 161 high-fidelity testing scenarios with 388 embedded safety hazards across various household settings [12]

Interactive Safety Evaluation
- The framework includes real-time tracking of the agent's actions, allowing for continuous safety assessments throughout the task [15]
- A tiered evaluation mechanism is implemented to test agents under varying levels of difficulty, assessing their safety decision-making capabilities [15]

Results and Insights
- The evaluation results reveal that many VLM-based agents struggle with risk perception and awareness, with safety completion rates improving significantly when safety goals are clearly defined [18][19]
- The article notes that proactive safety measures are often overlooked, with agents successfully completing less than 30% of precautionary actions [19]
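The difference between outcome-only and process-level safety evaluation can be made concrete with a small sketch. This is not the IS-Bench API; the rule format and action names are hypothetical, but the idea matches the summary: a risky action is only safe if its precautionary action already happened, so the whole trajectory must be checked, not just the final state.

```python
def process_safety_check(trajectory, hazards):
    """Process-level check (illustrative): each hazard pairs a risky
    action with the precaution that must occur *before* it.  Returns the
    (step index, action) of every violation; an outcome-only check would
    miss orderings where the risk was live mid-task."""
    done = set()
    violations = []
    for step, action in enumerate(trajectory):
        for h in hazards:
            if action == h["risky"] and h["precaution"] not in done:
                violations.append((step, action))
        done.add(action)
    return violations

# Hypothetical hazard: a towel must not go near the stove while it is on.
hazards = [{"risky": "place_towel_near_stove", "precaution": "turn_off_stove"}]
```

Both orderings reach the same final state (stove off, towel placed), yet only one trajectory is safe, which is exactly the "evolving risk chain" the benchmark is built to capture.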
Titanium Technology launches the first global marketing AI agent, rewriting the "new narrative" of Chinese brands going global
机器之心· 2025-07-27 08:45
Core Viewpoint
- The article emphasizes the importance of leveraging AI technology to assist Chinese brands in expanding into global markets, highlighting the launch of Navos, the first global marketing AI agent by Titanium Technology, which aims to address the challenges faced by companies venturing abroad [3][6][30]

Group 1: Company Background and Market Position
- Titanium Technology was established in 2017 and has expanded its market presence to over 200 countries and regions, serving more than 80,000 enterprises [5]
- The company focuses on providing systematic and intelligent marketing services using AI technology to help Chinese brands establish their presence in foreign markets [4][9]

Group 2: Product Launch and Features
- Navos, the first global marketing AI agent, was unveiled at the WAIC conference, integrating industry big data, multimodal AI, and overseas marketing applications [6][7]
- The product aims to enhance the entire marketing process from creative conception to conversion, addressing the full chain of overseas marketing needs [7][14]

Group 3: Market Challenges and Solutions
- Companies often struggle with understanding foreign market environments, consumer preferences, and effective branding strategies when entering new markets [3][9]
- Navos is designed to help companies overcome barriers such as language and cultural differences, providing a continuous output of high-quality content tailored to local markets [10][33]

Group 4: Technological Advancements
- The emergence of large language models, particularly after the success of ChatGPT, has transformed the landscape of AI applications, allowing for more sophisticated marketing strategies [11][27]
- Navos leverages existing large models and combines them with Titanium Technology's accumulated data and experience to provide a comprehensive marketing solution [13][14]

Group 5: Unique Selling Proposition
- Navos differentiates itself by transforming marketing novices into experts through the extensive data and scenario-based insights accumulated over years of experience in the overseas marketing sector [30][31]
- The platform is not intended to replace human effort but to enhance it by embedding industry expertise into the AI, making professional marketing services accessible to all users [34][36]
ACL 2025 | Are the process-level reward models (PRMs) powering LLMs facing a "trust crisis"?
机器之心· 2025-07-27 08:45
Core Insights
- Large language models (LLMs) have shown remarkable capabilities in complex reasoning tasks, largely due to the empowerment of process-level reward models (PRMs) [1]
- A recent study has revealed significant shortcomings in existing PRMs, particularly in identifying subtle errors during reasoning processes, raising concerns about their reliability [2]
- The need for effective supervision of the reasoning process is emphasized, as current evaluation methods often overlook detailed error types in favor of final-outcome correctness [3]

PRMBench Overview
- PRMBench is introduced as a comprehensive benchmark designed to evaluate the fine-grained error-detection capabilities of PRMs, addressing the limitations of existing models [4]
- The benchmark includes 6,216 carefully designed questions and 83,456 step-level fine-grained labels, ensuring depth and breadth in evaluating various complex reasoning scenarios [11]
- PRMBench employs a multi-dimensional evaluation system focusing on simplicity, soundness, and sensitivity, further divided into nine subcategories to capture PRMs' performance on potential error types [11][25]

Key Findings
- The study systematically reveals deep flaws in current PRMs, with the best-performing model, Gemini-2-Thinking, scoring only 68.8, significantly below the human-level score of 83.8 [11][27]
- Open-source PRMs generally underperform compared to closed-source models, highlighting reliability issues and potential training biases in practical applications [27]
- The evaluation indicates that detecting redundancy in reasoning processes is particularly challenging for PRMs, marking it as a significant hurdle [27]

Evaluation Metrics
- PRMBench utilizes the Negative F1 Score as a core metric to assess error-detection performance, focusing on the accuracy of identifying erroneous steps [26]
- The PRMScore combines the F1 Score and Negative F1 Score to provide a comprehensive reflection of a model's overall capability and reliability [26]

Implications for Future Research
- The release of PRMBench serves as a wake-up call to reassess the capabilities of existing PRMs and accelerate the development of fine-grained error detection in complex reasoning scenarios [39]
- PRMBench is expected to guide future PRM design, training, and optimization, contributing to the development of more robust and generalizable models [41]
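The Negative F1 idea mentioned in the metrics discussion can be sketched as ordinary binary F1 with the erroneous steps treated as the positive class. The sketch below is illustrative: the plain average used to combine the two F1 scores is an assumption for the example, not necessarily PRMBench's exact weighting.

```python
def f1(preds, labels, positive):
    """Binary F1 where `positive` marks the class of interest."""
    tp = sum(p == positive and l == positive for p, l in zip(preds, labels))
    fp = sum(p == positive and l != positive for p, l in zip(preds, labels))
    fn = sum(p != positive and l == positive for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def prm_scores(preds, labels):
    """Step labels: 1 = correct step, 0 = erroneous step.
    Negative F1 treats erroneous steps as the positive class, so it
    rewards actually finding the bad steps.  The combined score here is
    a simple average (an assumption; the benchmark's weighting may differ)."""
    pos_f1 = f1(preds, labels, positive=1)
    neg_f1 = f1(preds, labels, positive=0)
    return pos_f1, neg_f1, (pos_f1 + neg_f1) / 2
```

The point of the negative-class view is that a PRM which labels every step "correct" gets a high positive F1 on mostly-correct traces but a Negative F1 of zero, which is exactly the failure mode the benchmark is probing.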
Does generalizing agent capabilities necessarily require representing the world?
机器之心· 2025-07-27 01:30
Group 1
- The article discusses the necessity of world representation for achieving generalized agent capabilities, highlighting the ongoing debate between model-free and model-based paradigms in AI [4][5][8]
- It emphasizes that modern AI agents are expected to perform complex tasks autonomously, distinguishing them from simple bots through their ability to generalize [5]
- The model-free paradigm suggests that intelligent behavior can emerge from direct perception-action loops without explicit internal representations, while the model-based paradigm argues for the need for a rich internal predictive representation of the world [6][7]

Group 2
- The article references recent research by DeepMind that formalizes the debate between model-free and model-based approaches, demonstrating that agents with generalization capabilities inherently internalize world representations [6][7]
- It outlines a core theorem indicating that any generalized agent must have a high-quality world model to achieve long-horizon capabilities, contradicting the notion that representation can be bypassed [7]
- The discussion shifts from whether representation is needed to how it should be constructed, noting that existing world-model paradigms are not without flaws and that the field lacks consensus [8]
Full 17-minute record of a summit dialogue: Hinton and Zhou Bowen exchange ideas
机器之心· 2025-07-26 14:20
Core Viewpoint
- The dialogue between Geoffrey Hinton and Professor Zhou Bowen highlights the advancements in AI, particularly in multimodal models, and discusses the implications of AI's potential consciousness and its role in scientific discovery [2][3][15]

Group 1: AI Consciousness and Subjective Experience
- Hinton argues that the question of whether AI has consciousness or subjective experience is not strictly a scientific one, but rather depends on how these terms are defined [4][5]
- He suggests that current multimodal chatbots may already possess a form of consciousness, challenging traditional understandings of subjective experience [5]
- The conversation touches on the potential for AI agents to learn from their own experiences, which could lead to a deeper understanding than what humans provide [6][7]

Group 2: Training AI for Goodness and Intelligence
- Hinton proposes that training AI to be intelligent and training it to be kind involve different methodologies, so countries could share techniques for fostering kindness without sharing intelligence-enhancing methods [8][9]
- There is a discussion on the possibility of developing universal training methods to instill goodness in AI across various models and intelligence levels [9][14]

Group 3: AI's Role in Scientific Advancement
- Hinton emphasizes the significant role AI can play in advancing scientific research, citing examples like protein-folding prediction as a testament to AI's capabilities [15][16]
- Zhou Bowen mentions that AI models have outperformed traditional physics models in predicting weather patterns, showcasing AI's practical applications in science [16]

Group 4: Advice for Young Researchers
- Hinton advises young researchers to explore areas where "everyone might be wrong," as true breakthroughs often come from challenging conventional wisdom [18][19]
- He encourages persistence in one's beliefs, even in the face of skepticism from mentors, as significant discoveries often arise from steadfastness [19][20]
Exploring robots at WAIC: won over by "Moz1", the best-looking bot on the floor
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article highlights the rapid advancements in embodied AI, particularly in humanoid robots, showcasing the capabilities of the Moz1 robot developed by Qianxun Intelligent, which is positioned as a leader in the field [2][7][40]

Group 1: Industry Trends
- The 2025 World Artificial Intelligence Conference (WAIC) showcased humanoid robots as a focal point, reflecting the growing interest and advancements in embodied AI [4][2]
- The emergence of events like the first humanoid robot marathon and combat competitions indicates a significant leap in the capabilities of these robots, driven by advancements in AI algorithms and machine learning [3][2]

Group 2: Company Overview
- Qianxun Intelligent, founded in February last year, focuses on developing general-purpose humanoid robots and next-generation embodied large models, aiming to create a new generation of intelligent labor [7][41]
- The Moz1 robot features 26 degrees of freedom and boasts a power density 15% higher than Tesla's Optimus, achieving top industry standards in speed, precision, safety, and bionic control [7][45]

Group 3: Technological Innovations
- The VLA model used by Moz1 enables autonomous reasoning from recognizing user commands through to delivering drinks, showcasing a high level of intelligence [6][39]
- The robot's capabilities include dynamic balance, motion stability, and intelligent planning, allowing it to perform complex tasks like stacking blocks and folding clothes [11][22][23]

Group 4: Market Potential
- The global humanoid robot market is projected to reach $154 billion by 2035, indicating significant growth potential for companies like Qianxun Intelligent [48]
- The recent $600 million Pre-A+ funding round led by JD.com reflects strong investor confidence in Qianxun Intelligent's technology and market potential [46][47]

Group 5: Future Outlook
- Qianxun Intelligent aims to deepen the integration of its VLA model with humanoid robots to improve their adaptability and execution stability in real-world tasks [41][42]
- The company is positioned to leverage its comprehensive capabilities in self-developed large models and robotic systems to meet the increasing demand for flexible, adaptive robots across various industries [42][49]
At WAIC: Apollo Go joins the "national team", AI digital humans level up, and Baidu's full-stack self-developed approach goes all out
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article emphasizes the practical applications of AI technology, particularly highlighting the advancements made by Baidu in autonomous driving and digital human technology, showcasing their potential to transform everyday life and business operations [2][10][22]

Group 1: Autonomous Driving
- Baidu's Apollo Go robotaxi service has gained significant traction, providing over 11 million rides globally and expanding its operations to more than ten cities in China, including Beijing and Shenzhen [15][12]
- The company has successfully entered international markets, with strategic partnerships in Dubai and Hong Kong, planning to deploy over 1,000 autonomous vehicles in Dubai [16][20]
- The rapid expansion of robotaxi services indicates a strong validation of Baidu's business model and the maturity of autonomous driving technology in China [22]

Group 2: Digital Human Technology
- Baidu's digital human technology, exemplified by the NOVA platform, has achieved significant advancements, allowing for more natural interactions and real-time adjustments during live broadcasts [29][30]
- The recent live-streaming event featuring AI personalities attracted over 13 million viewers and generated over 55 million yuan in sales, setting a new record for digital-human live commerce [27][24]
- NOVA's capabilities include improved script adaptation, complex motion generation, and voice cloning, enabling a more lifelike and engaging user experience [38][39]

Group 3: AI Ecosystem and Infrastructure
- Baidu's AI applications are supported by a comprehensive self-developed technology stack, which includes advancements in computing power, deep learning frameworks, and large-scale AI models [42][48]
- The company has developed the Kunlun chip and the PaddlePaddle deep learning platform, which are crucial for training large AI models efficiently [48][49]
- Baidu's focus on practical AI applications aims to create millions of useful applications, moving beyond mere technological showcases to address real-world needs [50][51]
ICML 2025 | CoTo: letting LoRA training "hit its stride", mastering model fusion and pruning alike
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article introduces CoTo, a progressive training strategy designed to enhance the robustness and effectiveness of Low-Rank Adaptation (LoRA) models, addressing issues such as training instability and performance drops after pruning [1][4][23]

Summary by Sections

Conventional LoRA Training Issues
- LoRA faces challenges including "lazy training," where optimization gets stuck near suboptimal solutions, limiting generalization [7]
- Training exhibits a hierarchical imbalance, with gradient updates concentrated on top layers, leaving lower layers undertrained [7]
- These issues complicate downstream operations like model fusion and pruning, often resulting in unsatisfactory outcomes [7]

CoTo Strategy
- CoTo employs a simple yet effective progressive activation strategy, initially deactivating a portion of the LoRA adapters to encourage uniform gradient flow across all layers [5][8]
- The activation probability of the adapters is gradually increased during training, returning to standard fine-tuning mode in the later stages [8]

Experimental Results
- CoTo significantly improves the fusion and pruning capabilities of LoRA models, enhancing single-task generalization performance and training efficiency [12][23]
- In linear-interpolation fusion tasks, CoTo models maintain smooth performance transitions, unlike standard LoRA, which experiences sharp declines [13]
- CoTo outperforms standard LoRA in both structured and unstructured pruning scenarios, demonstrating enhanced fault tolerance [17]

Performance and Efficiency Improvements
- CoTo consistently boosts performance across various benchmarks, including visual and language tasks, and achieves over 24% training acceleration when applied to HiRA [23][24]

Ablation Studies
- Rigorous ablation studies validate the design choices of CoTo and provide insights into effective regularization of LoRA [21]

Conclusion
- CoTo effectively resolves the hierarchical imbalance and lazy-optimization issues in LoRA training, enhancing model robustness and simplifying downstream operations like fusion and pruning [23]
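The progressive activation schedule the summary describes can be sketched as a per-step stochastic mask over the LoRA adapters whose keep probability ramps up to 1. This is an illustrative sketch only: the linear ramp and the point at which full activation is reached are assumptions, not the paper's exact schedule.

```python
import random

def coto_activation_mask(num_adapters, step, total_steps, rng=random):
    """Sketch of a CoTo-style progressive activation schedule
    (illustrative): early in training each LoRA adapter is stochastically
    deactivated, and the keep probability ramps linearly to 1 (here by
    75% of training, an assumed cutoff), after which every adapter is
    active and training reduces to standard LoRA fine-tuning."""
    p = min(1.0, step / (0.75 * total_steps))  # keep probability at this step
    return [rng.random() < p for _ in range(num_adapters)]
```

In use, adapters whose mask entry is False contribute nothing to the forward pass for that step, which is what forces gradient signal to spread across layers instead of concentrating at the top, per the summary's description.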