Reinforcement Learning
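Lag, lag, lag, lag, lag... Macaron really does lag, but its attitude really is good
36Kr (36氪) · 2025-08-23 09:06
A Personal Agent leans toward private, personalized needs and empowers the individual through a deep understanding of the user. This seems to be a topic that has drawn a lot of attention recently: Meta CEO Mark Zuckerberg has talked about Personal Superintelligence, and Nick Turley, head of product for ChatGPT, has also said that GPT's goal is to build something that "understands you, can act, and builds a relationship" (exactly how they define it is unclear; they used to call it a super assistant, then decided the word "assistant" was too limiting). Either way, the end goal can be summed up as becoming the user's closest intelligent companion or assistant through an all-round record of the individual.
The following article is from 未来人类实验室 (ID: LabforAI), an account under 36Kr. By 马渝囝 and 巴芮 | Editor: 巴芮 | Cover image: AI-generated
What it lacks in ability, it makes up for in attitude.
A new Agent called Macaron (马卡龙) launched recently and has caught on a little. The concept is new as well: it is defined as a Personal Agent, and the world's first one at that. In its official promotion it said it would declare war on all Productivity Agents: "They offer convenience but leave you exhausted. Macaron brings you rebirth, born not for output but for you ...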
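OTC-PO Released | Unveiling the Mystery of o3: Getting Agents to Use Fewer Tools and Think More!
Synced (机器之心) · 2025-05-07 04:34
Hongru Wang (王鸿儒) is a fourth-year PhD student at The Chinese University of Hong Kong (expected to graduate this July), advised by Prof. Kam-Fai Wong (黄锦辉). His research focuses on dialogue systems, tool learning, and large language model agents. He has been a visiting scholar at the University of Edinburgh and the University of Illinois Urbana-Champaign (UIUC), and has published more than 30 related papers at top international conferences such as NeurIPS, ACL, and EMNLP, including more than 10 as first or co-first author. Representative works include Cue-CoT, SAFARI, AppBench, Self-DC, and OTC, with more than 600 Google Scholar citations. He serves as a NeurIPS Area Chair and as a reviewer for several top international conferences, and is a founding member of the NICE community. His honors include the Best Paper Award at the International Doctoral Forum, the ACL 2024@SIGHAN Best Paper Award, and first place in the WWW2024 Online Safety Prize Challenge.
An Agent is an intelligent agent or assistant that automates specific tasks for humans. It can reason autonomously, interact with an environment, and collect feedback from both the environment and humans in order to complete a given task; recent high-profile examples include Manus and OpenAI's o3, among other models and frameworks. Reinforcement Learning is regarded as the most imaginative approach available today and the one best suited to Agent self ...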
Surpassing YOLOv3 with a Multimodal LLM! Reinforcement Learning Breaks the Limits of Multimodal Perception | Open Source
QbitAI (量子位) · 2025-05-03 04:05
Core Insights
- The article discusses the introduction of Perception-R1 (PR1), a groundbreaking multimodal large language model (MLLM) that surpasses previous models like YOLOv3 and Faster-RCNN by achieving over 30 AP on the COCO2017 validation set [1][16].
Group 1: Introduction of Perception-R1
- Perception-R1 is developed by research teams from Huazhong University of Science and Technology, Beijing University of Posts and Telecommunications, and others, focusing on enhancing visual reasoning through rule-based reinforcement learning (RL) [1][5].
- The model aims to improve capabilities in pure visual tasks such as counting and general object detection, as well as visual-language tasks like grounding and OCR [1][4].
Group 2: Importance of Visual Perception in AI
- The article emphasizes the need for a revolution in AI visual perception, highlighting the rapid advancements in AI's ability to understand visual information, which is crucial for applications ranging from autonomous driving to medical diagnostics [3][4].
- It points out the subtle differences between recognizing objects and understanding their interactions in detail, indicating that current MLLMs often struggle with complex visual reasoning tasks [4].
Group 3: Role of Reinforcement Learning
- The rise of reinforcement learning, particularly techniques like RLHF (Reinforcement Learning from Human Feedback) and rule-based RL, is noted as a transformative factor for language models, prompting the development of Perception-R1 [5][6].
- The article raises the question of whether RL can similarly enhance MLLM's visual perception capabilities, suggesting that early attempts have shown promise but not universal success [6].
Group 4: Perception-R1 Framework
- Perception-R1 is not a new MLLM from scratch but a post-training framework designed to significantly enhance the visual perception abilities of existing capable MLLMs [7].
- It employs a technique called Group Relative Policy Optimization (GRPO) to optimize the perception strategy, which is crucial for improving visual task performance [9].
Group 5: Reward Engineering
- The article discusses the importance of reward modeling in reinforcement learning, where the reward function guides the learning process by quantifying the model's performance on visual tasks [11].
- Perception-R1's reward structure includes extracting relevant visual details, executing logical operations based on visual understanding, and generating outputs in the correct format [11][17].
Group 6: Experimental Results
- Perception-R1's performance is evaluated against strong benchmarks and specialized models, demonstrating significant improvements in visual counting and object detection tasks [16][19].
- For instance, in visual counting tasks, Perception-R1 achieved 78.1 on Pixmo-Count, outperforming other models [19].
Group 7: Scalability and Future Implications
- The article concludes that Perception-R1 lays a critical foundation for future advancements in intelligent AI visual perception, suggesting that its principles could play a key role in developing next-generation perceptual AI systems [24][25].
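To make the GRPO and reward-engineering points under Group 4 and Group 5 more concrete, here is a minimal Python sketch of how a rule-based reward and a group-relative advantage could fit together for a single-box grounding task. It is an illustration under stated assumptions, not Perception-R1's actual implementation: the `(x1,y1),(x2,y2)` output format, the "format reward plus IoU" scoring, and the function names `iou`, `rule_based_reward`, and `group_relative_advantages` are all hypothetical. Only the normalization step (subtract the group mean, divide by the group standard deviation) reflects how GRPO is generally described.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format for one predicted box: "(x1,y1),(x2,y2)".
BOX_PATTERN = re.compile(r"^\((\d+),(\d+)\),\((\d+),(\d+)\)$")


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0, a[2] - a[0]) * max(0, a[3] - a[1])
    area_b = max(0, b[2] - b[0]) * max(0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rule_based_reward(response, target_box):
    """Assumed reward: 1.0 for a well-formed answer plus its IoU with the target.

    A response that does not match the expected format earns 0.0.
    """
    match = BOX_PATTERN.match(response.strip())
    if match is None:
        return 0.0
    predicted = tuple(int(g) for g in match.groups())
    return 1.0 + iou(predicted, target_box)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, variants exist
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same prompt, scored against one target box.
target = (0, 0, 10, 10)
responses = ["(0,0),(10,10)", "(0,0),(5,10)", "no box found"]
rewards = [rule_based_reward(r, target) for r in responses]
advantages = group_relative_advantages(rewards)
```

With these inputs the rewards are 2.0, 1.5, and 0.0, so the exact match receives the largest advantage and the malformed answer the smallest. Because advantages are computed relative to the other samples for the same prompt, no separate learned value model is needed to provide a baseline, which is the usual motivation given for GRPO.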