No more worries! Open-source agent Paper2Poster generates academic posters in one click
机器之心· 2025-06-06 09:12
Core Insights
- The article discusses Paper2Poster, a system designed to automatically generate academic posters from lengthy research papers using large language models (LLMs) [2][4][7].

Group 1: Paper2Poster Overview
- Paper2Poster aims to provide a complete framework for generating academic posters from research papers, addressing the challenges of condensing and reorganizing information [4][7].
- The system uses a multi-agent approach called PosterAgent, which breaks the poster-creation process into manageable tasks, improving efficiency and control [9][12].

Group 2: Challenges in Poster Generation
- The main challenges are compressing lengthy text while maintaining coherence, extracting multimodal information (text, images, tables), and planning the layout effectively [11][12].
- A poster must be both visually appealing and informative, conveying the essence of the research, which underscores the complexity of the task [7][11].

Group 3: PosterAgent Methodology
- PosterAgent consists of three main components: a Parser for content extraction, a Planner for layout design, and a Painter-Commenter for visual optimization [10][14].
- The Parser extracts structured information from the paper, the Planner organizes it into a coherent layout, and the Painter-Commenter iteratively refines the visual presentation [14][19].

Group 4: Evaluation Metrics
- A benchmark dataset was created to evaluate the generated posters on visual quality, textual coherence, and overall quality [14][15].
- The PaperQuiz method measures how well a poster conveys information: questions are generated from the original paper and answered using only the poster [15][16].

Group 5: Comparative Results
- PosterAgent outperforms other methods, including GPT-4o-based ones, in clarity, structure, and visual appeal [19][21].
- The PosterAgent-Qwen variant, built on open-source models, showed superior performance across evaluation metrics compared to closed-source alternatives [21][22].

Group 6: Cost and Accessibility
- Generating a poster from a 22-page paper costs approximately $0.005, making it a cost-effective solution for researchers [24].
- The complete code, model weights, and dataset have been open-sourced, enabling broader access and further development [23][24].

Group 7: Future Directions
- Future work may improve the visual appeal and creativity of AI-generated posters and explore human-AI collaboration in poster design [25][26].
- Broader applications of AI in academic dissemination, such as automatic paper review and research assistance, are highlighted as areas for exploration [27][28].
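The Parser → Planner → Painter-Commenter decomposition above can be sketched as a minimal pipeline. All class and method names here are illustrative stand-ins, not the project's actual API; the real system drives each stage with an LLM, while this sketch uses trivial string operations to show the control flow.

```python
# Hypothetical sketch of the three-stage PosterAgent pipeline: parse the
# paper into assets, plan a panel layout, then iteratively render/critique.

class Parser:
    """Extracts structured content (sections, figures, tables) from a paper."""
    def parse(self, paper_text):
        # The real system uses an LLM; splitting on blank lines is a stand-in.
        sections = [s for s in paper_text.split("\n\n") if s.strip()]
        return {"sections": sections}

class Planner:
    """Maps extracted content to a grid of poster panels."""
    def plan(self, assets, n_columns=3):
        panels = []
        for i, sec in enumerate(assets["sections"]):
            panels.append({"col": i % n_columns, "row": i // n_columns,
                           "content": sec})
        return panels

class PainterCommenter:
    """Iteratively renders panels and critiques them until they fit."""
    def refine(self, panels, max_rounds=3, max_chars=80):
        for _ in range(max_rounds):
            # The "commenter" flags overflowing panels; the "painter" shortens them.
            overflow = [p for p in panels if len(p["content"]) > max_chars]
            if not overflow:
                break
            for p in overflow:
                p["content"] = p["content"][:max_chars - 3] + "..."
        return panels

def paper_to_poster(paper_text):
    assets = Parser().parse(paper_text)
    panels = Planner().plan(assets)
    return PainterCommenter().refine(panels)
```

The key design idea the article attributes to PosterAgent is exactly this separation of concerns: each stage has a narrow contract, so failures can be localized and the refine loop can run until the critique passes.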
15% gains on AIME24/25 from 10 lines of code! Demystifying the entropy mechanism in LLM reinforcement learning
机器之心· 2025-06-05 07:14
Core Insights
- The article analyzes the entropy-collapse problem in reinforcement learning for large language models (LLMs) and proposes solutions that preserve exploration during training [3][5][24].

Group 1: Entropy Collapse in Reinforcement Learning
- The core challenge in reinforcement learning is the trade-off between exploitation and exploration, with policy entropy serving as a key indicator of exploration potential [4].
- A significant finding is that policy entropy drops to near zero within a few training steps, signaling a loss of exploration ability that leads to performance stagnation [4][5].
- A quantitative analysis of the relationship between policy entropy and downstream performance shows that, absent entropy interventions, performance is entirely determined by policy entropy [4][5].

Group 2: Mechanisms Behind Entropy Changes
- The study identifies the drivers of policy-entropy change during reinforcement learning, focusing on the covariance between action probabilities and their corresponding advantages [5][13].
- High-probability, high-advantage actions reduce policy entropy, while rare high-advantage actions increase it [13][17].

Group 3: Proposed Solutions for Enhancing Entropy
- The article introduces two simple yet effective entropy-enhancing reinforcement learning strategies, Clip-Cov and KL-Cov, which can be implemented with minimal code changes [5][22].
- Experiments show significant gains: a 6.4% improvement on Qwen2.5-32B and up to 15% on challenging datasets such as AIME24/25 [22][24].
- Maintaining exploration capability is essential for scalable reinforcement learning; merely adding compute yields limited benefit if the entropy bottleneck is not addressed [7][24].
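The covariance mechanism above suggests a direct intervention: identify the few tokens whose (log-probability, advantage) covariance contribution is largest and exclude them from the policy-gradient update. The following is a minimal sketch of that Clip-Cov idea; the clip ratio and the centering scheme are illustrative choices, not the paper's exact hyperparameters.

```python
import numpy as np

def clip_cov_mask(logprobs, advantages, clip_ratio=0.002):
    """Sketch of Clip-Cov: return a 0/1 mask that drops the gradient on the
    small fraction of tokens with the largest per-token contribution to
    Cov(log pi, A), since those tokens drive entropy collapse.

    logprobs:   per-token log-probabilities under the current policy
    advantages: per-token advantage estimates
    """
    # Per-token contribution to the covariance: centered product.
    contrib = (logprobs - logprobs.mean()) * (advantages - advantages.mean())
    # Number of tokens to clip (at least one).
    k = max(1, int(len(logprobs) * clip_ratio))
    # The k largest contributions are masked out (0); all others keep
    # their gradient (1).
    cutoff = np.partition(contrib, -k)[-k]
    return (contrib < cutoff).astype(np.float32)
```

In training, this mask would multiply the per-token policy-gradient loss; KL-Cov, per the article, instead applies a KL penalty to the same high-covariance tokens.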
Just announced: the latest ACM Doctoral Dissertation Award
机器之心· 2025-06-05 07:14
Core Viewpoint
- The article covers the 2024 ACM Doctoral Dissertation Award, highlighting the winning research on human-AI collaboration in mental health support, which addresses the global shortage of professional psychologists amid rising mental health issues [2][3][4].

Group 1: Awarded Research
- The winning dissertation, by Ashish Sharma, focuses on improving the accessibility and quality of mental health support through human-AI collaboration, particularly given the shortage of trained professionals [4][8].
- Sharma's AI-assisted mental health tool has over 160,000 users, more than 50% of whom come from households earning less than $40,000 annually [5].
- The research includes a randomized trial with 300 peer supporters, demonstrating that AI feedback can enhance empathetic communication in conversations [10].

Group 2: Honorable Mentions
- Two additional dissertations received honorable mentions: one uses pseudorandom distributions to reveal inherent computational limitations of low-complexity models, while the other examines how large language models use their vast training data [5][19].
- The first honorable-mention dissertation, by Alexander Kelley, constructs explicit pseudorandom distributions for restricted models of computation [16].
- The second, by Sewon Min, examines data usage in large language models, emphasizing their in-context learning capabilities and the development of nonparametric language models [19][21].
Big news! Full schedule for the 2025 BAAI Conference released: a full lineup of global AI pioneers assembles
机器之心· 2025-06-05 04:40
The following article is from 智源社区 (BAAI Community), by 智源社区: inheriting the rigor and system of academic publishing while offering the timeliness and diversity of news reporting; building a medium for exchange among experts and using facts to inform public understanding of AI.

On June 6-7, 2025, the 7th Beijing BAAI Conference will be held in a combined online and offline format, featuring talks by 4 Turing Award laureates, sharing sessions from over 30 AI company founders and CEOs, and reports from over 100 young scientists worldwide. Across the two days, more than 180 AI-themed talks will chart a course for the future of AI at the intersection of debate and evidence. Registration is now open.

Countdown to the Beijing BAAI Conference: 1 day

2025 Beijing BAAI Conference Full Schedule

Offline venue: Zhongguancun National Independent Innovation Demonstration Zone Exhibition Center, Beijing
Livestream link: https://2025.baai.ac.cn/
2025 BAAI Conference schedule: https://2025.baai.ac.cn/schedule

This year's conference features 20 thematic forums and more than 180 talks and panel discussions.

June 6: Opening ceremony and plenary session; LLM industry CEO forum; NeuroAI: neural intelligence; autonomous agents; AI + science, engineering & medicine; AI systems and open source; AI for Industry; brain-inspired large models; InnoVibe co-creation activities

June 7: 具 ...
ICML 2025 | Quwan develops new facial animation technology: precise expression control via voice + instructions
机器之心· 2025-06-05 04:40
Core Viewpoint
- The article presents Playmate, an innovative framework from Guangzhou Quwan Technology that uses AI to generate high-quality, controllable portrait-animation videos from audio input and images [1][3].

Group 1: Technology Overview
- Playmate is a two-stage training framework built on a 3D implicit-space-guided diffusion model, designed to generate high-quality, controllable portrait animation [3].
- The framework decouples facial attributes such as expressions, lip movements, and head poses, enabling precise control over the generated videos [3][12].
- Playmate shows significant advances in video quality, lip-sync accuracy, and flexibility of emotional control compared with existing methods [3][28].

Group 2: Methodology
- The core idea is to use a 3D implicit space to decouple facial attributes and achieve high-quality generation through a two-stage training framework [13].
- The first stage builds a motion-decoupling module that separates expressions, lip movements, and head poses directly from audio [16].
- The second stage introduces an emotion-control module that encodes emotional conditions into the latent space, enabling fine-grained emotional control over the generated videos [16][22].

Group 3: Performance Evaluation
- Evaluated on datasets including AVSpeech and CelebV-Text, Playmate achieves superior FID and FVD scores, indicating that its generated videos are closer to real data distributions [28].
- In qualitative assessments, Playmate excels at generating realistic expressions and natural head movements across different portrait styles, showcasing its versatility and robustness [28][31].
- The framework can generate videos reflecting different emotional states from the same audio segment, highlighting its strength in emotional control [31].

Group 4: Future Prospects
- Playmate significantly enhances the quality and flexibility of audio-driven portrait animation, providing strong technical support for film production, virtual reality, and interactive media [33].
- Future work may extend to full-body animation and incorporate more diverse training data to improve robustness and adaptability [33].
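The two-stage decoupling described above can be illustrated with a toy conditioning scheme: stage 1 produces separate motion codes per facial attribute, and stage 2 injects an emotion embedding into the expression code only, leaving lip sync and head pose untouched. Every name, shape, and operation here is a hypothetical stand-in; the real Playmate is a diffusion model, not these linear maps.

```python
import numpy as np

def stage1_decouple_motion(audio_feat, rng):
    """Stage-1 sketch: map audio features to *separate* motion codes so that
    expression, lip motion, and head pose can be controlled independently.
    Random matrices stand in for learned projection heads."""
    return {
        "expression": audio_feat @ rng.standard_normal((64, 16)),
        "lip":        audio_feat @ rng.standard_normal((64, 16)),
        "head_pose":  audio_feat @ rng.standard_normal((64, 6)),
    }

def stage2_inject_emotion(motion_codes, emotion_id, n_emotions=8, dim=16):
    """Stage-2 sketch: encode a discrete emotion label into the latent space
    and add it to the expression code only, so changing the emotion does not
    disturb lip sync or pose."""
    emotion_table = np.eye(n_emotions)          # one-hot embedding stand-in
    emotion_vec = np.resize(emotion_table[emotion_id], dim)
    out = dict(motion_codes)
    out["expression"] = out["expression"] + emotion_vec
    return out
```

The point of the sketch is the interface: because the codes are decoupled in stage 1, the emotion control in stage 2 can touch exactly one of them.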
A real web-search agent: 7B rivals full-strength R1 as Huawei's Pangu DeepDiver offers a new approach to open-domain information seeking
机器之心· 2025-06-05 04:40
Published by the 机器之心 editorial team

Large language models (LLMs) are evolving rapidly, but internalizing up-to-date knowledge in real time remains a challenge. How can a model autonomously decide on strategies for acquiring external knowledge when facing complex, knowledge-intensive questions?

Huawei's Noah's Ark Lab research team proposes the Pangu DeepDiver model, which uses Search Intensity Scaling to realize a new paradigm of autonomous interaction between an LLM and a search engine, enabling the Pangu 7B model to approach DeepSeek-R1 (roughly a hundred times its parameter count) in open-domain information seeking and to outperform concurrent work such as DeepResearcher and R1-Searcher!

Paper link: https://github.com/pangu-tech/pangu-ultra/blob/main/pangu-deepdiver-report.pdf
arXiv link: https://arxiv.org/abs/2505.24332

The study's main findings: (1) For complex information-seeking tasks, end-to-end agentic RL training achieves Search Intensity Scaling better than directly distilling teacher trajectories, yielding an average improvement of 10 percentage points; (2) based on real ...
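The autonomous LLM-search interaction that DeepDiver trains can be sketched as a simple decide-search-answer loop: at each turn the model either issues another query or commits to an answer, and "search intensity" is how many queries it chooses to issue. All names below (`llm_step`, `search`) are illustrative stand-ins, not DeepDiver's actual interfaces.

```python
# Hypothetical sketch of an agentic search loop: the policy decides, turn by
# turn, whether to search again or to answer from gathered evidence.

def search(query):
    """Stand-in search engine: returns canned snippets keyed by query."""
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "No results.")

def llm_step(question, evidence):
    """Stand-in policy: search until some evidence is gathered, then answer.
    A trained model would emit these actions itself; RL training scales how
    intensively it searches before committing."""
    if not evidence:
        return ("SEARCH", question)
    return ("ANSWER", evidence[-1].split(" is ")[0])

def agent_loop(question, max_turns=4):
    evidence = []
    for _ in range(max_turns):
        action, arg = llm_step(question, evidence)
        if action == "SEARCH":
            evidence.append(search(arg))
        else:
            return arg
    return None
```

The article's claim is that training this decision loop end-to-end with RL, rather than distilling teacher trajectories, is what lets a 7B model modulate its search intensity to match question difficulty.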
OpenAI finally publishes a "proper" paper: linear layouts for efficient tensor computation
机器之心· 2025-06-05 02:00
Core Viewpoint
- OpenAI has reduced how often it publishes research papers, focusing instead on practical implementations and optimizations in its models, as evidenced by its recent paper on linear layouts for efficient tensor computation [2][4].

Group 1: Research Publication Trends
- OpenAI's arXiv output has been limited, reflecting a cautious approach to publicizing research findings, likely due to commercial confidentiality and security concerns [2][4].
- The recent paper, "Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using F₂", introduces a new algebraic framework for tensor mapping that addresses long-standing challenges in deep learning compilers [2][4].

Group 2: Tensor Layouts and Their Importance
- Tensor layouts define the mapping between logical tensors and hardware resources, which is crucial for performance in modern deep learning workloads [5][7].
- The rapid evolution of deep learning hardware has made tensor layouts increasingly complex, requiring new modeling methods that accommodate diverse architectures [7][9].

Group 3: Challenges in Current Layout Systems
- Existing tensor-layout systems struggle to meet performance requirements, leading to inefficiencies and bugs, particularly in low-level backends such as Triton [8][40].
- Key challenges include efficiency, flexibility, composability, and the ability to scale without hardcoded rules [8][9].

Group 4: Introduction of Linear Layouts
- Linear layouts provide a unified, composable representation for tensor mapping, easing layout transformations and integration with low-level hardware optimizations [22][28].
- The paper outlines the definitions and constructions of linear layouts, emphasizing their potential to streamline tensor operations and reduce bugs in layout conversions [28][35].

Group 5: Performance Evaluation of Triton-Linear
- OpenAI compared Triton with and without the linear-layout optimizations across various hardware platforms, demonstrating significant performance improvements [36][41].
- On the GH200 platform, Triton-Linear achieved speedups ranging from 0.92x to 1.57x, with an average speedup exceeding 1.0x across all benchmarks [41][42].
- Gains were especially notable on benchmarks such as int4_gemm and layer_norm, showcasing the effectiveness of the new layout system [42][43].
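The algebra behind the paper's title can be illustrated concretely: a linear layout treats an index as a vector of bits and a layout as a matrix over F₂ (the two-element field), so composing two layouts reduces to matrix multiplication mod 2. This toy sketch shows only the algebraic idea, not Triton's actual implementation.

```python
import numpy as np

def to_bits(x, n):
    """Little-endian bit vector of an n-bit integer index."""
    return np.array([(x >> i) & 1 for i in range(n)])

def from_bits(bits):
    return int(sum(int(b) << i for i, b in enumerate(bits)))

def apply_layout(matrix, x, n):
    """Map an n-bit input index through a layout matrix over F2."""
    return from_bits(matrix @ to_bits(x, n) % 2)

def compose(a, b):
    """Layout composition is just the F2 matrix product."""
    return a @ b % 2

# Example layout on 3-bit indices: swap the two low-order bits
# (a permutation matrix, one of the simplest linear layouts).
swap_low = np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 0, 1]])
```

Because composition, inversion, and equality checks all become ordinary linear algebra over F₂, layout conversions can be derived mechanically instead of being hardcoded case by case, which is the robustness claim summarized above.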
ACL 2025 | Token-budget-aware efficient reasoning for large language models
机器之心· 2025-06-05 02:00
The authors of this article are from Nanjing University, Rutgers University, and the University of Massachusetts Amherst. First author Tingxu Han (韩廷旭) and co-first author Zhenting Wang (王震霆) are PhD students at Nanjing University and Rutgers University respectively, focusing on LLM reasoning and safe, responsible generative AI. The corresponding author is Professor Chunrong Fang (房春荣) of Nanjing University.

As large language model (LLM) technology advances, reasoning-enhancement methods such as Chain-of-Thought (CoT) have been proposed to improve performance on complex tasks like math problem solving and logical question answering, effectively raising accuracy by guiding the model to think step by step.

However, these methods bring a new challenge: the intermediate reasoning the model generates is often lengthy, producing many redundant tokens and significantly increasing inference-time compute and resource consumption. As LLMs move toward real-world deployment, controlling cost while preserving reasoning ability has become a core obstacle to large-scale adoption.

To resolve this tension, a research team from Nanjing University, Rutgers University, and UMass Amherst recently proposed TALE, a token-budget-aware LLM reasoning framework that aims to preserve reasoning accuracy while substantially compressing output length and reducing compute cost.

TALE's core idea is to introduce a "token budget" as a constraint on the reasoning process, guiding the model to complete effective reasoning within the given token budget ...
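The budget mechanism described above can be sketched in two parts: state the budget in the prompt so the model plans around it, and hard-stop decoding once the budget is exhausted. The prompt wording and the stopping rule here are illustrative assumptions in the spirit of TALE, not its exact design.

```python
# Minimal sketch of token-budget-aware reasoning: a budgeted prompt plus a
# decode loop that cuts off generation at the budget.

def budgeted_prompt(question, budget):
    """Embed the token budget in the instruction so the model can plan a
    shorter chain of thought (hypothetical wording)."""
    return (f"{question}\n"
            f"Think step by step, but use at most {budget} tokens for your "
            f"reasoning before giving the final answer.")

def decode_with_budget(generate_step, prompt, budget):
    """Greedy decode loop that hard-stops at `budget` tokens.
    `generate_step(prompt, produced)` returns the next token, or None when
    the model finishes on its own."""
    produced = []
    while len(produced) < budget:
        tok = generate_step(prompt, produced)
        if tok is None:          # model stopped within budget
            break
        produced.append(tok)
    return produced
```

The two levers are complementary: the prompt-side budget encourages concise reasoning, while the decode-side cap guarantees the cost bound even when the model ignores the instruction.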
Ushering in the era of AI self-evolution: Princeton's Alita upends traditional general-purpose agents and brings the GAIA leaderboard toward its endgame
机器之心· 2025-06-04 09:22
Agent technology is advancing rapidly, but many existing general-purpose agents still rely heavily on manually predefined tool libraries and workflows, which greatly limits their creativity, scalability, and generalization.

Recently, the Princeton University AI Lab released Alita, a general-purpose agent built on the philosophy that "extreme simplicity is the ultimate sophistication": through a design paradigm of "minimal predefinition" and "maximal self-evolution", the agent can autonomously think, search, and create the MCP tools it needs.

Alita has achieved 75.15% pass@1 and 87.27% pass@3 on the GAIA validation benchmark, surpassing well-known agents such as OpenAI Deep Research and Manus to become a new benchmark for general-purpose agents. Alita also reaches 72.43% pass@1 on the GAIA test set.

Minimal architecture, maximal self-evolution

"Let the agent create MCP tools autonomously instead of relying on manual predefinition" is Alita's core design philosophy. Existing mainstream agent systems typically depend on large numbers of manually predefined tools and complex workflows, an approach with three key flaws:

Limited coverage: General-purpose agents face an enormous variety of real-world tasks; predefining every tool that might be needed is neither feasible nor realistic. Moreover, predefined tools easily overfit GAI ...