Agent Frameworks

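Richard Yu (余承东) unveils "pure-blood" HarmonyOS 2.0! Feature demos draw widespread praise, and Android and Apple suddenly look less appealing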
QbitAI · 2025-06-20 08:53
Core Viewpoint - The article highlights the major updates in HarmonyOS 6, which embraces AI and intelligent agents across the system and marks a pivotal step in the operating system's evolution [2][60].
Group 1: AI Integration
- The Xiao Yi assistant gains a video-call capability, letting it interact in real time and describe the surroundings it is shown [3][26].
- AI enhances many system apps, including photo editing with AI-driven style effects and composition assistance built on more than 500,000 images [5][20][22].
- Xiao Yi has been upgraded to integrate Huawei's Pangu models and DeepSeek models, backed by a training dataset of 20 trillion tokens [14][15].
Group 2: Ecosystem Expansion
- More than 3,000 applications and meta-services are in accelerated development for deeper integration with HarmonyOS [9].
- The HarmonyOS 6 developer Beta introduces a new interconnected architecture, supporting more than 660 applications for a smoother and more innovative experience [8].
- More than 50 HarmonyOS smart agents have launched, including ones from popular apps such as Weibo and DingTalk, underscoring the ecosystem's growth [8][34].
Group 3: Cross-Device Connectivity
- The "One Touch Share" feature shares content seamlessly across devices; it now supports more than 50 applications and enables multi-device sharing without consuming mobile data [37][40].
- The system also supports reverse transfer, so users can send edited images back to their phones with little effort [54].
- A promotional video demonstrates the fluid transfer of media among various Huawei devices [56].
Group 4: Developer Engagement
- Numerous developers at the event showcased their contributions to the HarmonyOS ecosystem, indicating a robust collaborative environment [57][58].
- The article argues that HarmonyOS is not merely a replacement for Android but an OS designed for the AI era, emphasizing its distinctive capabilities and fully domestic development [59][60].
o3-pro clears Sokoban: nostalgic human mini-games become the new LLM benchmark
QbitAI · 2025-06-16 04:50
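By 克雷西, reporting from 凹非寺 | QbitAI (WeChat official account: QbitAI)
Sokoban, Tetris… these classic, nostalgic human games have now become LLM benchmarks too. o3-pro has just taken on both games and performed well, breaking straight through the benchmark's ceiling. Some netizens even argued that this suite is better suited than the LLM Arena for benchmarking large models.
Classic mini-games become a new benchmark
The two games o3-pro played come from a benchmark called Lmgame, which, as the name suggests, has large models play games. The Sokoban variant is adapted from the 1989 version; before o3-pro, its evaluation metric was the total number of boxes pushed onto target positions before the game ended. Specifically, the benchmark's Sokoban only went up to level six, which o3-pro cleared; the Tetris run was forcibly terminated, since o3-pro would otherwise simply have kept going. Compared with the previous SOTA, o3, o3-pro's scores doubled outright. This time, though, o3-pro cleared every level, rather like "scoring 100 only because the test is out of 100". There is no need to worry, however: the benchmark is updated dynamically. Half a month ago, the game maps in the GitHub repository had only four levels, and the original game has more than 50 levels. Before o3-pro's attempt, the best-performing ...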
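The Sokoban metric described above (boxes on target squares when the game ends) can be made concrete with a small sketch. It assumes the widely used XSB text notation for levels and a string of U/D/L/R moves; the article does not describe Lmgame's actual level format, move interface, or scoring code, so `parse_level`, `step`, and `boxes_on_goals` are hypothetical helpers, not the benchmark's API.

```python
from typing import Optional, Set, Tuple

# Sketch of a Sokoban "boxes on goals" score. Assumes XSB level notation
# ('#' wall, '.' goal, '$' box, '*' box on goal, '@' player, '+' player on goal)
# and U/D/L/R moves; Lmgame's real format and scorer may differ.

Pos = Tuple[int, int]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def parse_level(level: str) -> Tuple[Set[Pos], Set[Pos], Set[Pos], Pos]:
    """Read walls, goals, boxes and the player position from an XSB-style grid."""
    walls: Set[Pos] = set()
    goals: Set[Pos] = set()
    boxes: Set[Pos] = set()
    player: Optional[Pos] = None
    for r, row in enumerate(level.splitlines()):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            if ch in ".*+":
                goals.add((r, c))
            if ch in "$*":
                boxes.add((r, c))
            if ch in "@+":
                player = (r, c)
    if player is None:
        raise ValueError("level has no player")
    return walls, goals, boxes, player


def step(walls: Set[Pos], boxes: Set[Pos], player: Pos, move: str) -> Tuple[Set[Pos], Pos]:
    """Apply one move; push a box if it is directly ahead and the cell beyond is free."""
    dr, dc = MOVES[move]
    ahead = (player[0] + dr, player[1] + dc)
    if ahead in walls:
        return boxes, player  # walking into a wall does nothing
    if ahead in boxes:
        beyond = (ahead[0] + dr, ahead[1] + dc)
        if beyond in walls or beyond in boxes:
            return boxes, player  # box is blocked, so the move is ignored
        boxes = (boxes - {ahead}) | {beyond}
    return boxes, ahead


def boxes_on_goals(level: str, moves: str) -> int:
    """Hypothetical score: number of boxes sitting on goal squares after replaying `moves`."""
    walls, goals, boxes, player = parse_level(level)
    for move in moves:
        boxes, player = step(walls, boxes, player, move)
    return len(boxes & goals)


if __name__ == "__main__":
    level = "\n".join(["#######", "#@ $ .#", "#######"])
    print(boxes_on_goals(level, "RRR"))  # prints 1: the box is pushed onto the goal
```

Because a score like this can never exceed the number of boxes in the available levels, a model that clears every level hits the ceiling, which is the "out of 100" situation the article describes.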
Papers become posters in seconds! Open-source framework PosterAgent generates top-conference-quality academic posters in one click
QbitAI · 2025-06-03 07:59
Core Viewpoint - The article introduces PosterAgent, a tool that converts academic papers into visually appealing posters, and highlights its efficiency and effectiveness relative to existing methods such as GPT-4o [2][18].
Group 1: PosterAgent Overview
- PosterAgent can turn a 22-page paper into an editable ".pptx" poster for only $0.0045, using 87% fewer tokens than GPT-4o [2][36].
- The tool is built on the Paper2Poster framework, which establishes the first evaluation standard for academic posters and addresses gaps in assessing long-context, multi-modal compression [4][18].
Group 2: Evaluation Metrics
- Paper2Poster includes 100 pairs of AI-related papers and their corresponding posters, spanning subfields such as computer vision (19%), natural language processing (17%), and reinforcement learning (10%) [20].
- Evaluation covers four dimensions: visual quality, text coherence, overall assessment, and PaperQuiz, which simulates communication between authors and readers [22][23].
Group 3: PosterAgent Components
- The framework has three key components: a parser that extracts key content, a planner that organizes text and visuals, and a painter-commenter that generates and refines the poster layout [28][29].
- The system follows a top-down design so that the content stays coherent and aligned [25].
Group 4: Performance Comparison
- In comparative tests, PosterAgent achieved the highest figure relevance and high visual similarity to human-designed posters, with an average score of 3.72 from a vision-language model (VLM) judge [31][32].
- GPT-4o-image had the highest visual similarity but the lowest coherence, suggesting that its outputs look attractive but lack textual clarity [30][31].
Group 5: Cost Efficiency
- PosterAgent is highly cost-efficient: its variants require only 101.1K and 47.6K tokens, which translates to $0.55 (GPT-4o-based) or $0.0045 (Qwen-based) per poster [36].
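The parser, planner and painter-commenter stages listed under PosterAgent's components can be illustrated with a toy, offline sketch of the control flow. Every name here (`Panel`, `parse_paper`, `plan_layout`, `critique`, `paint_and_refine`) is hypothetical and not taken from PosterAgent's code. The LLM and VLM calls are replaced by simple heuristics: the first sentence of each section stands in for the parser's summary, and a "characters × font size" budget stands in for the commenter's visual check.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy, offline sketch of a parser -> planner -> painter-commenter flow.
# All names are hypothetical (not PosterAgent's API); LLM/VLM calls are
# replaced by simple heuristics so only the control flow is illustrated.


@dataclass
class Panel:
    section: str
    text: str
    figures: List[str] = field(default_factory=list)
    font_size: int = 28


def parse_paper(sections: Dict[str, str], figures: Dict[str, List[str]]) -> Dict[str, Panel]:
    """Parser stand-in: keep each section's first sentence and attach its figures."""
    return {
        name: Panel(name, body.split(". ")[0].rstrip("."), list(figures.get(name, [])))
        for name, body in sections.items()
    }


def plan_layout(assets: Dict[str, Panel], reading_order: List[str]) -> List[Panel]:
    """Planner stand-in: lay panels out top-down in the given reading order."""
    return [assets[name] for name in reading_order if name in assets]


def critique(panel: Panel, capacity: int) -> bool:
    """Commenter stand-in: accept the panel if chars x font size fits the box budget."""
    return len(panel.text) * panel.font_size <= capacity


def paint_and_refine(panels: List[Panel], capacity: int, min_font: int = 12) -> List[Panel]:
    """Painter-commenter loop: shrink the font until the commenter accepts each panel."""
    for panel in panels:
        while not critique(panel, capacity) and panel.font_size > min_font:
            panel.font_size -= 2  # revision requested by the commenter
    return panels


if __name__ == "__main__":
    sections = {
        "Method": "We parse, plan, then paint. Details follow.",
        "Intro": "Posters are hard to make. They take hours.",
    }
    figures = {"Method": ["fig2.png"]}
    poster = paint_and_refine(plan_layout(parse_paper(sections, figures), ["Intro", "Method"]), capacity=500)
    for p in poster:
        print(p.section, p.font_size, p.figures)  # Intro 20 [], then Method 18 ['fig2.png']
```

The point of the structure is the feedback loop in the last stage: the painter produces a draft, the commenter rejects it, and the draft is revised until it passes. A real system would render the slide and ask a VLM for feedback instead of using a fixed character budget.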