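From "Memorized Solutions" to "Deep Reasoning": HKUST Launches UGMathBench, the First Dynamic Benchmark for Undergraduate Mathematics
AI科技大本营 · 2025-06-09 10:41
Mathematical reasoning is a key indicator of a model's intelligence and needs to be evaluated comprehensively and fairly. Existing math benchmarks such as GSM8K and MATH, however, have drawn criticism for insufficient coverage and susceptibility to data contamination: they either lack broad coverage of undergraduate-level problems or may be affected by test-set contamination. To fill these gaps, a research team from the Hong Kong University of Science and Technology recently published UGMathBench at ICLR 2025. It is the first diverse, dynamic evaluation suite for undergraduate mathematics, designed to assess LLM reasoning across a wide range of undergraduate math topics. It provides dynamic and varied evaluation tools, brings math reasoning evaluation into an era of "dynamic contamination control" for the first time, and marks a shift in LLM math reasoning evaluation from "shallow problem solving" to "deep understanding." Paper: https://arxiv.org/pdf/2501.13766 UGMathBench is ...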
Claude Code's Lead Engineer Reveals How AI Is Reshaping Day-to-Day Development
AI科技大本营· 2025-06-07 09:42
Core Viewpoint - AI is revolutionizing software development, with tools like Claude Code enabling seamless integration of AI assistance in coding environments, enhancing productivity and changing programming paradigms [1][3].

Group 1: Claude Code Overview
- Claude Code is designed to assist coding directly in the terminal, eliminating the need to switch tools or IDEs and making it usable by developers regardless of their setup [6][7].
- The tool has been validated through extensive internal use by Anthropic engineers, showcasing its effectiveness as a productivity tool [5][12].
- The evolution of programming paradigms is likened to a transition from "punch cards" to "prompts," indicating a significant shift in how coding is approached [5][23].

Group 2: User Experience and Adoption
- The initial release of Claude Code saw a rapid increase in daily active users, indicating strong community interest and positive feedback from both internal and external testers [12][13].
- The tool is particularly suited for large enterprises, capable of handling extensive codebases without additional setup [16].
- Users can access Claude Code through a subscription model, with costs varying based on usage, typically around $50 to $200 per month for serious work [15][17].

Group 3: Functionality and Integration
- Claude Code operates in various terminal environments and can be integrated with IDEs, enhancing its functionality and user experience [8][9].
- The latest models, such as Claude 3.5 Sonnet and Opus, have significantly improved the tool's ability to understand user commands and execute tasks effectively [25][26].
- Users can delegate more to Claude Code, letting it autonomously handle tasks like writing tests and managing GitHub Actions [20][28].

Group 4: Future Directions and Enhancements
- Future developments for Claude Code include better integration with various tools and the ability to handle simpler tasks without opening a terminal [46][47].
- `CLAUDE.md` files allow users to share instructions and preferences, enhancing the tool's adaptability and efficiency across projects [38][41].
- The ongoing evolution of AI models requires users to keep learning and adapting in order to fully leverage the capabilities of tools like Claude Code [34][35].
In Conversation with BAAI's Wang Zhongyuan: Embodied AI's "Group Stage" Has Only Just Begun, and Robots Need an "Android," Not an iOS
AI科技大本营· 2025-06-07 09:42
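When Wudao (悟道) 1.0 was released, academia had not yet reached a consensus on whether "large models are the technical route to AGI." Embodied AI is at that same stage today.

Author | Wang Qilong
Produced by | AI 科技大本营 (ID: rgznai100)

Amid the large-model boom, a subtle sense of having hit a bottleneck is becoming industry consensus. "What people called the 'battle of a hundred models' was mostly a competition among large language models," said Wang Zhongyuan, president of the Beijing Academy of Artificial Intelligence (BAAI), getting straight to the heart of the problem in a conversation with CSDN on the eve of the BAAI Conference. "Large language models are constrained by their reliance on internet data. Performance is still improving, but far more slowly than before."

Where is the way out? In Wang's view, for AI to break through its ceiling, after "reading ten thousand books" (internet data) it must "travel ten thousand miles" (the physical world).

This is not an isolated judgment. In March this year, NVIDIA CEO Jensen Huang pointed the way for AI's second half at GTC: build "AI factories" and usher in the era of "physical AI," letting AI step off the screen and interact with the real world.

As thinking converges, action follows. On June 6, at the BAAI Conference in Beijing, CSDN watched Wang give his answer in his keynote. If the "Wudao" series of 2021 represented an exploration of the technical path (the "dao," or way), then the new "Wujie" (悟界) series he unveiled shows a new ambition: to use ...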
Richard Sutton, Father of Reinforcement Learning: Human Data Is Running Out, and AI Is Entering the "Era of Experience"
AI科技大本营· 2025-06-06 10:18
Core Viewpoint - The article emphasizes that true intelligence in AI should stem from experience rather than pre-set human data and knowledge, marking a shift towards an "Era of Experience" in AI development [5][16].

Summary by Sections

Introduction to the Era of Experience
- The current era in AI is characterized by a transition from reliance on human-generated data to a focus on experiential learning, where AI systems learn through interaction with the world [9][16].

Key Insights from Richard Sutton's Speech
- Richard Sutton argues that genuine AI must have a dynamic data source that evolves with its capabilities, as static datasets will become inadequate [6][9].
- He highlights that the essence of intelligence lies in the ability to predict and control sensory inputs, which is fundamental to AI and intelligence [13].

The Learning Process
- The learning process in both humans and animals is based on interaction with the environment, where actions determine the information received, leading to a deeper understanding [10][11].
- Sutton illustrates that AI should emulate this learning process by engaging with the world to generate new data and enhance its capabilities [10][12].

Transition from Human Data to Experience
- The article outlines a timeline of AI evolution, indicating that the current "Human Data Era" is nearing its end, paving the way for the "Experience Era" where AI learns through real-world interactions [14][16].
- Sutton emphasizes that the future of AI lies in its ability to continuously learn from experiences, which is essential for unlocking the full potential of the "Experience Era" [17].

Decentralized Cooperation
- The concept of "decentralized cooperation" is introduced as a framework for understanding social organization, where multiple agents pursue their own goals while collaborating for mutual benefit [24][25].
- Sutton argues that human prosperity and the future of AI should be built on this foundation of decentralized cooperation rather than centralized control [27][28].

Conclusion
- The article concludes by encouraging readers to view interactions between humans and AI through the lens of decentralized cooperation versus centralized control, which could provide valuable insights into future developments in AI [28].
"AGI May Arrive Within Five Years": AI Godfather Bengio Urges China and the US to Reach Consensus and Warns Against AI Being Turned into a Weapon
AI科技大本营· 2025-06-06 10:18
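[Editor's note] Yoshua Bengio, one of the three giants of deep learning, a Turing Award laureate, and a "godfather of AI," said at the 2025 BAAI Conference in Beijing that the length of tasks AI can complete doubles every seven months and that AI will reach human level in roughly five years. AGI may arrive within five years, yet human society has not reached agreement on rules, legislation, or global governance.

Compiled by | Meng Yidan
Produced by | AI 科技大本营 (ID: rgznai100)

Since ChatGPT burst onto the scene, AI has been on an accelerating evolutionary track. It started out writing code and generating copy; now it can look things up online and remotely control home appliances, and it is no longer the "digital mouthpiece" good only for idle chat. It has begun to "think" through tasks on its own, can coordinate across multiple applications, and can even control computers and read and write databases. AI has gone from a behind-the-scenes algorithm to a personal assistant, and is gradually evolving into an "agent" that autonomously carries out complex operations. Moving from "obeying" to "acting," it is becoming a versatile player that can really "get things done."

Bengio argued that we are in a critical window of time and must establish verifiable, safe, and responsible control mechanisms as quickly as possible.

At the start of his talk, Professor Bengio shared a deeply personal reflection. He admitted that after trying ChatGPT himself and watching AI advance so rapidly, he realized he had previously underestimated the risk of losing control of AI. And a ...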
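The doubling claim in the editor's note can be sanity-checked with a few lines of arithmetic. The sketch below assumes the seven-month doubling period stays constant over the full five years, which the talk summary does not guarantee; `growth_factor` is a hypothetical helper name introduced here for illustration, not something from the talk.

```python
def growth_factor(months: float, doubling_period_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period.

    Assumption (not from the source): the doubling period stays constant.
    """
    if doubling_period_months <= 0:
        raise ValueError("doubling_period_months must be positive")
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    horizon_months = 5 * 12  # "roughly five years"
    doublings = horizon_months / 7.0
    print(f"doublings in five years: {doublings:.2f}")
    print(f"growth in task length:   {growth_factor(horizon_months):.0f}x")
```

Under that assumption, five years is between eight and nine doublings, so the length of tasks AI can complete would grow by a factor of a few hundred.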
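Turing Award Laureate Bengio Announces New Venture: Guarding AI's Last Mile Before AGI Arrives
AI科技大本营 · 2025-06-05 02:22
"Sitting beside me are my children, my grandchildren, my students, and many others. What about you? Who is sitting in your passenger seat?" With this soul-searching question in his TED talk, Turing Award laureate Yoshua Bengio pointed squarely at humanity's shared fate in the age of AI.

As AGI approaches at dizzying speed, who is laying the foundations for the line of defense called "safety"?

Compiled by | Meng Yidan
Produced by | AI 科技大本营 (ID: rgznai100)

Yoshua Bengio, Turing Award laureate, deep learning pioneer, and the world's most-cited AI scientist, has announced a new venture: LawZero, a nonprofit AI safety research organization that responds to the systemic risks AI may bring with a "safety first" principle.

LawZero is a nonprofit whose core mission is research and technology development. It aims to build "safe-by-design" AI systems and to assemble a technical team of the world's leading researchers.

"Current AI systems already show signs of self-preservation and deceptive behavior, and this trend will only accelerate as their capabilities and autonomy increase," Bengio wrote in a blog post, listing several cases:

What these behaviors show is that, without safety constraints, AI systems may develop uncontrolled goal misalignment and strategy choices.

Deep learning's three giants have each issued AI safety warnings

As one of the AI field's hall-of-fame ...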
Cursor 1.0 Officially Released: The AI Code Editor Enters the Era of "Automatic Review + Memory"
AI科技大本营· 2025-06-05 02:22
Core Viewpoint - The official release of Cursor 1.0 marks a significant evolution of the AI-driven code editor from an "assistant tool" to an intelligent programming platform with review, memory, and collaboration capabilities [1][19].

Feature Highlights
- Cursor 1.0 introduces several key features, including the automatic code review assistant BugBot, native support for Jupyter Notebooks, project-level AI memory (Memories), and general availability of the Background Agent [2][19].
- BugBot can automatically review Pull Requests on GitHub, identifying potential bugs and issues, and allows developers to quickly implement suggested fixes [5][6].
- The Background Agent, previously in early testing, is now available to all users, enhancing remote coding capabilities [8][9].
- The integration of Jupyter Notebooks allows developers in data science and research to make changes directly within the platform [11].

Memory Functionality
- The new Memories feature stores knowledge points and contextual information at the project level, which can be automatically recalled in future interactions [12][13].

Enhanced User Experience
- Cursor 1.0 can render visual content such as Mermaid charts and Markdown tables directly in chat conversations, making communication more intuitive [18].
- The settings page and dashboard have been optimized for better usage statistics and data analysis [18].

Deployment and Integration
- Developers can now quickly set up Model Context Protocol (MCP) servers with one-click installation and OAuth support, making it easier to integrate additional model capabilities [15][16].
- MCP developers can add an "Add to Cursor" button in their documentation to make their services more accessible to other developers [17].
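Hinton, Yann LeCun, and Other AI Pioneers All Came from Signal Processing: A Conversation with K. J. Ray Liu, IEEE's First Ethnic Chinese President and Member of Two US National Academies | 万有引力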
AI科技大本营· 2025-06-04 05:42
Core Viewpoint - The article highlights the journey and achievements of K. J. Ray Liu, emphasizing his contributions to the field of wireless sensing and AI, as well as his philosophy of pursuing dreams and maintaining one's original intentions in life and career [2][15][40].

Group 1: Personal Journey
- K. J. Ray Liu was born in Taiwan and showed early interest in communication and signal processing, which became his lifelong profession [2][4].
- He faced challenges during his academic journey, including a difficult transition to studying in the U.S. and overcoming biases as a Chinese scholar [5][6].
- Liu became the first Asian president of IEEE in 2022, implementing significant reforms during his tenure [6][9].

Group 2: Contributions to Education
- Liu has mentored over 70 doctoral and postdoctoral students, many of whom have achieved notable success in academia and industry [11][30].
- His teaching philosophy emphasizes the importance of independent thinking and problem discovery among students, rather than merely solving assigned problems [31][32].

Group 3: Transition to Industry
- Liu retired from academia to pursue entrepreneurship in wireless AI, believing that practical applications require real-world data and environments [39][40].
- His company, Origin Wireless, focuses on utilizing wireless signals for environmental sensing, which has significant implications for health monitoring and safety [41][42].

Group 4: Vision for Wireless AI
- Wireless AI aims to leverage ubiquitous wireless signals to perceive and understand human activities and health conditions without the need for wearable devices [41][42].
- The technology has already been deployed in various regions for remote monitoring, demonstrating its potential to save lives and improve health outcomes [42].
In the Age of Agents, How Should Humans and AI Divide the Work?
AI科技大本营· 2025-06-04 05:42
Core Insights
- The rise of intelligent agents is fundamentally reshaping the dimensions of work, liberating it from fixed physical spaces and designated time periods, marking a transition from the industrial and information eras to the intelligent agent era [1][4][5].
- The division of labor between humans and AI is shifting from execution to definition, where humans must now answer "why to do" as machines take over "how to do" [3][5].

Work Transformation
- The traditional work model, which required synchronous presence in a specific location, is being disrupted by intelligent agents, allowing for asynchronous collaboration and task completion [6][11].
- The emergence of remote work during the pandemic has accelerated this transformation, leading to a deeper paradigm shift in how work is structured [4][6].

Task Atomization
- Work is being "atomized" into discrete tasks that can be dynamically assigned to the most suitable executors, whether human or AI, reflecting a significant shift from fixed positions to flexible task collections [8][9].
- The Upwork report indicates a 73% increase in task-based contracts compared to a 12% growth in traditional time-based contracts, highlighting the labor market's transition towards task-oriented work [8].

Collaboration Dynamics
- Intelligent agents are evolving into collaborative intermediaries, facilitating communication and cooperation among team members with diverse backgrounds [12][11].
- The boundaries between work and life are blurring, leading to a new reality where work and personal life are increasingly integrated rather than balanced [12][13].

Challenges of Integration
- The "always-on" culture is emerging, with many remote workers finding it difficult to disconnect from work, leading to longer working hours and potential family conflicts [13][16].
- Social isolation is a growing concern, particularly among younger professionals who miss out on networking opportunities typically found in traditional workplaces [14].

Skills for the Intelligent Agent Era
- The skill set required for collaboration with intelligent agents is evolving, emphasizing the need for cognitive strategies and meta-skills alongside technical abilities [19][20].
- System thinking, judgment, and decision-making are becoming critical skills as humans navigate complex interactions with intelligent agents [21][22].

Future Outlook
- The intelligent agent revolution is not just a transformation of work but also a redefinition of personal identity and societal structures, necessitating a reevaluation of what constitutes meaningful work and a fulfilling life [24][25].