机器之心

Google issues the challenge, DeepSeek and Kimi will both compete: the first large-model tournament kicks off tomorrow
机器之心· 2025-08-05 04:09
Core Viewpoint - An upcoming AI chess tournament will pit advanced AI models against one another on Kaggle Game Arena, a new benchmarking platform [2][12].
Group 1: Competition Overview
- The tournament runs from August 5 to 7 and features eight cutting-edge AI models [2][3].
- Participants include OpenAI's o4-mini, Google's Gemini 2.5 Pro, and Anthropic's Claude Opus 4 [7].
- Google is organizing the event and aims to provide a transparent, rigorous testing environment for AI models [6][8].
Group 2: Competition Format
- The tournament is single-elimination. Each match consists of four games, and the first model to score two points advances [14].
- If a match ends tied 2-2, a tiebreaker game is played in which White must win to advance [14].
- Models may not use external tools such as Stockfish and must generate legal moves on their own [17].
Group 3: Evaluation and Transparency
- The game execution framework and environment will be open-sourced for transparency [8].
- Each model's performance will appear on the Kaggle Benchmarks leaderboard, so results can be tracked in real time [12][13].
- The event is meant to address a limitation of current AI benchmarks, which struggle to keep pace with the rapid development of modern models [12].
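The match format summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tournament's actual harness: the function name `match_winner`, the labels "A", "B", and "draw", and the scoring (1 point for a win, 0.5 for a draw) are hypothetical choices, and the tiebreak is read as "if White does not win, Black advances".

```python
from typing import Optional, Sequence

# Assumed scoring: 1 point for a win, 0.5 each for a draw (not stated in the summary).
POINTS = {"A": (1.0, 0.0), "B": (0.0, 1.0), "draw": (0.5, 0.5)}


def match_winner(
    games: Sequence[str],
    tiebreak_white: str = "A",
    tiebreak_result: Optional[str] = None,
) -> Optional[str]:
    """Return the model ("A" or "B") that advances, or None if still undecided.

    games: results of up to four games, each "A", "B", or "draw".
    tiebreak_white: which model plays White in the tiebreaker game.
    tiebreak_result: result of the tiebreaker ("A", "B", "draw"), if it was played.
    """
    score_a = score_b = 0.0
    for result in games[:4]:
        gain_a, gain_b = POINTS[result]
        score_a += gain_a
        score_b += gain_b
        if score_a >= 2 or score_b >= 2:
            break  # someone has reached two points, so the match stops here

    if max(score_a, score_b) < 2:
        return None  # not enough games played yet
    if score_a != score_b:
        return "A" if score_a > score_b else "B"

    # Both reached two points in the same game (2-2): one tiebreaker game.
    if tiebreak_result is None:
        return None
    black = "B" if tiebreak_white == "A" else "A"
    # Assumed reading of the rule: White must win, otherwise Black advances.
    return tiebreak_white if tiebreak_result == tiebreak_white else black
```

For example, `match_winner(["A", "A"])` sends model A through after two games, while four draws leave the match undecided until a tiebreaker result is supplied.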
A Tsinghua IIIS professor teaches you, hands-on, how to write reinforcement learning
机器之心· 2025-08-05 04:09
Core Insights - The article introduces AReaL-lite, a reinforcement learning training framework built for algorithm developers. Users can implement a range of RL training algorithms and custom agent workflows by modifying a single file, while Fully Async RL is used to get the best model performance [1][10].
Group 1: Event Details
- The session features Professor Wu Yi of Tsinghua University's Institute for Interdisciplinary Information Sciences (IIIS) and core members of the AReaL team, who will teach RL through a multi-turn math reasoning example [2][10].
- The live session is scheduled for August 7, 19:30-20:30 Beijing time. Participants are encouraged to prepare a GPU server, preferably with four GPUs [8][10].
Group 2: AReaL-lite Features
- AReaL-lite's key characteristics include:
  - Fully async RL for fast training [10].
  - Ecosystem-friendly design that is compatible with various open-source ecosystems [10].
  - An algorithm-first approach, so that even complex algorithms require only minimal file modifications [10].
Group 3: Team Introduction
- The team includes:
  - Wu Yi, Assistant Professor at Tsinghua University and Chief Scientist of the AReaL team [10].
  - Fu Wei, a PhD student at Tsinghua University and core member of the AReaL project [10].
  - Mei Zhiyu, a researcher at Ant Group's reinforcement learning lab who holds a PhD from Tsinghua University [10].
Latest work from Zhou Zhi-Hua's team at Nanjing University: one algorithm to handle them all, a new paradigm for online learning?
机器之心· 2025-08-05 04:09
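Reported by 机器之心 | Editors: 冷猫, Panda
The world changes dynamically. To understand such a world and operate in it, AI models must be able to learn online. To this end, the field has proposed a new performance metric, adaptive regret, defined as the maximum static regret over any interval.
Within the online convex optimization framework, several existing algorithms can effectively minimize adaptive regret. However, they lack universality: each typically handles only one specific class of convex functions and requires certain parameters to be known in advance, which limits their use in real-world settings.
To address this limitation, Zhou Zhi-Hua's team at Nanjing University studied universal algorithms with dual adaptivity. Such algorithms automatically adapt both to the properties of the functions (convex, exponentially concave, or strongly convex) and to changes in the environment (static or dynamic).
Paper title: Dual Adaptivity: Universal Algorithms for Minimizing the Adaptive Regret of Convex Functions
Paper link: https://arxiv.org/pdf/2508.00392
Specifically, the team proposed a ...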
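The verbal definition of adaptive regret given in this excerpt (the maximum static regret over any interval) can be written out as a formula. The notation here is an assumption introduced for illustration and may differ from the paper's: $T$ is the number of rounds, $f_t$ the loss function revealed in round $t$, $\mathbf{w}_t$ the learner's decision in that round, and $\Omega$ the feasible domain.

```latex
\[
\text{A-Regret}(T) \;=\; \max_{[p,\,q] \subseteq [T]}
\left( \sum_{t=p}^{q} f_t(\mathbf{w}_t)
\;-\; \min_{\mathbf{w} \in \Omega} \sum_{t=p}^{q} f_t(\mathbf{w}) \right)
\]
```

The term in parentheses is the ordinary static regret restricted to rounds $p$ through $q$. Taking the maximum over all intervals is what forces an algorithm to perform well locally, not just on average over the full horizon.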
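The world's first general-purpose visual perception system for humanoid robots: Humanoid Occupancy sets a new paradigm for multimodal environment understanding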
机器之心· 2025-08-05 04:09
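First author Cui Wei is head of perception algorithms at the Beijing Humanoid Robot Innovation Center. Co-first author Wang Haoyu is an algorithm engineer at GigaAI (极佳科技) and the project lead. Corresponding author Zhang Qiang is director of the academic committee at the Beijing Humanoid Robot Innovation Center.
With their human-like structure and modes of movement, humanoid robots are widely regarded as the general-purpose robots with the most potential to fit into human environments. Their core tasks span three areas: manipulation, locomotion, and navigation. Completing any of these efficiently requires the robot to understand its surroundings comprehensively and accurately.
Traditional perception systems, however, have clear limitations. Some work only in specific scenarios and cannot cope with complex, changing real-world environments. Others cannot effectively fuse information from multiple sensors, so much of the data goes unused. As a direct result, robots frequently suffer perception failures in real applications, which severely limits how efficiently they carry out tasks.
In response, the Beijing Humanoid Robot Innovation Center has introduced the Humanoid Occupancy perception system. The system fuses multimodal sensor information to build a general perception framework based on semantic occupancy representations. It captures both the semantic attributes and the geometric features of the environment, providing a foundation for the robot's task planning and navigation decisions and marking a key step toward large-scale deployment of humanoid robots in real-world settings.
Paper title: Hu ...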
Runs even on a phone: Tencent Hunyuan open-sources four small models in one go
机器之心· 2025-08-04 09:01
Core Viewpoint - Tencent's Hunyuan team has open-sourced four small language models, the largest with 7 billion parameters. They target low-power scenarios and support fine-tuning for vertical domains [1][3].
Model Characteristics
- The four models can run on consumer-grade graphics cards, making them suitable for laptops, smartphones, and smart home devices [3].
- They are designed as hybrid reasoning models with fast inference and a high cost-performance ratio, and they are capable in language understanding, mathematics, and reasoning [6].
- The models have a long context window of 256K, enough to process content equivalent to three "Harry Potter" novels [12].
Deployment and Usability
- All four models can be deployed on a single card and are compatible with a range of consumer devices [12].
- They have been tested in several of Tencent's core business applications, demonstrating their practicality and effectiveness [15].
Industry Context
- Open-sourcing AI models is gaining momentum in China, and Tencent is a significant player in this movement [16][20].
- The recently released Hunyuan 3D World Model has also gained significant traction, indicating growing interest in multimodal AI capabilities [17].
Application Scenarios
- The models are used in productivity tools such as Tencent Meeting AI Assistant and WeChat Reading AI, where they understand and summarize long texts precisely [18].
- In the financial sector, AI assistants built on these models can exceed 95% intent recognition accuracy with minimal fine-tuning [18].
3D-R1: the next step toward AI that understands the 3D world
机器之心· 2025-08-04 09:01
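As AI advances rapidly, we have grown used to having AI recognize images, understand language, and even converse with us. But once we step into the real three-dimensional world, how do we give AI the ability to "read a scene", "understand space", and "reason about complex tasks"? This is the problem that 3D vision-language models (3D VLMs) set out to solve.
Background: why does 3D scene understanding matter?
Getting AI to understand a real 3D environment is far harder than recognizing a single image. Service robots, autonomous driving, and AR/VR applications all depend on AI accurately understanding spatial structure, object layout, and multi-step tasks. Yet most current 3D VLMs still have two core problems:
3D-R1: a general-purpose 3D model with enhanced reasoning
The new study introduced here, 3D-R1, proposes a more general 3D vision-language model with stronger reasoning. It shows significant performance gains across multiple 3D tasks and could become a new paradigm for general-purpose 3D AI systems.
Paper title: 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Paper link: https://arxiv.org/pdf/2507.23478
To address the challenges above, the research team proposed 3D-R1. It not only focuses on ...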
You heard "agents" at WAIC until your ears rang; now it is time to study them systematically
机器之心· 2025-08-04 07:05
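Excerpted from Deep (Learning) Focus | Author: Cameron R. Wolfe | Translated by 机器之心
At this year's World Artificial Intelligence Conference (WAIC), agents were the undisputed stars. From consumer products to enterprise applications, nearly every exhibiting AI vendor seemed to mention its plans for agents.
This reveals an important shift: people no longer treat large AI models as mere chatbots. They want models that, like a person, actively think, make plans, and use a variety of tools to complete tasks. This is a key direction for putting large models to practical use.
For AI practitioners, it seems, the time has come to study agents systematically.
Conveniently, we found a very comprehensive blog post on the topic. Its author, Cameron R. Wolfe, is a senior research scientist at Netflix and holds a PhD from Rice University. Starting from the most basic LLM, he gradually introduces tool use, reasoning, and autonomous planning, and analyzes the underlying logic of AI agents in depth.
Blog link: https://cameronrwolfe.substack.com/p/ai-agents
The full post follows.
LLMs and their capabilities
[Figure: input-output signature of a standard LLM]
The function of a standard LLM is shown above. Given a text prompt, the LLM generates a text response. In many respects, the generality of the LLM is its greatest ...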
How did drawing a circle with a robot hand become such a hard problem?
机器之心· 2025-08-04 07:05
Core Viewpoint - The article discusses advances in robotics, focusing on a flexible robotic hand from Daxo Robotics. Its ability to draw a circle highlights the potential of soft robotics to achieve human-like dexterity and flexibility [1][10].
Group 1: Innovations in Robotics
- The upcoming World Robot Conference has drawn more attention to robotic innovations, including humanoid and household robots [1][2].
- Daxo Robotics has introduced a robotic hand that can perform tasks such as drawing circles, which the company presents as a significant achievement in robotic dexterity [5][10].
- The hand has 40 tendons and no traditional joints, giving it a range of motion and control that traditional robotic arms cannot achieve [7][8].
Group 2: Technical Specifications
- The hand is said to have "unlimited" degrees of freedom and a grip strength of 7 kilograms [8].
- Unlike rigid robots, the flexible hand can exhibit hundreds of controllable degrees of freedom, which improves its adaptability and performance [10].
- The hand is designed for machine learning. It gathers data through both remote control and simulation, leaving ample room to explore what it can learn [10].
Group 3: Market Implications
- Demonstrations of the hand drawing and manipulating objects suggest a shift toward more sophisticated, flexible robotic solutions across applications [10][11].
- The interest in Daxo Robotics' work points to a growing market for soft robotics, which may outperform traditional rigid robots in specific tasks [10][11].
ACM MM 2025 | Xiaohongshu's AIGC team proposes STD, an acceleration algorithm for style transfer
机器之心· 2025-08-04 07:05
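The paper's main authors are from Xiaohongshu's AIGC team (Dynamic-X-Lab). Dynamic-X-LAB is a research team focused on AIGC and dedicated to advancing pose-driven human image generation and video animation. Centered on high-quality, highly controllable generative models, the team works on text-to-image (t2i), image-to-image (i2i), image-to-video (i2v), and style transfer acceleration, and shares complete open-source solutions with the developer and research community.
Trajectory distillation based on Consistency Models (CMs) offers an effective framework for accelerating diffusion models, improving efficiency by reducing the number of inference steps. However, existing consistency models weaken style similarity and harm aesthetic quality in stylization tasks. The problem is especially pronounced in image-to-image and video-to-video translation tasks, where denoising starts from a partially noised input.
The core problem is that current methods require the student model's probability flow ODE (PF-ODE) trajectory to align with its imperfect teacher model at the initial step. Aligning only at the initial step cannot guarantee consistency along the whole trajectory, which degrades the overall quality of the generated results.
To address this, the paper proposes Single Trajectory Distillation ( ...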
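Just in: the world's first IDE with an integrated cloud Agent team debuts, making project-level development "fully automatic from start to finish"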
机器之心· 2025-08-04 07:05
Core Viewpoint - The article starts from a recent incident in which the AI programming tool Replit mistakenly deleted a company's production database, raising concerns about the reliability of AI in coding [1][2][24].
Group 1: Incident and Response
- On March 19, Jason Lemkin revealed that while he was using the AI tool Replit, the company's production database was deleted after the tool rewrote a core page [1].
- Replit's CEO Amjad Masad called the incident "completely unacceptable" and announced measures to prevent a recurrence, including automatic separation of development and production databases [2][3].
- Despite the incident, AI tools continue to iterate rapidly, with new developments emerging shortly afterward [3].
Group 2: Evolution of AI Programming
- AI programming is evolving from single-agent to multi-agent systems, with an emphasis on task decomposition and parallel collaboration [7].
- The shift from local to cloud-based agent programming makes it possible to integrate remote model capabilities and resources, which helps in building complex agent systems [7][8].
- Vinsoo Code is building a cloud-based multi-agent programming team to improve project-level development efficiency [9][10].
Group 3: Features of Vinsoo Code
- Vinsoo's cloud-based agent system integrates various engineering roles and distributes tasks among agents in parallel, significantly increasing development efficiency [11][13].
- The system follows a "local IDE + cloud agent" model: developers sync projects to the cloud and assign tasks to different agents for a complete development cycle [13][14].
- Two operating modes, Vibe Mode and Full Cycle Mode, cover different needs, from rapid prototyping to end-to-end project execution [15][16].
Group 4: System Capabilities
- The cloud agent system supports multi-terminal coordination, so distributed components can communicate and collaborate effectively [19][20].
- Its debugging strategy automates the entire project process and minimizes manual intervention by the developer [20][21].
- The design includes long-context engineering compression and dynamic task execution planning, which improve reliability and adaptability in complex projects [23][25].
Group 5: Security and Isolation
- The cloud environment gives agents a secure, isolated execution space and avoids risks of local environments such as dependency conflicts and security vulnerabilities [27].
- Each agent runs in a sandbox, which prevents unauthorized access to local files and reduces the likelihood of data breaches [27].
- The architecture makes code execution safer and more traceable, addressing concerns raised by earlier incidents involving AI tools [27].
Group 6: Local Development Experience
- Vinsoo has also built a local AI IDE that complements the cloud-based system, with features such as codebase indexing and command execution tools [28][29].
- The local IDE supports both Vibe Mode and Full Cycle Mode for a seamless development experience [28][29].
- Integrating local and cloud capabilities is intended to improve the overall programming experience for developers [33].
Group 7: Company Background
- Vinsoo Code is developed by AiYouthLab, a startup founded in Tsinghua Science Park that focuses on AI applications in programming [35][36].
- The founding team comes from prestigious universities and has a track record of impactful educational projects [38].
- The company aims to change the development landscape by addressing the fragmentation and collaboration challenges that individual developers face [38].
Group 8: Future Trends
- The article describes a significant technological shift in software development, with AI tools evolving rapidly and changing the programming paradigm [40].
- In 2025, the trend of "everything being an agent" is expected to dominate the AI landscape and raise productivity and efficiency in software development [41][42].
- Integrating AI agents into development is expected to change how developers manage projects, shifting their focus from direct coding to high-level management [42].