机器之心

"Developers privately prefer GPT-5 for writing code": can Claude still hold the programming throne?
机器之心· 2025-08-27 03:18
Core Viewpoint - The article discusses the competitive landscape between Anthropic's Claude and OpenAI's GPT-5 in the programming model space, highlighting a shift in user preference toward GPT-5 due to its superior performance on various programming tasks [1][3][8].

Summary by Sections

Performance Comparison
- Claude Opus 4.1 has shown significant improvements on programming tasks, particularly multi-file code refactoring, per the SWE-bench Verified tests [1].
- However, GPT-5 has gained popularity among users, with many reporting a preference for its capabilities over Claude's, especially on complex programming tasks [3][8].

User Feedback
- Users describe GPT-5 as the best programming model available, with one developer calling it the most effective model they have used [5].
- Feedback indicates that GPT-5 excels at instruction following and large-scale refactoring, outperforming Claude in these areas [6].

User Experience
- Some users still appreciate Claude, particularly its speed on code-completion tasks, but acknowledge that GPT-5 is earning their trust for more complex work [4].
- A software engineer noted that Claude tends to perform poorly outside of coding, with high hallucination rates in other domains, while GPT-5 maintains lower hallucination rates and better search capabilities [9][10].

General Sentiment
- There is a growing consensus that GPT-5's programming capabilities are superior, with many users shifting from Claude to GPT-5 for coding tasks [7][8].
- Users who initially doubted GPT-5 report positive experiences after trying it, indicating a shift in perception of its effectiveness across fields [11].
Seven years in the making: Li Hang's new book "Machine Learning Methods (2nd Edition)" is released, now with reinforcement learning; 20 copies to give away
机器之心· 2025-08-27 03:18
Machine Heart report, Machine Heart Editorial Team

Every field's development rests on a few foundational classics, and artificial intelligence is no exception. Li Hang's earlier "Statistical Learning Methods" and its second edition can fairly be called machine learning bibles; many students and teachers treat them as required reading.

However, with the rapid development of AI, and especially the leap in deep learning, a textbook covering only traditional machine learning can no longer reflect the full picture of the field. Li Hang therefore built on the first two editions to publish "Machine Learning Methods", adding deep learning content.

Recently, attention to reinforcement learning in the AI community has also heated up quickly. From attempts to fuse large models with agents to the wide application of reinforcement learning in games, robot control, and decision optimization, the direction has again become a focus. Yet many earlier textbooks covered it only briefly, or not at all, leaving readers without a way to study it systematically.

That gap is now filled. Li Hang's new "Machine Learning Methods (2nd Edition)" gives reinforcement learning its own part, systematically introducing its basic framework and representative algorithms, including Markov decision processes, the multi-armed bandit problem, and deep Q-networks.

The book is organized into 4 parts (or 4 volumes), corresponding to the 4 main branches: supervised learning, unsupervised learning, deep learning, and reinforcement learning. With this, "Machine Learning Methods (2nd Edition)" builds a framework covering supervised learning, unsupervised ...
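Of the reinforcement-learning topics listed above, the multi-armed bandit is the simplest to illustrate concretely. Below is a minimal epsilon-greedy sketch; it is not taken from the book, and all names and parameters are illustrative:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Estimate arm values online with an epsilon-greedy policy."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                       # explore
            arm = rng.randrange(n_arms)
        else:                                            # exploit current best
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)         # noisy payout
        counts[arm] += 1
        # incremental mean update: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total = epsilon_greedy_bandit([0.1, 0.5, 0.9])
# after enough steps, the best arm (index 2) receives most of the pulls
```

With 10,000 steps the running estimates converge close to the true means, and exploitation concentrates pulls on the best arm.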
Google's nano banana officially launches: under 0.3 RMB per image, 95% cheaper than OpenAI
机器之心· 2025-08-27 00:46
Machine Heart report. Editor: Panda

Last night, the mysterious and powerful image generation and editing model nano banana finally revealed its true identity. No surprise: it does come from Google, and it also received an official but dull name: gemini-2.5-flash-image-preview.

According to the announcement, the model offers "SOTA image generation and editing, striking character consistency, and lightning speed." Below are some examples shared by Google.

Judging from the name, Google presumably also has a non-flash gemini-2.5-image model, which should be more capable but slower.

At present, gemini-2.5-flash-image-preview is available in preview in Google AI Studio and the Gemini API, and users can try it for free. As can be seen, it supports a 32k context and exposes temperature (which controls the model's creativity) along with some advanced settings.

Unfortunately, the model does not yet support image generation and editing for Chinese input; it returns a text response instead. In addition, within Gemini, users can also access the model simply by selecting 2.5 Flash and using a suitable prompt.

As for pricing, gem ...
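The headline's per-image cost can be sanity-checked with back-of-envelope arithmetic. The figures below are assumptions for illustration (the article's pricing paragraph is truncated): a reported output price of $30 per 1M tokens, roughly 1290 output tokens per generated image, and a rough USD-to-RMB exchange rate.

```python
# Back-of-envelope check of the "under 0.3 RMB per image" claim.
# All three constants are assumptions, not taken from the truncated text.
PRICE_PER_MTOKEN_USD = 30.0   # assumed output-token price
TOKENS_PER_IMAGE = 1290       # assumed tokens billed per image
USD_TO_RMB = 7.2              # rough exchange rate

usd_per_image = PRICE_PER_MTOKEN_USD * TOKENS_PER_IMAGE / 1_000_000
rmb_per_image = usd_per_image * USD_TO_RMB
print(f"~${usd_per_image:.4f} per image, ~{rmb_per_image:.2f} RMB")
```

Under these assumptions the result lands just under 0.28 RMB per image, consistent with the headline.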
Hand-in-hand robot teaching: Stanford proposes the RTR framework, using a robot arm to assist real-world training of humanoid robots
机器之心· 2025-08-27 00:46
We named this innovative hardware-software co-design system RTR (Robot-Trains-Robot), highlighting that active physical assistance from a robot teacher is key to making real-world reinforcement learning on humanoid robots practical. To address the high cost of collecting real-world data, the team also proposes a novel reinforcement learning algorithm that rapidly adapts the robot's behavior by optimizing only a single low-dimensional latent variable tied to the environment dynamics, greatly improving sample efficiency. This algorithmic contribution further unlocks the potential of real-robot RL, significantly outperforming traditional online system-identification baselines such as RMA in evaluations.

Motion control for humanoid robots is becoming the next hot research area for reinforcement learning (RL). Mainstream approaches today mostly follow the "Sim-to-Real" paradigm: using domain randomization, researchers train a general control model across thousands of simulated environments with different physical parameters, hoping that strong generalization will let it adapt directly to a real world whose dynamics are unknown. Although such "zero-shot transfer" schemes have achieved excellent results on a variety of locomotion tasks, their underlying goal is a conservative policy that is merely "usable" in any environment. That sacrifices the robot's performance ceiling in its specific real environment, because for final deployment, the real world's ...
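The latent-adaptation idea described above can be sketched abstractly: rather than fine-tuning all policy weights on the real robot, keep the policy frozen and search only a small latent vector z that conditions it. The sketch below is not the paper's algorithm; it uses simple hill climbing on a toy reward, and every name in it is hypothetical:

```python
import random

def adapt_latent(policy, rollout_reward, dim=8, iters=50, pop=16, sigma=0.3, seed=0):
    """Search a low-dimensional latent z maximizing real-world reward,
    keeping policy weights frozen (illustrative sketch of the idea)."""
    rng = random.Random(seed)
    best_z = [0.0] * dim
    best_r = rollout_reward(policy, best_z)
    for _ in range(iters):
        for _ in range(pop):
            z = [b + rng.gauss(0.0, sigma) for b in best_z]  # perturb latent
            r = rollout_reward(policy, z)
            if r > best_r:                                   # keep improvements
                best_z, best_r = z, r
    return best_z, best_r

# Toy stand-in for a real-robot rollout: reward peaks at a hidden
# "true dynamics" latent z* = (0.5, ..., 0.5); `policy` is unused here.
def toy_rollout(policy, z):
    return -sum((zi - 0.5) ** 2 for zi in z)

z, r = adapt_latent(policy=None, rollout_reward=toy_rollout)
```

Because only the latent is searched, each real-world rollout carries much more information per parameter than full-weight fine-tuning, which is the sample-efficiency argument the paragraph makes.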
Pushing the data advantage to the limit: one of the "Six Little Dragons of Hangzhou" open-sources its first step toward spatial intelligence
机器之心· 2025-08-26 09:38
Machine Heart report. Editor: 冷猫

If you had massive amounts of 3D spatial data, what would you use it for?

Since the era of large models began, data has become the load-bearing pillar of models. Whether a field can obtain enough usable, high-quality data directly determines the ceiling of its AI development. And once enough data exists, building a powerful large model or generative model seems to follow naturally. Consider video generation: high-quality models such as Kling and Jimeng were born on top of the massive data of the largest video UGC platforms, and that data naturally became their biggest advantage.

Data can be used to train models, and those models in turn strengthen the tools, forming a data flywheel that cycles across three links: tools, data, and models.

In the 3D domain, data has long been the problem blocking AI's understanding of space. Yesterday we were invited to the first TechDay of Manycore (群核科技), one of the "Six Little Dragons of Hangzhou", and saw how a company in interior spatial design thinks about spatial intelligence.

We imagined AI changing our lives by doing the cleaning and cooking while we write poetry and paint. Now it is the other way round: AI is writing poetry and painting while we do the cleaning. To realize the vision of AI changing daily life, AI must move from the digital world into the physical world. Manycore co-founder 黄晓煌 believes that "spatial intelligence is the critical bridge." Chief scientist 周子寒 said in his talk: "群 ...
FlashAttention-4 arrives with native Blackwell GPU support: is NVIDIA's moat now even deeper?
机器之心· 2025-08-26 09:38
Core Viewpoint - FlashAttention-4, introduced by Tri Dao at the Hot Chips 2025 conference, demonstrates significant performance improvements over previous versions and competitors, particularly on NVIDIA's GPU architectures [1][2][10].

Summary by Sections

FlashAttention-4 Introduction
- FlashAttention-4 is reported to be up to 22% faster than NVIDIA's cuDNN library implementation on the Blackwell architecture [2].
- The new version incorporates two key algorithmic improvements: a new online softmax algorithm that skips 90% of output rescalings, and software emulation of the exponential function for better throughput [4][5].

Performance Enhancements
- The kernel developed by Tri Dao's team outperforms NVIDIA's latest cuBLAS 13.0 library in specific computation scenarios, particularly when the reduction dimension K is small [7].
- FlashAttention-4 is written in the CUTLASS CuTe Python DSL, which is significantly harder to port to ROCm HIP than CUDA C++ [6].

Competitive Landscape
- The development of FlashAttention is seen as a core advantage for NVIDIA, as Tri Dao and his team primarily use NVIDIA GPUs and have open-sourced much of their work for the developer community [10].
- For AMD, the implication is that financial incentives may be necessary to encourage Tri Dao's team to develop for ROCm [10].

Historical Context and Evolution
- FlashAttention was first introduced in 2022, addressing the quadratic time and memory overhead of standard attention by reducing memory complexity from O(N²) to O(N) [12].
- Subsequent versions continued to improve performance, with FlashAttention-2 achieving speedups of 2-4x over its predecessor [21].

Technical Innovations
- FlashAttention-3 achieved a 1.5-2.0x speedup over FlashAttention-2, reaching up to 740 TFLOPS on H100 GPUs [23].
- FlashAttention-4 introduces native support for Blackwell GPUs, addressing previous compilation and performance issues [24].

Community Engagement
- The GitHub repository for FlashAttention has garnered over 19,100 stars, indicating strong community interest and engagement [25].
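The "online softmax" mentioned above is the core trick behind every FlashAttention version: the softmax normalizer is computed in one streaming pass by carrying a running maximum and a running sum, rescaling earlier partial results only when the maximum changes. A minimal sketch of the classic one-pass algorithm follows; FA-4's kernel-level variant, which skips most rescalings, is not reproduced here:

```python
import math

def online_softmax(scores):
    """One-pass (streaming) softmax: track running max m and running
    normalizer s; rescale s only when a new maximum appears."""
    m = float("-inf")   # running max
    s = 0.0             # running sum of exp(x - m)
    for x in scores:
        if x > m:
            s = s * math.exp(m - x) + 1.0   # rescale old sum to the new max
            m = x
        else:
            s += math.exp(x - m)
    # a second pass emits the probabilities; attention kernels instead
    # fuse this with the accumulation of the value vectors
    return [math.exp(x - m) / s for x in scores]

probs = online_softmax([1.0, 3.0, 2.0, 5.0])
```

In a fused attention kernel, avoiding the rescale on the `else` branch is what FA-4's "skip 90% of output rescaling" refers to: the running maximum rarely changes, so the correction multiply can usually be elided.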
NVIDIA strikes again: a new hybrid-architecture model debuts, with two innovations delivering a 53.6x throughput speedup
机器之心· 2025-08-26 09:38
Machine Heart report, Machine Heart Editorial Team

Another truly lightweight, fast, and strong large language model makes its debut!

The Transformer architecture's enormous demands on compute and memory make efficiency a major challenge for large models. To address it, researchers have invested heavily in designing more efficient LM architectures. In parallel, much work has gone into hybrid models that combine full attention with linear attention to balance accuracy and efficiency. Although these models are more efficient than full-attention architectures, their accuracy still lags clearly behind SOTA full-attention models.

Recently, researchers at NVIDIA proposed a new family of hybrid-architecture language models, Jet-Nemotron, which matches the accuracy of SOTA full-attention models while delivering outstanding efficiency.

Specifically, the 2B version of Jet-Nemotron already rivals the most SOTA open-source full-attention language models such as Qwen3, Qwen2.5, Gemma3, and Llama3.2, while achieving significant efficiency gains: up to a 53.6x generation-throughput speedup on an H100 GPU (256K context length, maximum batch size). On the MMLU and MMLU-Pro benchmarks, Jet-Nemotron's accuracy also surpasses some recent advanced MoE full-atten ...
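Linear attention, the ingredient that hybrid models like those above mix with full attention, replaces the O(N²) softmax interaction with a kernel feature map so that all past keys and values can be summarized in a fixed-size running state. A toy single-head causal sketch follows; it illustrates the generic technique only and is not Jet-Nemotron's actual block:

```python
import math

def linear_attention(qs, ks, vs):
    """Causal linear attention with feature map phi(x) = elu(x) + 1.
    A state S (d_k x d_v) and normalizer z (d_k) are updated per step,
    so each token costs O(d_k * d_v) instead of O(N)."""
    phi = lambda vec: [x + 1.0 if x > 0 else math.exp(x) for x in vec]  # > 0
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        for i in range(d_k):            # S += phi(k) v^T ; z += phi(k)
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        denom = sum(fq[i] * z[i] for i in range(d_k)) + 1e-6
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom
                    for j in range(d_v)])
    return out

out = linear_attention(qs=[[1.0, 0.0], [0.0, 1.0]],
                       ks=[[1.0, 0.0], [0.0, 1.0]],
                       vs=[[2.0, 0.0], [0.0, 2.0]])
```

Because the state never grows with sequence length, generation throughput stays flat as the context lengthens, which is where hybrid models earn their large speedups at 256K context.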
Did Google quietly build the mysterious Nano-Banana model? Hands-on: absurdly strong, but with 3 major flaws
机器之心· 2025-08-26 08:53
Machine Heart report. Editor: 杨文

The mysterious AI model Nano-Banana has gone viral, and a pile of fake websites has sprung up; the impostors are hard to tell from the real thing.

Recently, another mysterious image generation and editing model, named Nano-Banana, surfaced in the AI community. It was first spotted in the "Battle" mode on the LMArena platform, but it is not listed on the public leaderboard, and no developer has officially claimed it. Following the clues, many netizens guess it may be a Google research model.

Last Tuesday, Google AI Studio product lead Logan Kilpatrick posted a banana emoji on X.

[Embedded post: Logan Kilpatrick (@OfficialLoganK), Aug 20]

Google DeepMind product manager Naina Raisinghani also posted an image echoing Italian artist Maurizio Cattelan's 2019 duct-taped-banana artwork. All of this seems to hint that the model comes from Google.

Upload a photo of a model plus a picture of a baseball cap, and enter a promp ...
In a single day, Meta loses two key players: is Zuckerberg's money power failing?
机器之心· 2025-08-26 08:53
Core Viewpoint - Meta is experiencing significant talent attrition, particularly among top AI researchers, due to internal management issues and a lack of alignment with the company's vision and culture [1][9][39].

Group 1: Talent Departure
- Two senior researchers, Rishabh Agarwal and Bert Maher, recently announced their departures from Meta, with Agarwal's destination unspecified and Maher joining Anthropic [3][24].
- Agarwal's exit shows that even high salaries cannot retain top talent; he cites Zuckerberg's own advice about taking risks in a rapidly changing world [14][39].
- Maher, who worked at Meta for 12 years, contributed to major projects such as PyTorch and HHVM, making his departure a loss of deep expertise [25][27].

Group 2: Internal Management Issues
- Meta's internal management culture is cited as a reason for its low employee retention rate of 64%, compared to Anthropic's 80% [30][33].
- Complaints from former employees, including John Carmack and Tijmen Blankevoort, point to poor resource utilization, performance-evaluation pressure, and internal competition [33][34].
- The lack of a strong CTO to balance the CEO's power is seen as a risk to the company's future stability [11].

Group 3: Cultural Misalignment
- Many top researchers are leaving because Meta's focus on speed and profitability conflicts with their values of safety, independence, and long-term research [39][40].
- The absence of a compelling mission makes it hard for some employees to justify staying; Tesla engineer Yun-Ta Tsai, for example, chose to remain with his current employer for its meaningful goals [40][42].
- The perception that Meta's culture prioritizes financial gain over meaningful work is making potential recruits reluctant to join [39][42].
NVIDIA's general-purpose robotics chip is here: 7.5x more AI compute, already adopted by Unitree and Galaxy General
机器之心· 2025-08-26 04:11
Core Viewpoint - Nvidia has launched its new robot-specific chip, Jetson Thor, which significantly increases computing power over its predecessor, Jetson Orin, to support advanced humanoid robots and other forms of embodied intelligence [4][12].

Group 1: Product Features
- Jetson Thor features a new Blackwell-architecture GPU with AI compute of up to 2070 FP4 TFLOPS, 7.5 times more than the previous generation [4][8].
- Power consumption is 130W, a 3.5x energy-efficiency improvement over the previous model [4].
- Memory capacity has doubled to 128GB, with 273GB/s of memory bandwidth [4][8].
- The chip is designed for generative AI models and supports real-time operation with minimal reliance on cloud computing [8][12].

Group 2: Software and Ecosystem
- Jetson Thor supports all major generative AI frameworks and inference models, enabling developers to experiment and run inference locally and efficiently [9][11].
- The developer kit is priced at $3,499, with modules at $2,999 for bulk orders [12].

Group 3: Market Impact and Partnerships
- Major robotics companies, including Unitree (宇树科技) and Galaxy General Robotics, have announced plans to adopt Jetson Thor in their products [14][15].
- Nvidia's strategy targets the trillion-dollar robotics and autonomous-vehicle markets, with a significant portion of its revenue coming from major tech companies [18][19].
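The headline multiples above can be reproduced from the listed specs with simple arithmetic. In the sketch below, the Jetson AGX Orin baseline of 275 INT8 TOPS is an assumption taken from public spec sheets, not from the text, and the two figures use different numeric precisions (FP4 vs INT8), which is how such generational comparisons are usually quoted:

```python
# Reproduce the headline ratios from the stated specs.
thor_tflops = 2070.0   # FP4 TFLOPS, from the article
orin_tops = 275.0      # Jetson AGX Orin INT8 TOPS (assumed baseline from
                       # public spec sheets; note the precisions differ)

compute_ratio = thor_tflops / orin_tops
print(f"compute ratio: {compute_ratio:.1f}x")  # matches the claimed 7.5x

memory_gb = 128                   # "doubled" per the article
prior_memory_gb = memory_gb / 2   # implies a 64GB predecessor
```

The 7.5x figure therefore follows directly from the two compute numbers, with the caveat that the precision change accounts for part of the gain.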