机器之心
So This Is Big Tech's AI "Vibe Coding": A Veteran Engineer Speaks from Experience, and Everyone Loses It
机器之心· 2025-08-25 04:13
Core Viewpoint
- Vibe coding, popularized by Andrej Karpathy, has gained traction in the tech industry, particularly among FAANG companies, although its definition and implementation remain contentious [1][5].

Group 1: Vibe Coding Popularity
- A Reddit post suggests that vibe coding may be more prevalent than expected, with many employees at FAANG companies engaging in this practice [1][5].
- The post's author, an AI software engineer with over 15 years of experience, describes how AI is integrated into his coding process [3][4].

Group 2: Coding Process and Methodology
- The coding process begins with reliable design documents and architecture, followed by writing tests before development [4][6].
- Key steps in the process include design reviews, task planning, software development using Test-Driven Development (TDD), code review, and pre-release testing [6][13].
- Despite the involvement of AI, the process still requires significant human input, leading to debates about whether it truly qualifies as vibe coding [9][11].

Group 3: Perspectives on the Process
- Some developers see value in the structured approach, advocating for detailed technical specifications and pre-development reviews [14][15].
- Others argue that the complexity of the process can hinder development speed, and that lighter-weight approaches may suit independent founders better [13][14].
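The test-first loop at the heart of the process above can be sketched in a few lines of Python. The `slugify` function here is a made-up example for illustration, not code from the post:

```python
# TDD step 1: specify the behavior as tests before any implementation exists.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("Hi, there!") == "hi-there"

# TDD step 2: write the minimal implementation that makes the tests pass.
def slugify(text: str) -> str:
    # Replace every non-alphanumeric character with a space, then
    # join the lowercased words with hyphens.
    cleaned = "".join(c if c.isalnum() else " " for c in text)
    return "-".join(word.lower() for word in cleaned.split())

test_slugify()  # the tests now act as a safety net for later refactoring
```

In the workflow described by the post, the AI writes code against tests like these; the human's leverage is in the design document and the test suite, not in the keystrokes.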
A Million Netizens Worldwide Are Hooked on Cyber "Fish-Keeping", and These AI Clownfish Got Me Too
机器之心· 2025-08-25 02:48
Report by 机器之心. Editor: 杨文

Why does drawing a clownfish deliver the same rush as climbing ranks in Honor of Kings?

You have seen cyber dog-walking, but have you seen cyber fish-keeping? Recently an AI mini-game called Draw A Fish has gotten millions of netizens around the world hopelessly hooked.

The rules are simple: doodle a fish on the canvas and watch it come alive, swimming in a virtual fish tank.

Try it here: https://drawafish.com/

Open the site and you will find a simple drawing tool. Pick a color and brush size, then draw a fish facing right on the canvas; the AI judges in real time how fish-like your drawing is and gives instant feedback through changes in the canvas's background color.

Once the similarity score reaches 60% or more, click the "make it swim" button, give your fish a name, and drop it into a shared virtual tank, where it swims around alongside other players' creations.

You can also interact with this school of oddly shaped fish: a single click upvotes or downvotes one.

The developers know how to keep things lively, too: there is a leaderboard, currently topped by an extremely abstract doodle-fish scoring 53,245, while a fish that looks more like a bird sits at -40,182.

Registered users can also place their drawings in a personal tank.

What's more, the AI shows a "fish-likeness" probability with every stroke. Even if you admit your drawing skills are poor, when ...
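The per-stroke feedback loop can be sketched as follows. This is a hypothetical reconstruction for illustration; the site's actual scoring model and UI logic are not public:

```python
def canvas_feedback(fish_score: float) -> dict:
    """Sketch of the game's feedback step: a classifier score in [0, 1]
    drives the background color and whether the 'make it swim' button
    unlocks (the article's 60% threshold)."""
    passed = fish_score >= 0.6
    return {
        "background": "blue" if passed else "gray",  # instant visual feedback
        "can_swim": passed,                          # unlock the button
        "hint": f"{fish_score:.0%} fish-like",       # per-stroke probability hint
    }

print(canvas_feedback(0.72))
```

Each new stroke would re-run the classifier and call this function again, which is what makes the game feel responsive.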
With AI Agents, Viral Videos Are Produced 10x Faster: The Everyone-as-a-Director Era Is Here
机器之心· 2025-08-25 02:48
Core Viewpoint
- The article emphasizes the transformative impact of AI on creative processes, particularly in video production, enabling creators to focus on creativity and efficiency rather than tedious tasks [1][4].

Group 1: Software and AI Integration
- Vibe Coding aims to free developers from tedious coding tasks by leveraging AI, allowing them to focus on higher-level product iteration and creative exploration [1].
- Video Ocean represents a shift in video creation, allowing a single creator to handle all aspects of filmmaking and significantly reducing production time from weeks to minutes [2][10].
- The AI Video Agent can generate complete videos from simple prompts, showcasing a new paradigm in video creation that prioritizes efficiency and creativity [3][6].

Group 2: User Experience and Feedback
- Global user feedback indicates a smooth generation process with practical functionality, highlighting the ease of creating complete videos with minimal input [3][10].
- Interest in Video Ocean stems from its innovative interaction methods rather than just performance improvements, marking a significant shift in user engagement with AI tools [4][5].

Group 3: Creative Process and Automation
- Video Ocean's design changes the collaborative creation model, focusing on delivering complete creative projects quickly rather than just faster individual outputs [5][12].
- The platform allows creators to input a single creative directive, with the AI handling everything from scriptwriting to video generation, thus transforming users into "creative directors" [8][17].
- The system is designed to learn and adapt to individual brand styles, enhancing the creative process by eliminating repetitive tasks [8][30].

Group 4: Commercial Applications
- Video Ocean can efficiently produce professional-grade commercial videos, meeting diverse business needs with simple commands [11][12].
- The platform enhances content-production efficiency tenfold, enabling rapid responses to market trends and the creation of viral videos [10][11].

Group 5: Versatility and Accessibility
- Video Ocean covers a wide range of visual-creation needs, from short films to educational content, demonstrating its versatility [13][26].
- The platform is user-friendly, allowing even novices to create high-quality videos effortlessly, thus democratizing video production [25][30].
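The one-directive-in, finished-video-out flow described above can be sketched as a minimal agent pipeline. All names here are illustrative stand-ins, not Video Ocean's actual API:

```python
from dataclasses import dataclass

@dataclass
class Scene:
    description: str
    duration_s: float

def write_script(brief: str) -> list[Scene]:
    # Stand-in for an LLM call that expands one creative brief into a shot list.
    return [
        Scene(f"{brief}: opening shot", 3.0),
        Scene(f"{brief}: product close-up", 5.0),
    ]

def render(scene: Scene) -> str:
    # Stand-in for a text-to-video model; returns a clip identifier.
    return f"clip({scene.description!r}, {scene.duration_s}s)"

def make_video(brief: str) -> list[str]:
    """One directive in, a complete cut out: script -> shots -> rendered clips."""
    return [render(scene) for scene in write_script(brief)]

print(make_video("New sneaker launch"))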
Can Large Models Generate High-Performance Kernels for Different Hardware Platforms? Nanjing University and Zhejiang University Propose MultiKernelBench, a Cross-Platform Kernel Generation Benchmark
机器之心· 2025-08-25 02:48
Core Viewpoint
- The article discusses the emergence of MultiKernelBench, a new open-source evaluation framework developed by Nanjing University and Zhejiang University, aimed at assessing the performance of large language models (LLMs) in generating high-performance deep learning kernels across diverse hardware platforms [3][6][10].

Group 1: Background and Motivation
- The majority of computations in deep learning rely on low-level computation kernels executed on hardware accelerators like GPUs, NPUs, and TPUs, which are typically manually coded using specialized programming languages [2].
- Recent advancements in LLMs for code generation have sparked interest in automating the generation of high-performance deep learning kernels [2][3].
- Existing evaluation benchmarks are limited in platform coverage, assessment dimensions, and scalability, raising questions about the transferability of LLM advantages from the CUDA ecosystem to heterogeneous platforms [3][6].

Group 2: MultiKernelBench Framework
- MultiKernelBench introduces an open evaluation scenario for LLMs to automatically generate high-performance deep learning kernels across multiple platforms, marking a shift from single-platform capabilities to a more versatile approach [6][9].
- The framework is designed with modularity in mind, featuring four core characteristics: cross-hardware platform support, a fine-grained task system, end-to-end automated evaluation, and category-aware one-shot prompting strategies [9][11][14][16].
- It covers 14 categories of core deep learning operators, including convolution and normalization, and incorporates both classic and newly added tasks to reflect LLM capabilities comprehensively [11][12].

Group 3: Evaluation and Results
- MultiKernelBench has been used to evaluate seven major LLMs, including GPT-4o and Claude, with parameter sizes ranging from 32 billion to 681 billion [19].
- The evaluation metrics include Compilation@k, Pass@k, and SpeedUp@k, assessing compilation success, functional correctness, and performance optimization respectively [21].
- Results indicate that while LLMs perform well on CUDA platforms, their success rates drop significantly on non-CUDA platforms, highlighting the need for further development in this area [23][27].

Group 4: Future Directions
- The authors plan to expand support for various GPU and NPU architectures and invite collaboration from manufacturers to build an open-source ecosystem [10][24].
- Future efforts will focus on enhancing cross-platform collaboration, improving generation quality on low-resource platforms, and integrating more hardware backends [23][24].
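Metrics of the @k family are commonly computed with the unbiased estimator introduced for HumanEval; a sketch is below. MultiKernelBench's exact implementation may differ, but the idea carries over to Compilation@k and SpeedUp@k by changing what counts as a "success":

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples of which c
    succeed, estimate the probability that at least one of k randomly
    drawn samples succeeds (Chen et al., 2021)."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 kernel samples, 3 functionally correct, report pass@5.
print(pass_at_k(10, 3, 5))
```

Computing the metric this way avoids the bias of naively averaging over random k-subsets when n > k samples are available.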
Beyond the Limits of the Universe: The Sixth Busy Beaver Number Advances Again, Beyond Expression in Conventional Mathematical Notation
机器之心· 2025-08-24 04:02
From quantamagazine. Author: Ben Brubaker. Compiled by 机器之心.

When numbers outrun human imagination: the story of BB(6).

Here is a sequence of numbers; can you guess the next one? 1, 6, 21, 107, 47,176,870...

If you have no idea, don't be discouraged. These numbers are not arbitrary: they are the first five "busy beaver numbers," a sequence tied to one of the thorniest problems in theoretical computer science. Pinning down their exact values is a challenge of almost unclimbable difficulty. For more than sixty years, the problem has drawn sustained attacks from top mathematicians and captivated countless amateurs, forming a distinctive mathematical subculture.

Recently, this quest produced a new breakthrough. Busy beaver hunters have found a brand-new champion program whose step count is so large that it cannot be written out in the standard system of mathematical notation at all. In other words, they have reached territory beyond what conventional mathematics can express.

In the 1960s and 1970s, researchers determined the first four busy beaver numbers. The far larger fifth number, BB(5), was only nailed down last year. The feat was accomplished not by an elite lab but by a team of amateur math enthusiasts organized through an online community called the "Busy Beaver Challenge," ...
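For readers new to the topic: BB(n) is the maximum number of steps an n-state, 2-symbol Turing machine can take on an initially blank tape before halting. A minimal simulator and the known 2-state champion machine (which gives BB(2) = 6) illustrate the setup:

```python
def run_turing(transitions, max_steps=10_000):
    """Simulate a 2-symbol Turing machine on a blank tape.
    Return the step count if it halts within max_steps, else None."""
    tape, pos, state, steps = {}, 0, "A", 0
    while state != "H" and steps < max_steps:
        symbol = tape.get(pos, 0)            # blank cells read as 0
        write, move, state = transitions[(state, symbol)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        steps += 1
    return steps if state == "H" else None

# The 2-state busy beaver champion: halts after exactly 6 steps.
bb2 = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "H"),
}
print(run_turing(bb2))  # 6
```

The difficulty of the problem is already visible here: to certify BB(n) you must prove that every other n-state machine either halts sooner or never halts at all, and the latter is undecidable in general.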
With Only 5,000+ Samples, a New Reinforcement Learning Paradigm Lets a 30B Model Beat the 671B DeepSeek V3
机器之心· 2025-08-24 04:02
Traditional reinforcement learning (RL) has matured on instruction-following tasks with standard answers (such as math and code), but it struggles in open-ended creative writing, where there is no objective right or wrong. How can RL break past the boundary of "verifiable rewards"? The Ant Technology Research Institute (蚂蚁技术研究院), together with Zhejiang University, has open-sourced Rubicon, a new RL paradigm that builds the industry's largest collection of 10,000+ scoring rubrics, successfully extending RL into the much broader realm of subjective tasks. With just 5,000 samples it surpasses a 671B model, ridding AI of its "mechanical" flavor.

Since OpenAI's o1 series of models appeared, reinforcement learning with verifiable rewards (RLVR) has become the mainstream recipe for improving large models' reasoning ability. Trained on vast quantities of math and coding problems, AI has achieved great success in domains where right and wrong are clear-cut.

But this also exposes the bottleneck of the current technical route: what does AI do when faced with open-ended, subjective tasks that have no standard answer?

How do you get AI to write emotionally rich prose rather than templates reeking of "AI flavor"? How do you get it to produce genuinely deep creative ideas rather than simple lists of information? This is the "soul problem" AI must crack on its way to higher levels of intelligence.

Against this backdrop, the institute and Zhejiang University have officially open-sourced their latest research result, the Rubicon-preview model, together with a framework called "rubric-based ...
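The rubric-as-reward idea can be sketched as a weighted scorer. This is a toy stand-in: in Rubicon each criterion would be judged by a model, not by the substring check used here for illustration:

```python
def rubric_reward(text: str, rubric: list[tuple[str, float]]) -> float:
    """Score open-ended text against a list of (criterion, weight) pairs,
    returning a reward in [0, 1] usable by an RL trainer."""
    def judge(criterion: str, text: str) -> float:
        # Stand-in for an LLM judge returning a 0-1 satisfaction score.
        return 1.0 if criterion.lower() in text.lower() else 0.0
    total_weight = sum(weight for _, weight in rubric)
    return sum(w * judge(c, text) for c, w in rubric) / total_weight

# A toy rubric for a creative-writing task (weights are illustrative).
rubric = [("vivid imagery", 2.0), ("emotional arc", 3.0)]
print(rubric_reward("A draft full of vivid imagery.", rubric))
```

The point of the paradigm is that a dense, graded reward like this replaces the binary pass/fail signal of RLVR, which is what lets RL optimize tasks with no single correct answer.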
Three Months, Zero Background: Hand-Building a TPU That Can Run Inference and Training, and It's Open Source
机器之心· 2025-08-24 04:02
Core Viewpoint
- The recent advancements in large model technology have renewed interest in AI-specific chips, particularly Google's TPU, which has evolved significantly since its deployment in 2015, now reaching its 7th generation [1][9].

Group 1: TPU Overview
- The TPU is a specialized chip designed by Google to accelerate machine learning model inference and training, focusing on executing mathematical operations efficiently [9].
- The architecture of the TPU allows it to perform matrix multiplication efficiently, which constitutes a significant portion of the computation in deep learning models [14][31].

Group 2: TinyTPU Project
- The TinyTPU project was initiated by engineers from Western University in Canada to create an open-source ML inference and training chip, motivated by the lack of a complete open-source codebase for such accelerators [5][7].
- The project emphasizes a hands-on approach to learning hardware design and deep learning principles, avoiding reliance on AI tools for coding [6].

Group 3: Hardware Design Insights
- The project team established a design philosophy of exploring unconventional ideas before consulting external resources, leading to the re-invention of many key mechanisms used in the TPU [6].
- The hardware design process involves understanding clock cycles, using Verilog for hardware description, and implementing a systolic array architecture for efficient matrix multiplication [10][12][26].

Group 4: Training and Inference Mechanisms
- The TinyTPU architecture allows for continuous inference by utilizing a double-buffering mechanism, which enables the loading of new weights while current computations are being processed [61][64].
- The training process leverages the same architecture as inference, with additional modules for gradient calculation and weight updates, allowing for efficient training of neural networks [71][118].

Group 5: Control and Instruction Set
- The control unit of TinyTPU employs a custom instruction set architecture (ISA) to manage control signals and data flow, enhancing the efficiency of operations [68][117].
- The ISA has grown to 94 bits, ensuring that all necessary control flags and data fields are accounted for without compromising performance [117].
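The systolic-array matrix multiply at the heart of such a design can be simulated cycle by cycle in software. The sketch below models a conceptual output-stationary array (A streams in from the left with row i delayed i cycles, B from the top with column j delayed j cycles, and each PE does one multiply-accumulate per cycle); it is an illustration of the general technique, not TinyTPU's Verilog:

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cycle-level simulation of an n-by-n output-stationary systolic array."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)      # accumulators, one per PE
    a_reg = np.zeros((n, n), dtype=A.dtype)  # A operands flowing right
    b_reg = np.zeros((n, n), dtype=A.dtype)  # B operands flowing down
    for cycle in range(3 * n - 2):           # the last PE finishes at 3n-3
        # Shift operands one PE to the right / down.
        a_reg[:, 1:] = a_reg[:, :-1]
        b_reg[1:, :] = b_reg[:-1, :]
        # Inject the skewed edge inputs for this cycle.
        for i in range(n):
            k = cycle - i
            a_reg[i, 0] = A[i, k] if 0 <= k < n else 0
        for j in range(n):
            k = cycle - j
            b_reg[0, j] = B[k, j] if 0 <= k < n else 0
        # Every PE performs one multiply-accumulate in parallel.
        C += a_reg * b_reg
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(systolic_matmul(A, B))  # equals A @ B
```

The skewed injection is the key idea: PE(i, j) sees A[i, k] and B[k, j] arrive on the same cycle for every k, so the whole product is computed with only nearest-neighbor data movement, which is why the structure maps so well to hardware.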
Video Generation vs. Spatial Representation: Which Path Should World Models Take?
机器之心· 2025-08-24 01:30
机器之心PRO · Member Newsletter, Week 34

--- This week we unpack 2 noteworthy developments in AI & Robotics ---

1. Video generation vs. spatial representation: which path should world models take?

Do the high-quality frames produced by video prediction really mean the model understands physics and causality? Can modeling directly in latent space avoid pixel-level noise while preserving decision-making and planning ability? Could a hybrid route become the optimal path for future world models? As generative models and latent-representation techniques advance, can AGI's "thought-experiment sandbox" truly be deployed on physical-world tasks? ...

2. Poach geniuses or stack compute? A former Llama inference lead on AI's real ceiling

What really determines the industry's ceiling: the inspiration of genius researchers, or exponentially growing compute? If compute growth slows, will the AI industry face a "stalling growth" inflection point? Can high-level conceptual ideas drive real model leaps without systematic experimental validation? Is the ceiling on model generalization raised by upgrading models, or by designing higher-quality new test problems? ...

The full newsletter contains 2 in-depth topic analyses plus 30 AI & Robotics news briefs from the week: 12 on technology, 8 domestic, 10 international. This issue totals 20,464 characters; the free preview covers the first 9%; 288 WeChat ...
First-Place Solution Released: Purdue University Wins the Code-Agent Security Competition with a 90% Attack Success Rate
机器之心· 2025-08-23 10:51
How secure is your AI coding assistant? Perhaps far more fragile than you imagine. Several recent studies [1-2] show that even safety-aligned large language models can inadvertently generate vulnerable code in ordinary development scenarios, planting the seeds for later exploitation; in the hands of malicious users, such models can also significantly accelerate the construction and iteration of malware, lowering the bar to attack and shortening development cycles. Many of these risks stem from subtle flaws in the model's reasoning chain, not merely from explicit problems at the input-output level.

In the Amazon Nova AI Challenge, a security competition for code agents hosted by Amazon, the PurCL team from Purdue University took first place as the red team with an attack success rate above 90%, winning a $250,000 prize.

Over the course of the competition, the 12-member team spent eight months and a million dollars developing an end-to-end red-teaming system based on AI cognitive modeling, which is now open for researchers in the field to use.

Their research finds that the key to aligning code models lies in scaling alignment techniques to complex, real-world problems and improving the safety relevance of model reasoning.

The Amazon code model security competition

The Amazon competition targets the code security of large models. The organizers invited top research teams worldwide to submit proposals, ultimately funding 10 of 90 submissions; over six months, each team received $250,000 in research funding and ...
A Major OpenAI Finding: GPT-4b micro Reworks Nobel Prize Research, Boosting Yamanaka Factor Reprogramming Efficiency 50-Fold
机器之心· 2025-08-23 10:51
Core Viewpoint
- The collaboration between OpenAI and Retro Bio aims to enhance the efficiency of stem cell reprogramming through the development of a new model, GPT-4b micro, which improves the reprogramming efficiency of Yamanaka factors by 50 times compared to standard methods [2][3][26].

Group 1: Collaboration and Investment
- OpenAI announced its partnership with Retro Bio to develop a new model, GPT-4b micro, which focuses on enhancing Yamanaka factors for stem cell reprogramming [2].
- Sam Altman personally invested $180 million in Retro Bio prior to this collaboration [3].

Group 2: Technological Advancements
- GPT-4b micro has an architecture similar to GPT-4o but employs a novel training method and a custom biological dataset, allowing scientists to redesign proteins according to their needs [9].
- The model can handle a context length of up to 64,000 tokens, a first for protein sequence models, and exhibits scaling laws similar to language models, indicating predictable improvements with larger datasets [12].

Group 3: Research Findings
- The Retro team used human fibroblasts to build a wet-lab screening platform, where GPT-4b micro proposed diverse "RetroSOX" sequences that outperformed wild-type SOX2 in expressing pluripotency markers [14][15].
- For KLF4, the model generated enhanced RetroKLF variants, achieving a hit rate close to 50%, significantly higher than traditional methods [18].
- Combining the best RetroSOX and RetroKLF variants led to notable increases in early and late pluripotency markers, with late markers appearing days earlier than with standard OSKM combinations [20].

Group 4: Clinical Potential and Validation
- The study demonstrated that over 30% of cells began expressing key pluripotency markers within 7 days using mRNA delivery methods, with over 85% activating endogenous expression of critical stem cell markers by day 12 [24].
- The engineered variants showed robust genomic stability and the ability to differentiate into all three germ layers, supporting their potential for cell therapy applications [24].

Group 5: Future Outlook
- OpenAI's work illustrates that specialized models can lead to rapid breakthroughs in scientific research, potentially solving problems in days that previously took years [32].