机器之心
Breaking the long-video generation bottleneck: Nanjing University and TeleAI unveil MMPL, a new AI generation paradigm that keeps creativity rolling in a single take
机器之心· 2025-08-25 06:08
Xiang Xunzhi is a Ph.D. student in the R&L group at Nanjing University, advised by Associate Professor Fan Qi; his research focuses on AIGC topics such as image/video generation and world models. Have you ever been captivated by the stunning opening of an AI-generated video, only to be let down seconds later by color drift, blurry frames, and broken pacing? Current AI long-video generation widely suffers from a "strong start, weak finish" problem: the first few seconds dazzle, then quality plummets and details fall apart. Worse still, frame-by-frame serial generation is inefficient: waits of several hours make real-time preview all but impossible. This industry-wide challenge now has a breakthrough answer. Nanjing University, together with TeleAI, has introduced Macro-from-Micro Planning (MMPL), a new autoregressive paradigm for long-video generation that redefines the AI video creation workflow. Inspired by the film industry's practice of "storyboarding plus parallel multi-crew shooting," MMPL pioneers a two-level architecture of "macro planning, micro execution": 
- Plan globally first: at the macro level, lay out the narrative arc and visual consistency of the entire video, ensuring a coherent plot and a unified style; 
- Then refine the details: split the long video into multiple short segments and fill in per-frame detail through a parallelized generation pipeline, greatly improving speed ... 
The results are exciting: MMPL is not merely a technical upgrade but a major step toward an "AI director," letting machines not only "shoot a scene" but also "tell a good story."
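The "macro planning, micro execution" split described above can be sketched as ordinary concurrent code: plan every segment up front, then fill segments in parallel instead of strictly frame by frame. This is only an illustrative Python sketch of the idea, not MMPL's actual pipeline; `plan_fn` and `fill_fn` are hypothetical stand-ins for the planner and the segment generator.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_long_video(plan_fn, fill_fn, prompt, n_segments):
    """Two-level generation in the spirit of 'macro planning,
    micro execution': plan all segment outlines first, then
    fill the segments concurrently."""
    outlines = plan_fn(prompt, n_segments)   # macro: global narrative plan
    with ThreadPoolExecutor() as pool:       # micro: parallel segment fill
        segments = list(pool.map(fill_fn, outlines))  # map preserves order
    return segments

# Toy stand-ins for the real planner and segment generator.
plan = lambda p, n: [f"{p} / shot {i}" for i in range(n)]
fill = lambda outline: f"frames<{outline}>"
```

Because `Executor.map` returns results in input order, the assembled segments follow the macro plan even though they are generated concurrently.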
Over 970,000 citations: Yoshua Bengio becomes the most-cited scholar in history, Kaiming He enters the overall top five
机器之心· 2025-08-25 06:08
Core Insights - The article highlights the prominence of AI as the hottest research direction globally, with Yoshua Bengio being the most cited scientist ever, accumulating a total citation count of 973,655 and 698,008 citations in the last five years [1][3]. Group 1: Citation Rankings - The AD Scientific Index ranks 2,626,749 scientists from 221 countries and 24,576 institutions based on total citation counts and recent citation indices [3]. - Yoshua Bengio's work on Generative Adversarial Networks (GANs) has surpassed 100,000 citations, outpacing his co-authored paper "Deep Learning," which also exceeds 100,000 citations [3][4]. - Geoffrey Hinton, a pioneer in AI, ranks second with over 950,000 total citations and more than 570,000 citations in the last five years [4][5]. Group 2: Notable Papers and Their Impact - The paper "AlexNet," co-authored by Hinton, Krizhevsky, and Sutskever, has received over 180,000 citations, marking a significant breakthrough in deep learning for computer vision [5][6]. - Kaiming He’s paper "Deep Residual Learning for Image Recognition" has over 290,000 citations, establishing ResNet as a foundational model in modern deep learning [10][11]. - The article notes that ResNet is recognized as the most cited paper of the 21st century, with citation counts ranging from 103,756 to 254,074 across various databases [11]. Group 3: Broader Implications - The high citation counts of these influential papers indicate their lasting impact on the academic community and their role in shaping future research directions in AI and related fields [17].
Just announced: the 2025 Science Exploration Award winners are out, including Fudan's Jiang Yugang and Tsinghua's Wu Jiamin
机器之心· 2025-08-25 04:13
Core Viewpoint - The 2025 "Science Exploration Award" highlights the importance of encouraging originality and rewarding young scientists in China, with a focus on innovative research across a range of fields [2][4]. Group 1: Award Overview - The "Science Exploration Award" was established in 2018 by prominent scientists together with Tencent's founder, aiming to support young researchers in mainland China and Hong Kong [2]. - This year's evaluation mechanism emphasizes originality, with final-review questions probing the uniqueness and innovation of each applicant's work [4]. - A total of 50 young scientists were selected from 1,238 applicants, including 13 under the age of 35, 6 of whom belong to the post-90s generation [4]. Group 2: Award Recipients in Information Electronics - The award recipients in the information electronics category include: - Chang Yi from Jilin University, known for his extensive research and numerous publications in computer science [6][8]. - Du Bo from Wuhan University, recognized for his contributions to artificial intelligence and computer vision [10][12]. - Jiang Yugang from Fudan University, a leader in multimedia information processing and artificial intelligence [13]. - Li Wei from the Chinese Academy of Sciences, specializing in micro-nano photonics and materials [14][16]. - Liao Qing from Harbin Institute of Technology (Shenzhen), focusing on data mining and artificial intelligence [19]. - Wu Jiamin from Tsinghua University, involved in optical computing and photonic intelligent computing [20].
This is Big Tech's AI "vibe coding": after a veteran engineer spoke from experience, everyone cracked up
机器之心· 2025-08-25 04:13
Core Viewpoint - Vibe coding, popularized by Andrej Karpathy, has gained traction in the tech industry, particularly among FAANG companies, although its definition and implementation remain contentious [1][5]. Group 1: Vibe Coding Popularity - A Reddit post suggests that vibe coding may be more prevalent than expected, with many employees at FAANG companies engaging in this practice [1][5]. - The post's author, an AI software engineer with over 15 years of experience, highlights the integration of AI in coding processes [3][4]. Group 2: Coding Process and Methodology - The coding process begins with reliable design documents and architecture, followed by writing tests before development [4][6]. - Key steps in the process include design reviews, task planning, software development using Test Driven Development (TDD), code review, and pre-release testing [6][13]. - Despite the involvement of AI, the process still requires significant human input, leading to debates about whether it truly qualifies as vibe coding [9][11]. Group 3: Perspectives on the Process - Some developers see value in the structured approach, advocating for detailed technical specifications and pre-development reviews [14][15]. - Others argue that the complexity of the process can hinder development speed, which may benefit independent founders [13][14].
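The test-first step in the workflow above can be made concrete with a minimal Test Driven Development example: the test is written first as the specification, and the implementation is written only afterwards to satisfy it. The `slugify` function here is a hypothetical illustration, not code from the post.

```python
import re

# Step 1 (TDD): the test is written first and acts as the spec.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  AI  Agents ") == "ai-agents"

# Step 2: the implementation is written only after the test
# has pinned down the expected behavior.
def slugify(title):
    """Lowercase the title, drop punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

test_slugify()  # red until the implementation exists, then green
```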
With AI agents on board, hit-video production speeds up 10x: the era of everyone as director has arrived
机器之心· 2025-08-25 02:48
Core Viewpoint - The article emphasizes the transformative impact of AI on creative processes, particularly in video production, enabling creators to focus on creativity and efficiency rather than tedious tasks [1][4]. Group 1: Software and AI Integration - Vibe Coding aims to free developers from tedious coding tasks by leveraging AI, allowing them to focus on higher-level product iteration and creative exploration [1]. - Video Ocean represents a shift in video creation, allowing a single creator to handle all aspects of filmmaking, significantly reducing production time from weeks to minutes [2][10]. - The AI Video Agent can generate complete videos from simple prompts, showcasing a new paradigm in video creation that prioritizes efficiency and creativity [3][6]. Group 2: User Experience and Feedback - Global user feedback indicates a smooth generation process with practical functionalities, highlighting the ease of creating complete videos with minimal input [3][10]. - The interest in Video Ocean stems from its innovative interaction methods rather than just performance improvements, marking a significant shift in user engagement with AI tools [4][5]. Group 3: Creative Process and Automation - Video Ocean's design changes the collaborative creation model, focusing on delivering complete creative projects quickly rather than just faster individual outputs [5][12]. - The platform allows creators to input a single creative directive, with the AI handling everything from scriptwriting to video generation, thus transforming users into "creative directors" [8][17]. - The system is designed to learn and adapt to individual brand styles, enhancing the creative process by eliminating repetitive tasks [8][30]. Group 4: Commercial Applications - Video Ocean can efficiently produce professional-grade commercial videos, meeting diverse business needs with simple commands [11][12]. 
- The platform enhances content production efficiency by tenfold, enabling rapid responses to market trends and the creation of viral videos [10][11]. Group 5: Versatility and Accessibility - Video Ocean covers a wide range of visual creation needs, from short films to educational content, demonstrating its versatility [13][26]. - The platform is user-friendly, allowing even novices to create high-quality videos effortlessly, thus democratizing video production [25][30].
A million netizens worldwide are hooked on cyber "fish raising," and these AI clownfish got me too
机器之心· 2025-08-25 02:48
Core Viewpoint - The article discusses the viral success of the AI game "Draw A Fish," which allows users to draw a fish and see it swim in a virtual fish tank, highlighting its simplicity and engaging mechanics that attract millions of players [3][14]. Group 1: Game Mechanics - The gameplay is straightforward: users draw a fish on a canvas, and the AI evaluates its resemblance to a fish, providing real-time feedback based on a similarity score [4][5]. - Once the drawing reaches a similarity of 60% or more, users can name their fish and place it in a shared virtual fish tank, where it swims alongside creations from other players [7][11]. - Players can interact with the fish in the tank by liking or disliking them, fostering a community atmosphere [8][16]. Group 2: User Engagement - The low barrier to entry, with no login or tutorial required, makes the game accessible, reminiscent of the addictive nature of "Flappy Bird" [15]. - The instant feedback mechanism from the AI enhances the sense of achievement, encouraging users to continue improving their drawings [15]. - The shared experience of seeing their creations come to life in the fish tank amplifies user satisfaction compared to traditional AI drawing tools [15]. Group 3: Technical Aspects - The game utilizes a convolutional neural network based on the ResNet18 architecture, trained on the Google QuickDraw dataset to classify drawings as "fish" or "not fish" [19][20]. - The model's design includes features like transparency handling and early stopping to optimize performance and reduce overfitting [21]. - The project also addresses data imbalance issues through weighted sampling and loss functions [23].
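The class-imbalance handling mentioned above (weighted sampling and a weighted loss) typically rests on inverse-frequency class weights. A minimal sketch with hypothetical label counts; the project's actual weighting scheme may differ:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1/frequency, normalized
    so the weights sum to the number of classes. Rare classes
    (here: 'fish') get proportionally larger weights."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    raw = {c: total / counts[c] for c in counts}
    norm = sum(raw.values())
    return {c: n_classes * w / norm for c, w in raw.items()}

# Hypothetical imbalanced dataset: many negatives, few positives.
labels = ["not_fish"] * 900 + ["fish"] * 100
weights = inverse_frequency_weights(labels)
```

Weights like these can then be passed to a weighted sampler or a weighted cross-entropy loss so the minority "fish" class is not drowned out during training.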
Can large models generate high-performance kernels for different hardware platforms? NJU and ZJU propose MultiKernelBench, a cross-platform kernel-generation evaluation framework
机器之心· 2025-08-25 02:48
Core Viewpoint - The article discusses the emergence of MultiKernelBench, a new open-source evaluation framework developed by Nanjing University and Zhejiang University, aimed at assessing the performance of large language models (LLMs) in generating high-performance deep learning kernels across diverse hardware platforms [3][6][10]. Group 1: Background and Motivation - The majority of computations in deep learning rely on low-level computation kernels executed on hardware accelerators like GPUs, NPUs, and TPUs, which are typically manually coded using specialized programming languages [2]. - Recent advancements in LLMs for code generation have sparked interest in automating the generation of high-performance deep learning kernels [2][3]. - Existing evaluation benchmarks are limited by platform coverage, assessment dimensions, and scalability, raising questions about the transferability of LLM advantages from CUDA ecosystems to heterogeneous platforms [3][6]. Group 2: MultiKernelBench Framework - MultiKernelBench introduces an open evaluation scenario for LLMs to automatically generate high-performance deep learning kernels across multiple platforms, marking a shift from single-platform capabilities to a more versatile approach [6][9]. - The framework is designed with modularity in mind, featuring four core characteristics: cross-hardware platform support, fine-grained task system, end-to-end automated evaluation, and category-aware one-shot prompting strategies [9][11][14][16]. - It covers 14 categories of core deep learning operators, including convolution and normalization, and incorporates both classic and newly added tasks to reflect LLM capabilities comprehensively [11][12]. Group 3: Evaluation and Results - MultiKernelBench has been used to evaluate seven major LLMs, including GPT-4o and Claude, with parameter sizes ranging from 32 billion to 681 billion [19]. 
- The evaluation metrics include Compilation@k, Pass@k, and SpeedUp@k, assessing the success of code generation, functional correctness, and performance optimization [21]. - Results indicate that while LLMs perform well on CUDA platforms, their success rates significantly drop on non-CUDA platforms, highlighting the need for further development in this area [23][27]. Group 4: Future Directions - The authors plan to expand support for various GPU and NPU architectures and invite collaboration from manufacturers to build an open-source ecosystem [10][24]. - Future efforts will focus on enhancing cross-platform collaboration, improving generation quality on low-resource platforms, and integrating more hardware backends [23][24].
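Metrics like Pass@k are commonly computed with the unbiased estimator popularized by code-generation evaluations; a sketch, assuming MultiKernelBench follows the standard formulation (its exact definition may differ):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations (c of which are
    correct) passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: a pass is certain
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Compilation@k would apply the same estimator with "compiles" in place of "passes," while SpeedUp@k additionally compares runtimes against a reference kernel.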
Beyond the limits of the universe: the sixth Busy Beaver number breaks records again, defying conventional mathematical notation
机器之心· 2025-08-24 04:02
Core Insights - The article discusses the ongoing exploration of the Busy Beaver numbers, particularly focusing on BB(6), which has reached levels beyond human comprehension and traditional mathematical notation [2][4][5]. Group 1: Busy Beaver Numbers - The Busy Beaver sequence is a series of numbers that represent the maximum steps a Turing machine with a given number of rules can take before halting, with BB(6) being the latest focus of research [10][11]. - Recent breakthroughs have shown that BB(6) is so large that it cannot be fully expressed in standard mathematical notation, and even attempts to write it down would exceed the number of atoms in the universe [4][24]. - The community of amateur mathematicians, known as Busy Beaver hunters, has made significant progress in determining lower bounds for BB(6), with new records being set frequently [5][19]. Group 2: Research Community and Collaboration - The Busy Beaver Challenge community was established in 2022, aiming to collaboratively tackle the problem of determining the values of Busy Beaver numbers, particularly BB(5) and BB(6) [27]. - The community has successfully proven the value of BB(5) using advanced proof assistants, showcasing a shift from individual efforts to collaborative research [27][28]. - The collaborative nature of the Busy Beaver Challenge has led to rapid advancements in the understanding of these complex numbers, with contributions from various researchers leading to new records [25][37]. Group 3: Mathematical Implications - The exploration of Busy Beaver numbers highlights the limitations of computability and the challenges posed by the Halting Problem, as demonstrated by Alan Turing's work [7][50]. - The growth of Busy Beaver numbers, particularly with the introduction of new mathematical operations like tetration and pentation, illustrates the vastness of these numbers and their implications for mathematical theory [20][40]. 
- The ongoing research into Busy Beaver numbers not only pushes the boundaries of mathematical understanding but also emphasizes the artistic and exploratory nature of mathematics [50].
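What "maximum steps before halting" means can be made concrete with the known 2-state, 2-symbol busy beaver champion, which halts after exactly BB(2) = 6 steps having written 4 ones. A minimal simulator (BB(6)-scale machines are, of course, far beyond any such direct simulation):

```python
def run_turing_machine(delta, max_steps=10_000):
    """Simulate a 2-symbol Turing machine on a blank tape.
    delta maps (state, symbol) -> (write, move, next_state);
    move is +1 (right) or -1 (left); state 'H' halts.
    Returns (steps_taken, ones_on_tape), or None if no halt
    within max_steps."""
    tape, pos, state = {}, 0, "A"
    for step in range(1, max_steps + 1):
        write, move, state = delta[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == "H":
            return step, sum(tape.values())
    return None

# The 2-state busy beaver champion: halts after BB(2) = 6 steps.
bb2 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
```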
With just 5,000+ samples, a new reinforcement learning paradigm lets a 30B model handily beat the 671B DeepSeek V3
机器之心· 2025-08-24 04:02
Traditional reinforcement learning (RL) has matured on instruction-following tasks with verifiable answers, such as math and code, but it struggles in open-ended creative writing, where there is no objective right or wrong. How can RL move beyond the boundary of "verifiable rewards"? Ant Group's Technology Research Institute, together with Zhejiang University, has open-sourced Rubicon, a new RL paradigm that builds the industry's largest collection of 10,000+ "scoring rubrics," successfully extending RL to the much broader territory of subjective tasks. With only 5,000 samples it surpasses a 671B model, ridding AI of its "mechanical flavor." Since the debut of OpenAI's o1 series, reinforcement learning with verifiable rewards (RLVR) has become the mainstream route to stronger reasoning in large models. Trained on vast quantities of math and coding problems, AI has achieved great success in domains where right and wrong are clear-cut. But this also exposes the bottleneck of the current technical path: what should AI do when faced with open-ended, subjective tasks that have no standard answer? How can AI write emotionally rich prose instead of templates reeking of "AI flavor"? How can it produce genuinely deep creative ideas rather than simple lists of information? This is the "soul problem" AI must crack on its way to higher-level intelligence. To that end, Ant Group's Technology Research Institute and Zhejiang University have officially open-sourced their latest result, the Rubicon-preview model, together with a framework called "rubric-based reinforcement ...
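A rubric-based reward of the kind described can be sketched as a weighted aggregation of per-criterion scores: each rubric item scores the candidate text in [0, 1], and the weighted mean becomes the scalar RL reward. The criteria below are hypothetical stand-ins; Rubicon's actual rubrics and aggregation are not specified in this summary.

```python
def rubric_reward(text, rubric):
    """Aggregate a weighted rubric into a scalar reward in [0, 1].
    rubric: list of (criterion_fn, weight) pairs, where each
    criterion_fn maps the candidate text to a score in [0, 1]."""
    total_weight = sum(w for _, w in rubric)
    return sum(fn(text) * w for fn, w in rubric) / total_weight

# Hypothetical criteria standing in for human-written or learned rubrics.
rubric = [
    (lambda t: 1.0 if len(t.split()) >= 5 else 0.0, 2.0),  # sufficient length
    (lambda t: 1.0 if "template" not in t else 0.0, 1.0),  # avoids boilerplate
]
```

In training, this scalar would replace the binary pass/fail signal of verifiable-reward tasks, letting a policy-gradient method optimize subjective qualities the rubric encodes.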
Three months, zero prior experience, a hand-built TPU: it can do inference and training, and it's open source
机器之心· 2025-08-24 04:02
Core Viewpoint - The recent advancements in large model technology have renewed interest in AI-specific chips, particularly Google's TPU, which has evolved significantly since its deployment in 2015, now reaching its 7th generation [1][9]. Group 1: TPU Overview - TPU is a specialized chip designed by Google to enhance the speed of machine learning model inference and training, focusing on executing mathematical operations efficiently [9]. - The architecture of TPU allows it to perform matrix multiplication efficiently, which constitutes a significant portion of computations in deep learning models [14][31]. Group 2: TinyTPU Project - The TinyTPU project was initiated by engineers from Western University in Canada to create an open-source ML inference and training chip, motivated by the lack of a complete open-source codebase for such accelerators [5][7]. - The project emphasizes a hands-on approach to learning hardware design and deep learning principles, avoiding reliance on AI tools for coding [6]. Group 3: Hardware Design Insights - The project team established a design philosophy of exploring unconventional ideas before consulting external resources, leading to the re-invention of many key mechanisms used in TPU [6]. - The hardware design process involves understanding clock cycles, using Verilog for hardware description, and implementing a systolic array architecture for efficient matrix multiplication [10][12][26]. Group 4: Training and Inference Mechanisms - The TinyTPU architecture allows for continuous inference by utilizing a double buffering mechanism, which enables the loading of new weights while processing current computations [61][64]. - The training process leverages the same architecture as inference, with additional modules for gradient calculation and weight updates, allowing for efficient training of neural networks [71][118]. 
Group 5: Control and Instruction Set - The control unit of TinyTPU employs a custom instruction set architecture (ISA) to manage control signals and data flow, enhancing the efficiency of operations [68][117]. - The ISA has evolved to include 94 bits, ensuring that all necessary control flags and data fields are accounted for without compromising performance [117].
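The systolic-array dataflow mentioned above can be sketched in software: in an output-stationary array, operands A[i][k] and B[k][j] arrive skewed so that they meet at processing element (i, j) on cycle i + j + k, where the element accumulates their product. This Python model is only an illustration of the dataflow, not TinyTPU's Verilog design.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array:
    PE (i, j) owns output element (i, j) and accumulates the product
    A[i][k] * B[k][j] on the cycle t = i + j + k when the skewed
    operand streams meet there."""
    n, k_dim, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]
    for t in range(n + m + k_dim - 1):      # enough cycles for every meeting
        for i in range(n):
            for j in range(m):
                k = t - i - j               # which operand pair arrives now
                if 0 <= k < k_dim:
                    acc[i][j] += A[i][k] * B[k][j]
    return acc
```

The triple loop stands in for hardware that runs all PEs in the same cycle t in parallel; the skew index k = t - i - j is what the physical wiring of the array realizes.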