Tinker, the first startup product from OpenAI's former CTO, gets a fully upgraded open release here, with free credits up for grabs
机器之心· 2026-01-07 05:16
Core Insights
- The article discusses the launch of the Luchenyun Fine-tuning SDK, which is based on the Tinker SDK from Thinking Machines Lab, marking a shift from "craft-style" model training to "industrialized fine-tuning" [1][3][26]
- The SDK allows developers to focus on algorithm design while abstracting away the complexities of distributed training infrastructure, enabling a more efficient and cost-effective approach to fine-tuning large models [4][6][26]

Group 1: Technological Advancements
- The introduction of the Tinker SDK simplifies the training process by providing standard APIs for various training functions, allowing developers to define data and loss functions without worrying about infrastructure [4][6]
- The SDK supports both supervised fine-tuning (SFT) and complex reinforcement learning (RL) pipelines, enabling users to easily construct training flows using atomic functions [8][24]

Group 2: Cost Structure and Efficiency
- The Luchenyun SDK adopts a serverless architecture with a "pay-per-token" pricing model, so users pay only for the effective computation tokens used during prefill, sampling, and training, while other processes are free [14][18]
- This pricing model significantly reduces budget wasted on non-productive time, as users are no longer charged for GPU usage during data loading or debugging [14][18]

Group 3: User Experience and Accessibility
- The SDK provides a seamless experience, allowing users to work in familiar environments like Jupyter Notebook with standard Python syntax, thus enhancing productivity [8][10]
- The system includes an intelligent queue that ensures tasks are executed promptly, with no charges during waiting periods, optimizing resource utilization [12]

Group 4: Target Users and Applications
- The SDK is designed to cater to various user groups, including researchers who can conduct experiments without worrying about infrastructure, and startups that require rapid validation of MVPs [19][20]
- In industrial applications, the SDK allows engineers to define loss logic and reinforcement learning reward functions, providing complete control over model training [21]

Group 5: Future Outlook
- The article emphasizes that post-training is evolving from an academic niche to a mainstream engineering focus, aiming for a "zero cognitive load" experience for developers [26]
- The Luchenyun Fine-tuning SDK is now fully open for use, with promotional offers for early adopters, indicating a push for widespread adoption [27][28]
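The "atomic function" workflow described above can be sketched as follows. This is a minimal illustration, not the real SDK: the client class is a local stub, and while the primitive names (`forward_backward`, `optim_step`) follow the Tinker-style design the article describes, the signatures here are assumptions.

```python
# Sketch of the atomic-function workflow: the developer supplies data and a
# loss function and calls training primitives; the serverless backend
# (stubbed locally here) handles distribution and bills only effective
# computation tokens. Names and signatures are illustrative, not the SDK's.

class TrainingClient:
    """Local stub standing in for a remote, serverless training client."""

    def __init__(self, base_model):
        self.base_model = base_model
        self.optim_steps = 0
        self.tokens_billed = 0

    def forward_backward(self, batch, loss_fn):
        # Would remotely compute the loss and accumulate gradients;
        # only these computation tokens would be billed.
        tokens = sum(len(example["prompt"].split()) for example in batch)
        self.tokens_billed += tokens
        return {"loss": loss_fn(batch), "tokens": tokens}

    def optim_step(self, lr=1e-5):
        # Would apply the accumulated gradient update server-side.
        self.optim_steps += 1


def sft_loss(batch):
    # User-defined loss function; a fixed placeholder value here.
    return 0.5


client = TrainingClient(base_model="example-8b")
dataset = [{"prompt": "translate to French: hello", "target": "bonjour"}] * 4

for epoch in range(2):  # a two-epoch SFT loop over one small batch
    stats = client.forward_backward(dataset, sft_loss)
    client.optim_step(lr=1e-5)
    print(f"epoch {epoch}: loss={stats['loss']}, tokens={stats['tokens']}")
```

The point of the shape is that infrastructure never appears in user code: the loop contains only data, loss, and two primitive calls.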
A major shake-up for attention? Bengio's team finds a hardware-aligned scheme that goes beyond the Transformer
机器之心· 2026-01-07 05:16
Core Insights
- The article discusses the evolution of large language models (LLMs) and highlights the limitations of existing linear recurrence and state space models in terms of computational efficiency and performance [1][3].
- A new approach proposed by Radical Numerics and the Université de Montréal team redefines linear recurrences as hardware-aligned matrix operations, aiming to improve GPU memory utilization and computational efficiency [1][2].

Group 1: Challenges and Limitations
- The primary challenge identified is breaking through the "memory wall" associated with linear recurrences, which limits performance due to high communication costs on modern hardware [3][7].
- Traditional parallel scan algorithms, while theoretically efficient, suffer from data access patterns that force frequent global memory synchronization and thus fail to exploit data locality [4][5][6].

Group 2: Proposed Solutions
- The paper introduces Sliding Window Recurrences (SWR) as a method to achieve high throughput by strategically truncating the computational horizon, using a jagged window structure that aligns with hardware workloads [10][11].
- The Block Two-Pass (B2P) algorithm implements this theory, dividing the computation into two phases to optimize memory access and minimize data movement [14][15].

Group 3: Phalanx Layer and Performance
- A new computing layer called Phalanx, built on the B2P algorithm, serves as a drop-in replacement for sliding window attention or linear recurrence layers while ensuring numerical stability during long-sequence processing [19][20].
- In systematic tests on a 1.3-billion-parameter model, the Phalanx hybrid model demonstrated significant performance advantages, achieving 10% to 40% end-to-end speedup in training throughput across varying context lengths [23][24].

Group 4: Industry Implications
- The paper's findings indicate that true efficiency in LLMs arises not just from reduced algorithmic complexity but from a deep understanding of, and alignment with, the physical characteristics of the underlying hardware [31][32].
- As LLMs evolve toward larger contexts and real-time embodied intelligence after 2025, hardware-aware operator design will be crucial for building more efficient and powerful AI systems [33].
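The core idea behind SWR, truncating a linear recurrence h[t] = a[t]·h[t-1] + b[t] to a bounded window so that the work becomes block-parallel and locality-friendly, can be illustrated with a toy scan. This is a conceptual sketch only: the paper's actual SWR/B2P kernels are matrix-level GPU implementations, not Python loops.

```python
# Toy illustration of sliding-window truncation of a linear recurrence
# h[t] = a[t] * h[t-1] + b[t]. With decaying gates a[t] < 1, contributions
# older than the window shrink geometrically, which is what lets a bounded
# window approximate the unbounded scan.

import random

def linear_recurrence(a, b):
    """Exact sequential scan with h[-1] = 0."""
    h, prev = [], 0.0
    for at, bt in zip(a, b):
        prev = at * prev + bt
        h.append(prev)
    return h

def windowed_recurrence(a, b, window):
    """Each h[t] is recomputed from only the last `window` steps."""
    h = []
    for t in range(len(b)):
        start = max(0, t - window + 1)
        prev = 0.0
        for s in range(start, t + 1):
            prev = a[s] * prev + b[s]
        h.append(prev)
    return h

random.seed(0)
T = 64
a = [random.uniform(0.0, 0.9) for _ in range(T)]  # decaying gates
b = [random.gauss(0.0, 1.0) for _ in range(T)]

exact = linear_recurrence(a, b)
approx = windowed_recurrence(a, b, window=16)
err = max(abs(x - y) for x, y in zip(exact, approx))
print(f"max abs truncation error with window=16: {err:.6f}")
```

The truncation error is governed by the product of the gates across the window, so a modest window already reproduces the full scan closely when the gates decay; the hard part the paper addresses is doing this blockwise without global memory synchronization.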
Nearly a decade later, Google and Boston Dynamics "join hands" again, this time to give humanoid robots a "soul"
机器之心· 2026-01-07 00:49
Core Viewpoint
- Boston Dynamics and Google DeepMind have announced a new AI partnership aimed at ushering in a new era of artificial intelligence for humanoid robots, with a focus on enhancing industrial tasks and transforming the manufacturing sector, particularly the automotive industry [1][7].

Group 1
- The collaboration will integrate DeepMind's advanced Gemini Robotics AI model with Boston Dynamics' new Atlas humanoid robot [6].
- Joint research is expected to commence in the coming months, with activities taking place within both companies [8].
- Boston Dynamics aims to create the world's most capable humanoid robot and sees DeepMind as the ideal partner to develop a new vision-language-action (VLA) model for these complex robots [9].

Group 2
- DeepMind's Gemini Robotics model is designed to bring AI into the physical world, enhancing the capabilities of Boston Dynamics' Atlas robots [10].
- The partnership is viewed as a strong alliance, with DeepMind providing the intelligence and Boston Dynamics offering a top-tier hardware platform [10].
- Combining Gemini Robotics' foundational capabilities with Atlas hardware represents a significant advance in embodied intelligence for robotics [12].

Group 3
- The collaboration has generated excitement among observers, with some anticipating a competitive showdown between Western robots like Atlas and their Chinese counterparts [13].
- Historical context: this is not the two companies' first collaboration; Google acquired Boston Dynamics in 2013 but sold it after the market failed to meet expectations [14].
- The renewed partnership reflects a maturation of technology, with both companies now better positioned to leverage each other's strengths [14][15].

Group 4
- The significance of this collaboration raises the question of which company stands to gain more: is this Boston Dynamics' victory, or the beginning of a new chapter for Google in robotics? [15]
- The partnership is poised to create a future where humans and machines coexist and collaborate [16].
Once dismissive of AI, now generating 70,000 lines of code in two weeks: a Rust heavyweight teams up with Claude to build the new language Rue
机器之心· 2026-01-07 00:49
Core Insights
- The article discusses Steve Klabnik's journey with Rust and his new programming language, Rue, highlighting the evolution of his perspective on AI as a valuable tool in software development [1][3][21]

Group 1: Klabnik's Perspective on AI
- Klabnik transitioned from being an AI skeptic to recognizing the practical benefits of AI tools in coding, particularly the use of Claude for generating code [3][10]
- He emphasizes that AI serves as a high-level tool, enhancing productivity for those with a foundational understanding of software engineering principles [10][21]

Group 2: Introduction of Rue
- Rue is a new programming language designed by Klabnik, aiming to bridge the gap between high-performance languages like Rust and more accessible languages like Go [6][20]
- The name Rue reflects both self-deprecation and a botanical reference, indicating a blend of good and bad qualities [6]

Group 3: Development Process of Rue
- The Rue project accumulated around 70,000 lines of Rust code within two weeks, showcasing the efficiency of AI-assisted coding [8][20]
- Klabnik's workflow has the AI (Claude) handle implementation details while he focuses on design and architecture [14][20]

Group 4: Rust's Role in AI Programming
- Rust's strict compiler serves as a quality-control mechanism, ensuring that AI-generated code meets safety and type-checking standards [13][19]
- This strictness, once seen as a barrier for beginners, is now viewed as an advantage for AI programming, since it rules out whole classes of critical errors [17][21]

Group 5: Future of Programming Roles
- Klabnik's experiment with Rue suggests a shift in developer roles from "bricklayers" to "architects," where human developers focus on higher-level design while AI handles more routine coding tasks [21]
Exclusive analysis | Five AI trends for 2025 and the underlying data revolution
机器之心· 2026-01-06 09:38
机器之心 release. In 2025, the center of gravity of AI development is undergoing a fundamental shift: from pursuing model scale to building the capability to understand and solve complex real-world problems. In this transition, high-quality data is becoming the new cornerstone that defines AI capability. As a frontier explorer in AI data services, 数据堂 (Datatang) is deeply involved in, and supports, every key link of this transformation. This article offers an in-depth reading of five AI technology trends for 2025 and the shifts in data demand behind them.

The "human touch" and "real-time" revolution

Decoding the trend: toward more nuanced emotion and more natural real-time interaction. Speech synthesis has moved beyond the basic pursuit of "clarity and accuracy" and is now evolving along two deeper dimensions of intelligence. The first is injecting emotion, personality, and cultural fit into synthesized speech, making virtual assistants, digital humans, and audio content more expressive and relatable. The second is upgrading from one-way responses to full-duplex natural interaction that supports real-time interruption, overlapping dialogue, and contextual coherence, now a hard requirement in frontier scenarios such as premium smart cockpits, real-time translation, and lifelike customer service. The core technical challenge is to let AI not merely "read out" text but "understand" context and emotion, and to listen, think, and respond in real time like a human, sustaining emotionally and logically continuous conversation.

The leap in data demand: from "clean samples" to "expressive corpora" and "interaction streams". The focus of training data is undergoing a double leap. On one hand, an "expressiveness corpus" must be built to serve fine-grained control of timbre, prosody, emotion, and style, ...
Just now, 智元 (AgiBot) proposed SOP, enabling scalable online evolution of VLA models in the real world
机器之心· 2026-01-06 09:38
With consumer electronics, we have grown used to the assumption that a product "peaks at the factory": the moment of unboxing is usually the high point of its performance, and every day afterward is depreciation.

For general-purpose robots, this assumption must be overturned.

Imagine an AI robot, fully trained in the lab, whose brain "crashes" the moment it enters a home and faces a dimly lit room or a coffee table piled with clutter; it will forever remain an expensive experiment. This is the awkward truth facing embodied intelligence today: we have trained erudite pre-trained models on internet knowledge, yet once they step into the unknown physical world, these "theoretical giants" are often helpless in the face of environmental change. They "know" many principles, yet still cannot do the housework.

The way forward for general-purpose robots is not a "static off-the-shelf product" trapped in its factory settings, but an organism that keeps getting stronger in real deployment, through every failure and correction.

To achieve this leap, the 智元 (AgiBot) Embodied Research Center proposes the SOP (Scalable Online Post-training) framework.

Over the past few years, VLA (vision-language-action) models pre-trained on massive internet data have given robots a degree of general capability, yet they still face a gap that is hard to bridge: "knowing" does not mean "doing". A pre-trained model may "know" what folding clothes is, but when it actually faces a real garment with soft fabric under complex lighting, it is often helpless due to distribution shift. ...
Don't be fooled by high scores on indoor benchmarks: are large models reasoning about space, or just "memorizing answers"?
机器之心· 2026-01-06 09:38
Core Insights
- The article highlights the emergence of "spatial intelligence" as a new frontier in AI, particularly for large models, driven by advancements from scholars like Fei-Fei Li [2]
- It raises concerns about the validity of recent performance improvements, questioning whether models genuinely understand spatial reasoning or are merely overfitting to similar indoor data distributions [2][16]

Group 1: Limitations of Indoor Scene Data
- Research in spatial intelligence has predominantly focused on indoor scenes due to a lack of diverse outdoor datasets, which are often based on autonomous-driving perspectives, differing fundamentally from first-person pedestrian views [5]
- The over-reliance on indoor data leads to high homogeneity between training and testing datasets, making it difficult to fairly assess models' spatial perception and reasoning capabilities [6]

Group 2: OSI-Bench Introduction
- OSI-Bench, developed by the University of Chinese Academy of Sciences in collaboration with Microsoft Research Asia and ETH Zurich, aims to provide a more accurate assessment of spatial intelligence by utilizing original video data with precise 3D annotations from open-world environments [2][11]
- This benchmark evaluates models' true spatial capabilities by decoupling semantic priors from visual spatial intelligence, particularly in complex outdoor settings [9]

Group 3: Evaluation Results
- Evaluation results from OSI-Bench indicate that current state-of-the-art (SOTA) multimodal large language models generally fail to perform well on spatial reasoning tasks [13]
- Despite showing significant improvements on indoor benchmarks such as VSI-Bench, models consistently underperform on OSI-Bench, suggesting overfitting to specific scene distributions rather than genuine spatial intelligence [16]

Group 4: Language Priors and Model Performance
- When faced with spatial tasks, models tend to rely on language priors rather than engaging in visual geometric reasoning, leading to minimal performance differences with or without visual input [19][22]
- Experiments reveal that models struggle significantly in atypical scenarios where language priors fail, indicating a lack of robust spatial reasoning capabilities [23]

Group 5: Future Directions
- The article calls for a new paradigm in spatial intelligence that empowers models to perceive and think in spatial contexts, moving beyond mere data-driven distribution fitting [27]
- OSI-Bench's benchmark and evaluation code are open-sourced, with plans to continue releasing high-precision 3D datasets to advance spatial intelligence from indoor scenes to complex open-world scenarios [28]
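The language-prior ablation described in Group 4 can be sketched as a simple with/without-vision comparison. The `query_model` stub below is hypothetical (OSI-Bench's real evaluation harness is not shown in the article); it deliberately ignores the image, mimicking a model that answers purely from priors, so the accuracy gap collapses to zero.

```python
# Sketch of a language-prior ablation: score spatial questions with and
# without the visual input. A near-zero gap suggests the model answers
# from language priors rather than visual geometry. `query_model` is a
# hypothetical stub, not any real benchmark API.

def query_model(question, image=None):
    # Stub "model" that ignores the image and always returns the
    # statistically most common option, i.e. a pure language prior.
    return "left"

def accuracy(samples, use_image):
    correct = 0
    for s in samples:
        pred = query_model(s["question"], image=s["image"] if use_image else None)
        correct += pred == s["answer"]
    return correct / len(samples)

samples = [
    {"question": "Is the bench left or right of the tree?", "image": "img0", "answer": "left"},
    {"question": "Is the car left or right of the sign?",   "image": "img1", "answer": "right"},
    {"question": "Is the dog left or right of the gate?",   "image": "img2", "answer": "left"},
]

with_vision = accuracy(samples, use_image=True)
without_vision = accuracy(samples, use_image=False)
gap = with_vision - without_vision
print(f"with={with_vision:.2f} without={without_vision:.2f} gap={gap:.2f}")
```

On a model that actually uses the image, `with_vision` should exceed `without_vision`; the article's finding is that current models look more like this stub than like that ideal.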
Open-sourcing 10,000 hours of embodied-intelligence data: what is this company after?
机器之心· 2026-01-06 09:38
Core Viewpoint
- The article emphasizes the importance of high-quality, real-world data for advancing embodied intelligence, highlighting the release of the "10Kh RealOmni-Open DataSet" by JianZhi Robotics as a significant step in this direction [1][4][16]

Data Set Overview
- The "10Kh RealOmni-Open DataSet" consists of over 10,000 hours and nearly one million clips of embodied data, making it the largest and most generalized open dataset in the industry [1][4]
- The dataset focuses on 10 common household tasks, ensuring that each skill has over 10,000 clips, which gives the data both scale and depth [4][5]

Data Quality and Features
- The dataset is recorded at 1600x1296 pixels and 30 frames per second, with centimeter-level trajectory precision and a goal of millimeter-level accuracy for robotic applications [4][12]
- It includes diverse scenarios and natural human actions, collected from 3,000 real households, addressing the limitations of traditional data collection methods [7][9]

Data Collection Methodology
- JianZhi Robotics employs a comprehensive data production chain, utilizing the Gen DAS Gripper for efficient data collection without the need for extensive site preparation [11][13]
- The data collection process is automated, allowing rapid uploading and processing, with a current accumulation of nearly one million hours of data and a daily growth rate of approximately 10,000 hours [13][14]

Open Source Strategy
- Open-sourcing this dataset is intended to bridge data gaps, unify technical standards, and lower research and development barriers, ultimately accelerating the transition of embodied intelligence from laboratory settings to practical applications [16]
- The initiative aims to create a positive feedback loop of data sharing, model optimization, scene implementation, and data reinforcement [16]
Jensen Huang unveils a blockbuster at CES: next-generation Rubin architecture cuts inference costs 10x
机器之心· 2026-01-06 00:31
Core Insights
- The article discusses the transformative impact of AI on various industries, highlighting NVIDIA's announcements at CES 2026, particularly the new Rubin platform and the Alpamayo open-source model for autonomous driving [1][3][5]

Group 1: NVIDIA Rubin Platform
- The NVIDIA Rubin platform introduces six new chips aimed at building a leading AI supercomputer that excels in cost, performance, and security, significantly reducing training time and inference token costs [8][10]
- The platform features the latest NVIDIA NVLink interconnect, a Transformer engine, and advanced security measures, which together enhance AI capability and cut the GPU count needed for training models by four times compared with the previous generation [13][17]
- Rubin is designed for the surging demand for AI compute, with a total bandwidth of 260TB/s, surpassing the bandwidth of the entire internet, and is expected to be commercially available in the second half of 2026 [19][20]

Group 2: Alpamayo Open-Source Model
- The Alpamayo series introduces a vision-language-action (VLA) model that enhances autonomous driving by enabling vehicles to reason through rare scenarios, improving safety and interpretability [27][28]
- The model is part of a cohesive open ecosystem of open-source models, simulation tools, and datasets that developers can build on for autonomous driving technology [29][30]
- The 10-billion-parameter Alpamayo model generates driving trajectories and reasoning traces from video inputs, providing a foundation for developers to create tailored autonomous driving solutions [30][31]

Group 3: Robotics and Physical AI
- NVIDIA has launched new open-source models and frameworks for physical AI, aimed at accelerating the development of versatile robots capable of learning multiple tasks [35][36]
- The company emphasizes simulation and evaluation frameworks, such as Isaac Lab-Arena, to streamline the development process and ensure robust performance before deployment [43][45]
- Collaborations with industry leaders in robotics showcase the integration of NVIDIA's technology into next-generation robots, which are expected to revolutionize various sectors [36][50]
Scale up retrieval, slim down generation: a CMU team systematically evaluates RAG's corpus-versus-model trade-off
机器之心· 2026-01-06 00:31
Core Insights
- The core argument of the research is that expanding the retrieval corpus can significantly enhance Retrieval-Augmented Generation (RAG) performance, often providing benefits that can partially substitute for increasing model parameters, although diminishing returns occur at larger corpus sizes [4][22]

Group 1: Research Findings
- RAG performance is determined jointly by the retrieval module, which provides evidence, and the generation model, which interprets the question and integrates the evidence into an answer [7]
- Smaller models can achieve performance comparable to larger models by increasing the retrieval corpus size, with a consistent pattern observed across multiple datasets [11][12]
- The most significant gains occur when moving from no retrieval to having retrieval, with diminishing returns as the corpus size increases [13]

Group 2: Experimental Design
- The research employed a full factorial design, varying only corpus size and model size while keeping other variables constant, using a large dataset of approximately 264 million real web documents [9]
- Evaluation covered three open-domain question-answering benchmarks: Natural Questions, TriviaQA, and WebQuestions, using common metrics such as F1 and Exact Match [9]

Group 3: Mechanisms of Improvement
- Increasing the corpus size raises the probability of retrieving answer-containing segments, giving the generation model more reliable evidence [16]
- The study defines the Gold Answer Coverage Rate, the probability that at least one of the top chunks provided to the generation model contains the correct answer string, and shows that it increases monotonically with corpus size [16]

Group 4: Practical Implications
- When resources are constrained, prioritizing expansion of the retrieval corpus and improving coverage can allow medium-sized generation models to perform close to larger models [20]
- The study emphasizes tracking answer coverage and utilization rates as diagnostic metrics to identify whether bottlenecks lie in the retrieval or generation components [20]
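The Gold Answer Coverage Rate described in Group 3 follows directly from its definition: the fraction of questions for which at least one top-k retrieved chunk contains a gold answer string. A minimal sketch (with illustrative data, not the paper's):

```python
# Gold Answer Coverage Rate: fraction of questions where at least one of
# the top-k chunks passed to the generator contains a gold answer string.
# The sample questions below are made up for illustration.

def gold_answer_coverage(questions, k):
    """questions: list of {"answers": [...], "top_chunks": [...]} dicts."""
    covered = 0
    for q in questions:
        top_k = q["top_chunks"][:k]
        if any(ans.lower() in chunk.lower()
               for ans in q["answers"] for chunk in top_k):
            covered += 1
    return covered / len(questions)

questions = [
    {"answers": ["Paris"],
     "top_chunks": ["Paris is the capital of France.",
                    "France is in Europe."]},
    {"answers": ["1969"],
     "top_chunks": ["The Moon landing occurred in nineteen sixty-nine.",
                    "Apollo 11 launched in 1969."]},  # covered only at k >= 2
    {"answers": ["Ada Lovelace"],
     "top_chunks": ["Babbage designed the Analytical Engine."]},  # never covered
]

print(gold_answer_coverage(questions, k=1))
print(gold_answer_coverage(questions, k=2))
```

Note the metric is string containment, so it upper-bounds what the generator can ground in the evidence but says nothing about whether the generator actually uses it; that is why the study pairs it with a utilization-rate diagnostic.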