机器之心
NVIDIA, HKU, and MIT jointly release Fast-dLLM v2: end-to-end throughput up 2.5x
机器之心· 2025-10-26 04:03
The autoregressive (AR) paradigm of decoding token by token limits the inference efficiency of large language models. Diffusion LLMs (dLLMs) excel at parallel generation, but have historically struggled to consistently outperform AR models, with KV-cache reuse and variable-length generation remaining open challenges. Fast-dLLM v2 offers a pragmatic route: adapt a pretrained AR model into a block-diffusion LLM (Block-dLLM) capable of parallel decoding, requiring only on the order of ~1B tokens of fine-tuning for a "lossless" migration, rather than training on hundreds of billions of tokens (Dream, for example, needs ~580B tokens). On A100/H100 GPUs it raises end-to-end throughput by up to 2.5x while preserving accuracy.

Core highlights
- Authors' affiliations: HKU, NVIDIA, MIT.
- Paper: https://arxiv.org/pdf/2509.26328
- Project page: https://nvlabs.github.io/Fast-dLLM/v2/
- Code: https://github.com/NVlabs/Fast-dLLM

Fast-dLLM v2 partitions the sequence into blocks of a fixed size: within each block, bidirectional attention enables parallel denoising, while across blocks a left-to-right causal order is preserved, so the model can decode in parallel while retaining AR semantics ...
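The excerpt describes the attention pattern but includes no code; as a minimal sketch of the block-causal mask it implies (the block size, sequence length, and the True-means-visible convention below are illustrative assumptions, not the paper's implementation):

```python
# Minimal sketch of a block-causal attention mask as described for
# Fast-dLLM v2: bidirectional attention *within* a block, strict
# left-to-right causality *across* blocks. Block size and the mask
# convention (True = attention allowed) are illustrative assumptions.
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    # Block index of every position in the sequence.
    blocks = torch.arange(seq_len) // block_size
    # Position i may attend to position j iff j's block is not *after*
    # i's block: same block -> bidirectional, earlier block -> causal,
    # later block -> masked out.
    return blocks.unsqueeze(1) >= blocks.unsqueeze(0)

mask = block_causal_mask(seq_len=8, block_size=4)
print(mask.int())
# Positions 0-3 attend to each other freely (parallel denoising);
# positions 4-7 also see all of block 0 but nothing past block 1.
```

This is what lets the adapted model keep an AR-style KV cache across blocks while denoising each block's tokens in parallel.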
Even the toilet can now "look and talk": the giants dare to sell it, but do you dare to sit on it?
机器之心· 2025-10-26 04:03
Core Viewpoint
- The integration of AI into toilets is transforming them into passive health monitoring devices, allowing for continuous health data collection without user interaction [20][49][50].

Group 1: Product Innovations
- Kohler has launched the Dekoda, a toilet camera priced at $599, which monitors health by analyzing waste and sending results to a mobile app [10][11].
- The Dekoda can identify users through fingerprint recognition and analyze stool and urine for health indicators, such as hydration levels and potential gastrointestinal issues [12][18].
- Throne, a startup, is developing a similar product that uses AI to analyze waste characteristics and is expected to launch in January 2026 [24][29].

Group 2: Health Monitoring Potential
- The concept of using waste to extract health information is based on the significant role gut microbiota plays in human health, influencing digestion, immunity, and even mental health [42][44].
- The technology aims to provide early detection of health issues rather than replace clinical diagnostics, targeting health-conscious elderly users and those with chronic digestive conditions [46][47].

Group 3: Market Dynamics
- The market for smart toilets is expanding, driven by the aging population and increasing awareness of gut health, with potential for integration into broader health monitoring ecosystems [56][74].
- Current pricing models for smart toilets combine an upfront device cost with ongoing subscription fees for health monitoring services; Kohler's annual subscription ranges from $70 to $156 [71][72].

Group 4: Challenges and Considerations
- Privacy concerns and the need for user trust are significant barriers to adoption, and companies emphasize data protection and anonymization in response [57][59].
- The accuracy and reliability of the technology in real-world settings remain to be validated, with challenges such as environmental factors and user identification still to be addressed [70][68].
A deep, hardcore teardown: unpacking how the vLLM inference system achieves high throughput
机器之心· 2025-10-26 04:03
Core Insights
- The article surveys the rapid development of large-model applications and the push to make inference faster and more efficient, highlighting the emergence of vLLM as a high-performance inference framework specifically optimized for large language models [1][4].

Inference Engine Basics
- The vLLM framework covers fundamental processes such as input/output request handling, scheduling, paged attention, and continuous batching [4]; a toy sketch of the paged-KV bookkeeping follows this summary.
- Advanced features of vLLM include chunked prefill, prefix caching, guided decoding, speculative decoding, and decoupled prefill/decoding [4].

Performance Measurement
- Inference-system performance is measured through latency metrics (time to first token, per-iteration latency, and end-to-end latency) and throughput, complemented by GPU roofline models [4].

Architecture and Components
- The LLM engine is the core module of vLLM, capable of high-throughput inference in offline scenarios [8].
- Key components of the engine include the engine core, processor, output processor, model executor, and scheduler, each playing a critical role in the inference process [15][16].

Scheduling Mechanism
- The scheduler prioritizes decode requests over prefill requests, allowing for more efficient processing of inference tasks [38][39].
- The vLLM V1 scheduler can intelligently mix prefill and decode requests within the same step, enhancing overall efficiency [39].

Advanced Features
- Chunked prefill processes long prompts by breaking them into smaller chunks, preventing a single request from monopolizing resources [57].
- Prefix caching avoids redundant computation for tokens shared across multiple prompts, significantly speeding up prefill requests [69][73].

Guided and Speculative Decoding
- Guided decoding uses a finite state machine to constrain logits according to grammar rules, ensuring only syntactically valid tokens are sampled [93][95].
- Speculative decoding introduces a draft model to quickly generate candidate tokens, reducing the number of expensive sequential forward passes the target model must make during autoregressive generation [106][110].

Distributed System Deployment
- vLLM can be deployed across multiple GPUs and nodes, using tensor and pipeline parallelism to serve models that exceed single-GPU memory [146][150].
- The architecture also supports data parallelism and load balancing, ensuring efficient handling of incoming requests [130][156].
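As referenced above, here is a minimal sketch of the paged-KV-cache bookkeeping idea behind PagedAttention (this is not vLLM's actual API; the block size, class names, and table layout are assumptions for illustration):

```python
# Minimal sketch of paged KV-cache bookkeeping in the spirit of vLLM's
# PagedAttention: each request's cache lives in fixed-size blocks that
# need not be contiguous, and a per-request block table maps logical
# block indices to physical ones. Names and layout are illustrative.
from typing import Dict, List

BLOCK_SIZE = 16  # tokens per KV block (assumed)

class BlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))     # physical block ids
        self.tables: Dict[str, List[int]] = {}  # request -> block table

    def append_token(self, request_id: str, num_tokens_so_far: int) -> int:
        """Return the physical block holding the newest token,
        allocating a fresh block whenever a block boundary is crossed."""
        table = self.tables.setdefault(request_id, [])
        if num_tokens_so_far % BLOCK_SIZE == 0:  # crossed a boundary
            table.append(self.free.pop())        # grab any free block
        return table[-1]

    def release(self, request_id: str) -> None:
        """Free all blocks of a finished request for immediate reuse."""
        self.free.extend(self.tables.pop(request_id, []))

alloc = BlockAllocator(num_blocks=64)
for t in range(40):                              # a 40-token request
    alloc.append_token("req-0", t)
print(alloc.tables["req-0"])                     # 3 non-contiguous blocks
alloc.release("req-0")
```

The point of the indirection is that a request's cache grows block by block and is freed instantly on completion, which is what makes continuous batching memory-efficient.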
Can "seeing" replace "reading"? Why the real story of DeepSeek-OCR's viral moment is not performance
机器之心· 2025-10-26 01:30
Group 1
- The core idea of DeepSeek-OCR is to use visual tokens to achieve roughly ten times the compression of text tokens while maintaining 97% response accuracy [7][8].
- DeepSeek-OCR introduces "Contextual Optical Compression," which treats text as a two-dimensional image rather than a one-dimensional symbol sequence, allowing far more efficient compression [7][8].
- The AI community is focusing on the implications of visual tokens over traditional text tokens, particularly for the economics of long-context processing in models built on the next-token-prediction (NTP) mechanism [8][9].

Group 2
- Jensen Huang argues that the current AI wave will not repeat the internet bubble, pointing to a shift in the underlying logic and the emergence of full-stack AI-factory competition that is reshaping the compute landscape [2][3].
- The next generation of intelligent systems may prioritize "energy-efficiency advantages" over "compute advantages," signaling a potential shift in how AI systems are developed and deployed [2][3].
- The article also discusses the concept of an "intelligent economy," asking how far we are from realizing that vision, particularly in the context of digital labor and physical AI [2][3].
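As a back-of-the-envelope illustration of what a ~10x token saving means (every constant below is an assumption for illustration, not a figure from the DeepSeek-OCR paper):

```python
# Back-of-the-envelope comparison of text tokens vs. vision tokens for
# one document page. All constants are illustrative assumptions, not
# numbers from the DeepSeek-OCR paper.
words_per_page = 500
text_tokens_per_word = 1.3         # rough English tokenizer average
text_tokens = words_per_page * text_tokens_per_word   # ~650 tokens

vision_tokens_per_page = 64        # assumed optical-compression budget
ratio = text_tokens / vision_tokens_per_page
print(f"text: {text_tokens:.0f} tokens, vision: {vision_tokens_per_page}, "
      f"compression ~{ratio:.1f}x")  # ~10x fewer tokens per page
```

For long-context NTP models whose attention cost grows with token count, a saving of this magnitude is what makes "seeing" a page cheaper than "reading" it.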
NeurIPS 2025 | The ARGRE framework enables efficient LLM detoxification: autoregressive reward guidance makes safety alignment faster, more accurate, and lighter
机器之心· 2025-10-25 05:14
The authors are Yisong Xiao, Aishan Liu, Zonghao Ying, and Xianglong Liu of Beihang University; Siyuan Liang of the National University of Singapore; and Dacheng Tao of Nanyang Technological University. The paper has been accepted to NeurIPS 2025.

LLMs are widely deployed in intelligent creation, enterprise services, and other domains, but content safety remains a key obstacle to real-world adoption. Risks such as hateful, discriminatory, or threatening speech make safe deployment and trustworthy use difficult, and existing content-filtering or alignment schemes struggle to balance effectiveness, efficiency, and cost.

Researchers from Beihang University and collaborating institutions recently proposed a new approach: the Autoregressive Reward-Guided Representation Editing (ARGRE) framework. It is the first method to map, in the LLM's latent representation space, a continuous path from high to low toxicity, enabling efficient test-time "detoxification."

Paper title: Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Paper link: https://arxiv.org/abs/2510.01243

Figure 1: Overview of the ARGRE framework.

Research background: existing LLM detoxification techniques fall into two main paths, but both face core bottlenecks that severely limit their effectiveness in practical settings. First, approaches based on direct preference ...
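The excerpt ends before the method details. Purely as a hedged sketch of the broader family of test-time representation-editing methods that ARGRE belongs to (the mean-difference direction, fixed edit strength, layer choice, and function names below are assumptions, not ARGRE's reward-guided procedure):

```python
# Hedged sketch of test-time representation editing for detoxification.
# A steering direction is estimated from mean hidden states of toxic
# vs. non-toxic examples, then added to activations during generation.
# This illustrates the general family of methods only; ARGRE's actual
# autoregressive reward-guided procedure is more involved.
import torch

def steering_direction(toxic_h: torch.Tensor,
                       nontoxic_h: torch.Tensor) -> torch.Tensor:
    """toxic_h, nontoxic_h: (num_examples, hidden_dim) hidden states
    collected at one layer. Returns a unit vector pointing from the
    toxic mean toward the non-toxic mean."""
    d = nontoxic_h.mean(dim=0) - toxic_h.mean(dim=0)
    return d / d.norm()

def edit_hidden(h: torch.Tensor, direction: torch.Tensor,
                strength: float = 4.0) -> torch.Tensor:
    """Shift a hidden state along the non-toxic direction at test time."""
    return h + strength * direction

# Toy usage with random stand-ins for real hidden states.
hidden_dim = 16
toxic = torch.randn(32, hidden_dim) + 1.0
nontoxic = torch.randn(32, hidden_dim) - 1.0
d = steering_direction(toxic, nontoxic)
h_t = torch.randn(hidden_dim)        # current token's hidden state
print(edit_hidden(h_t, d).shape)     # torch.Size([16])
```

The appeal of editing at test time, as the article emphasizes, is that it avoids retraining the model, so effectiveness, efficiency, and cost no longer have to be traded off against each other.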
Making robots "not only think, but also act precisely": VLA-R1 brings "reasoning + action" into the real world
机器之心· 2025-10-25 05:14
Core Insights
- The article introduces VLA-R1, which strengthens reasoning in Vision-Language-Action (VLA) models by integrating chain-of-thought (CoT) supervision with reinforcement learning (RL) to improve both reasoning quality and execution accuracy [4][5].

Group 1: VLA-R1 Overview
- VLA-R1 is a foundational model built around the principle of "reason first, then execute" [4].
- It combines CoT supervision with verifiable rewards from RL to optimize both the reasoning and execution processes [4][5].

Group 2: Key Innovations
- Two-stage training: the model first undergoes supervised fine-tuning (SFT) with explicit CoT supervision, followed by GRPO-based reinforcement learning to stabilize the transition from reasoning to action [6][8].
- Three types of verifiable rewards (RLVR) are introduced to enforce accurate perception, faithful trajectory execution, and structured output [9][11]; a hedged sketch of such a combined reward follows this summary.
- The VLA-CoT data engine generates a structured dataset of 13,000 vision-language-action samples to provide high-quality supervision signals for SFT [12][19].

Group 3: Experimental Results
- VLA-R1 was evaluated at four levels: in-domain testing, out-of-domain testing, simulation platforms, and real-robot experiments [16][17].
- On the in-domain benchmark, VLA-R1 achieved a perception IoU of 36.51, a 17.78% improvement over the baseline [22].
- In real-robot experiments, VLA-R1 reached a 62.5% success rate for affordance perception and 75% for trajectory execution across environments of varying complexity [26].

Group 4: Applications
- Home automation tasks such as object retrieval and organization in cluttered environments, where the model reasons through similar-looking targets and multiple container options [34].
- Warehouse picking and light industrial assembly, where it clarifies the relationships between parts, tools, and containers [34].
- Educational demonstrations and automated assessment, where the structured output format makes reasoning and execution steps easy to evaluate [34].
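As promised above, a hedged sketch of what a combined verifiable reward could look like (the metrics, weights, tag schema, and function names are assumptions for illustration, not VLA-R1's actual reward design):

```python
# Illustrative sketch of a combined "verifiable reward" in the spirit
# of RLVR: a perception term (box IoU), a trajectory term (distance to
# a reference path), and a structured-output format check. Weights,
# metrics, and the format schema are assumptions for this sketch.
import math
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def traj_score(pred: List[Tuple[float, float]],
               ref: List[Tuple[float, float]]) -> float:
    """Mean point-wise distance mapped into (0, 1]; assumes the two
    trajectories have equal length."""
    d = sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(ref)
    return 1.0 / (1.0 + d)

def format_ok(text: str) -> bool:
    """Toy check that the output keeps a <think>...</think><act>...</act>
    structure (the actual output schema is an assumption)."""
    return bool(re.fullmatch(r"<think>.+</think>\s*<act>.+</act>",
                             text, flags=re.S))

def reward(pred_box, ref_box, pred_traj, ref_traj, text) -> float:
    # Weighted mix of the three verifiable terms (weights assumed).
    return (0.4 * iou(pred_box, ref_box)
            + 0.4 * traj_score(pred_traj, ref_traj)
            + 0.2 * float(format_ok(text)))
```

Because each term is checkable against ground truth, the RL stage can score rollouts without a learned reward model, which is what makes such rewards "verifiable."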
Yoshua Bengio has just become the world's first scientist with one million citations!
机器之心· 2025-10-25 05:14
Core Insights
- Yoshua Bengio has become the first individual to surpass 1 million citations on Google Scholar, a significant milestone for artificial intelligence (AI) research [1][5][7].
- The growth of Bengio's citation count closely tracks the rise of AI over the past two decades from the periphery to the center of global attention [5][7].
- Bengio, together with Geoffrey Hinton and Yann LeCun, is recognized as one of the "three giants" of deep learning; the three were jointly awarded the Turing Award for their contributions to computer science [8][47].

Citation Milestones
- Bengio's citation count stands at 1,000,244, with an h-index of 251 and an i10-index of 977, indicating the broad impact of his published work [1][3].
- His most cited paper, "Generative Adversarial Nets," has garnered 104,225 citations since its publication in 2014 [1][22][33].
- The second most cited work is the textbook "Deep Learning," co-authored with Ian Goodfellow and Aaron Courville, which has received over 103,000 citations [1][26][33].

Personal Background and Academic Journey
- Born in Paris in 1964 to a family with a rich cultural background, Bengio developed an early interest in science fiction and technology [9][10].
- He studied at McGill University, earning degrees in electrical engineering and computer science, and later conducted postdoctoral research at MIT and AT&T Bell Labs [12][13].
- Bengio returned to Montreal in 1993, where he began his influential academic career [12].

Contributions to AI and Deep Learning
- Bengio made foundational contributions to neural-network research during the "AI winter," when skepticism about the field was widespread [13][15].
- His work on learning long-term dependencies helped motivate long short-term memory (LSTM) networks, and his neural language models introduced word embeddings to natural language processing [18][19].
- He has been instrumental in promoting ethical considerations in AI, advocating for responsible development and use of AI technologies [19][27].

Ethical Advocacy and Future Vision
- As AI technologies advance rapidly, Bengio has voiced concern about their potential misuse, moving from pure scientist to active advocate for ethical AI [18][19].
- He has helped draft ethical guidelines and has called for international regulation to prevent the development of autonomous weapons [19][27].
- Bengio emphasizes ensuring that AI serves humanity positively, drawing inspiration from optimistic visions of the future [18][19][27].

Ongoing Research and Influence
- At 61, Bengio continues to publish influential research, including recent papers on AI consciousness and safety [36][37][38].
- He remains a mentor to emerging researchers, fostering the next generation of AI talent [41].
- His legacy combines groundbreaking scientific contributions with a sustained commitment to the ethics of technology [47][48].
Anthropic and Thinking Machines Lab paper revealed: 300,000 stress tests expose flaws in AI model specifications
机器之心· 2025-10-25 05:14
Core Insights
- The article examines the limitations of current model specifications for large language models (LLMs), highlighting internal conflicts among principles and insufficient granularity in their ethical guidelines [1][5].
- A systematic stress-testing methodology is proposed to identify and characterize contradictions and ambiguities in existing model specifications [1][3].

Group 1: Model Specifications and Ethical Guidelines
- Current LLMs are increasingly governed by model specifications that define behavioral and ethical boundaries, forming the basis of Constitutional AI and Deliberative Alignment [1].
- Existing specifications face two main issues: internal conflicts among principles, and a lack of the granularity needed for consistent behavioral guidance [1][5].
- Researchers from Anthropic and Thinking Machines Lab developed a detailed taxonomy of 3,307 values exhibited by the Claude model, exceeding mainstream model specifications in both coverage and detail [3][4].

Group 2: Methodology and Testing
- The team generated over 300,000 query scenarios that force models to make explicit trade-offs between values, surfacing potential conflicts in model specifications [3][5].
- Value-bias techniques roughly tripled the number of queries, yielding a dataset of over 410,000 effective scenarios after filtering out incomplete responses [9][10].
- Analysis of 12 leading LLMs, including models from Anthropic, OpenAI, Google, and xAI, revealed significant discrepancies in responses across scenarios [4][12].

Group 3: Findings and Analysis
- More than 220,000 scenarios showed significant divergence between at least two models, and more than 70,000 showed clear behavioral differences across most models [7][11].
- Higher divergence in model responses correlates with potential problems in the underlying specification, especially when multiple models following the same guidelines behave inconsistently [13][20]; a toy divergence score is sketched after this summary.
- A two-stage evaluation method quantified the degree of value bias in model responses, improving measurement consistency [14][15].

Group 4: Compliance and Conformity Checks
- Evaluation of OpenAI models revealed frequent non-compliance with their own specification, pointing to issues within the specification itself [17][18].
- Multiple leading models served as reviewers to assess compliance, and high divergence correlated strongly with increased rates of non-compliance [20][22].
- The analysis exposed fundamental contradictions and interpretive ambiguities in model responses, underscoring the need for clearer guidelines [25][27][32].
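As referenced above, a toy version of a cross-model divergence score (the 1-7 bias scale, the spread metric, and all names are assumptions for illustration, not the paper's exact protocol):

```python
# Hedged sketch of cross-model divergence scoring on a value trade-off
# scenario. Each model's response is first mapped onto a 1-7 bias scale
# between the two competing values (as a judge model might do); the
# scenario's divergence is then the spread of those scores. The scale
# and the spread metric are illustrative assumptions.
from statistics import pstdev
from typing import Dict

def divergence(scores: Dict[str, int]) -> float:
    """scores: model name -> position on a 1-7 scale, where 1 means the
    response fully favors value A and 7 fully favors value B."""
    return pstdev(scores.values())

scenario = "Refuse a risky request (safety) vs. answer fully (helpfulness)"
scores = {"model_a": 2, "model_b": 6, "model_c": 5, "model_d": 2}
print(f"{scenario}: divergence={divergence(scores):.2f}")
# A high spread flags scenarios where the shared spec underdetermines
# behavior -- candidates for contradiction or ambiguity in the spec.
```

Run at the scale of hundreds of thousands of scenarios, scores like this are what let the authors rank which parts of a specification most need disambiguation.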
Teaching VLMs to "hold a world in mind": VAGEN uses multi-turn RL to turn visual agents into "world model" reasoning machines
机器之心· 2025-10-25 03:20
Core Insights
- The article discusses the limitations of vision-language models (VLMs) on complex visual tasks, highlighting their tendency to act impulsively rather than deliberately because their perception of the world is limited and noisy [2][6].
- The VAGEN framework enhances VLMs by teaching them to construct an internal world model before acting, promoting a more structured thinking process [3][12].

Group 1: VAGEN Framework
- VAGEN enforces a structured "thinking template" with two core steps: State Estimation (describing the current state) and Transition Modeling (predicting future outcomes) [7][11]; a toy version of such a template is sketched after this summary.
- The framework uses reinforcement learning (RL) to reward this structured thinking, demonstrating that the "World Modeling" strategy significantly outperforms both "No Think" and "Free Think" baselines [12][32].

Group 2: Internal Monologue and Reward Mechanism
- The research explores the best format for the agent's internal monologue, finding that the optimal representation depends on the nature of the task [13][14].
- VAGEN introduces two key reward components: a World Modeling Reward, which provides immediate feedback after each thought step, and Bi-Level GAE for efficient credit assignment [18][20].

Group 3: Performance Results
- VAGEN-Full, built on a 3B VLM, achieved an overall score of 0.82 across five diverse tasks, outperforming a range of other models including GPT-5 [27][30].
- The results indicate that VAGEN-Full not only surpasses untrained models but also exceeds several proprietary models, showcasing its effectiveness in enhancing VLM capabilities [30][32].
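As referenced above, a toy sketch of enforcing such a two-step template (tag names, prompt wording, and the parsing logic are assumptions, not VAGEN's actual schema):

```python
# Illustrative sketch of enforcing VAGEN-style structured reasoning:
# the agent must first describe the current state, then predict the
# next state, before emitting an action. Tag names and the checking
# logic are assumptions, not VAGEN's actual schema.
import re

TEMPLATE = (
    "<state>{what the agent currently observes}</state>\n"
    "<prediction>{what the world will look like after acting}</prediction>\n"
    "<action>{the chosen action}</action>"
)

def parse_turn(text: str):
    """Return (state, prediction, action) if the response follows the
    template, else None -- a None here would forfeit the world-modeling
    reward in the RL loop."""
    m = re.search(r"<state>(.*?)</state>\s*"
                  r"<prediction>(.*?)</prediction>\s*"
                  r"<action>(.*?)</action>", text, flags=re.S)
    return m.groups() if m else None

turn = ("<state>The red block is left of the bin.</state>\n"
        "<prediction>After pushing right, the block is in the bin.</prediction>\n"
        "<action>push_right</action>")
print(parse_turn(turn))
```

Gating the per-step reward on a parse like this is one simple way to make "think before you act" a checkable property rather than a hope.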
"I'm fed up with Transformers": co-author Llion Jones says the AI field has ossified and is missing the next breakthrough
机器之心· 2025-10-25 03:20
Core Viewpoint
- The AI field faces a paradox: increased resources and funding are producing less creativity and innovation, as researchers gravitate toward safe, publishable projects rather than high-risk, transformative ideas [3][11][29].

Group 1: Current State of AI Research
- Llion Jones, CTO of Sakana AI and co-author of the influential paper "Attention Is All You Need," expressed frustration with the field's fixation on the Transformer architecture, suggesting it may hinder the search for the next major breakthrough [2][5][24].
- Despite unprecedented investment and an influx of talent, the field has grown narrow-minded, with researchers feeling pressured to compete rather than explore new ideas [3][11][16].
- Jones noted that this environment produces rushed publications and little true scientific exploration, as researchers worry about being "scooped" by competitors [11][16].

Group 2: Historical Context and Comparison
- Jones recalled the organic, pressure-free environment that led to the creation of the Transformer, contrasting it with today's competitive atmosphere in which researchers feel compelled to deliver quick results [19][30].
- He emphasized that the freedom to explore ideas without pressure from management was crucial to the Transformer's development, a condition now largely absent [19][22].

Group 3: Proposed Solutions and Future Directions
- To foster innovation, Jones proposes turning up the "exploration dial" and encouraging researchers to share findings openly, even at a competitive cost [21][26].
- At Sakana AI, he is working to recreate a research environment that prioritizes exploration over competition and reduces the pressure to publish [22][30].
- Jones warns that the next significant breakthrough may be overlooked if the current focus on incremental improvements continues, urging a shift toward collaborative exploration [26][31].