量子位
Unbelievable! Meta's Layoffs Reach Tian Yuandong, Taking Out His Entire Team
量子位· 2025-10-23 03:52
Core Viewpoint
- The recent layoffs at Meta AI, led by new Chief AI Officer Alexandr Wang, are not merely organizational streamlining but signal a significant shift in the company's AI strategy, affecting prominent figures such as Tian Yuandong, who has been with Meta for over a decade [1][6].

Group 1: Tian Yuandong's Background and Contributions
- Tian Yuandong has a strong academic background, with degrees from Shanghai Jiao Tong University and a PhD from Carnegie Mellon University specializing in robotics [7][8].
- He joined Facebook (now Meta) in 2014 and has made significant contributions to AI, including the Go AI "Dark Forest," which reached a level comparable to top amateur players before AlphaGo [9][12].
- His research focus later shifted to AI interpretability and foundational principles; he declined an invitation from OpenAI to work on language models in order to concentrate on understanding how neural networks operate [13].

Group 2: Recent Developments and Innovations
- Tian Yuandong recently led a team focused on planning and reasoning in AI, publishing a paper on the role of key hyperparameters in "grokking" and the effectiveness of optimizers such as Muon [14][15].
- His work includes memory-efficient training methods such as GaLore, which compresses the memory required to pre-train a 7B model to under 24GB, enabling training on consumer-grade GPUs [16].
- The Dualformer model integrates "fast thinking" and "slow thinking," allowing dynamic responses to both simple and complex problems, while the Coconut paradigm compresses reasoning trajectories into a continuous latent space [16].

Group 3: Industry Reactions and Future Prospects
- Following the layoffs, OpenAI and various startups quickly expressed interest in recruiting Tian Yuandong and his team members, indicating a competitive job market in the AI sector [4][6].
- Tian Yuandong's workplace experiences may inspire his creative work: he is also a science fiction author, with his first novel set to be published in 2024 [17][20].
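The core idea behind GaLore, as summarized above, is to keep optimizer state in a low-rank projection of the gradient rather than at full size. A minimal, illustrative sketch of that idea follows (this is not Meta's implementation; the shapes, rank, and plain-SGD update are assumptions for demonstration):

```python
import numpy as np

def galore_style_step(W, grad, rank=4, lr=0.01):
    """One illustrative GaLore-style update: the optimizer works in a
    low-rank projection of the gradient, shrinking state from O(m*n)
    to O(r*n), then projects the update back to full size."""
    # Project the gradient onto its top-r left singular subspace.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                 # m x r projection matrix
    low_rank_grad = P.T @ grad      # r x n: all the optimizer must store
    # (A real optimizer would keep Adam moments of low_rank_grad here.)
    update = P @ low_rank_grad      # lift the update back to m x n
    return W - lr * update

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
grad = rng.standard_normal((64, 32))
W_new = galore_style_step(W, grad)
```

The memory saving comes from the optimizer state living at rank-r size (`low_rank_grad`) instead of the full parameter shape; GaLore itself periodically refreshes the projection rather than recomputing an SVD every step.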
One Instruction Misleads AI Models! Beihang University and Collaborators Debut a 3D Semantic Attack Framework, Success Rate Surges 119%
量子位· 2025-10-23 03:52
Core Viewpoint
- The article discusses the security-alignment issues of artificial intelligence models, focusing on the newly proposed InSUR framework for generating adversarial samples that are independent of specific tasks and models [1][2].

Group 1: InSUR Framework Overview
- InSUR is based on the concept of instruction uncertainty reduction, generating adversarial samples that mislead both known and unknown models from a single instruction [2][4].
- The framework integrates a 3D generation approach, achieving the first generation of natural 3D adversarial objects from a single instruction and validating the effectiveness of its sampling technique, ResAdv-DDIM [6][8].

Group 2: Challenges in Semantic Adversarial Sample Generation
- Existing methods for generating semantic adversarial samples face three main challenges: referring diversity, description incompleteness, and boundary ambiguity [14][21].
- InSUR addresses these through residual-driven stable attack directions, rule encoding of the generation process, and semantic hierarchical abstraction evaluation [8][12].

Group 3: Sampling Method and Task Modeling
- The ResAdv-DDIM sampling method stabilizes the attack direction by predicting a rough outline of the final target during denoising, which improves the robustness and transferability of adversarial samples [12][16].
- Task modeling incorporates task-goal embedding strategies, enabling effective generation of both 2D and 3D semantic adversarial samples [22][27].

Group 4: Evaluation and Results
- Evaluation shows significant improvements in attack success rate (ASR) across models and tasks, with an average ASR increase of at least 1.19x and a minimum increase of 1.08x, while maintaining low perceptual loss (LPIPS) [40][41].
- The framework's design decouples it from specific models and tasks, demonstrating scalability and effectiveness in generating high-fidelity adversarial test scenarios for safety-critical systems [45][46].
Boosting Reasoning Performance Without Changing the Model? ICLR Submission Proposes OTV, a New Test-Time Scaling Paradigm
量子位· 2025-10-23 00:08
Core Insights
- The article discusses the challenges faced by large language models, including hallucinations, logical errors, and reasoning flaws, which have prompted researchers to explore new methods for improving output reliability [1]
- A novel approach called One-Token Verification (OTV) is introduced, which lets a model monitor its reasoning process in real time without altering the original model's structure or parameters [2]

Summary by Sections

Current Mainstream Paradigms
- LoRA fine-tuning is a popular parameter-efficient tuning method that avoids full-parameter training and is easy to deploy, but it often relies on detailed supervised data and can lead to "forgetting effects" [3]
- Quality screening of generated results can improve output credibility but is reactive: it cannot correct the model's reasoning in real time and offers no insight into the internal reasoning process [4]

Parallel Thinking Framework
- The article introduces Parallel Thinking, in which a language model generates multiple reasoning paths simultaneously and then filters them through a selection mechanism [5]
- OTV builds on this framework, focusing on selecting correct reasoning paths at low cost rather than on generating more paths [5]

OTV Mechanism
- OTV employs an internal verifier that analyzes the reasoning process via a lightweight role vector implemented with LoRA, running in parallel with the original model [9]
- The internal verifier reads the Transformer's key-value cache (KV cache) to capture rich information about the model's internal dynamics during reasoning [9]
- A special token, the "Token of Truth" (ToT), is inserted during the verification phase to assess the correctness of the reasoning path [9]

Training and Efficiency
- OTV's internal verifier is lightweight, with a training scheme that assigns heuristic pseudo-labels based on the correctness of the final answer [10]
- Training is highly parallelized, predicting scores for all positions simultaneously, making it computationally comparable to ordinary LoRA fine-tuning [10]

Experimental Validation
- OTV was systematically evaluated on various open-source models, showing superior accuracy and a preference for shorter, more accurate reasoning paths compared with baseline methods [14]
- The results indicate that OTV can read both the internal reasoning state and the output quality, significantly outperforming general methods that rely only on output text [15]

Dynamic Control of Computational Costs
- OTV lets the model control computational cost dynamically, eliminating low-quality paths in real time based on confidence scores and cutting computational load by nearly 90% while maintaining optimal accuracy [17]

Future Prospects
- The OTV framework opens avenues for deeper integration with the original model, and for a three-state scheme that adds an "uncertain" state to enhance selective prediction [25][26]
- The approach could also extend to other model architectures, optimizing KV-cache structures to further improve reasoning efficiency and representation utilization [26]
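The cost-control idea described above, keeping many parallel reasoning paths but dropping low-confidence ones early, can be sketched in a few lines. This is an illustration of the pruning logic only: the confidence values here are stand-ins for what OTV's internal verifier would actually read from the KV cache, and the threshold is an assumed parameter.

```python
# Illustrative sketch of confidence-based path pruning in a
# parallel-thinking setup (not OTV's actual implementation).

def prune_paths(paths, threshold=0.5):
    """Keep only reasoning paths whose verifier confidence clears the threshold,
    so compute is not spent finishing paths that are likely wrong."""
    return [p for p in paths if p["confidence"] >= threshold]

# Stand-in partial reasoning paths with verifier confidence scores.
paths = [
    {"text": "path A: ...", "confidence": 0.92},
    {"text": "path B: ...", "confidence": 0.31},  # pruned: below threshold
    {"text": "path C: ...", "confidence": 0.77},
]

survivors = prune_paths(paths)
best = max(survivors, key=lambda p: p["confidence"])
```

In the real system the scores would be updated as generation proceeds, so pruning happens mid-path rather than once at the end, which is where the reported ~90% compute saving comes from.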
Meta AI Cuts 600 Jobs as Alexandr Wang Wields the Axe, Targeting LeCun's Team
量子位· 2025-10-23 00:08
Core Viewpoint
- Meta is undergoing significant layoffs in its AI division, with 600 employees cut, particularly in the FAIR lab and AI product departments, while the newly established TBD Lab is unaffected and continues to hire [1][2][5].

Group 1: Layoffs and Organizational Changes
- The layoffs are part of a restructuring led by new Chief AI Officer Alexandr Wang, who aims to create a more agile operating model within Meta AI [5][7].
- Employees were informed of their job status by Wednesday morning, Pacific time, indicating a swift decision-making process [6].
- Wang's internal memo emphasized fewer discussions in decision-making and encouraged affected employees to apply for other positions within the company [8].

Group 2: Leadership and Research Direction
- CEO Mark Zuckerberg has expressed deep concern about the lack of breakthroughs or performance improvements at Meta AI, which drove the layoff decision [8].
- Yann LeCun, head of the FAIR lab, has distanced himself from the Llama project and voiced frustration over new policies requiring additional review of research papers, which he views as a threat to academic freedom [9][10][11].

Group 3: Talent Acquisition and Future Outlook
- TBD Lab is actively recruiting, having recently hired key personnel from Thinking Machines and OpenAI, indicating a strategic focus on building a strong team for future AI development [2].
- Despite the layoffs, Wang remains optimistic about the models being trained and the overall direction toward superintelligence [8].
Cooling Chips with Lasers: Moore's Law's Ceiling Can't Hold It Down
量子位· 2025-10-23 00:08
Core Viewpoint
- The article discusses "photon cooling," a new chip-cooling method developed by Maxwell Labs that converts heat into light to remove heat from chip hotspots, significantly improving cooling efficiency over traditional air and liquid cooling [4][5][27].

Group 1: Photon Cooling Technology
- Photon cooling relies on fluorescence: the material absorbs lower-energy light and emits higher-energy light, producing a net cooling effect [9].
- Maxwell Labs has built this principle into a thin-film, chip-level photon cooling plate that targets hotspots on chips, allowing precise temperature control [11][13].
- The cooling plate consists of several components, including a coupler, a micro-cooling area, a back reflector, and sensors that detect hotspots and guide the laser [14].

Group 2: Efficiency and Performance Benefits
- Photon cooling can mitigate the "dark silicon" problem, letting more transistors operate simultaneously by removing heat from hotspots effectively [27][28].
- The technology can keep chip temperatures below 50°C, whereas traditional cooling often allows temperatures of 90-120°C, enabling higher clock frequencies and better performance without increasing transistor density [29][30].
- It also simplifies thermal management in 3D chip designs, making it easier to remove heat from stacked layers [31].

Group 3: Energy Efficiency and Future Prospects
- Laser cooling can reduce overall power consumption by 50% or more when combined with air cooling systems [32].
- The technology can recycle more waste energy than traditional cooling, potentially achieving up to 60% energy recovery through thermophotovoltaics [33].
- Photon cooling is expected to be practical by 2027, improving cooling for high-performance computing and AI clusters, with broader data-center deployment anticipated in 2028-2030 [34].
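The fluorescence mechanism described above can be checked with back-of-the-envelope arithmetic: if the material absorbs a pump photon and emits a slightly higher-energy fluorescence photon, the energy difference leaves the chip as light. The wavelengths below are illustrative values typical of solid-state laser-cooling demonstrations, not Maxwell Labs' actual operating points.

```python
# Net heat extracted per photon in anti-Stokes fluorescence cooling:
# the emitted photon carries away more energy than the absorbed one.
h = 6.626e-34   # Planck constant, J*s
c = 2.998e8     # speed of light, m/s

def photon_energy(wavelength_m):
    return h * c / wavelength_m

lam_pump = 1030e-9   # absorbed (pump) photon wavelength, m -- assumed value
lam_emit = 1010e-9   # emitted fluorescence wavelength, m -- assumed value

heat_removed_per_photon = photon_energy(lam_emit) - photon_energy(lam_pump)
cooling_fraction = heat_removed_per_photon / photon_energy(lam_pump)  # ~2%
```

The small per-photon fraction is why the approach targets concentrated hotspots rather than bulk heat loads: high photon flux at a tiny spot can still move meaningful power.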
Ask an LLM to Throw a Stone, and It Builds a Catapult
量子位· 2025-10-22 15:27
Core Insights
- The article discusses BesiegeField, a new research platform developed by researchers from CUHK (Shenzhen) that lets large language models (LLMs) design and build functional machines from scratch [2][39]
- The platform enables LLMs to learn mechanical design through reinforcement learning, evolving their designs based on feedback from physical simulations [10][33]

Group 1: Mechanism of Design
- The research introduces Compositional Machine Design, which reduces complex designs to discrete assembly problems over standard parts [4][5]
- A structured, XML-like representation mechanism lets the model understand and modify designs [6][7]
- The platform runs on Linux clusters, executing hundreds of mechanical experiments simultaneously and returning comprehensive physical feedback such as speed, force, and energy changes [9][10]

Group 2: Collaborative AI Workflow
- To address the limitations of a single model, the team developed an Agentic Workflow in which multiple AIs collaborate on design tasks [23][28]
- The workflow defines distinct roles, including a Meta-Designer, Designer, Inspector, Active Env Querier, and Refiner, which together enhance the design process [28][31]
- This hierarchical design strategy significantly outperforms single-agent and simple iterative-editing approaches on tasks such as building a catapult or a car [31]

Group 3: Self-Evolution and Learning
- Reinforcement learning (RL) via a strategy called RLVR lets models self-evolve, using simulation feedback as reward signals [33][34]
- As iterations increase, the models improve their design capability and achieve better task performance [35][37]
- Combining cold-start strategies with RL yields the best scores on both the catapult and car tasks, demonstrating how LLMs can sharpen mechanical-design skills through feedback [38]

Group 4: Future Implications
- BesiegeField represents a new paradigm of structural creation, enabling AI to design not just static machines but dynamic structures capable of movement and collaboration [39][40]
- The platform turns complex mechanical design into a structured language-generation task, letting models grasp mechanical principles and structural collaboration [40]
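The XML-like structured representation mentioned in Group 1 might look something like the following. Everything here is invented for illustration: the element names, part types, and attributes are a hypothetical schema, not BesiegeField's actual format.

```python
# Hypothetical sketch of an XML-like compositional machine spec,
# of the kind an LLM could emit and a simulator could parse.
# Part names and attributes are made up for illustration.
import xml.etree.ElementTree as ET

machine = ET.Element("machine", name="catapult")
ET.SubElement(machine, "block", type="wooden_beam", position="0,0,0")
ET.SubElement(machine, "block", type="hinge", position="0,1,0")
ET.SubElement(machine, "block", type="spring", position="0,1,1", tension="0.8")

spec = ET.tostring(machine, encoding="unicode")
```

Representing a machine as a tree of discrete, typed parts is what makes "design" tractable as a language-generation task: the model edits structured text, and the simulator grades the result.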
Fresh Off the Nobel Prize, Now on Nature's Cover! Google's "Quantum Echoes" Algorithm Computes 13,000x Faster, with Repeatably Verifiable Results
量子位· 2025-10-22 15:27
Core Viewpoint
- Google's quantum team, recently awarded the Nobel Prize in Physics, has introduced a new algorithm called "Quantum Echoes" that allows repeatable verification of quantum computing results, addressing the long-standing difficulty of confirming quantum outcomes [1][4].

Group 1: Quantum Computing Advancements
- The new algorithm lets a quantum computer complete in 2.1 hours a calculation that would take a traditional supercomputer 3.2 years, a 13,000x speedup [2][17].
- The research involved over 200 authors, including notable scientists from Princeton University, UC Berkeley, and MIT, highlighting the collaborative effort behind the advance [4].

Group 2: Algorithm Functionality and Applications
- "Quantum Echoes" is built on the out-of-time-order correlator (OTOC), which extends the window over which a quantum system can be usefully observed compared with traditional methods [12][14].
- OTOC has demonstrated the capability to reveal phenomena that classical computers cannot simulate, such as "large-loop interference," which classical simulations struggle to replicate [16][17].
- The algorithm has practical applications in determining unknown parameters of quantum systems, for example in analyzing quantum materials [21][23].

Group 3: Hardware and Future Directions
- The breakthrough relies on the Willow chip's hardware advantages: low error rates and high-speed operation, both essential to the algorithm's performance [23].
- The current generation of the Willow chip achieves 99.97% fidelity for single-qubit gates and 99.88% for entangling gates, marking significant progress in quantum hardware [23].
- Looking ahead, the Google quantum team plans to focus on developing "long-lived logical qubits," laying the groundwork for larger-scale, error-corrected practical quantum computers [26].
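For reference, the out-of-time-order correlator behind "Quantum Echoes" takes the standard textbook form (this is the generic definition from the quantum-chaos literature, not Google's specific circuit):

```latex
C(t) = \langle W^\dagger(t)\, V^\dagger\, W(t)\, V \rangle,
\qquad W(t) = U^\dagger(t)\, W\, U(t)
```

Here $W$ and $V$ are local operators and $U(t)$ is the forward time evolution; the "echo" structure comes from evolving forward, applying a small "butterfly" perturbation $V$, and evolving backward, so any deviation from a perfect echo measures how fast information scrambles.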
Zhipu Just Misses Again: Its Visual-Token Research Collides with DeepSeek Once More
量子位· 2025-10-22 15:27
Core Viewpoint
- The article discusses the competition between Zhipu and DeepSeek in the AI field, focusing on the release of Zhipu's visual-token solution, Glyph, which targets the long-context challenge in large language models (LLMs) [1][2][6].

Group 1: Context Expansion Challenges
- Demand for long context in LLMs keeps growing, driven by applications such as document analysis and multi-turn dialogue [8].
- Extending context length sharply raises computational cost; because self-attention scales quadratically with length, doubling context from 50K to 100K tokens roughly quadruples the computation [9][10].
- Merely adding more tokens does not guarantee better model performance, as excessive input invites noise interference and information overload [12][14].

Group 2: Existing Solutions
Three mainstream approaches to the long-context problem are identified:
1. Extended position encoding: stretches the existing position-encoding range to accommodate longer inputs without retraining the model [15][16].
2. Attention-mechanism modification: techniques such as sparse and linear attention improve token-processing efficiency but do not reduce the total token count [20][21].
3. Retrieval-augmented generation (RAG): shortens the input via external retrieval, but may slow overall response time [22][23].

Group 3: Glyph Framework
- Glyph proposes a new paradigm: convert long texts into images, achieving higher information density and efficient processing by visual language models (VLMs) [25][26].
- Visual tokens significantly reduce the token count; for example, Glyph can represent the entire text of "Jane Eyre" with only 80K visual tokens instead of 240K text tokens [32][36].
- Glyph's training proceeds in three stages: continual pre-training, LLM-driven rendering search, and post-training, which together enhance the model's ability to interpret visual information [37][44].

Group 4: Performance and Results
- Glyph achieves a token-compression rate of 3-4x while maintaining accuracy comparable to mainstream models [49].
- It delivers roughly 4x faster prefill and decoding, and about 2x faster supervised fine-tuning (SFT) [51].
- Glyph performs strongly on multimodal tasks, indicating robust generalization capabilities [53].

Group 5: Contributors and Future Implications
- The paper's primary author is Jiale Cheng, a PhD student at Tsinghua University, with contributions from Yusen Liu, Xinyu Zhang, and Yulin Fei [57][62].
- The article suggests that visual tokens may redefine how LLMs process information, potentially making pixels, rather than text, the fundamental unit of AI input [76][78].
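The cost arithmetic in the Glyph summary follows directly from self-attention's quadratic scaling, and is worth making explicit (the n-squared cost model is the standard simplification; real systems have linear terms too):

```python
# Why doubling context roughly quadruples cost: self-attention compares
# every token with every other token, so compute scales with n**2.
def attention_cost(n_tokens):
    return n_tokens ** 2  # relative units

ratio = attention_cost(100_000) / attention_cost(50_000)   # 50K -> 100K tokens

# Glyph's win, per the article: "Jane Eyre" as 80K visual tokens
# instead of 240K text tokens.
compression = 240_000 / 80_000
effective_cost_ratio = attention_cost(80_000) / attention_cost(240_000)
```

Note the compounding effect: a 3x token compression yields roughly a 9x reduction in attention compute under the quadratic model, which is why token-count reduction beats attention tricks that leave the count unchanged.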
Tsinghua and NVIDIA Build a New Distillation Paradigm for Diffusion Models! Video Generation 50x Faster, Artifact-Free Clips in 4 Steps
量子位· 2025-10-22 09:12
Core Insights
- The article discusses rCM, a new distillation paradigm that speeds up video generation by as much as 50x while maintaining high quality and diversity in the generated content [4][20][33]

Group 1: Introduction of rCM
- rCM is a novel large-scale diffusion-model distillation paradigm developed by Tsinghua University and NVIDIA that successfully extends continuous-time consistency distillation to billion-parameter models [5][9]
- The method addresses bottlenecks in existing approaches, particularly for real-world, large-scale text-to-image and text-to-video models [3][9]

Group 2: Technical Innovations
- The rCM framework introduces joint optimization of forward and reverse divergences, improving inference speed while ensuring high-quality, diverse generation results [4][11]
- Using self-developed FlashAttention-2 JVP CUDA operators and compatible distributed-training strategies, rCM applies continuous-time consistency distillation to leading models such as Cosmos and Wan2.1 [13][18]

Group 3: Performance Metrics
- rCM performs strongly across large-scale text-to-image and text-to-video tasks, compressing the sampling process from hundreds of steps down to 1-4 steps, a 15-50x speedup [20][21]
- In evaluations, the rCM model matches or even surpasses teacher models that require hundreds of sampling steps [21][25]

Group 4: Quality and Diversity
- rCM addresses the quality shortcomings of earlier distillation methods by incorporating reverse divergence as a regularization term, maintaining high diversity while improving quality [19][22]
- Compared with prior state-of-the-art distillation methods, rCM produces significantly more diverse video content, effectively avoiding "mode collapse" [25][31]

Group 5: Future Applications
- rCM is expected to be widely applied in NVIDIA's Cosmos series of world models, indicating its potential for broader industry adoption [34]
KTransformers Accepted at a Top Computer-Systems Conference and Partnering with Mainstream Frameworks: 趋境科技 and Tsinghua Make "Heterogeneous" a New Inference Paradigm
量子位· 2025-10-22 09:12
Core Insights
- KTransformers, an open-source project developed by Turing Technology and Tsinghua University's KVCache.AI team, focuses on systems innovation in the inference phase of large models, enabling efficient operation on diverse hardware architectures with less compute [2][4].

Group 1: KTransformers Overview
- KTransformers is a high-performance heterogeneous inference framework that makes optimal use of varied computing resources such as GPUs, CPUs, and memory [2].
- The project's paper was accepted at the prestigious SOSP 2025 conference, underscoring its significance in the field of computer systems [2][4].

Group 2: Technical Innovations
- The framework introduces an "Expert Deferral" mechanism for efficient scheduling of experts in Mixture-of-Experts (MoE) models, reducing computational load without sacrificing model performance [7][13].
- KTransformers achieves nearly 4x speedup on a single Intel Xeon processor over a traditional PyTorch implementation, significantly boosting CPU performance in expert computation [12].
- The system dynamically overlaps CPU and GPU loads, increasing model throughput by roughly 1.45x with minimal impact on model accuracy [15][16].

Group 3: Collaboration and Ecosystem
- KTransformers has partnered with SGLang, a mainstream inference framework, to integrate full-GPU inference with heterogeneous inference, improving the overall architecture for large-model deployment [5][19].
- The collaboration gives developers seamless access to both full-GPU and heterogeneous inference, which is particularly valuable when GPU resources are limited [21].

Group 4: Market Position and Future Directions
- KTransformers has gained significant traction in the developer community, with over 15.2K stars on GitHub, indicating wide adoption as a foundational framework for large-model inference [24].
- The project aims to democratize AI capability beyond elite computational paths and is actively collaborating with various domestic CPU and GPU platforms to promote cost-effective solutions [28][29].