SGLang natively supports Ascend: new models launch with one click, no code changes required
量子位· 2025-12-21 14:13
Core Insights
- As agents accelerate on the application side, attention is shifting to whether inference systems can handle real-world loads [1][4]
- The SGLang AI finance meetup highlighted the engineering challenges inference systems face in financial agent scenarios: high-concurrency requests, long context windows, multi-turn reasoning, memory management, and generation consistency [4][9]

Group 1: Inference System Engineering Solutions
- The SGLang event, co-hosted with AtomGit, focused on large-model inference architecture, agents, reinforcement learning, and their application in finance [7]
- Participants included engineering teams working on inference systems, models, and compute; agents place higher efficiency demands on high concurrency, long context windows, multi-turn reasoning, and memory management than traditional LLM workloads [8]
- Specific deployment scenarios, such as financial agents, impose stricter requirements on low latency, response stability, consistency, and cost control [9]

Group 2: Technical Innovations and Implementations
- SGLang introduced the HiCache system to address KV-cache redundancy and high memory demand under high concurrency and long contexts, significantly reducing memory usage and improving inference stability and throughput [11]
- For hybrid models such as Qwen3-Next and Kimi Linear, SGLang implemented a Mamba Radix Tree for unified prefix management and an Elastic Memory Pool for efficient inference and memory optimization in long-context, high-concurrency scenarios [13]
- The Mooncake system, built on its Transfer Engine, sharply cut weight-loading and model-startup times: weight-update preparation completes in under 20 seconds, and cold-start time dropped from 85 seconds to 9 seconds [17]

Group 3: Collaboration with the Ascend Platform
- These inference-system capabilities are not tied to a single compute platform: HiCache, Mooncake, and GLM run directly on Ascend, signaling a shift in Ascend's role within the inference-system ecosystem [24][25]
- SGLang's latest advances on Ascend cover model adaptation, performance optimization, and modular acceleration, reaching a throughput of 15 TPS per card for DeepSeek V3.2 under specific conditions [29]
- System-level optimizations included load balancing, operator fusion to reduce memory access, and multi-stream parallel execution to raise resource utilization [30][31]

Group 4: Future Directions and Open Source Commitment
- Ascend's collaboration with SGLang aims to fully embrace open source and accelerate ecosystem development; gray-release testing of DeepSeek V3.2 in real business scenarios has been completed [46]
- Future work will focus on systematic engineering investment around inference systems, raising throughput for high-concurrency, low-latency workloads, and aligning with open-source engines on model deployment and performance tuning [47]
- Once models, inference engines, and compute platforms are integrated into a stable collaborative framework, the question shifts from whether a model can run to whether the system can run sustainably and at scale [47]
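The prefix-reuse idea behind radix-tree KV caching (as in systems like HiCache and the Mamba Radix Tree mentioned above) can be illustrated with a toy sketch. Everything below, including the class names and the string stand-ins for KV tensors, is invented for illustration and is not SGLang's actual implementation: requests that share a prompt prefix can reuse the cached KV entries for that prefix instead of recomputing them.

```python
# Toy radix-style prefix cache: requests sharing a token prefix reuse
# cached per-token KV entries. Invented illustration only; real systems
# store GPU tensors and handle eviction, paging, and concurrency.

class PrefixCacheNode:
    def __init__(self):
        self.children = {}   # token id -> PrefixCacheNode
        self.kv = None       # stand-in for this token's KV entry

class PrefixCache:
    def __init__(self):
        self.root = PrefixCacheNode()

    def match(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Walk/extend the tree, 'computing' KV only for uncached tokens."""
        node = self.root
        for t in tokens:
            if t not in node.children:
                child = PrefixCacheNode()
                child.kv = f"kv({t})"  # placeholder for a real KV tensor
                node.children[t] = child
            node = node.children[t]

cache = PrefixCache()
cache.insert([1, 2, 3, 4])          # first request computes KV for 4 tokens
reused = cache.match([1, 2, 3, 9])  # second request shares a 3-token prefix
print(reused)                        # 3 cached KV entries can be reused
```

Only the unmatched suffix of the second request would need fresh prefill computation, which is where the memory and throughput savings under high concurrency come from.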
自变量's Wang Qian: Embodied intelligence is an independent foundation model for the physical world | MEET2026
量子位· 2025-12-21 05:45
Core Viewpoint
- The embodied-intelligence model is considered an independent foundation model, parallel to language and multimodal models, designed specifically for the physical world [6][12][61]

Group 1: Differences Between Physical and Virtual Worlds
- The physical world differs fundamentally from the virtual one: it is continuous and stochastic, and its processes involve force, contact, and timing [2][10]
- Existing models built on language and vision paradigms are structurally misaligned with the complexities of the physical world [3][21]

Group 2: Need for a Separate Foundation Model
- A separate foundation model is necessary because of the significant randomness of the physical world, which existing models struggle to represent accurately [10][17]
- The current reliance on multimodal models for embodied intelligence is seen as inadequate, necessitating a complete rethink of model architecture and training methods [9][21]

Group 3: Future of Multimodal Models
- Shifting perspectives on embodied intelligence will yield new insights into model architecture and data utilization [24][30]
- Learning in the physical world differs fundamentally from learning in the virtual world, so future multimodal models must adapt to these differences [25][28]

Group 4: Scaling Laws and Data Utilization
- Scaling laws remain crucial to developing large models, particularly in robotics, where sourcing data is a major challenge [47][49]
- A phased approach to training and data collection is recommended, emphasizing the importance of real-world data for effective learning [52][53]

Group 5: Hardware and AI Integration
- A new learning paradigm requires redesigning physical-world hardware, with AI defining the hardware rather than the other way around [54][55]
- Embodied intelligence could drive exponential growth in resources and capabilities, drawing parallels to historical industrial advances [60][61]
LeCun's criticisms on his way out are scathing
量子位· 2025-12-21 05:45
Core Viewpoint
- LeCun expresses skepticism about the potential of large language models (LLMs) to achieve artificial general intelligence (AGI), arguing that the path to superintelligence through LLMs is fundamentally flawed [2][78].

Group 1: Departure from Meta
- LeCun is leaving Meta after nearly 12 years, criticizing the company's increasingly closed approach to research and its focus on short-term projects [3][11][26].
- He plans to establish a new company named Advanced Machine Intelligence (AMI), which will prioritize open research and focus on world models [10][19].

Group 2: World Models vs. LLMs
- LeCun believes that world models, which handle high-dimensional and continuous data, are fundamentally different from LLMs, which excel at discrete text data [28][29].
- He argues that relying solely on text data will never allow AI to reach human intelligence levels, as the complexity of real-world data is far greater than that of text [31][32].

Group 3: Research Philosophy
- LeCun emphasizes the importance of open research and publication, stating that without sharing results, research lacks validity [15][17].
- He critiques Meta's shift towards short-term projects, suggesting that true breakthroughs require long-term, open-ended research [18][26].

Group 4: Future of AI
- LeCun envisions that the development of world models and planning capabilities could lead to significant advancements in AI, but achieving human-level intelligence will require substantial foundational work and theoretical innovation [84][85].
- He asserts that the most challenging aspect of AI development is not reaching human intelligence but rather achieving the intelligence level of dogs, as this requires a deep understanding of foundational theories [88][89].

Group 5: Personal Mission
- At 65, LeCun remains committed to enhancing human intelligence, viewing it as the most scarce resource and a key driver for societal progress [92][94].
- He reflects on his career, expressing a desire to continue contributing to the field and emphasizing the importance of open collaboration in scientific advancement [103].
Why this Google paper is being called "Attention Is All You Need" V2
量子位· 2025-12-21 05:45
Core Insights
- The article discusses a Google research paper, "Nested Learning: The Illusion of Deep Learning Architectures," now being referred to as "Attention Is All You Need" V2 for the new perspective it offers on AI's learning capabilities [1][5].

Group 1: AI Limitations
- Current large language models (LLMs) suffer from a condition termed "digital amnesia": they forget recently learned information shortly after it is taught [2][3].
- The industry has focused on making models deeper and larger, believing that scale would produce emergent memory capabilities, but this approach has significant limitations [3][4].

Group 2: Nested Learning Paradigm
- The research introduces "nested learning," which posits that effective intelligent learning requires two orthogonal dimensions: depth (model layers and capacity) and frequency (the rhythm and speed at which internal components update) [9][10].
- The paper argues that mainstream optimizers, traditionally viewed as mere training engines, actually function as associative memory systems that continuously record gradient changes [6].

Group 3: HOPE Architecture
- The proposed architecture, named HOPE, features a continuous memory system with multiple MLP modules arranged like a spectrum, each updating at a different frequency [14].
- The architecture mimics the human brain's memory processes, allowing new knowledge to be integrated without causing systemic collapse or forgetting [16][17].

Group 4: Future Implications
- The value of nested learning lies not in immediately replacing existing models like Transformers but in providing a new design logic and framework for AI development [18].
- The exploration of memory and learning is still in its early stages, suggesting that future AI advances may require systems that learn and evolve rather than remain static repositories of knowledge [18].
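The "frequency" dimension described above, components of one model updating at different rhythms, can be sketched in a few lines. This toy loop is an invented illustration of the general idea, not the paper's HOPE architecture: a fast component updates every step while a slow component consolidates only periodically.

```python
# Toy sketch of multi-frequency updates: a "fast" weight updates every
# step, a "slow" weight only every `slow_every` steps. Invented
# illustration of the nested-learning frequency axis, not HOPE itself.

def train(steps, slow_every=4, lr=0.1):
    fast_w, slow_w = 0.0, 0.0
    fast_updates, slow_updates = 0, 0
    for step in range(1, steps + 1):
        grad = 1.0                  # stand-in gradient
        fast_w -= lr * grad         # fast component: every step
        fast_updates += 1
        if step % slow_every == 0:  # slow component: periodic consolidation
            slow_w -= lr * grad
            slow_updates += 1
    return fast_updates, slow_updates

print(train(12))  # (12, 3): 12 fast updates, 3 slow consolidations
```

In the paper's framing, a spectrum of such components at different frequencies lets new knowledge enter quickly without overwriting slowly consolidated memory.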
Stop large models from overthinking! Shanghai AI Lab's new post-training paradigm reshapes CoT for reasoning that is both fast and good
量子位· 2025-12-21 02:00
Contributed by the RePro team
量子位 | WeChat official account QbitAI

In recent years, with the breakout of models such as o1 and DeepSeek-R1, Long Chain-of-Thought (Long CoT) has become standard equipment for improving LLMs' complex reasoning ability. Yet "thinking long" is not always thinking well: models often fall into the trap of overthinking (Overthinking), generating thousands of redundant tokens to reach a simple conclusion, or even repeatedly backtracking (Backtracking) along wrong paths. This wastes precious compute and increases inference latency.

How can a model deliberate carefully while staying quick-witted?

Recently, a research team at the Shanghai AI Laboratory proposed a brand-new post-training paradigm, RePro (Rectifying Process- ...). The paper treats the reasoning process as an optimization of the model's internal state, offering a fresh perspective on how to reshape a large model's CoT.

Core observation: reasoning as optimization
RePro rests on one core idea: view the model's reasoning trajectory (Trajectory) as a path searching for the optimum on a loss surface.

RePro's three "rectification" mechanisms
Building on this view, RePro designs a process-reward mechanism embedded directly in RLVR pipelines such as PPO and GRPO. ...
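The article is truncated before RePro's reward design is detailed, so as a generic illustration only (not RePro's actual mechanism), a process-style reward that discourages overthinking can be as simple as combining answer correctness with a penalty on reasoning tokens beyond a budget; all names and numbers below are invented.

```python
# Generic illustration of a reward that penalizes overthinking:
# correct answers earn reward, but reasoning tokens beyond a budget
# incur a linear penalty. Invented sketch; NOT RePro's reward design.

def shaped_reward(is_correct, num_reasoning_tokens,
                  budget=512, penalty_per_token=0.001):
    base = 1.0 if is_correct else 0.0
    overflow = max(0, num_reasoning_tokens - budget)
    return base - penalty_per_token * overflow

print(shaped_reward(True, 400))   # 1.0: within budget, no penalty
print(shaped_reward(True, 1512))  # 0.0: 1000 extra tokens erase the reward
```

A scalar of this shape can be plugged into RLVR-style training loops (PPO, GRPO) as the per-trajectory reward, which is the integration point the article mentions.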
Cook promotes a Fudan alumnus to head Apple's foundation models! After Ruoming Pang's departure, raises stanch the bleeding while ex-Googlers make up half the team
量子位· 2025-12-21 02:00
Core Viewpoint
- The transition of leadership in Apple's AI model team following the departure of Ruoming Pang to Meta has been swift and relatively quiet, with Zhifeng Chen taking over the reins [1][2].

Group 1: Leadership Transition
- Zhifeng Chen, who previously worked at Google for nearly 20 years, has stepped into the role of leading Apple's foundational model team, managing over 20 subordinates [8][14].
- Chen's familiarity with Apple's model system, having joined earlier this year, and his extensive experience at Google, including contributions to TensorFlow and Gemini, make him a suitable candidate for this position [16][17].

Group 2: Team Dynamics and Challenges
- Following Pang's departure, Apple initiated a retention plan for key researchers, including salary increases, to stabilize the team [4].
- Despite these efforts, the foundational model team at Apple is facing challenges, with over half of its direct reports coming from Google, indicating potential issues with team cohesion and internal identity [24][26].

Group 3: Industry Context and Competition
- The current AI landscape sees companies like Meta, OpenAI, and Google focusing on pursuing superintelligence, while Apple's approach remains product-oriented, emphasizing practical applications of AI in everyday tasks [35][36].
- This divergence in focus may lead to talent retention issues, as some researchers prioritize groundbreaking exploration over product implementation [38][39].

Group 4: Organizational Changes
- In March, Apple restructured its AI reporting lines, removing the Siri team from the oversight of John Giannandrea, a significant figure in AI at Apple, signaling internal dissatisfaction with AI progress [43][44].
- Giannandrea's upcoming transition to a consulting role and the subsequent division of his responsibilities among other executives suggest a shift back to integrating AI within specific product teams rather than maintaining it as a standalone department [50][56].

Group 5: Competitive Threats
- OpenAI is reportedly targeting talent from Apple's hardware and supply chain sectors, indicating a shift in competitive dynamics as companies traditionally focused on software begin to encroach on hardware domains [58][60].
- This trend poses a significant challenge for Apple, which has historically relied on its control over hardware and design to maintain its competitive edge [61][62].
Tsinghua's Sun Maosong: For industry, the big players can pursue scaling; everyone else should focus on vertical applications | MEET2026
量子位· 2025-12-21 02:00
Core Insights
- The rapid development of AI and large models has created a competitive landscape in which companies, driven by fear of missing out (FOMO), feel compelled to invest heavily in scaling their models and capabilities [2][6][40]
- Capability emergence in large models is non-linear, bringing significant uncertainty but also the potential for breakthroughs that surpass expectations [3][19][15]
- The relationship between language, knowledge, and action remains a fundamental challenge for AI, with the goal of achieving a true integration of these elements [15][38][37]

Group 1: Development of AI and Large Models
- The AI field has evolved significantly over the past eight years, entering the era of pre-trained models and large models around 2017 [11][10]
- Key milestones include the release of models like GPT-3 and ChatGPT, which demonstrated remarkable capabilities across a wide range of tasks [16][24]
- The ability of large models to perform well on complex tasks has increased dramatically, with benchmarks being surpassed in text, code, and multimodal models [20][26][25]

Group 2: Challenges and Risks
- The costs of scaling AI models keep rising, raising concerns about the sustainability of such investments [42][43]
- The pursuit of scale risks diminishing returns, especially if performance begins to plateau [40][41]
- Uncertainty about the limits of scaling laws forces companies to balance the need to invest in AI against the potential for wasted resources [7][68]

Group 3: Strategic Recommendations
- Companies with substantial resources may continue to pursue large-scale development, while the majority should focus on niche applications to minimize risk and maximize potential [60][74]
- The strategy of "致广大而尽精微" (strive for greatness while attending to the details) is recommended, emphasizing the importance of vertical applications of AI [63][69]
- New AI algorithms may emerge from specific vertical applications, suggesting that focused, specialized work can also lead to broader advances [71][74]
Interview with WeRide's Han Xu: Only three companies in China have real L4; without lidar, Musk can't beat Waymo | MEET2026
量子位· 2025-12-20 11:19
Core Insights
- The article traces the evolution of the autonomous driving industry through the achievements of WeRide under Han Xu, who emphasizes talent acquisition and technological progress in Robotaxi [1][2][5]

Group 1: Company Achievements
- WeRide became the first Robotaxi company publicly listed in both the US and Hong Kong, a significant milestone in its eight-year journey [2][8]
- The company has moved past early skepticism about autonomous driving to operational milestones, including removing safety drivers from its vehicles [17][18]
- WeRide has deployed its Robotaxi services in 11 countries, demonstrating its global reach and operational capability [15][18]

Group 2: Industry Insights
- Han Xu asserts that only three companies in China can truly operate Level 4 (L4) autonomous vehicles, emphasizing the technical barriers that still separate Level 2 (L2) from L4 [6][19][22]
- He distinguishes companies claiming L4 capability from those with demonstrated, fleet-scale operational success [21][24]
- He predicts that if Tesla continues to rely solely on production vehicles without advanced sensor technologies, it will struggle to match competitors like Waymo [45]

Group 3: Talent Acquisition and AI Impact
- The company is recruiting top talent at salaries of 3 to 5 million, reflecting the surging demand for skilled professionals in AI and autonomous driving [46][49]
- Han Xu describes AI as a significant amplifier of talent value, meaning exceptional individuals can command far higher salaries in the current market [46][48]
- WeRide uses competitive compensation packages to attract talent, signaling its financial strength and commitment to innovation [50][51]

Group 4: Future Predictions
- Han Xu forecasts that within three years, if Tesla does not adopt multi-sensor technology for its Robotaxi, it will not reach the operational standard set by Waymo [53]
- He also predicts the emergence of a "Superdriver" within eight years: autonomous driving that surpasses the capabilities of the best human drivers [53]
Luchen's You Yang: Everyday office work doesn't need a private model; only these three types of companies do | MEET2026
量子位· 2025-12-20 08:02
Core Viewpoint
- The application of large models extends well beyond chatbots and programming assistants; their true value will be realized across many industries [8]

Group 1: Types of Companies Needing Private Models
- Three types of companies require industry-specific or private models: traditional large enterprises, small and medium-sized enterprises sitting on large amounts of data, and disruptive new companies [8][34]
- Traditional large enterprises often possess valuable industry-specific data [34]
- Small and medium-sized enterprises specializing in niche areas can use their data as raw material for large models [35]
- Disruptive companies in sectors like finance, pharmaceuticals, and e-commerce are most likely to benefit from developing their own private models [35]

Group 2: Implementation Criteria
- Companies that only handle daily office tasks or primarily text data do not need a private model; existing large-model APIs suffice [4][37]
- A company with sufficient text data can pair a Retrieval-Augmented Generation (RAG) setup with a large-model API instead of building its own model [38]
- Companies with vast multimodal data or stringent privacy requirements, such as those in oil exploration or pharmaceuticals, should consider developing a private model [38]

Group 3: Market Predictions
- The large-language-model market is predicted to split into three segments: domain-specific LLMs, general-purpose LLMs, and private LLMs [39][41]
- By 2033, domain-specific models are expected to capture roughly 40% of the market, with general-purpose and private models each holding around 30% [47]

Group 4: Training and Optimization
- The key to successfully deploying large models for business is post-training or agentization, which differentiates a model from a stock API [42]
- Companies should focus on maximizing computational efficiency and developing effective fine-tuning templates to create their industry-specific models [43][44]
- The company has developed a fine-tuning SDK to ease private-model creation, letting users focus on model and algorithm innovation [17][45]

Group 5: Real-World Applications
- A world-renowned automotive company used the technology to build a multimodal automated decision-support system [53]
- A leading e-commerce company's autonomous-driving business improved significantly with its help [53]
- Another world-class automotive company developed an intelligent-cockpit model with its assistance [53]
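The RAG-plus-API pattern recommended for text-heavy companies can be sketched minimally: retrieve relevant internal documents, then send them as context to a hosted LLM API. The retriever below is a toy word-overlap scorer (a real system would use embeddings), and `call_llm_api` is a hypothetical placeholder for whichever hosted API a company uses.

```python
# Minimal RAG sketch: rank documents by naive word overlap with the
# query, then assemble a context-augmented prompt for an external LLM
# API. Toy retriever for illustration; real systems use embeddings.

def retrieve(query, documents, top_k=2):
    """Return the top_k documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents):
    """Bundle retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Quarterly refinery throughput rose 4 percent.",
    "The cafeteria menu changes on Mondays.",
    "Refinery maintenance is scheduled for March.",
]
prompt = build_prompt("When is refinery maintenance?", docs)
# The prompt would then go to a hosted model, e.g.:
# answer = call_llm_api(prompt)   # hypothetical API client call
print(prompt)
```

The point of the pattern is that proprietary text never trains a model; it is fetched at query time, which is why the article treats RAG plus an API as sufficient for text-only companies.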
ChatGPT's writing style: made in Kenya
量子位· 2025-12-20 08:02
Core Viewpoint
- The article discusses the similarities between the writing style of a Kenyan author and that of ChatGPT, suggesting that AI may inadvertently mimic the structured and formal writing style taught in certain educational systems, particularly in Kenya [2][9][12].

Group 1: Author's Experience
- A Kenyan author, Marcus Olang', expressed frustration over being told his writing resembles that of ChatGPT, leading to a need to "prove he is not AI" [5][6].
- Olang' and his peers have received feedback indicating their writing is too similar to AI-generated content, highlighting a broader issue faced by many non-native English speakers [6][14].
- The structured writing style taught in Kenyan education emphasizes clarity and logic, which aligns with the output of AI models like ChatGPT [11][12].

Group 2: AI's Learning Process
- AI models, including ChatGPT, learn from a vast array of texts that often reflect formal and classic writing styles, similar to those taught in strict educational systems [12][28].
- The process of Reinforcement Learning from Human Feedback (RLHF) involves human testers, often from African countries, who provide feedback that shapes the AI's writing style [28][29].
- The frequent use of certain words, such as "delve," in AI-generated text can be attributed to the natural and formal English used by these testers in their daily lives [30][31].

Group 3: Community Response
- The author's sentiments resonate with others, as many non-native English speakers feel their writing is unfairly categorized as AI-generated due to its structured nature [15].
- The article highlights a growing awareness of the impact of AI on perceptions of human writing, particularly among those from regions with rigorous educational standards [15][19].
- The phenomenon has sparked discussions on social media, with users sharing their experiences and insights regarding AI-generated content [23][26].